Visitor Feedback ROC Retreat Lake Tahoe

Visitor Feedback

ROC+Oceanstore Retreat
Lake Tahoe, June 2002

(scribed by George Candea)

Bill

Statistical approach to consistency:

neat, keep it up

Operator error study:

good to finally see data

Proposal for the next Oceanstore app and future work:

look at email attachments, which are bigger than the email itself and are uselessly duplicated
attachments are probably more valuable than the email itself
look at disconnected/weakly connected operation
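
A minimal sketch of the attachment-duplication point, assuming a content-addressed store keyed by a SHA-256 digest. The class and method names are hypothetical and are not part of Oceanstore; the sketch only shows that identical attachments sent in many messages need to be stored once.

```python
import hashlib


class AttachmentStore:
    """Hypothetical content-addressed store: identical attachments are kept once."""

    def __init__(self):
        self.blobs = {}  # digest -> attachment bytes
        self.refs = {}   # digest -> number of messages referencing the attachment

    def put(self, data: bytes) -> str:
        """Store an attachment and return its digest; duplicates only bump a counter."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = data
        self.refs[digest] = self.refs.get(digest, 0) + 1
        return digest

    def get(self, digest: str) -> bytes:
        """Fetch an attachment by the digest returned from put()."""
        return self.blobs[digest]

    def stored_bytes(self) -> int:
        """Total bytes actually kept, after de-duplication."""
        return sum(len(blob) for blob in self.blobs.values())
```

Each message would then carry only the digest, and the same attachment forwarded to many recipients costs one copy plus one reference per message.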

Kim

lots of pubs since last time -- good job!

Utility functions for dependable service design

lots of interest, synergistic w/ some of the work being done at HPL
perhaps certain axes (e.g., performance) can be viewed more discretely, with a few diff. levels instead of continuous values
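
A small sketch of what "a few levels instead of continuous values" could look like for a performance axis. The thresholds, units, and level names are made up for illustration and do not come from the presented work.

```python
import bisect

# Hypothetical cut points (requests/sec) separating the performance levels.
THRESHOLDS = [100.0, 1000.0]
LEVELS = ["poor", "acceptable", "good"]


def performance_level(throughput: float) -> str:
    """Map a continuous throughput measurement onto one of a few discrete levels."""
    # bisect_right counts how many thresholds are <= throughput,
    # which is exactly the index of the level the value falls into.
    return LEVELS[bisect.bisect_right(THRESHOLDS, throughput)]
```

A utility function could then be defined over the handful of levels rather than over every possible throughput value.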

"Theory" of undo

SIGOPS paper on related work coming soon from HPL
time travel -- very important idea

Oceanstore

saw actual demo of Oceanstore -- cool! (would have liked to be able to pull the NIC out)
are we serious about "performance doesn't matter that much anymore"? if yes, why blast Sean for 30-msec writes?
would be great to have real users for Oceanstore and report on results
would be interested in hearing more about introspection and dissemination
too much of the coolness of Oceanstore seems to be due to Tapestry's coolness; that may be a little too limiting
could you use Oceanstore as the storage substrate for undo?

Overall

Panel session went quite well
Definitely have the papers available beforehand -- very useful
Bridge the gap between dependable systems community and the systems community -- will have a similar workshop to last year's EASY right before ASPLOS in October + one-day workshop on dependability basics

Mark

Analysis of failure in Internet services

good, continue working on that!
many such places don't have good ways to capture information about failure; think about tools that allow them to do that more effectively

"Theory" of undo

promising idea

MTTR ">>" MTTF

although this is true, don't drop the quality bar

Adaptable systems

if you can't build a simple system to be reliable, what makes you think you can build a complex, adaptable system reliably?

Oceanstore

definitely build something that can be used; at the next retreat, want to see people's home directories live in Oceanstore
build 3-4 apps running on top of Oceanstore to demonstrate usefulness

Overall

spend time figuring out how to analyze and visualize data
the "data store API" breakout was very useful; formalize what the constraints are, develop standard vocabulary, figure out canonical architectures

James

Overall

ROC is totally the right answer and it's what the industry needs at this point in time
good contributions
totally convinced that system-level undo is a very interesting area to work in
collaboration with Stanford is super-productive
encourage the OS/DBs class at Berkeley

Utility-based approach to design

things seemed a little bit too black and white (e.g., banks always trade data quality for availability) --> this strongly supports the point made by the presented work
watch out for cliffs in the design space

RAINS

ROC is not a patch for bad systems; RAINS seems to be too far down the spectrum of unreliability

Using VMs for ROC

maybe run multiple VMs in the same address space; could use them for component-level fault containment

Theory of undo

cool stuff; unless you build a system, the theory is not interesting (so need solid evidence)
have a running system and study the results with and w/out undo

Oceanstore

performance doesn't matter as long as you're within 2x or 3x of other systems
skeptical of the security aspects, particularly deletion and revocation
a huge amount of the world's data is designed to be public and persistent --> good for Oceanstore

Misha

Probabilistic consistency

very interesting
need to make sure your failure model is realistic and applies

MTTR ">>" MTTF

good, particularly as it seems to head to a better definition of availability
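
One way to make the availability point concrete, assuming the standard steady-state definition availability = MTTF / (MTTF + MTTR). The function name and the numbers below are illustrative only and were not part of the retreat discussion.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the service is up."""
    return mttf_hours / (mttf_hours + mttr_hours)


# Illustrative numbers only: a service that fails every 1000 hours
# and takes 10 hours to repair.
baseline = availability(1000.0, 10.0)

# Doubling time-to-failure and halving time-to-repair give the same result,
# because the formula depends only on the ratio MTTR / MTTF.
doubled_mttf = availability(2000.0, 10.0)
halved_mttr = availability(1000.0, 5.0)
```

Under this definition, cutting repair time buys the same availability as an equal relative improvement in time to failure.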

Oceanstore

a certain level of performance is still required, even though availability is the overarching goal
make archival real (that would be really cool)

Overall

you can recover from a problem w/out knowing what it was; need to have more diagnosis

Jeff

Oceanstore

very exciting to see demo working
very dismal record of people adopting filesystems; the Oceanstore web cache would be the most compelling app
if you want this to last for 100+ years, need to consider making it a lot more evolvable
need to explore the rights side of digital rights mgmt
time-travel data store is very useful (a bunch of startups in stealth mode are trying to address this); there is a lot of stuff to figure out: data model, APIs, etc.

Design tradeoffs

look at utility funcs in terms of impact on market size/share
try to incorporate ideas of uncertainty and risks of design (current models/processes don't deal very well with that)

RAINS

consider using checkpoints, so a restart doesn't lead to a blank slate

Mendel

Analysis of failures

will be a real, solid contribution

Using VMs for ROC

interesting, because systems are complex and have a lot of bugs (VMs can help make the mess run more reliably)

Oceanstore

based on experience with Sprite: you can spend a lot of effort trying to take over the world, but that effort is not research
need to spend just enough effort to make your papers convincing, but not more

Overall

talks need slides with related work; it helps by (1) fitting your work into the landscape; (2) providing evidence of which systems you are familiar with (Bill suggests that the slide be at the beginning of the talk, instead of the end)

Blue

Overall

get out of the box and stay there
you are doing research, not shipping products, so don't worry about it
seek more feedback from industry, because many of them are very interested in what you're doing
ROC *is* a patch for bad systems and that's good, because we won't be able to stay ahead of the bug curve
the "API for storage" breakout -- very good, may even add headcount at Veritas to explore this issue further
it's not OK to sacrifice performance, but it is OK to sacrifice "real performance" for "perceived performance"
don't worry too much about optimizing for performance; industry pays a lot of people a lot of money to do that
you seldom see root-cause analysis in data centers (perhaps 1 out of every 1,000 problems gets a root-cause analysis), so ROC is very useful in that context
look for specific implementable solutions and publish them -- others will implement them
I believe in solving scalability with simplicity; pay attention to emergent behaviors in the large scale

Oceanstore

the claim that configuring web caches is hard is not true (they're really easy, even on a large scale)
intent-based logging/configuration/APIs is a cool idea

Lisa

FIG / fault injection

keep in mind the diff. between fault injection and error injection (errors become apparent at the app-level)

Benchmarking

don't do benchmarks in order to embarrass people; benchmarks are marketing tools, so nobody in industry will run a benchmark that makes them look bad