Visitor Feedback
ROC+Oceanstore Retreat
Lake Tahoe, June 2002
(scribed by George Candea)
Bill
Statistical approach to consistency:
Operator error study:
Proposal for the next Oceanstore app and future work:
- look at email attachments, which are bigger than the email itself and are
uselessly duplicated
- attachments are probably more valuable than the email itself
- look at disconnected/weakly connected operation
Kim
- lots of pubs since last time -- good job!
Utility functions for dependable service design
- lots of interest, synergistic w/ some of the work being done at HPL
- perhaps certain axes (e.g., performance) can be viewed more discretely,
with a few diff. levels instead of continuous values
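One way to read the suggestion above is as a step function over a performance
axis. The following is a minimal sketch, assuming a latency axis split into
three levels; the thresholds, the level names, and the function name
`performance_level` are hypothetical and do not come from the presented work.

```python
from bisect import bisect_left

# Hypothetical upper bounds (in milliseconds) of the first two levels.
# Anything above the last bound falls into the final level.
THRESHOLDS_MS = [100.0, 1000.0]
LEVELS = ["good", "tolerable", "unacceptable"]


def performance_level(latency_ms: float) -> str:
    """Map a continuous latency value onto one of a few discrete levels."""
    if latency_ms < 0:
        raise ValueError("latency must be non-negative")
    # bisect_left returns the index of the first bound >= latency_ms,
    # so a latency exactly on a bound belongs to the better level.
    return LEVELS[bisect_left(THRESHOLDS_MS, latency_ms)]
```

A utility function could then assign one value per level rather than one per
latency value, which keeps the design space small enough to reason about.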
"Theory" of undo
- upcoming SIGOPS paper from HPL on related work
- time travel -- very important idea
Oceanstore
- saw actual demo of Oceanstore -- cool! (would have liked to be able
to pull the NIC out)
- are we serious about "performance doesn't matter that much anymore"?
if yes, why blast Sean for 30-msec writes?
- would be great to have real users for Oceanstore and report on results
- would be interested in hearing more about introspection and dissemination
- too much of the coolness of Oceanstore seems to be due to Tapestry's
coolness; that may be a little too limiting
- could you use Oceanstore as the storage substrate for undo?
Overall
- Panel session went quite well
- Definitely have the papers available beforehand -- very useful
- Bridge the gap between dependable systems community and the systems
community -- will have a similar workshop to last year's EASY right before
ASPLOS in October + one-day workshop on dependability basics
Mark
Analysis of failure in Internet services
- good, continue working on that!
- many such places don't have good ways to capture information about
failure; think about tools that make it possible to do that more effectively
"Theory" of undo
MTTR ">>" MTTF
- although this is true, don't drop the quality bar
Adaptable systems
- if you can't build a simple system to be reliable, what makes you think
you can build a complex, adaptable system reliably?
Oceanstore
- definitely build something that can be used; at the next retreat, want to
see people's home directories live in Oceanstore
- build 3-4 apps running on top of Oceanstore to demonstrate usefulness
Overall
- spend time figuring out how to analyze and visualize data
- the "data store API" breakout was very useful; formalize what the
constraints are, develop standard vocabulary, figure out canonical
architectures
James
Overall
- ROC is totally the right answer and it's what the industry needs at this
point in time
- good contributions
- totally convinced that system-level undo is a very interesting area to work
in
- collaboration with Stanford is super-productive
- encourage the OS/DBs class at Berkeley
Utility-based approach to design
- things seemed a little bit too much black and white (e.g., banks always
trade data quality for availability) --> this strongly supports the point made
by the presented work
- watch out for cliffs in the design space
RAINS
- ROC is not a patch for bad systems; RAINS seems to be too far down the
spectrum of unreliability
Using VMs for ROC
- maybe run multiple VMs in the same address space; could use them for
component-level fault containment
Theory of undo
- cool stuff; unless you build a system, the theory is not interesting (so
need solid evidence)
- have a running system and study the results with and w/out undo
Oceanstore
- performance doesn't matter as long as you're within 2x or 3x of other
systems
- skeptical of the security aspects, particularly deletion and revocation
- a huge amount of the world's data is designed to be public and persistent
--> good for Oceanstore
Misha
Probabilistic consistency
- very interesting
- need to make sure your failure model is realistic and applies
MTTR ">>" MTTF
- good, particularly as it seems to head to a better definition of
availability
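For reference, the conventional steady-state definition of availability is
MTTF / (MTTF + MTTR). The sketch below uses that textbook formula, which is an
assumption here and not necessarily the definition the presented work is
heading toward; the numbers are made up for illustration.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the service is up."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


# Illustrative (made-up) numbers: fail every 1000 hours, repair in 1 hour.
baseline = availability(1000.0, 1.0)

# Two ways to improve on the baseline.
half_mttr = availability(1000.0, 0.5)    # recover twice as fast
double_mttf = availability(2000.0, 1.0)  # fail half as often
```

Under this formula, halving MTTR yields the same availability as doubling
MTTF, which is one reason recovery time is an attractive target.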
Oceanstore
- a certain level of performance is still required, even though availability
is the overarching goal
- make archival real (that would be really cool)
Overall
- you can recover from a problem w/out knowing what it was; need to have
more diagnosis
Jeff
Oceanstore
- very exciting to see demo working
- very dismal record of people adopting filesystems; the Oceanstore web
cache would be the most compelling app
- if you want this to last for 100+ years, need to consider making it a lot
more evolvable
- need to explore the rights of digital rights mgmt
- time-travel data store is very useful (a bunch of startups in stealth mode
are trying to address this); there is a lot of stuff to figure out: data
model, APIs, etc.
Design tradeoffs
- look at utility funcs in terms of impact on market size/share
- try to incorporate ideas of uncertainty and risks of design (current
models/processes don't deal very well with that)
RAINS
- consider using checkpoints, so a restart doesn't lead to a blank slate
Mendel
Analysis of failures
- will be a real, solid contribution
Using VMs for ROC
- interesting, because systems are complex and have a lot of bugs (VMs can
help make the mess run more reliably)
Oceanstore
- based on experience with Sprite: you can spend a lot of effort trying to
take over the world, and that effort is not research
- need to spend just enough effort to make your papers convincing, but not
more
Overall
- talks need slides with related work; it helps by (1) fitting your work
into the landscape; (2) providing evidence of which systems you are familiar
with (Bill suggests that the slide be at the beginning of the talk, instead of
the end)
Blue
Overall
- get out of the box and stay there
- you are doing research, not shipping products, so don't worry about it
- seek more feedback from industry, because many of them are very interested
in what you're doing
- ROC *is* a patch for bad systems and that's good, because we won't
be able to stay ahead of the bug curve
- the "API for storage" breakout -- very good, may even add headcount at
Veritas to explore this issue further
- it's not OK to sacrifice performance, but it is OK to sacrifice "real
performance" for "perceived performance"
- don't worry too much about optimizing for performance; industry pays a lot
of people a lot of money to do that
- you seldom see root-cause analysis in data centers (perhaps 1 out of
every 1,000 problems gets a root-cause analysis), so ROC is very useful in
that context
- look for specific implementable solutions and publish them -- others will
implement them
- I believe in solving scalability with simplicity; pay attention to
emergent behaviors in the large scale
Oceanstore
- the claim that configuring web caches is hard is not true (they're really
easy, even on a large scale)
- intent-based logging/configuration/APIs is a cool idea
Lisa
FIG / fault injection
- keep in mind the diff. between fault injection and error injection (errors
become apparent at the app-level)
Benchmarking
- don't do benchmarks in order to embarrass people; benchmarks are marketing
tools, so nobody in industry will run a benchmark that makes them look bad