Notes from ROC Retreat Feedback Session, 6/12/2002
--------------------------------------------------

1) Bill Tetzlaff, IBM
 - interested in fast restart
 - glad to see data on failures
 - unfortunate that we have to go back to hand-written problem reports;
   shows the ineffectiveness of the data-collection/etc. system
 - proposal for next OceanStore app: email attachments; big & tremendously
   duplicated (seems to be arguing for a single-instance store); provide
   the ability to search them, find the latest version, etc.
 - OceanStore: look at disconnected/weakly-connected operation
 - MTTR: big stateful systems; 20-40 minutes common. This is a big problem
   and very different from a stateless web server/cache. Also the ability
   to come up on different hardware [VMs?] or even on different software,
   as a way to try out new software or revert to older software [Undo can
   be spun toward this?]. Need 1-2 orders of magnitude faster, and probably
   needs interesting new hardware to store state [IBM 3090-like, NVRAM, ...]
 - important challenge is segregating state: you're trying to escape from
   state-gone-bad and you need to be able to discard it

2) Kim Keeton, HP
 - overlap w/HP work in the storage-system space
   - utility functions, multi-constraint optimization for perf. and avail.
   - Chang Lee PhD (CMU) on multi-constraint optimization
 - declarative specifications are important: users/admins specify goals,
   requirements, *intentions* and let the system do the right thing
   - they've done this in the perf space (storage), starting into avail.
   - IWGOS paper on Rome (HP's spec language), upcoming SIGOPS
 - time-travel is important; how to specify it in the interface
 - collaboration through CITRIS on availability stuff
 - OceanStore: so, does performance matter after all?
   - we keep saying it doesn't, but OStore jumped on it because of long latency
 - challenge: pull-the-NIC demo at the next retreat
 - how real will OStore be? backup/archival service for real users?
 - wants to hear more about introspection: what to monitor, how to use the data
 - OceanStore: a lot of the coolness is based on the properties of Tapestry,
   requiring the reader to understand Tapestry relatively deeply. Needs a
   more self-contained explanation
 - could you use OceanStore as the storage layer for Undo?
 - what about interesting file semantics atop OStore, things like Elephant
   where important versions are identified, etc.?
 - bridging the gap between the FT and systems communities
   - another EASY-style workshop before ASPLOS w/a 1-day tutorial on
     traditional FT/dependability. Submission deadline in July.

3) Mark Verber, Tellme
 - a lot of services don't have a way to collect data and verify that it's
   good; think about tools & methodologies that people could use to do this
 - excited about Undo work toward an algebra describing system/operation
   properties, some sort of declarative specification
   - last retreat, panned Undo. Now sees the approach of building a
     framework that people could plug into as a very promising idea.
 - OStore: build something that people could actually use
   - wants to see developers' home directories in OceanStore by next time
   - avoid the danger of building fancy infrastructure and never getting to
     the apps, since the apps really show the edge conditions, unexpected
     constraints, etc.; they're the best source of insight. Suggests 3-4 apps
   - be ruthless about function so that you get something done yet still
     prove some important points
 - statistical stability
   - MTTR is important and oft-neglected, but don't drop the quality bar as
     you focus on MTTR.
   - likes the Yahoo example of not doing memory management / garbage
     collection
     - there are domains where this isn't appropriate/doesn't work
   - use more care in building things; figure out how to help people do this
   - adaptive systems, statistically-stable systems: interesting
     - but if people can't build single-component systems that work
       effectively, what hope is there for far more complex systems?
 - today it's hard/impossible to even build a stable, reliable Ethernet
 - build in instrumentation, be rigorous about collecting it, and do data
   analysis and visualization as you build/deploy it
   - they've found it very valuable @ Tellme
 - breakout on the storage API was valuable
   - too expensive and unscalable to store everything in a nice ACID store
   - usual solutions are to buy WebLogic or build an ad hoc system
   - should formalize what the constraints and tradeoffs are; develop a
     standard vocabulary identifying canonical architectures; look again @
     Bayou, J2EE service classes
   - engineers actually do know the constraints/requirements of what
     they're building, so they could express them to an API

4) James Hamilton, Microsoft
 - ROC is "totally cool", the right area, exactly what industry needs; we
   have an opportunity to make significant improvements/contributions
   because the world today is such a disaster
 - Undo: initially didn't buy it or its importance, but has now come to
   realize how essential it is, especially after seeing the data he's
   collected
 - Stanford/UCB collaboration is a great thing
 - likes the combined DB/OS course @ Berkeley, 262a/b
   - Kim says it may be in danger of being discontinued
     - 262b is getting too specialized; maybe better to have a combined
       262a, then separate versions of 262b on different specializations
   - James says it's better to put people from different disciplines together
 - likes the approach of quantifying properties (QAPSL stuff)
   - thinks some of the stuff in the talk was too black-and-white; even
     banks make continuous tradeoffs and get pretty far down toward
     availability over quality
     - Bill: do they later have a way to discover degraded quality? Yes,
       via audits, etc. And the loss rate is low enough that it's worth it,
       since the cost of getting it better would be much higher than the
       loss penalty
   - everyone wants ACID, but not deadlocks, single points of failure,
     unserved customers, etc.
     So they're willing to sacrifice a lot
   - danger in George's stuff is that there are lots and lots of cliffs in
     the space
 - RAINS: ROC is not a patch for bad systems. Recovery costs a lot of
   resources and you don't want to do it too often. You've got to have it,
   of course, since systems aren't perfect, but RAINS goes too far in
   throwing stuff out too quickly. Maybe an interesting case to study, though.
 - VMs: exception handling doesn't work for fault containment in practice,
   and maybe VMs could act as better fault-containment domains
 - internet service failure data: would like to see it turn into benchmarks
 - theory of Undo work: until you build a system, it's not interesting
   - 1) need real data to motivate that admins are the problem
   - 2) need to have a running system and study the results w/ and w/o Undo
 - OceanStore
   - performance does matter. Don't make a system suck because performance
     is a problem. Worth getting it within a factor of 2-3, but that's
     enough; then focus on real problems.
   - need data on the cost to buy and administer a terabyte, and use that
     to motivate. If you focus too much on performance, you'll lose the
     value of OceanStore, which is to get the admin out of the game
   - skeptical of the security (privacy), deletion, revocation story
     - worried that there's no way to delete data; Kim agrees. It's the way
       the world works
     - [discussion of deletion and feasibility]
     - is throwing the keys away sufficient? Yes, if you trust the security
       story, which James doesn't. Due to Moore's law, if there's an
       important document, you can marshal resources to crack it
       - need crypto that can last 50 years when attacked by 25% of the
         world's total computing resources
 - Geo: legacy systems issue. Is it worthwhile to apply ROC to legacy
   systems?
   - yes. Look at Schwab trying to integrate an old crufty DB system with
     new tech. Legacy is a big source of problems and an area that needs to
     be addressed, and maybe ROC could do this.
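The "throwing the keys away" point above is the crypto-shredding idea: encrypt each object under its own key, keep the keys separate from the (possibly undeletable) ciphertext, and "delete" by destroying the key. A minimal toy sketch of that structure, in Python; the class and method names are illustrative, and the SHA-256-based keystream stands in for a real cipher, so this is not production cryptography:

```python
import hashlib
import secrets


class CryptoShredStore:
    """Toy sketch of deletion by key destruction ("crypto-shredding").

    Ciphertext blobs persist indefinitely (as in an archival store);
    only the per-object keys are deletable.
    """

    def __init__(self):
        self._blobs = {}  # object id -> ciphertext (never deleted)
        self._keys = {}   # object id -> per-object key (deletable)

    def _keystream(self, key, n):
        # Hash-based keystream in counter mode -- illustration only.
        out = bytearray()
        counter = 0
        while len(out) < n:
            out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
            counter += 1
        return bytes(out[:n])

    def put(self, oid, plaintext):
        key = secrets.token_bytes(32)
        self._keys[oid] = key
        ks = self._keystream(key, len(plaintext))
        self._blobs[oid] = bytes(a ^ b for a, b in zip(plaintext, ks))

    def get(self, oid):
        key = self._keys[oid]  # raises KeyError once shredded
        ct = self._blobs[oid]
        ks = self._keystream(key, len(ct))
        return bytes(a ^ b for a, b in zip(ct, ks))

    def shred(self, oid):
        # The ciphertext remains stored, but is now unreadable.
        del self._keys[oid]
```

This makes James's objection concrete: after `shred()`, readability rests entirely on the cipher staying unbroken for the lifetime of the archived ciphertext, which is exactly the 50-year question raised above.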
5) Nisha Talagala, Sun
 - liked the probabilistic consistency work
   - but to trust it, you have to believe (and define well) the failure model
 - data gathering is very valuable
 - operator error: should also do work on diagnosis. Can often recover
   without diagnosis, but problems may well recur over and over
 - focus on MTTR is interesting. Going toward a more reasonable definition
   of availability (removing the time-averaging component). Should continue
   work toward better definitions of availability
   - Kim says also look at performability: not just the level of 9's, but
     the amount of performance degradation vs. normal behavior, and for
     what length of time
   - [discussion about whether customers/users will accept more complex
     definitions that can't be reduced to a single number; consensus is
     yes, if it provides significant value]
   - Mark: collect lots of metrics. Try to classify behavior into:
     - successful
     - delayed (txn succeeded but exceeded the performance/latency window)
     - degenerate (didn't get ideal/perfect function, but something
       happened to allow forward progress. Example: misqueueing a request
       into a less-efficient, but still working, queue)
 - likes the archival part of OceanStore; good application, hard problem,
   OStore matches the needs pretty well

6) Jeff Darcy, EMC
 - excited to see OceanStore working
 - George's stuff: look at utility functions in terms of the market impact
   of decisions. Incorporate ideas of uncertainty and design/implementation
   risk into the tradeoffs, since they're not captured
 - RAINS: look at rolling checkpoints instead of / in addition to rolling
   restarts
 - OceanStore: run the protocols for insert, etc. through verification
   tools (smit??, murphy??)
 - the fact that the cost of signing things is a bottleneck is an important
   result to disseminate
 - apps: people are unlikely to adopt new distributed filesystems; the
   email attachment problem isn't great; a web cache is probably the best
   application now to motivate it
 - need more support for evolvability if you really want 50+ year
   durability, especially with data formats, protocol versions, etc.
 - need to think about Digital Rights Mgmt; won't get away w/ignoring it
 - the idea of a time-travel data store w/better granularity than
   snapshots/checkpoints is very interesting & important. Conceptual
   models, APIs, efficiency of implementation.

7) Mendel Rosenblum, VMware & Stanford
 - analysis of failures will be a real contribution
 - VM approaches are interesting. Systems have 100M lines of code; not
   surprising that there are bugs, and unlikely that you could rewrite them
   to be bug-free even if you took 10 years w/o a change in functionality.
   ROC is a great approach, but an extremely hard problem.
 - no related-work slides == bad. Even a list, just to show you've read it.
 - OceanStore: he worked on a project that put a lot of effort into taking
   over the world (Sprite). It didn't. Don't invest too much time in taking
   over the world, especially if it overwhelms the research.
   - but you should still "eat your own dog food"

8) Blue Lang, Veritas
 - get outside the box and stay there
   - research is what's important. Don't worry about shipping products.
   - won't be able to build infinitely-provable systems, so don't bother
 - ROC is a patch for unreliable systems. We won't be able to stay ahead of
   the bug curve for the foreseeable future, especially as systems continue
   to scale
 - excited about the API discussion; would like to move forward and
   collaborate w/Veritas
 - it's not OK to sacrifice performance. But it is OK to sacrifice real
   perf. in exchange for perceived performance
 - but even unoptimized new algorithms are interesting, and should be
   published.
   They pay people to optimize, but the ideas are what's really important
   to get from the research community
 - almost never see a full root-cause analysis in large data centers/sites;
   operators/customers don't want to deploy the manpower necessary to do
   the diagnosis (even at an employee-centric place like IGS). Maybe 1/1000
   problems ever got a real RCA in IBM Global Services.
   - advantage of ROC is that you may never need to get to that point
 - don't necessarily have to implement solutions, but design implementable
   ones, publish, and they [industry] will implement them
 - re: simplicity: scale really changes things
 - OceanStore & web caches: it's really easy to configure web caches today;
   it's not clear why OceanStore is needed to simplify things. Talk to real
   administrators to sanity-check these things.
 - intent-based logging/APIs/etc. (Undo stuff) is a massive idea that could
   really change things.

9) Lisa Spainhower, IBM, via Armando
 - re: FIG, fault injection, etc.: keep in mind the difference between
   fault injection (bit flips, etc.) and *error* injection (things that
   become apparent at the level of the app or API)
   - error injection is the more important; more representative of real
     observed failures
 - re: benchmarking: to keep the goodwill of industry collaborators, don't
   design benchmarks whose purpose is to embarrass people into doing the
   right thing. For example, TPC is so expensive that you're not going to
   run it unless you look good. Do benchmarks that are small, that people
   can run individually, etc. Don't create the kind of benchmark that's
   difficult to run, costly to put together, and will primarily embarrass
   people (no one will report their results)
   - Mendel: there's a hazard to small benchmarks.