Brocade: Landmark
Routing on Overlay Networks - Ling Huang
Peer-to-peer networks offer attractive features such
as decentralized computing, fault tolerance, and the sharing and
aggregation of resources. Because of the abstract, theoretical approach
taken in these systems, they generally assume that nodes in the system
are uniform in resources. As a result, messages are routed
on the overlay with minimal consideration of the actual network topology
and of differences in node resources. Many of the designs and
algorithms used by existing peer-to-peer infrastructures follow
this tenet, and such systems often show sub-optimal performance.
We believe that such a system can benefit from some form of internal
organization and differentiation among its nodes, thus
enjoying the efficiency of a well-organized system while still
providing users with the features of P2P. We propose Brocade,
a secondary overlay layered on top of existing P2P
systems that exploits knowledge of underlying network characteristics.
The secondary overlay builds a location layer between "supernodes,"
nodes that are situated near network access points, such as gateways
to administrative domains. By associating local nodes
with their nearby "supernode," messages across the wide area can
take advantage of the highly connected network infrastructure between
these supernodes to shortcut across distant network domains, greatly
improving point-to-point routing distance and reducing overall
network bandwidth usage. We present the initial architecture
of a Brocade secondary overlay on top of a Tapestry network, and
demonstrate its potential performance benefits by simulation.
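As a rough illustration of the shortcut routing described above, the following sketch (our own simplification in Python; the class and method names are hypothetical, not the actual Brocade implementation) shows a message being handed to the sender's nearby supernode, shortcut across the inter-supernode overlay, and delivered inside the destination's domain.

    # Hypothetical sketch of Brocade-style supernode routing (not the real API).

    class Node:
        def __init__(self, name, domain):
            self.name, self.domain = name, domain

    class Supernode:
        def __init__(self, domain):
            self.domain = domain          # administrative domain this supernode covers
            self.local_nodes = set()      # nearby nodes registered with this supernode
            self.peers = {}               # domain -> Supernode, the secondary overlay

        def route(self, msg, dest):
            if dest in self.local_nodes:
                return "deliver %s to %s inside %s" % (msg, dest.name, self.domain)
            # Shortcut across the wide area: hand the message to the
            # supernode responsible for the destination's domain.
            remote = self.peers[dest.domain]
            return remote.route(msg, dest)

    # A local node never routes wide-area traffic itself; it forwards to its
    # nearby supernode, which shortcuts over the well-connected core.
    def send(src_supernode, msg, dest):
        return src_supernode.route(msg, dest)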
Exploiting Routing Redundancy Using a Wide-area Overlay -
Ben Zhao
As new and interesting peer-to-peer applications
combine with advancements in networking technology, they are
reaching millions of users across the globe. Numerous studies have
shown, however, that loss of connectivity is common on the wide-area
network, due to hardware and software failures, and network
misconfigurations.
Despite the natural redundancy present in underlying network
links, the current IP layer fails to recognize and recover
from these frequent failures in a timely fashion. This work presents
fault-tolerant routing on the Tapestry overlay network, which
exploits existing network redundancy by dynamically switching
traffic onto precomputed alternate routes. Furthermore, messages
in our system can be duplicated and multicast "around"
network congestion and failure hotspots with rapid reconvergence
to drop duplicates. Our simulations show fault-tolerant Tapestry
to be highly effective at circumventing link and node failures,
with reasonable cost in terms of additional routing latency and
additional bandwidth.
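The switch to precomputed alternate routes can be sketched as follows (our own illustration; the route-selection details are assumptions rather than the actual Tapestry code): each routing entry keeps backup next hops, and traffic moves to the first live candidate when the primary is believed down.

    # Hypothetical sketch: each routing-table entry keeps precomputed backups,
    # so a failed primary hop can be bypassed without recomputing routes.

    routing_table = {
        # destination prefix -> list of next hops, primary first, backups after
        "4xx": ["nodeA", "nodeB", "nodeC"],
    }

    failed = {"nodeA"}   # hops currently believed down (e.g., missed heartbeats)

    def next_hop(prefix):
        for hop in routing_table[prefix]:
            if hop not in failed:
                return hop               # first live candidate wins
        raise RuntimeError("all precomputed routes for %s are down" % prefix)

    print(next_hop("4xx"))               # -> "nodeB": traffic switches to a backup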
Failure of the PSTN: 2000 - Patricia Enriquez
FIG: Fault Injection
in glibc -- A Tool for Online Verification of Recovery Mechanisms
- Pete Broadwell
Enhanced software tools are necessary to evaluate the
reliability and recoverability of applications under operating
environment failures. FIG is a lightweight, extensible software
testing package that intercepts calls from applications to the
operating system and injects errors to simulate system faults.
We evaluate the effectiveness of FIG and explore classes of recovery
methods by using FIG to inject faults into several common
UNIX applications.
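FIG itself interposes on calls from the application into glibc; as a language-neutral analogue of the same idea (written here purely for illustration, not part of FIG), the sketch below wraps an OS call and injects errors with a configurable probability.

    import errno, os, random

    # Illustrative analogue of FIG-style interposition: wrap a call the
    # application makes into the OS and occasionally return a simulated fault.
    def inject_faults(func, fault_rate=0.1, exc=OSError, err=errno.ENOSPC):
        def wrapper(*args, **kwargs):
            if random.random() < fault_rate:
                raise exc(err, os.strerror(err))   # simulated "disk full"
            return func(*args, **kwargs)
        return wrapper

    faulty_write = inject_faults(os.write, fault_rate=0.2)

    # An application under test would call faulty_write instead of os.write,
    # letting us observe whether its recovery code handles ENOSPC gracefully.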
Hash-History Approach
for Reconciling Mutual Inconsistency in Optimistic Replication
- B. Hoon Kang
Most optimistic replication systems are based on a variant
of the version-vector mechanism, which maintains state for each
replica site. The need to maintain state per site fundamentally
limits scalability in the number of replica sites and complicates
the management of replica-site creation and retirement. We present
a "hash history" based approach in which no per-site state needs
to be maintained in order to detect and reconcile conflicts. Our
simulation results show that the hash history approach is correct,
scales as the number of sites increases, and has other interesting
properties. Hash history is capable of capturing the case in which
two different series of non-commutative operations independently
produce the same output. Our simulations verified that this
property helps to reduce the conflict rate during anti-entropy
reconciliations.
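A rough sketch of the idea, based only on this abstract (the paper's actual data structure records more, e.g., parent links between versions): each replica keeps the hashes of the versions it has passed through, and two replicas compare current hashes and histories to decide between fast-forward and true conflict. Because equality is decided by content hash, two sites that independently produce the same output map to the same entry.

    import hashlib

    def h(state: bytes) -> str:
        return hashlib.sha1(state).hexdigest()

    class Replica:
        """Sketch of a hash-history replica: no per-site version vector,
        just the hashes of all versions this replica has passed through."""
        def __init__(self, state=b""):
            self.state = state
            self.history = {h(state)}

        def update(self, new_state: bytes):
            self.state = new_state
            self.history.add(h(new_state))

        def reconcile(self, other):
            a, b = h(self.state), h(other.state)
            if a == b:
                return "identical (even if produced by different operations)"
            if a in other.history:
                return "other dominates: fast-forward self"
            if b in self.history:
                return "self dominates: fast-forward other"
            return "true conflict: needs merge"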
Mailbench - An e-mail dependability benchmark - Leonard Chung
E-mail was originally conceived as a "best effort"
service, with little attention directed towards attaining 100%
dependability. Despite these humble beginnings, e-mail has now become
a mission-critical service. Spurred on by its increasing ubiquity
and rising expectations for reliability, 100% dependability
is no longer just a desirable feature but rather a virtual requirement
in modern-day systems. Given that e-mail was never designed with
such a goal in mind, we seek to define metrics of dependability
and have created a dependability benchmark to allow comparison of
e-mail dependability between different e-mail platforms. We present
the structure of the benchmark, possible applications and future
directions for the tool.
Performance of the
Archive - Hakim Weatherspoon
From a performance standpoint, is archiving realistic?
There are three options for archiving data and each decision has
tradeoffs. First, no archiving at all (tradeoff: no durability).
Second, archiving offline or delayed (tradeoff: time to durability).
Third, archiving inline (tradeoff: slower response time, but higher
durability guarantees).
Platform Modular Redundancy with Virtual Machines
- James Cutler
Platforms include the hardware, operating system, and
middleware used by an application. Traditional fault-tolerant
approaches have produced reliable hardware. The application coders
are responsible for the application. But who verifies correct
operation of the OS and other middleware? Two interesting examples
include: 1) an application has a bug that, when activated,
results in different failure modes on different platforms (e.g.,
livelock, segmentation fault, or silently incorrect output); 2) the
application is correct but the platforms are faulty. In this
work, we explore the use of virtual machine technology to provide
redundancy and genetic diversity in platform technology. Modular
redundancy and voting are used to detect and mask failures in the
application and platform, as described in cases 1 and 2 above.
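The voting step can be illustrated with a short sketch (our own simplification; setting up the VM-hosted replicas is not shown): the same request runs on several platform replicas, the majority output is accepted, and any replica whose platform or application misbehaved is flagged.

    from collections import Counter

    def vote(outputs):
        """Majority voting over replica outputs; a disagreeing replica is
        flagged as suspect (its platform or application may be faulty)."""
        winner, count = Counter(outputs).most_common(1)[0]
        if count <= len(outputs) // 2:
            raise RuntimeError("no majority: cannot mask the failure")
        suspects = [i for i, out in enumerate(outputs) if out != winner]
        return winner, suspects

    # e.g., three VM replicas, one of which silently returns a wrong answer:
    result, suspects = vote(["42", "42", "17"])
    print(result, suspects)   # -> 42 [2]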
RAINS: Redundant Array of Independent, Non-Durable
Stores - Andy Huang, Armando Fox
Some types of state (e.g., shopping cart state) don't
require absolute consistency and durability, but since
there isn't a better alternative, this state is stored in storage
servers that provide absolute guarantees (e.g., DBMS, DDS). These
guarantees come at
a high cost; they make node failures expensive (operations are
suspended during recovery), make scaling expensive (adding nodes
requires downtime and administration), and limit performance
(coordination incurs overhead). Our hypothesis is that we
can reduce or eliminate these costs by relaxing consistency
and durability. We propose a storage solution that achieves probabilistic
consistency and durability by storing state redundantly in an array
of independent, non-durable state stores. In our initial implementation,
we intend to find out whether usable levels of consistency
and durability can be achieved while reducing the costs associated
with node failures, scaling, and coordination overhead.
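One way the probabilistic guarantees could arise from redundancy (our own illustration, not the planned RAINS interface) is sketched below: each value is written to several independent stores, and a read accepts any surviving copy, so durability is bounded by the chance that every copy is lost at once.

    import random

    class NonDurableStore:
        """An individual store that may lose its contents (e.g., on restart)."""
        def __init__(self):
            self.data = {}
        def put(self, k, v):
            self.data[k] = v
        def get(self, k):
            return self.data.get(k)
        def crash(self):
            self.data.clear()              # no durability on its own

    class RAINSSketch:
        def __init__(self, stores, copies=3):
            self.stores, self.copies = stores, copies
        def put(self, k, v):
            for s in random.sample(self.stores, self.copies):
                s.put(k, v)                # redundant, independent copies
        def get(self, k):
            for s in self.stores:
                v = s.get(k)
                if v is not None:
                    return v               # any surviving copy suffices
            return None                    # all copies lost: durability is probabilistic

    stores = [NonDurableStore() for _ in range(5)]
    rains = RAINSSketch(stores)
    rains.put("cart:42", ["book", "cd"])
    stores[0].crash()                      # losing one store need not lose the cart
    print(rains.get("cart:42"))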
A Recursively Restartable
iRoom - JP Schnapper-Casteras, George Candea, Armando
Fox
A recursively restartable (RR) system is one capable
of gracefully tolerating restarts at multiple levels; RR
was proposed as a recovery-oriented technique for improving availability
in complex software systems. We have successfully applied RR to
a satellite ground station and improved its recovery time fourfold.
We are now applying RR to the Stanford Interactive Workspaces
Room (iRoom), a multi-node, Windows-based service-rich
environment. The iRoom is a complex system, built from COTS software
as well as custom-written applications; it is a mix of closed
and open source programs, written in several different programming
languages.
Supporting Rapid Mobility via Locality in an Overlay
Network - Ben Zhao
In this poster, we present "Mobile Tapestry," an
extension to the Tapestry overlay network protocol that enables
scalable, fault-tolerant, and timely delivery of network messages,
including multimedia streams, to and from rapidly moving nodes.
Mobile Tapestry efficiently supports individual mobile nodes and,
by using an approach we call hierarchical mobility, it
also supports large groups of mobile nodes simultaneously
moving together. Mobile Tapestry leverages Tapestry's
locality mechanisms to reduce the latency and bandwidth of mobility
update traffic, while eliminating the routing inefficiencies and
availability problems of IP-based mobility protocols, such as
Mobile IP. Our simulation results show that Mobile Tapestry significantly
outperforms Mobile IP by providing lower latency and
higher fault tolerance for message delivery.
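The hierarchical mobility idea can be sketched roughly as follows (our own reading of the abstract; names and details are hypothetical and the real protocol differs): members of a moving group register locally with a group node, and only that group node publishes location updates into the wide-area overlay, so one update covers the whole group.

    # Hypothetical sketch of hierarchical mobility: members attach to a group
    # node; the wide-area overlay only tracks the group node's location.

    class Overlay:
        def __init__(self):
            self.location = {}                 # name -> current attachment point
        def publish(self, name, where):
            self.location[name] = where        # one wide-area location update

    class MobileGroup:
        def __init__(self, name, overlay):
            self.name, self.overlay = name, overlay
            self.members = set()
        def join(self, member):
            self.members.add(member)           # local registration only
        def move(self, new_access_point):
            # All members move together: a single update covers the whole group.
            self.overlay.publish(self.name, new_access_point)

    overlay = Overlay()
    train = MobileGroup("train-17", overlay)
    for device in ("laptop-a", "laptop-b", "phone-c"):
        train.join(device)
    train.move("base-station-42")              # one update, not three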
Tapestry: Motivation
and Algorithms - Kris Hildrum
This poster will have two parts. In the first part,
we will motivate Tapestry by exploring the problem of finding
objects in a distributed network. We will explain why the simple
solutions are not good enough, and compare Tapestry to other, more sophisticated
systems such as Chord, CAN, and Pastry. The second part will focus on the algorithms needed
to maintain Tapestry, including node insertion and deletion,
with a special focus on what needs to be done to handle simultaneous
insertions. These results will appear in SPAA 2002.
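Tapestry routes by resolving the destination identifier one digit per hop (Plaxton-style); the toy sketch below (our own simplification, shown with prefix matching for readability, and not the SPAA 2002 insertion algorithms) illustrates that digit-by-digit step.

    # Toy sketch of Plaxton/Tapestry-style digit-by-digit routing.
    # Each hop must agree with the destination ID in at least one more digit.

    DIGITS = 6                                # e.g., 6-digit identifiers

    def shared_prefix(a, b):
        n = 0
        while n < DIGITS and a[n] == b[n]:
            n += 1
        return n

    def route(nodes, src, dest):
        """Greedy routing: every hop extends the match with dest by one digit."""
        path, cur = [src], src
        while cur != dest:
            level = shared_prefix(cur, dest)
            # Pick any known node that extends the match
            # (real Tapestry keeps a per-level neighbor table with backups).
            nxt = next(n for n in nodes if shared_prefix(n, dest) > level)
            path.append(nxt)
            cur = nxt
        return path

    nodes = ["231400", "233212", "233101", "233100"]
    print(route(nodes, "231400", "233100"))
    # -> ['231400', '233212', '233101', '233100']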
Towards Building
the OceanStore Web Cache - Patrick Eaton
We have previously presented a cooperative web
caching architecture built on top of OceanStore. We present refinements
to the original design based on experiences building and measuring
the system. We describe further challenges for the system and
ideas for overcoming these issues.
Towards a Theory
for Undo - Aaron Brown
A major tenet of the ROC vision is that service
systems should provide a system-wide Undo mechanism. Undo promotes
a more forgiving environment for the system operator, allowing
effective trial-and-error learning and harnessing the unique human
capacity of hindsight; Undo can be used to recover from human operator
errors, unexpected side-effects of upgrades and reconfigurations,
and even external attacks. Implementing an effective
system-level Undo is rather challenging, requiring extensive mental
gymnastics on the part of the system designer, who has to anticipate
all of the possible "paradoxes" that can arise when system state
is rewound backward in time, altered, then replayed forward again.
To address this complexity, and to provide a framework
for describing and analyzing the behavior of Undoable systems,
we have begun to develop a theory and model for Undo. Our theory
defines a basic architecture for Undoable systems, establishes
a model for tracked operations and undo history, and relates the
existence of external inconsistencies to properties of those
operations and histories. It thus provides a framework
for reasoning about Undo-induced paradoxes, establishes criteria
for the correctness and completeness of an Undo implementation,
and provides guidance in implementing a generalized Undo layer.
Finally, our theory establishes a set of requirements that must
be met if a service system is to be extended with Undo functionality.
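A toy model of the rewind/alter/replay cycle (our own sketch to make the history and paradox language concrete; it is not the model in the theory itself): operations are tracked in a history, rewind restores a snapshot, and replay of an altered history reports a "paradox" wherever an already-externalized output no longer matches.

    class UndoableSystem:
        """Toy sketch of an Undo history: rewind, alter, replay."""
        def __init__(self, state=0):
            self.state = state
            self.snapshot = state
            self.history = []              # tracked operations since the snapshot
            self.externalized = []         # outputs the outside world has already seen

        def do(self, op):
            self.history.append(op)
            self.state = op(self.state)
            self.externalized.append(self.state)   # e.g., a reply sent to a user
            return self.state

        def rewind_and_replay(self, new_history):
            """Rewind to the snapshot, replay an altered history, and report
            'paradoxes': externally visible results that no longer match."""
            state, outputs = self.snapshot, []
            for op in new_history:
                state = op(state)
                outputs.append(state)
            paradoxes = [i for i, (old, new) in
                         enumerate(zip(self.externalized, outputs)) if old != new]
            self.state, self.history, self.externalized = state, new_history, outputs
            return state, paradoxes

    system = UndoableSystem()
    system.do(lambda s: s + 5)
    system.do(lambda s: s * 2)             # suppose this was the operator's mistake
    state, paradoxes = system.rewind_and_replay([lambda s: s + 5, lambda s: s * 3])
    print(state, paradoxes)                # -> 15 [1]: the user already saw 10 at step 1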
Using Introspection to Improve Internet Services - Steve Czerwinski
This poster describes how Internet services can use
introspection to improve their adaptability to changes in
client and network resources. The key idea is to monitor and measure
client actions in order to build models that can be used to predict
costs of future actions. Using this, Internet services can automatically
choose which optimizations should be enabled to improve performance.
These optimizations are built using code and data migration
techniques. This is ongoing research, and the results shown come
from my recent qualifying examination.
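The monitor/model/predict loop can be made concrete with a small sketch (entirely our own illustration; the service's real models and optimizations are not described here): record the observed cost of each client action under each optimization, estimate costs from the samples, and enable whichever optimization the model predicts to be cheapest.

    from collections import defaultdict

    class IntrospectionSketch:
        """Toy sketch: measure client actions, predict costs, pick optimizations."""
        def __init__(self):
            # (action, optimization) -> observed costs (e.g., latency in ms)
            self.samples = defaultdict(list)

        def observe(self, action, optimization, cost):
            self.samples[(action, optimization)].append(cost)

        def predicted_cost(self, action, optimization):
            xs = self.samples[(action, optimization)]
            return sum(xs) / len(xs) if xs else float("inf")

        def choose(self, action, optimizations):
            # Enable whichever optimization the model predicts to be cheapest.
            return min(optimizations, key=lambda o: self.predicted_cost(action, o))

    model = IntrospectionSketch()
    model.observe("render_page", "none", 120)
    model.observe("render_page", "migrate_code_to_client", 45)
    print(model.choose("render_page", ["none", "migrate_code_to_client"]))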
Why do Internet
services fail, and what can be done about it? - David Oppenheimer
We studied the causes of failure in three large-scale
Internet services. We found that operator error is the
largest single contributor to end-user-visible service
failures. These errors are the most difficult type of failure
to mask, and are generally due to configuration problems. We also
found that front-end software can be a significant cause of service
failures and that back-end failures, while infrequent, take longer
to repair than do front-end failures. Finally, we found
that for at least one of the services, online correctness testing
and better exposing and detecting of component failures (hardware,
software, and human) would have mitigated many of the service failures
that were observed.