ROC Retreat Agenda

The Berkeley/Stanford
Recovery-Oriented Computing (ROC)
Project Summer 2003 Retreat

Slides
- Why Recovery Should Be Free, And Often Can Be - Armando Fox
- Path-Based Macroanalysis for Large Distributed Systems - Mike Chen
- “On the way to ROC-2” (JAGR: JBoss + App-Generic Recovery) - George Candea
- Detecting and Diagnosing Application-Level Failures in Internet Services - Emre Kiciman
- Distributed Hash Table Benchmarking Experience - David Oppenheimer
- DStore: Recovery-friendly, self-managing clustered hash table - Andy Huang
- Strider: a scientific approach to systems management and support - Yi-Min Wang, Microsoft
- Overview of New Research Proposal “Robust Adaptive Distributed Systems” - Randy Katz
- Faculty Investigator Panel Session: Jordan/Katz/Necula/Patterson/Stoica/Tygar
- What do E-mail system administrators do? - Billy Kakes
- Undo: Update and Futures - Aaron Brown
- Profiling and Diagnosing Large-scale Decentralized Systems - David Oppenheimer
- Latency as a Performability Metric for Internet Services - Pete Broadwell
- The vMatrix: Server Switching - Amr Awadallah
- Performability at Yahoo Search - Amr Awadallah
- A Recovery-Friendly, Self-Managing Session State Store - Ben Ling
RADS Breakouts—Developing a New Research Agenda
- Integrating Service/Server/Network Monitoring, Measurement & Management
- Reliability Benchmarking for Networks, Servers, and Services
- Managing Denial of Service and Service Failures in Systems
- Deploying P2P and Overlay Networks
- Minimizing the Effect of Operator Errors and Misconfigurations in System Failures
- Verifying and Learning Correct Service and Protocol Behaviors
Posters
- Network Storage Systems for the Future - Atsushi Ishikawa, Ryusuke Ito - Hitachi Ltd.
  In recent years, network technology and storage technology have become closely intertwined, which has led to many useful solutions for computer systems. In the first half of this poster (Part I), we clarify users' system requirements and priorities, as well as trends in network storage technology (e.g. NAS, iSCSI, overlay networks, IP-VPN, DWDM). We then present a near-future vision of existing storage system architectures, categorized by scale (e.g. LAN/DAS scale, Intranet/SAN scale, WAN/Wide-area SAN scale). In the second half (Part II), we focus on "distributed storage systems" and present a vision of their possibilities, including assumed topologies and applications, in light of the merits and demerits of recent research.
- Improving Service Availability Measurements - Steve Zhang
  We currently base web service availability on the success or failure of individual HTTP requests, with each request weighted equally in determining the availability of the service. However, this technique is subject to several pitfalls. Most often, the measured availability is inflated by numerous requests for images embedded in a webpage. This poster explores several ways to mitigate this effect. It is a work-in-progress poster intended to generate discussion and feedback as we look for a good way to measure the availability of the future ROC-2 platform.
- Latency as a Performability Metric for Internet Services - Pete Broadwell
  Compared with throughput or availability, response time offers a better view of the end-user experience that an interactive service provides during failures. This ongoing study considers the best ways to record, summarize and examine latency-based measurements in order to improve the reliability of online services. Of particular interest is how latency measurements interact with other aspects of the user experience, such as data quality.
- Automating Data Dependability - Kim Keeton, John Wilkes, HP Labs
  Constructing dependable storage systems is difficult, because there are many techniques to pick from that interact in often unforeseen ways. The resulting storage systems are often either over-provisioned, or provide inadequate protection, or both. We assert that automating our way out of this dilemma is both desirable and achievable, and we present some lessons we have learned from our initial efforts at doing so. The result is a first step down the path of self-managing, dependability-aware storage systems, including a better understanding of the problem space and its tradeoffs, and a number of insights that we believe will be helpful to others.

Last Updated: 02/12/2004 09:21
