The Berkeley/Stanford Recovery-Oriented Computing (ROC) Project
The Recovery-Oriented Computing (ROC) project is a joint Berkeley/Stanford
research project investigating novel techniques for building highly dependable
Internet services.
Read an overview of our research into Recovery-Oriented Computing.
ROC News
People
Courses
Fall 2004
- Berkeley CS294-4/Stanford CS444A "Reliable Adaptive Distributed Systems".
Thursdays, 3:15-6:00 PM (Gates 100, Stanford) or 2:10-6:00 PM (320
Soda Hall, Berkeley), from 9 September to 9 December 2004.
Fall 2001
Publications
General ROC
- Patterson, D. A., A. Brown, P. Broadwell, G. Candea, M. Chen, J. Cutler,
P. Enriquez, A. Fox, E. Kiciman, M. Merzbacher, D. Oppenheimer, N. Sastry,
W. Tetzlaff, J. Traupman, N. Treuhaft. Recovery-Oriented Computing (ROC):
Motivation, Definition, Techniques, and Case Studies. UC Berkeley
Computer Science Technical Report UCB//CSD-02-1175, March 15, 2002. [pdf] [doc]
- Brown, A. and D. A. Patterson. Embracing Failure: A Case for
Recovery-Oriented Computing (ROC). 2001 High Performance
Transaction Processing Symposium, Asilomar, CA, October 2001. [pdf]
- Brown, A. and D. A. Patterson. To Err is Human. Proceedings of the First
Workshop on Evaluating and Architecting System Dependability (EASY '01),
Göteborg, Sweden, July 2001. [pdf]
- Brown, A. Accepting Failure: Availability through Repair-Centric System Design. UC Berkeley Qualifying Examination Proposal,
Berkeley, CA, April 2001. [pdf]
ROC Techniques
- Brown, A. A Recovery-Oriented Approach to Dependable Services:
Repairing Past Errors With System-Wide Undo,
UC Berkeley Computer Science Division Technical Report
UCB//CSD-04-1304, December 2003. [abstract] [pdf]
- Brown, A. and D. A. Patterson. Undo for Operators: Building an Undoable
E-mail Store. In Proceedings of the 2003 USENIX Annual
Technical Conference, San Antonio, TX, June 2003 (Best Paper Award). [pdf] [html]
- Brown, A. and D. A. Patterson. Rewind, Repair, Replay: Three R's to
Dependability. 10th ACM SIGOPS European Workshop, Saint-Emilion,
France, September 2002. [pdf]
- Candea, G. and A. Fox. A Utility-Centered Approach to Building Dependable
Infrastructure Services. Proc. 10th ACM SIGOPS European Workshop (EW-2002),
Saint-Émilion, France, September 2002. [pdf]
- Broadwell, P., N. Sastry and J. Traupman. FIG: A Prototype Tool for Online
Verification of Recovery Mechanisms. To appear in Workshop on
Self-Healing, Adaptive and self-MANaged Systems (SHAMAN), New York, NY,
June 2002. [pdf]
- Candea, G., J. Cutler, A. Fox, R. Doshi, P. Garg, and R. Gowda.
Reducing Recovery Time in a Small Recursively Restartable System. Proc.
International Conference on Dependable Systems and Networks (DSN-2002),
Washington, D.C., June 2002. [pdf]
- Candea, G. and A. Fox. Recursive Restartability: Turning the Reboot
Sledgehammer into a Scalpel. Proc. 8th Workshop on Hot Topics in Operating
Systems (HotOS-VIII), Schloss Elmau, Germany, May 2001. [pdf]
Diagnosis
- Oppenheimer, D. The importance of understanding distributed system
configuration. System Administrators are Users, Too: Designing Workspaces
for Managing Internet-Scale Systems, workshop at CHI 2003 (Conference on
Human Factors in Computing Systems), April 2003. [pdf]
- Chen, M., E. Kiciman, E. Fratkin, E. Brewer and A. Fox. Pinpoint: Problem Determination in Large, Dynamic, Internet Services.
Proceedings of the International Conference on Dependable Systems and Networks
(IPDS Track), Washington, D.C., 2002. [Abstract] [pdf]
- Candea, G. and A. Fox. Designing for High Availability and Measurability.
Proc. 1st Workshop on Evaluating and Architecting System Dependability
(EASY), Göteborg, Sweden, July 2001. [pdf]
- Brown, A., G. Kar, and A. Keller. An Active Approach to Characterizing
Dynamic Dependencies for Problem Determination in a Distributed Environment.
Proceedings of the Seventh IFIP/IEEE International Symposium on
Integrated Network Management (IM 2001), Seattle, WA, May 2001. [pdf]
Benchmarking and System Measurement
- Brown, A., L. Chung, W. Kakes, C. Ling, and
D.A. Patterson. Experience with Evaluating Human-Assisted Recovery
Processes. Proceedings of the 2004 International Conference on
Dependable Systems and Networks, Florence, Italy, June 2004.
[pdf] [materials]
- Broadwell, P. Response Time as a Performability Metric for Online Services.
UC Berkeley Computer Science Technical Report UCB//CSD-04-1324, May 2004. [pdf]
- Oppenheimer, D., A. Ganapathi, and D. A. Patterson. Why do Internet
services fail, and what can be done about it? 4th USENIX Symposium on
Internet Technologies and Systems (USITS '03), March 2003. [pdf]
[talk slides]
- Oppenheimer, D., A. B. Brown, J. Traupman, P. Broadwell, and
D. A. Patterson. Practical issues in dependability benchmarking. Second
Workshop on Evaluating and Architecting System Dependability (EASY),
October 2002. [pdf]
- Oppenheimer, D. and D. A. Patterson. Studying and using failure data from
large-scale Internet services. 10th ACM SIGOPS European Workshop,
Saint-Emilion, France, September 2002. [pdf]
- Merzbacher, M. and D. Patterson. Measuring End-User Availability on the Web:
Practical Experience. International Performance and Dependability Symposium,
Washington, D.C., June 2002. [pdf] [ps] [doc]
- Oppenheimer, D. Why do Internet services fail, and what can be done about
it? UC Berkeley Computer Science Division Technical Report UCB//CSD-02-1185,
May 2002. [pdf]
- Patterson, D. A. A simple way to estimate the cost of downtime. Submission to
the 16th Systems Administration Conference (LISA '02), 2002. [pdf]
- Brown, A., L. C. Chung, and D. A. Patterson. Including the Human Factor in
Dependability Benchmarks. To appear in the 2002 DSN
Workshop on Dependability Benchmarking, Washington, D.C., June 2002. [pdf]
- Oppenheimer, D. and D. A. Patterson. Architecture, operation, and
dependability of large-scale Internet services: three case studies. IEEE Internet Computing special issue on Global
Deployment of Data Centers, September/October 2002. [pdf]
- Brown, A. Towards Availability and Maintainability Benchmarks: A Case
Study of Software RAID Systems. UC Berkeley Masters Report, also
available as UC Berkeley Computer Science Division Technical Report UCB//CSD-01-1132,
Berkeley, CA, January 2001. [pdf] [ps]
- Brown, A. Availability Benchmarking of a Database System. Unpublished
report, soon to be a Technical Report, Berkeley, CA, December 2000.
- Brown, A. and D.A. Patterson. Towards Availability Benchmarks: A Case
Study of Software RAID Systems. Proceedings of the 2000 USENIX Annual
Technical Conference, San Diego, CA, June 2000. [pdf] [ps] [html]
ROC Hardware
- Oppenheimer, D., A. Brown, J. Beck, D. Hettena, J. Kuroda, N. Treuhaft,
D.A. Patterson, and K. Yelick. ROC-1: Hardware Support for Recovery-Oriented
Computing. IEEE Transactions on Computers, vol. 51, no. 2, February
2002. [pdf]
Talks
General ROC
- A Simple Way to Estimate the Cost of Downtime. David Patterson. USENIX
16th Systems Administration Conference (LISA '02). Presented November 7, 2002,
Philadelphia, PA. [ppt] [pdf]
- Recovery Oriented Computing. David Patterson. Presented at Princeton
University, University of Illinois, and University of Michigan, October
2002. [ppt] [pdf]
- Recovery Oriented Computing: A New Research Agenda for a New Century.
David Patterson. 8th International Symposium on High-Performance Computer
Architecture (HPCA 8) keynote address, presented February 6, 2002, Boston, MA.
[Abstract] [ppt] [pdf] [MADtv clip] [MADtv clip script]
- Availability and Maintainability >> Performance: New Focus for a New
Century. David Patterson. USENIX
Conference on File and Storage Technologies (FAST '02) keynote address,
presented January 29, 2002, Monterey, CA. [Abstract] [ppt] [pdf]
- Recovery-Oriented Computing. Keynote Address by David Patterson at High
Performance Transaction Systems Workshop (HPTS), October 2001. [ppt] [pdf]
- CS294-4 first lecture. David Patterson. September 6, 2001. [ppt] [pdf]
- Recovery-Oriented Computing. David Patterson. HP Labs, June 6, 2001.
[Abstract] [ppt] [pdf]
- Embracing Failure: Availability through Recovery-Oriented Computing (ROC).
Aaron Brown. Stanford CS548 Guest Lecture, May 2, 2001.
[Abstract] [ppt] [pdf]
- Embracing Failure: Availability through Repair-Centric Design. Aaron
Brown. UC Berkeley Qualifying Examination Presentation, April 13, 2001.
[ppt] [pdf]
- Reboot-Based High Availability. George Candea. Work-in-progress talk and
poster, Symposium on Operating Systems Design and Implementation (OSDI),
San Diego, CA, October 2000. [pdf] (abstract) [pdf] (poster)
- Measuring End-User Availability on the Web: Practical Experience. Matthew
Merzbacher. International Performance and Dependability Symposium,
Washington, D.C., June 24, 2002. [ppt, 61.5 KB]
Undo and Human Error
- Rewind, Repair, Replay: Three R's to Dependability. Aaron Brown.
SIGOPS European Workshop, St. Emilion, France, September 2002. [ppt]
- Rewind, Repair, Replay: Three R's to cope with human error. Talk given
at IBM Almaden, March 2002. [ppt] [pdf]
- Bringing Undo to System Administration: A New Paradigm for Recovery.
Work-in-progress talk, 15th Annual Systems Administration Conference (LISA 2001),
December 2001. [ppt] [pdf]
- To Err is Human. First EASY Workshop, Göteborg, Sweden, July 1,
2001. [ppt] [pdf]
- Addressing Human Error with Undo. Summer 2001 ISTORE Retreat,
Granlibakken, CA, June 2001. [ppt] [pdf]
ROC Techniques
Diagnosis
- An Active Approach to Characterizing Dynamic Dependencies for Problem
Determination. IM 2001 Conference, May 16, 2001. [ppt] [pdf]
Benchmarking
- Availability and Maintainability Benchmarks: A Case Study of Software RAID
Systems. UC Berkeley CS294-8 Guest Lecture, November 7, 2000. [ppt] [pdf]
Retreat Talks and Posters
Hardware
Projects
The ROC project is funded by NSF grant no. CCR-0085899, the NASA CICT (Computing, Information & Communication Technologies) Program,
an NSF CAREER award, Allocity, Hewlett-Packard, IBM, Microsoft, NEC, and Sun Microsystems.
Contact: roc-group at cs.berkeley.edu.
Last updated: 06/20/2005 11:21