Undo for System Administrators and Operators

Recovery-Oriented Computing Research Group
University of California, Berkeley

A key tenet of the Recovery-Oriented Computing (ROC) philosophy is that systems should provide undo functionality for their operators and administrators. Undo lets them recover from their own human errors as well as from failed operations such as software upgrades, installs, and configuration updates. A useful undo-based recovery mechanism must go beyond traditional system-wide recovery mechanisms (like backup/restore) by preserving intervening work that end-users have performed on the system being recovered. This additional criterion raises several interesting research challenges, notably how to reconcile the conflicting goals of transparency for the end-user and recovery power for the administrator.

To explore and address these research challenges, we have built a prototype system that provides a system-wide undo facility for administrators and operators. The research prototype consists of a generic undo manager that coordinates the undo process and provides a framework for expressing application-specific policy with regard to externally-visible consistency across undo cycles. The consistency policies are the key to the undo system's operation, as they provide the application-specific knowledge needed to reconcile end-user transparency and recovery power. These policies are expressed in terms of verbs: objects that represent end-user actions on the service and that are annotated with a notion of end-user-acceptable consistency and compensation.
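
As a rough illustration of the verb idea, the following Python sketch shows one way a verb could carry its own consistency and compensation policy, and how an undo manager might use that policy when end-user actions are re-executed after an operator's repair. It is not taken from the prototype: the names Verb, UndoManager, perform, undo_and_replay, make_deliver, and make_list, the toy mailbox state, and the use of a single deep-copied checkpoint are all assumptions made for this example.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

State = Dict[str, List[str]]  # toy service state (hypothetical)


@dataclass
class Verb:
    """One end-user action plus its consistency/compensation policy."""
    name: str
    execute: Callable[[State], Any]        # apply the action, return what the user saw
    consistent: Callable[[Any, Any], bool] # is the replayed result acceptable?
    compensate: Callable[[Any, Any], str]  # what to tell the user if it is not


@dataclass
class UndoManager:
    state: State
    log: List[Tuple[Verb, Any]] = field(default_factory=list)
    checkpoint: Optional[State] = None

    def __post_init__(self) -> None:
        # Assumption: one checkpoint taken at start-up is enough for the sketch.
        self.checkpoint = copy.deepcopy(self.state)

    def perform(self, verb: Verb) -> Any:
        result = verb.execute(self.state)
        self.log.append((verb, result))  # remember what the user observed
        return result

    def undo_and_replay(self, repair: Callable[[State], None]) -> List[str]:
        self.state = copy.deepcopy(self.checkpoint)  # roll state back
        repair(self.state)                           # operator fixes the problem
        notices, new_log = [], []
        for verb, original in self.log:              # re-execute end-user work
            replayed = verb.execute(self.state)
            if not verb.consistent(original, replayed):
                notices.append(verb.compensate(original, replayed))
            new_log.append((verb, replayed))
        self.log = new_log
        return notices


def make_deliver(sender: str, body: str) -> Verb:
    def execute(state: State) -> str:
        if sender in state["blocklist"]:
            return "rejected"
        state["inbox"].append(f"{sender}: {body}")
        return "delivered"
    return Verb("deliver", execute,
                consistent=lambda old, new: old == new,
                compensate=lambda old, new:
                    f"tell {sender}: earlier '{old}' status is now '{new}'")


def make_list() -> Verb:
    # Policy: messages may appear after undo, but none the user saw may vanish.
    return Verb("list", lambda state: tuple(state["inbox"]),
                consistent=lambda old, new: set(old) <= set(new),
                compensate=lambda old, new: "tell user: some messages changed")


# The operator had mistakenly blocked alice; undo repairs that mistake.
manager = UndoManager({"inbox": [], "blocklist": ["alice"]})
manager.perform(make_deliver("alice", "hi"))
manager.perform(make_list())
notices = manager.undo_and_replay(lambda s: s["blocklist"].remove("alice"))
```

In this sketch the replayed delivery now succeeds, so the delivery verb's policy flags the changed outcome and produces a compensation notice, while the listing verb accepts the newly visible message silently. The split between the generic replay loop and the per-verb policy is the point being illustrated; the specific policies are invented.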

The prototype we have built also includes an example specialization of the generic undo manager framework to provide undo functionality for an e-mail store service supporting the IMAP and SMTP mail protocols. This prototype implementation wraps an existing e-mail store service and provides undo functionality via an interposed proxy and rewindable storage layer.
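
The wrapping arrangement can be pictured with a small Python sketch: an interposed proxy records each request before forwarding it to an unmodified service, and the service's data lives in a storage layer that can be rewound to an earlier point. Everything here is hypothetical and heavily simplified. The classes RewindableStore, MailStore, and UndoProxy are invented for illustration, and plain method calls stand in for the IMAP and SMTP traffic that the real proxy would intercept.

```python
from typing import Any, List, Tuple


class RewindableStore:
    """Toy key-value store that keeps an append-only write history."""

    def __init__(self) -> None:
        self._history: List[Tuple[str, Any]] = []

    def put(self, key: str, value: Any) -> None:
        self._history.append((key, value))

    def get(self, key: str, default: Any = None) -> Any:
        # The most recent write for the key wins.
        for k, v in reversed(self._history):
            if k == key:
                return v
        return default

    def version(self) -> int:
        return len(self._history)

    def rewind(self, version: int) -> None:
        if not 0 <= version <= len(self._history):
            raise ValueError("unknown version")
        del self._history[version:]  # discard every write after that point


class MailStore:
    """Stand-in for the existing service, unaware of undo."""

    def __init__(self, store: RewindableStore) -> None:
        self.store = store

    def append(self, mailbox: str, message: str) -> int:
        messages = list(self.store.get(mailbox, []))
        messages.append(message)
        self.store.put(mailbox, messages)
        return len(messages)

    def fetch(self, mailbox: str) -> List[str]:
        return list(self.store.get(mailbox, []))


class UndoProxy:
    """Sits between clients and the service, logging every request."""

    def __init__(self, service: MailStore, store: RewindableStore) -> None:
        self.service = service
        self.store = store
        # Each entry: (store version before the request, method name, arguments)
        self.request_log: List[Tuple[int, str, tuple]] = []

    def __getattr__(self, name: str):
        # Only reached for names not defined on the proxy itself.
        target = getattr(self.service, name)

        def forward(*args: Any) -> Any:
            self.request_log.append((self.store.version(), name, args))
            return target(*args)

        return forward


store = RewindableStore()
proxy = UndoProxy(MailStore(store), store)
proxy.append("inbox", "m1")
proxy.append("inbox", "m2")

# Rewind storage to just before the second request was handled.
version_before_second = proxy.request_log[1][0]
store.rewind(version_before_second)
after_rewind = proxy.fetch("inbox")
```

Because the proxy notes the storage version alongside each logged request, the log is enough both to pick a rewind point and to know which requests would need to be re-executed afterwards. The service itself needs no changes, which is the appeal of wrapping rather than modifying it.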

The design of the undo system and the e-mail case study are described in much greater detail in the following paper and dissertation:

A. Brown and D. Patterson, "Undo for Operators: Building an Undoable E-Mail Store." In Proceedings of the 2003 USENIX Annual Technical Conference, San Antonio, TX, June 2003. [pdf] [html]

A. B. Brown, "A Recovery-Oriented Approach to Dependable Services: Repairing Past Errors with System-Wide Undo." UC Berkeley Computer Science Division Technical Report UCB//CSD-04-1304, December 2003. [abstract] [pdf]

Source code for our research prototype is also available. Please be aware that this code is still highly experimental and not optimized for production use. It is intended solely to illustrate the techniques and structures needed to enhance an e-mail server or other service applications with "operator undo" functionality. As such, it may be missing important pieces of functionality or error-handling code that would be expected in a production version. No express or implied guarantees are made concerning its completeness, robustness, performance, safety, or correctness.

Download the source code from the following links:

This code is made available under a Berkeley-style license; complete copyright and terms can be found here.

Last modified 03 May 2005 11:20:42 -0700. Contact: roc-group at cs.berkeley.edu.