CS 444A Supplementary Reading List

Last modified: $Date: 2001/09/12 17:06:54 $

Recommended Textbooks

Systems Overview

Design For Failure

Glenn Reeves' excellent description also includes the following comment, which I found highly illuminating and which resonated strongly with me:

We did have one other thing on our side; we knew how robust our system was because that is the way we designed it.

We knew that if this problem occurred we would reset. We built in mechanisms to recover the current activity so that there would be no interruptions in the science data (although this wasn't used until later in the landed mission). We built in the ability (and tested it) to go through multiple resets while we were going through the Martian atmosphere. We designed the software to recover from radiation induced errors in the memory or the processor. The spacecraft would have even done a 60 day mission on its own, including deploying the rover, if the radio receiver had broken when we landed. There are a large number of safeguards in the system to ensure robust, continued operation in the event of a failure of this type. These safeguards allowed us to designate problems of this nature as lower priority.

We had our priorities right.

Models and Characterizations

Techniques

 


fox@cs.stanford.edu