Alpha Co. Incident Report

Paul Lanzi

Writer’s comment: The incident report is one of the many technical reports we learned about in English 104A. After an incident, a formal process is typically followed to address both the incident’s cause and its repercussions. The incident report is critical to that process: it presents the facts without bias and offers recommendations to prevent similar incidents.
         I am indebted to my professor, Victor Squitieri, for his gracious assistance. The knowledge I gained in English 104A has already proven invaluable in the working world. As an analyst in a Fortune 500 company, I am often required to produce professional reports for upper management, and the decision whether to fund a project can hinge on the quality of a single report. Suffice it to say that, armed with the experience gained in Dr. Squitieri’s 104A class, I have not yet had a project turned down for funding.
—Paul Lanzi

Instructor’s comment: In a technical writing course, one of the toughest challenges students face is understanding just how radically professional audiences differ from academic audiences. Familiar only with writing for their instructors, students struggle with the task of designing reports that aim to accomplish real goals within professional organizational environments. To craft a memo or report that suits the divergent needs of a complex audience, students must consciously adopt a new writing paradigm—a paradigm, moreover, that breaks nearly all the artificial “rules of good writing” methodically beaten into their heads in high school and lower-division English classes.
         Paul Lanzi clearly understands the difference between student and professional writing. His incident report is precise, informative, and professional. And when one looks at his accompanying organizational chart and detailed audience analysis (both included below), the acuity of his report design process is abundantly clear.
—Victor Squitieri, English Department

 

M E M O R A N D U M

October 17, 2000

To: Fletcher Wilson, Vice President of Network and System Operations
From: Paul Lanzi, Chief Technical Officer
Subject: Unplanned System Downtime on October 13, 2000, and Recommended System and Procedural Enhancements
Distribution: Nathan Farr, President; Anna Massey, Vice President of Sales and Marketing; Danielle DeLong, Vice President of Finance and Human Resources; William Kennick, West Coast Technical Operations Director; Suzy McSuze, East Coast Technical Operations Director; Clandestine Blackops, Chief Operations Officer

Incident

On Friday, October 13, 2000, at approximately 9:00 am, the Alpha Co. Corporate Data Management System Extranet suffered a massive systems failure. The malfunction originated at the Main Data Center, located at One Alpha Co. Circle in Seattle, Washington, and quickly spread to several Regional Data Centers: San Diego, California; Houston, Texas; New York, New York; and Lansing, Michigan. By 9:05 am, all data traffic on the main corporate extranet had stopped.

Cause

The full scope of the failure was not discovered until William Kennick, West Coast Technical Operations Director (TOD), arrived in the Seattle office at approximately noon. Kennick found six inches of standing water in the primary router room, caused by a leak in a water pipe running through the ceiling. Construction work on the floor above had ruptured the pipe at approximately 8:45 am. The leak directly disrupted power to the core router. Donna Stevens of Big Building Management had notified Kennick of the construction work on October 1, 2000, as required by paragraph 21 of the building's lease contract.

Results

Backup routers should have engaged automatically within five minutes of the primary system failure. At 9:06 am, Martin Friend, System Operations Team Lead for the West Coast (located at the San Diego Data Center), unsuccessfully attempted to contact Kennick. Kennick was at a meeting with his daughter’s pre-school teacher at the time and could not be reached at either his office or his home telephone number (Friend tried both). Corporate Network Security Policy 4.25J requires the authorization of a TOD or above to switch the Corporate Data Management System to backup routers. At 9:07 am, Friend manually and remotely initialized the backup routers in Seattle without authorization from a TOD. By 9:09 am, the corporate extranet was once again carrying traffic, albeit at a reduced rate. During the nine minutes of downtime, approximately 150 megabytes of data were lost. The following calculable expenses resulted directly from the failure:

 

Cisco 5721 Enterprise Router $19,345
3COM Fiber Splitter $3,345
Raised floor replacement $35,000
False ceiling replacement $1,500
Repair to damaged (but salvageable) networking equipment $5,670

Total Cost: $64,860
 

Recommendations

The corporate extranet carries traffic between data centers, including secure e-mail messages, scheduling updates, and mission-critical server-to-server messages. Because the extranet is so vital, ensuring its uptime is a top operational priority. To that end, please send me your feedback on the following recommendations as soon as possible:

  1. Reduce the authorization restrictions in Corporate Network Security Policy 4.25J. Specifically, allow any Team Lead to make the determination to switch to backup routers.
  2. Clarify the policy requiring Data Center supervisors (in this case, Kennick) to be on site during construction. Consider issuing an information flash via company e-mail to the relevant Data Center supervisors.
  3. Direct Kennick to assign support staff to examine why the five-minute tripwire for automatic failover to the backup routers did not engage. Staff should revise scripts and/or implement new systems as needed. Also, have the support staff send an update report to the Chief Technical Officer within two weeks.
  4. Assign a pager to each TOD so that TODs can be contacted and consulted quickly in an emergency. Distribute the TODs' pager numbers on a need-to-know basis.
  5. Direct the Chief Operations Officer to investigate relocating the Main Data Center to a more fault-tolerant site. With the recent move of Friend’s team to the San Diego Data Center, several offices have been vacated. One of these offices might provide a more suitable location for the primary core routers.

I welcome any questions you may have about this incident or the recommendations above. Jessica Harris, my administrative assistant, has scheduled a meeting on your Microsoft Outlook calendars for October 24 at 10:15 am (PDT) in the Enterprise Conference Room to discuss this incident. Attendees will be Fletcher Wilson, Vice President of Operations; Paul Lanzi, Chief Technical Officer; Suzy McSuze, East Coast Technical Operations Director (via teleconference); William Kennick, West Coast Technical Operations Director; and Clandestine Blackops, Chief Operations Officer (via videoconference).

PL/jh

Audience Analysis

PRIMARY AUDIENCE

Fletcher Wilson, Vice President of Operations
Use of Report: Evaluate proposed recommendations and assign appropriate resources.
Information Needed:
1. Financial and procedural impact of recommendations.
2. Are proposed recommendations sufficient to prevent problems in the future?
3. Are there other procedural changes currently in progress that could affect the recommendations presented in this report?
4. What resources will be required to put recommendations into motion?

SECONDARY AUDIENCE

Clandestine Blackops, Chief Operations Officer
Use of Report: Assess recommendation #5 and offer subsequent advice.
Information Needed:
1. Financial and procedural impact of recommendation #5.
2. What are the requirements of the Main Data Center?
3. What resources (office spaces) are available?

William Kennick, West Coast Technical Operations Director
Use of Report: Assign appropriate resources to approved recommendations. Serve as an informed participant in Wilson’s decision-making process.
Information Needed:
1. Financial and procedural impact of recommendations.
2. Impact on day-to-day operations for Network and Systems teams.
3. Availability of resources to be assigned to the recommendations.

Suzy McSuze, East Coast Technical Operations Director
Use of Report: Assign appropriate resources to approved recommendations. Serve as an informed participant in Wilson’s decision-making process. Submit recommendations to the CTO on preventing similar incidents in offices under her administration.
Information Needed:
1. Financial and procedural impact of recommendations.
2. Impact on day-to-day operations for Network and Systems teams.
3. Availability of resources to be assigned to the recommendations.

IMMEDIATE AUDIENCE

Anna Massey, Vice President of Sales and Marketing
Use of Report: Inform clients and staff why the extranet was unavailable during the period specified. Also, inform staff of the measures currently under consideration to prevent a similar failure in the future.
Information Needed: An easy-to-understand explanation of why the extranet was unavailable.

Danielle DeLong, Vice President of Finance and Human Resources
Use of Report: Inform clients and staff why the extranet was unavailable during the period specified. Also, inform staff of the measures currently under consideration to prevent a similar failure in the future, and anticipate possible budget increases for the affected departments.
Information Needed:
1. Financial and procedural impact of recommendations.
2. A simple, easy-to-understand explanation of why the extranet was unavailable.