Anne Marie Porrello
in partial fulfillment of the requirements for
Computer Science 440
Dennis W. Butler, Instructor
Imagine that your doctor has recently informed you that you have cancer, most probably terminal, and your only hope for a cure involves radiation therapy. How much are you going to question the radiation process and equipment? Are you going to ask:
I would guess that you might ask a few of these questions, but that you would assume that the machine delivering the radiation is safe and that the people who designed it and manipulate it are properly qualified. So, whose job is it to ask, and answer, the other questions?
Between June 1985 and January 1987, six patients were seriously injured or killed by unsafe administration of radiation from the Therac-25 medical linear accelerator. In this paper I will first explain what a medical linear accelerator is and then describe the birthing process of the Therac-25. Next, I will examine the accident history and explore the causes behind the accidents. I will study various claims about the accidents made by Atomic Energy of Canada Limited (AECL), the company that designed and produces the Therac-25. In addition, I will examine the Therac-25's software bugs. Lastly, I will look at the government's reactions and explore what has been done to prevent similar accidents in the future. By the end, we should have an answer to the question: Whose job is it to ask, and answer, medical equipment safety questions?
The Therac-25 is a medical linear accelerator manufactured by AECL. A linear accelerator ("linac") is a particle accelerator, a device that increases the energy of electrically charged atomic particles. The charged particles are accelerated by an electric field, producing beams of particles that are then focused by magnets.
Linacs are used to treat cancer patients. A patient is exposed to beams of particles, or radiation, in doses designed to kill a malignancy. Since malignant tissues are more sensitive than normal tissues to radiation exposure, a treatment plan can be developed that permits the absorption of an amount of radiation that is fatal to tumor cells but causes relatively minor damage to normal tissue. Shallow tissue is treated with electrons, but to reach deeper tissue, X-ray photons are needed (Grolier, 1985).
AECL combined forces with a French company, CGR, and created two linacs before the Therac-25: the Therac-6 and the Therac-20. The Therac-6 is a six million electron volt (MeV) accelerator that produces X-rays only, and the Therac-20 is a 20-MeV X-ray or electron accelerator. (An electron volt, eV, is the energy an electron gains when it moves through a potential difference of 1 volt (Grolier, 1985).) After the companies ended their partnership, AECL developed the Therac-25. Like the Therac-20, the Therac-25 is a dual-mode machine, but its unique design allows it to take up much less space (Leveson and Turner, 1993, p. 19). The Therac-25 uses two magnets to bend the electron beam 180 degrees and 270 degrees before it reaches its target. A turntable controls which mode the machine will use by rotating the appropriate devices into position. When the machine is in electron mode, magnets on the turntable spread the beam to a safe concentration, and various energy levels are available (from 5 to 25 MeV) (O'Brien, 1985, p. 101). In photon mode, a much greater electron-beam current is needed because a "beam flattener" is used to produce a uniform treatment area. Only one energy level (25 MeV) is available in photon mode (O'Brien, 1985, p. 101). If the beam flattener is not in position, a dangerously high output rate will occur. This is a significant hazard of a dual-mode machine: if the devices are not all lined up properly, the high photon-mode output can reach the patient. The turntable also has a third position, the field-light position, which uses a light to help position patients correctly. In the field-light position, nothing controls the beam concentration, because no beam is expected. This creates another possible hazard if a beam is produced by mistake (Leveson and Turner, 1993, p. 25).
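The hazards above come down to one question a safety check must answer before every beam: does the turntable position match the requested mode and beam current? The Python sketch below shows one hypothetical check of that kind. The position names, the `high_current` flag, and the exact rules are assumptions made for illustration; they are not AECL's design.

```python
from enum import Enum


class Mode(Enum):
    ELECTRON = "electron"
    PHOTON = "photon"
    FIELD_LIGHT = "field_light"


# Hypothetical names for the turntable position each mode requires.
REQUIRED_POSITION = {
    Mode.ELECTRON: "scanning_magnets",
    Mode.PHOTON: "target_and_flattener",
    Mode.FIELD_LIGHT: "light",
}


def beam_permitted(requested_mode, turntable_position, high_current):
    """Hypothetical interlock: allow a beam only if the setup is consistent."""
    if requested_mode is Mode.FIELD_LIGHT:
        return False  # no beam is ever expected in the field-light position
    if turntable_position != REQUIRED_POSITION[requested_mode]:
        return False  # turntable does not match the requested mode
    # The high photon-mode current is only safe with the flattener in place.
    if high_current and turntable_position != "target_and_flattener":
        return False
    return True
```

Refusing every beam in the field-light position reflects the point above that no beam is expected there. A high current paired with the electron-mode position is rejected even when the mode and position agree, because that is exactly the combination that produces a dangerously high output.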
The Therac-25 is enclosed in a radiation treatment room to prevent unnecessary radiation exposure to people working near the machine. The machine operator communicates with the patient through video and audio monitors located within the treatment room (Leveson and Turner, 1993, p. 25).
The design of real-time computing systems is the most challenging and complex task that can be undertaken by a software engineer. By its very nature, software for real-time systems makes demands on analysis, design, and testing techniques that are unknown in other application areas. (Pressman, 1992, p. 481)
The Therac-25's software was developed from the Therac-20's software, which was in turn developed from the Therac-6's software. One programmer, over several years, revised the Therac-6 software into the Therac-25 software (AECL has not released any information about the programmer or his credentials). An important difference between the Therac-20 and the Therac-25 is the role that software plays in each machine. In the Therac-20, the role of software is limited; it simply adds convenience to the hardware. In the Therac-25, however, software alone performs many of the system's critical safety checks. The Therac-20 also implements these safety checks in hardware, but they were not included in the Therac-25 hardware. The Therac-25 software is responsible for:
The Therac-25 runs on a custom-designed real-time operating system. The software has four major components: stored data, a scheduler, a set of critical and non-critical tasks, and interrupt services. The interrupt services include (among others) a treatment console screen interrupt handler and a treatment console keyboard interrupt handler. The scheduler directs all non-interrupt events and orders simultaneous events. Tasks are divided into critical and non-critical categories. Every 0.1 seconds, tasks are initiated; critical tasks are executed first, and non-critical tasks use any remaining time. Critical tasks include:
Non-critical tasks include (among others):
The software of the Therac-25 also controls the positioning of the turntable, a possible hazard discussed previously, and checks the position of the turntable so that all necessary devices are in place (Leveson and Turner, 1993, p. 21).
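The scheduling scheme described earlier in this section, a 0.1-second cycle in which critical tasks run first and non-critical tasks fill whatever time is left, can be sketched in a few lines of Python. This is a simplified, hypothetical model: the task names and per-task costs are invented, and the real Therac-25 ran on AECL's custom operating system, not anything like this.

```python
from dataclasses import dataclass
from typing import List

CYCLE_MS = 100  # tasks are started every 0.1 seconds, per the text


@dataclass
class Task:
    name: str      # hypothetical task name
    cost_ms: int   # hypothetical run time per cycle, in milliseconds
    critical: bool


def run_cycle(tasks: List[Task]) -> List[str]:
    """Return the names of the tasks that run during one 0.1-second cycle."""
    remaining = CYCLE_MS
    ran = []
    # Critical tasks always run, in the order given.
    for task in tasks:
        if task.critical:
            ran.append(task.name)
            remaining -= task.cost_ms
    # Non-critical tasks run only if enough time is left in the cycle.
    for task in tasks:
        if not task.critical and task.cost_ms <= remaining:
            ran.append(task.name)
            remaining -= task.cost_ms
    return ran


tasks = [
    Task("dose_monitor", 30, critical=True),
    Task("screen_refresh", 40, critical=False),
    Task("beam_servo", 20, critical=True),
    Task("status_log", 20, critical=False),
    Task("housekeeping", 5, critical=False),
]
print(run_cycle(tasks))
# → ['dose_monitor', 'beam_servo', 'screen_refresh', 'housekeeping']
```

Here `status_log` is skipped because only 10 ms remain when its turn comes. The sketch does nothing if the critical tasks themselves overrun the cycle; handling that case is one of the things that makes real-time design hard.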
The Therac-25 software also contained several "user-friendly" features. During system testing, operators complained that entering the treatment plan took too long, since it had to be done twice: once in the treatment room and a second time at a terminal outside the room. For convenience, AECL redesigned the software so that operators could simply press a series of carriage returns at the terminal outside the treatment room to confirm the data entered within the room (Leveson and Turner, 1993, p. 24). Another "convenient" feature of the Therac-25 involved a "proceed" key. There were two ways the Therac-25 could shut down: a treatment suspend or a treatment pause. A treatment suspend indicated a serious error and required a complete system restart. A treatment pause, apparently considered less serious, required only a single-key command (the "P" key) to restart the machine, and all treatment specifications remained intact. A treatment pause could occur five times before the machine required a complete system restart. A pause produced only a terse error message: the word "Malfunction" followed by a number. However, the user's manual gave no indication of what each malfunction number meant (Leveson and Turner, 1993, p. 24).
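The pause/suspend behavior just described can be modeled as a small state machine. The Python sketch below is a hypothetical illustration, not AECL's code: the class and method names and the `serious` flag are assumptions. It allows five pauses, each cleared with "P", and suspends on the next malfunction. Note that the log records only a number, which is all the operator saw.

```python
MAX_PAUSES = 5  # per the text: five pauses before a full restart is required


class TreatmentSession:
    """Hypothetical model of the treatment pause / treatment suspend logic."""

    def __init__(self):
        self.pauses = 0
        self.state = "treating"  # "treating", "paused", or "suspended"
        self.log = []

    def malfunction(self, number, serious=False):
        # The operator sees only a number; nothing explains what it means.
        self.log.append(f"Malfunction {number}")
        if serious or self.pauses >= MAX_PAUSES:
            self.state = "suspended"  # complete system restart required
        else:
            self.pauses += 1
            self.state = "paused"     # settings kept; "P" resumes

    def press_p(self):
        """Resume after a pause; has no effect once treatment is suspended."""
        if self.state == "paused":
            self.state = "treating"
            return True
        return False


session = TreatmentSession()
for _ in range(6):
    session.malfunction(54)
    session.press_p()
print(session.state)  # → suspended
```

In this model, five consecutive malfunctions can each be dismissed with a single keystroke before the machine insists on a restart. That is the pattern that appears in the accident reports below.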
In later sections, I will discuss how the real-time nature of the system, the addition of user-friendly features, poor documentation, and failures to ensure safety contributed to the radiation accidents.
The first accident occurred at the Kennestone Regional Oncology Center in Marietta, Georgia. On June 3, 1985, a sixty-one-year-old woman was receiving follow-up treatment after a malignant tumor was removed from her breast. When the machine was activated, she felt "a tremendous rush of heat... this red-hot sensation." She told the operator of the Therac-25, "you burned me." Although she later developed reddening and swelling in the center of the treatment area, AECL denied that the machine had burned the patient, and the swelling was attributed to a normal treatment reaction. Over time, her shoulder froze and she began to experience spasms. She was admitted to the hospital, but her doctors continued to send her for Therac-25 radiation treatments. Eventually, the patient's breast had to be removed, and she completely lost the use of her shoulder and arm (Leveson and Turner, 1993, p. 22).
The second accident occurred at the Ontario Cancer Foundation clinic in Canada. On July 26, 1985, a 40-year-old patient received her 24th Therac-25 treatment. During the treatment, the machine paused and issued an "H-tilt" error message. Because the machine indicated that no dose had been delivered to the patient, the operator pressed the "P" key to proceed. The machine kept shutting down, and the operator pressed "P" each time until the machine suspended treatment after the fifth attempt. Each time, the machine indicated that no dose had been given to the patient. The operator was used to this type of behavior from the machine and called the technician, who found nothing wrong with it; this, too, was a common situation. The patient, however, complained of an "electric tingling shock" in her hip. Eventually, radiation overexposure was suspected and the patient was hospitalized. She died of cancer three months later, but a total hip replacement would have been necessary had she lived (Leveson and Turner, 1993, p. 23).
The third accident involved a woman treated at the Yakima Valley Memorial Hospital in 1985 who developed red parallel stripes on her hip, the treatment area. Her doctors continued to order treatments for her even after these stripes appeared. Radiation overexposure was not considered as a cause until over a year later. Eventually, the patient received surgical treatment and, except for minor disability and scarring, is alive and well today (Leveson and Turner, 1993, p. 26-27).
Another Therac-25 accident, the fourth in the series, occurred at the East Texas Cancer Center in Tyler, Texas, in March 1986. A male patient was to receive therapy on his upper back. The Therac-25 operator had typed in incorrect treatment information, entering X-ray mode instead of electron mode. She simply used the "cursor up" key to edit the mode entry, quickly pressed "enter" (one of the user-friendly features), and started treatment. The machine shut down with a treatment pause, and a "Malfunction 54" error message was displayed on the screen. This error message indicated that either too high or too low a dose had been delivered. Since an underdose value appeared on the screen and the operator was used to the machine's quirks, she pressed the "P" key to continue the treatment. The machine repeated the "Malfunction 54" error message and indicated that the same underdose had been delivered. The operator had no contact with the patient, because the usual audio and video monitors were not working properly. After the first attempt at treatment, the patient felt an "electric shock," or as if "someone had poured hot coffee" on his back. He knew this was not normal and began to get up from the treatment table when the second treatment was delivered. The patient felt a tremendous shock in his arm and felt that "his hand was leaving his body." He had to pound on the treatment room door to get the operator's attention. The patient eventually lost the use of his left arm and both legs, was unable to speak, and had several other complications. He died from complications five months later (Leveson and Turner, 1993, p. 27-28).
A fifth accident, the second at the East Texas Cancer Center, occurred in April 1986, just one month later. As in the previous accident, the same operator entered the wrong treatment mode, quickly edited in the correct mode, and pressed a rapid series of "enter" keys. The machine again shut down with a "Malfunction 54" message. This time, however, the intercom was working, and the operator heard a loud noise followed by moaning from the patient. The patient was receiving radiation on the side of his face. He fell into a coma, suffered severe neurological damage, and died three weeks after the accident (Leveson and Turner, 1993, p. 28).
The last of the accidents occurred at the Yakima Valley Memorial Hospital. On January 17, 1987, with the turntable in the field-light position, an operator set up a patient for small position-verification doses. When the operator attempted to administer the treatment dose, the machine shut down with a malfunction message and a treatment pause. The operator pressed the "P" key, and the machine paused again. The machine indicated that the patient had received his prescribed 7 rad of treatment. The patient, however, complained of a "burning sensation" and died three months later from complications related to the overdose (Leveson and Turner, 1993, p. 33).
Date | Location | Injuries to patient | Months after first accident |
---|---|---|---|
June 3, 1985 | Marietta, GA | Breast removal, loss of use of arm | 0 |
July 26, 1985 | Ontario, Canada | Total hip replacement needed | 1 |
January 6, 1986 | Yakima, WA | Minor disability and scarring | 7 |
March 21, 1986 | Tyler, TX | Death | 9 |
April 11, 1986 | Tyler, TX | Death | 10 |
January 17, 1987 | Yakima, WA | Death | 19 |
After the first accident in 1985, AECL was informed of the situation and asked whether the Therac-25 could operate in electron mode without scanning to spread the beam (as described in the hardware section). When AECL responded three days later, it was to say that improper scanning was impossible. The hospital staff had a difficult time determining the cause of the first burn, because they had never seen a radiation burn of this severity. The patient was eventually estimated to have received a dose in the range of 15,000-20,000 rad (radiation absorbed dose). To put this dose into perspective, a normal dose is in the "200-rad range, and doses of 500-1,000 rad can be fatal if delivered to the whole body" (Leveson and Turner, 1993, p. 23). The patient eventually filed a lawsuit against the hospital and AECL. Even after being notified of the lawsuit, AECL did not investigate whether a scanning failure could have occurred; it continued to believe that such an event was impossible (Leveson and Turner, 1993, p. 23).
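A little arithmetic makes the scale of this overdose concrete. The calculation below treats 200 rad as a representative normal dose, which is an approximation of the "200-rad range" quoted above:

$$
\frac{15{,}000\ \text{rad}}{200\ \text{rad}} = 75,
\qquad
\frac{20{,}000\ \text{rad}}{200\ \text{rad}} = 100
$$

In other words, the patient received roughly 75 to 100 times a typical treatment dose.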
AECL responded to the second accident by sending a service engineer to investigate the Therac-25. He was unable to reproduce the malfunction but suspected that the problem lay in a microswitch used to determine the turntable position. While investigating this, AECL uncovered some problems with the turntable positioning and made hardware and software changes to fix them. After the changes, AECL wrote a letter to the hospital claiming to have improved the safety of the machine by "at least five orders of magnitude," yet AECL had not actually discovered why the accident occurred. It was merely guessing. AECL told only four users in the United States to discontinue treatment when an "H-tilt" error message appeared. AECL voluntarily recalled the machine while making these changes (Leveson and Turner, 1993, p. 23).
AECL's reaction to the third accident is perplexing. After receiving a letter from the hospital describing the patient's injury, AECL replied that "after careful consideration, we are of the opinion that this damage could not have been produced by any malfunction of the Therac-25" (Leveson and Turner, 1993, p. 27). The letter went on to state that an overdose was impossible and that there had been no other similar accidents! The hospital believed that the safety improvements to the machine proclaimed by AECL (a 10,000,000 percent improvement!) guaranteed that the Therac-25 could not be responsible for the burn. No further action was taken (Leveson and Turner, 1993, p. 23-26).
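The "10,000,000 percent" figure is AECL's "five orders of magnitude" claim restated as a percentage. The conversion below treats the claimed improvement as a multiplicative factor:

$$
10^{5} \times 100\% = 10^{7}\% = 10{,}000{,}000\%
$$

Strictly speaking, an increase by a factor of $10^{5}$ is an increase of $(10^{5} - 1) \times 100\% = 9{,}999{,}900\%$, which rounds to the same figure.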
AECL responded to the fourth accident by suggesting that an electrical problem could have caused the accident. Another engineering firm tested the machine for electrical problems, but found none. AECL continued to claim that the Therac-25 could not possibly overdose a patient, and that no other accidents had been reported to them. No other action was taken (Leveson and Turner, 1993, p. 28).
Unfortunately, it was not until the fifth accident that AECL responded thoroughly. By this point, however, the FDA was also investigating the Therac-25. The next section will discuss further action taken by AECL.
At about this time, the FDA determined that the Therac-25 was defective and required AECL to submit a corrective action plan (CAP) for FDA approval. It also mandated that AECL inform all users of the Therac-25 of possible machine malfunctions. In response, AECL wrote a letter to users that understated the problems with the machine. The FDA responded to AECL's letter as follows:
AECL submitted its first CAP, containing six items. These included fixing the software problem that caused the fifth accident, making Malfunctions 1-64 cause a treatment suspend rather than a pause, and adding a new circuit, resettable only by administrative staff, that trips if a high pulse is detected. No hardware safety interlocks were mentioned. Nevertheless, AECL concluded, again, that the CAP changes would improve "machine safety by many orders of magnitude" (of an already ridiculously high figure) "and virtually eliminates the possibility of lethal doses as delivered in the Tyler incident" (Leveson and Turner, 1993, p. 32). The FDA was not satisfied and was concerned with AECL's overall software engineering practice: there was no documentation of software specifications and no detailed software test plans (Leveson and Turner, 1993, p. 32). (Obviously, no one at AECL had taken CSc440 with Dennis Butler.) Later, at an FDA hearing, AECL's quality assurance manager described testing as done in two parts. The first was a "small amount" of software testing, and the second was total system testing amounting to 2,700 hours. He later qualified that these were 2,700 hours of actual machine use (Leveson and Turner, 1993, p. 20)! The FDA eventually had to require AECL to perform extensive testing of the system each time a software change was made and to write a software test plan and an installation test plan.
Before AECL turned in its complete CAP, the sixth and last accident occurred. Although this accident was caused by a different software problem from the one behind the Tyler accidents, the changes specified in the CAP would have prevented it. After the accident, the FDA declared the Therac-25 defective, informed users of the serious potential problems, and asked that the machine be used only when the need for treatment outweighed the potential risks (Leveson and Turner, 1993).
AECL eventually turned in its completed CAP. Referring to the test plan, an FDA reviewer wrote: "Amazingly, the test data presented to show that the software changes to handle the edit problems in the Therac-25 are appropriate prove the exact opposite result" (Leveson and Turner, 1993, p. 37). Although a data entry problem was found to have caused the incorrect test results, the episode seems indicative of AECL's lack of quality control. The CAP, which included a hardware safety interlock among 20 other hardware and software changes, was eventually accepted. The Therac-25 machine is still in use today.
The Therac-25 software errors that caused the radiation overexposures can be reduced to interface errors. The first of these errors involved the entry of treatment data by the machine operator. Once an operator enters treatment information at the terminal outside the treatment room, the magnets used to filter and control radiation levels are set. There are several magnets, and the process takes about 8 seconds. If the operator changes the treatment information very quickly, within about 1 second, the change is registered. If the operator is rather slow about it and takes more than 8 seconds, the change is also registered. However, if the change occurs after that first second but before the magnets finish setting, the change is not detected, the magnets remain set up improperly, and thus the radiation level is set improperly. As I mentioned earlier, this is the main hazard of a dual-mode system, and it is what happened in the fourth and fifth accidents (Leveson and Turner, 1993, p. 30). Once the magnets are set, no test is performed to double-check that the treatment information entered matches the magnet settings. Another variable, which controls whether photon or electron mode is used, does detect the operator's edit and sets the mode to the edited value. As mentioned earlier, a much higher electron-beam current is needed in photon mode to produce the same output as in electron mode. Therefore, if the beam is set for photon mode but the turntable is set up for electron mode, a radiation overdose occurs (Jacky, 1989).
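The timing window can be reproduced with a small deterministic model. The Python sketch below is hypothetical: the names `apply_edits` and `overdose_risk` and the two-field `Setup` record are invented. It encodes the 1-second and 8-second thresholds exactly as described above. The mode variable follows every edit, but the magnet setup only sees edits that land outside the window. Nothing compares the two fields afterward, which is the missing double-check.

```python
from dataclasses import dataclass

QUICK_EDIT_S = 1.0  # per the text: edits within about 1 s are registered
SETUP_S = 8.0       # per the text: setting the magnets takes about 8 s


@dataclass
class Setup:
    mode: str         # mode variable (drives the turntable); sees every edit
    magnet_mode: str  # what the magnets / beam were actually set up for


def apply_edits(initial_mode, edits):
    """Hypothetical model of the data-entry flaw.

    `edits` is a list of (seconds_after_entry, new_mode) pairs.
    """
    s = Setup(mode=initial_mode, magnet_mode=initial_mode)
    for t, new_mode in sorted(edits):
        s.mode = new_mode  # the mode variable always follows the edit
        if t < QUICK_EDIT_S or t >= SETUP_S:
            s.magnet_mode = new_mode  # edit registered by the magnet setup
        # Otherwise the edit falls inside the setup window and is missed.
    return s


def overdose_risk(s):
    """Beam set up for photon mode while the mode says electron."""
    return s.magnet_mode == "photon" and s.mode == "electron"


s = apply_edits("photon", [(5.0, "electron")])
print(s.mode, s.magnet_mode, overdose_risk(s))  # → electron photon True
```

This mirrors the Tyler sequence: X-ray mode typed, then edited to electron mode. An edit made at 5 seconds leaves the beam set up for photon mode while the mode variable says electron. The same edit at 0.5 or 9 seconds does not. A single consistency check such as `s.mode == s.magnet_mode` before firing would have caught the mismatch. The sketch simplifies by not starting a fresh 8-second window after a late edit.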
The second software error, which caused the sixth and possibly other accidents, involved the real-time nature of the system. When the turntable is in test mode, a variable called Class3 is set to a non-zero value. As long as the operator is testing the position of the light beam, the variable is incremented. Once testing procedures are complete, the variable is set to zero and the radiation beam is allowed to pass. Class3, however, was stored in a single byte of memory, which can hold only the values 0 to 255. As a result, every 256th increment rolls the value over to zero. In the sixth accident, the operator pressed the "Set" button at the exact moment that Class3 rolled over to zero. As a result, a full prescription beam was released without the beam flattener in place (Leveson and Turner, 1993).
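The rollover is easy to reproduce. The Python sketch below emulates a one-byte unsigned counter by masking with `0xFF`. Apart from the name Class3, the function names and the starting value of 1 are assumptions for illustration, and "zero lets the beam pass unchecked" follows the description above.

```python
def increment_class3(value):
    """Increment the way a one-byte unsigned counter does: 255 + 1 -> 0."""
    return (value + 1) & 0xFF


def check_skipped(class3):
    """Per the description above, a zero Class3 lets the beam pass."""
    return class3 == 0


def passes_that_skip_check(n_passes, start=1):
    """Return the (1-based) test passes on which Class3 has wrapped to zero."""
    class3 = start  # set to a non-zero value when testing begins
    skipped = []
    for p in range(1, n_passes + 1):
        class3 = increment_class3(class3)
        if check_skipped(class3):
            skipped.append(p)
    return skipped


print(passes_that_skip_check(600))  # → [255, 511]
```

Starting from 1, the counter reaches zero on every 256th pass. A wider counter would only make the window rarer. A more robust alternative is to set the flag to a fixed non-zero value instead of incrementing it, so it can never wrap.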
However, AECL was not completely to blame for the Therac-25 accidents; machine operators and technicians also contributed their share of mistakes. For example, it seems strange to me that the operators of the Therac-25 became comfortable operating the machine despite frequent error messages. According to Jacky (1989), the Therac-25 typically issued as many as forty error messages a day! Especially since the consequences of machine failure could be death, I believe that the operators had the responsibility to insist that the machine function properly without errors, or at least to demand better documentation of possible errors and their causes. Operators also seemed to rely too much on AECL's inflated safety figures as reasons for not further investigating possible overdoses.
Lastly, even the federal government contributed its share to the Therac-25 accidents. Despite knowing of AECL's poor engineering practices, the FDA allowed the Therac-25 to continue to be used. The FDA also appeared to have too much confidence in AECL's machine safety figures.
I believe that manufacturers are ultimately responsible for the safety of the medical equipment they produce, since they know the machine better than anyone else. And, as Jacky (1989) points out, safety should be manufactured into the product, not added to it as an afterthought. However, technicians and operators of medical equipment also share the responsibility for ensuring safety. Since they have day-to-day contact with the machine, they should be aware of any quirks or inconsistencies. Doctors who monitor patients receiving treatment from medical equipment are also responsible for ensuring patient safety. If a doctor suspects an equipment malfunction, the matter should be investigated, and it seems to me that treatment should be suspended until the investigation is complete. The federal government should also be involved in ensuring the safety of medical equipment whenever any question of safety arises. Finally, patients should always question the safety of medical equipment that could cause injury or death, especially in a society where profit and prestige can take precedence over health and safety. Since the patients are the ones at risk, the final responsibility unfortunately falls on their shoulders.