In this first section, I will make the point that a significant part of effective troubleshooting lies in the way that you think about the problem. The next section will cover the equipment you should buy and build to help you diagnose problems. Other sections will illuminate some of the more subtle and elusive characteristics of passive and active components, and the PC boards and cables that interconnect them.
Troubleshooting Is More Effective with the Right Philosophy
If you recall that the most boring class in school was a philosophy class, and you think this book will be boring that way, well, WRONG. We are going to talk about the real world and examples of mistakes, goofs, and how we can recover from these mistakes. We are going to talk about all the nasty problems the world tries to inflict on us. We are talking about Trouble with a capital T, and how to overcome it.
Here at National Semiconductor, we decided a couple of years ago that we ought to write a book about switch-mode power supplies. Within the applications and design groups, nearly all of the engineers volunteered to write a section, and I volunteered to do a section on troubleshooting. At present, the status of that book is pretty dubious. But, the "troubleshooting section" is going strong, and you readers are among the first to benefit, because that one section has expanded to become this entire book. Although I am probably not the world's best analog-circuit troubleshooter, I am fairly good, and I just happened to be the guy who sat down and put all these stories in writing.
Furthermore, the techniques you need to troubleshoot a switch-mode power supply apply, in general, to a lot of other analog circuits and may even be useful for some basic digital hardware. You don't have to build switchers to find this book useful; if you design or build any analog circuits, this book is for you.
Maybe there are some engineers who are knowledgeable about digital circuits, computers, microprocessors, and software, who may someday write about the troubleshooting of those types of circuits. That sure would suit me fine, because I am certainly not going to talk about those circuits!! Everybody has to be ignorant about something, and that is exactly what I am ignorant about.
If Only Everything Would Always Go Right...
Why are we interested in troubleshooting? Because even the best engineers take on projects whose requirements are so difficult and challenging that the circuits don't work as expected--at least not the first time. I don't have data on switching regulators, but I read in an industry study that when disk drives are manufactured, the fraction that fails to function when power is first applied typically ranges from 20 to 70%. Of course, this fraction may occasionally fall as low as 1% and rise as high as 100%. But, on the average, production engineers and technicians must be prepared to repair 20, 40, or 60% of these complex units. Switching-regulated power supplies can also be quite complex. If you manufacture them in batches of 100, you shouldn't be surprised to find some batches with 12 pieces that require troubleshooting and other batches that have 46 such pieces. The troubleshooting may, as you well know, be tough with a new product whose bugs haven't been worked out. But it can be even tougher when the design is old and the parts it now uses aren't quite like the ones you used to be able to buy. Troubleshooting can be tougher still when there isn't much documentation describing how the product is supposed to work, and the designer isn't around any more. If there's ever a time when troubleshooting isn't needed, it's just a temporary miracle. You might try to duck your troubleshooting for a while.
You might pretend that you can avoid the issue.
And, what if you decide that troubleshooting isn't necessary? You may find that your first batch of products has only three or four failures, so you decide that you don't need to worry. The second batch has a 12% failure rate, and most of the rejects have the same symptoms as those of the first batch. The next three batches have failure rates of 23, 49, and 76%, respectively. When you finally find the time to study the problems, you will find that they would have been relatively easy to fix if only you had started a couple of months earlier. That's what Murphy's Law can do to you if you try to slough off your troubleshooting chores ... we have all seen it happen.
If you have a bunch of analog circuits that you have to troubleshoot, well, why don't you just look up the troubleshooting procedures in a book? The question is excellent, and the answer is very simple: Until now, almost nothing has been written about the troubleshooting of these circuits. The best previous write-up that I have found is a couple pages in a book by Jiri Dostal (Ref. 1). He gives some basic procedures for looking for trouble in a fairly straightforward little circuit: a voltage reference regulator. As far as Dostal goes, he does quite well. But, he only offers a few pages of troubleshooting advice, and there is much to explain beyond what he has written.
Another book that has several good pages about the philosophy of troubleshooting is by John I. Smith (Ref. 2). Smith explains some of the foibles of wishing you had designed a circuit correctly when you find that it doesn't work "right." Unfortunately, it's out of print. Analog Devices sells a Data Converter Handbook (Ref. 3), and it has a few pages of good ideas and suggestions on what to look for when troubleshooting data converter and analog circuits.
What's missing, though, is general information. When I started writing about this troubleshooting stuff, I realized there was a huge vacuum in this area. So I have filled it up, and here we are.
You'll probably use general-purpose test equipment. What equipment can you buy for troubleshooting? I'll cover that subject in considerable detail in the next section.
For now, let me observe that if you have several million dollars worth of circuits to troubleshoot, you should consider buying a $100,000 tester. Of course, for that price you only get a machine at the low end of the line. And, after you buy the machine, you have to invest a lot of time in fixturing and software before it can help you. Yes, you can buy a $90 tester that helps locate short circuits on a PC board but, in the price range between $90 and $100,000, there isn't a lot of specialized troubleshooting equipment available. If you want an oscilloscope, you have to buy a general-purpose oscilloscope; if you want a DVM, it will be a general-purpose DVM.
[1. I must say, I recently re-read Mr. Dostal's book, and it is still just about the best technical book on operational amplifiers. It's more complete, more technical, but less intuitive than Tom Frederiksen's Intuitive IC Op Amps. Of course, for $113, it ought to be pretty good. It is getting a little old and dated, and I hope he plans to update it with a new revision soon.]
Now, it's true that some scopes and some DVMs are more suitable for troubleshooting than others (and I will discuss the differences in the next section), but, to a large extent, you have to depend on your wits.
Your wits: Ah, very handy to use, your wits--but, then what? One of my favorite quotes from Jiri Dostal's book says that troubleshooting should resemble fencing more closely than it resembles wrestling. When your troubleshooting efforts seem like wrestling in the mud with an implacable opponent (or component), then you are probably not using the right approach. Do you have the right tools, and are you using them correctly? I'll discuss that in the next section. Do you know how a failed component will affect your circuit, and do you know what the most likely failure modes are? I'll deal with components in subsequent sections. Ah, but do you know how to think about Trouble? That is this section's main lesson.
Even things that can't go wrong, do. One of the first things you might do is make a list of all the things that could be causing the problem. This idea can be good up to a point. I am an aficionado of stories about steam engines, and here is a story from the book Master Builders of Steam (Ref. 4). A class of new 3-cylinder 4-6-0 (four small pilot wheels in front of the drive wheels, six drive wheels, no little trailing wheels) steam engines had just been designed by British designer W. A. Stanier, and they were ". . . perfect stinkers. They simply would not steam." So the engines' designers made a list of all the things that could go wrong and a list of all the things that could not be at fault; they set the second list aside.
The designers specified changes to be made to each new engine in hopes of solving the problem: "Teething troubles bring modifications, and each engine can carry a different set of modifications." The manufacturing managers "shuddered as these modified drawings seemed to pour in from Derby (Ed: site of the design facility--the Drawing Office), continually upsetting progress in the works." (Lots of fun for the manufacturing guys, eh?) In the end, the problem took a long time to find because it was on the list of "things that couldn't go wrong." Allow me to quote the deliciously horrifying words from the text: "Teething troubles always present these two difficulties: that many of the clues are very subjective and that the 'confidence trick' applies. By the latter I mean when a certain factor is exonerated as trouble-free based on a sound premise, and everyone therefore looks elsewhere for the trouble: whereas in fact, the premise is not sound and the exonerated factor is guilty. In Stanier's case this factor was low super-heat. So convinced was he that a low degree of super-heat was adequate that the important change to increased superheater area was delayed far longer than necessary. There were some very sound men in the Experimental Section of the Derby Loco Drawing Office at that time, but they were young and their voice was only dimly heard. Some of their quite painstaking superheater test results were disbelieved." But, of course, nothing like that ever happened to anybody you know--right?
Experts Have No Monopoly on Good Advice
Another thing you can do is ask advice only of "experts." After all, only an expert knows how to solve a difficult problem--right? Wrong! Sometimes, a major reason you can't find your problem is because you are too close to it--you are blinded by your familiarity. You may get excellent results by simply consulting one or two of your colleagues who are not as familiar with your design: they may make a good guess at a solution to your problem. Often a technician can make a wise (or lucky) guess as easily as can a savvy engineer. When that happens, be sure to remember who saved your neck. Some people are not just "lucky"--they may have a real knack for solving tricky problems, for finding clues, and for deducing what is causing the trouble. Friends like these can be more valuable than gold.
At National Semiconductor, we usually submit a newly designed circuit layout to a review by our peers. I invite everybody to try to win a Beverage of Their Choice by catching a real mistake in my circuit. What we really call this is a "Beercheck." It's fun because if I give away a few pitchers of brew, I get some of my dumb mistakes corrected--mistakes that I myself might not have found until a much-later, more-painful, and more-expensive stage. Furthermore, we all get some education. And, you can never predict who will find the little picky errors or the occasional real killer mistake. All technicians and engineers are invited.
Learn to Recognize Clues
There are four basic questions that you or I should ask when we are brought in to do troubleshooting on someone else's project:
Did it ever work right?
What are the symptoms that tell you it's not working right?
When did it start working badly or stop working?
What other symptoms showed up just before, just after, or at the same time as the failure?
FIG. 1. Peer review is often effective for wringing problems out of designs. Here, the author gets his comeuppance from colleagues who have spotted a problem because they are not as overly familiar with his circuit layout as he is.
As you can plainly see, the clues you get from the answers to these questions might easily solve the problem right away; if not, they may eventually get you out of the woods. So even if a failure occurs on your own project, you should ask these four questions--as explicitly as possible--of yourself or your technician or whoever was working on the project. For example, if your roommate called you to ask for a lift because the car had just quit in the middle of a freeway, you would ask whether anything else happened or if the car just died. If you're told that the headlights seemed to be getting dimmer and dimmer, that's a clue.
Ask Questions; Take Notes; Record Amount of Funny
When you ask these four questions, make sure to record the answers on paper--preferably in a notebook. As an old test manager I used to work with, Tom Milligan, used to tell his technicians, "When you are taking data, if you see something funny, Record Amount of Funny." That was such a significant piece of advice, we called it "Milligan's Law." A few significant notes can save you hours of work. Clues are where you find them; they should be saved and savored.
Ask not only these questions but also any other questions suggested by the answers. For example, a neophyte product engineer will sometimes come to see me with a batch of ICs that have a terrible yield at some particular test. I'll ask if the parts failed any other tests, and I'll hear that nobody knows because the tester doesn't continue to test a part after it detects a failure. A more experienced engineer would have already retested the devices in the RUN ALL TESTS mode, and that is exactly what I instruct the neophyte to do.
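The difference between the two tester modes is easy to see in a short sketch. Everything in it is hypothetical: the test names, the limits, the part data, and the `run_tests` function are made up for illustration and do not describe any real tester's software.

```python
def run_tests(part, tests, stop_on_fail=True):
    """Run each (name, check) pair against a part and return the names that failed.

    With stop_on_fail=True the run ends at the first failure, like the tester
    in the story; with stop_on_fail=False it acts like a RUN ALL TESTS mode.
    """
    failures = []
    for name, check in tests:
        if not check(part):
            failures.append(name)
            if stop_on_fail:
                break
    return failures


# Hypothetical limits for an imaginary op amp (illustrative values only).
TESTS = [
    ("offset_voltage", lambda p: abs(p["vos_mv"]) <= 2.0),
    ("bias_current",   lambda p: p["ib_na"] <= 100.0),
    ("supply_current", lambda p: p["icc_ma"] <= 3.0),
]

# A made-up reject that is bad in two different ways.
part = {"vos_mv": 5.1, "ib_na": 40.0, "icc_ma": 7.5}

print(run_tests(part, TESTS))                      # stop at first failure
print(run_tests(part, TESTS, stop_on_fail=False))  # RUN ALL TESTS
```

In stop-on-first-failure mode this part shows only the offset-voltage reject. Letting every test run also turns up the excessive supply current, which is a second clue and possibly the more telling one.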
Likewise, if you are asking another person for advice, you should have all the facts laid out straight, at least in your head, so that you can be clear and not add to the confusion. I've worked with a few people who tell me one thing and a minute later start telling me the opposite. Nothing makes me lose my temper faster! Nobody can help you troubleshoot effectively if you aren't sure whether the circuit is running from +12 V or -12 V and you start making contradictory statements.
And, if I ask when the device started working badly, don't tell me, "At 3:25 PM." I'm looking for clues, such as, "About two minutes after I put it in the 125 °C oven," or, "Just after I connected the 4-ohm load." So just as we can all learn a little more about troubleshooting, we can all learn to watch for the clues that are invaluable for fault diagnosis.
Methodical, Logical Plans Ease Troubleshooting
Even a simple problem with a resistive divider offers an opportunity to concoct an intelligent troubleshooting plan. Suppose you had a series string of 128 1 k-ohm resistors. (See FIG. 2.) If you applied 5 V to the top of the string and 0 V to the bottom, you would expect the midpoint of the string to be at 2.5 V. If it weren't 2.5 V but actually 0 V, you could start your troubleshooting by checking the voltage on each resistor, working down from the top, one by one. But that strategy would be absurd! Check the voltage at, say, resistor #96, the resistor which is halfway up from the midpoint to the top. Then, depending on whether that test is high, low, or reasonable, try at #112 or #80 (at 7/8 or 5/8 of the span), then at #120 or #104 or #88 or #72, branching along in a sort of binary search. That would be much more effective. With just a few trials (about seven) you could find where a resistor was broken open or shorted to ground. Such branching along would take a lot fewer than the 64 tests you would need to walk all the way down the string.
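The branching search described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a bench procedure. The node numbering (node n is the top of resistor #n, with node 0 at ground), the `measure` callback standing in for a DVM reading, and the 10 mV threshold are all hypothetical choices made for illustration. The simulated fault is a single open resistor above the midpoint, so no current flows: every node below the break reads 0 V and every node at or above it reads the full 5 V.

```python
def find_break(measure, dead=64, live=128, threshold=0.01):
    """Binary-search a resistor string for the boundary between dead and live nodes.

    measure(n) returns the voltage at node n (a stand-in for a DVM probe).
    Assumes node `dead` reads about 0 V and node `live` reads a real voltage.
    Returns (last dead node, first live node, number of probes used).
    """
    probes = 0
    while live - dead > 1:
        mid = (dead + live) // 2
        probes += 1
        if measure(mid) > threshold:
            live = mid      # the break is at or below this node
        else:
            dead = mid      # the break is above this node
    return dead, live, probes


def open_resistor_string(bad, supply=5.0):
    """Simulate the string with resistor #bad open: nodes at or above the
    break sit at the supply voltage, nodes below it sit at 0 V."""
    return lambda node: supply if node >= bad else 0.0


if __name__ == "__main__":
    dead, live, probes = find_break(open_resistor_string(bad=101))
    print(f"open resistor is #{live}, found in {probes} probes")
```

Starting from the dead midpoint (node 64) and the live top (node 128), the first probe lands on node 96 and the second on 112 or 80, the same sequence as in the text. Six probes plus the original midpoint reading make the "about seven" trials.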
Further, if an op-amp circuit's output were pegged, you would normally check the circuit's op amp, resistors, or conductors. You wouldn't normally check the capacitors, unless you guessed that a shorted capacitor could cause the output to peg.
Conversely, if the op amp's V_out was a few dozen millivolts in error, you might start checking the resistors for their tolerances. You might not check for an open-circuited or wrong-value capacitor, unless you checked the circuit's output with a scope and discovered it oscillating!! So, in any circuit, you must study the data--your "clues"--until they lead you to the final test that reveals the true cause of your problem.
Thus, you should always first formulate a hypothesis and then invent a reasonable test or series of tests, the answers to which will help narrow down the possibilities of what is bad, and may in fact support your hypothesis. These tests should be performable. But you may define a test and then discover it is not performable or would be much too difficult to perform. Then I often think, "Well, if I could do that test, the answer would either come up 'good' or 'bad.' OK, so I can't easily run the test. But if I assume that I'd get one or the other of the answers, what would I do next to nail down the solution? Can I skip to the next test??"
For example, if I had to probe the first layer of metal on an IC with two layers of metal (because I had neglected to bring an important node up from the first metal to the second metal), I might do several other tests instead. I would do the other tests hoping that maybe I wouldn't have to do that probing, which is rather awkward even if I can "borrow" a laser to cut through all the layers of oxide. If I'm lucky, I may never have to go back and do that "very difficult or nearly impossible" test.
Of course, sometimes the actual result of a test is some completely unbelievable answer, nothing like the answers I expected. Then I have to reconsider--where were my assumptions wrong? Where was my thinking erroneous? Or, did I take my measurements correctly? Is my technician's data really valid? That's why troubleshooting is such a challenging business--almost never boring.
On the other hand, it would be foolish for you to plan everything and test nothing, because if you did that, you would surely plan some procedures that a quick test would show are unnecessary. That's what they call "paralysis by analysis." All things being equal, I would expect the planning and testing to require equal time. If the tests are very complicated and expensive, then the planning should be appropriately comprehensive. If the tests are simple, as in the case of the 128 resistors in series, you could make them up as you go along. For example, the resistors listed above (#80, 112, 120, 104, 88, or 72) are nominally binary choices. You don't have to go to exactly those places--an approximate binary search would be just fine.
You Can Make Murphy's Law Work for You
Murphy's Law is quite likely to attack even our best designs: "If anything can go wrong, it will." But, I can make Murphy's Law work for me. For example, according to this interpretation of Murphy's Law, if I drive around with a fire extinguisher, if I am prepared to put out any fire--will that make sure that I never have a fire in my car? When you first hear it, the idea sounds dumb. But, if I'm the kind of meticulous person who carries a fire extinguisher, I may also be neat and refuse to do the dumb things that permit fires to start.
Similarly, when designing a circuit I leave extra safety margins in areas where I cannot surely predict how the circuit will perform. When I design a breadboard, I often tell the technician, "Leave 20% extra space for this section because I'm not sure that it will work without modifications. And, please leave extra space around this resistor and this capacitor because I might have to change those values." When I design an IC, I leave little pads of metal at strategic points on the chip's surface, so that I can probe the critical nodes as easily as possible. To facilitate probing when working with 2-layer metal, I bring nodes up from the first metal through vias to the second metal. Sometimes I leave holes in my Vapox passivation to facilitate probing dice. The subject of testability has often been addressed for large digital circuits, but the underlying ideas of Design For Testability are important regardless of the type of circuit you are designing. You can avoid a lot of trouble by thinking about what can go wrong and how to keep it from going wrong before the ensuing problems lunge at you. By planning for every possibility, you can profit from your awareness of Murphy's Law. Now, clearly, you won't think of every possibility. (Remember, it was something that couldn't go wrong that caused the problems with Stanier's locomotives.) But, a little forethought can certainly minimize the number of problems you have to deal with.
Consider Appointing a Czar for a Problem Area
A few years ago we had so many nagging little troubles with band-gap reference circuits at National, that I decided (unilaterally) to declare myself "Czar of Band Gaps." The main rules were that all successful band-gap circuits should be registered with the Czar so that we could keep a log book of successful circuits; all unsuccessful circuits, their reasons for failure, and the fixes for the failures should likewise be logged in with the Czar so that we could avoid repeating old mistakes; and all new circuits should be submitted to the Czar to allow him to spot any old errors. So far, we think we've found and fixed over 50% of the possible errors, before the wafers were fabricated, and we're gaining. In addition, we have added Czars for start-up circuits and for trim circuits, and a Czarina for data-sheet changes, and we are considering other czardoms. It's a bit of a game, but it's also a serious business to use a game to try to prevent expensive errors.
I haven't always been a good troubleshooter, but my "baptism of fire" occurred quite a few years ago. I had designed a group of modular data converters. We had to ship 525 of them, and some foolish person had bought only 535 PC (printed circuit) boards. When less than half of the units worked, I found myself in the troubleshooting business because nobody else could imagine how to repair them. I discovered that I needed my best-triggering scope and my best DVM. I burned a lot of midnight oil. I got half-a-dozen copies each of the schematic and of the board layout. I scribbled notes on them of what the DC voltages ought to be, what the correct AC waveforms looked like, and where I could best probe the key waveforms. I made little lists of, "If this frequency is twice as fast as normal, look for Q17 to be damaged, but if the frequency is low, look for a short on bus B."
I learned where to look for solder shorts, hairline opens, cold-soldered joints, and intermittents. I diagnosed the problems and sent each unit back for repair with a neat label of what to change. When they came back, did they work? Some did--and some still had another level or two of problems. That's the Onion Syndrome: You peel off one layer, and you cry; you peel off another layer, and you cry some more. . . . By the time I was done, I had fixed all but four of the units, and I had gotten myself one hell of a good education in troubleshooting.
After I found a spot of trouble, what did I do about it? First of all, I made some notes to make sure that the problem really was fixed when the offending part was changed. Then I sent the units to a good, neat technician who did precise repair work--much better than a slob like me would do. Lastly, I sent memos to the manufacturing and QC departments to make sure that the types of parts that had proven troublesome were not used again, and I confirmed the changes with ECOs (Engineering Change Orders). It is important to get the paperwork scrupulously correct, or the alligators will surely circle back to vex you again.
Sloppy Documentation Can End in Chapter 11
I once heard of a similar situation where an insidious problem was causing nasty reliability problems with a batch of modules. The technician had struggled to find the solution for several days. Finally, when the technician went out for lunch, the design engineer went to work on the problem. When the technician came back from lunch, the engineer told him, "I found the problem; it's a mismatch between Q17 and R18. Write up the ECO, and when I get back from lunch I'll sign it." Unfortunately, the good rapport between the engineer and the technician broke down: there was some miscommunication. The technician got confused and wrote up the ECO with an incorrect version of what should be changed. When the engineer came back from lunch, he initialed the ECO without really reading it and left for a two-week vacation.
When he came back, the modules had all been "fixed," potted, and shipped, and were starting to fail out in the field. A check of the ECO revealed the mistake--too late. The company went bankrupt. It's a true story and a painful one. Don't get sloppy with your paperwork; don't let it happen to you.
One of the reasons you do troubleshooting is because you may be required to do a Failure Analysis on the failure. That's just another kind of paperwork. Writing a report is not always fun, but sometimes it helps clarify and crystallize your understanding of the problem. Maybe if a customer had forced my engineer friend to write exactly what happened and what he proposed to do about it, that disaster would not have occurred. When I have nailed down my little problem, I usually write down a scribbled quick report. One copy often goes to my boss, because he is curious why it's been taking me so long. I usually give a copy to friends who are working on similar projects. Sometimes I hang a copy on the wall, to warn all my friends. Sometimes I send a copy to the manufacturer of a component that was involved. If you communicate properly, you can work to avoid similar problems in the future.
Then there are other things you can do in the course of your investigation. When you find a bad component, don't just throw it in a wastebasket. Sometimes people call me and say, "Your ICs have been giving me this failure problem for quite a while." I ask, "Can you send me some of the allegedly bad parts?" And they reply, "Naw, we always throw them in the wastebasket . . ." Please don't do that, because often the ability to troubleshoot a component depends on having several of them to study. Sometimes it's even a case of "NTF"--"No Trouble Found." That happens more often than not. So if you tell me, "Pease, your lousy op amps are failing in my circuit," and there's actually nothing wrong with the op amps, but it's really a misapplication problem--I can't help you very well if the parts all went in the trash. Please save them, at least for a while. Label them, too.
Another thing you can do with these bad parts is to open them up and see what you can see inside. Sometimes on a metal-can IC, after a few minutes with a hacksaw, it's just as plain as day. For example, your technician says, "This op amp failed, all by itself, and I was just sitting there, watching it, not doing anything." But when you look inside, one of the input's lead-bond wires has blown out, evaporated, and in the usage circuit, there are only a couple 10 k-ohm resistors connected to it. Well, you can't blow a lead bond with less than 300 mA. Something must have bumped against that input lead and shorted it to a source that could supply half an ampere. There are many cases where looking inside the part is very educational. When a capacitor fails, or a trim-pot, I get my hammer and pliers and cutters and hack-saw and look inside just to see how nicely it was (or wasn't) built. To see if I can spot a failure mechanism--or a bad design. I'm just curious. But sometimes I learn a lot.
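The arithmetic behind the blown lead-bond deduction is worth a quick check. The sketch below uses the 300 mA fusing figure and the 10 k-ohm resistor value from the text, but it assumes a 15 V supply rail, which the text does not state. The function and constant names are made up for illustration.

```python
def max_fault_current(rail_volts, series_ohms):
    """Worst-case current in amperes that a rail can push through a
    series resistor, by Ohm's law."""
    return rail_volts / series_ohms


FUSING_CURRENT_A = 0.300   # from the text: a lead bond needs about 300 mA to blow
SERIES_OHMS = 10_000       # from the text: 10 k-ohm resistors feed the input
RAIL_VOLTS = 15.0          # ASSUMPTION: typical op-amp rail, not given in the text

i_max = max_fault_current(RAIL_VOLTS, SERIES_OHMS)
volts_needed = FUSING_CURRENT_A * SERIES_OHMS

print(f"max current through 10 k-ohm from a 15 V rail: {i_max * 1e3:.1f} mA")
print(f"voltage needed to force 300 mA through 10 k-ohm: {volts_needed:.0f} V")
```

Under that assumption the circuit itself can deliver only about 1.5 mA into the input, some 200 times too little to fuse the bond wire, so the current had to come from somewhere outside the 10 k-ohm path.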
Now, when I have finished my inspection, and I am still mad as hell because I have wasted a lot of time being fooled by a bad component-what do I do? I usually WIDLARIZE it, and it makes me feel a lot better. How do you WIDLARIZE something? You take it over to the anvil part of the vice, and you beat on it with a hammer, until it is all crunched down to tiny little pieces, so small that you don't even have to sweep it off the floor. It sure makes you feel better. And you know that that component will never vex you again. That's not a joke, because sometimes if you have a bad pot or a bad capacitor, and you just set it aside, a few months later you find it slipped back into your new circuit and is wasting your time again. When you WIDLARIZE something, that is not going to happen. And the late Bob Widlar is the guy who showed me how to do it.
Troubleshooting by Phone--A Tough Challenge
These days, I do quite a bit of troubleshooting by telephone. When my phone rings, I never know if a customer will be asking for simple information or submitting a routine application problem, a tough problem, or an insoluble problem. Often I can give advice just off the top of my head because I know how to fix what is wrong. At other times, I have to study for a while before I call back. Sometimes, the circuit is so complicated that I tell the customer to mail or transmit the schematic to me. On rare occasions, the situation is so hard to analyze that I tell the customer to put the circuit in a box with the schematic and a list of the symptoms and ship it to me. Or, if the guy is working just a few miles up the road, I will sometimes drop in on my way home, to look at the actual problem.
Sometimes the problem is just a misapplication. Sometimes parts have been blown out and I have to guess what situation caused the overstress. Here's an example: In June, a manufacturer of dental equipment complained of an unacceptable failure rate on LM317 regulators. After a good deal of discussion, I asked, "Where did these failures occur?" Answer: North Dakota. "When did they start to occur?" Answer: In February. I put two and two together and realized that the climate in a dentist's office in North Dakota in February is about as dry as it can be, and is conducive to very high electrostatic potentials. The LM317 is normally safe against electrostatic discharges as high as 3 or 4 kV, but walking across a carpeted floor in North Dakota in February can generate much higher voltages than that. To make matters worse, the speed-control rheostat for this dental instrument was right out in the handle. The wiper and one end of the rheostat were wired directly to the LM317's ADJUST pin; the other end of the rheostat was connected to ground by way of a 1 k-ohm resistor located back in the main assembly (see FIG. 3). The speed-control rheostat was just wired up to act as a lightning rod that conducted the ESD energy right into the ADJUST pin.
The problem was easily solved by rewiring the resistor in series with the IC's ADJUST pin. By swapping the wires and connecting the rheostat wiper to ground (see FIG. 4), much less current would take the path to the ADJUST pin and the diffused resistors on the chip would not be damaged or zapped by the current surges. Of course, adding a small capacitor from the ADJ pin to ground would have done just as well, but some customers find it easier to justify moving a component than adding one . . . . A similar situation occurs when you get a complaint from Boston in June, "Your op amps don't meet spec for bias current." The solution is surprisingly simple: Usually a good scrub with soap and water works better than any other solvent to clean off the residual contaminants that cause leakage under humid conditions.
(Fingerprints, for example. . .) Refer to Section 5 for notes on how a dishwasher can clean up a leaky PC board--or a leaky, dirty IC package.
[2. If you don't think troubleshooting of cars can be entertaining, tune in Car Talk with Tom and Ray Magliozzi. Ask your local National Public Radio station for the broadcast time . . . GOOD STUFF! ]
When Computers Replace Troubleshooters, Look Out
Now, let's think--what needs troubleshooting? Circuits? Television receivers? Cars? People? Surely doctors have a lot of troubleshooting to do--they listen to symptoms and try to figure out the solution. What is the natural temptation? To let a computer do all the work! After all, a computer is quite good at listening to complaints and symptoms, asking wise questions, and proposing a wise diagnosis. Such a computer system is sometimes called an Expert System--part of the general field of Artificial Intelligence. But, I am still in favor of genuine intelligence. Conversely, people who rely on Artificial Intelligence are able to solve some kinds of problems, but you can never be sure if they can accommodate every kind of Genuine Stupidity as well as Artificial Stupidity. (That is the kind that is made up especially to prove that Artificial Intelligence works just great.)
I won't argue that the computer isn't a natural for this job; it will probably be cost effective, and it won't be absent-minded. But, I am definitely nervous because if computers do all the routine work, soon there will be nobody left to do the thinking when the computer gives up and admits it is stumped. I sure hope we don't let the computers leave the smart troubleshooting people without jobs, whether the object is circuits or people.
My concern is shared by Dr. Nicholas Lembo, the author of a study on how physicians make diagnoses, which was published in the New England Journal of Medicine. He recently told the Los Angeles Times, "With the advent of all the new technology, physicians aren't all that much interested (in bedside medicine) because they can order a $300 to $400 test to tell them something they could have found by listening." An editorial accompanying the study commented sadly: "The present trend. . . may soon leave us with a whole new generation of young physicians who have no confidence in their own ability to make worthwhile bedside diagnoses." Troubleshooting is still an art, and it is important to encourage those artists.
The Computer Is Your Helper… and Friend... ???
I read in the San Francisco Chronicle (Ref. 5) about a case in which SAS, the Scandinavian airline, implemented an "Expert System" for its mechanics: "Management knew something was wrong when the quality of the work started decreasing. It found the system was so highly mechanized that mechanics never questioned its judgment. So the mechanics got involved in its redesign. They made more decisions on the shop floor and used the computer to augment those decisions, increasing productivity and cutting down on errors. 'A computer can never take over everything,' said one mechanic. 'Now there are greater demands on my judgment, (my job) is more interesting.'" What can I add? Just be thoughtful. Be careful about letting the computers take over.
No Problems?? No Problem...Just Wait...
Now, let's skip ahead and presume we have all the necessary tools and the right receptive attitude. What else do we need? What is the last missing ingredient? That reminds me of the little girl in Sunday School who was asked what you have to do to obtain forgiveness of sin. She shyly replied, "First you have to sin." So, to do troubleshooting, first you have to have some trouble. But, that's usually not a problem; just wait a few hours, and you'll have plenty. Murphy's Law implies that if you are not prepared for trouble, you will get a lot of it. Conversely, if you have done all your homework, you may avoid most of the possible trouble.
I've tried to give you some insights on the philosophy of how to troubleshoot.
Don't believe that you can get help on a given problem from only one specific person. In any particular case, you can't predict who might provide the solution.
Conversely, when your buddy is in trouble and needs help, give it a try--you could turn out to be a hero. And, even if you don't guess correctly, when you do find out what the solution is, you'll have added another tool to your bag of tricks.
When you have problems, try to think about the right plan to attack and nail down the problem. When you have intermittent problems--those are the nastiest types--we even have some advice for that case. (It's cleverly hidden in Section 12.) So, if you do your "philosophy homework," it may make life easier and better for you.
You'll be able not only to solve problems, but maybe even to avoid problems. That sounds like a good idea to me!
1. Dostal, Jiri, Operational Amplifiers, Elsevier Scientific, The Netherlands, 1981; also, Elsevier Scientific, Inc., 655 Avenue of the Americas, NY, NY 10010. (212) 989-5800 ($113 in 1990).
2. Smith, John I., Modern Operational Circuit Design, John Wiley & Sons, New York, NY, 1971.
3. Data Converter Handbook, Analog Devices Corp., P.O. Box 9106, Norwood MA 02062, 1984.
4. Bulleid, H. A. V., Master Builders of Steam, Ian Allan Ltd., London, UK, 1963, pp. 146-147.
5. Caruso, Denise, "Technology designed by its users," The San Francisco Examiner, p. E15. Sunday, March 18, 1990.