Diagnose and Repair Modern Electronics: A DIY Guide: Philosophy of Troubleshooting (part 1)

Imagine if your doctor saw you as a collection of organs, nerves and bones, never considering the synergistic result of their working together, supplying each other with the chemicals and signals necessary for life. No organ could survive on its own, but together they make a living, breathing, occasionally snoring you!

Now consider how tough it'd be to solve a murder case without considering the motives, personalities and circumstances of the victim and all potential suspects. The knife is right there next to the body, but anybody on earth could have done the crime. Why was the victim killed? Who knew him? Who might have wanted him dead?

Troubleshooting, which involves skills somewhat like those of doctors and detectives, is a lot like that. You can think of an electronic device as a bunch of transistors, chips and capacitors stuffed into a box, and sometimes that's enough to find simple failures. Taking such a myopic view, though, limits you to being a mediocre technician, one who will be stumped when the problem isn't obvious. To be a top-notch tech requires consideration of the bigger picture. Who made this product, and what were the design goals? How is it supposed to work? How do various sections interact, and what is the likely result of a failure of one area on another?

Machines are systems. Being built by humans, they naturally reflect our biological origins, with cameras for eyes, microphones for ears, speakers for larynxes and microprocessors for brains. Even the names of many parts sound like us: tape recorders, hard drives, and optical disc players have heads, turntables have arms, chips have legs and picture tubes have necks. Some products even exhibit personalities, or at least it feels that way to us. Their features and quirks can be irritating, humorous or soothing. Their failures are much like our own, too, with symptoms that may be far removed from what's causing them, thanks to some obscure interaction that nobody, not even the circuit's designer, could have foreseen.

The more you come to understand how devices work at the macro level, the more sense their problems will make. The more you can consider products as metal and silicon expressions of human thinking, the better sleuthing skills you will attain.

Before we get to the nitty-gritty of transistors, current flow and signals, let's put on our philosophers' hats and become the Socrates of circuitry, the Erasmus of electrons. Let's look at why products work and why they don't, and how to avoid some of the common pitfalls developing techs encounter. Let's become one with the machines.

Why Things Work in the First Place

When you get a few thousand parts together and apply power to them, they can interact in many ways. The designing engineer had one particular way in mind, but that doesn't mean the confounded conglomeration of components will cooperate! Analog circuitry has a wider range of variation in its behaviors than does digital, but even today's all-digital gear can be surprisingly inconsistent. I've witnessed two identical laptop computers running exactly the same software, with exactly the same settings, but drawing significantly different amounts of current from their power supplies. I've also seen all kinds of minor variations in color quality between identical digital still and video cameras. I remember a ham radio transceiver whose digital control system exhibited a bizarre, obscure behavior in its memory storage operation that no other radio of that model was reported to have, and I never found any bad parts that might explain the symptom. I finally had to modify the radio to get it to work like all the others.

Sure, you string a few gates together and you will be able to predict their every state. Get a few thousand or more going, run them millions of times per second, and mysterious behaviors may start to crop up.

It's useful to think of all circuitry as a collection of resistors impeding the passage of current from the power supply terminal to circuit ground. As the current trickles through them, it’s used to do work, be it switching the gates in a microprocessor, generating laser light for a disc player, or spinning the disc. Electrons, though, are little devils that will go anywhere they can. If there's a path, they'll find it.

Malfunctions can be considered either as paths that shouldn't be there or a lack of paths that should.

In essence, when machines work properly, it's because they have no choice. The designer has carefully considered all the possible paths and correctly engineered the circuit to keep those pesky electrons moving along only where and when they should, locking out all possible behaviors except the desired one. When choice arises, though, through failing components, user-inflicted damage or design errors, the electrons go on a spree like college students at spring break, and the unit lands on your workbench.

Products as Art

A machine is an extension of its designer much as a concerto is an extension of its composer. Beethoven sounds like Beethoven, and never like Rachmaninoff, because Ludwig's bag of tricks and way of thinking were uniquely his, right? It's much the same with products. In this case, however, they tend to have unifying characteristics more reflective of their manufacturing companies than of a specific person. Still, I suspect that an individual engineer's or manager's viewpoints and preferences set the standard, good or bad, which lives on in a company's product line long after that employee's retirement.

Understanding that companies have divergent design philosophies and quirks may help your repair work, because you can keep an eye on issues that tend to crop up in different manufacturers' machines. You may notice that digital cameras from one maker have a high rate of imaging chip failures, so you'll go looking for that instead of some other related problem when a troublesome case hits your bench. Or perhaps you've found that tape-type camcorders from a particular company often have mechanical loading problems because that manufacturer uses loading arms and other metal structures in the tape transport that are too thin, so they bend.

When you've fixed enough products, you'll begin to recognize what company made a machine just by looking at its circuit board or mechanical sections. The layouts, the styles of capacitors, the connectors, and even the overall look of the copper traces on a board are different and consistent enough to be dead giveaways.

If It Only Had a Brain

Continuing our anatomical analogy, yesterday's tech product was like a zombie. Perhaps it had an ear (microphone), some memory (recording tape) and a mouth (speaker). Each system did its simple job, with support from a stomach (power supply) and some muscles (motors, amplifiers).

What was missing was a brain. Today's gear is cranium-heavy, laden with computing power. Gone are simple mechanical linkages to control sequencing and movement of mechanisms. Instead, individual actuators move parts in a sequence determined by software, positional information gets fed back to the microprocessor, and malfunctions might originate in the mechanics, the sensors, the software, or some subtle interaction of those elements. No longer are there potentiometers (variable resistors) to set volume or brightness; buttons signal the brain to change the parameters. Heck, most gadgets today don't even have "hard" on/off switches that actually disconnect power from the circuitry. Instead, the power button does nothing more than send a signal to the microprocessor, requesting it to energize or shut down the product's circuitry.

In addition to the brain, many modern products have nervous systems consisting of intermediary chips and transistors to decode the micro's commands and fan them out to the various muscles and organs doing the actual work. Failures in these areas can be tough to trace, because their incoming signals from the computer chip are dependent on tricky timing relationships between various signal lines. This is a profound shift from the old way of building devices, and it adds new layers of complication to repair work. Is the circuit not working due to its own malfunction, or is it playing dead because the micro didn't wake it up?

Today's machines are complete electro-beings with pretty complex heads on their shoulders. Some offer updatable software, while many have the coding hardwired into their chips. Which would you like to be today: surgeon or psychiatrist?

The Good, the Bad and the Sloppy

It's easy for an experienced tech to tell when a repair attempt has been made by an unqualified person. The screws will be stripped, or there will be poorly soldered joints with splashes of dripped solder lying across pads on the board. Wires may be spliced with no solder and, perhaps, covered in cellophane tape, if at all. Adjustments will be turned, insulation melted, and so on. In a word: sloppiness.

That might sound exaggerated, but I used to run into it a lot when I worked in repair facilities. Most shops have policies of refusing to work on items mangled by amateurs, so discovery of obvious, inept tampering was followed by a phone call to the item's owner, who would stubbornly insist that the unit had never been apart and had simply quit working. Um, right, Sony used Scotch tape to join unsoldered wires. Sure, buddy.

I remember one incident in which I refused to repair a badly damaged and obviously tampered-with shortwave radio. The owner was so angry that he called my boss and tried to have me fired! The boss took one look inside the set, clapped me on the back, laughed, and told the guy to come pick up his ruined radio and go away. Don'tcha wish all bosses were that great?

The key to performing a proper, professional-quality repair job is meticulous attention to detail. Think of yourself as a surgeon, for that's exactly what you are. You are about to open up the body of this mechanical "organism" and attempt to right its ills. As the medical saying goes, "First, do no harm." Now and then, repair jobs go awry and machines get ruined (it happens even to the best techs, though rarely), but your aim is to get in and back out as cleanly as possible. In Sections 9 through 13, we'll explore the steps and techniques required for proper disassembly, repair and reassembly.

Mistakes Beginners Make

Beyond sloppy work, beginners tend to make a few conceptual errors, leading to lots of lost time, internal damage to products, and failure to find and fix the problem. Here are some common quagmires to avoid.

Adjusting to Cover the Real Trouble

Analog devices often have adjustments to keep their circuit stages producing signals with the characteristics required for the other stages to do their jobs properly. TVs and radios are full of trimpots (variable resistors), trimcaps (variable capacitors) and tunable coils, and their interactions can be quite complex. With today's overwhelmingly digital circuits, adjustments are much less common. Many are performed in software with special programming devices to which you won't have access, but some good old-fashioned screwdriver-adjustable parts still exist. Power supplies usually have voltage adjustments, for instance, and earlier-generation CD players were loaded with servo adjustments to keep the laser beam properly focused and centered on the track. Even a digital media receiver may have tunable stages in its radio sections.

It can be very tempting to twiddle with adjustments in the hope that the device will return to normal operation. While it's true that circuits do go out of alignment (if they didn't, the controls wouldn't be there in the first place), that is a gradual process. It never causes drastic changes in performance. If the unit suddenly won't do something it did fine the day before, it's not out of adjustment; it's broken. Messing with the adjustments will only get you into trouble later on when you find the real problem, and now the machine really is way out of alignment, because you made it that way. Leave those internal controls alone! Turn them only when you're certain everything else is working, and then only if you know precisely what they do and have a sure way to put them back the way they were, just in case you're wrong. Marking the positions of trimpots and trimcaps with a felt-tip marker before you turn them can help, but it's no guarantee you will be able to reset a control exactly to its original position. There's too much mechanical play in them for that technique to be reliable. In some cases, close is good enough. In others, slight misadjustments can seriously degrade circuit performance.

I once worked on a pair of infrared cordless headphones with a weak, distorted right channel. After some testing, it was clear that the transmitter was the culprit, and its oscillator for that channel had drifted off frequency. A quick adjustment and, sure enough, the headphones worked fine for a little while. Then the symptom returned.

The real problem: a voltage regulator that was drifting with temperature. Luckily, readjusting the oscillator was easy after the new part was installed. When multiple adjustments have been made, it can be exceedingly difficult to get them back in proper balance with each other.

Making the Data Fit the Theory

Most techs have been guilty of this at some time. In my early years, mea culpa, that's for sure. You look at the symptoms, and they seem to point to a clear diagnosis, all except for one. You fixate on those that make sense, convince yourself that they add up, and do your best to ignore that anomaly, hoping it's not significant. Trust me, it is, and you are about to embark on a long, frustrating hunting expedition leading to a dreary dead end. Always keep this in mind: If a puzzle won't fit together, there's a piece missing! There's something you don't know, and that is what you should be chasing.

Often, the anomaly you're pushing aside is the real clue, and overlooking it is the worst mistake you can make. Many maddening hours later, when you finally do solve the mystery, you'll think to yourself, "Why didn't I consider how that odd symptom might be the key to the whole thing? It was right in front of me from the start!" Ah, hindsight…. Nobody needs glasses for that.

Going Around in Circles

Sometimes you think you've found the problem, but trying to solve it creates new problems, so you go after those. Those lead to still more odd circuit behavior, so off you go, around and around until you're right back where you started. When addressing symptoms creates more symptoms, take it as a strong hint that you are on the wrong track. It's incredibly rare for multiple, unrelated breakdowns to occur.

Almost always, there is one root cause of all the strangeness, and it'll make total sense once you find it. "Oh, the power supply voltage was too low, and that's why the focus wouldn't lock and the sled motor wouldn't make the laser head go looking for the track." If you're lucky, you'll have discovered that before you've spent hours fiddling with the limit switches and the control circuitry, tracing signals back to the microprocessor. Again, if the puzzle won't fit together, find that missing piece!

That's How It Goes

As with illness in the human body, just about anything can go wrong with an electronic device. Problems range from the obvious to the obscure; I've fixed machines in 5 minutes, and I've run across some oddball cases for which a diagnosis of demonic possession seemed appropriate!

These digital days, circuitry is much more reliable than in the old analog age, yet modern gear often has a much shorter life span. How can both of those statements be true? Today's products are of tremendously greater complexity, with lots of components, interconnections and interactions, so there's more to go wrong. Unlike the hand-soldered boards filled with a wide variety of component types we used to have, today's small-signal boards, with their rows of surface-mounted, machine-soldered chips, don't fail that often. But with so much more going on, they include complicated power supplies and a multitude of connectors and ribbon cables. Plus, some parts work much harder than they used to and wear out or fail catastrophically from the stress.

And thanks to the rapid pace of technological change, the competition to produce products at bare-bones prices and the high cost of repair versus replacement, extended longevity is not the design goal it once was. Manufacturers figure you'll want to buy a new, more advanced gadget in a couple of years anyway.

Contrary to popular myth, nobody deliberately builds things to break. They don't have to; keeping affordable products working for long periods is tough enough. Keeping expensive items functioning isn't easy either! Laptop computers, some of the costliest gadgets around, are also some of the most failure-prone, because they're very complex and densely packed, and they produce plenty of heat.

It may seem like electronic breakdowns are pretty random. Some part blows for reasons no one can fathom, and the unit just quits. That does happen, but it's not common. Oh, sure, when you make millions of chips, capacitors and transistors, a small number of flawed ones will slip through quality control, no matter how much testing you do. It's a tiny percentage, though. Much more often, products fail in a somewhat predictable pattern, with a cascading series of events stemming from well-recognized weaknesses inherent in certain types of components and construction techniques. In other words, nothing is perfect! Let's look at the factors behind most product failures.

Infant Mortality

This rather unpleasant term refers to that percentage of units destined to stop working very soon after being put into service. Imperfect solder joints, molecular-level flaws in semiconductors and design errors cause most of these. While many products are tested after construction, cost and time constraints prohibit extensive "burning in" of all but very expensive machines. Typical infant mortality cases crop up within a week or two of purchase and land in a warranty repair center after being returned for exchange. So, you may never see one unless you bought something from halfway around the world, and it's not worth the expense and trouble to return it. Or perhaps the seller refuses to accept it back, and you get stuck with a brand-new, dead device you want to resurrect.

Mechanical Wear

By far, moving parts break down more often than do electronic components. Hard drives, VCR and camcorder mechanisms, disc trays, laser head sleds and disc-spinning motors are all huge sources of trouble.

Bearings wear out, lubrication dries up, rubber belts stretch, leaf switches (internal position-sensing switches) bend, nylon gears split, pet hairs bind motor shafts, and good old wear and tear grinds down just about anything that rubs or presses against anything else. If a device has moving parts and it turns on but doesn't work properly, look at those first before assuming the electronics behind them are faulty. For every transistor you will change, you'll fix five mechanical problems.


Fgr. 1 Leaf switch

Connections

Connections are also mechanical, and they go bad very, very often. Suspect any connection in which contacts are pressed against each other without being soldered.

That category includes switches, relays, plugs, sockets, and ribbon cables and connectors.

The primary culprit is corrosion of the contacts, caused by age and sometimes, in the case of switches and relays, sparking when the contacts are opened and closed.

Also, a type of lubricating grease used by some manufacturers on leaf switches tends to dry out over time and become an effective insulator. If the contact points on a leaf switch are black, it's a good bet they are coated with this stuff and are not passing any current when the switch closes. See Fgr. 1.

A particularly nasty type of bad connection occurs in multilayer printed circuit boards. At one time, a dual-layer board, with traces on both sides, was an exotic construct employed only in the highest-end products. Today, dual-layer boards are pretty much standard in larger, simpler devices, while smaller, more complex gadgets may utilize as many as six layers! The problems crop up in the connections between layers. Those connections are constructed differently by the various manufacturers. The best, most reliable style is with plated-through holes, in which copper plating joins the layers. As boards have shrunk, plated-through construction has gotten more difficult, resulting in a newer technique that is, alas, far less reliable: holes filled with conductive glue. This type of interconnect is recognizable by a raised bump at the connection point that looks like, well, a blob of glue (see the translucent glue over the holes in Fgr. 2). Conductive glue can fail from flexure of the board, excessive current and repeated temperature swings. Repairing bad glue interconnects is hard, too. I always cringe when I see those little blobs.


Fgr. 2 Conductive glue interconnects.

Solder Joints

Though they're supposed to be molecularly bonded and should last indefinitely, solder joints frequently fail and develop resistance, impeding or stopping the current.

When it happens in small-signal, cool-running circuitry, it's usually the fault of a flaw in the manufacturing process, even if it takes years to show up. Heat-generating components like output transistors, voltage regulators and video processing chips on computer motherboards can run hot enough to degrade their solder joints gradually without getting up to a temperature high enough to actually melt them. Over time, the damage gets done and the joints become resistive or intermittent.

Many bad solder joints are visually identifiable by their dull, mottled or cracked appearance. Now and then, though, you'll find one that looks perfect but still doesn't work, because the incomplete molecular bonding lies beneath the surface. Bonding may be poor due to corrosion on the lead or pad of the soldered components; solder just won't flow into corroded or oxidized metal. When you go to re-solder it, you'll have problems getting a good joint unless you scrape things clean first, after removing the old solder.

Heat Stress

Heat is the enemy of electronics. It's not an issue with most pocket-sized gadgets, but larger items like video projectors, TVs and audio amplifiers often fail from excessive heating. So do backlight inverters (the circuits that light the fluorescent lamps behind LCD screens) and computer motherboards. Power supplies create a fair amount of heat and are especially prone to dying from it.

Overheating from excessive current due to a shorted component can quickly destroy semiconductors and resistors, but normal heat generated by using a properly functioning product can also gradually degrade electrolytic capacitors, those big ones used as power supply filtering elements, until they lose most of their capacitance.

Electrical Stress

Running a device on too high a voltage can damage it in many ways. The unit's voltage regulator may overheat from dissipating all the extra power, especially if it's a linear regulator. Electrolytic capacitors can short out from being run too close to, or over, their voltage limits. Semiconductors with inherent voltage requirements may die very quickly.
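To put rough numbers on the linear regulator's plight: it drops the excess voltage across itself, so the heat it has to shed is approximately the input-to-output voltage difference times the load current. Here is a minimal Python sketch of that arithmetic. The 5-volt output, the half-amp load and the input voltages are made-up values for illustration, not figures from any particular product, and the regulator's own small operating current is ignored.

```python
def linear_regulator_dissipation(v_in, v_out, load_current):
    """Approximate heat (watts) a linear regulator must shed.

    Simplifying assumption: the regulator's own quiescent
    current is ignored, so all input current reaches the load.
    """
    return (v_in - v_out) * load_current


# Hypothetical values chosen only for illustration.
V_OUT = 5.0   # regulated output, volts
LOAD = 0.5    # load current, amps

for v_in in (9.0, 12.0, 18.0):
    watts = linear_regulator_dissipation(v_in, V_OUT, LOAD)
    print(f"{v_in:4.1f} V in -> {watts:.1f} W of heat")
```

In this example, doubling the input from 9 to 18 volts more than triples the heat, which is why the wrong adapter can cook a regulator that was perfectly happy before.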

Overvoltage can come from using the wrong AC adapter, a malfunctioning adapter or a bad voltage regulator, or from using alkaline batteries in a device made for operation only with nickel-metal hydride (NiMH) rechargeable cells. Those cells produce 1.2 volts each, compared to the 1.5 volts of alkalines. So, with four cells, you get 6 volts with the alks, compared to the 4.8 volts (call it 5) the device expects. Most circuits can handle that, but some can't.
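The battery arithmetic is easy to check for any cell count. This small Python sketch uses the nominal per-cell voltages given above; fresh alkalines may actually read a bit higher than nominal, which the sketch does not account for.

```python
ALKALINE_V = 1.5  # nominal volts per alkaline cell
NIMH_V = 1.2      # nominal volts per NiMH cell


def pack_voltage(cell_voltage, cells):
    """Voltage of identical cells wired in series."""
    return cell_voltage * cells


cells = 4
expected = pack_voltage(NIMH_V, cells)     # what the device was designed for
applied = pack_voltage(ALKALINE_V, cells)  # what the alkalines deliver
excess_pct = (applied - expected) / expected * 100

print(f"NiMH pack: {expected:.1f} V, alkaline pack: {applied:.1f} V, "
      f"excess: {excess_pct:.0f}%")
```

The percentage stays the same no matter how many cells are in the pack, but the absolute overvoltage grows with the cell count, so a six- or eight-cell device has more extra volts to absorb.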

Believe it or not, a few products can be damaged by too little voltage. Devices with switching power supplies or regulators compensate for the lower voltage by pushing more current through their transformers with wider pulses, to keep the output voltage at its required level. That can cause overheating of the rectifiers and other parts converting the pulses back to regulated DC.
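A rough way to picture this: a switching supply behaves something like a constant-power load, so when the input voltage sags, the input current has to climb to make up the difference. The Python sketch below treats the supply as an idealized converter with a fixed efficiency. The 10-watt output, the 85 percent efficiency and the input voltages are assumptions for illustration only; real supplies vary in efficiency with load and input voltage.

```python
def input_current(p_out, v_in, efficiency=0.85):
    """Input current (amps) drawn by an idealized constant-power supply.

    Assumes a fixed efficiency, which real converters only approximate.
    """
    if v_in <= 0:
        raise ValueError("input voltage must be positive")
    return p_out / (efficiency * v_in)


P_OUT = 10.0  # hypothetical output power, watts

for v_in in (12.0, 9.0, 6.0):
    amps = input_current(P_OUT, v_in)
    print(f"{v_in:4.1f} V in -> {amps:.2f} A drawn")
```

Halving the input voltage doubles the current the supply must pull, and the heating in any resistance along that path rises with the square of the current.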

The ultimate electrical stress is a lightning strike. A direct strike, as may occur to a TV or radio with an outdoor antenna that gets zapped, or from a hit to the AC line, will probably result in complete destruction of the product. Now and then, only one section is destroyed and the rest survives, but don't bet on it. Lightning cases tend to be write-offs; you don't even want their remains in your stack of old boards, lest their surviving parts have internal damage limiting their life spans.

Power surges, in which the AC line's voltage rises to high levels only momentarily, can do plenty of damage. Such surges are sometimes the result of utility company errors, but more often lightning has struck nearby and induced the surge without actually hitting the line, or it has hit the line far away. Often, the power supply section of the product is badly damaged but the rest of the unit is unharmed.

When too much current passes through components, they overheat and can burn out, sometimes literally. Resistors get reduced to little shards of carbon, and transistors can exhibit cracks in their plastic cases. The innards, of course, are wiped out. This kind of stress rarely occurs from outside, because you can't force current through a circuit; that takes voltage. When overcurrent occurs, it's because some other component is shorting to ground, pulling excessive current through whatever is connected in series with it.
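Here is a quick Python sketch of why the part in series with a shorted component is the one that goes up in smoke. It models a resistor feeding a load from a fixed supply, then shorts the load to ground. The 12-volt supply and the resistor values are hypothetical, picked only to make the arithmetic clean.

```python
def series_resistor_power(v_supply, r_series, r_load):
    """Power (watts) dissipated in r_series when it feeds r_load.

    A shorted load is modeled as r_load = 0 ohms.
    """
    current = v_supply / (r_series + r_load)  # Ohm's law for the whole path
    return current ** 2 * r_series            # P = I^2 * R in the series part


# Hypothetical values chosen only for illustration.
V_SUPPLY = 12.0   # volts
R_SERIES = 10.0   # ohms
R_LOAD = 110.0    # ohms, the healthy downstream circuit

normal = series_resistor_power(V_SUPPLY, R_SERIES, R_LOAD)
shorted = series_resistor_power(V_SUPPLY, R_SERIES, 0.0)

print(f"Normal:  {normal:.2f} W in the series resistor")
print(f"Shorted: {shorted:.2f} W in the series resistor")
```

With these numbers, the resistor goes from a tenth of a watt to over 14 watts, far beyond what a typical small resistor is rated for. The charred part is the victim, so keep looking for the short that killed it.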

Nothing kills solid-state circuitry quite as fast as reversed polarity. Many semiconductors, and especially IC chips, can't handle current going the wrong way for more than a fraction of a second.

Batteries can be installed backward. Back when 9-volt batteries were the power source of choice for pocket gadgets, all it took was touching the battery to the clip with the male and female contacts the wrong way around while the power switch was turned on. Now that AAA cells and proprietary rechargeable batteries run our diminutive delights, that kind of error occurs less often, because it's routine for designers to shape battery compartments to prevent reversed contacts from touching, but it still happens on occasion.

By far, the most frequent cause of reversed polarity is an attempt to power a device from the wrong AC adapter. Today, most AC adapters connect positive to the center of their coaxial DC power plugs and negative to the outside, so that an automotive cigarette lighter adapter made for the same gadget doesn't present the risk of having positive come in contact with the metal car body, which would cause a short and blow the car's fuse. At one time, though, many adapters had negative on the center instead, and a few still do on items like answering machines, which will never be used in cars. Even from the same manufacturer, both schemes may be employed on their various products.

The train wreck occurs when the user plugs in the wrong adapter, and it happens to have the plug wired opposite to what the device wants. Damage may be limited only to a few parts in the power supply section, or it can be extreme, taking out critical components like microprocessors and display drivers.

Not all electrical stress is caused by external factors or random component failures.

Sometimes design errors are inherent in a product, and their resulting malfunctions don't start showing up until many units are in the field for a while. When a manufacturer begins getting lots of warranty repair claims for the same failure, the alarm bells go off, and a respectable company issues an ECO, or engineering change order, to amend the design. Units brought in for repair get updated parts, correcting the problem. A really diligent manufacturer will extend free ECO repairs beyond the warranty period if it's clear that the design fault is bad enough to render all or most of the machines in the field inoperative, or if any danger to the user could be involved.

At least that's how it's supposed to work. Sometimes companies don't want to spend the money to fix their mistakes, so they simply deny the problem. Or, if only some machines exhibit the symptom, they're treated as random failures, even though they're not. Perhaps it takes a certain kind of use or sequence of operations for the issue to become evident, and the manufacturer genuinely believes the design is sound. And some units aren't used often enough to have experienced the failure, though they will eventually, masking its ultimate ubiquity.

Any of these situations can result in your working on a product with a problem that will recur, perhaps months later, after you've properly solved it. If the thing keeps coming back with the same issue, suspect a defective design.
