21. FAILURE MODES AND EFFECTS ANALYSIS
The purpose of the FMEA is to identify potential hardware deficiencies including undetectable failure modes and single point failures. This is done by a thorough, systematic, and documented analysis of the ways in which a system can fail, the causes for each failure mode, and the effects of each failure. Its primary objective is the identification of catastrophic and critical failure possibilities so that they can be eliminated or minimized through design change. The FMEA results may be either qualitative or quantitative, although most practitioners attempt to quantify the results.
In FMEA, each component in the system is assumed to fail catastrophically in one of several failure modes and the impact on system performance is assessed.
That is, each potential failure studied is considered to be the only failure in the system, i.e., a single point failure. Some components are considered critical because their failure leads to system failure or an unsafe condition. Other components will not cause system failure because the system is designed to be tolerant of the failures. If the failure rates are known for the specific component failure modes, then the probability of system malfunction or failure can be estimated.
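The estimate mentioned above can be sketched in a few lines. This is a minimal illustration, not a prescribed method: it assumes constant, independent failure rates (an exponential model) and that the rates of all failure modes that cause system failure simply add. The part names, the rates, and the function name `system_failure_probability` are hypothetical.

```python
import math

# Hypothetical failure-mode rates in failures per million hours (FPMH).
# Each entry: (component, failure mode, rate, causes system failure?)
FAILURE_MODES = [
    ("C12 capacitor", "short", 0.020, True),
    ("C12 capacitor", "open", 0.005, False),  # tolerated by the design
    ("U3 regulator", "no output", 0.150, True),
    ("R7 resistor", "open", 0.002, True),
]


def system_failure_probability(modes, hours):
    """Probability of at least one system-critical failure within `hours`.

    Assumes constant, independent failure rates, so the rates of all
    modes that cause system failure add (series model).
    """
    total_fpmh = sum(rate for _, _, rate, critical in modes if critical)
    rate_per_hour = total_fpmh * 1e-6
    return 1.0 - math.exp(-rate_per_hour * hours)


# Example: one year of continuous operation (8760 hours).
p_year = system_failure_probability(FAILURE_MODES, hours=8760)
print(f"Estimated probability of system failure in one year: {p_year:.4%}")
```

Modes that the design tolerates are excluded from the sum, which is why making the design tolerant of a high-rate failure mode lowers the estimate directly.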
The design may then be modified to make it more tolerant of the most critical component failure modes and thus make it more reliable. The FMEA is also useful in providing information for diagnostic testing of the system because it produces a list of the component failures that can cause a system malfunction.
The FMEA, as mentioned, can be a useful tool for assessing designs, developing robust products, and guiding reliability improvements. However, it is time consuming, particularly when the system includes a large number of components.
Frequently it does not consider component degradation and its impact on system performance. This leads to the use of a modified FMEA approach in which only failures of the high risk or critical components are considered, resulting in a simpler analysis involving a small number of components. It is recommended that the FMEA include component degradation as well as catastrophic failures.
Although the FMEA is an essential reliability task for many types of system design and development, it provides limited insight into the probability of system failure. Another limitation is that the FMEA is performed for only one failure at a time. This may not be adequate for systems in which multiple failure modes can occur, with reasonable likelihood, at the same time. However, the FMEA provides valuable information about the system design and operation.
The FMEA is usually iterative in nature. It should be conducted concurrently with the design effort so that the design will reflect the analysis conclusions and recommendations. The FMEA results should be utilized as inputs to system interfaces, design tradeoffs, reliability engineering, safety engineering, maintenance engineering, maintainability, logistic support analysis, test equipment design, test planning activities, and so on. Each failure mode should be explicitly defined and should be addressed at each interface level.
The FMEA utilizes an inductive logic or bottom-up approach. It begins at the lowest level of the system hierarchy (normally at the component level) and using knowledge of the failure modes of each part it traces up through the system hierarchy to determine the effect that each potential failure mode will have on system performance. The FMEA focus is on the parts which make up the system.
The FMEA provides:
1. A method for selecting a design with a high probability of operational success and adequate safety.
2. A documented uniform method of assessing potential failure modes and their effects on operational success of the system.
3. Early visibility of system interface problems.
4. A list of potential failures which can be ranked according to their seriousness and the probability of their occurrence.
5. Identification of single point failures critical to proper equipment function or personnel safety.
6. Criteria for early planning of necessary tests.
7. Quantitative, uniformly formatted input data for the reliability prediction, assessment, and safety models.
8. The basis for troubleshooting procedures and for the design and location of performance monitoring and fault sensing devices.
9. An effective tool for the evaluation of a proposed design, together with any subsequent operational or procedural changes and their impacts on proper equipment functioning and personnel safety.
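The ranked list of item 4 and the single point failures of item 5 can be represented with a small data structure. The sketch below is illustrative only: the four severity class names, the field names, and the example entries are assumptions, and ranking by severity class first and probability second is just one reasonable ordering.

```python
from dataclasses import dataclass

# Severity classes, most serious first. The class names are an assumed
# convention for this sketch, not taken from the text.
SEVERITY_ORDER = {"catastrophic": 0, "critical": 1, "marginal": 2, "minor": 3}


@dataclass(frozen=True)
class FailureMode:
    item: str
    mode: str
    severity: str        # one of the keys of SEVERITY_ORDER
    probability: float   # estimated probability of occurrence
    single_point: bool   # True if this failure alone causes system failure


def rank_failure_modes(modes):
    """Sort by severity class first, then by descending probability."""
    return sorted(
        modes, key=lambda m: (SEVERITY_ORDER[m.severity], -m.probability)
    )


def single_point_failures(modes):
    """Return only the modes flagged as single point failures, ranked."""
    return rank_failure_modes([m for m in modes if m.single_point])


# Hypothetical worksheet entries.
worksheet = [
    FailureMode("fan", "bearing seizure", "marginal", 0.02, False),
    FailureMode("regulator", "output overvoltage", "catastrophic", 0.001, True),
    FailureMode("connector", "intermittent contact", "critical", 0.01, True),
    FailureMode("memory IC", "stuck bit", "critical", 0.004, True),
]

for fm in rank_failure_modes(worksheet):
    print(f"{fm.severity:<13} {fm.probability:<7} {fm.item}: {fm.mode}")
```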
TABLE 21 Potential Failure Mode and Effects Analysis Summary
The FMEA effort is typically led by reliability engineering, but the actual analysis is done by the design and component engineers and others who are intimately familiar with the product and the components used in its design. If the design is composed of several subassemblies, the FMEA may be done for each subassembly or for the product as a whole. If the subassemblies were designed by different designers, each designer needs to be involved, as well as the product engineer or systems engineer who is familiar with the overall product and the subassembly interface requirements. For purchased assemblies, like power supplies and disk drives, the assembly design team needs to provide an FMEA that meets the OEM's needs. We have found, as an OEM, that a team of responsible engineers working together is the best way of conducting an FMEA.
The essential steps in conducting an FMEA are listed here, a typical FMEA worksheet is shown in Table 21, and a procedure for critical components is given in Table 22.
1. Reliability block diagram construction. A reliability block diagram is generated that indicates the functional dependencies among the various elements of the system. It defines and identifies each required subsystem and assembly.
2. Failure definition. Rigorous failure definitions (including failure modes, failure mechanisms, and root causes) must be established for the entire system, the subsystems, and all lower equipment levels. A properly executed FMEA provides documentation of all critical components in a system.
3. Failure effect analysis. A failure effect analysis is performed on each item in the reliability block diagram. This takes into account each different failure mode of the item and indicates the effect (consequences) of that item's failure upon the performance of the item at both the local and next higher levels in the block diagram.
4. Failure detection and compensation. Failure detection features for each failure mode should be described. For example, previously known symptoms can be used based on the item behavior pattern(s) indicating that a failure has occurred. The described symptom can cover the operation of the component under consideration or it can cover both the component and the overall system or evidence of equipment failure.
TABLE 22 FMEA Procedure for Critical Components
1. The reliability engineer prepares a worksheet listing the high-risk (critical) components and the information required.
2. The product engineer defines the failure thresholds for each of the outputs of the subassemblies/modules, based on subassembly and product specifications.
3. Each design engineer, working with the appropriate component engineer, analyzes each of the components for which he or she is responsible and fills in the worksheet for those components listing the effects of component failure on the performance at the next level of assembly.
4. Each design engineer analyzes each of the components for which he or she is responsible and estimates the amount of component degradation required to cause the subassembly to fail, per the definitions of Step 2 above, and then fills in the appropriate sections of the worksheet. If the design is tolerant of failure of a specific component because of redundancy, the level of redundancy should be noted.
5. The design engineers consider each critical component and determine whether the design should be changed to make it more tolerant of component failure or degradation. They add their comments and action items to the report.
6. Then reliability engineering analyzes the completed worksheets, prepares a report listing the critical components (those whose failure causes system failure), and summarizes the potential failure modes for each high-risk component and the definitions of failure for each failure mode.
A detected failure should be corrected so as to eliminate its propagation to the whole system and thus to maximize reliability. Therefore, for each element, provisions that will alleviate the effect of a malfunction or failure should be identified.
5. Recordkeeping. The configurations for both the system and each item must be properly identified, indexed, and maintained.
6. Critical items list. The critical items list is generated based on the results of Steps 1 through 3.
It is important to note that both the FMEA and reliability prediction have definite shelf lives. Because both are bottom-up approaches, they become less effective every time a component (physical implementation) changes. It must be determined at what point and how often in the design phase these will be performed: iteratively throughout, at the end, etc. Nonetheless, an FMEA should be conducted before the final reliability prediction is completed to provide initial modeling and prediction information. When performed as an integral part of the early design process, it should be updated to reflect design changes as they are incorporated. An example of an FMEA that was conducted for a memory module is presented in Appendix B at the back of this guide. Also provided is a list of action items resulting from this analysis, the implementation of which provides a more robust and thus more reliable design.
22. DESIGN FOR ENVIRONMENT
The push for environmentally conscious electronics is increasing. It is being fueled by legal and regulatory requirements on a global level. Most of the directives being issued deal with (1) the design and end-of-life management of electronic products, requiring manufacturers to design their products for ease of disassembly and recycling, and (2) banning the use of specific hazardous materials such as lead, mercury, cadmium, hexavalent chromium, and flame retardants that contain bromine and antimony oxide.
The adoption of ISO 14000 by most Japanese OEMs and component suppliers is putting pressure on all global product/equipment manufacturers to establish an environmental management system. The ISO 14000 series of standards for environmental management and certification enables a company to establish an effective environmental management system and manage its obligations and responsibilities better. Following the adoption of the ISO 14000 standards, European and Asian companies are beginning to require a questionnaire or checklist on the environmental management system status of supplier companies. OEMs must obtain as much environmentally related information as possible from each of their suppliers, and even from their suppliers' suppliers. To make this job easier, OEMs are developing DFE metrics and tools that can be used by the supply base.
While most of the materials used in electrical and electronic products are safe for users of the products, some materials may be hazardous in the manufacturing process or contribute to environmental problems at the end of the product life. In most cases, these materials are used in electronic products because functional requirements cannot be met with alternative materials. For example, the high electrical conductivity, low melting point, and ductility of lead-based solder make it ideal for connecting devices on PWAs. Similarly, the flame-retardant properties of some halogenated materials make them excellent additives to flammable polymer materials when used in electrical equipment where a spark might ignite a fire.
Continued improvements in the environmental characteristics of electrical and electronic products will require the development and adoption of alternative materials and technologies to improve energy efficiency, eliminate hazardous or potentially harmful materials (where feasible), and increase both the re-usability and the recyclability of products at their end of life. The challenge facing the electronics industry with regard to environmentally friendly IC packaging is to make a switch to materials that have comparable reliability, manufacturability, price, and availability.
The electronics industry at large has established a list of banned or restricted materials, often called materials of concern. These materials include those prohibited from use by regulatory, legislative, or health concerns, along with materials that have been either banned or restricted by regulation or industrial customers or for which special interest groups have expressed concern. Several of the identified materials of concern are commonly found in electronic products, and eliminating them will require significant efforts to identify, develop, and qualify alternatives. These materials include the following:
Flame retardants. Flame retardants are found in PWAs, plastic IC packages, plastic housings, and cable insulation. The most common approach to flame retardancy in organic materials is to use halogenated, usually brominated, materials. Some inorganic materials, such as antimony trioxide, are also used either alone or in conjunction with a brominated material.
Lead. Lead is found in solder and interconnects, batteries, piezoelectric devices, discrete components, and cathode ray tubes.
Cadmium. Cadmium is found in batteries, paints, and pigments and is classified as a known or suspected human carcinogen. Most major electronics companies are working to eliminate its use, except in batteries where there are well-defined recycling procedures to prevent inappropriate disposal.
Hexavalent chromium. This material is found in some pigments and paints (although these applications are decreasing) and on fasteners and metal parts, where it is used for corrosion resistance. Automotive OEMs are either banning its use or strongly encouraging alternatives. But these alternatives cannot consistently pass corrosion-resistance specifications.
Mercury. Mercury is found in the flat panel displays of laptop computers, digital cameras, fax machines, and flat panel televisions. Mercury is highly toxic and there are few alternatives to its use in flat panel displays.
The growing demand for electrical and electronic appliances will at the same time create more products requiring disposal. Efforts to increase the reuse and recycling of end-of-life electronic products have been growing within the electronics industry as a result of the previously mentioned regulatory and legal pressures. There are also efforts to reduce packaging or, in some cases, provide reusable packaging. Products containing restricted or banned materials are more costly and difficult to recycle because of regional restrictive legislation. All of this is adding complexity to the product designer's task.
In order to avoid landfill and incineration of huge amounts of discarded products, it will be necessary to develop a cost-effective infrastructure for reuse and recycling of electronic equipment. This trend will accelerate as more geographic regions pass "take-back" legislation to reduce the burden of landfills.
While many of the materials commonly found in electronic products can be easily recycled (e.g., metals and PWAs), several present special challenges. These include plastics and leaded glass from televisions and computer monitors.
23. ENVIRONMENTAL ANALYSIS
Various environmental analyses, such as mechanical shock and vibration analyses, are employed when new, unusual, or severe environments are anticipated.
Printed circuit boards and other structures can be modeled and resonant frequencies and amplitudes can be calculated, allowing any overstress conditions to be identified and alleviated. Other environments include temperature excursions, water and humidity, air pressure, sand and dust, etc. Long-term durability requires that cyclic stresses, particularly thermal cycling and vibration, be considered.
These must be studied to assure that the proposed design will not be degraded by the anticipated environmental exposures.
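As a rough illustration of the resonance calculation mentioned above, a board or bracket can be approximated as a single-degree-of-freedom spring-mass-damper system. This is a textbook simplification, not the modeling method used in the text (real PWAs are usually analyzed with finite element models); the function names and the numeric values are assumptions chosen for the example.

```python
import math


def natural_frequency_hz(stiffness_n_per_m, mass_kg):
    """Undamped natural frequency of a spring-mass system, in hertz."""
    return math.sqrt(stiffness_n_per_m / mass_kg) / (2.0 * math.pi)


def transmissibility(excitation_hz, natural_hz, damping_ratio):
    """Ratio of response amplitude to base excitation amplitude."""
    r = excitation_hz / natural_hz
    damping_term = (2.0 * damping_ratio * r) ** 2
    return math.sqrt((1.0 + damping_term) / ((1.0 - r ** 2) ** 2 + damping_term))


# Hypothetical board: 0.25 kg effective mass, 4.0e5 N/m effective stiffness.
f_n = natural_frequency_hz(4.0e5, 0.25)
print(f"Natural frequency: {f_n:.1f} Hz")

# Amplification of the input vibration at several excitation frequencies,
# assuming a damping ratio of 0.05.
for f in (50.0, 150.0, f_n, 400.0):
    print(f"{f:7.1f} Hz -> amplification {transmissibility(f, f_n, 0.05):.2f}")
```

Excitation near the natural frequency is amplified by roughly 1/(2 × damping ratio), which is the kind of overstress condition the analysis is meant to expose.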
24. DEVELOPMENT AND DESIGN TESTING
The purpose of conducting design evaluation tests is to identify design weaknesses and thus areas for improvement, resulting in a more robust product. A summary of the key test and evaluation methods is as follows:
1. Prototyping, design modeling, and simulation are used while the design is still fluid to validate the design tools; validate and verify the design; identify design weaknesses, marginalities, and other problems; and drive improvement. The use of both BOM and design reviews aids this process.
2. Design for test is used to design the product for easy and effective testing as well as for rapid product debug, leading to early problem resolution.
3. The techniques of test, analyze, and fix and plan-do-check-act-repeat (the Deming cycle) are used to assess the current product's robustness, identify how much margin exists with regard to performance parameters, and identify areas for improvement, maximizing the reliability growth process.
4. Software test and evaluation is used to assess the status of the current version of the system software, identify software bugs and areas for improvement, and drive the improvement.
5. Special studies and application tests are used to investigate the idiosyncrasies and impact of unspecified parameters and timing condition interactions of critical ICs with respect to each other and their impact on the operation of the product as intended.
6. Accelerated environmental stress testing (such as HALT and STRIFE testing) of the PWAs, power supplies, and other critical components is used to identify weaknesses and marginalities of the completed design (with the actual production components being used) prior to release to production.
Some of these have been discussed previously; the remainder will now be addressed in greater detail.
24.1 Development Testing
Development is the best time to identify and correct problems in a product. Making changes is easier and cheaper during development than at any other stage of the product life. Anything that can be done to improve the product here will pay back the maximum benefit since 80% of product cost is usually locked in during this phase.
Development testing is conducted to detect any design errors or omissions overlooked by any previous analyses. This is also known as test, analyze, and fix (TAAF) testing. To be effective all three items must be addressed:
1. Tests must be severe and simulate the worst case expected environment.
2. All problems uncovered must be analyzed to identify the root cause.
3. Positive corrective action must be developed and incorporated to eliminate the root cause.
Testing must then be re-administered to verify that the corrections are effective.
This results in building more robust products and improving next-generation design of products.
24.2 Design Verification Testing
Once final prototype units have been manufactured, design verification testing (DVT) is conducted to ensure that the product meets its performance specifications (including exposure to anticipated application environments such as temperature, humidity, mechanical shock, and vibration), to assess a product's design margins, and to determine its robustness.
In a typical DVT process the new product goes through a battery of tests created to force real-time design flaws and manufacturing incompatibilities. Electrical circuit design engineers verify the electrical performance of the design.
They verify the CAD model simulation results and rationalize them with actual hardware build and with the variability of components and manufacturing processes. Mechanical engineers model and remodel the enclosure design. Printed circuit board designers check the layout of the traces on the PCB, adjust pad and package sizes, and review component layout and spacing. Manufacturing process engineers check the PCB's chemistry to ensure it is compatible with production cells currently being built. If a PCB has too many ball grid array components or too many low profile ceramic components, it may force the use of the more expensive and time-consuming "no-clean" chemistry. Test engineers look for testability features such as net count and test point accessibility. A board that has no test points or exposed vias will make in-circuit testing impossible and thus require a costlier alternative such as functional test. Cable assembly engineers must look at interconnects for better termination and shielding opportunities. Today's products are challenged by higher transmission rates, where greater speeds can cause crosstalk, limiting or preventing specified performance. Finally, plastic/polymer engineers review for flow and thermal characteristics that will facilitate an efficient production cycle, and sheet metal engineers look for tooling and die compatibility.
A well-designed DVT provides a good correlation of measured reliability results to modeled or predicted reliability. Design verification testing delivers best on its objectives if the product has reached a production-ready stage of design maturity before submission to DVT. Major sources of variation (in components, suppliers of critical components, model mix, and the like) are intentionally built into the test population. Test data, including breakdown of critical variable measurements correlated to the known sources of variation, give the product design team a practical look at robustness of the design and thus the ability to produce it efficiently in volume.
In these ways test is an important aspect of defining and improving product quality and reliability, even though the act of performing testing itself does not increase the level of quality.
Design verification testing is also a good time to check the product design's actual thermal characteristics and compare them with the modeled results, and to validate the effectiveness of the heat sinking and distribution system. A thermal profile of the PWA or module is generated looking for hot spots due to high power-dissipating components generating heat and the impact of this on nearby components.
Electronic equipment manufacturers have turned to the use of computational fluid dynamics (CFD) (discussed in section 5) during the front end of the design process and thermography after the design is complete to help solve complex thermal problems. Thermography uses a thermal imaging camera to take a picture of a PWA, module, or product. Thermographic cameras view infrared (IR) energy, as opposed to visible light energy, and display the resultant temperatures as shades of gray or different colors. Figure 24 is an example of a typical thermal scan of a PWA showing the heat-generating components (see color insert).
Any thermal anomaly can indicate a fault or defect. Components running too hot or cold can indicate a short or an open circuit, a diode placed backward or bent IC pins, to name several defects. Elevated temperature operation shortens the life of ICs, while large temperature gradients between components and the PCB increase the stress that can cause early failure due to material delamination.
Thermal imaging and measurement systems provide an effective means for identifying and resolving thermal-related problems by giving a direct measurement of the actual component, PWA, or module as opposed to the modeled thermal profile provided by CFD before the design is committed to hardware build.
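The comparison of measured and modeled thermal profiles can be sketched as a simple check. The example below is hypothetical throughout: the reference designators, the temperatures, the 10 °C tolerance, and the function name `thermal_anomalies` are assumptions for illustration, not values from the text.

```python
def thermal_anomalies(modeled_c, measured_c, tolerance_c=10.0):
    """Return {component: measured - modeled} for readings outside tolerance.

    Positive values run hotter than the model predicted, negative values
    run colder. Components without a measurement are skipped.
    """
    anomalies = {}
    for name, modeled in modeled_c.items():
        if name not in measured_c:
            continue  # no thermographic reading for this component
        delta = measured_c[name] - modeled
        if abs(delta) > tolerance_c:
            anomalies[name] = delta
    return anomalies


# Hypothetical CFD predictions and thermographic readings, in degrees C.
modeled = {"U1": 65.0, "U2": 50.0, "Q3": 80.0, "D4": 45.0}
measured = {"U1": 68.0, "U2": 72.5, "Q3": 61.0, "D4": 44.0}

for name, delta in thermal_anomalies(modeled, measured).items():
    direction = "hotter" if delta > 0 else "colder"
    print(f"{name}: {abs(delta):.1f} C {direction} than modeled")
```

Deviations in both directions are flagged because a component running too cold can indicate a defect just as one running too hot can.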
Right: temperature scan of back (solder) side of same PWA showing effect of thermal conduction from "warm" components on the top side. To understand the coordinates of the right scan, imagine the left scan is rolled 180° around its horizontal axis. (See color insert.)
(From Ref. 3.)
24.4 Accelerated Stress Testing
Accelerated stress testing is an effective method for improving product reliability since products often have hidden defects or weaknesses which cause failures during normal operation in the field. Product failures may occur when the statistical distribution for a product's strength, or its capability of withstanding a stress, overlaps with the distributions of the operating environmental stresses (Fig. 25).
To prevent product failures, reliability may be achieved through a combination of robust design and tight control of variations in component quality and manufacturing processes. When the product undergoes sufficient improvements, there will no longer be an overlap between the stresses encountered and product strength distributions (Fig. 26).
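The overlap of the two distributions can be quantified under a simplifying assumption. The sketch below assumes that both strength and stress are independent and normally distributed, in which case the failure probability is the probability that stress exceeds strength. The normality assumption, the function names, and the numbers are illustrative and are not taken from the figures.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def interference_failure_probability(strength_mean, strength_sd,
                                     stress_mean, stress_sd):
    """P(stress > strength) for independent normal strength and stress."""
    margin_mean = strength_mean - stress_mean
    margin_sd = math.hypot(strength_sd, stress_sd)  # sqrt(sd1^2 + sd2^2)
    return normal_cdf(-margin_mean / margin_sd)


# Hypothetical values in arbitrary stress units.
before = interference_failure_probability(100.0, 10.0, 70.0, 10.0)
# A more robust design and tighter process control: higher mean strength,
# smaller spread in strength.
after = interference_failure_probability(120.0, 5.0, 70.0, 10.0)

print(f"Failure probability before improvement: {before:.2e}")
print(f"Failure probability after improvement:  {after:.2e}")
```

Raising the mean strength and narrowing its spread both shrink the overlap, which corresponds to the two levers named above: robust design and tight control of variation.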
Accelerated stress testing (i.e., HALT and STRIFE), which is normally conducted at the end of the design phase, determines a product's robustness and detects inherent design and manufacturing flaws or defects. Accelerated stress testing during development is intended to identify weak points in a product so they can be made stronger. The increased strength of the product translates to better manufacturing yields, higher quality and reliability, and faster reliability growth (Fig. 27). The major assumption behind accelerated stress testing is that any failure mechanism that occurs during testing will also occur during application (in the field) if the cause is not corrected. Hewlett-Packard has claimed an 11× return on the investment in accelerated testing through warranty costs alone.
Their belief is that it is more expensive to argue about the validity of a potential field problem than to institute corrective actions to fix it.
Typically, a series of individual and combined stresses, such as multiaxis vibration, temperature cycling, and product power cycling, is applied in steps of increasing intensity well beyond the expected field environment until the fundamental limit of technology is reached and the product fails.
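The stepping procedure described above can be sketched as a loop that raises the stress level until the unit fails. This is a schematic illustration only: the `survives` callback stands in for running the unit under test at a given level, and the temperature steps and the simulated failure threshold are hypothetical.

```python
def step_stress(levels, survives):
    """Apply stress levels in increasing order until the first failure.

    `survives(level)` is a caller-supplied check that returns True if the
    unit still operates at that level. Returns a tuple
    (last_passing_level, first_failing_level); either may be None.
    """
    last_pass = None
    for level in levels:
        if not survives(level):
            return last_pass, level
        last_pass = level
    return last_pass, None


# Hypothetical hot-step test: 30 C to 150 C in 10 C steps.
hot_steps = range(30, 151, 10)


def simulated_unit(temp_c):
    """Stand-in for a real test: this imaginary unit fails above 105 C."""
    return temp_c <= 105


last_ok, first_fail = step_stress(hot_steps, simulated_unit)
print(f"Last passing step: {last_ok} C, first failing step: {first_fail} C")
```

In practice each failure would be analyzed and fixed before stepping continues, so the loop would be rerun after every corrective action.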
Several points regarding HALT (an acronym for highly accelerated life test, which is a misnomer because it is an overstress test) need to be made.
1. Appropriate stresses must be determined for each assembly since each has unique electrical, mechanical, thermal mass, and vibration characteristics.
Typical system stresses used during HALT include the following:
Stresses applied: VCC voltage margining; clock speed/frequency; clock symmetry; power holdup/cycling; temperature (cold, hot, cycling); vibration.
Typical failures detected: design faults, faulty components; overloads, marginal components; overloads, low-quality components; processing defects, soldering.
Selection of the stresses to be used is the basis of HALT. Some stresses are universal in their application, such as temperature, thermal cycling, and vibration.
Others are suitable to more specific types of products, such as clock margining for logic boards and current loading for power components. Vibration and thermal stresses are generally found to be the most effective environmental stresses in precipitating failure. Temperature cycling detects weak solder joints, IC package integrity, CTE mismatch, PWA mounting problems, and PWA processing issues: failures that will happen over time in the field. Vibration testing is normally used to check a product for shipping and operational values. Printed wire assembly testing can show weak or brittle solder or inadequate wicking. Bad connections may be stressed to failure at levels that do not harm good connections.
2. HALT is an iterative process, so that stresses may be added or deleted in the sequence of fail-fix-retest.
3. In conducting HALT there is every intention of doing physical damage to the product in an attempt to maximize and quantify the margins of product strength (both operating and destruct) by stimulating harsher-than-expected end use environments.
4. The HALT process continues with a test-analyze-verify-fix approach, with root cause analysis of all failures. Test time is compressed with accelerated stressing, leading to earlier product maturity. The results of accelerated stress testing are
Fed back to design to select a different component/assembly and/or supplier, improve a supplier's process, or make a circuit design or layout change
Fed back to manufacturing to make a process change, typically of a workmanship nature
Used to determine the environmental stress screening (ESS) profiles to be used during production testing, as appropriate
5. Determining the root cause of every failure is critical.
Root cause failure analysis is often overlooked or neglected due to underestimation of resources and disciplines required to properly carry out this effort. If failure analysis is not carried through to determination of all root causes, the benefits of the HALT process are lost.
Figure 28 depicts the impact that accelerated stress testing can have in lowering the useful life region failure rate and that ESS can have in lowering the early life failure rate (infant mortality) of the bathtub curve.
25. TRANSPORTATION AND SHIPPING TESTING
Once a design is completed, testing is conducted to evaluate the capability of the product/equipment to withstand shock and vibration. Shock and vibration, which are present in all modes of transportation, handling, and end-user environments, can cause wire chafing, fastener loosening, shorting of electrical parts, component fatigue, misalignment, and cracking. Dynamic testing tools (which include both sine and random vibration, mechanical shock, and drop impact and simulation of other environmental hazards) are used to more effectively design and test products to ensure their resistance to these forces. The shipping package container design is verified by conducting mechanical shipping and package tests.
26. REGULATORY TESTING
Appropriate reliability and regulatory tests are typically conducted at the conclusion of the design phase. Product regulations are the gate to market access. To sell a product in various geographical markets, the product must satisfy specific regulatory compliance requirements. To achieve this, the correct regulatory tests must be conducted to ensure that the product meets the required standards. For electronic-based products (such as computers, medical devices, and telecommunication products), safety, electromagnetic compatibility, and, as appropriate, telecommunications tests need to be performed. In all cases, certification of the product, usually by a regulating authority/agency in each country in which the product is sold, is a legal requirement.
26.1 Acoustic Measurements
The need for product acoustic noise emissions measurement and management is gaining increased importance. Information on acoustic noise emission of machinery and equipment is needed by users, planners, manufacturers, and authorities.
This information is required for comparison of the noise emissions from different products, for assessment of noise emissions against noise limits, for planning workplace noise levels, and for checking noise reduction achievements.
Both sound pressure and sound power are measured according to ISO 7779, which is recognized as a standard for acoustic testing.
26.2 Product Safety Testing
Virtually all countries have laws and regulations which specify that products must be safe. On the surface, product safety testing appears to be a straightforward concept: a product should cause no harm. However, the issue gets complicated when trying to meet the myriad different safety requirements of individual countries when selling to the global market. Several examples are presented that make the point.
For electrical product safety standards in the United States and Canada, most people are familiar with Underwriters Laboratories (UL) and the Canadian Standards Association (CSA). The UL safety standard that applies to information technology equipment (ITE), for example, is UL1950. A similar CSA standard is CSA950. In this instance, a binational standard, UL1950/CSA950, also exists.
Standards governing electrical products sold in Europe are set up differently. The European Union (EU) has established European Economic Community (EEC) directives. The directive that applies to most electrical products for safety is the Low Voltage Directive (LVD), or 73/23/EEC. The LVD mandates CE marking, a requirement for selling your products in Europe. Furthermore, 73/23/EEC specifies harmonized European Norm (EN) standards for each product grouping, such as EN 60950 for ITE. Table 23 lists typical product safety test requirements.
26.3 Electromagnetic Compatibility Testing
Compliance with electromagnetic compatibility (EMC) requirements is legally mandated in many countries, with new legislation covering emissions and immunity being introduced at an increasingly rapid rate. EMC requirements apply to all electrical products, and in most countries you cannot legally offer your product for sale without having the appropriate proof of compliance with the EMC regulations for that country. This requires a staff of engineers who are familiar with the many U.S. and international standards and regulations and who have established relationships with regulatory agencies. Table 24 lists the commonly used EMC test requirements for information technology equipment.
TABLE 23 Typical ITE Safety Tests
Test | Purpose
Rating test capabilities | Determine the suitability of the product's electrical rating as specified in the applicable standard.
Temperature measurement capabilities | Determine that the product's normal operating temperatures do not exceed the insulation ratings or the temperature limits of user-accessible surfaces.
Hi-pot testing capabilities | Verify the integrity of the insulation system between primary and secondary as well as between primary and grounded metal parts.
Humidity conditioning capabilities | Introduce moisture into hygroscopic insulation prior to hi-pot testing.
Flammability tests to UL 1950/UL 94 | Determine the flame rating of insulation material or enclosures to establish compliance with the applicable end-use product standards.
Force measurements as required by IEC 950 and IEC 1010 standards | Determine if enclosure mechanical strength and product stability comply with the standard.
Ground continuity testing | Determine if the ground impedance is low enough to comply with the applicable standard.
Leakage current instrumentation to IEC 950 and IEC 1010 | Determine if the chassis leakage current meets the standard limits.
X-radiation | Verify that the X-radiation from a CRT monitor does not exceed standard limits.
27. DESIGN ERRORS
Design errors can occur in specifying the function, timing, and interface characteristics of an IC or in the logic and circuit design. They can also result from errors in the design models, design library, simulation and extraction tools, or PWA layout software; from using the wrong component (e.g., an SRAM whose timing conditions do not match the timing constraints required for interfacing it with the selected microprocessor); and from microprocessor and DSP code issues.
In addition to being stored as a voltage, data and control signals are read as a voltage. If a signal voltage is above a certain threshold, the data or control bit is read as a logic 1; below the threshold, it is read as a logic 0. When one bit is a logic 1 and the next bit is a logic 0, or vice versa, there is a transition period to allow the voltage to change. Because each individual device has slightly different signal delay (impedance) and timing characteristics, the length of that transition period varies. The final voltage value attained also varies slightly as a function of the device characteristics and the operating environment (temperature, humidity). Computer hardware engineers allow a certain period of time (called design margin) for the transition period to be completed and the voltage value to settle. If timing errors or insufficient design margins cause the voltage to be read at the wrong time, the voltage value may be read incorrectly and the bit may be misinterpreted, causing data corruption. It should be noted that this corruption can occur anywhere in the system and could cause incorrect data to be written to a computer disk, for example, even when there are no errors in computer memory or in the calculations.
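The effect of an insufficient design margin can be illustrated with a small numerical sketch. It is not taken from the text: the linear-ramp signal model, the 3.3 V logic level, the 1.65 V threshold, and the function names are all assumptions chosen only to show how sampling before the transition has finished causes a logic 1 to be read as a logic 0.

```python
# Illustrative sketch only: all voltages, times, and names are hypothetical.

V_LOW = 0.0        # assumed logic-0 voltage (volts)
V_HIGH = 3.3       # assumed logic-1 voltage (volts)
THRESHOLD = 1.65   # assumed switching threshold (volts)


def signal_voltage(t_ns, transition_ns):
    """Voltage of a 0-to-1 transition that starts at t = 0.

    The transition is modeled as a simple linear ramp (an assumption).
    """
    if t_ns <= 0:
        return V_LOW
    if t_ns >= transition_ns:
        return V_HIGH
    return V_LOW + (V_HIGH - V_LOW) * t_ns / transition_ns


def read_bit(voltage):
    """Interpret a voltage as a logic level using the threshold."""
    return 1 if voltage > THRESHOLD else 0


def sample_bit(sample_time_ns, transition_ns):
    """Bit value seen when the signal is sampled at sample_time_ns."""
    return read_bit(signal_voltage(sample_time_ns, transition_ns))


def timing_margin(sample_time_ns, transition_ns):
    """Time left between the end of the transition and the sample point.

    A negative value means the signal is sampled before it has settled.
    """
    return sample_time_ns - transition_ns


# Nominal device: the transition finishes 1 ns before the sample point.
print(sample_bit(3.0, 2.0), timing_margin(3.0, 2.0))   # 1 1.0
# Slow device: the signal is still ramping when sampled, so the 1 is misread.
print(sample_bit(3.0, 8.0), timing_margin(3.0, 8.0))   # 0 -5.0
```

In this simplified model the intended bit is the same in both cases; only the device's transition time differs, which is why the error appears as data corruption rather than as a calculation fault.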
The effect of a software design error is even less predictable than the effect of a hardware design error. An undiscovered software design error could cause both a processor halt and data corruption. For example, if the algorithm used to compute a value is incorrect, there is not much that can be done outside of good software engineering practices to avoid the mistake. A processor may also attempt to write to the wrong location in memory, which may overwrite and corrupt a value. In this case, it is possible to avoid data corruption by not allowing the processor to write to a location that has not been specifically allocated for the value it is attempting to write.
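The write-protection idea in the preceding paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any particular processor or operating system: the `GuardedMemory` class, its owner labels, and its methods are hypothetical names introduced here.

```python
# Illustrative sketch only: GuardedMemory and its API are hypothetical.

class GuardedMemory:
    """Memory model that rejects writes to locations not allocated to the writer."""

    def __init__(self, size):
        self._cells = [0] * size
        self._owner = {}  # address -> name of the value that owns it

    def allocate(self, owner, start, length):
        """Reserve addresses start..start+length-1 for the named value."""
        if start < 0 or length <= 0 or start + length > len(self._cells):
            raise ValueError("allocation outside memory")
        addresses = range(start, start + length)
        for addr in addresses:
            if addr in self._owner:
                raise ValueError(f"address {addr} is already allocated")
        for addr in addresses:
            self._owner[addr] = owner

    def write(self, owner, addr, value):
        """Write only if addr was specifically allocated to this owner."""
        if self._owner.get(addr) != owner:
            raise PermissionError(f"{owner!r} may not write to address {addr}")
        self._cells[addr] = value

    def read(self, addr):
        return self._cells[addr]


mem = GuardedMemory(16)
mem.allocate("counter", 0, 4)
mem.allocate("buffer", 4, 8)

mem.write("counter", 0, 42)       # allowed: address 0 belongs to "counter"
try:
    mem.write("counter", 5, 99)   # stray write into the region of "buffer"
except PermissionError as err:
    print("blocked:", err)
print(mem.read(5))                # 0, so the buffer contents are intact
```

The check prevents the stray write from corrupting another value, but it does nothing about the first kind of error mentioned above: an incorrect algorithm still writes a wrong result to its own, correctly allocated location.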
These considerations stress the importance of conducting both hardware and software design reviews.
Section 10.1 courtesy of Reliability Engineering Department, Tandem Division, Compaq Computer Corporation. Portions of Section 12 used with permission from SMT magazine. Portions of Section 11 courtesy of Noel Donlin, U.S. Army (retired).