Home | Forum | DAQ Fundamentals | DAQ Hardware | DAQ Software
Input Devices | Data Loggers + Recorders | Books | Links + Resources
Part I Program Structure and Execution
2. Representing and Manipulating Information
3. Machine-Level Representation of Programs
4. Processor Architecture
5. Optimizing Program Performance
6. The Memory Hierarchy
Part II Running Programs on a System
8. Exceptional Control Flow
9. Virtual Memory
Part III Interaction and Communication Between Programs
10. System-Level I/O
11. Network Programming
12. Concurrent Programming
A. Error Handling
This guide ( CSP) is for computer scientists, computer engineers, and others who want to be able to write better programs by learning what is going on "under the hood" of a computer system.
Our aim is to explain the enduring concepts underlying all computer systems, and to show you the concrete ways that these ideas affect the correctness, performance, and utility of your application programs. Other systems guides are written from a builder's perspective, describing how to implement the hardware or the systems software, including the operating system, compiler, and network interface.
This guide is written from a programmer's perspective, describing how application programmers can use their knowledge of a system to write better programs. Of course, learning what a system is supposed to do provides a good first step in learning how to build one, and so this guide also serves as a valuable introduction to those who go on to implement systems hardware and software.
If you study and learn the concepts in this guide, you will be on your way to becoming the rare "power programmer" who knows how things work and how to fix them when they break. Our aim is to present the fundamental concepts in ways that you will find useful right away. You will also be prepared to delve deeper, studying such topics as compilers, computer architecture, operating systems, em bedded systems, and networking.
Assumptions about the Student's Background
The presentation of machine code in the guide is based on two related formats supported by Intel and its competitors, colloquially known as "x86." IA32 is the machine code that has become the de facto standard for a wide range of systems.
x86-64 is an extension of IA32 to enable programs to operate on larger data and to reference a wider range of memory addresses. Since x86-64 systems are able to run IA32 code, both of these forms of machine code will see widespread use for the foreseeable future. We consider how these machines execute C programs on Unix or Unix-like (such as Linux) operating systems. (To simplify our presentation, we will use the term "Unix" as an umbrella term for systems having Unix as their heritage, including Solaris, MacOS, and Linux.) The text contains numerous programming examples that have been compiled and run on Linux systems. We assume that you have access to such a machine and are able to log in and do simple things such as changing directories.
If your computer runs Microsoft Windows, you have two choices. First, you can get a copy of Linux (www.ubuntu.com) and install it as a "dual boot" option, so that your machine can run either operating system. Alternatively, by installing a copy of the Cygwin tools (www.cygwin.com), you can run a Unix-like shell under
Windows and have an environment very close to that provided by Linux. Not all features of Linux are available under Cygwin, however.
We also assume that you have some familiarity with C or C++. If your only prior experience is with Java, the transition will require more effort on your part, but we will help you. Java and C share similar syntax and control statements.
However, there are aspects of C, particularly pointers, explicit dynamic memory allocation, and formatted I/O, that do not exist in Java. Fortunately, C is a small language, and it is clearly and beautifully described in the classic "K&R" text by Brian Kernighan and Dennis Ritchie . Regardless of your programming background, consider K&R an essential part of your personal systems library.
Several of the early sections in the guide explore the interactions between C programs and their machine-language counterparts. The machine-language examples were all generated by the GNU gcc compiler running on IA32 and x86 64 processors. We do not assume any prior experience with hardware, machine language, or assembly-language programming.
New to C? Advice on the C programming language
To help readers whose background in C programming is weak (or nonexistent), we have also included these special notes to highlight features that are especially important in C. We assume you are familiar with C++ or Java.
How to Use this Guide
Learning how computer systems work from a programmer's perspective is great fun, mainly because you can do it actively. Whenever you learn something new, you can try it out right away and see the result first hand. In fact, we believe that the only way to learn systems is to do systems, either working concrete problems or writing and running programs on real systems.
This theme pervades the entire guide. When a new concept is introduced, it is followed in the text by one or more practice problems that you should work immediately to test your understanding. Solutions to the practice problems are at the end of each section. As you read, try to solve each problem on your own, and then check the solution to make sure you are on the right track. Each section is followed by a set of homework problems of varying difficulty. Your instructor has the solutions to the homework problems in an Instructor's Manual. For each homework problem, we show a rating of the amount of effort we feel it will require:
1. Should require just a few minutes. Little or no programming required.
2. Might require up to 20minutes.Often involves writing and testing some code.
Many of these are derived from problems we have given on exams.
3. Requires a significant effort, perhaps 1-2 hours. Generally involves writing and testing a significant amount of code.
4. A lab assignment, requiring up to 10 hours of effort.
Each code example in the text was formatted directly, without any manual intervention, from a C program compiled with gcc and tested on a Linux system.
Of course, your system may have a different version of gcc, or a different compiler altogether, and so your compiler might generate different machine code, but the overall behavior should be the same. All of the source code is available from the CS: APP Web page at csapp.cs.cmu.edu. In the text, the file names of the source programs are documented in horizontal bars that surround the formatted code.
For example, the program in Figure 1 can be found in the file hello.c in directory code/intro/. We encourage you to try running the example programs on your system as you encounter them.
To avoid having a guide that is overwhelming, both in bulk and in content, we have created a number of Web asides containing material that supplements the main presentation of the guide. These asides are referenced within the guide with a notation of the form CHAP:TOP, where CHAP is a short encoding of the section subject, and TOP is short code for the topic that is covered. For example, Web Side bar data:bool contains supplementary material on Boolean algebra for the presentation on data representations in Section 2,whileWeb Side bar arch:vlog contains material describing processor designs using the Verilog hardware description language, supplementing the presentation of processor design in Section 4.
All of these Web asides are available from the CSP Web page.
What is a Side Bar?
You will encounter asides of this form throughout the text. Side bars are parenthetical remarks that give you some additional insight into the current topic. Side bars serve a number of purposes. Some are little history lessons. For example, where did C, Linux, and the Internet come from? Other asides are meant to clarify ideas that students often find confusing. For example, what is the difference between a cache line, set, and block? Other asides give real-world examples. For example, how a floating-point error crashed a Ariane rocket, or what the geometry of an actual hard disk drive looks like. Finally, some asides are just fun stuff. For example, what is a "hoinky"?
The CSP guide consists of 12 sections designed to capture the core ideas in computer systems:
-- Section 1: A Tour of Computer Systems. This section introduces the major ideas and themes in computer systems by tracing the life cycle of a simple "hello, world" program.
-- Section 2: Representing and Manipulating Information. We cover computer arithmetic, emphasizing the properties of unsigned and two's-complement number representations that affect programmers. We consider how numbers are represented and therefore what range of values can be encoded for a given word size. We consider the effect of casting between signed and unsigned numbers. We cover the mathematical properties of arithmetic operations. Novice programmers are often surprised to learn that the (two's-complement) sum or product of two positive numbers can be negative. On the other hand, two's complement arithmetic satisfies the algebraic properties of a ring, and hence a compiler can safely transform multiplication by a constant into a sequence of shifts and adds. We use the bit-level operations of C to demonstrate the principles and applications of Boolean algebra. We cover the IEEE floating-point format in terms of how it represents values and the mathematical properties of floating-point operations.
Having a solid understanding of computer arithmetic is critical to writing reliable programs. For example, programmers and compilers cannot replace the expression (x<y) with (x-y < 0), due to the possibility of overflow. They cannot even replace it with the expression (-y < -x), due to the asymmetric range of negative and positive numbers in the two's-complement representation. Arithmetic overflow is a common source of programming errors and security vulnerabilities, yet few other guides cover the properties of computer arithmetic from a programmer's perspective.
-- Section 3: Machine-Level Representation of Programs. We teach you how to read the IA32 and x86-64 assembly language generated by a C compiler. We cover the basic instruction patterns generated for different control constructs, such as conditionals, loops, and switch statements. We cover the implementation of procedures, including stack allocation, register usage conventions, and parameter passing. We cover the way different data structures such as structures, unions, and arrays are allocated and accessed. We also use the machine-level view of programs as a way to understand common code security vulnerabilities, such as buffer overflow, and steps that the programmer, the compiler, and the operating system can take to mitigate these threats.
Learning the concepts in this section helps you become a better programmer, because you will understand how programs are represented on a machine.
One certain benefit is that you will develop a thorough and concrete under standing of pointers.
-- Section 4: Processor Architecture. This section covers basic combinational and sequential logic elements, and then shows how these elements can be combined in a datapath that executes a simplified subset of the IA32 instruction set called "Y86."We begin with the design of a single-cycle datapath. This design is conceptually very simple, but it would not be very fast. We then intro duce pipelining, where the different steps required to process an instruction are implemented as separate stages. At any given time, each stage can work on a different instruction. Our ?ve-stage processor pipeline is much more realistic. The control logic for the processor designs is described using a simple hardware description language called HCL. Hardware designs written in HCL can be compiled and linked into simulators provided with the guide, and they can be used to generate Verilog descriptions suitable for synthesis into working hardware.
-- Section 5: Optimizing Program Performance. This section introduces a number of techniques for improving code performance, with the idea being that programmers learn to write their C code in such a way that a compiler can then generate efficient machine code. We start with transformations that re duce the work to be done by a program and hence should be standard practice when writing any program for any machine. We then progress to transformations that enhance the degree of instruction-level parallelism in the generated machine code, thereby improving their performance on modern "superscalar" processors. To motivate these transformations, we introduce a simple operational model of how modern out-of-order processors work, and show how to measure the potential performance of a program in terms of the critical paths through a graphical representation of a program. You will be surprised how much you can speed up a program by simple transformations of the C code.
-- Section 6: The Memory Hierarchy. The memory system is one of the most visible parts of a computer system to application programmers. To this point, you have relied on a conceptual model of the memory system as a linear array with uniform access times. In practice, a memory system is a hierarchy of storage devices with different capacities, costs, and access times. We cover the different types of RAM and ROM memories and the geometry and organization of magnetic-disk and solid-state drives. We describe how these storage devices are arranged in a hierarchy. We show how this hierarchy is made possible by locality of reference. We make these ideas concrete by introducing a unique view of a memory system as a "memory mountain" with ridges of temporal locality and slopes of spatial locality. Finally, we show you how to improve the performance of application programs by improving their temporal and spatial locality.
-- Section 7: Linking. This section covers both static and dynamic linking, including the ideas of relocatable and executable object files, symbol resolution, relocation, static libraries, shared object libraries, and position-independent code. Linking is not covered in most systems texts, but we cover it for several reasons. First, some of the most confusing errors that programmers can encounter are related to glitches during linking, especially for large software packages. Second, the object files produced by linkers are tied to concepts such as loading, virtual memory, and memory mapping.
-- Section 8: Exceptional Control Flow. In this part of the presentation, we step beyond the single-program model by introducing the general concept of exceptional control flow (i.e., changes in control flow that are outside the normal branches and procedure calls). We cover examples of exceptional control flow that exist at all levels of the system, from low-level hardware exceptions and interrupts, to context switches between concurrent processes, to abrupt changes in control flow caused by the delivery of Unix signals, to the nonlocal jumps in C that break the stack discipline.
This is the part of the guide where we introduce the fundamental idea of a process, an abstraction of an executing program. You will learn how processes work and how they can be created and manipulated from application programs. We show how application programmers can make use of multiple processes via Unix system calls. When you finish this section, you will be able to write a Unix shell with job control. It is also your first introduction to the nondeterministic behavior that arises with concurrent program execution.
-- Section 9: Virtual Memory. Our presentation of the virtual memory system seeks to give some understanding of how it works and its characteristics. We want you to know how it is that the different simultaneous processes can each use an identical range of addresses, sharing some pages but having individual copies of others. We also cover issues involved in managing and manipulating virtual memory. In particular, we cover the operation of storage allocators such as the Unix malloc and free operations. Covering this material serves several purposes. It reinforces the concept that the virtual memory space is just an array of bytes that the program can subdivide into different storage units. It helps you understand the effects of programs containing memory referencing errors such as storage leaks and invalid pointer references. Finally, many application programmers write their own storage allocators optimized toward the needs and characteristics of the application. This section, more than any other, demonstrates the benefit of covering both the hardware and the software aspects of computer systems in a unified way. Traditional computer architecture and operating systems texts present only part of the virtual memory story.
-- Section 10: System-Level I/O. We cover the basic concepts of Unix I/O such as files and descriptors. We describe how files are shared, how I/O redirection works, and how to access file meta data. We also develop a robust buffered I/O package that deals correctly with a curious behavior known as short counts, where the library function reads only part of the input data. We cover the C standard I/O library and its relationship to Unix I/O, focusing on limitations of standard I/O that make it unsuitable for network programming. In general, the topics covered in this section are building blocks for the next two sections on network and concurrent programming.
-- Section 11: Network Programming. Networks are interesting I/O devices to program, tying together many of the ideas that we have studied earlier in the text, such as processes, signals, byte ordering, memory mapping, and dynamic storage allocation. Network programs also provide a compelling context for concurrency, which is the topic of the next section. This section is a thin slice through network programming that gets you to the point where you can write a Web server. We cover the client-server model that underlies all network applications. We present a programmer's view of the Internet, and show how to write Internet clients and servers using the sockets interface. Finally, we introduce HTTP and develop a simple iterative Web server.
-- Section 12: Concurrent Programming. This section introduces concurrent programming using Internet server design as the running motivational example. We compare and contrast the three basic mechanisms for writing con current programs-processes, I/O multiplexing, and threads-and show how to use them to build concurrent Internet servers. We cover basic principles of synchronization using P and V semaphore operations, thread safety and reentrancy, race conditions, and deadlocks. Writing concurrent code is essential for most server applications. We also describe the use of thread-level programming to express parallelism in an application program, enabling faster execution on multi-core processors. Getting all of the cores working on a single computational problem requires a careful coordination of the concurrent threads, both for correctness and to achieve high performance.
New to this Edition
The first edition of this guide was published with a copyright of 2003. Considering the rapid evolution of computer technology, the guide content has held up surprisingly well. Intel x86 machines running Unix-like operating systems and programmed in C proved to be a combination that continues to encompass many systems today. Changes in hardware technology and compilers and the experience of many instructors teaching the material have prompted a substantial revision.
Here are some of the more significant changes:
-- Section 2:Representing and Manipulating Information. We have tried to make this material more accessible, with more careful explanations of concepts and with many more practice and homework problems. We moved some of the more theoretical aspects to Web asides. We also describe some of the security vulnerabilities that arise due to the overflow properties of computer arithmetic.
-- Section 3:Machine-Level Representation of Programs. We have extended our coverage to include x86-64, the extension of x86 processors to a 64-bit word size. We also use the code generated by amore recent version of gcc. We have enhanced our coverage of buffer overflow vulnerabilities. We have created Web asides on two different classes of instructions for floating point, and also a view of the more exotic transformations made when compilers attempt higher degrees of optimization. Another Web aside describes how to embed x86 assembly code within a C program.
-- Section 4: Processor Architecture. We include a more careful exposition of exception detection and handling in our processor design. We have also created a Web aside showing a mapping of our processor designs into Verilog, enabling synthesis into working hardware.
-- Section 5: Optimizing Program Performance. We have greatly changed our description of how an out-of-order processor operates, and we have created a simple technique for analyzing program performance based on the paths in a data-flow graph representation of a program. A Web aside describes how C programmers can write programs that make use of the SIMD (single instruction, multiple-data) instructions found in more recent versions of x86 processors.
-- Section 6: The Memory Hierarchy. We have added material on solid-state disks, and we have updated our presentation to be based on the memory hierarchy of an Intel Core i7 processor.
-- Section 7: Linking. This section has changed only slightly.
-- Section 8: Exceptional Control Flow. We have enhanced our discussion of how the process model introduces some fundamental concepts of concurrency, such as nondeterminism.
-- Section 9:VirtualMemory.We have updated our memory system case study to describe the 64-bit Intel Core i7 processor. We have also updated our sample implementation of malloc to work for both 32-bit and 64-bit execution.
-- Section 10: System-Level I/O. This section has changed only slightly.
-- Section 11: Network Programming. This section has changed only slightly.
-- Section 12: Concurrent Programming. We have increased our coverage of the general principles of concurrency, and we also describe how programmers can use thread-level parallelism to make programs run faster on multi-core machines.
In addition, we have added and revised a number of practice and homework problems.
Origins of this Guide
The guide stems from an introductory course that was developed called Introduction to Computer Systems (ICS). The ICS course has been taught every semester since then, each time to about 150-250 students, ranging from sophomores to masters degree students and with a wide variety of majors. It is a required course for all undergraduates in the CS and ECE departments of the University, and it has become a prerequisite for most upper-level systems courses.
The idea with ICS was to introduce students to computers in a different way.
Few of our students would have the opportunity to build a computer system. On the other hand, most students, including all computer scientists and computer engineers, will be required to use and program computers on a daily basis. So we decided to teach about systems from the point of view of the programmer, using the following filter: we would cover a topic only if it affected the performance, correctness, or utility of user-level C programs.
For example, topics such as hardware adder and bus designs were out. Topics such as machine language were in, but instead of focusing on how to write assembly language by hand, we would look at how a C compiler translates C constructs into machine code, including pointers, loops, procedure calls, and switch statements. Further, we would take a broader and more holistic view of the system as both hardware and systems software, covering such topics as linking, loading, processes, signals, performance optimization, virtual memory, I/O, and network and concurrent programming.
This approach allowed us to teach the ICS course in a way that is practical, concrete, hands-on, and exciting for the students. The response from our students and faculty colleagues was immediate and overwhelmingly positive, and we realized that others outside of CMU might benefit from using our approach. Hence this guide, which we developed from the ICS lecture notes, and which we have now revised to reflect changes in technology and how computer systems are implemented.
For Instructors: Courses Based on the Guide
Instructors can use the CSP guide to teach five different kinds of systems courses (Figure 2). The particular course depends on curriculum requirements, personal taste, and the backgrounds and abilities of the students. From left to right in the figure, the courses are characterized by an increasing emphasis on the programmer's perspective of a system. Here is a brief description:
-- ORG: A computer organization course with traditional topics covered in an untraditional style. Traditional topics such as logic design, processor architecture, assembly language, and memory systems are covered. However, there is more emphasis on the impact for the programmer. For example, data representations are related back to the data types and operations of C programs, and the presentation on assembly code is based on machine code generated by a C compiler rather than hand-written assembly code.
-- ORG+: The ORG course with additional emphasis on the impact of hardware on the performance of application programs. Compared to ORG, students learn more about code optimization and about improving the memory performance of their C programs.
-- ICS: The baseline ICS course, designed to produce enlightened programmers who understand the impact of the hardware, operating system, and compilation system on the performance and correctness of their application programs.
A significant difference from ORG+ is that low-level processor architecture is not covered. Instead, programmers work with a higher-level model of a modern out-of-order processor. The ICS course fits nicely into a 10-week quarter, and can also be stretched to a 15-week semester if covered at a more leisurely pace.
-- ICS+: The baseline ICS course with additional coverage of systems programming topics such as system-level I/O, network programming, and concurrent programming. This is the semester-long course, which covers every section in CSP except low-level processor architecture.
-- SP: A systems programming course. Similar to the ICS+ course, but drops
floating point and performance optimization, and places more emphasis on systems programming, including process control, dynamic linking, system level I/O, network programming, and concurrent programming. Instructors might want to supplement from other sources for advanced topics such as daemons, terminal control, and Unix IPC.
The main message of Figure 2 is that the CSP guide gives a lot of options to students and instructors. If you want your students to be exposed to lower level processor architecture, then that option is available via the ORG and ORG+ courses. On the other hand, if you want to switch from your current computer organization course to an ICS or ICS+ course, but are wary are making such a drastic change all at once, then you can move toward ICS incrementally. You can start with ORG, which teaches the traditional topics in a nontraditional way.
Once you are comfortable with that material, then you can move to ORG+, and eventually to ICS. If students have no experience in C (for example they have only programmed in Java), you could spend several weeks on C and then cover the material of ORG or ICS.
Finally, we note that the ORG+ and SP courses would make a nice two-term (either quarters or semesters) sequence. Or you might consider offering ICS+ as one term of ICS and one term of SP.
Classroom-Tested Laboratory Exercises
The ICS+ course receives very high evaluations from students.
Median scores of 5.0/5.0 and means of 4.6/5.0 are typical for the student course evaluations. Students cite the fun, exciting, and relevant laboratory exercises as the primary reason. The labs are available from the CSP Web page. Here are examples of the labs that are provided with the guide:
-- Data Lab. This lab requires students to implement simple logical and arithmetic functions, but using a highly restricted subset of C. For example, they must compute the absolute value of a number using only bit-level operations.
This lab helps students understand the bit-level representations of C data types and the bit-level behavior of the operations on data.
-- Binary Bomb Lab. A binary bomb is a program provided to students as an object-code file. When run, it prompts the user to type in six different strings.
If any of these is incorrect, the bomb "explodes," printing an error message and logging the event on a grading server. Students must "defuse" their own unique bombs by disassembling and reverse engineering the programs to determine what the six strings should be. The lab teaches students to understand assembly language, and also forces them to learn how to use a debugger.
-- Buffer Overflow Lab. Students are required to modify the run-time behavior of a binary executable by exploiting a buffer overflow vulnerability. This lab teaches the students about the stack discipline, and teaches them about the danger of writing code that is vulnerable to buffer overflow attacks.
-- Architecture Lab. Several of the homework problems of Section 4 can be combined into a lab assignment, where students modify the HCL description of a processor to add new instructions, change the branch prediction policy, or add or remove bypassing paths and register ports. The resulting processors can be simulated and run through automated tests that will detect most of the possible bugs. This lab lets students experience the exciting parts of processor design without requiring a complete background in logic design and hardware description languages.
-- Performance Lab. Students must optimize the performance of an application kernel function such as convolution or matrix transposition. This lab provides a very clear demonstration of the properties of cache memories, and gives students experience with low-level program optimization.
-- Shell Lab. Students implement their own Unix shell program with job control, including the ctrl-c and ctrl-z keystrokes, fg, bg, and jobs commands. This is the student's first introduction to concurrency, and gives them a clear idea of Unix process control, signals, and signal handling.
-- Malloc Lab. Students implement their own versions of malloc, free, and (optionally) real loc. This lab gives students a clear understanding of data layout and organization, and requires them to evaluate different trade-offs between space and time efficiency.
-- Proxy Lab. Students implement a concurrent Web proxy that sits between their browsers and the rest of the World Wide Web. This lab exposes the students to such topics as Web clients and servers, and ties together many of the concepts from the course, such as byte ordering, file I/O, process control, signals, signal handling, memory mapping, sockets, and concurrency. Students like being able to see their programs in action with real Web browsers and Web servers.
The CSP Instructor's Manual has a detailed discussion of the labs, as well as directions for downloading the support software.
PREV. | NEXT
Updated: Sunday, April 16, 2017 18:14 PST