Wednesday, October 27, 2010

About Computer Organization

Most computers today operate according to the “von Neumann architecture.” The main idea of the von
Neumann architecture is that the program to be executed resides in the computer’s memory, along with the
program’s data. John von Neumann published this idea in 1945. Today the concept is so familiar it seems self-evident, but earlier computers were usually wired for a particular function; in effect, the program was built into the construction of the machine. Think of an old hand-cranked mechanical calculator: it was built to do one well-defined thing, and only that. In the case of the hand-cranked calculator, that one thing was addition. Put a number in; crank it; get the new sum.

To subtract, the operator needed to know how to do complementary subtraction, which uses addition to
accomplish subtraction. Instead of offering a subtract function, the old calculator required the operator to add
the “ten’s complement” of the number to be subtracted. You can search for “ten’s complement” on Google to
learn more, but the point for now is that early computing devices
were built for certain functions only. One could never, for instance, use an old adding machine to maintain a list of neighbors’ phone numbers!
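To make complementary subtraction concrete, here is a minimal sketch in Python. The two-digit adding machine and the specific numbers are our own illustrative assumptions, not a description of any particular device:

# Subtract 17 from 52 on an imaginary two-digit machine that can only add.
minuend, subtrahend = 52, 17
tens_complement = 100 - subtrahend      # the ten's complement of 17 is 83
total = minuend + tens_complement       # 52 + 83 = 135
result = total % 100                    # discard the carry out of the top digit
print(result)                           # 35, the same as 52 - 17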
The von Neumann architecture is also called the “stored program computer.” The program steps are stored
in the computer’s memory, and the computation cycle of the machine retrieves the next step (instruction to be
executed) from memory, completes that computation, and then retrieves the next step. This cycle simply repeats until the computer retrieves an instruction to “halt.”

There are three primary units in the von Neumann computer. Memory is where both programs and data are
stored. The central processing unit (CPU) accesses the program and data in memory and performs the calculations. The I/O unit provides access to devices for data input and output.

DATA REPRESENTATION
We’re used to representing numbers in “base 10.” Presumably this number base makes sense to us because
we have 10 fingers. If our species had evolved with 12 fingers, we would probably have 2 more digits among
the set of symbols we use, and we would find it quite natural to compute sums in base 12. However, we have
only 10 fingers, so let’s start with base 10. Remember what the columns mean when we write a number like 427. The seven means we have 7 units, the two means we have 2 tens, and the four means we have 4 hundreds. The total quantity is 4 hundreds, plus 2 tens, plus 7. The column on the far right is for units (which you can also write as 10^0), the next column to the left is for 10s (which you can also write as 10^1), and the next column is for 100s (which you can write as 10^2). We say that we use “base 10” because the columns correspond to powers of 10: 10^0, 10^1, 10^2, etc.

Suppose that we had evolved with 12 fingers and were more comfortable working in base 12, instead. What
would the meaning of 427 be? The seven would still mean 7 units (12^0 is also equal to 1), but now the two would mean 2 dozen (12^1 equals 12), and the four would mean 4 gross (12^2 equals 144). The value of the number 427 in base 12 would be 4 gross, plus 2 dozen, plus 7, or 607 in our more familiar base-10 representation. Some people say we would be better off using base 12, also known as the duodecimal or dozenal system. For example, you can readily find a sixth, a third, a quarter, or a half in base 12, whereas you can only find a half easily in base 10. Twelve is also a good match for our calendar, our clock, and even our compass. In any case, though, the decision to use base 10 in daily life was made long ago!

The point of this discussion is to show that base 10 is simply one number system of many. One can
compute in base 10, or base 12, or base-any-other-number. Our choice of number system can be thought of as arbitrary—we’ve got 10 fingers, so let’s use base 10. We could compute just as competently in base 7, or base 12, or base 2.
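To illustrate that point, here is a small, hedged Python sketch that expresses one quantity as digits in several bases; the helper function to_digits is our own invention for this example:

# Express a quantity as a list of digits in an arbitrary base.
def to_digits(n, base):
    digits = []
    while n > 0:
        digits.append(n % base)         # peel off the least significant digit
        n //= base
    return list(reversed(digits)) or [0]

print(to_digits(607, 12))   # [4, 2, 7]: the "427" in base 12 discussed above
print(to_digits(607, 10))   # [6, 0, 7]: the familiar base-10 form
print(to_digits(607, 2))    # the same quantity in base 2 (binary)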

COMPUTER WORD SIZE
Each computer deals with a certain number of bits at a time. The early hobbyist computers manipulated
8 bits at a time, and so were called “8-bit computers.” Another way to say this was that the computer “word size” was 8 bits. The computer might be programmed to operate on more than 8 bits, but its basic operations dealt with 8 bits at a time. If our program must count, how large a count can an 8-bit computer maintain? Applying our earlier discussion of number bases to base 2 (binary), this is the largest number we can represent with 8 bits:

11111111

This number is 128, plus 64, plus 32, plus 16, plus 8, plus 4, plus 2, plus 1—255. That’s it for an 8-bit
computer, unless we resort to some “workaround.” The first IBM PC used the Intel 8088 processor. It had an 8-bit data bus (meaning it read and wrote 8 bits at a time from/to peripheral devices), but internally it was a 16-bit computer. How large a count can a 16-bit computer maintain? Here’s the number, broken into two 8-bit chunks (bytes) for legibility:

11111111 11111111

This number is 32,768 (2^15), plus 16,384, plus 8192, plus 4096, plus 2048, plus 1024, plus 512, plus 256, plus 255
(the lower 8 bits we already computed above), which comes to 65,535. That’s a much bigger number than the maximum
number an 8-bit computer can work with, but it’s still pretty small for some jobs. You’d never be able to use
a 16-bit computer for census work, for instance, without some “workaround.”
Today, most computers we’re familiar with use a 32-bit word size. The maximum count possible with
32 bits is over 4 billion. The next generation computers will likely use a 64-bit word size, and the maximum
count possible with 64 bits is about 18 billion billion (more than 10^19)!
The ability to represent a large number directly is nice, but it comes at a cost of “bit efficiency.” Here’s
what the number 6 looks like in a 32-bit word:

00000000000000000000000000000110

There are a lot of wasted bits (leading zeros) there! When memory was more expensive, engineers treated bit efficiency as an important design consideration, but memory is now so inexpensive that it is usually no longer a concern.
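Here is a short Python sketch of the same arithmetic, assuming nothing beyond the standard library: the maximum unsigned count for several word sizes, and the wasteful look of a small number stored in a 32-bit word:

# Largest unsigned count for various word sizes.
for bits in (8, 16, 32, 64):
    print(bits, 2**bits - 1)    # 255, 65535, 4294967295, 18446744073709551615

# The number 6 padded out to a 32-bit word: mostly leading zeros.
print(format(6, '032b'))        # 00000000000000000000000000000110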

INTEGER DATA FORMATS
So far our discussion has been of whole numbers only, and even of positive whole numbers. Computers
need to keep track of the sign of a number, and must also be able to represent fractional values (real numbers). As you might expect, if we need to keep track of the sign of a number, we can devote a bit of the computer word to maintaining the sign of the number. The leftmost bit, also known as the most significant bit (“msb,” in contrast to the least significant bit, “lsb,” at the right end of the word), will be 0 if the number is positive, and 1 if the number is negative. Here is a positive 6 for an 8-bit computer:

00000110

The msb is 0, so this is a positive number, and we can inspect the remaining 7 bits and see that the value is 6.
Now here’s a counter-intuitive observation. How do we represent −6? You might think it would be like this:

10000110

That would be incorrect, however. What happens if we add 1 to that representation? We get 10000111,
which would be −7, not −5! This representation does not work correctly, even in simple arithmetic computations. Let’s take another tack. What number would represent −1? We can test our idea by adding 1 to −1. We should get 0 as a result. How about this for negative 1:

11111111

That actually works. If we add 1 to that number, we get all zeros in the sum (and we discard the final carry).
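The representation that makes this work is called two's complement (the same term comes up again in the floating-point discussion below). A minimal Python sketch, with the 8-bit width chosen purely for illustration, shows how -6 is formed and why the arithmetic comes out right:

# Two's complement of 6 in an 8-bit word: invert the bits, then add 1.
six = 0b00000110
neg_six = (~six + 1) & 0xFF                    # mask the result to 8 bits
print(format(neg_six, '08b'))                  # 11111010, the 8-bit -6
print(format((neg_six + six) & 0xFF, '08b'))   # 00000000: -6 + 6 = 0, carry discarded
print(format(-1 & 0xFF, '08b'))                # 11111111, the -1 shown above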

REAL NUMBER FORMATS
Numbers containing fractions are more difficult to represent. Real numbers consist of a mantissa and
an exponent. Computer designers decide how to allocate the bits of the computer word so that some can be
used for the mantissa and some for the exponent. In addition, the mantissa can be positive or negative, and the
exponent can be positive or negative. You might imagine that different designers could create different definitions for real number formats. A larger mantissa will provide greater precision; a larger exponent will provide for larger and smaller magnitudes (scale). As recently as the 1980s, different computer manufacturers used different representations, and those differences made it difficult to move data between computers, and difficult to move (“port”) programs from one make of computer to another.

Since then, the IEEE has created a standard for binary floating-point number representation using 32 and
64 bits. The 32-bit format looks like this:
SEEEEEEEEmmmmmmmmmmmmmmmmmmmmmmm
The msb is the sign of the number, the 8-bit field is the exponent of 2, and the 23-bit field is the mantissa. The sign of the exponent is incorporated into the exponent field, but the IEEE standard does not use simple two’s complement for representing a negative exponent. For technical reasons, which we touch on below, it uses a different approach. How would we represent 8.5? First we convert 8.5 to binary, and for the first time we will show a binary fractional value:
1000.1
To the left of the binary point (analogous to the decimal point we’re familiar with) we have 8. To the right
of the binary point, we have 1/2. Just as the first place to the right of the decimal point in base 10 is a tenth, the first place to the right of the binary point in base 2 is a half.

In a manner akin to using “scientific notation” in base 10, we normalize binary 1000.1 by moving the
binary point left until we have only the 1 at the left, and then adding a factor of 2 with an exponent:
1.0001 * 2^3

From this form we can recognize the exponent in base 2, which in this case is 3, and the mantissa, which is 0001.
The IEEE 32-bit specification uses a “bias” of 127 on the exponent (this is a way of doing without
a separate sign bit for the exponent, and making comparisons of exponents easier than would be the case with
two’s complement; trust us, or read about it online), which means that the exponent field will have the binary value of 127 + 3, or 130. After all this, the binary representation of 8.5 is:

01000001000010000000000000000000

The sign bit is 0 (positive), the exponent field has the value 130 (10000010), and the mantissa field has the
value 0001 (and lots of following zeros).
As you can imagine, computing with real numbers requires the computer to do more work than
computing with integers. The mantissa and exponent fields must be considered appropriately in all mathematical operations. In fact, some computers have special floating-point processor hardware to speed such calculations.
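Before leaving real numbers, here is a hedged Python check of the 8.5 bit pattern shown above, using the standard struct module to pack a 32-bit IEEE 754 value:

import struct

# Pack 8.5 as a 32-bit IEEE 754 float (big-endian) and display the raw bits.
raw = struct.pack('>f', 8.5)
bits = ''.join(format(byte, '08b') for byte in raw)
print(bits)                           # 01000001000010000000000000000000
print(bits[0], bits[1:9], bits[9:])   # sign, exponent (10000010 = 130), mantissa field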

CHARACTER FORMATS
We think of computing as work with numbers, but in fact most computing operates on character data rather
than numeric data: names, addresses, order numbers, gender, birthdates, and the like are more often represented by strings of characters than by numeric values.
Characters are mapped to integer numbers. There have been many character-to-integer mappings over the years. IBM invented a mapping called binary coded decimal (BCD), and later the Extended Binary Coded Decimal Interchange Code (EBCDIC), which became a de facto standard with IBM’s early success in the computer market.

The American Standard Code for Information Interchange (ASCII) was defined in the 1960s and became the choice of most computer vendors, aside from IBM. Today Unicode is becoming popular because it is backward compatible with ASCII and allows the encoding of more complex alphabets, such as those used for Russian, Chinese, and other languages. We will use ASCII to illustrate the idea of character encoding, since it is still widely used, and it is simpler to describe than Unicode.
In ASCII each character is assigned a 7-bit integer value. For instance, ‘A’ = 65 (1000001), ‘B’ = 66
(1000010), ‘C’ = 67 (1000011), etc. The 8th bit in a character byte is intended to be used as a parity bit, which allows for a simple error detection scheme.

If parity is used, the 8th or parity bit is used to force the sum of the bits in the character to be an
even number (even parity) or an odd number (odd parity). Thus, the 8 bits for the character ‘B’ could take
these forms:

01000010 even parity
11000010 odd parity
01000010 no parity


If parity is being used, and a noisy signal causes one of the bits of the character to be misinterpreted, the
communication device will see that the parity of the character no longer checks. The data transfer can then be
retried, or an error announced. This topic is more properly discussed under the heading of data communications, but since we had to mention the 8th bit of the ASCII code, we didn’t want you to be left completely in the dark about what parity bits and parity checking are.
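For the curious, here is a small Python sketch of even-parity generation and checking; the helper name with_even_parity is ours, purely for illustration:

# Build an even-parity byte from a 7-bit ASCII code.
def with_even_parity(code7):
    ones = bin(code7).count('1')
    parity = ones % 2                  # 1 only if the count of one-bits is odd
    return (parity << 7) | code7       # parity goes in the 8th (most significant) bit

b = with_even_parity(ord('B'))         # 'B' = 66 = 1000010 already has an even count
print(format(b, '08b'))                # 01000010, matching the table above

# The receiver simply checks that the total number of one-bits is still even.
print(bin(b).count('1') % 2 == 0)      # True; a single flipped bit would make this False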
The lowercase characters are assigned a different set of numbers: ‘a’ = 97 (1100001), ‘b’ = 98 (1100010),
‘c’ = 99 (1100011), etc. In addition, many special characters are defined: ‘$’ = 36 (0100100), ‘+’ = 43
(0101011), ‘>’ = 62 (0111110), etc.

A number of “control characters” are also defined in ASCII. Control characters do not print, but can be used
in streams of characters to control devices. For example, ‘line feed’ = 10 (0001010), ‘tab’ = 9 (0001001),
‘backspace’ = 8 (0001000), etc.
For output, to send the string “Dog” followed by a linefeed, the following sequence of bytes would be sent
(the msb is the parity bit; in this example parity is ignored and the bit is simply set to 0):

01000100 01101111 01100111 00001010
    D        o        g      lf (line feed)


Likewise for input, if a program is reading from a keyboard, the keyboard will send a sequence of integer
values that correspond to the letters being typed.
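As a quick illustration of both directions, here is a hedged Python sketch that reproduces the "Dog" example, with the parity bit left at 0:

# Encode the string "Dog" plus a line feed as 8-bit values (parity bit left 0).
for ch in "Dog\n":
    print(format(ord(ch), '08b'))      # 01000100, 01101111, 01100111, 00001010

# Going the other way, a program receiving those integers recovers the characters.
print(''.join(chr(code) for code in (68, 111, 103, 10)))   # prints "Dog" and a newline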
How does a program know whether to interpret a series of bits as an integer, a character, or a floating-point
number? Bits are bits, and there is no label on a memory location saying this location holds an integer/character/real. The answer is that the program will interpret the bits based on its expectation.
If the program expects to find a character, it will try to interpret the bits as a character. If the bit pattern
doesn’t make sense as a character encoding, either the program will fail or an error message will result.
Likewise, if the program expects an integer, it will interpret the bit pattern as an integer, even if the bit pattern originally encoded a character. It is incumbent on the programmer to be sure that the program’s handling of data is appropriate.


CPU/ALU
The CPU is the part one thinks of first when describing the components of a computer. The
repetitive cycle of the von Neumann computer is to a) load an instruction from memory into the CPU, and b)
decode and execute the instruction. Executing the instruction may include performing arithmetic or logical
operations, and also loading or storing data in memory. When the instruction execution is complete, the computer fetches the next instruction from memory, and executes that instruction. The cycle continues indefinitely, unless the instruction fetched turns out to be a HALT instruction. The CPU is usually described as consisting of a control unit and an arithmetic and logic unit (ALU). The control unit is responsible for maintaining the steady cycle of fetch-and-execute, and the ALU provides the hardware for arithmetic operations, value comparisons (greater than, less than, equal to), and logical functions (AND, OR, NOT, etc.). Both the control unit and the ALU include special, very high-performance memory cells called registers.
Registers are intimately connected to the wiring of the control unit and the ALU; some have a special purpose,
and some are general purpose. One special-purpose register is the program counter (PC).

The PC keeps track of the address of the instruction to execute next. When the control unit begins
a fetch–execute cycle, the control unit moves the instruction stored at the address saved in the PC to another
special register called the instruction register (IR). When such a fetch of the next instruction occurs, the control unit automatically increments the PC, so that the PC now “points” to the next instruction in sequence. The control unit then decodes the instruction in the IR, and executes the instruction. When execution is complete, the control unit fetches the instruction to which the PC now points, and the cycle continues. Other registers of the ALU are general purpose. General-purpose registers are used to store data close to the processor, where the processor can access the information even more quickly than when the value is in memory. Different computers have different numbers of registers, and the size of the registers will be congruent with the word size of the computer (16-bit, 32-bit, etc.).
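To make the cycle concrete, here is a toy Python sketch of a stored-program machine; the three-instruction format and its opcodes are invented for illustration and do not correspond to any real architecture:

# Memory holds both the program and its data, as in the von Neumann design.
memory = [
    ("LOAD", 5),     # 0: copy the value at address 5 into the accumulator
    ("ADD", 6),      # 1: add the value at address 6 to the accumulator
    ("STORE", 7),    # 2: store the accumulator at address 7
    ("HALT", None),  # 3: stop the machine
    None,            # 4: unused
    2, 3, 0,         # 5-7: data
]

pc = 0               # program counter: address of the next instruction
acc = 0              # a single general-purpose register (accumulator)
while True:
    ir = memory[pc]            # fetch the instruction the PC points to into the IR
    pc += 1                    # the PC now points to the next instruction in sequence
    op, addr = ir              # decode
    if op == "LOAD":
        acc = memory[addr]
    elif op == "ADD":
        acc += memory[addr]
    elif op == "STORE":
        memory[addr] = acc
    elif op == "HALT":
        break

print(memory[7])     # 5: the sum of the data values at addresses 5 and 6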

The number of registers, and the nature of the special-purpose registers, comprise an important part of the
computer architecture. In the case of the Intel x86 architecture, there are four 32-bit general-purpose registers (EAX, EBX, ECX, and EDX), and four 32-bit registers devoted to address calculations and storage (ESP, EBP, ESI, and EDI). One could say much more about registers in the Intel x86 architecture, but they are now too complex to describe completely, as the architecture has been cleverly expanded while maintaining complete compatibility with earlier designs.

MEMORY
Computer memory is organized into addressable units, each of which stores multiple bits. In the early days
of computing (meaning up until the 1970s), there was no agreement on the size of a memory unit. Different
computers used different size memory “cells.” The memory cell size was also referred to as the computer’s
“word size.” The computer word was the basic unit of memory storage. The word size of the IBM 704 was
36 bits; the word size of the Digital Equipment PDP-1 was 18 bits; the word size of the Apollo Guidance
Computer was 15 bits; the word size of the Saturn Launch Vehicle Computer was 26 bits; the word size of the CDC 6400 was 60 bits. These machines existed during the 1950s, 1960s, and 1970s.

The IBM 360 family, starting in the mid-1960s, introduced the idea of a standard memory cell of 8 bits called
the “byte.” Since then, computer manufacturers have come to advertise memory size as a count of standard bytes. The idea of the computer word size is still with us, as it represents the number of bits the computer usually processes at one time. The idea of word size has become less crystalline, however, because newer computer designs operate on units of data of different sizes. The Intel Pentium processes 32 or 64 bits at a time, but it is also backwards compatible to the Intel 8086 processor of 1980 vintage, which had a word size of 16 bits. To this day, the Intel family of processors calls 16 bits a word, and in any case each byte has its own address in memory. Today the byte is the measure of computer memory, and most computers, regardless of word size, offer “byte addressability.”

Byte addressability means that each byte has a unique memory address. Even though the computer may be a 32-bit machine, each byte in the 4-byte computer word (32 bits) can be addressed uniquely, and its value can be read or updated.

Memory is used to store program instructions and data. The basic operations on memory are store and retrieve. Storing is also referred to as “writing.” Retrieval is also referred to as “fetching,” “loading,” or “reading.” Fetch is an obvious synonym for retrieve, but what about load? Loading means loading a register in the CPU from memory, which from the point of view of the memory system is retrieval.
There are at least two registers associated with the memory control circuitry to facilitate storage and
retrieval. These are the memory address register (MAR) and the memory data register (MDR). When writing to memory, the CPU first transfers the value to be written to the MDR, and the address of the location to be used to the MAR. At the next memory access cycle, the value in MDR will be copied into the location identified by the contents of the MAR.

When retrieving from memory, the CPU first stores the address to read in the MAR. When the read occurs
on the next memory access cycle, the value in that location is copied into the MDR. From the MDR in the
memory controller, the data value can be transferred to one of the CPU registers or elsewhere.
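A hedged Python sketch of that MAR/MDR handshake follows; the class and method names are our own, chosen only to mirror the description above:

# A toy memory controller with a memory address register (MAR)
# and a memory data register (MDR).
class Memory:
    def __init__(self, size):
        self.cells = [0] * size
        self.mar = 0      # address to be read or written
        self.mdr = 0      # data just read, or data about to be written

    def write(self, address, value):
        self.mar, self.mdr = address, value
        self.cells[self.mar] = self.mdr    # next memory cycle: copy MDR into the cell

    def read(self, address):
        self.mar = address
        self.mdr = self.cells[self.mar]    # next memory cycle: copy the cell into MDR
        return self.mdr                    # from here the value moves to a CPU register

ram = Memory(1024)
ram.write(100, 42)
print(ram.read(100))    # 42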
Main computer memory, such as we have in our PCs, is referred to as random access memory (RAM). That means we can access any element of memory at will, and with roughly the same speed, regardless of address.
By contrast, consider information and data stored on a magnetic tape. Magnetic tape is a kind of memory
(we can store data on a magnetic tape), but magnetic tape is definitely not random access. Magnetic tape is
serial access. We can read the contents of memory location 4000 only after having read and passed over all those locations that come before.

In addition to main memory, which has been the focus of our discussion so far, computer designers also
usually provide small, high-performance memories, called cache memories, that are located close to the CPU.
Cache memory may even be located on the same electronic chip as the CPU. Cache is the French word for “hiding place.” Cache memory is used to hold a copy of the contents of a small number of main memory locations. This turns out to be very useful, because program execution demonstrates a property called “locality of reference.” By locality of reference, we mean that for relatively long periods of time, the execution of a program will reference and affect a small number of memory locations. Accesses to memory are not random. Rather, for one period of time the program will read and write one part of memory, for example, an array of numbers, and for another period of time the program will store and retrieve from a different part of memory, for example, a record from a data base.

When the computer copies the contents of main memory currently being accessed to cache memory, the
CPU can avoid waiting for access to slower main memory, and access the cache instead. Since access times for cache memory are typically 5 to 10 times faster than access times for main memory, this tactic has proven very generally effective. Almost all computers built since 1980 have incorporated one or more cache memories in their design. The management of cache memory is challenging, because the system must keep the contents of the cache memory synchronized with the contents of main memory. Engineers call this cache “coherency.” As long as the program is reading from memory, but not writing, there is no problem. When the program writes to memory, however, both main memory and cache must be updated. Also, when the program begins to access a new area of memory, one for which the contents are not already reflected in the cache, the cache management algorithm will typically bring to the cache the needed word as well as a number of following words from memory. At the same time, the cache management algorithm must decide which contents of the current cache to discard. As complex as this management is, use of cache memory usually makes a very noticeable difference in performance, with speedup of average memory access often in the neighborhood of 50 percent.


INPUT AND OUTPUT (I/O)
Obviously, most data on which we compute resides outside of the computer itself; perhaps it’s originally
on paper receipts, or in lists on paper. And when computation is complete, we want to see the results outside of the computer’s own memory; on a display, or on paper, for example. While there is variation in the way CPUs, memory, and caches are implemented, there is even more variation in the ways in which I/O is implemented. First of all, there are many different I/O devices. Some are for interacting with humans, such as keyboards, mice, touch screens, and displays. Others are for use by the computer directly,such as disk drives, tape drives, and network interfaces. I/O devices also vary enormously in speed, and they’re all much slower than the CPU and main memory. A typist working at 40 words per minute is going pretty fast, and striking about 200 keys a minute, or one key every .3 seconds. Let’s compute how many instructions a 1 GHz personal computer might execute during that .3 seconds.

Some instructions execute on one clock cycle, but many require more than one. Let’s assume that an average
instruction requires 3 cycles. If that’s the case, then the 1 GHz computer executes 330 million instructions per
second, or 99 million instructions in the time it takes to type one letter. To get a feel for the difference in speed between the keyboard and the CPU, imagine that the typist walks one foot in the time it takes to type one letter, and imagine also that the computer travels one foot in the time it takes to execute an instruction. If that were the case, then in the time the typist walks a foot, the computer travels 18,750 miles, or about three quarters of the way around the earth! In the early days of computing, the CPU would wait for each character to be typed. A machine instruction would ready the keyboard interface to accept a character from the keyboard, and the next instruction would test to see if the character had been received. If the character had not yet been received, the program would simply loop, testing (“polling”) to see if the character had been received. This is called “programmed I/O with polling,” or “busy waiting.” It’s a simple but prohibitively costly approach. Today computers use an “interrupt system” to avoid busy waiting, and the operating system supervises all I/O. Each I/O device is connected to the computer via an “I/O controller.”
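Here is a hedged Python sketch of busy waiting; the functions standing in for the keyboard interface are imaginary, and the random check merely simulates a slow device:

import random

# Imaginary keyboard interface: a status check and a data register.
def keyboard_has_character():
    return random.random() < 0.00001     # a character is ready only very rarely

def read_character():
    return 'x'                           # pretend this reads the keyboard's data register

# Programmed I/O with polling ("busy waiting"): the CPU loops, doing no useful
# work, until the device finally reports that a character is ready.
polls = 0
while not keyboard_has_character():
    polls += 1                           # a huge number of wasted iterations
ch = read_character()
print(ch, polls)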

An I/O controller is a small, special-purpose computer within the computer. It has a few well-defined functions, and a small amount of memory of its own with which to “buffer” (store temporarily) the information being sent or received. When a program requires output, for example, the operating system moves the data to the buffer memory of the I/O controller for the device, and commands the I/O controller to start the operation. From that point on, the main computer is free to do other work, while the I/O controller handles the details and timing of moving the data to its destination. When the data transfer is complete, the I/O controller creates an “interrupt,” which notifies the main computer that the transfer is now finished. The operating system responds to the interrupt in an appropriate way, perhaps by starting another output operation to the same device (think of multiple lines going to a printer).

When a program requires input, the operating system will suspend the execution of the requesting program and command the I/O controller of the device to start reading the necessary data. The operating system will then transfer control of the CPU to a different program, which will execute while the first program is waiting for its input. When the requested data become available, the I/O controller for the device will generate an interrupt.

The operating system will respond by suspending the execution of the second program, moving the data from
the buffer on the I/O controller to the program that requested the data initially, and restarting the first program. The interrupt system is used for all data transfers today. While that is so, there are also some useful
categorizations of device types. Devices may be categorized as character devices or block devices. A keyboard is a character device, and a disk is a block device. A character device transfers a character (8 bits) at a time, and a block device transfers a buffer, or set of data, at a time. Other examples of character devices include telephone modems and simple terminals. Other examples of block devices include CD-ROM drives, magnetic tape drives, network interfaces, sound interfaces, and blocks of memory supporting devices like displays. Character devices interrupt on each character (8 bits) transferred, and block devices interrupt only when the entire block has been transferred.

Modern computer designs usually include a facility called direct memory access (DMA) for use with block devices. The DMA controller is its own special computer with access to memory, and it shares access to main memory with the CPU. DMA moves data directly between the buffer in the I/O controller and main memory, and it does so without requiring any service from the CPU.

Block devices can be used without DMA and, when they are used that way, the practice is called “programmed I/O with interrupts.” With programmed I/O, the block device interrupts when the buffer is ready, but the operating system must still use the CPU to move the data between the buffer on the I/O controller and the destination in main memory. When DMA is used with a block device, the data are transferred directly between the device and main memory, without requiring assistance from the operating system and the CPU. The operating system starts the transfer by specifying an address at which to start and the count of bytes to transfer. The CPU is then free to continue computing while the data are moved in or out. This is a further improvement in system efficiency, and today DMA is almost universally used for disk and other block transfers.
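To contrast the two approaches in miniature, here is a hedged Python sketch; the functions are conceptual stand-ins, since in reality the copying is done by the CPU or by the DMA hardware rather than by Python code:

# Programmed I/O with interrupts: after the device interrupts, the CPU itself
# copies the block, element by element, from the controller's buffer into memory.
def programmed_io_copy(controller_buffer, main_memory, start):
    for i, value in enumerate(controller_buffer):
        main_memory[start + i] = value        # every element costs CPU time

# DMA: the CPU supplies only a starting address and a count; the DMA controller
# moves the whole block itself and interrupts once when the transfer is done.
def dma_transfer(controller_buffer, main_memory, start, count):
    main_memory[start:start + count] = controller_buffer[:count]

main_memory = [0] * 1024
block = list(b"data from a block device")
programmed_io_copy(block, main_memory, 0)
dma_transfer(block, main_memory, 512, len(block))
print(main_memory[:len(block)] == main_memory[512:512 + len(block)])   # True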
