the computer hardware can accomplish is the machine’s instruction set. Given the wide variety of computer
applications, and the sophistication of many applications, it can be surprising to learn how limited and primitive the instruction set of a computer is. Machine instructions include loading a CPU register from memory, storing the contents of a CPU register in memory, jumping to a different part of the program, shifting the bits of a computer word left or right, comparing two values, adding the values in two registers, performing a logical operation
(e.g., ANDing two conditions), etc. For the most part, machine instructions provide only very basic computing facilities. A computer’s assembly language corresponds directly to its instruction set; there is one assembly language mnemonic for each machine instruction. Unless you program in assembly language, you will have very little visibility into the machine instruction set. However, differences in instruction sets explain why some programs run on some machines but not others. Unless two computers share the same instruction set, they cannot execute the same machine-language program.
The IBM 360 family of computers was the first example of a set of computers which differed in implementation, cost, and capacity, but which shared a common machine instruction set. This allowed programs written for one IBM 360 model to run on other models of the family, and it allowed customers to start with a smaller model, and later move up to a larger model without having to reinvest in programming. At the time, this capability was a breakthrough.
Today, most programming is done in higher-level languages, rather than assembly language. When you program in a higher-level language, you write statements in the syntax of your programming language (e.g., Java, C, Python), and the language processor translates your code into the correct set of machine instructions to execute your intent. If you want to run the same program on a different computer with a different instruction set, you can often simply supply your code to the appropriate language processor on the new computer. Your source code may not change, but the translation of your code into machine instructions will be different because the computer instruction sets are different. The language processor has the responsibility to translate standard higher-level programming syntax into the correct machine instruction bit patterns.
Machine instructions are represented as patterns of ones and zeros in a computer word, just as numbers and
characters are. Some of the bits in the word are set aside to provide the “op-code,” or operation to perform.
Examples of op-codes are ADD, Jump, Compare, and AND. Other bits in the instruction word specify the values to operate on, the “operands.” An operand might be a register, a memory location, or a value already in the instruction word operand field.
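To make the idea of op-code and operand fields concrete, here is a minimal sketch in Python. The 16-bit layout (4 bits of op-code, 4 bits of register number, 8 bits of immediate value) and the op-code numbers are invented for illustration; they are assumptions and do not correspond to any real machine.

```python
# Hypothetical 16-bit instruction word (NOT a real machine's format):
#   bits 15-12: op-code, bits 11-8: register number, bits 7-0: immediate value
OPCODES = {0b0001: "ADD", 0b0010: "JUMP", 0b0011: "COMPARE", 0b0100: "AND"}


def encode(opcode, register, immediate):
    """Pack the three fields into one 16-bit word using shifts and ORs."""
    return ((opcode & 0xF) << 12) | ((register & 0xF) << 8) | (immediate & 0xFF)


def decode(word):
    """Split a 16-bit word back into (mnemonic, register, immediate)."""
    opcode = (word >> 12) & 0xF
    register = (word >> 8) & 0xF
    immediate = word & 0xFF
    return OPCODES[opcode], register, immediate


word = encode(0b0001, 2, 40)   # "ADD 40 to register 2" in the toy format
print(f"{word:016b}")          # 0001001000101000
print(decode(word))            # ('ADD', 2, 40)
```

The same 16 bits are just a number until the decoder decides which bits are the op-code and which are operands; real instruction sets differ mainly in how many fields there are and how wide each one is.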
An example machine instruction is the following ADD instruction for the Intel x86 computers. The Intel x86 instruction set is an unusually complex one to describe, because Intel has expanded the instruction set as it has evolved the computer family. It would have been easier to create new instruction sets when computing evolved from 16-bit processing in 1978, to 32-bit processing in 1985, to 64-bit processing in 2004. Instead, the Intel engineers very cleverly maintained compatibility with earlier instruction sets while they added advanced capabilities. This allowed old programs to continue to run on new computers, which greatly eased upgrades among PC users. The result, however effective technically and commercially, is an instruction set that is complex to describe. Here is the bit pattern, broken into bytes for readability, that says, “Add 40 to the contents of the DX register”:
00000001 11000010 00000000 00101000
The first byte is the op-code for ADD immediate (meaning the number to add resides in the instruction word itself). The second byte says that the destination operand is a register, and in particular, the DX register. The third and fourth bytes together comprise the number to add; if you evaluate the binary value of those bits, you will see that the value is 40.
Looking at the contents of a computer word, you cannot tell whether the word contains an instruction or a piece of data. Fetched as an instruction, the bit pattern above means add 40 to the DX register. Retrieved as an integer, the bit pattern means 29,491,240. In the Intel architecture, instructions (“code”) are stored in a separate section of memory from data. When the computer fetches the next instruction, it does so from the code section of memory. This mechanism prevents a type of error that was common with earlier, simpler computer architectures: the accidental execution of data as if the data were instructions.
Here is an example JMP instruction. This says, “Set the program counter (transfer control) to address 20,476 in the code”:
11101001 11111100 01001111
The first byte is the op-code for JMP direct (meaning the address provided is where we want to go, not a
memory location holding the address to which we want to go). The second byte is the low-order byte for the
address to which to jump. The third byte is the high-order byte for the address! How odd is that, you may think? To get the proper address, we have to take the two bytes and reorder them, like this:
01001111 11111100
This “peculiarity” is due to the fact that the Intel processor line is historically “little endian.” That is, it
stores the least significant byte of a multiple byte value at the lower (first) address. So, the first byte of a 2-byte address contains the low-order 8 bits, and the second byte contains the high-order 8 bits.
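The two bit patterns discussed above can be checked with a few lines of Python. This sketch takes the bytes exactly as they are printed in the text and only does the arithmetic; the variable names are illustrative.

```python
# The ADD pattern exactly as printed in the text, one binary string per byte.
add_bytes = bytes(int(b, 2) for b in "00000001 11000010 00000000 00101000".split())

# Read as data: one 32-bit integer, most significant byte first (as printed).
as_integer = int.from_bytes(add_bytes, byteorder="big")

# Read as an instruction: the text takes the last two bytes as the number to add.
immediate = int.from_bytes(add_bytes[2:], byteorder="big")

# The JMP pattern: the op-code byte, then the address with its low-order byte first.
jmp_bytes = bytes(int(b, 2) for b in "11101001 11111100 01001111".split())
address = int.from_bytes(jmp_bytes[1:], byteorder="little")

print(as_integer)  # 29491240
print(immediate)   # 40
print(address)     # 20476
```

Passing `byteorder="little"` does the same reordering of the two address bytes that the text performs by hand.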
An advantage of the little endian design is evident with the JMP instruction because the “short” version of the JMP instruction takes only an 8-bit (1-byte) operand, which is naturally the low-order byte (the only byte). So the JMP direct with a 2-byte operand simply appends the high-order byte after the low-order byte. To say this another way, the value of the jump destination, whether 8 bits or 16 bits, can be read starting at the same address.
Other computers, such as the Sun SPARC, the PowerPC, the IBM 370, and the MIPS, are “big endian,” meaning that the most significant byte is stored first. Some argue that big endian form is better because it reads more easily when humans look at the bit pattern, because human speech is big endian (we say “four hundred forty,” not “forty and four hundred”), and because the ordering from most significant to least significant is then the same for the bits within a byte as for the bytes themselves. There is, in fact, no performance reason to prefer big endian or little endian formats. The formats are a product of history. Today, big endian order is the standard for network data transfers, but only because the original TCP/IP protocols were developed on big endian machines.
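The difference between the two byte orders, and the "same starting address" property of little endian storage, can be seen with Python's built-in integer conversions. This is only an illustration of the byte layouts; the values chosen are arbitrary apart from 20,476, the jump address used earlier.

```python
import struct

value = 20476  # 0x4FFC, the jump address from the JMP example

little = value.to_bytes(2, byteorder="little")
big = value.to_bytes(2, byteorder="big")
print(little.hex())  # fc4f  (low-order byte first)
print(big.hex())     # 4ffc  (high-order byte first)

# In the struct module, "!" selects network byte order, which is big endian.
network = struct.pack("!H", value)
print(network == big)  # True

# A small value stored in a 2-byte field. In little endian form, reading one
# byte or two bytes from the same starting position gives the same number.
small_le = (100).to_bytes(2, byteorder="little")   # bytes 64 00
one_byte = int.from_bytes(small_le[:1], byteorder="little")
two_bytes = int.from_bytes(small_le[:2], byteorder="little")
print(one_byte, two_bytes)  # 100 100

# In big endian form the first byte alone is the high-order byte instead.
small_be = (100).to_bytes(2, byteorder="big")      # bytes 00 64
first_byte_be = small_be[0]
print(first_byte_be)  # 0
```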
Here is a representative sampling of machine instructions from the Intel x86 machine instruction set. Most
x86 instructions specify a “source” and a “destination,” where each can in general be a memory location or a
register. This list does not include every instruction; for instance, there are numerous variations of the jump
instruction, but they all transfer control from one point to another. This list does provide a comprehensive look at all the types of instructions:
MOV | Move "source" to "destination," leaving source unchanged |
ADD | Add source to destination, and put sum in destination |
SUB | Subtract source from destination, storing result in destination |
DIV | divide accumulator by source; quotient and remainder stored separately |
IMUL | signed multiply |
DEC | decrement; subtract 1 from destination |
INC | increment; add 1 to destination |
AND | logical AND of source and destination, putting result in destination |
OR | inclusive OR of source and destination, putting result in destination |
XOR | exclusive OR of source and destination, with result in destination |
NOT | logical NOT, inverting the bits of destination |
IN | input data to the accumulator from an I/O port |
OUT | output data to port |
JMP | unconditional jump to destination |
JG | jump if greater; jump based on compare flag settings |
JZ | jump if zero; jump if the zero flag is set |
BSF | find the first bit set to 1, and put index to that bit in destination |
BSWAP | byte swap; reverses the order of bytes in a 32-bit word |
BT | bit test; checks to see if the bit indexed by source is set |
CALL | procedure call; performs housekeeping and transfers to a procedure |
RET | performs housekeeping for return from procedure |
CLC | clear the carry flag |
CMP | compare source and destination, setting flags for conditions |
HLT | halt the CPU |
INT | interrupt; create a software interrupt |
LMSW | load machine status word |
LOOP | loop until counter register becomes zero |
NEG | negate as two’s complement |
POP | transfer data from the stack to destination |
PUSH | transfer data from source to stack |
ROL | rotate bits left |
ROR | rotate bits right |
SAL | shift bits left, filling right bits with 0 |
SAR | shift bits right, filling left bits with the value of the sign bit |
SHR | shift bits right, filling left bits with 0 |
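A few of the bit-manipulation instructions in the table can be imitated in Python to see what they do to a bit pattern. The sketch below assumes a 16-bit register, modeled with a mask; the function names simply mirror the mnemonics, and the processor flags (carry, overflow, and so on) are not modeled.

```python
MASK = 0xFFFF      # assumption: a 16-bit register
SIGN_BIT = 0x8000  # the leftmost bit of a 16-bit word


def sal(x, n):
    """Shift left, filling the right bits with 0."""
    return (x << n) & MASK


def shr(x, n):
    """Shift right, filling the left bits with 0."""
    return (x & MASK) >> n


def sar(x, n):
    """Shift right, filling the left bits with copies of the sign bit."""
    x &= MASK
    if x & SIGN_BIT:
        x -= MASK + 1  # reinterpret the pattern as a negative number
    return (x >> n) & MASK


def rol(x, n):
    """Rotate left: bits that fall off the left end re-enter on the right."""
    x &= MASK
    n %= 16
    return ((x << n) | (x >> (16 - n))) & MASK


def ror(x, n):
    """Rotate right: bits that fall off the right end re-enter on the left."""
    x &= MASK
    n %= 16
    return ((x >> n) | (x << (16 - n))) & MASK


def neg(x):
    """Negate as two's complement."""
    return (-x) & MASK


x = 0b1000000000000101
print(f"{sal(x, 1):016b}")  # 0000000000001010
print(f"{shr(x, 1):016b}")  # 0100000000000010
print(f"{sar(x, 1):016b}")  # 1100000000000010
print(f"{rol(x, 1):016b}")  # 0000000000001011
print(f"{ror(x, 1):016b}")  # 1100000000000010
print(f"{neg(40):016b}")    # 1111111111011000
```

Comparing the SHR and SAR results shows the only difference between them: what is shifted in on the left.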
Other computer families will have machine instructions that differ in detail, due to the differences in the
designs of the computers (number of registers, word size, etc.), but they all do the same, simple, basic things.
The instructions manipulate the bits of the words mathematically and logically. In general, instructions fall into these categories: data transfer, input/output, arithmetic operations, logical operations, control transfer, and comparison. Upon such simple functions all else is built.
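To illustrate how larger behavior is built from such simple instructions, here is a toy interpreter sketched in Python. It is a hypothetical machine, not x86: instructions are Python tuples rather than bit patterns, only six mnemonics borrowed from the table are supported, the three register names are arbitrary, and word size is ignored. The program multiplies 6 by 7 using nothing but data transfer, addition, decrement, and jumps.

```python
def run(program):
    """Fetch and execute toy instructions until HLT; return the registers."""
    regs = {"AX": 0, "BX": 0, "CX": 0}
    zero_flag = False
    pc = 0  # program counter: index of the next instruction to fetch
    while True:
        op, *args = program[pc]
        pc += 1
        if op == "MOV":      # copy a register or a constant into dest
            dest, src = args
            regs[dest] = regs[src] if src in regs else src
        elif op == "ADD":    # dest = dest + source
            dest, src = args
            regs[dest] += regs[src] if src in regs else src
            zero_flag = regs[dest] == 0
        elif op == "DEC":    # subtract 1 and record whether the result is zero
            regs[args[0]] -= 1
            zero_flag = regs[args[0]] == 0
        elif op == "JZ":     # jump only if the last result was zero
            if zero_flag:
                pc = args[0]
        elif op == "JMP":    # unconditional jump
            pc = args[0]
        elif op == "HLT":
            return regs
        else:
            raise ValueError(f"unknown instruction: {op}")


multiply = [
    ("MOV", "AX", 0),     # 0: running total
    ("MOV", "BX", 6),     # 1: value to add repeatedly
    ("MOV", "CX", 7),     # 2: loop counter
    ("ADD", "AX", "BX"),  # 3: total = total + 6
    ("DEC", "CX"),        # 4: one fewer pass remaining
    ("JZ", 7),            # 5: counter reached zero, so leave the loop
    ("JMP", 3),           # 6: otherwise go around again
    ("HLT",),             # 7: stop
]

result = run(multiply)
print(result["AX"])  # 42
```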