A transistor is a switch that can be ON or OFF.
An open transistor (no contact between the conductors) is not crossed by current and represents the binary digit 0,
while a closed transistor (contact between the conductors) is crossed by current and represents the binary digit 1.
The Intel Pentium 4 microchip has over 43,000,000 transistors; the AMD Athlon has at least 37,000,000.
The oscillator, i.e. the clock, sets the working speed of the computer: more beats per second means greater speed.
Clock speed is measured in megahertz (MHz), i.e. millions of beats per second.
The current passing through one transistor can be used to control another transistor: it switches the second
transistor ON or OFF, changing its state. This configuration is called a logic gate.
The NOT gate is made of a single transistor that takes one input from the clock and one input from another transistor.
This gate produces a single output, which is always the opposite of the input coming from the transistor.
Different combinations of transistors and NOT gates create the other logic gates.
By combining logic gates in different ways, the microchip performs addition, from which
all the other arithmetic operations are derived.
Addition is performed by circuits called the half-adder and the full-adder.
A half-adder is made of an XOR gate and an AND gate, both of which receive the same two input bits.
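A minimal sketch of the half-adder just described, modelling each gate as a Python function on the bits 0 and 1. The names `xor_gate`, `and_gate` and `half_adder` are illustrative only; real gates are transistor circuits, not function calls.

```python
def xor_gate(a: int, b: int) -> int:
    """XOR gate: outputs 1 when exactly one input is 1."""
    return a ^ b


def and_gate(a: int, b: int) -> int:
    """AND gate: outputs 1 only when both inputs are 1."""
    return a & b


def half_adder(a: int, b: int) -> tuple[int, int]:
    """Add two bits; both gates receive the same two input bits.

    Returns (sum_bit, carry_bit): XOR gives the sum digit,
    AND gives the carry passed on to the next adder.
    """
    return xor_gate(a, b), and_gate(a, b)


# Truth table of the half-adder
for a in (0, 1):
    for b in (0, 1):
        print(a, b, half_adder(a, b))
```

Only the input 1 + 1 produces a carry, which is why a second stage (the full-adder) is needed for multi-bit numbers.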
Example: 2 (decimal) + 3 (decimal) = 10 (binary) + 11 (binary).
The half-adder processes the rightmost digits (0 and 1) using its XOR and AND gates:
the result of XOR is the rightmost digit of the final result;
the result of AND (the carry) becomes an input of the XOR and AND gates of the full-adder.
The full-adder processes the leftmost digits of 10 and 11 (1 and 1) with its own XOR and AND gates;
their results become the inputs of another pair of XOR and AND gates,
where they are combined with the carry coming from the half-adder;
the AND outputs feed an OR gate, which produces the final carry.
Together, the results give the binary number 101, which is 5 in decimal.
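The walk-through above can be checked with a small simulation. This sketch assumes the common textbook full-adder made of two half-adders plus an OR gate, which matches the gate sequence described; `full_adder` and `add_2bit` are hypothetical helper names.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum, carry) using an XOR gate and an AND gate."""
    return a ^ b, a & b


def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Two half-adders plus an OR gate."""
    s1, c1 = half_adder(a, b)           # XOR/AND on the two input bits
    s2, c2 = half_adder(s1, carry_in)   # XOR/AND with the incoming carry
    return s2, c1 | c2                  # OR merges the two possible carries


def add_2bit(x: str, y: str) -> str:
    """Add two 2-bit binary strings, e.g. '10' + '11'."""
    # Right digits: half-adder (no incoming carry yet)
    s0, carry = half_adder(int(x[1]), int(y[1]))
    # Left digits: full-adder, fed with the half-adder's carry
    s1, carry_out = full_adder(int(x[0]), int(y[0]), carry)
    return f"{carry_out}{s1}{s0}"


result = add_2bit("10", "11")   # 2 + 3
print(result, int(result, 2))   # 101 5
```

Chaining more full-adders in the same way (each fed by the previous carry) gives a ripple-carry adder for wider numbers.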
MC includes the CPU (= MP, the microprocessor) and the level-2 cache (larger, but slower, than the level-1 cache contained in the MP).
BIOS = software normally stored in ROM or other non-volatile memory; it sits between the hardware and the software.
Clock (clock frequency) = the number of 0/1 switches per second performed by the circuits in the MP;
an instruction normally needs several clock cycles.
Quartz oscillator: it is inside the CPU (= MP) and can be controlled via the BIOS.
CMOS (complementary metal-oxide semiconductor): microchips that keep the hardware and
configuration settings, powered by a backup battery.
In big endian, you store the most significant byte in the smallest address.
In little endian, you store the least significant byte in the smallest address.
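A short illustration using Python's standard `struct` module, which can pack an integer in either byte order (`>` for big endian, `<` for little endian). The value 0x12345678 is an arbitrary example; index 0 of each resulting byte string stands for the smallest address. x86 processors store data in little-endian order.

```python
import struct

value = 0x12345678  # arbitrary 4-byte example value

big = struct.pack(">I", value)     # big endian: MSB at the smallest address
little = struct.pack("<I", value)  # little endian: LSB at the smallest address

print(big.hex(" "))     # 12 34 56 78
print(little.hex(" "))  # 78 56 34 12
```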
Real mode can address only 1 MB of segmented memory. All CPUs start in this mode (the original address bus had 20 address lines, and 2^20 bytes = 1 MB).
Protected mode: the CPU begins executing instructions in real mode, then switches to protected mode, where it can use virtual memory and multitasking.
Buffer: transit or intermediate memory area used to compensate for differences
in speed when transferring or transmitting data, or to speed up some operations,
such as operations on character strings.
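A minimal sketch of a bounded first-in, first-out buffer, assuming the simplest policy (the producer must wait when the buffer is full). `FifoBuffer` is a hypothetical class: a fast producer can deposit a burst of data at once, and a slower consumer drains it later at its own pace.

```python
from collections import deque


class FifoBuffer:
    """Bounded first-in, first-out buffer (illustrative, not a real device API)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: deque = deque()

    def put(self, item) -> bool:
        """Store an item; return False if the buffer is full (the producer must wait)."""
        if len(self.items) >= self.capacity:
            return False
        self.items.append(item)
        return True

    def get(self):
        """Remove and return the oldest item, or None if the buffer is empty."""
        return self.items.popleft() if self.items else None


# A fast producer writes a burst of 4 bytes; a slow consumer reads them later.
buf = FifoBuffer(capacity=8)
for byte in b"DATA":
    buf.put(byte)
received = bytes(buf.get() for _ in range(4))
print(received)  # b'DATA'
```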
Level-1 cache (I-cache = instruction cache + D-cache = data cache) is inside the MP.
Level-2 cache, larger but slower than L1, is in the MC, outside but close to the MP.
BIU (bus interface unit): receives the information entering the processor, duplicates it and sends it to
the L1 cache (I-cache and D-cache) and to the L2 cache.
Fetch/decode unit: fetches instructions from the I-cache.
BTB (branch target buffer): compares every instruction with the records of another buffer
to check whether this instruction has already been used.
Program counter (physical address) = CS:IP = code segment * 16 + instruction pointer (EA, effective address)
BTB (Branch Target Buffer)
In microprocessor architecture, the branch target predictor is a functional unit dedicated to predicting
the target address of a conditional branch or an unconditional jump before the instruction has been
loaded from the instruction cache (a cache specialized for holding instructions).
The branch target predictor should not be confused with the branch prediction unit, which tries to
predict whether the branch will be taken or not.
In many parallel processors the instruction cache has a relatively high latency, so identifying the target address
of a jump is a bottleneck. Identifying it involves the following steps:
The instruction cache provides a block of instructions.
The instruction block is scanned for branches.
The branch prediction unit identifies the first jump that should be taken.
The jump destination address is calculated.
Instructions are loaded from that address.
In many processors these operations take two clock cycles, so the processor loses a complete clock cycle
loading the new instructions after each predicted jump. Since on average there is a predicted jump every ten
instructions executed, the loss of performance can be significant. Some processors have a high-latency
instruction cache, so the performance degradation is even greater. To reduce this loss, many
processors include a branch target predictor unit: given the address of the jump, it predicts the destination of the
jump. A refinement of the idea predicts the start of the next block of sequential instructions from the address of the previous block
of sequential instructions.
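A rough estimate of that loss, under assumptions the text does not state: without the lost cycle the processor would complete w instructions per clock, a predicted jump occurs every 10 instructions (the average quoted above), and each one costs exactly one extra cycle.

```latex
\text{loss} = 1 - \frac{10/w}{10/w + 1} = \frac{w}{10 + w}
\qquad
w = 1:\ \frac{1}{11} \approx 9\%
\qquad
w = 3:\ \frac{3}{13} \approx 23\%
```

Under these assumptions, the wider (more superscalar) the processor, the larger the share of time lost to each wasted cycle.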
With prediction, the operations to perform become:
Hash of the address of the first sequential instruction
Loading, from the predictor, of the addresses of the jumps present in the block of instructions being executed
Selection of the target address of the first predicted jump
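A simplified sketch of a branch target buffer, assuming a direct-mapped table indexed by the jump's own address, with a modulo as the "hash" and the remaining address bits kept as a tag. Real predictors index by instruction block and hold several jumps per entry; the class and method names here are hypothetical.

```python
from typing import List, Optional, Tuple


class BranchTargetBuffer:
    """Direct-mapped branch target buffer (hypothetical, simplified sketch)."""

    def __init__(self, entries: int = 16) -> None:
        self.entries = entries
        # Each slot holds (tag, predicted target) or None if empty
        self.table: List[Optional[Tuple[int, int]]] = [None] * entries

    def _index_and_tag(self, pc: int) -> Tuple[int, int]:
        # Simple "hash" of the jump address: low bits select the slot,
        # the remaining bits are kept as a tag to detect collisions
        return pc % self.entries, pc // self.entries

    def predict(self, pc: int) -> Optional[int]:
        """Predicted target of the jump at address pc, or None (keep fetching sequentially)."""
        index, tag = self._index_and_tag(pc)
        entry = self.table[index]
        if entry is not None and entry[0] == tag:
            return entry[1]
        return None

    def update(self, pc: int, target: int) -> None:
        """Store the real target once the jump has been executed."""
        index, tag = self._index_and_tag(pc)
        self.table[index] = (tag, target)


btb = BranchTargetBuffer(entries=16)
print(btb.predict(0x40))        # None: this jump has not been seen yet
btb.update(0x40, 0x100)         # the jump at 0x40 went to 0x100
print(hex(btb.predict(0x40)))   # 0x100
print(btb.predict(0x50))        # None: same slot as 0x40, but a different tag
```

A hit lets the fetch unit start loading from the predicted target immediately, instead of waiting for the jump to be decoded and its destination calculated.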
The predictor occupies about 5-10% of the instruction cache's space, but it speeds up loading the instructions after a jump
considerably. If it were still not fast enough, the prediction of jump target addresses could be run in parallel
with the prediction of the jumps themselves.
Jump prediction is correct about 93% of the time.
A PIPELINE is a set of data processing elements connected in series,
where the output of one element is the input of the next one.
The elements of a pipeline are often executed in parallel or
in time-sliced mode.
Some amount of buffer storage is often inserted between
elements. Pipelines include:
Instruction pipelines, such as the classic RISC pipeline,
which are used in central processing units (CPUs) and
other microprocessors to allow overlapping execution of
multiple instructions with the same circuitry.
The circuitry is usually divided into stages and each
stage processes a specific part of one instruction at a time,
passing the partial results to the next stage.
Examples of stages are instruction decode,
arithmetic/logic and register fetch. They are related to
the technologies of superscalar execution, operand forwarding,
speculative execution and out-of-order execution.
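A sketch of how instructions overlap in an ideal pipeline, assuming the classic five RISC stages (IF fetch, ID decode/register fetch, EX execute, MEM memory access, WB write-back), one new instruction entering per cycle and no stalls. `pipeline_schedule` is a hypothetical helper that returns, for each clock cycle, the stage occupied by each instruction.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]  # classic 5-stage RISC pipeline


def pipeline_schedule(n_instructions: int) -> list[list[str]]:
    """For each clock cycle, the stage of each instruction ('' if not in the pipeline).

    Assumes an ideal pipeline: one new instruction enters per cycle, no stalls.
    """
    n_cycles = n_instructions + len(STAGES) - 1
    rows = []
    for cycle in range(n_cycles):
        row = []
        for i in range(n_instructions):
            stage = cycle - i  # instruction i enters the pipeline at cycle i
            row.append(STAGES[stage] if 0 <= stage < len(STAGES) else "")
        rows.append(row)
    return rows


schedule = pipeline_schedule(3)
for cycle, row in enumerate(schedule, start=1):
    print(f"cycle {cycle}: " + " | ".join(f"{s:>3}" for s in row))
print(len(schedule), "cycles instead of", 3 * len(STAGES))  # 7 cycles instead of 15
```

With n instructions and k stages the ideal pipeline needs n + k - 1 cycles instead of n * k; stalls and mispredicted jumps add cycles on top of that.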
Address of the first element of the stack (pointed to by SP) = FFFEh
Element size = W (word) = 2 bytes
Stack overflow: unbounded filling of the stack, which overwrites memory locations of the program (but not of the OS)
Heap area = dynamic memory = optional area where, at runtime, the program's instructions
temporarily allocate memory for variables whose size can only be known
during execution (e.g. the size of an input string). Its size is not predetermined, and
memory in it can be allocated and freed several times during runtime.
The heap area is managed through instructions in the code area.
Stack area = memory area handled automatically by compilers. Through the code they generate,
compilers manage the stack area transparently to the programmer,
allocating and freeing local variables and the parameters passed to procedures.
Only when programming in assembly is it possible to handle the stack area directly,
with the appropriate instructions.
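AX, BX, CX, DX: general-purpose (arithmetic) registers, used to store data
SP, BP, DI, SI: pointer and index registers, used to access memory
To address memory spaces:
memory = segments
address = Segment:Offset
segment base = the 16-bit segment value with 0000 (four zero bits) appended at the right, i.e. multiplied by 16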
e.g. [CS] = 123Ah, [IP] = 341Bh
SEG:OFF = 123A:341B → 123A0h + 341Bh = 157BBh = physical address
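The same calculation as a small function, assuming real-mode 8086 behaviour: the segment is shifted left by 4 bits (multiplied by 16), the offset is added, and the result is truncated to 20 bits, as on the original 8086. `physical_address` is a hypothetical helper name.

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode 8086 address: segment * 16 + offset, truncated to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


# Values from the example: [CS] = 123Ah, [IP] = 341Bh
print(f"{physical_address(0x123A, 0x341B):05X}h")  # 157BBh

# A different Segment:Offset pair can point to the same physical address
print(f"{physical_address(0x157B, 0x000B):05X}h")  # 157BBh
```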
FLAGS are indicator bits (grouped into a status register named PSW),
normally read by conditional jump instructions.
From the top (most significant bit) to the bottom:
OF: overflow flag. It is set to 1 when the result of a signed
addition or subtraction causes an overflow
SF: sign flag. It is set to 1 when the result of an arithmetic/logic
operation is a negative number (it is the MSB of the result)
ZF: zero flag. It is set to 1 when the result of an arithmetic/logic
operation is zero
CF: carry flag. It is set to 1 when an arithmetic/logic operation produces a carry (or borrow)
out of the most significant bit (it indicates overflow for unsigned numbers)
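A sketch of how these four flags can be computed for a 16-bit addition, modelled on what an x86 ADD does to OF, SF, ZF and CF (the other flags are ignored). `add16_flags` is a hypothetical helper; the signed-overflow test checks that both operands have the same sign and the result's sign differs.

```python
def add16_flags(a: int, b: int) -> tuple[int, dict[str, int]]:
    """Add two 16-bit values and compute OF, SF, ZF, CF for the result."""
    full = a + b
    result = full & 0xFFFF
    flags = {
        # OF: operands have the same sign, the result has the opposite sign
        "OF": int(((a ^ result) & (b ^ result) & 0x8000) != 0),
        "SF": (result >> 15) & 1,      # copy of the MSB of the result
        "ZF": int(result == 0),
        "CF": int(full > 0xFFFF),      # carry out of bit 15 (unsigned overflow)
    }
    return result, flags


print(add16_flags(0x7FFF, 0x0001))  # (32768, {'OF': 1, 'SF': 1, 'ZF': 0, 'CF': 0})
print(add16_flags(0xFFFF, 0x0001))  # (0, {'OF': 0, 'SF': 0, 'ZF': 1, 'CF': 1})
```

The two examples show why OF and CF are separate: 7FFFh + 1 overflows as a signed number (32767 + 1) but not as an unsigned one, while FFFFh + 1 overflows only as an unsigned number (-1 + 1 = 0 is correct in signed arithmetic).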
SIMD (single instruction stream, multiple data streams):
an architecture in which several processing units process multiple data streams in parallel, all executing the same instruction.
This is used by vector processors and by processors that work in parallel.
SIMD is often used by supercomputers and, in some variants, even in modern microprocessors.
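The SIMD idea can be imitated in software with "SIMD within a register" (SWAR): four 16-bit values are packed into one 64-bit integer and added with a single addition, with masks preventing carries from crossing lane boundaries. This is only an illustration of the principle, similar in spirit to a packed-word addition on MMX hardware; `pack4`, `unpack4` and `packed_add16` are hypothetical names, and each lane wraps around on overflow.

```python
LANE_MASK = 0xFFFF              # each lane is 16 bits
HIGH_BITS = 0x8000800080008000  # top bit of each of the four lanes


def pack4(lanes: list[int]) -> int:
    """Pack four 16-bit values into one 64-bit integer (lane 0 = lowest bits)."""
    value = 0
    for i, lane in enumerate(lanes):
        value |= (lane & LANE_MASK) << (16 * i)
    return value


def unpack4(value: int) -> list[int]:
    """Split a 64-bit integer back into its four 16-bit lanes."""
    return [(value >> (16 * i)) & LANE_MASK for i in range(4)]


def packed_add16(a: int, b: int) -> int:
    """Add four 16-bit lanes at once, wrapping each lane (no carry crosses lanes)."""
    low = (a & ~HIGH_BITS) + (b & ~HIGH_BITS)  # add the low 15 bits of every lane
    # Restore each lane's top bit without letting its carry spill into the next lane
    return (low ^ ((a ^ b) & HIGH_BITS)) & 0xFFFFFFFFFFFFFFFF


x = pack4([1, 0xFFFF, 3, 4])
y = pack4([1, 1, 30, 40])
print(unpack4(packed_add16(x, y)))  # [2, 0, 33, 44]
```

One addition processes four data elements; dedicated SIMD hardware does the same with separate lanes built into the circuit, so no masking is needed.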
A thread is a sequence of code executed
by a processor; with many threads, your CPU can
handle several tasks at the same time. All CPUs have
active threads, and every process running on your
computer has at least one thread.
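A small example with Python's standard `threading` module: the process starts with one thread (the main thread), which launches two more threads, each handling its own task, and then waits for them with `join()`. The function `worker` and the split of the data are illustrative choices. In the standard CPython build, the global interpreter lock makes these threads take turns rather than run Python code truly in parallel, so the example shows the structure of multithreading, not a speedup.

```python
import threading
from typing import Dict, List


def worker(name: str, numbers: List[int], results: Dict[str, int]) -> None:
    """Task run by one thread: sum its own share of the data."""
    results[name] = sum(numbers)


data = list(range(1, 101))
results: Dict[str, int] = {}

# Every Python process starts with one thread, the main thread
print(threading.current_thread().name)  # MainThread

# The main thread starts two more threads, one per task...
threads = [
    threading.Thread(target=worker, args=("first half", data[:50], results)),
    threading.Thread(target=worker, args=("second half", data[50:], results)),
]
for t in threads:
    t.start()
# ...and waits for both to finish before combining their results
for t in threads:
    t.join()

print(sorted(results.items()))  # [('first half', 1275), ('second half', 3775)]
print(sum(results.values()))    # 5050
```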
Unlike a microprocessor, the MICROCONTROLLER combines all the elements in a single small package,
and in theory it needs no other external components to work.
Everything is contained in a single chip, including the program memory, the RAM,
the clock oscillator, the reset circuit and the peripherals.
The computing capabilities of a microcontroller are very limited. For example, its
RAM consists of a few hundred cells and is usually not expandable.
Typical applications of a microcontroller include burglar alarms, measuring instruments,
brightness controllers, battery chargers and transmitters/receivers.
For these reasons, microcontrollers are designed to run a small set of
specific functions, as in the case of a Digital Signal Processor, which carries out a
small set of signal processing functions and is widely used in cars, for example to
adjust the brakes on all four wheels or the air conditioning.
16-bit Processors and Segmentation (1978)
The IA-32 architecture family was preceded by 16-bit processors, the 8086 and 8088. The 8086 has 16-bit registers and a 16-bit external data bus, with 20-bit addressing giving a 1-MByte address space. The 8088 is similar to the 8086 except it has an 8-bit external data bus.
The 8086/8088 introduced segmentation to the IA-32 architecture. With segmentation, a 16-bit segment register contains a pointer to a memory segment of up to 64 KBytes. Using four segment registers at a time, 8086/8088 processors are able to address up to 256 KBytes without switching between segments. The 20-bit addresses that can be formed using a segment register and an additional 16-bit pointer provide a total address range of 1 MByte.
The Intel 286 Processor (1982)
The Intel 286 processor introduced protected mode operation into the IA-32 architecture. Protected mode uses the segment register content as selectors or pointers into descriptor tables. Descriptors provide 24-bit base addresses with a physical memory size of up to 16 MBytes, support for virtual memory management on a segment swapping basis, and a number of protection mechanisms. These mechanisms include:
• Segment limit checking
• Read-only and execute-only segment options
• Four privilege levels
The Intel386 Processor (1985)
The Intel386 processor was the first 32-bit processor in the IA-32 architecture family. It introduced 32-bit registers for use both to hold operands and for addressing. The lower half of each 32-bit Intel386 register retains the properties of the 16-bit registers of earlier generations, permitting backward compatibility. The processor also provides a virtual-8086 mode that allows for even greater efficiency when executing programs created for 8086/8088 processors.
In addition, the Intel386 processor has support for:
• A 32-bit address bus that supports up to 4-GBytes of physical memory
• A segmented-memory model and a flat memory model
• Paging, with a fixed 4-KByte page size providing a method for virtual memory management
• Support for parallel stages
The Intel486 Processor (1989)
The Intel486 processor added more parallel execution capability by expanding the Intel386 processor’s instruction decode and execution units into five pipelined stages. Each stage operates in parallel with the others on up to five instructions in different stages of execution.
In addition, the processor added:
• An 8-KByte on-chip first-level cache that increased the percent of instructions that could execute at the scalar rate of one per clock
• An integrated x87 FPU
• Power saving and system management capabilities
The Intel Pentium Processor (1993)
The introduction of the Intel Pentium processor added a second execution pipeline to achieve superscalar performance
(two pipelines, known as u and v, together can execute two instructions per clock). The on-chip first-level
cache doubled, with 8 KBytes devoted to code and another 8 KBytes devoted to data. The data cache uses the MESI
protocol to support more efficient write-back cache in addition to the write-through cache previously used by the
Intel486 processor. Branch prediction with an on-chip branch table was added to increase performance in looping constructs.
In addition, the processor added:
• Extensions to make the virtual-8086 mode more efficient and allow for 4-MByte as well as 4-KByte pages
• Internal data paths of 128 and 256 bits add speed to internal data transfers
• Burstable external data bus was increased to 64 bits
• An APIC to support systems with multiple processors
• A dual processor mode to support glueless two processor systems
A subsequent stepping of the Pentium family introduced Intel MMX technology (the Pentium Processor with MMX
technology). Intel MMX technology uses the single-instruction, multiple-data (SIMD) execution model to perform
parallel computations on packed integer data contained in 64-bit registers.
See Section 2.2.7, “SIMD Instructions.”
The P6 Family of Processors (1995-1999)
The P6 family of processors was based on a superscalar microarchitecture that set new performance standards; see
also Section 2.2.1, “P6 Family Microarchitecture.” One of the goals in the design of the P6 family microarchitecture
was to exceed the performance of the Pentium processor significantly while using the same 0.6-micrometer, four-layer,
metal BICMOS manufacturing process. Members of this family include the following:
• The Intel Pentium Pro processor is three-way superscalar. Using parallel processing techniques, the
processor is able on average to decode, dispatch, and complete execution of (retire) three instructions per
clock cycle. The Pentium Pro introduced the dynamic execution (micro-data flow analysis, out-of-order
execution, superior branch prediction, and speculative execution) in a superscalar implementation. The
processor was further enhanced by its caches. It has the same two on-chip 8-KByte 1st-Level caches as the
Pentium processor and an additional 256-KByte Level 2 cache in the same package as the processor.
• The Intel Pentium II processor added Intel MMX technology to the P6 family processors along with new
packaging and several hardware enhancements. The processor core is packaged in the single edge contact
cartridge (SECC). The Level 1 data and instruction caches were enlarged to 16 KBytes each, and Level 2 cache
sizes of 256 KBytes, 512 KBytes, and 1 MByte are supported. A half-frequency backside bus connects the Level
2 cache to the processor. Multiple low-power states such as AutoHALT, Stop-Grant, Sleep, and Deep Sleep are
supported to conserve power when idling.
• The Pentium II Xeon processor combined the premium characteristics of previous generations of Intel
processors. This includes: 4-way, 8-way (and up) scalability and a 2 MByte 2nd-Level cache running on a full-frequency backside bus.
• The Intel Celeron processor family focused on the value PC market segment. Its introduction offers an
integrated 128 KBytes of Level 2 cache and a plastic pin grid array (PPGA) form factor to lower system design costs.
• The Intel Pentium III processor introduced the Streaming SIMD Extensions (SSE) to the IA-32 architecture.
SSE extensions expand the SIMD execution model introduced with the Intel MMX technology by providing a
new set of 128-bit registers and the ability to perform SIMD operations on packed single-precision floating-point
values. See Section 2.2.7, “SIMD Instructions.”
• The Pentium III Xeon processor extended the performance levels of the IA-32 processors with the
enhancement of a full-speed, on-die, and Advanced Transfer Cache.
The Intel Pentium 4 Processor Family (2000-2006)
• The Intel Pentium 4 processor family is based on Intel NetBurst microarchitecture.
• The Intel Pentium 4 processor introduced Streaming SIMD Extensions 2 (SSE2).
• The Intel Pentium 4 processor 3.40 GHz, supporting Hyper-Threading Technology, introduced Streaming
SIMD Extensions 3 (SSE3); see Section 2.2.7, “SIMD Instructions.”
Intel 64 architecture was introduced in the Intel Pentium 4 Processor Extreme Edition supporting Hyper-Threading
Technology and in the Intel Pentium 4 Processor 6xx and 5xx sequences.
Intel Virtualization Technology (Intel VT) was introduced in the Intel Pentium 4 processor 672 and 662.