Successor of Itanium processor.
Major revision:
256 application registers:
64 predicate registers (PR0 - PR63): contain predicate test (compare) results, for conditional execution of instructions,
first 16 registers static: available to all programs,
rest rotating: can be renamed to accelerate loops.
8 branch registers (BR0 - BR7).
128 application registers (AR0 - AR127): special-purpose data and control registers.
4 Privilege Levels (PL): 0-3.
Current Privilege Level (CPL) in PSR.cpl (Processor Status Register, PSR).
Bi-endian memory access: controlled by UM.be bit (User Mask, UM).
Memory mapped I/O.
Processor virtualization: enabled by PSR.vm bit, managed by PAL.
Virtual Machine Monitor (VMM): managing and virtualizing system resources, creating a virtual environment.
From Montecito: Intel VT for Itanium (VT-i),
Virtual Processor Descriptor (VPD): description of resources of a single virtual processor,
rest of Virtual Processor State (VPS) maintained by VMM.
IA-32 compatibility mode: IA-32 System Environment, i.e. Pentium III.
16 bit Real Mode, 16 bit VM86, 16/32 bit Protected Mode, memory segmentation.
Multimedia instruction sets: MMX, SSE, SSE2 (from Intel Itanium 2 9000 series processor).
Switch between Itanium and IA-32 instruction sets using JMPE, br.ia, and rtfi.
All interruptions handled by Itanium instruction set code.
Current execution mode in PSR.is.
From Madison, IA-32 support implemented in software, as part of operating system (IA-32 Execution Layer, EL),
IA-32 EL provided by Intel for Linux and Windows,
erratum: segmentation not supported in IA-32 EL versions 4.3, 4.4, 5.3, and 6.5,
erratum: 16 bit application mode not supported in IA-32 EL versions 4.3, 4.4, 5.3, and 6.5,
note: CPUID returns only manufacturer and family of emulated processor model.
PA-RISC supported through Aries emulator.
Operating system: supported through Extensible Firmware Interface (EFI).
System Abstraction Layer (SAL): firmware providing platform initialization, configuration, and test, operating system boot, run-time functionality (i.e. BIOS (Basic Input Output System), Machine Checks, and Platform Management Interruptions (PMI, successor of the IA-32 System Management Mode (SMM))).
Processor Abstraction Layer (PAL): firmware providing processor specific Machine Checks, initialization, PMI, power management, configuration, and error recovery.
Developer's Interface Guide for IA-64 Servers (DIG64): design guidelines for building blocks and interfaces of IA-64 systems, providing an interoperable and stable baseline hardware interface for software developers.
On-die L1 cache (Harvard architecture):
On-die, unified L2 cache:
256 kbyte,
8-way set-associative, 128 byte line size,
write back, write-allocate,
non-blocking, out-of-order,
5 cycles minimum latency for integers, 6 cycles minimum latency for FPs,
16 byte banks,
32 Gbyte/s max. reading speed.
Cache coherency through MESI protocol.
From Itanium 2 9000 series processors: on-die L2 cache (Harvard architecture):
Cache coherency through MESI protocol.
8-way set-associative, 128 byte line size,
7 cycle latency,
8-way set-associative, 128 byte line size,
write-back, write-allocate,
non-blocking, out-of-order,
5 cycles minimum latency for integers, 6 cycles minimum latency for FPs,
16 byte banks,
32 Gbyte/s max. reading speed.
Intel Cache Safe Technology: protection of data and tags: double bit detection, single bit correction (ECC, Error-Correcting Code).
On-die, unified L3 cache:
up to 2x 12 = 24 Mbyte,
McKinley and Madison: 4-way set-associative per Mbyte, Madison 9M: 2-way set-associative per Mbyte; 128 byte line size,
fully pipelined, non-blocking,
McKinley: 12 cycles minimum latency for integers, 13 cycles minimum latency for FPs; Madison and Madison 9M: 14 cycles minimum latency for integers, 15 cycles minimum latency for FPs; Montecito: 14 cycles latency,
bandwidth to core 32 bytes per core cycle (256 bit bus to core),
6.2 Gbyte/s max. traffic speed to memory,
providing data to core at up to 48 Gbyte/s.
Cache coherency through MESI protocol,
Intel Cache Safe Technology: protection of data and tags: double bit detection, single bit correction (ECC, Error-Correcting Code).
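The peak L3-to-core figure can be checked from the 32 bytes per core cycle listed above. A minimal sketch, assuming a 1.5 GHz core clock (the clock value is an assumption for illustration; the function name is hypothetical):

```python
def l3_bandwidth_gbyte_s(core_clock_hz: float, bytes_per_cycle: int = 32) -> float:
    """Peak L3-to-core bandwidth in Gbyte/s (1 Gbyte = 10**9 bytes)."""
    return core_clock_hz * bytes_per_cycle / 1e9


# Assumed 1.5 GHz core clock, 256 bit (32 byte) bus to the core.
print(l3_bandwidth_gbyte_s(1.5e9))  # 48.0
```

With these assumptions the result matches the "up to 48 Gbyte/s" entry; a slower core scales the figure down proportionally.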
Translation Look-aside Buffer (TLB) and Virtual Hash Page Table (VHPT):
Hardware Page Walker (HPW): loads VHPT from L2 cache / L3 cache / memory at TLB misses.
64 entry, fully associative,
only page size of 4 kbyte supported,
2 cycles latency,
128 entry, fully associative,
page size 4 kbyte - 256 Mbyte supported.
32 entry, fully associative,
2 cycles penalty at miss,
only page size of 4 kbyte supported,
128 entry, fully associative,
page size 4 kbyte - 4 Gbyte supported.
Advanced Load Address Table (ALAT): between L1 data cache (L1D) and DTLB, keeps track of speculative data loads,
32 entry, fully associative.
Double pipeline: 8 stage in-order, 6 instructions wide.
Split issue dispersal: three instructions (16 bytes) per bundle.
Scoreboarding, non-blocking caches (for compile-time non-determinism).
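The "three instructions (16 bytes) per bundle" entry can be illustrated by splitting a raw bundle into its fields. This is a sketch that assumes the usual Itanium bundle layout (a 5 bit template in the low bits followed by three 41 bit instruction slots, stored little-endian); that layout comes from the architecture manuals, not from this page, and the function name is hypothetical.

```python
TEMPLATE_BITS = 5   # assumed: template field in bits 4..0
SLOT_BITS = 41      # assumed: three 41 bit instruction slots


def split_bundle(raw: bytes):
    """Split a 16 byte bundle into (template, (slot0, slot1, slot2))."""
    if len(raw) != 16:
        raise ValueError("a bundle is exactly 16 bytes")
    value = int.from_bytes(raw, "little")
    template = value & ((1 << TEMPLATE_BITS) - 1)
    slots = tuple(
        (value >> (TEMPLATE_BITS + i * SLOT_BITS)) & ((1 << SLOT_BITS) - 1)
        for i in range(3)
    )
    return template, slots


# Build an artificial bundle: template 0x10, slots holding 1, 2, 3.
packed = 0x10 | (1 << 5) | (2 << 46) | (3 << 87)
print(split_bundle(packed.to_bytes(16, "little")))
```

The field widths add up to 5 + 3 x 41 = 128 bits, which is the 16 byte bundle size given above.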
Execution units, all fully pipelined:
1 cycle latency,
2 cycles latency,
executing 1 SIMD FP operation per cycle,
only a single issue port for PMUL and POPCNT,
ANSI/IEEE-754,
FMAC: Floating Point Multiply Add Calculation: multiply and add of 82 bit floating point values in one cycle (for matrix calculations),
4 cycles latency,
4 cycles latency,
11 issue ports:
serving the execution units above.
Dynamic prefetch, optimized branch prediction, speculative execution.
Branch prediction:
512 entry, two-level.
Branch Target Address Cache (BTAC): 64 entry.
Interval Time Counter (ITC): register for timing ticks.
In 32 bit compatibility mode: Time Stamp Counter (TSC).
Streamlined Advanced PIC (SAPIC): based on IA-32 APIC (Advanced Programmable Interrupt Controller),
for Aborts, Interrupts, Faults, and Traps:
Virtual address space: 64 bit, no segmentation.
Multiple Address Space (MAS): each process has its own unique Virtual Region (flat linear address space).
8 61 bit Virtual Regions (Virtual Region Number, VRN; Region Identifier, RID), 2^24 Virtual Address Spaces of 2^61 bytes.
4 kbyte - 4 Gbyte pages (Virtual Page Number, VPN).
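The region layout above can be sketched as a split of a 64 bit virtual address. The example assumes that the VRN occupies the top three bits (63-61) and the remaining 61 bits form the offset within the region; the function names and the sample address are illustrative only.

```python
REGION_OFFSET_BITS = 61  # 8 regions x 61 bit offset = 64 bit virtual address


def split_virtual_address(va: int):
    """Return (vrn, offset) for a 64 bit virtual address."""
    if not 0 <= va < 1 << 64:
        raise ValueError("virtual address must fit in 64 bits")
    vrn = va >> REGION_OFFSET_BITS                    # assumed: bits 63..61
    offset = va & ((1 << REGION_OFFSET_BITS) - 1)     # bits 60..0
    return vrn, offset


def virtual_page_number(offset: int, page_shift: int = 12) -> int:
    """VPN of an in-region offset; page_shift=12 corresponds to 4 kbyte pages."""
    return offset >> page_shift


vrn, offset = split_virtual_address(0xE000000000001234)
print(vrn, hex(offset), virtual_page_number(offset))  # 7 0x1234 1
```

In hardware the VRN selects a region register holding the RID; the sketch only shows the bit arithmetic, not the translation itself.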
Physical address space: 63 bit.
Up to 50 bits supported in page tables.
Write Coalescing (WC): streams of non-cachable writes can be combined into a single bus write transaction.
WC Buffer (WCB): two-entry, 128 byte.
Enhanced Machine Check Architecture (EMCA): parity and ECC (Error-Correcting Code) on all major address and data busses.
50 bit address bus.
Physical addressing:
200/266/333 MHz DDR bus (McKinley bus, Scalability Port): 128 bit data.
Source Synchronous Signaling (SSS).
6.4/8.5/10.6 Gbyte/s max. throughput.
Assisted Gunning Transceiver Logic signaling (AGTL+),
based on GTL+ bus of Intel Pentium III and Pentium III Xeon processors.
1.5 V ± 1.5 %.
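The throughput figures of the 128 bit DDR bus listed above follow from clock, width, and two transfers per clock. A rough sketch (the function name is hypothetical, and 1 Gbyte is taken as 10^9 bytes):

```python
def bus_throughput_gbyte_s(bus_clock_mhz: float,
                           data_bits: int = 128,
                           transfers_per_clock: int = 2) -> float:
    """Peak bus throughput in Gbyte/s for a double data rate bus."""
    bytes_per_transfer = data_bits // 8
    return bus_clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer / 1e9


for clock in (200, 266, 333):
    print(clock, "MHz ->", bus_throughput_gbyte_s(clock), "Gbyte/s")
```

For 200 MHz this gives 6.4 Gbyte/s; the 266 and 333 MHz results land close to the listed 8.5 and 10.6 Gbyte/s, with small differences that depend on how the nominal clock is rounded.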
Power pod connector.
Tests:
Processor performance monitoring and profiling:
Hyper-Threading Technology (HTT) (from Montecito),
Temporal Multi-Threading (TMT; Switch-on-Event Multi-Threading, SoEMT): threads not running simultaneously, core switches in case of high-latency event.
SMP (Symmetric Multi-Processing): glueless up to four processors (max. 16 in IA-32 compatibility mode).
Max. four processors at 200 MHz, max. two processors at 266 or 333 MHz.
Shared memory, cache coherency through MESI protocol.
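The MESI states mentioned above can be sketched as a small state machine for one cache line. This is the simplified textbook protocol, not the processor's actual bus-level implementation; event names and the function are hypothetical, and write-backs and bus transactions are not modelled.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def next_state(state: State, event: str, others_have_copy: bool = False) -> State:
    """Next state of one line; events: local_read, local_write, snoop_read, snoop_write."""
    if event == "local_write":
        return State.MODIFIED            # other copies get invalidated
    if event == "snoop_write":
        return State.INVALID             # another processor takes ownership
    if event == "local_read":
        if state is State.INVALID:       # miss: fill as Shared or Exclusive
            return State.SHARED if others_have_copy else State.EXCLUSIVE
        return state                     # hit: state unchanged
    if event == "snoop_read":
        if state is State.INVALID:
            return State.INVALID
        return State.SHARED              # M would write back first
    raise ValueError(f"unknown event: {event}")


line = State.INVALID
for ev in ("local_read", "local_write", "snoop_read", "snoop_write"):
    line = next_state(line, ev)
    print(ev, "->", line.value)
```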
Multiplier (Phase Lock Loop, PLL):
set through pins during reset:
| Multiplier | Pins A21# - A17# |
| 2/9 | 10110 |
| 2/10 | 10101 |
| 2/13 | 10010 |
| 2/14 | 10001 |
| 2/15 | 10000 |
| 2/16 | 01111 |
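The multiplier settings above can be turned into core clocks with a small lookup. The sketch assumes that "2/N" is the bus-to-core clock ratio (core clock = bus clock x N / 2) and that the pin pattern is written in the order A21# to A17#; both readings are assumptions, as are the names used.

```python
from fractions import Fraction

# Pin pattern (assumed order A21# .. A17#) -> bus:core ratio from the table.
BUS_TO_CORE_RATIO = {
    "10110": Fraction(2, 9),
    "10101": Fraction(2, 10),
    "10010": Fraction(2, 13),
    "10001": Fraction(2, 14),
    "10000": Fraction(2, 15),
    "01111": Fraction(2, 16),
}


def core_clock_mhz(pins: str, bus_clock_mhz: int = 200) -> float:
    """Core clock in MHz for a pin pattern, assuming core = bus / ratio."""
    return float(bus_clock_mhz / BUS_TO_CORE_RATIO[pins])


for pins in BUS_TO_CORE_RATIO:
    print(pins, core_clock_mhz(pins))
```

Under these assumptions a 200 MHz bus with pattern 10000 (ratio 2/15) corresponds to a 1500 MHz core.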
Power and performance management:
P-states:
Performance:
Thermal management: via on-die thermal diode:
System management: System Management Bus (SMBus):
Marking:
CPUID: 8 byte registers:
| Family number | Model number | Processor |
| 0x07 | 0x00 | Itanium Merced |
| 0x1F | 0x00 | Itanium 2 McKinley |
| 0x1F | 0x01 | Itanium 2 Madison, Deerfield, Hondo |
| 0x1F | 0x02 | Itanium 2 Madison 9M, Fanwood |
| 0x20 | 0x00 | Itanium 2 9000 series Montecito, Millington |
CPUID return values:
| 0x10 | L1D: 16 kbyte, 4-way set-associative, 32 byte line size |
| 0x15 | L1I: 16 kbyte, 4-way set-associative, 32 byte line size |
| 0x1A | L2: 96 kbyte, 6-way set-associative, 64 byte line size |
| 0x88 | L3: 2 Mbyte, 4-way set-associative, 64 byte line size |
| 0x89 | L3: 4 Mbyte, 4-way set-associative, 64 byte line size |
| 0x8A | L3: 8 Mbyte, 4-way set-associative, 64 byte line size |
| 0x90 | ITLB: 64 entry, fully associative, 4 kbyte - 256 Mbyte pages |
| 0x96 | DTLB0: 32 entry, fully associative, 4 kbyte - 256 Mbyte pages |
| 0x9B | DTLB1: 96 entry, fully associative, 4 kbyte - 256 Mbyte pages |
With the EAX register set to 2, CPUID returns the descriptors in the EAX, EBX, ECX, and EDX registers (bytes listed MSB - LSB):
| EAX | 0x00 | 0x15 | 0x10 | 0x00 |
| EBX | 0x00 | 0x00 | 0x88/0x89 | 0x00 |
| ECX | 0x00 | 0x9B | 0x00 | 0x00 |
| EDX | 0x80 | 0x00 | 0x00 | 0x00 |
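The register dump above can be decoded byte by byte against the descriptor table. A minimal sketch: only the descriptors that appear in the dump are included, EBX is shown with the 0x88 variant, 0x00 bytes are treated as empty, and the function names are hypothetical.

```python
# Subset of the descriptor table above (only codes present in the dump).
DESCRIPTORS = {
    0x10: "L1D: 16 kbyte, 4-way set-associative, 32 byte line size",
    0x15: "L1I: 16 kbyte, 4-way set-associative, 32 byte line size",
    0x88: "L3: 2 Mbyte, 4-way set-associative, 64 byte line size",
    0x9B: "DTLB1: 96 entry, fully associative, 4 kbyte - 256 Mbyte pages",
}


def descriptor_bytes(register_value: int):
    """Non-zero bytes of a 32 bit register, ordered MSB to LSB."""
    return [b for b in register_value.to_bytes(4, "big") if b != 0x00]


def describe(registers: dict):
    """One text line per descriptor byte found in the registers."""
    lines = []
    for name, value in registers.items():
        for code in descriptor_bytes(value):
            text = DESCRIPTORS.get(code, "not listed in the table above")
            lines.append(f"{name} 0x{code:02X}: {text}")
    return lines


example = {"EAX": 0x00151000, "EBX": 0x00008800,
           "ECX": 0x009B0000, "EDX": 0x80000000}
for line in describe(example):
    print(line)
```

The 0x80 byte in EDX has no entry in the table, so the sketch reports it as not listed rather than guessing its meaning.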
IA-32 CPUID cache return values:
| 0x67 | L1D: 16 kbyte, 4-way set-associative, 64 byte line size |
| 0x77 | L1I: 16 kbyte, 4-way set-associative, 64 byte line size |
| 0x7E | L2: 256 kbyte, 8-way set-associative, 128 byte line size |
| 0x8D | L3: 3 Mbyte, 12-way set-associative, 128 byte line size |
Used by HP as PA-RISC replacement, and in High Performance Computing (HPC).