AMD Announces SSE5 Instruction Extensions

Specifications available today

August 30, 2007 - Today, AMD announced the SSE5 extensions to the x86 instruction set. The extensions let software developers simplify code and improve efficiency in the most performance-hungry applications. According to AMD, high-performance computing (HPC), multimedia, and security applications will benefit most.

As the industry's focus shifts from raw processor speed to power efficiency, the number of instructions executed per second on a single processor core remains relatively constant. As a result, both software and hardware vendors must pursue new approaches to improving computing performance.

A floating-point matrix multiply using the new SSE5 extensions is 30 percent faster than a similar algorithm implemented with the existing SSE instructions, Leendert van Doorn, a senior fellow at AMD, told DDJ. Discrete cosine transforms (DCTs), a basic building block of multimedia encoders, see a 20 percent performance improvement. And the Advanced Encryption Standard (AES) algorithm runs five times faster with the new SSE5 extensions than an AES implementation that uses only the AMD64 instructions.

The new SSE5 instructions are part of AMD's Extensions for Software Parallelism initiative, which is designed to optimize programming for multicore chips. Two weeks ago, the chip manufacturer introduced the Light-Weight Profiling Proposal (LWP), an extension that lets user-space processes gather performance data about themselves with very low overhead.

New SSE5 Instructions

The 170 new SSE5 instructions include:

  • Three-operand instructions:
    A computing instruction is executed by applying a mathematical or logical function to operands, or inputs. By increasing the number of operands an x86 instruction can handle from two to three, SSE5 enables the consolidation of multiple simple instructions into a single, more effective instruction. Three-operand instructions are currently available only on certain RISC architectures.
  • Fused Multiply Accumulate:
    The three-operand capability enables new instructions that efficiently execute complex calculations. The Fused Multiply Accumulate instruction combines a multiplication and an addition in a single instruction, so iterative calculations that repeatedly multiply and add need only one instruction per step. The simpler code enables faster execution for more realistic graphics shading, rapid photographic rendering, spatialized audio, complex vector mathematics, and other performance-intensive applications.

The full SSE5 specification is available from AMD. The new instructions will be implemented in AMD's Bulldozer core (K11), available in 2009. Updates to the GCC compiler will be available this week, van Doorn stated, and AMD CodeAnalyst Performance Analyzer, Performance Library, Core Math Library, and SimNow! have all been updated with SSE5 support.

