Nehalem optimizations: the powerful new Core i7
Here’s a piece I wrote for Avail Media to explain some of the Nehalem optimizations I made in the past month or two.
Note: “X/Y” instruction timing means a latency of X clocks (after issuing that instruction, one has to wait X clocks to get the result), and an inverse throughput of Y clocks (if one runs a ton of independent copies of that instruction one after another, a new one can start every Y clocks).
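To make the notation concrete, here is a rough sketch in Python. It is not part of the original piece: the function names are made up for illustration, and the model is deliberately simplistic, assuming nothing else in the pipeline gets in the way. It estimates how long n copies of an “X/Y” instruction take when each one depends on the previous result, versus when they are all independent.

```python
def dependent_chain_cycles(n, latency):
    """Each instruction needs the previous result, so latency dominates."""
    return n * latency


def independent_stream_cycles(n, latency, inv_throughput):
    """Independent instructions: one starts every inv_throughput clocks,
    and the last one still needs `latency` clocks to produce its result."""
    if n == 0:
        return 0
    return (n - 1) * inv_throughput + latency


# Hypothetical "3/1" instruction, 100 copies of it.
print(dependent_chain_cycles(100, 3))        # 300
print(independent_stream_cycles(100, 3, 1))  # 102
```

Under this model, latency is what matters for long dependency chains, while inverse throughput is what matters when there is plenty of independent work to interleave.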
The Nehalem CPU has a number of benefits over the previous Intel generation, the Penryn processor.
First of all, the Nehalem has a much faster SSE unit than the Penryn. A huge number of SSE operations have had their throughput doubled: