This diff is closely related to this post.
If you look through Intel’s documentation for their instruction set, a few things stand out. In particular, there are a whole bunch of instructions that do exactly the same thing but have different opcodes. Intel introduced “movaps” and “movups” for aligned and unaligned 128-bit moves in SSE1, and then “movdqa” and “movdqu” in SSE2… to do exactly the same thing. The same goes for pand and andps, among others. The end result is a number of problems:
1. Opcode space wasted on instructions that do exactly the same thing.
2. Wasted executable size, since movdqa is one byte longer than movaps (a 3-byte opcode, 66 0F 6F, versus a 2-byte one, 0F 28) despite doing exactly the same thing.
3. Loss of our sanity.
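To make point 2 concrete, here is a small Python sketch tabulating hand-assembled encodings for a few of these twin instructions. The operands are fixed purely for illustration (xmm0 loaded from [rax] for the moves, xmm0 ANDed with xmm1 for the bitwise ops), and the byte values follow the opcode tables in Intel's manual. The names `PAIRS`, `prefix_of` and `size_overhead` are hypothetical helpers for this example only. In each pair the SSE2 form carries a leading 0x66 or 0xF3 mandatory prefix, which is where the extra byte comes from.

```python
from typing import Dict, Optional

# Hand-assembled encodings, one fixed operand choice per instruction
# (illustrative): xmm0 <- [rax] for the moves, xmm0 &= xmm1 for the ANDs.
# Each entry pairs the SSE1 form with its SSE2 integer-domain twin.
PAIRS = [
    (("movaps xmm0, [rax]", bytes([0x0F, 0x28, 0x00])),
     ("movdqa xmm0, [rax]", bytes([0x66, 0x0F, 0x6F, 0x00]))),
    (("movups xmm0, [rax]", bytes([0x0F, 0x10, 0x00])),
     ("movdqu xmm0, [rax]", bytes([0xF3, 0x0F, 0x6F, 0x00]))),
    (("andps xmm0, xmm1", bytes([0x0F, 0x54, 0xC1])),
     ("pand xmm0, xmm1", bytes([0x66, 0x0F, 0xDB, 0xC1]))),
]

# Prefix bytes that SSE instructions use as part of their opcode.
MANDATORY_PREFIXES = {0x66, 0xF2, 0xF3}


def prefix_of(code: bytes) -> Optional[int]:
    """Return the leading mandatory prefix byte, or None if there is none."""
    return code[0] if code[0] in MANDATORY_PREFIXES else None


def size_overhead(pairs=PAIRS) -> Dict[str, int]:
    """Map each SSE2 mnemonic to how many bytes longer it is than its SSE1 twin."""
    return {new[0]: len(new[1]) - len(old[1]) for old, new in pairs}


if __name__ == "__main__":
    # Print each pair side by side with its bytes and total length.
    for (old_name, old), (new_name, new) in PAIRS:
        print(f"{old_name:20} {old.hex(' ').upper():12} ({len(old)} bytes)")
        print(f"{new_name:20} {new.hex(' ').upper():12} ({len(new)} bytes)")
        print()
```

Every pair comes out one byte apart, and since the two forms behave identically, that byte buys nothing.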