Chroma encoding revisited
Chroma has always been a ripe target for optimization. We have to perform transform and quantization on every block, but the vast majority of blocks end up with not a single nonzero coefficient to code, so it seems as if we wasted our time doing all that arithmetic only to find out that there was no information there anyway. But we can’t just skip it, because the few times that there are coefficients, they are very important. Part of this problem is unique to H.264, which has a rather curious method of encoding its chroma, which I will describe here for those not familiar with it.
For each chroma channel in the current macroblock, four 4×4 transforms are performed on the residual, which together cover an 8×8 block. Then the DC coefficient of each transform is collected into a separate 2×2 block, which is transformed again with a Hadamard transform. In the bitstream, the encoder signals one of three modes, which applies to both chroma channels. The first mode, 0, simply says there is no chroma data. The second mode, 1, says there is DC data, but no AC data (the AC data being the rest of the coefficients, the ones that weren’t put into that special 2×2 block). The third mode, 2, says that there is both DC and AC data. Since having AC but not DC data is extremely rare, there is no special mode for this; that case is simply coded as mode 2.
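To make the above concrete, here is a minimal sketch in Python of the 2×2 Hadamard step and of the mode decision. It is not x264's code: the names hadamard_2x2 and chroma_mode are made up for illustration, the coefficients are assumed to be already quantized, and any scaling or normalization of the transform is left out.

```python
def hadamard_2x2(dc):
    """Unnormalized 2x2 Hadamard transform of the four DC terms [[a, b], [c, d]]."""
    (a, b), (c, d) = dc
    return [[a + b + c + d, a - b + c - d],
            [a + b - c - d, a - b - c + d]]


def chroma_mode(dc_u, ac_u, dc_v, ac_v):
    """Pick the chroma mode shared by both channels.

    dc_u, dc_v: the four quantized DC coefficients of each channel (flat lists).
    ac_u, ac_v: for each channel, four lists of quantized AC coefficients,
                one list per 4x4 block.
    Returns 0 (no chroma data), 1 (DC only) or 2 (DC and AC).
    """
    has_ac = any(c != 0 for ch in (ac_u, ac_v) for blk in ch for c in blk)
    if has_ac:
        # There is no "AC only" mode, so any AC data means mode 2.
        return 2
    has_dc = any(c != 0 for ch in (dc_u, dc_v) for c in ch)
    return 1 if has_dc else 0
```

For example, a completely flat set of DC terms such as [[5, 5], [5, 5]] transforms to [[20, 0], [0, 0]], and a macroblock whose only nonzero chroma coefficient is a single DC term in one channel gets mode 1.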