Diary Of An x264 Developer

10/04/2009 (4:43 am)

Why so many H.264 encoders are bad

If one works long enough with a large number of H.264 encoders, one might notice that most of them are pretty much awful.  This of course shouldn’t be a surprise: Sturgeon’s Law says that “90% of everything is crap”.  It’s also exacerbated by the fact that H.264 is the most widely-accepted video standard in years and has spawned a huge amount of software that implements it, thus generating more mediocre implementations.

But even this doesn’t really explain the massive gap between good and bad H.264 encoders.  Good H.264 encoders, like x264, can beat previous-generation encoders like Xvid visually at half the bitrate in many cases.  Yet bad H.264 encoders are often so terrible that they lose to MPEG-2!  The disparity wasn’t nearly this large with previous standards… and there’s a good reason for this.

H.264 offers a great variety of compression features, more than any previous standard.  This also greatly increases the number of ways that encoder developers can shoot themselves in the foot.  In this post I’ll go through a sampling of these.  Most of the problems stem from the single fact that blurriness seems good when using mean squared error as a mode decision metric.
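
A tiny worked example makes the problem concrete (the numbers below are made up purely for illustration, not taken from any real encoder): take a row of pixels with fine detail, and compare a reconstruction that blurs the detail away against one that keeps the detail but shifts it by one pixel. Squared error happily picks the blur.

#include <stdio.h>

/* Toy illustration only -- made-up numbers, not from any real encoder.
 * ssd() is the sum of squared differences, i.e. MSE up to a constant. */
static int ssd(const int *a, const int *b, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        int d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

int main(void)
{
    int src[8]     = {  10, -10,  10, -10,  10, -10,  10, -10 }; /* fine detail             */
    int blurred[8] = {   0,   0,   0,   0,   0,   0,   0,   0 }; /* detail smoothed away    */
    int shifted[8] = { -10,  10, -10,  10, -10,  10, -10,  10 }; /* detail kept, off by one */

    printf("SSD vs blurred: %d\n", ssd(src, blurred, 8)); /* 800  -- "better" by MSE */
    printf("SSD vs shifted: %d\n", ssd(src, shifted, 8)); /* 3200 -- "worse" by MSE  */
    return 0;
}

A viewer would almost always prefer the second reconstruction, but an encoder making its mode decisions purely by squared error will take the blur every time.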

Read More…

06/30/2009 (3:28 pm)

Chroma encoding revisited

Chroma has always been a ripe target for optimization. We have to perform transform+quantization on every block, but the vast majority of blocks end up having not a single nonzero coefficient to code, so it seems as if we wasted our time doing all that arithmetic only to find out that there was no information there anyway. But we can’t just skip it, because the few times that there are coefficients, they are very important. Part of this problem is unique to H.264, which has a rather curious method of encoding its chroma; I’ll describe it here for those not familiar with it.

For each chroma channel in the current macroblock, 4 4×4 transforms are performed on the residual, making up an 8×8 block. Then, the DC coefficients of each transform are collected and put into a separate 2×2 block, which is transformed again with a Hadamard transform. In the bitstream, the encoder can signal three modes, which apply to both chroma channels. The first mode, 0, simply says there is no chroma data. The second mode, 1, says there is DC data, but not AC data (the rest of the coefficients that weren’t put into that special 2×2 block). The third mode, 2, says that there is both DC and AC data. Since having AC but not DC data is extremely rare, there is no special mode for this.
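
To make the three modes concrete, here’s a rough sketch of how the signalled value could be chosen from the final quantized coefficients. The names and data layout are mine, not x264’s; the input is assumed to be one 2×2 DC block and four 4×4 AC blocks per chroma plane.

#include <stdint.h>

/* Rough sketch only; names and layout are mine, not x264's. */
enum { CBP_CHROMA_NONE = 0, CBP_CHROMA_DC = 1, CBP_CHROMA_DC_AC = 2 };

int chroma_cbp(const int16_t dc[2][4], const int16_t ac[2][4][15])
{
    int has_dc = 0, has_ac = 0;
    for (int p = 0; p < 2; p++) {               /* Cb and Cr share a single signalled mode */
        for (int i = 0; i < 4; i++)
            has_dc |= dc[p][i] != 0;            /* the Hadamard-transformed 2x2 DC block   */
        for (int b = 0; b < 4; b++)
            for (int i = 0; i < 15; i++)
                has_ac |= ac[p][b][i] != 0;     /* the 15 remaining coefficients per 4x4   */
    }
    if (has_ac) return CBP_CHROMA_DC_AC;        /* mode 2: DC and AC                       */
    if (has_dc) return CBP_CHROMA_DC;           /* mode 1: DC only                         */
    return CBP_CHROMA_NONE;                     /* mode 0: no chroma data at all           */
}

Note how the AC-but-no-DC case simply falls into mode 2, exactly as described above.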

Read More…

10/22/2008 (12:41 pm)

In the pipeline, part 2

From the local commit git log:

Read More…

07/17/2008 (12:58 pm)

Psy RDO

Many of you may have seen this thread on the Doom9 forums. But what exactly is Psy RDO, other than a magical patch that increases detail retention and sharpness with seemingly few to no downsides? Well, here’s how it works.

Psy RDO measures, in addition to normal PSNR-wise distortion, the difference in complexity between the reconstructed frame (what the video will look like on decoding) and the source frame. It favors reconstructions whose complexity is as close to that of the original frame as possible. Thus, it strongly biases against blurring and, for that matter, against any significant loss of detail. Even on non-grainy sources, this generally results in a reduction of banding, increased sharpness, and better retention of fine details. The speed cost is relatively low, though Psy RDO requires regular RD to be enabled.
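
In rough pseudocode, the idea looks something like the following; the names, the weighting, and the particular complexity measure are simplifications of mine, not x264’s exact implementation.

#include <stdlib.h>

/* Rough sketch of the psy-RD idea; x264's real code differs in detail.
 * "energy" stands in for some complexity measure of a block, e.g. SATD
 * of the block with its DC removed. */
static int rd_cost_psy(int ssd,          /* ordinary PSNR-wise distortion     */
                       int bits,         /* bit cost of this candidate mode   */
                       int lambda,       /* rate-distortion tradeoff factor   */
                       int energy_src,   /* complexity of the source block    */
                       int energy_rec,   /* complexity of the reconstruction  */
                       int psy_strength) /* user-tunable psy weighting        */
{
    int cost = ssd + lambda * bits;                      /* plain RD cost                    */
    cost += psy_strength * abs(energy_src - energy_rec); /* penalize lost/invented complexity */
    return cost;
}

Since blurring always drops the reconstruction’s complexity well below the source’s, the extra term makes blurry candidates look expensive, which is exactly the bias we want.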

In all tests so far, this new metric appears to beat out FGO.

And in the spirit of the improved quality, I’ve made a new codec comparison with the latest psy-RDO x264 against basically every other encoder on my (and various other people’s) hard disks.

06/03/2008 (11:03 am)

The power of CABAC

Today we’ll take a post directly from the stream analyzer.

position : 24×25 (384×400)
mb_addr : 1024
size (in bits) : 1
mb_type : 1
pmode : 0
mb_type : Intra(I_16x16_0_0_0)
slice_number : 0
transform_8x8 : 0
field\frame : frame
cbp bits
: 0000 00 00
: 0000 00 00
: 0000
: 0000
quant_param : 0
pmode : Intra_16x16
ipred : 16x16_Vertical

position : 23×27 (368×432)
mb_addr : 1103
size (in bits) : 15
mb_type : 6
pmode : 0
mb_type : Intra(I_16x16_0_0_0)
slice_number : 0
transform_8x8 : 0
field\frame : frame
cbp bits
: 0000 00 00
: 0000 00 00
: 0000
: 0000
quant_param : 0
pmode : Intra_16x16
ipred : 16x16_Vertical

Same macroblock type. Same (zero) coded block pattern. Same prediction mode. Except that one is 1 bit, and the other is 15.  Both are CABAC; the only difference is the state of the context table.
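
A quick back-of-the-envelope calculation shows how the context state alone can produce that kind of spread. The probabilities below are invented for illustration and have nothing to do with the actual CABAC state tables.

#include <math.h>
#include <stdio.h>

/* An arithmetic coder spends roughly -log2(p) bits on a symbol that the
 * context model assigns probability p.  Illustration only; these numbers
 * are made up, not taken from the real CABAC state machine. */
int main(void)
{
    double expected   = 0.95; /* context strongly agrees with what we code   */
    double unexpected = 0.05; /* context strongly expects the opposite value */

    printf("expected symbol:   %.2f bits\n", -log2(expected));   /* ~0.07 bits */
    printf("unexpected symbol: %.2f bits\n", -log2(unexpected)); /* ~4.32 bits */
    return 0;
}

A macroblock like the ones above is only a handful of binarized symbols, so when the contexts all agree with the data the whole thing can come out to about a bit, while the very same data coded against unfavorable contexts costs an order of magnitude more.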

05/01/2008 (5:16 pm)

Inter RD refine bugs

I think I’ve found two of these in one week now.

For those who don’t know, on inter blocks in P-frames, subme 7 works by a process called “qpel RD”; that is, it does an ordinary subpixel refinement of the motion vectors, except that instead of the usual fast metric (SAD or SATD), it does a full rate-distortion comparison on each qpel position of a hexagonal search. Of course, to increase speed, it uses a SATD threshold above which it won’t bother with the whole RD process. The reason for this is obvious when you see the numbers: a full-macroblock SATD takes about 450 clocks, while a full-macroblock RD takes about 10,000 clocks. So even if the SATD check only let us skip an RD evaluation one time in twenty, it would pay for itself; in practice, of course, it skips far more often than that.
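
The gating itself is simple; here’s a hedged sketch of the idea, with made-up names and a made-up threshold rather than x264’s actual code.

/* Sketch only: satd_cost() and rd_cost() are hypothetical stand-ins for
 * the cheap and expensive metrics, and the 9/8 threshold is invented. */
int satd_cost(int mx, int my);
int rd_cost(int mx, int my);

int try_qpel_position(int mx, int my, int best_satd, int *best_rd)
{
    int satd = satd_cost(mx, my);      /* ~hundreds of clocks per macroblock  */
    if (satd > best_satd * 9 / 8)      /* clearly worse candidates skip RD... */
        return 0;
    int rd = rd_cost(mx, my);          /* ...which costs ~10,000 clocks       */
    if (rd < *best_rd) {
        *best_rd = rd;
        return 1;                      /* this qpel position is the new best  */
    }
    return 0;
}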

Now, onto the bugs. The first one was in CAVLC: in an inter block using the 8×8 DCT, the counts of non-zero coefficients in each DCT block were not being calculated, resulting in incorrect bit-cost calculations. This wasn’t a problem for CABAC, because CABAC only needs to know whether a block is all-zero or not, while CAVLC needs to know exactly how many non-zero coefficients there are. This was resolved as part of my overhaul of the nnz code.
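
The distinction that made this a CAVLC-only bug, in sketch form (names are mine, not x264’s):

#include <stdint.h>

/* CAVLC codes the exact count of nonzero coefficients in a block, so it
 * needs this number; CABAC only needs to know whether it is zero or not. */
static int count_nonzero(const int16_t *coef, int n)
{
    int nnz = 0;
    for (int i = 0; i < n; i++)
        nnz += coef[i] != 0;
    return nnz;   /* CAVLC uses the value; CABAC effectively uses (nnz != 0) */
}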

The second bug I ran into just today: when RD was done on 8×16 and 16×8 blocks, for simplicity’s sake the RD function encoded two 8×8 blocks separately and then summed the resulting scores. This isn’t a problem per se, but the 8×8 RD function went on to check what type of 8×8 block it was: an 8×8, two 8×4s, two 4×8s, or four 4×4 blocks. An 8×16 or 16×8 block can’t have such subblocks, while an 8×8 block can. This again isn’t a problem… except that when a 16×8 or 8×16 block type is chosen, the subpartition types used for the 8×8 search aren’t reset. So if the 8×8 search chose a sub-8×8 block type, the 8×8 block encode now thinks that we have a sub-8×8 block type even though we cannot possibly have one. The solution was pretty simple.
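
Sketched out, with hypothetical names standing in for x264’s internal structures, the fix amounts to clearing the leftover sub-partition types before handing each half of the partition to the 8×8 RD routine.

/* Hypothetical analysis context and helper -- not x264's actual structures. */
enum subpart { SUB_8x8, SUB_8x4, SUB_4x8, SUB_4x4 };
struct mb_analysis { enum subpart sub_partition[4]; /* ... */ };
int rd_cost_8x8(struct mb_analysis *a, int b8);

/* RD score for one 16x8 or 8x16 partition covering 8x8 blocks b8a and b8b. */
int rd_cost_part(struct mb_analysis *a, int b8a, int b8b)
{
    a->sub_partition[b8a] = SUB_8x8;   /* the fix: forget any sub-8x8 split    */
    a->sub_partition[b8b] = SUB_8x8;   /* left behind by an earlier 8x8 search */
    return rd_cost_8x8(a, b8a) + rd_cost_8x8(a, b8b);
}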