Diary Of An x264 Developer

05/03/2008 (1:30 am)

Film grain optimization

Optimizing an encoder for film grain is a tough problem. For one, film grain is, by definition, basically uncorrelated between frames; that is, film grain from a previous frame is totally useless in encoding the current frame’s film grain (at least it would seem!). This would suggest that intra blocks are necessary for encoding film grain, and that is generally what encoders end up choosing. Yet this runs into another problem: film grain is made up of a whole slew of spatial frequencies, many of which cannot be represented at the quantizers often used in P/B-frames! This makes it extremely difficult to represent the film grain efficiently at reasonable bitrates.

But we can cheat.

Previous frames have a lot of the necessary spatial frequencies, so why not steal them? Sure, an inter block won’t predict the grain as accurately as an intra block can, but it might retain the grain better. Indeed, I got the initial idea from glancing at the results of the film grain optimization in Elecard’s encoder (Mainconcept core). Their film grain optimization almost completely disabled I-blocks in P-frames, suggesting that this was indeed the avenue to go down. Of course, their film grain optimization really wasn’t that good, so who knows?

To begin with, I tried the obvious: completely disable intra blocks in P-frames for the hell of it. Surprisingly, this actually worked; in many cases it improved grain retention! But if I were to make this practical, I’d have to find a real metric for deciding which block type to use, rather than just brute-force disabling an entire category of blocks.

I eventually came back to an idea I had considered a while back: what about NSSD? NSSD, also known as “noise-retaining sum of squared differences,” is a block comparison metric that is supposed to promote retaining grain/noise. How exactly does it do this? NSSD is equal to the sum of the ordinary SSD and the absolute value of the difference in “noise” values for the two blocks to be compared. The “noise” of a block is abs( x(i,j) – x(i+1,j) – x(i,j+1) + x(i+1,j+1) ) summed up over all pixels x(i,j) in that block (ignoring pixels that would result in this formula going over the edge of the block). In other words, the noise term doesn’t compare the pixels of the two blocks directly; it simply measures the “noisiness” of each block and penalizes any mismatch, so that the two blocks end up with a “similar” amount of noise. Keeping the two blocks visually similar is taken care of by the SSD portion of the score.
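To make the definition above concrete, here is a minimal sketch of the metric in plain Python. It is not the code from the patch (that is C plus MMX assembly); the function names, the toy 4x4 blocks and the weight parameter are all made up for illustration. With weight=1 the score is exactly "SSD plus absolute noise difference" as described above.

```python
def ssd(a, b):
    """Ordinary sum of squared differences between two equally sized blocks."""
    return sum((pa - pb) ** 2
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def noise(block):
    """Sum of abs(x(i,j) - x(i+1,j) - x(i,j+1) + x(i+1,j+1)) over the block,
    skipping positions where the 2x2 window would cross the block edge."""
    height, width = len(block), len(block[0])
    total = 0
    for j in range(height - 1):
        for i in range(width - 1):
            total += abs(block[j][i] - block[j][i + 1]
                         - block[j + 1][i] + block[j + 1][i + 1])
    return total


def nssd(src, ref, weight=1):
    """SSD plus the absolute difference in noisiness of the two blocks.
    'weight' is a hypothetical knob for this sketch; weight=1 matches the
    definition in the text."""
    return ssd(src, ref) + weight * abs(noise(src) - noise(ref))


# Toy data: a "grainy" source (checkerboard of 120/136), a flat candidate
# that has lost the grain, and a candidate that keeps the grain but is
# 9 levels too bright.
src = [[120 if (i + j) % 2 == 0 else 136 for i in range(4)] for j in range(4)]
flat = [[128] * 4 for _ in range(4)]
shifted = [[p + 9 for p in row] for row in src]

print("SSD : flat =", ssd(src, flat), " shifted =", ssd(src, shifted))
print("NSSD: flat =", nssd(src, flat), " shifted =", nssd(src, shifted))
```

On this toy data plain SSD prefers the flat candidate (1024 against 1296), while NSSD prefers the one that kept the grain (1296 against 1312), because the flat block pays an extra 288 for having lost all of its noise. The example is contrived to show the direction of the bias; it says nothing about how large the effect is on real footage.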

Amazingly, this worked: replacing the RD metric (SSD) with NSSD, combined with tweaking the RD thresholds to ensure that modes that tended to retain noise were always analyzed, drastically improved grain retention and made inter blocks far more common in grainy footage. The patch can be found here, complete with mildly optimized MMX assembly for the “noise” operation, ported from ffmpeg (where NSSD is available as a -cmp/-subcmp/-rdcmp option).