Anatomy of an optimization: H.264 deblocking
As mentioned in the previous post, H.264 has an adaptive deblocking filter. But what exactly does that mean — and more importantly, what does it mean for performance? And how can we make it as fast as possible? In this post I’ll try to answer these questions, particularly in relation to my recent deblocking optimizations in x264.
H.264′s deblocking filter has two steps: strength calculation and the actual filter. The first step calculates the parameters for the second step. The filter runs on all the edges in each macroblock. That’s 4 vertical edges of length 16 pixels and 4 horizontal edges of length 16 pixels. The vertical edges are filtered first, from left to right, then the horizontal edges, from top to bottom (order matters!). The leftmost edge is the one between the current macroblock and the left macroblock, while the topmost edge is the one between the current macroblock and the top macroblock.
Here’s the formula for the strength calculation in progressive mode. The highest strength that applies is always selected.
If we’re on the edge between an intra macroblock and any other macroblock: Strength 4
If we’re on an internal edge of an intra macroblock: Strength 3
If either side of a 4-pixel-long edge has residual data: Strength 2
If the motion vectors on opposite sides of a 4-pixel-long edge are at least a pixel apart (in either x or y direction) or the reference frames aren’t the same: Strength 1
Otherwise: Strength 0 (no deblocking)
These values are then thrown into a lookup table depending on the quantizer: higher quantizers have stronger deblocking. Then the actual filter is run with the appropriate parameters. Note that Strength 4 is actually a special deblocking mode that performs a much stronger filter and affects more pixels.