Diary Of An x264 Developer

02/26/2010 (6:35 pm)

The problems with wavelets

I have periodically noted in this blog and elsewhere various problems with wavelet compression, but many readers have requested that I write a more detailed post about it, so here it is.

Wavelets have been researched for quite some time as a replacement for the standard discrete cosine transform used in most modern video compression.  Their methodology is basically opposite: each coefficient in a DCT represents a fixed pattern applied to the whole block, while each coefficient in a wavelet transform represents a single, localized pattern applied to a section of the block.  Accordingly, wavelet transforms are usually very large, with the intention of taking advantage of large-scale redundancy in an image.  DCTs are usually quite small and are intended to cover areas of roughly uniform patterns and complexity.
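To make the contrast concrete, here is a minimal sketch of my own (not code from any actual codec; the toy signal and function names are just for illustration): an 8-point DCT-II, where every coefficient is built from a pattern spanning the entire block, next to one level of the Haar wavelet, where each detail coefficient depends on only two neighboring samples.

```c
#include <math.h>
#include <stdio.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define N 8

/* 8-point DCT-II: every coefficient is the dot product of the input with a
 * cosine pattern that spans the entire block. */
static void dct8(const double in[N], double out[N])
{
    for (int k = 0; k < N; k++) {
        double sum = 0.0;
        for (int n = 0; n < N; n++)
            sum += in[n] * cos(M_PI / N * (n + 0.5) * k);
        out[k] = sum;
    }
}

/* One level of the Haar wavelet: each approximation/detail pair is computed
 * from just two neighboring samples, so every coefficient is localized to a
 * small region of the signal. */
static void haar1(const double in[N], double approx[N/2], double detail[N/2])
{
    for (int i = 0; i < N/2; i++) {
        approx[i] = (in[2*i] + in[2*i+1]) / sqrt(2.0);
        detail[i] = (in[2*i] - in[2*i+1]) / sqrt(2.0);
    }
}

int main(void)
{
    /* Toy signal: flat, then a single edge. */
    double x[N] = { 10, 10, 10, 10, 10, 50, 50, 50 };
    double X[N], a[N/2], d[N/2];

    dct8(x, X);
    haar1(x, a, d);

    printf("DCT coefficients:\n");
    for (int k = 0; k < N; k++)
        printf("  %7.2f", X[k]);
    printf("\nHaar detail coefficients:\n");
    for (int i = 0; i < N/2; i++)
        printf("  %7.2f", d[i]);
    printf("\n");
    return 0;
}
```

Running this, the single edge spreads energy across many of the DCT coefficients, while only the one Haar detail whose sample pair contains the edge is nonzero.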

Both are complete transforms, offering equally accurate frequency-domain representations of pixel data.  I won’t go into the mathematical details of each here; the real question is whether one offers better compression opportunities for real-world video.

Though it isn't mathematically required, DCTs are usually applied as block transforms, each handling a single sharp-edged block of data.  Accordingly, they usually need a deblocking filter to smooth the edges between DCT blocks.  Wavelet transforms typically overlap, avoiding any such need.  But because wavelets don't cover a sharp-edged block of data, they don't compress well when the predicted data comes in the form of blocks.
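As a rough sketch of what that post-processing step does (a toy filter of my own, nothing like the actual H.264 loop filter), consider blending the two pixels that straddle a block boundary whenever the step across it is small enough to look like a quantization artifact rather than a real edge:

```c
#include <stdio.h>
#include <stdlib.h>

/* Toy 1D "deblocking": if the step across a block boundary is small enough
 * to look like a coding artifact rather than a real edge, blend the two
 * pixels that straddle it.  Real loop filters adapt their strength to the
 * quantizer and examine more pixels; this only illustrates the idea. */
static void deblock_boundary(int *pix, int boundary, int threshold)
{
    int p0 = pix[boundary - 1], q0 = pix[boundary];
    if (abs(p0 - q0) < threshold) {
        pix[boundary - 1] = (3*p0 + q0 + 2) / 4;  /* pull p0 toward q0 */
        pix[boundary]     = (p0 + 3*q0 + 2) / 4;  /* pull q0 toward p0 */
    }
}

int main(void)
{
    /* Two 4-pixel blocks whose independent quantization left a small step. */
    int pix[8] = { 60, 60, 61, 62, 68, 69, 69, 70 };
    deblock_boundary(pix, 4, 12);
    for (int i = 0; i < 8; i++)
        printf("%d ", pix[i]);
    printf("\n");
    return 0;
}
```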

Thus motion compensation is usually performed as overlapped-block motion compensation (OBMC), in which every pixel is calculated by performing the motion compensation of a number of blocks and averaging the results based on the distance of those blocks from the current pixel.  Another option, which can be combined with OBMC, is “mesh MC”, where every pixel gets its own motion vector, which is a weighted average of the closest nearby motion vectors.  The end result of either is the elimination of sharp edges between blocks and better prediction, at the cost of greatly increased CPU requirements.  For an overlap factor of 2, each pixel falls within 2 blocks in each dimension, so it’s 4 times the amount of motion compensation, plus the averaging step.  With mesh MC, it’s even worse, with SIMD optimizations becoming nearly impossible.
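Here is a toy 1D version of OBMC (my own sketch, not Snow’s or Dirac’s actual code; the block size, window shape, and names are assumptions) that shows where the extra work goes: with an overlap factor of 2, each block’s window is twice as wide as the block itself, so every sample is predicted by the windows of two blocks in 1D (four in 2D) and the results are blended with a window function.

```c
#include <math.h>
#include <stdio.h>

#define WIDTH  16
#define BSIZE  4              /* nominal block size */
#define OVER   (2 * BSIZE)    /* overlap factor 2: each window is twice the block size */

/* Toy motion compensation: fetch from the reference at a whole-sample offset.
 * Real MC would do subpel interpolation; this is just a placeholder. */
static int mc_sample(const int *ref, int pos, int mv)
{
    int p = pos + mv;
    if (p < 0)       p = 0;
    if (p >= WIDTH)  p = WIDTH - 1;
    return ref[p];
}

/* 1D OBMC: each block predicts a window of 2*BSIZE samples centered on it,
 * weighted by a triangular window.  Overlapping contributions are accumulated
 * and normalized, so every output sample is a blend of the predictions of the
 * blocks whose windows cover it. */
static void obmc_1d(const int *ref, const int *mv, int *pred)
{
    double acc[WIDTH] = {0}, wsum[WIDTH] = {0};

    for (int b = 0; b < WIDTH / BSIZE; b++) {
        int start = b * BSIZE - BSIZE / 2;          /* window start */
        for (int i = 0; i < OVER; i++) {
            int pos = start + i;
            if (pos < 0 || pos >= WIDTH)
                continue;
            /* triangular weight, largest near the block's own samples */
            double w = 1.0 - fabs(i + 0.5 - OVER / 2.0) / (OVER / 2.0);
            acc[pos]  += w * mc_sample(ref, pos, mv[b]);
            wsum[pos] += w;
        }
    }
    for (int i = 0; i < WIDTH; i++)
        pred[i] = (int)(acc[i] / wsum[i] + 0.5);
}

int main(void)
{
    int ref[WIDTH], pred[WIDTH];
    int mv[WIDTH / BSIZE] = { 0, 1, 1, 0 };         /* one MV per block */

    for (int i = 0; i < WIDTH; i++)
        ref[i] = i * 10;                            /* simple ramp as the reference */

    obmc_1d(ref, mv, pred);
    for (int i = 0; i < WIDTH; i++)
        printf("%d ", pred[i]);
    printf("\n");
    return 0;
}
```

Even in this toy version, each block’s motion compensation runs over twice as many samples as a non-overlapped predictor would; in 2D that factor becomes 4, which is the cost referred to above.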

At this point, it would seem wavelets have pretty big advantages: when used with OBMC, they have better inter prediction, eliminate the need for deblocking, and take advantage of larger-scale correlations.  Why, then, hasn’t everyone switched over to wavelets?  Dirac and Snow offer modern implementations.  Yet despite decades of research, wavelets have consistently disappointed for image and video compression.  It turns out there are a lot of serious practical issues with wavelets, many of which are open problems.
