Diary Of An x264 Developer

02/26/2010 (6:35 pm)

The problems with wavelets

I have periodically noted in this blog and elsewhere various problems with wavelet compression, but many readers have requested that I write a more detailed post about it, so here it is.

Wavelets have been researched for quite some time as a replacement for the standard discrete cosine transform used in most modern video compression.  Their methodology is basically opposite: each coefficient in a DCT represents a constant pattern applied to the whole block, while each coefficient in a wavelet transform represents a single, localized pattern applied to a section of the block.  Accordingly, wavelet transforms are usually very large with the intention of taking advantage of large-scale redundancy in an image.  DCTs are usually quite small and are intended to cover areas of roughly uniform patterns and complexity.
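
To make the global-versus-local contrast concrete, here is a minimal 1-D sketch in pure Python. It is only an illustration under stated assumptions: the block length of 8, the helper names (`idct`, `ihaar`, `support`, `unit`), and the choice of the Haar wavelet are all hypothetical and not taken from any codec. Haar is the simplest wavelet and does not overlap; real wavelet codecs use longer, overlapping filters. The sketch sets a single coefficient to 1, inverts the transform, and reports which samples that coefficient touches.

```python
import math

N = 8  # hypothetical block length, chosen only for illustration

def idct(coeffs):
    """Inverse of the orthonormal DCT-II for a 1-D list of coefficients."""
    n = len(coeffs)
    out = []
    for x in range(n):
        s = coeffs[0] / math.sqrt(n)
        for k in range(1, n):
            s += math.sqrt(2.0 / n) * coeffs[k] * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
        out.append(s)
    return out

def ihaar(coeffs):
    """Inverse orthonormal Haar transform.

    Assumed layout: [average, coarsest detail, ..., finest details].
    """
    out = [coeffs[0]]
    pos = 1
    while len(out) < len(coeffs):
        details = coeffs[pos:pos + len(out)]
        pos += len(out)
        nxt = []
        for a, d in zip(out, details):
            nxt.append((a + d) / math.sqrt(2))
            nxt.append((a - d) / math.sqrt(2))
        out = nxt
    return out

def support(samples, eps=1e-12):
    """Indices of the samples that are non-zero, i.e. touched by the coefficient."""
    return [i for i, v in enumerate(samples) if abs(v) > eps]

def unit(k):
    """Coefficient vector with a single 1.0 at position k."""
    return [1.0 if i == k else 0.0 for i in range(N)]

dct_support = support(idct(unit(3)))    # one mid-frequency DCT coefficient
haar_support = support(ihaar(unit(5)))  # one finest-level Haar detail coefficient

print("DCT coefficient 3 touches samples:", dct_support)
print("Haar coefficient 5 touches samples:", haar_support)
```

The DCT coefficient touches all eight samples, while the finest-level Haar coefficient touches only samples 2 and 3.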

Both are complete transforms, offering equally accurate frequency-domain representations of pixel data.  I won’t go into the mathematical details of each here; the real question is whether one offers better compression opportunities for real-world video.

Though it isn’t mathematically required, DCTs are usually implemented as block transforms, each handling a single sharp-edged block of data.  Accordingly, they usually need a deblocking filter to smooth the edges between DCT blocks.  Wavelet transforms typically overlap, avoiding such a need.  But because wavelets don’t cover a sharp-edged block of data, they don’t compress well when the predicted data is in the form of blocks.

Thus motion compensation is usually performed as overlapped-block motion compensation (OBMC), in which every pixel is calculated by performing the motion compensation of a number of blocks and averaging the result based on the distance of those blocks from the current pixel.  Another option, which can be combined with OBMC, is “mesh MC”, where every pixel gets its own motion vector, which is a weighted average of the closest nearby motion vectors.  The end result of either is the elimination of sharp edges between blocks and better prediction, at the cost of greatly increased CPU requirements.  For an overlap factor of 2, it’s 4 times the amount of motion compensation, plus the averaging step.  With mesh MC, it’s even worse, with SIMD optimizations becoming nearly impossible.
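
The OBMC idea can be sketched in one dimension. This is a toy illustration, not how Dirac, Snow, or any other codec implements it: the function names, the triangular weighting window, the integer motion vectors, and the edge clamping are all assumptions made for the example. Each block predicts a window twice its own size, and every pixel blends the predictions that cover it, weighted by distance from each block's center.

```python
def block_predict(ref, mvs, block):
    """Plain block MC: each pixel uses only its own block's motion vector."""
    n = len(mvs) * block
    last = len(ref) - 1
    return [float(ref[min(max(x + mvs[x // block], 0), last)]) for x in range(n)]

def obmc_predict(ref, mvs, block):
    """1-D OBMC sketch with an overlap factor of 2 (assumes an even block size)."""
    n = len(mvs) * block
    last = len(ref) - 1
    pred = [0.0] * n
    wsum = [0.0] * n
    for b, mv in enumerate(mvs):
        center = b * block + (block - 1) / 2.0
        # The window extends half a block past each side of the block itself.
        for x in range(b * block - block // 2, (b + 1) * block + block // 2):
            if 0 <= x < n:
                w = max(0.0, 1.0 - abs(x - center) / block)  # falls off with distance
                src = min(max(x + mv, 0), last)              # clamp at the frame edges
                pred[x] += w * ref[src]
                wsum[x] += w
    # Normalizing handles frame borders, where only one window covers a pixel.
    return [p / s for p, s in zip(pred, wsum)]

def max_step(samples):
    """Largest jump between neighboring samples."""
    return max(abs(a - b) for a, b in zip(samples, samples[1:]))

ref = [10.0 * i for i in range(16)]  # hypothetical reference "frame": a ramp
mvs = [0, 4]                         # two blocks with different motion
plain = block_predict(ref, mvs, 4)
blended = obmc_predict(ref, mvs, 4)

print("largest step, block MC:", max_step(plain))
print("largest step, OBMC:    ", max_step(blended))
```

In this 1-D version every interior pixel is covered by two windows, so it costs twice the motion compensation. In 2-D the same window overlaps in both directions, giving 2 × 2 = 4 predictions per pixel, which is the factor of 4 mentioned above.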

At this point, it would seem wavelets would have pretty big advantages: when used with OBMC, they have better inter prediction, eliminate the need for deblocking, and take advantage of larger-scale correlations.  Why, then, hasn’t everyone switched over to wavelets?  Dirac and Snow offer modern implementations.  Yet despite decades of research, wavelets have consistently disappointed for image and video compression.  It turns out there are a lot of serious practical issues with wavelets, many of which are open problems.


02/22/2010 (3:05 pm)

Flash, Google, VP8, and the future of internet video

Filed under: google,H.264,HTML5,Theora,VP8,x264 ::

This is going to be a much longer post than usual, as it’s going to cover a lot of ground.

The internet has been filled for quite some time with an enormous number of blog posts complaining about how Flash sucks, so many that it sounds as if the entire internet is crying wolf.  But, of course, despite the incessant complaining, they’re right: Flash has terrible performance on anything other than Windows x86 and Adobe doesn’t seem to care at all.  But rather than repeat this ad nauseam, let’s be a bit more intellectual and try to figure out what happened.

Flash became popular because of its power and flexibility.  At the time it was the only option for animated vector graphics and interactive content (stuff like VRML hardly counts).  Furthermore, before Flash, the primary video options were Windows Media, Real, and QuickTime, all of which were proprietary, had no free software encoders or decoders, and (except for Windows Media) required the user to install a clunky external application, not merely a plugin.  Given all this, it’s clear why Flash won: it supported open multimedia formats like H.263 and MP3, used an ultra-simple container format that anyone could write (FLV), and worked far more easily and reliably than any alternative.

Thus, Adobe (actually, at the time, Macromedia) got their 98% install base.  And with that, they began to become complacent.  Any suggestion of a competitor was immediately shrugged off; how could anyone possibly compete with Adobe, given their install base?  It’d be insane, nobody would be able to do it.  They committed the cardinal sin of software development: believing that a competitor being better is excusable.  At x264, if we find a competitor that does something better, we immediately look into trying to put ourselves back on top.  This is why x264 is the best video encoder in the world.  But at Adobe, this attitude clearly faded after they became the monopoly.  This is the true danger of monopolies: they stymie development because the monopolist has no incentive to improve their product.

In short, they drank their own Kool-aid.  But they were wrong about a few critical points.


02/15/2010 (9:02 pm)

x264: now with adaptive streaming support

Filed under: ratecontrol,streaming,x264 ::

You’re running a video chat program on a relatively weak upstream connection.  Someone else opens a video chat program on the same connection and your available bandwidth immediately drops.  What do you do?

You’re running a streaming video server that sends live video to an iPhone.  The client moves into an area of weaker reception and the stream begins to break up.  What do you do?

You’re running a streaming video server that has maxed out your connection with its current viewers, but you want another person to be able to connect.  You’d rather not restart the whole server, though.  What do you do?
