Diary Of An x264 Developer

06/21/2010 (6:56 am)

How to cheat on video encoder comparisons

Over the past few years, practically everyone and their dog has published some sort of encoder comparison.  Sometimes they’re actually intended to be something for the world to rely on, like the old Doom9 comparisons and the MSU comparisons.  Other times, they’re just to scratch an itch — someone wants to decide for themselves what is better.  And sometimes they’re just there to outright lie in favor of whatever encoder the author likes best.  The latter is practically an expected feature on the websites of commercial encoder vendors.

One thing almost all these comparisons have in common — particularly (but not limited to!) the ones done without consulting experts — is that they are horribly done.  They’re usually easy to spot: for example, two videos at totally different bitrates are being compared, or the author complains about one of the videos being “washed out” (i.e. he screwed up his colorspace conversion).  Or the results are simply nonsensical.  Many of these problems result from the person running the test not “sanity checking” the results to catch mistakes that he made in his test.  Others are just outright intentional.

The result of all these mistakes, both intentional and accidental, is that the results of encoder comparisons tend to be all over the map, to the point of absurdity.  For any pair of encoders, it’s practically a given that a comparison exists somewhere that will “prove” any result you want to claim, even if the result would be beyond impossible in any sane situation.  This often results in the appearance of a “controversy” even if there isn’t any.

Keep in mind that every single mistake I mention in this article has actually been done, usually in more than one comparison.  And before I offend anyone, keep in mind that when I say “cheating”, I don’t mean to imply that everyone who makes the mistake is doing it intentionally.  Especially among amateur comparisons, most of the mistakes are probably honest.

So, without further ado, we will investigate a wide variety of ways, from the blatant to the subtle, in which you too can cheat on your encoder comparisons.

Read More…

06/14/2010 (11:59 am)

Stop doing this in your encoder comparisons

Filed under: Uncategorized ::

I’ll do a more detailed post later on how to properly compare encoders, but lately I’ve seen a lot of people doing something in particular that demonstrates they have no idea what they’re doing.

PSNR is not a very good metric.  But it’s useful for one thing: if every encoder optimizes for it, you can effectively measure how good those encoders are at optimizing for PSNR.  Certainly this doesn’t tell you everything you want to know, but it can give you a good approximation of “how good the encoder is at optimizing for SOMETHING”.  The hope is that this is decently close to the visual results.  This of course can fail to be the case if one encoder has psy optimizations and the other does not.
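
For reference, here’s a minimal sketch of what the metric actually computes for one 8-bit plane: mean squared error against the source, put on a log scale.  The function name and arguments are just illustrative, not x264’s API.

    #include <math.h>
    #include <stdint.h>

    /* PSNR of one 8-bit plane: 10*log10(MAX^2 / MSE), with MAX = 255 for 8-bit video. */
    double plane_psnr( const uint8_t *ref, const uint8_t *enc,
                       int width, int height, int stride )
    {
        double sse = 0.0;
        for( int y = 0; y < height; y++ )
            for( int x = 0; x < width; x++ )
            {
                int d = ref[y*stride + x] - enc[y*stride + x];
                sse += d * d;
            }
        double mse = sse / ((double)width * height);
        if( mse == 0.0 )
            return INFINITY; /* identical planes */
        return 10.0 * log10( 255.0 * 255.0 / mse );
    }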

But it only works to begin with if both encoders are optimized for PSNR.  If one optimizes for, say, SSIM, and one optimizes for PSNR, comparing PSNR numbers is completely meaningless. If anything, it’s worse than meaningless — it will bias enormously towards the encoder that is tuned towards PSNR, for obvious reasons.

And yet people keep doing this.

They keep comparing x264 against other encoders which are tuned against PSNR.  But they don’t tell x264 to also tune for PSNR (--tune psnr, it’s not hard!), and surprise surprise, x264 loses.  Of course, these people never bother to actually look at the output; if they did, they’d notice that x264 usually looks quite a bit better despite having lower PSNR.

This happens so often that I suspect this is largely being done intentionally in order to cheat in encoder comparisons.  Or perhaps it’s because tons of people who know absolutely nothing about video coding insist on doing comparisons without checking their methodology.  Whatever it is, it clearly demonstrates that the person doing the test doesn’t understand what PSNR is or why it is used.

Another victim of this is Theora Ptalarbvorm, which optimizes for SSIM at the expense of PSNR  — an absolutely great decision for visual quality.  And of course if you just blindly compare Ptalarbvorm (1.2) and Thusnelda (1.1), you’ll notice Ptalarbvorm has much lower PSNR!  Clearly, it must be a worse encoder, right?

Stop doing this. And call out the people who insist on cheating.

05/25/2010 (11:01 pm)

Anatomy of an optimization: H.264 deblocking

Filed under: assembly,development,H.264,speed,x264 ::

As mentioned in the previous post, H.264 has an adaptive deblocking filter.  But what exactly does that mean — and more importantly, what does it mean for performance?  And how can we make it as fast as possible?  In this post I’ll try to answer these questions, particularly in relation to my recent deblocking optimizations in x264.

H.264’s deblocking filter has two steps: strength calculation and the actual filter.  The first step calculates the parameters for the second step.  The filter runs on all the edges in each macroblock.  That’s 4 vertical edges of length 16 pixels and 4 horizontal edges of length 16 pixels.  The vertical edges are filtered first, from left to right, then the horizontal edges, from top to bottom (order matters!).  The leftmost edge is the one between the current macroblock and the left macroblock, while the topmost edge is the one between the current macroblock and the top macroblock.
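
As a rough sketch of that ordering for luma (the filter helpers here are hypothetical stand-ins, not x264’s actual functions), the per-macroblock loop looks something like this:

    #include <stdint.h>

    /* Hypothetical stand-ins for the actual edge filters described below. */
    static void filter_vertical_edge( uint8_t *pix, int stride )   { (void)pix; (void)stride; /* ... */ }
    static void filter_horizontal_edge( uint8_t *pix, int stride ) { (void)pix; (void)stride; /* ... */ }

    /* Filter one 16x16 luma macroblock: vertical edges left to right,
     * then horizontal edges top to bottom.  Edge 0 in each direction is
     * the one shared with the left/top neighbouring macroblock. */
    void deblock_mb_luma( uint8_t *mb, int stride )
    {
        for( int e = 0; e < 4; e++ )
            filter_vertical_edge( mb + 4*e, stride );
        for( int e = 0; e < 4; e++ )
            filter_horizontal_edge( mb + 4*e*stride, stride );
    }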

Here’s the formula for the strength calculation in progressive mode. The highest strength that applies is always selected.

If we’re on the edge between an intra macroblock and any other macroblock: Strength 4
If we’re on an internal edge of an intra macroblock: Strength 3
If either side of a 4-pixel-long edge has residual data: Strength 2
If the motion vectors on opposite sides of a 4-pixel-long edge are at least a pixel apart (in either x or y direction) or the reference frames aren’t the same: Strength 1
Otherwise: Strength 0 (no deblocking)

These values are then thrown into a lookup table depending on the quantizer: higher quantizers have stronger deblocking.  Then the actual filter is run with the appropriate parameters.  Note that Strength 4 is actually a special deblocking mode that performs a much stronger filter and affects more pixels.
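
Here’s a simplified sketch of that decision for a single 4-pixel luma edge segment in a progressive frame.  The struct and field names are mine for illustration; they aren’t x264’s actual data layout.

    #include <stdlib.h>

    typedef struct
    {
        int intra;        /* block is intra-coded */
        int nnz;          /* block has nonzero residual coefficients */
        int mv_x, mv_y;   /* motion vector, in quarter-pel units */
        int ref;          /* reference frame index */
    } blockinfo;

    /* Boundary strength for one 4-pixel edge segment between blocks p and q.
     * mb_edge is nonzero if the segment lies on a macroblock boundary. */
    static int boundary_strength( const blockinfo *p, const blockinfo *q, int mb_edge )
    {
        if( p->intra || q->intra )
            return mb_edge ? 4 : 3;
        if( p->nnz || q->nnz )
            return 2;
        /* >= 4 quarter-pels means at least one full pixel apart, in x or y */
        if( abs( p->mv_x - q->mv_x ) >= 4 || abs( p->mv_y - q->mv_y ) >= 4 || p->ref != q->ref )
            return 1;
        return 0; /* this segment is not deblocked at all */
    }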

Read More…

05/19/2010 (9:30 am)

The first in-depth technical analysis of VP8

Filed under: google,VP8 ::

Back in my original post about Internet video, I made some initial comments on the hope that VP8 would solve the problems of web video by providing a supposedly patent-free video format with significantly better compression than the current options of Theora and Dirac. Fortunately, I was able to acquire access to the VP8 spec, software, and source a good few days early, and so was able to perform a detailed technical analysis in time for the official release.

The questions I will try to answer here are:

1. How good is VP8? Is the file format actually better than H.264 in terms of compression, and could a good VP8 encoder beat x264? On2 claimed 50% better than H.264, but On2 has always made absurd claims that they were never able to back up with results, so such a number is almost surely wrong. VP7, for example, was claimed to be 15% better than H.264 while being much faster, but was in reality neither faster nor higher quality.

2. How good is On2’s VP8 implementation? Irrespective of how good the spec is, is the implementation good, or is this going to be just like VP3, where On2 released an unusably bad implementation in the hope that the community would fix it for them? Let’s hope not; it took 6 years to fix Theora!

3. How likely is VP8 to actually be free of patents? Even if VP8 is worse than H.264, being patent-free is still a useful attribute for obvious reasons. But as noted in my previous post, merely being published by Google doesn’t guarantee that it is. Microsoft did something similar a few years ago with the release of VC-1, which was claimed to be patent-free — but within mere months of release, a whole bunch of companies claimed patents on it and soon enough a patent pool was formed.

We’ll start by going through the core features of VP8. We’ll primarily analyze them by comparing them to existing video formats.  Keep in mind that an encoder and a spec are two different things: it’s possible for a good encoder to be written for a bad spec or vice versa!  That’s why a really good MPEG-1 encoder can beat a horrific H.264 encoder.

But first, a comment on the spec itself.

Read More…

05/08/2010 (1:47 pm)

Taking submissions for encoder comparison

Filed under: Uncategorized ::

With VP8 supposedly going to come out in about 2 weeks, it’s time to get a rough idea as to the visual state of the art in terms of encoders.  Accordingly, I’m doing a small visual codec comparison in which we will take a few dozen encoders, encode a single test clip, and perform score-based visual tests on real humans using a blind test.  There will be no PSNR or SSIM results posted.

See the doom9 thread for more information and feel free to submit streams for your own encoders.  I’m particularly interested in some newer proprietary encoders for which I wouldn’t be able to get the software due to NDAs or similar (such as VP8, Sony Blu-code, etc) — but for which I would be able to get a dump of the decoded output.

05/07/2010 (8:57 am)

Simply beyond ridiculous

Filed under: H.265,speed ::

For the past few years, various improvements on H.264 have been periodically proposed, ranging from larger transforms to better intra prediction.  These finally came together in the JCT-VC meeting this past April, where over two dozen proposals were made for a next-generation video coding standard.  Of course, all of these were in very rough-draft form; it will likely take years to filter them down into a usable standard.  In the process, they’ll pick the most useful features (hopefully) from each proposal and combine them into something a bit more sane.  But, of course, it all has to start somewhere.

A number of features were common: larger block sizes, larger transform sizes, fancier interpolation filters, improved intra prediction schemes, improved motion vector prediction, increased internal bit depth, new entropy coding schemes, and so forth.  A lot of these are potentially quite promising and resolve a lot of complaints I’ve had about H.264, so I decided to try out the proposal that appeared the most interesting: the Samsung+BBC proposal (A124), which claims compression improvements of around 40%.

The proposal combines a bouillabaisse of new features, ranging from a 12-tap interpolation filter to 1/12th-pel motion compensation and transforms as large as 64×64.  Overall, I would say it’s a good proposal and I don’t doubt their results given the sheer volume of useful features they’ve dumped into it.  I was a bit worried about complexity, however, as 12-tap interpolation filters don’t exactly scream “fast”.
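
To get a feel for the cost, here’s a rough sketch of a horizontal 12-tap interpolation pass.  The coefficients, rounding, and shift below are placeholders of my own, not the actual filter from the proposal; the point is just that every output pixel costs 12 multiply-accumulates and needs 11 neighbours, twice the footprint of H.264’s 6-tap half-pel filter.

    #include <stdint.h>

    #define NTAPS 12

    static uint8_t clamp_u8( int v ) { return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v; }

    /* Interpolate one row.  src must be padded: 5 valid pixels to the left of
     * src[0] and 6 to the right of src[width-1].  Placeholder taps sum to 128. */
    void interp_row_12tap( uint8_t *dst, const uint8_t *src, int width )
    {
        static const int coef[NTAPS] = { -1, 2, -4, 8, -16, 75, 75, -16, 8, -4, 2, -1 };
        for( int x = 0; x < width; x++ )
        {
            int acc = 0;
            for( int t = 0; t < NTAPS; t++ )      /* 12 MACs per output pixel */
                acc += coef[t] * src[x + t - 5];
            dst[x] = clamp_u8( (acc + 64) >> 7 ); /* placeholder rounding/shift */
        }
    }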

I prepared myself for the slowness of an unoptimized encoder implementation, compiled their tool, and started a test encode with their recommended settings.

Read More…

04/25/2010 (11:01 am)

Announcing the first free software Blu-ray encoder

Filed under: blu-ray,x264 ::

For many years it has been possible to make your own DVDs with free software tools.  Over the course of the past decade, DVD creation evolved from the exclusive domain of the media publishing companies to something basically anyone could do on their home computer.

But Blu-ray has yet to get that treatment.  Despite the “format war” between Blu-ray and HD DVD ending over two years ago, free software has lagged behind.  “Professional” tools for Blu-ray video encoding can cost as much as $100,000 and are often utter garbage.  Here are two actual screenshots from real Blu-rays: I wish I was making this up.

But today, things change.  Today we take the first step towards a free software Blu-ray creation toolkit.

Thanks to tireless work by Kieran Kunyha, Alex Giladi, Lamont Alston, and the Doom9 crowd, x264 can now produce Blu-ray-compliant video.  Extra special thanks to The Criterion Collection for sponsoring the final compliance test to confirm x264’s Blu-ray compliance.

With x264’s powerful compression, as demonstrated by the incredibly popular BD-Rebuilder Blu-ray backup software, it’s quite possible to author Blu-ray discs on DVD9s (dual-layer DVDs) or even DVD5s (single-layer DVDs) with a reasonable level of quality.  With a free software encoder and less need for an expensive Blu-ray burner, we are one step closer to putting HD optical media creation in the hands of the everyday user.

To celebrate this achievement, we are making available for download a demo Blu-ray encoded with x264, containing entirely free content!

Read More…

03/18/2010 (10:29 pm)

Announcing x264 Summer of Code 2010!

Filed under: development,google,GSOC,x264 ::

With the announcement of Google Summer of Code 2010 and the acceptance of our umbrella organization, Videolan, we are proud to announce the third x264 Summer of Code!  After two years of progressively increasing success, we expect this year to be better than ever.  Last year’s successes include ARM support and weighted P-frame prediction.  This year we have a wide variety of projects of varying difficulty, including some old ones and a host of new tasks.  The qualification tasks are tough, so if you want to get involved, the sooner the better!

Interested in getting started?  Check out the wiki page, hop on #x264 on Freenode IRC, and say hi to the gang!  No prior experience or knowledge in video compression necessary: just dedication and the willingness to ask questions and experiment until you figure things out.

02/26/2010 (6:35 pm)

The problems with wavelets

I have periodically noted in this blog and elsewhere various problems with wavelet compression, but many readers have requested that I write a more detailed post about it, so here it is.

Wavelets have been researched for quite some time as a replacement for the standard discrete cosine transform used in most modern video compression.  Their methodology is basically the opposite: each coefficient in a DCT represents a constant pattern applied to the whole block, while each coefficient in a wavelet transform represents a single, localized pattern applied to a section of the block.  Accordingly, wavelet transforms are usually very large, with the intention of taking advantage of large-scale redundancy in an image.  DCTs are usually quite small and are intended to cover areas of roughly uniform patterns and complexity.
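
A toy example of that difference on an 8-sample signal (purely an illustration; no codec prints basis functions, obviously): a DCT basis function is nonzero across all 8 positions, while a fine-scale Haar wavelet basis function touches only two of them.

    #include <math.h>
    #include <stdio.h>

    int main( void )
    {
        const double pi = 3.14159265358979323846;

        /* 8-point DCT-II basis function for k = 2: every sample is nonzero. */
        for( int n = 0; n < 8; n++ )
            printf( "% .3f ", cos( pi / 8.0 * (n + 0.5) * 2.0 ) );
        printf( " <- DCT basis: spans the whole block\n" );

        /* Finest-scale Haar wavelet sitting on samples 2 and 3: zero elsewhere. */
        const double haar[8] = { 0, 0, 1/sqrt(2.0), -1/sqrt(2.0), 0, 0, 0, 0 };
        for( int n = 0; n < 8; n++ )
            printf( "% .3f ", haar[n] );
        printf( " <- Haar basis: localized to two samples\n" );

        return 0;
    }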

Both are complete transforms, offering equally accurate frequency-domain representations of pixel data.  I won’t go into the mathematical details of each here; the real question is whether one offers better compression opportunities for real-world video.

DCT transforms, though it isn’t mathematically required, are usually found as block transforms, handling a single sharp-edged block of data.  Accordingly, they usually need a deblocking filter to smooth the edges between DCT blocks.  Wavelet transforms typically overlap, avoiding such a need.  But because wavelets don’t cover a sharp-edged block of data, they don’t compress well when the predicted data is in the form of blocks.

Thus motion compensation is usually performed as overlapped-block motion compensation (OBMC), in which every pixel is calculated by performing the motion compensation of a number of blocks and averaging the result based on the distance of those blocks from the current pixel.  Another option, which can be combined with OBMC, is “mesh MC”, where every pixel gets its own motion vector, which is a weighted average of the closest nearby motion vectors.  The end result of either is the elimination of sharp edges between blocks and better prediction, at the cost of greatly increased CPU requirements.  For an overlap factor of 2, it’s 4 times the amount of motion compensation, plus the averaging step.  With mesh MC, it’s even worse, with SIMD optimizations becoming nearly impossible.
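
Here’s a simplified sketch of the blending half of OBMC (the window and names are mine, not taken from Snow or Dirac): each block’s motion-compensated prediction is accumulated into the frame with a weight that falls off towards the block edge, and every pixel is finally divided by the sum of the weights that touched it.  With an overlap factor of 2, each pixel gets contributions from 4 blocks, which is where the 4x motion compensation cost comes from.

    #include <stdint.h>
    #include <math.h>

    #define OVER 16  /* overlapped block size: nominal 8x8 blocks with overlap factor 2 */

    /* Accumulate one overlapped block prediction at (bx,by) in the frame.
     * acc/wsum are float planes of the frame size, initialized to zero. */
    void obmc_blend( float *acc, float *wsum, int stride,
                     const uint8_t *pred, int bx, int by )
    {
        const double pi = 3.14159265358979323846;
        for( int y = 0; y < OVER; y++ )
            for( int x = 0; x < OVER; x++ )
            {
                /* raised-cosine weight: large at the block centre, ~0 at the edges */
                double w = (0.5 - 0.5*cos( 2*pi*(x+0.5)/OVER ))
                         * (0.5 - 0.5*cos( 2*pi*(y+0.5)/OVER ));
                acc [(by+y)*stride + bx+x] += (float)(w * pred[y*OVER + x]);
                wsum[(by+y)*stride + bx+x] += (float)w;
            }
    }
    /* After every block has been blended: output pixel = acc / wsum, rounded to 8 bits. */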

At this point, it would seem wavelets would have pretty big advantages: when used with OBMC, they have better inter prediction, eliminate the need for deblocking, and take advantage of larger-scale correlations.  Why, then, hasn’t everyone switched over to wavelets?  Dirac and Snow offer modern implementations.  Yet despite decades of research, wavelets have consistently disappointed for image and video compression.  It turns out there are a lot of serious practical issues with wavelets, many of which are open problems.

Read More…

02/22/2010 (3:05 pm)

Flash, Google, VP8, and the future of internet video

Filed under: google,H.264,HTML5,Theora,VP8,x264 ::

This is going to be a much longer post than usual, as it’s going to cover a lot of ground.

The internet has been filled for quite some time with an enormous number of blog posts complaining about how Flash sucks, so much so that it’s sounding as if the entire internet is crying wolf.  But, of course, despite the incessant complaining, they’re right: Flash has terrible performance on anything other than Windows x86 and Adobe doesn’t seem to care at all.  But rather than repeat this ad nauseam, let’s be a bit more intellectual and try to figure out what happened.

Flash became popular because of its power and flexibility.  At the time it was the only option for animated vector graphics and interactive content (stuff like VRML hardly counts).  Furthermore, before Flash, the primary video options were Windows Media, Real, and Quicktime: all of which were proprietary, had no free software encoders or decoders, and (except for Windows Media) required the user to install a clunky external application, not merely a plugin.  Given all this, it’s clear why Flash won: it supported open multimedia formats like H.263 and MP3, used an ultra-simple container format that anyone could write (FLV), and worked far more easily and reliably than any alternative.

Thus, Adobe (actually, at the time, Macromedia) got their 98% install base.  And with that, they began to become complacent.  Any suggestion of a competitor was immediately shrugged off; how could anyone possibly compete with Adobe, given their install base?  It’d be insane; nobody would be able to do it.  They committed the cardinal sin of software development: believing that a competitor being better is excusable.  At x264, if we find a competitor that does something better, we immediately look into trying to put ourselves back on top.  This is why x264 is the best video encoder in the world.  But at Adobe, this attitude clearly faded after they became the monopoly.  This is the true danger of monopolies: they stymie development because the monopolist has no incentive to improve their product.

In short, they drank their own Kool-aid.  But they were wrong about a few critical points.

Read More…
