Diary Of An x264 Developer

06/21/2010 (6:56 am)

How to cheat on video encoder comparisons

Over the past few years, practically everyone and their dog has published some sort of encoder comparison.  Sometimes they’re actually intended to be something for the world to rely on, like the old Doom9 comparisons and the MSU comparisons.  Other times, they’re just to scratch an itch — someone wants to decide for themselves what is better.  And sometimes they’re just there to outright lie in favor of whatever encoder the author likes best.  The latter is practically an expected feature on the websites of commercial encoder vendors.

One thing almost all these comparisons have in common — particularly (but not limited to!) the ones done without consulting experts — is that they are horribly done.  They’re usually easy to spot: for example, two videos at totally different bitrates are being compared, or the author complains about one of the videos being “washed out” (i.e. he screwed up his colorspace conversion).  Or the results are simply nonsensical.  Many of these problems result from the person running the test not “sanity checking” the results to catch mistakes that he made in his test.  Others are just outright intentional.
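The bitrate mistake in particular is trivial to catch before publishing anything. Here is a minimal sketch of such a sanity check in Python. The function names and the 2% tolerance are my own assumptions for illustration, not an established standard, so pick whatever threshold suits your test.

```python
def average_bitrate_kbps(size_bytes: int, duration_seconds: float) -> float:
    """Average bitrate in kilobits per second (1 kbit = 1000 bits)."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return size_bytes * 8 / duration_seconds / 1000.0


def bitrates_comparable(kbps_a: float, kbps_b: float, tolerance: float = 0.02) -> bool:
    """True if the two bitrates differ by at most `tolerance`, relative to the larger one.

    The default tolerance of 2% is an arbitrary assumption, not a standard.
    """
    larger = max(kbps_a, kbps_b)
    if larger == 0:
        return True
    return abs(kbps_a - kbps_b) / larger <= tolerance
```

If the two encodes fail a check like this, any quality difference between them says more about the file sizes than about the encoders.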

The result of all these mistakes, both intentional and accidental, is that the results of encoder comparisons tend to be all over the map, to the point of absurdity.  For any pair of encoders, it’s practically a given that a comparison exists somewhere that will “prove” any result you want to claim, even if the result would be beyond impossible in any sane situation.  This often results in the appearance of a “controversy” even if there isn’t any.

Keep in mind that every single mistake I mention in this article has actually been made, usually in more than one comparison.  And before I offend anyone, keep in mind that when I say “cheating”, I don’t mean to imply that everyone who makes these mistakes is doing it intentionally.  Especially among amateur comparisons, most of the mistakes are probably honest.

So, without further ado, we will investigate a wide variety of ways, from the blatant to the subtle, with which you too can cheat on your encoder comparisons.

06/14/2010 (11:59 am)

Stop doing this in your encoder comparisons

I’ll do a more detailed post later on how to properly compare encoders, but lately I’ve seen a lot of people doing something in particular that demonstrates they have no idea what they’re doing.

PSNR is not a very good metric.  But it’s useful for one thing: if every encoder optimizes for it, you can effectively measure how good those encoders are at optimizing for PSNR.  Certainly this doesn’t tell you everything you want to know, but it can give you a good approximation of “how good the encoder is at optimizing for SOMETHING”.  The hope is that this is decently close to the visual results.  This of course can fail to be the case if one encoder has psy optimizations and the other does not.

But it only works to begin with if both encoders are optimized for PSNR.  If one optimizes for, say, SSIM, and one optimizes for PSNR, comparing PSNR numbers is completely meaningless. If anything, it’s worse than meaningless — it will bias enormously towards the encoder that is tuned towards PSNR, for obvious reasons.
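To see why, it helps to remember what PSNR actually is. The sketch below uses the textbook definition for 8-bit samples, 10·log10(peak² / MSE). It is not code from x264 or any other encoder, and the flat lists of sample values are a simplification I am assuming for illustration.

```python
import math


def mse(reference, distorted):
    """Mean squared error between two equal-length sequences of sample values."""
    if len(reference) != len(distorted) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)


def psnr(reference, distorted, peak=255):
    """PSNR in dB for samples with the given peak value (255 for 8-bit video)."""
    err = mse(reference, distorted)
    if err == 0:
        return math.inf  # identical inputs: no error at all
    return 10 * math.log10(peak * peak / err)
```

The only thing that enters the formula is squared error. An encoder that spends bits on anything other than lowering squared error, which is what a perceptual optimization does, gives up PSNR by construction.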

And yet people keep doing this.

They keep comparing x264 against other encoders which are tuned for PSNR.  But they don’t tell x264 to also tune for PSNR (--tune psnr, it’s not hard!), and surprise surprise, x264 loses.  Of course, these people never bother to actually look at the output; if they did, they’d notice that x264 usually looks quite a bit better despite having lower PSNR.
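One way to avoid the mismatch is to build the command line from the metric you intend to report, so the tuning and the measurement cannot drift apart. This is only a sketch: the helper name and the file names are hypothetical, and the bitrate is a placeholder. The x264 options it emits (`--bitrate`, `--tune psnr` / `--tune ssim`, `--psnr` / `--ssim`, `-o`) are the usual command-line ones, but check them against `x264 --fullhelp` for your build.

```python
def x264_args(source, output, bitrate_kbps, metric):
    """Build an x264 command line that tunes for the same metric it reports.

    `metric` must be "psnr" or "ssim". This helper is hypothetical and only
    assembles the argument list; it does not run the encoder.
    """
    if metric not in ("psnr", "ssim"):
        raise ValueError("metric must be 'psnr' or 'ssim'")
    return [
        "x264",
        "--bitrate", str(bitrate_kbps),
        "--tune", metric,   # optimize for the metric being compared
        "--" + metric,      # have x264 compute and print that metric
        "-o", output,
        source,
    ]
```

For example, `x264_args("input.y4m", "out.264", 1000, "psnr")` yields a list you could hand to `subprocess.run`. The other encoder in the comparison needs the equivalent treatment, whatever its options are called.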

This happens so often that I suspect this is largely being done intentionally in order to cheat in encoder comparisons.  Or perhaps it’s because tons of people who know absolutely nothing about video coding insist on doing comparisons without checking their methodology.  Whatever it is, it clearly demonstrates that the person doing the test doesn’t understand what PSNR is or why it is used.

Another victim of this is Theora Ptalarbvorm, which optimizes for SSIM at the expense of PSNR, an absolutely great decision for visual quality.  And of course if you just blindly compare Ptalarbvorm (1.2) and Thusnelda (1.1), you’ll notice Ptalarbvorm has much lower PSNR!  Clearly, it must be a worse encoder, right?

Stop doing this. And call out the people who insist on cheating.