Update: post now contains a Theora comparison as well; see below.
JPEG is a very old lossy image format. By today’s standards, it’s awful compression-wise: practically every video format since the days of MPEG-2 has been able to tie or beat JPEG at its own game. The reasons people haven’t switched to something more modern practically always boil down to a simple one — it’s just not worth the hassle. Even if JPEG can be beaten by a factor of 2, convincing the entire world to change image formats after 20 years is nigh impossible. Furthermore, JPEG is fast, simple, and practically guaranteed to be free of any intellectual property worries. It’s been tried before: JPEG-2000 first, then Microsoft’s JPEG XR, both tried to unseat JPEG. Neither got much of anywhere.
Now Google is trying to dump yet another image format on us, “WebP”. But really, it’s just a VP8 intra frame. There are some obvious practical problems with this new image format in comparison to JPEG; it doesn’t even support all of JPEG’s features, let alone many of the much-wanted features JPEG was missing (alpha channel support, lossless support). It only supports 4:2:0 chroma subsampling, while JPEG can handle 4:2:2 and 4:4:4. Google doesn’t seem interested in adding any of these features either.
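To put the chroma subsampling limitation in concrete terms, here is a minimal sketch of how many samples each scheme stores per pixel. The function name and lookup table are mine, purely for illustration; nothing here comes from libvpx or any JPEG library.

```python
def samples_per_pixel(scheme: str) -> float:
    """Average stored samples per pixel: one luma plane plus two chroma planes."""
    # (horizontal, vertical) factors by which each chroma plane is downscaled
    factors = {
        "4:4:4": (1, 1),  # chroma at full resolution
        "4:2:2": (2, 1),  # chroma halved horizontally only
        "4:2:0": (2, 2),  # chroma halved in both directions
    }
    h, v = factors[scheme]
    return 1 + 2 / (h * v)


for scheme in ("4:4:4", "4:2:2", "4:2:0"):
    print(scheme, samples_per_pixel(scheme))
```

With 4:2:0, each chroma plane keeps only a quarter of its samples, so fine color detail is discarded before the encoder even starts; a format limited to 4:2:0 has no way to opt out of that.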
But let’s get to the meat and see how these encoders stack up on compressing still images. As I explained in my original analysis, VP8 has the advantage of H.264’s intra prediction, which is one of the primary reasons why H.264 has such an advantage in intra compression. It only has i4x4 and i16x16 modes, not i8x8, so it’s not quite as fancy as H.264’s, but it comes close.
The test files are all around 155KB; download them for the exact filesizes. For all three, I did a binary search of quality levels to get the file sizes close. For x264, I encoded with --tune stillimage --preset placebo. For libvpx, I encoded with --best. For JPEG, I encoded with ffmpeg, then applied jpgcrush, a lossless JPEG compressor. I suspect there are better JPEG encoders out there than ffmpeg; if you have one, feel free to test it and post the results. The source image is the 200th frame of Parkjoy, from derf’s page (fun fact: this video was shot here! More info on the video here.).
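The binary search over quality levels can be sketched roughly as follows. This is not the script I used: `encode` is a hypothetical callable that takes an integer quality setting and returns the encoded bytes, and the sketch assumes file size never shrinks as the setting goes up (for a CRF/QP-style scale, where lower means better, you would flip the comparison).

```python
from typing import Callable, Optional, Tuple


def search_quality(
    encode: Callable[[int], bytes],  # hypothetical: quality setting -> encoded file
    target_bytes: int,
    lo: int = 1,
    hi: int = 100,
) -> Optional[Tuple[int, int]]:
    """Return (quality, size) for the highest quality that fits in target_bytes.

    Assumes size is non-decreasing in quality. Returns None if even the
    lowest quality setting exceeds the target.
    """
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        size = len(encode(mid))
        if size <= target_bytes:
            best = (mid, size)  # fits: remember it and try a higher quality
            lo = mid + 1
        else:
            hi = mid - 1        # too big: try a lower quality
    return best


# Toy stand-in for a real encoder: 2000 bytes per quality step.
fake_encode = lambda q: bytes(q * 2000)
print(search_quality(fake_encode, 155_000))
```

Each probe is a full encode, so a 1 to 100 range costs at most seven encodes per format, which is tolerable even at placebo-level settings.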
This seems rather embarrassing for libvpx. Personally I think VP8 looks by far the worst of the bunch, despite JPEG’s blocking. What’s going on here? VP8 certainly has better entropy coding than JPEG does (by far!). It has better intra prediction (JPEG has just DC prediction). How could VP8 look worse? Let’s investigate.
VP8 uses a 4×4 transform, which tends to blur and lose more detail than JPEG’s 8×8 transform. But that alone certainly isn’t enough to create such a dramatic difference. Let’s test a hypothesis: that the problem is that libvpx is optimizing for PSNR and ignoring psychovisual considerations when encoding the image. I’ll encode with --tune psnr --preset placebo in x264, turning off all psy optimizations.
Files: (x264, optimized for PSNR [154KB]) [Note for the technical people: because adaptive quantization is off, to get the filesize on target I had to use a CQM here.]
Results (decoded to PNG): (x264, optimized for PSNR)
What a blur! Only somewhat better than VP8, and still worse than JPEG. And that’s using the same encoder and the same level of analysis — the only thing done differently is dropping the psy optimizations. Thus we come back to the conclusion I’ve made over and over on this blog — the encoder matters more than the video format, and good psy optimizations are more important than anything else for compression. libvpx, a much more powerful encoder than ffmpeg’s jpeg encoder, loses because it tries too hard to optimize for PSNR.
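For anyone unfamiliar with the metric: PSNR is just a log-scaled mean squared error between the source and the decoded image. Here is a minimal sketch of the standard definition, operating on flat sequences of 8-bit sample values; the function name and the plain-list input are my own choices for illustration, not how any encoder computes it internally.

```python
import math
from typing import Sequence


def psnr(original: Sequence[int], decoded: Sequence[int], peak: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length sample sequences."""
    if len(original) != len(decoded) or len(original) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all samples
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)


print(psnr([10, 20, 30, 40], [12, 18, 30, 41]))
```

Nothing in that formula knows anything about what a viewer notices. Smoothing away texture often costs less squared error than keeping it with slightly wrong values, so an encoder chasing PSNR drifts toward blur.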
These results raise an obvious question: is Google nuts? I could understand the push for “WebP” if it were better than JPEG. And sure, technically as a file format it is, and an encoder could be made for it that’s better than JPEG. But note the word “could”. Why announce it now when libvpx is still such an awful encoder? You’d have to be nuts to try to replace JPEG with this blurry mess as-is. Now, I don’t expect libvpx to be able to compete with x264, the best encoder in the world, but surely it should be able to beat an image format released in 1992?
Earth to Google: make the encoder good first, then promote it as better than the alternatives. The reverse doesn’t work quite as well.
Addendum (added Oct. 2, 03:51):
maikmerten gave me a Theora-encoded image to compare as well. Here’s the PNG and the source (155KB). And yes, that’s Theora 1.2 (Ptalarbvorm) beating VP8 handily. Now that is embarrassing. Guess what the main new feature of Ptalarbvorm is? Psy optimizations…
Addendum (added Apr. 20, 23:33):
There’s a new WebP encoder out, written from scratch by skal (available in libwebp). It’s significantly better than libvpx (not that that says much), and it should probably beat JPEG much more readily now. The encoder design is rather unusual: it basically uses K-means for a large part of the encoding process. It still loses to x264, but that was expected.