Diary Of An x264 Developer

02/26/2010 (6:35 pm)

The problems with wavelets

I have periodically noted in this blog and elsewhere various problems with wavelet compression, but many readers have requested that I write a more detailed post about it, so here it is.

Wavelets have been researched for quite some time as a replacement for the standard discrete cosine transform used in most modern video compression.  Their methodology is basically opposite: each coefficient in a DCT represents a constant pattern applied to the whole block, while each coefficient in a wavelet transform represents a single, localized pattern applied to a section of the block.  Accordingly, wavelet transforms are usually very large with the intention of taking advantage of large-scale redundancy in an image.  DCTs are usually quite small and are intended to cover areas of roughly uniform patterns and complexity.
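
To make that contrast concrete, here is a minimal sketch (mine, not from the post; unnormalized transforms for brevity): in a 4-point DCT every coefficient is a weighted sum of all input samples, while in one Haar wavelet level each coefficient depends only on a local pair of samples.

/* Illustrative sketch only: an unnormalized 4-point DCT-II, where every
 * coefficient mixes *all* input samples (a global pattern), versus one Haar
 * wavelet level, where each coefficient depends only on two neighboring
 * samples (a localized pattern). */

#include <math.h>
#include <stdio.h>

#define PI 3.14159265358979323846

static void dct4(const float in[4], float out[4])
{
    for (int k = 0; k < 4; k++) {
        float sum = 0.0f;
        for (int n = 0; n < 4; n++)
            sum += in[n] * cosf((float)(PI / 4.0 * (n + 0.5) * k));
        out[k] = sum; /* unnormalized */
    }
}

static void haar1(const float *in, float *lo, float *hi, int pairs)
{
    for (int i = 0; i < pairs; i++) {
        lo[i] = 0.5f * (in[2*i] + in[2*i+1]); /* local average */
        hi[i] = 0.5f * (in[2*i] - in[2*i+1]); /* local detail  */
    }
}

int main(void)
{
    const float x[4] = { 10, 12, 50, 52 };
    float dct[4], lo[2], hi[2];
    dct4(x, dct);
    haar1(x, lo, hi, 2);
    printf("DCT:  %6.1f %6.1f %6.1f %6.1f\n", dct[0], dct[1], dct[2], dct[3]);
    printf("Haar: lo = %5.1f %5.1f   hi = %5.1f %5.1f\n", lo[0], lo[1], hi[0], hi[1]);
    return 0;
}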

Both are complete transforms, offering equally accurate frequency-domain representations of pixel data.  I won’t go into the mathematical details of each here; the real question is whether one offers better compression opportunities for real-world video.

Though it isn’t mathematically required, the DCT is usually applied as a block transform, handling a single sharp-edged block of data.  Accordingly, DCT-based codecs usually need a deblocking filter to smooth the edges between blocks.  Wavelet transforms typically overlap, avoiding such a need.  But because wavelets don’t cover a sharp-edged block of data, they don’t compress well when the predicted data comes in the form of blocks.
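
To give a rough idea of what a deblocking filter does, here is a hypothetical, heavily simplified version (not H.264’s actual in-loop filter): if the step across a block boundary is small enough to be a coding artifact rather than a real edge, soften the pixels on either side of it.

/* Hypothetical, heavily simplified deblocking sketch (not H.264's actual
 * in-loop filter): soften the two pixels on either side of a vertical block
 * boundary when the step across the edge looks like a coding artifact. */

#include <stdint.h>
#include <stdlib.h>

static void deblock_vertical_edge(uint8_t *row, int edge_pos, int threshold)
{
    int p0 = row[edge_pos - 1], q0 = row[edge_pos];
    if (abs(p0 - q0) < threshold) {
        int avg = (p0 + q0 + 1) >> 1;
        row[edge_pos - 1] = (uint8_t)((p0 + avg + 1) >> 1); /* pull p0 toward the edge average */
        row[edge_pos]     = (uint8_t)((q0 + avg + 1) >> 1); /* pull q0 toward the edge average */
    }
}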

Thus motion compensation is usually performed as overlapped-block motion compensation (OBMC), in which every pixel is calculated by performing the motion compensation of a number of blocks and averaging the result based on the distance of those blocks from the current pixel.  Another option, which can be combined with OBMC, is “mesh MC”, where every pixel gets its own motion vector, which is a weighted average of the closest nearby motion vectors.  The end result of either is the elimination of sharp edges between blocks and better prediction, at the cost of greatly increased CPU requirements.  For an overlap factor of 2, it’s 4 times the amount of motion compensation, plus the averaging step.  With mesh MC, it’s even worse, with SIMD optimizations becoming nearly impossible.
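
As a rough illustration of the averaging step (a hypothetical 1-D blend, not Snow’s or Dirac’s actual window shapes): in the region where two adjacent motion blocks overlap, every pixel has already been predicted twice, once per block, before it can be blended.

/* Hypothetical 1-D OBMC blend (not Snow's or Dirac's actual window):
 * each block in the overlap has already been motion-compensated with its
 * own vector; the output pixel is a distance-weighted average of the two
 * predictions. */

#include <stdint.h>

static void obmc_blend_1d(const uint8_t *pred_left,  /* MC from left block's MV  */
                          const uint8_t *pred_right, /* MC from right block's MV */
                          uint8_t *dst, int n)       /* n = width of the overlap */
{
    for (int i = 0; i < n; i++) {
        int w_right = ((2 * i + 1) * 256) / (2 * n); /* ramps up toward the right block */
        int w_left  = 256 - w_right;                 /* ramps down away from it         */
        dst[i] = (uint8_t)((pred_left[i] * w_left + pred_right[i] * w_right + 128) >> 8);
    }
}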

At this point, it would seem wavelets have some pretty big advantages: when used with OBMC, they give better inter prediction, eliminate the need for deblocking, and take advantage of larger-scale correlations.  Why then hasn’t everyone switched over to wavelets?  Dirac and Snow offer modern implementations.  Yet despite decades of research, wavelets have consistently disappointed for image and video compression.  It turns out there are a lot of serious practical issues with wavelets, many of which are open problems.

1.  No known method exists for efficient intra coding. H.264’s spatial intra prediction is extraordinarily powerful, but relies on knowing the exact decoded pixels to the top and left of the current block.  Since there is no such boundary in overlapped-wavelet coding, such prediction is impossible.  Newer intra prediction methods, such as Markov-chain intra prediction, also seem to require an H.264-like situation with exactly-known neighboring pixels.  Intra coding in wavelets is in the same state that DCT intra coding was in 20 years ago: the best known method is to simply transform the block with no prediction at all besides DC.  NB: as described by Pengvado in the comments, the switching between inter and intra coding is potentially even more costly than the inefficient intra coding itself.
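
For reference, here is a simplified sketch of the kind of prediction being described, assuming H.264-like 4x4 blocks and showing only the DC and vertical modes (real H.264 has nine 4x4 modes plus edge handling).  Both depend entirely on exactly-known decoded neighbors, which overlapped wavelets cannot provide.

/* Simplified H.264-style 4x4 intra prediction: both modes copy or average
 * the exactly-known reconstructed pixels above and to the left of the block. */

#include <stdint.h>

static void intra_pred_vertical_4x4(const uint8_t top[4], uint8_t dst[4][4])
{
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            dst[y][x] = top[x]; /* copy the reconstructed row above */
}

static void intra_pred_dc_4x4(const uint8_t top[4], const uint8_t left[4],
                              uint8_t dst[4][4])
{
    int sum = 4; /* rounding */
    for (int i = 0; i < 4; i++)
        sum += top[i] + left[i];
    uint8_t dc = (uint8_t)(sum >> 3); /* average of the 8 known neighbors */
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            dst[y][x] = dc;
}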

2.  Mixing partition sizes has serious practical problems. Because the overlap between two motion partitions depends on the partitions’ size, mixing block sizes becomes quite difficult to define.  While in H.264 a smaller partition always gives equal or better compression than a larger one if you ignore the extra overhead, with OBMC it is actually possible for a larger partition to win due to the larger overlap.  All of this makes both defining the result of mixed block sizes and making decisions about them very difficult.

Both Snow and Dirac offer variable block size, but the overlap amount is constant; larger blocks serve only to save bits on motion vectors, not offer better overlap characteristics.

3.  Lack of spatial adaptive quantization. As shown in x264 with VAQ, and correspondingly in HCEnc’s implementation and Theora’s recent implementation, spatial adaptive quantization has staggeringly impressive (before, after) effects on visual quality.  Only Dirac seems to have such a feature, and its encoder doesn’t even use it.  No other wavelet format (Snow, JPEG2K, etc.) seems to have such a feature.  This results in serious blurring problems in areas with subtle texture (as in the comparison below).
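
For readers unfamiliar with the idea, here is a hypothetical variance-based AQ rule in the spirit of x264’s VAQ (not its exact formula, and the pivot value is an assumption): lower the quantizer in flat or subtly textured blocks, where quantization error is most visible, and raise it in busy ones.

/* Hypothetical variance-based adaptive quantization in the spirit of x264's
 * VAQ (not its exact formula): flat blocks get a lower QP, busy blocks a
 * higher one. */

#include <math.h>
#include <stdint.h>

static double block_variance_16x16(const uint8_t *pix, int stride)
{
    double sum = 0.0, sqsum = 0.0;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++) {
            double v = pix[y * stride + x];
            sum += v;
            sqsum += v * v;
        }
    double mean = sum / 256.0;
    return sqsum / 256.0 - mean * mean;
}

static int adaptive_qp(int base_qp, const uint8_t *pix, int stride, double strength)
{
    double var = block_variance_16x16(pix, stride) + 1.0; /* avoid log(0) */
    /* hypothetical pivot: blocks with variance around 2^10 keep the base QP */
    int qp = base_qp + (int)lrint(strength * (log2(var) - 10.0));
    if (qp < 0)  qp = 0;
    if (qp > 51) qp = 51; /* H.264 QP range */
    return qp;
}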

4.  Wavelets don’t seem to code visual energy effectively. Remember that a single coefficient in a DCT represents a pattern which applies across an entire block: this makes it very easy to create apparent “detail” with a DCT.  Furthermore, the sharp edges of DCT blocks, despite being an apparent weakness, often result in a “fake sharpness” that can actually improve the visual appearance of videos, as was seen with Xvid.  Thus wavelet codecs have a tendency to look much blurrier than DCT-based codecs, but since PSNR likes blur, this is often seen as a benefit during video compression research.  Some of the consequences of these factors can be seen in this comparison; it’s somewhat outdated and not the general case, but it very effectively shows the difference in how wavelets handle sharp edges and subtle textures.

Another problem that periodically crops up is the visual aliasing that tends to be associated with wavelets at lower bitrates.  Standard wavelets effectively consist of a recursive function that upscales the coefficients coded by the previous level by a factor of 2 and then adds a new set of coefficients.  If the upscaling algorithm is naive — as it often is, for the sake of speed — the result can look quite ugly, as if parts of the image were coded at a lower resolution and then badly scaled up.  Of course, it looks like that because they were coded at a lower resolution and then badly scaled up.
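
A one-level sketch of that reconstruction (Haar for brevity; real codecs use longer filters such as the 5/3 or 9/7) shows why: when the detail band quantizes to zero, the output is literally the coarse band scaled up.

/* One inverse Haar level in 1-D: the coarse (low-pass) band from the previous
 * level is effectively upscaled by 2 and the detail (high-pass) band is added
 * back. If hi[] was quantized to zero, out[] is just each coarse sample
 * duplicated, i.e. a lower-resolution signal naively scaled up -- exactly the
 * aliasing described above. */

static void inv_haar_1d(const float *lo, const float *hi, float *out, int pairs)
{
    for (int i = 0; i < pairs; i++) {
        out[2*i]     = lo[i] + hi[i];
        out[2*i + 1] = lo[i] - hi[i];
    }
}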

JPEG2000 is a classic example of wavelet failure: despite having more advanced entropy coding, being designed much later than JPEG, being much more computationally intensive, and having much better PSNR, comparisons have consistently shown it to be visually worse than JPEG at sane filesizes.  Here’s an example from Wikipedia.  By comparison, H.264’s intra coding, when used for still image compression, can beat JPEG by a factor of 2 or more (I’ll make a post on this later).  With the various advancements in DCT intra coding since H.264, I suspect that a state-of-the-art DCT compressor could win by an even larger factor.

Despite the promised benefits of wavelets, a wavelet encoder even close to competitive with x264 has yet to be created.  With some tests even showing Dirac losing to Theora in visual comparisons, it’s clear that many problems remain to be solved before wavelets can eliminate the ugliness of block-based transforms once and for all.

32 Responses to “The problems with wavelets”

  1. Pengvado Says:

    > With mesh MC, it’s even worse, with SIMD optimizations becoming nearly impossible.

    Bah. pshufb can handle it.

    > Intra coding in wavelets is in the same state that DCT intra coding was in 20 years ago: the best known method was to simply transform the block with no prediction at all besides DC.

    That’s a problem. But a worse problem is the transition between inter and intra blocks. If you’re only smoothing the transition between two similar MVs, then it’s not too hard to find an imperfect OBMC that works ok. But smoothing the difference between inter and intra? Even if you’re not trying to do anything fancy wrt intra prediction, and would satisfy yourself with DC, the only OBMC method I know of that can handle that without adding whole new nonzero AC coefs to the residual is:
    Take your 8×8 or 16×16 or whatever motion blocks, round each of them up to the size of the largest wavelet basis vector (128×128 or something, for an overlap factor of *256*, not 4), DWT the MCed block, overlap in wavelet domain, then add the wavelet residual and iDWT. IOW, let the OBMC averaging function be a wavelet blend rather than an alpha blend. Needless to say, no video codec has seriously done this.

    > Snow offers no variable block size (8×8 or 16×16 only). Dirac offers variable block size, but the overlap amount is constant; larger blocks serve only to save bits on motion vectors, not offer better overlap characteristics.

    Snow does it the same way as Dirac. If you enable 8×8 blocks, then overlap is size 8, but mvs are quadtree coded.

  2. Dark Shikari Says:

    @Pengvado

    Fixed that comment about Snow.

  3. Drazick Says:

    Thanks for writing about it.
    Is JPEG XR considered to be “State Of The Art” DCT compression?

    Regarding Dirac, do you think that with the proper support (by Xiph, Wikimedia and Mozilla) it might become something comparable to H.264 in the near future (1 to 2 years)?
    Maybe not up there with x264, but adequate H.264-level results?

    Basically, does wavelet-based compression offer greater potential than DCT?

    Thanks.

  4. n Says:

    What do you think about MCTF (Motion Compensated Temporal Filtering)?
    It only uses wavelets for the low-passed frames.

    http://140.118.107.213/personal/master97/webb/point/Wavelet-based%20Scalable%20Video%20Coding%20With%20Low%20Complex%20Spatial.ppt
    http://iphome.hhi.de/marpe/download/VCIP05_SVC.pdf

  5. yuvi Says:

    JPEG XR doesn’t use the DCT, and I don’t believe even Microsoft claims it beats JPEG compression-wise. It was designed more with the intention of supporting higher bit depths and RGB (it uses YCgCo instead of JPEG’s YCbCr) than for better compression.

  6. Red Says:

    And how does Snow offer a modern wavelet implementation? It hasn’t seen serious development in years.

  7. Venkatesh Srinivas Says:

    JPEG XR uses a two-level block transform, each step a Hadamard transform or a rotation; the transform is similar enough to the DCT, but uses only integer math and is reversible.

    In the MSFT papers, JPEG XR mostly did slightly worse than JPEG 2000 in PSNR/bitrate and MSSIM/bitrate, iirc.

  8. Dark Shikari Says:

    @Red

    Snow is competitive with Dirac in tests, which certainly makes it a “modern implementation” whether or not it’s been updated lately.

    Plus, “3 years old” is hardly “not modern”.

  9. skal Says:

    try this one:

    http://sites.google.com/site/dlimagecomp/

    it *will* beat intra-H.264 hands down.

    and… you can fry an egg on your desktop with it :)

  10. Pengvado Says:

    @n
    Last I heard, the update step of MCTF was just about useless. And if you remove the update step and leave only the prediction step, then it’s just equivalent to B-pyramid.

  11. bgm Says:

    Amazing article!!!

    About H.264 intra: do you expect AIC to be used on the web? How long will it take?

    Also, there are some papers showing that, in the case of “residue” coding, the DST preserves more energy than the DCT. Is it viable to transform the residue with the DST and the refs with the DCT?

  12. Pengvado Says:

    AIC != H.264 intra. AIC is a nonstandard format that is similar to (but not bitstream compatible with) a subset of H.264 (in particular, i8x8 only), and thus doesn’t compress as well.

  13. cb Says:

    It’s a little unfair to say this is a “problem” with wavelets. Wavelets were specifically designed so that the LL band would be smooth, and generally were optimized to minimize MSE.

    In an engineering or mathematics sense, wavelets were created to solve a certain problem, and they succeeded marvelously.

    The “looks good” comparison which seems to favor plain old JPEG and H264 is hard to formulate mathematically, and therefore also hard to build wavelets to target.

    However, there is progress on this front, and it seems quite likely that some wavelet-like image compressor will be very good in an HVS sense at some point. The trick will be finding the right D measure in doing R/D optimization of wavelets.

  14. Dark Shikari Says:

    @cb

    Better said, it’s a problem with wavelet *compression*, not wavelets themselves.

    We already have plenty of decent “D” measures. VAQ-weighted psy-RD (x264’s D measure) does a pretty good job.

  15. lucabe Says:

    This is a very interesting read, thanks for it!
    BTW, can you recommend some technical papers describing the details in more depth? I found a lot of papers about wavelets on Google Scholar, but I do not really know which ones are good.

  16. Jan Rychter Says:

    Thank you for your excellent blog articles. It’s so refreshing to read original, well thought out and well written technical articles.

    As someone who has managed the development of an H.264 decoder (on a TI DSP) and who does a bit of x86 assembly on the side, I find myself nodding in agreement most of the time. And now, thanks to this article, I can learn quite a bit about new things.

    Keep on, this is great work and a great service!

  17. spideydouble Says:

    Wow. Thank you for distilling this rather complex subject (IMO) into such simple terms. This is a great primer on wavelet *compression*. It is also very helpful in understanding why Red may have gone the direction they did in developing Redcode RAW, along with its potential and current limitations.

    Keep up the great work!

  18. Jeremy Says:

    Regarding JPEG2000 and “…comparisons have consistently shown it to be visually worse than JPEG at sane filesizes,” in my own anecdotal experience, I do not find this to be true. Particularly at lower bitrates, JPEG2000 looks consistently better to me (I find the heavy blocking artifacts to be much more pronounced than JPEG2000’s smoothing artifacts). Could you include some references to the comparisons you’re speaking of?

    (case in point: http://www.codinghorror.com/blog/2007/02/beyond-jpeg.html – it’s not any more scientific than the singular image you linked to on wikipedia, but to my old rusty eyes it’s pretty clear which image looks better at a comparable size)

    As always, thanks for the post.

  19. Dark Shikari Says:

    @Jeremy

    JPEG-2000 does win at extraordinarily small filesizes, but that’s because JPEG doesn’t scale with resolution like JPEG-2000 does. If you downscale the image before compressing it, thus placing the “bits per pixel” at a non-insane value, JPEG will often return to winning.

  20. Steinar H. Gunderson Says:

    yuvi: The Wikipedia entry on JPEG XR mentions that the bitstream specification claims JPEG XR (i.e., HD Photo) “delivers a lossy compressed image of better perceptive quality than JPEG at less than half the file size”.

  21. Mino Says:

    Hello, I’m a Japanese encoding fan.
    Please let me ask you a question about FFmpeg.

    For the following FFmpeg options: -flags/-flags2/-partitions/-cmp, do I have to add “+” in front of the first item?

    In short, which is correct, “-flags2 mbtree+bpyramid” or “-flags2 +mbtree+bpyramid”?

    On Japanese websites, we can’t get reliable enough information.

  22. Dark Shikari Says:

    @Mino

    I’m not sure. Both might work, but I don’t think I know any better than you. The best way to find out is to test, or to read the ffmpeg code.

    In the future, you would probably be best off asking such questions on #x264 on Freenode; there are tons of Japanese x264 fans who hang around there as well.

  23. yuvi Says:

    @Steinar

    I missed that quote from the spec, but it’s prefaced by “HD Photo offers image quality comparable to JPEG-2000”, which makes me think they’re doing the whole comparison at compression ratios where downscaling the image is more important than any other compression tech. Or they just assumed that JPEG2000 always beats JPEG and never actually compared directly against JPEG…

    Then again, when I played with an HD Photo encoder, it created files that looked terrible and were 3x larger than JPEG even at visually lossless ratios, so if Microsoft has an encoder that beats JPEG in any situation, they haven’t released it as far as I can find.

  24. Nil Einne Says:

    Another interesting post. As I mentioned in the Google post, I’ve wondered whether Dirac might be a better bet.

    While it seems not as clear-cut as I hoped, one thing I’ve been interested in is the patent issue. Wavelet compression is a very active area of research, so it would seem a minefield, but then so is DCT. The state of play may be better understood in the DCT arena, although as the VC-1 problem illustrates, there are still big risks. All this (well, I didn’t know of the VC-1 issue) made me wonder whether, other than the fact that Theora is just too old without a decent feature set, there is a lot more room to move in the wavelet arena, even if it’s not very clear where you can move. For a FLOSS patent-free codec this is obviously a core consideration. A useful thing of course would be to come close to H.264 (x264 in particular).

    P.S. I know this is fairly OT, but since I already made 3 posts in the other topic, including the one I promised was my last: are there many other companies that use x264 internally? You mentioned most websites using it, and I came across a comment on YouTube where you mention Avail being a big contributor (and also user). Do those involved in mastering Blu-ray movies and the like tend to use it? Or satellite & digital terrestrial TV companies (these of course requiring realtime)?

    For some odd reason I’ve always presumed that they either use dedicated hardware encoders (particularly those needing realtime, although that’s perhaps not really necessary nowadays) or commercial encoders (which in retrospect seems silly; they can sort out patent licensing themselves, and the people who handle this stuff would evaluate the codecs for their internal usage and wouldn’t ignore quality issues). I guess it’s also partly a result of reading how companies allegedly often ignore the FLOSS option even when it makes more sense because they’re confused by commercial vendor (particularly Microsoft) FLOSS, which again in retrospect isn’t particularly relevant here.

    If they do use x264 it would be another interesting realisation.

  25. Dark Shikari Says:

    @Nil

    Many many many companies use x264 internally. But not in all realms. x264 only very recently got the features necessary for Blu-ray compliance and we’re working out the last few bugs (there will be an official announcement when it’s all said and done). There is at least one company planning to switch though.

    For television broadcast, Avail is one of the rare exceptions; most companies do use extremely expensive hardware encoders. There are a few others looking into x264 (e.g. one French company, IIRC).

  26. Mark Says:

    Red Camera has developed Redcode (aka Redray, Red 4K), which claims to compress 4K to 15 Mbit/sec using wavelet-based compression.

  27. Dark Shikari Says:

    @Mark

    And it’s a joke. It completely murders detail and ruins picture quality. Its sole purpose is to allow them to claim “4K” when they don’t even have 1080p of actual detail.

  28. Mark Says:

    @Dark Shikari

    That is what Red Camera has claimed: 4K at 15 Mbit/sec. You can check it out at reduser.net, where other people have seen the Redcode compression. Is it true? Who knows, but I am eager to see when Redcode is released. I doubt it too.

  29. Mark Says:

    @Dark Shikari

    There was a meeting in Las Vegas on April 14, 2010, and people claim they could not see the difference between Redcode 4K at 15 Mbit/sec and uncompressed 4K. I wish you or another compression expert had been there to evaluate the claim.

  30. RobShaver Says:

    This Wikipedia article says that they achieve a 12:1 ratio at 36 MB/s.
    http://en.wikipedia.org/wiki/Red_Digital_Cinema_Camera_Company

    So it seems like Red and CineForm have solved these problems. Is the issue that H.264 has a much higher compression ratio? (I was unable to find any data on the H.264 ratio.)

    What’s the problem with their schemes?

    Peace,

    Rob:-]

  31. Drew Says:

    Thanks Rob, any discussion of wavelet implementation without mentioning Red and CineForm is lacking. These are simply two examples of commercially successful products that use wavelets and are praised by industry professionals. I haven’t gotten to work with Red 4K myself yet, but no one complained about compression artifacts in District 9.

    That said, these products compete with DCT codecs in the 140 Mbit/s+ range, while H.264 and this article seem to be more about the 1.4 Mbit/s range… I’m a codec user, not a developer, but I could see how such vastly different bitrates could mean completely different challenges.

  32. Pengvado Says:

    Redcode is a variant of JPEG2000. It doesn’t solve any of these problems, it just throws bits at them.

    I know less about the internals of CineForm, so I can’t say for sure that it doesn’t support AQ. But at least complaints 1, 2, and 4 fully apply to it.

    Yes, the issue is that H.264 has a much higher compression ratio. These are compression formats we’re talking about; there’s nothing other than compression ratio that could be at stake.

Leave a Reply