Diary Of An x264 Developer

08/09/2009 (10:41 pm)

Encoding animation

Filed under: benchmark, H.264, ratecontrol, SSIM, x264

Note: I originally posted this a day earlier, but quickly retracted it when it was pointed out that I made a rather egregious error in the ffmpeg tests’ settings, so I re-did those and added a few more codecs.

Encoder comparisons are a dime a dozen these days, but there’s one thing I’ve almost never seen tested: cartoon sources.  Animated material has totally different characteristics from film and presents a whole separate set of encoding challenges.

First, we’ll start with what makes such video easy to compress.  Animation is mostly static; backgrounds are completely static with characters placed in front of them.  The characters themselves are mostly static as well; modern animation is usually at a significantly lower framerate than the actual video.  Furthermore, characters may stand still with only their mouths moving, dramatically reducing complexity.  Finally, animation is usually very clean, without any film grain.  All of this combines to seemingly make animation compression a very simple task.

But don’t let the above reasons fool you; animation is also very hard to compress.  To begin with, the vast majority of bits in animation are spent on sharp edges.  The frequency transform of a sharp edge is a high-magnitude sinc distribution of coefficients, which is enormously expensive to encode.  In fact, put plainly, both DCT-based video formats (H.264, MPEG-4, MPEG-2, Theora, etc.) and wavelet-based formats (Dirac, Snow, etc.) are completely unsuitable for encoding the sharp edges inherent in animated content.  And as far as I can tell, nobody has come up with anything significantly better, either.  Maybe those directional wavelets might do the trick.
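To see why a sharp edge is so expensive, here is a minimal Python sketch that runs a textbook orthonormal 8-point DCT-II over a flat row of pixels and over a row containing a hard edge. The pixel values and the threshold are made up for illustration; no real codec's transform or quantizer is being reproduced here.

```python
import math

def dct_ii(samples):
    """Orthonormal 1-D DCT-II of a list of samples."""
    n = len(samples)
    out = []
    for k in range(n):
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(samples))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * total)
    return out

def significant_ac(coeffs, threshold=1.0):
    """Count AC coefficients (index >= 1) with magnitude above the threshold."""
    return sum(1 for c in coeffs[1:] if abs(c) > threshold)

# Illustrative pixel rows (assumed values, not taken from the test clip).
flat = [128] * 8                               # uniform background
edge = [16, 16, 16, 235, 235, 235, 235, 235]   # hard line-art edge

flat_coeffs = dct_ii(flat)
edge_coeffs = dct_ii(edge)

print("flat AC coefficients above threshold:", significant_ac(flat_coeffs))
print("edge AC coefficients above threshold:", significant_ac(edge_coeffs))
print("edge coefficients:", [round(c, 1) for c in edge_coeffs])
```

The flat row puts all of its energy in the DC term, while the edge spreads energy across every AC coefficient, and each of those has to be coded.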

There are also a number of problems from an encoder perspective in addition to the issues of the transform itself.  Motion is rarely smooth in animated content; since animation is usually done at a much lower framerate than the actual video, an object may alternate motion and non-motion, resulting in motion search being unable to use temporal predictors.  Furthermore, when that object jumps 20 pixels to the right, it’s inherently hard to find where it went using normal methods.
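The "jumps 20 pixels" problem can be sketched with a toy one-dimensional block search. This is only an illustration under assumed values: the pixel row, the object pattern, and the function names are hypothetical, and real encoders search in two dimensions with predictors and early termination.

```python
def sad(a, b):
    """Sum of absolute differences between two equal-length pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b))

def exhaustive_search(ref_row, block, origin, radius):
    """Try every offset in [-radius, radius]; return (best_offset, best_sad)."""
    best = None
    for offset in range(-radius, radius + 1):
        start = origin + offset
        if start < 0 or start + len(block) > len(ref_row):
            continue  # candidate falls outside the reference row
        cost = sad(ref_row[start:start + len(block)], block)
        if best is None or cost < best[1]:
            best = (offset, cost)
    return best

# Hypothetical data: an 8-pixel object on a flat background.
obj = [16, 40, 80, 120, 160, 120, 80, 40]
ref_row = [200] * 64
ref_row[10:18] = obj          # object position in the previous frame

# In the current frame the object has jumped 20 pixels right, to x = 30.
origin = 30

small = exhaustive_search(ref_row, obj, origin, radius=16)
large = exhaustive_search(ref_row, obj, origin, radius=24)
print("radius 16:", small)
print("radius 24:", large)
```

With a search radius of 16 the true position is simply out of range, so the best candidate still has a large residual; only the wider search finds the exact match at offset -20.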

Additionally, there are visual issues: the edges of objects are so costly bit-wise that encoders tend to leave very few bits for other parts of the video, like the backgrounds.  This means one often needs to use a much higher bitrate than otherwise necessary to stop the backgrounds from falling apart.

Furthermore, animation tends to torture ratecontrol; ordinary predictive methods often fall apart on animated sources, since a frame can alternate between high and low bit cost despite having seemingly uniform complexity.  Even x264’s VBV can give very bad results in 1-pass mode with animated input, so it should be no surprise that many encoders have ratecontrol issues when dealing with such content.

All of this combines to make animation at first glance deceptively easy–but in reality quite difficult–to encode.

So, let’s see how the various encoders stack up.  It’s practically impossible to find a good freely available animated source, so I’ve picked 5000 frames from the only good DVD animation source I have: the Summer Day’s Dream OVA.  This is a quite clean source with relatively low motion; typical animation.

For testing, I’ve chosen SSIM as the quality metric, since subjective testing is a nightmare and rarely gives solid numerical results anyways.  The reason I’ve chosen SSIM instead of PSNR is not just that SSIM is a better metric in general, but also that the results are specifically more meaningful for animation.  Because the majority of the bits in animation are spent on the black lines that form edges of objects, the edges’ mean squared error dwarfs all other distortions.  Thus it becomes PSNR-optimal to make the background look like crap if the extra bits can slightly improve the edges, which is obviously not very visually optimal.  Though SSIM isn’t perfect, it’s certainly a better choice from a visual perspective.  I used the MSU Video Quality Tool for SSIM measurement.
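A small arithmetic sketch shows how a handful of edge pixels can dominate squared error. The error magnitudes and pixel counts below are invented purely for illustration; they are not measurements from the test clip.

```python
import math

def mse(errors):
    """Mean squared error from a list of per-pixel errors."""
    return sum(e * e for e in errors) / len(errors)

def psnr(mse_value, peak=255.0):
    """PSNR in dB for 8-bit video, given a mean squared error."""
    return 10.0 * math.log10(peak * peak / mse_value)

# Hypothetical region of 1000 pixels: 5% edge pixels with large errors,
# 95% background pixels with small errors.
edge_errors = [20.0] * 50
background_errors = [2.0] * 950
all_errors = edge_errors + background_errors

edge_sse = sum(e * e for e in edge_errors)
total_sse = sum(e * e for e in all_errors)
edge_share = edge_sse / total_sse

print(f"edge share of squared error: {edge_share:.1%}")
print(f"PSNR of the region: {psnr(mse(all_errors)):.2f} dB")
```

Because the error is squared, an encoder chasing PSNR gets far more benefit from shaving the edge error than from cleaning up the background, even though the background covers most of the picture.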

For comparison, I’ve used the formula “1/(1-SSIM)”.  This creates a number which can be compared between results.  It assumes that 0.98 SSIM is twice as good as 0.96 SSIM is twice as good as 0.92 SSIM and so forth.  Note, however, that if this method says “codec X is 3 times better than codec Y”, it is not the same as saying that “codec Y needs 3 times as much bitrate as codec X for the same quality”, because that assumes a perfectly straight RD curve, which isn’t guaranteed.  I could test for the latter, but that would be very time-consuming as it would require testing over and over and over until I got the same SSIM with each encoder.
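The scoring formula is simple enough to write down directly. The sketch below only implements “1/(1-SSIM)” and its inverse; the function names are my own, and it does not include the filesize normalization, whose exact method isn’t spelled out here.

```python
def score(ssim):
    """Map an SSIM value in [0, 1) to the comparison score 1/(1-SSIM)."""
    if not 0.0 <= ssim < 1.0:
        raise ValueError("SSIM must be in [0, 1) for the score to be finite")
    return 1.0 / (1.0 - ssim)

def ssim_from_score(value):
    """Invert the score back to the SSIM it came from."""
    return 1.0 - 1.0 / value

for ssim in (0.92, 0.96, 0.98):
    print(f"SSIM {ssim:.2f} -> score {score(ssim):.2f}")

# Each halving of (1 - SSIM) doubles the score.
print(f"ratio 0.98 vs 0.96: {score(0.98) / score(0.96):.2f}")
print(f"ratio 0.96 vs 0.92: {score(0.96) / score(0.92):.2f}")
```

Going the other way, `ssim_from_score` recovers the underlying SSIM from any score read off the graph.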

Additionally, I have attempted to normalize the slight differences in filesize between the encodes, since not all encoders gave the exact filesize (though I already did a lot of re-encodes to try to get them as close as possible). This was a “speed is no object” test; I used the slowest settings available on every encoder. I didn’t do much, if any, source-specific tweaking of any encoders though; the ffmpeg settings are just from my big “quality-optimal ffmpeg settings” file I keep around, so I don’t know if other settings would give better results.

Overall, this test is very special-case (anime encoding, not ordinary film footage) and is not at all exact–but the results should be at least somewhat applicable to the specific question: how well can various encoders handle animated input?

For all encoders, I used 250kbps video and a keyframe interval of (as close as I could specify) 250 frames.  A few of the most obscure encoders didn’t allow setting a keyframe interval (e.g. Bink).  Here are the encoders used, their settings, and their respective video formats:

x264 (r1206)
Video format: H.264/AVC High Profile
Settings: --preset placebo --tune ssim --rc-lookahead 250, two-pass

I encoded with five other sets of settings for comparison, to see how much x264 degrades on the faster speed modes, and additionally how much SSIM is gained from --tune ssim as compared to --tune psnr.

x264 Baseline
Video format: H.264/AVC Baseline Profile
Settings: --preset placebo --tune ssim --rc-lookahead 250 --profile baseline, two-pass

x264 Ultrafast
Video format: H.264/AVC Baseline Profile
Settings: --preset ultrafast --tune ssim, two-pass

x264 Veryfast
Video format: H.264/AVC High Profile
Settings: --preset veryfast --tune ssim, two-pass

x264 Medium
Video format: H.264/AVC High Profile
Settings: --preset medium --tune ssim, two-pass

x264 PSNR
Video format: H.264/AVC High Profile
Settings: --preset placebo --tune psnr --rc-lookahead 250 --profile baseline, two-pass

WMV (Expression Encoder 3)
Video format: VC-1 Advanced Profile
Settings: “best” preset

Xvid (1.2.1)
Video format: MPEG-4 Part 2 ASP
Settings: maxed out settings with bframes/qpel/GMC

Thusnelda (August 7th ffmpeg2theora nightly)
Video format: Theora
Settings: Two-pass (no other quality tweaks are available)

Quicktime (7.6.2)
Video format: H.264/AVC Main Profile
Settings: Two-pass (no other quality tweaks are available)

ffmpeg mpeg2
Video format: MPEG-2 video
Settings: -flags qpel+mv0+cbp+aic -dia_size 1040 -g 250 -bf 8 -b_strategy 2 -cmp sad -subcmp rd -mbd 2 -precmp sad -last_pred 4 -subq 8 -bidir_refine 4 -trellis 1 -qns 3, two-pass.

ffmpeg mpeg4
Video format: MPEG-4 Part 2 ASP
Settings: -flags mv4+qpel+mv0+cbp+aic -dia_size 1040 -g 250 -bf 8 -b_strategy 2 -cmp sad -subcmp rd -mbd 2 -precmp sad -last_pred 4 -subq 8 -bidir_refine 4 -trellis 1 -qns 3, two-pass.

ffmpeg flv1
Video format: Sorenson Spark H.263 (FLV1)
Settings: -flags +mv4+mv0+cbp+aic -dia_size 1040 -g 250 -cmp sad -subcmp rd -mbd 2 -precmp sad -last_pred 4 -subq 8 -trellis 1 -qns 3, two-pass.

On2 VP7
Video format: VP7
Settings: two-pass with “best” preset

Ateme (1.11)
Video format: H.264/AVC High Profile
Settings: “Full” preset with “macroblock-level” psy optimizations.  Note: This is not the latest encoder, 2.0, as I don’t know anyone with access to it, so treat this as an example of a decent H.264 encoder, not the best that Ateme can offer.

Real Producer (10)
Video format: RV40 (similar to H.264/AVC Main without CABAC)
Settings: “High” quality, two-pass.

Bink Video
Video format: Bink Video
Settings: 64-frame lookahead (no other quality tweaks are available)

Elecard Converter Studio (3.1)
Video format: H.264/AVC High Profile
Settings: Maxed out; I tested the complexity mask to see if it helped SSIM, but it didn’t, so I left it off.

Badaboom (1.2.1)
Video format: H.264/AVC Main Profile
Settings: It doesn’t have much to tweak, not even two-pass.

Indeo 5 (5.1)
Video format: Indeo 5
Settings: It doesn’t have much to tweak.

ffmpeg Snow
Video format: Snow
Settings: (mencoder) vcodec=snow:vstrict=-2:vbitrate=250:pred=0:mbd=2:cmp=12:subcmp=12:mbcmp=12:qpel:vme=8:refs=8:keyint=250 (two-pass)

ffmpeg SVQ1
Video format: SVQ1
Settings: -qscale 23.5 -g 250 -cmp satd -subcmp satd -mbcmp satd -mbd rd -dia 4 -last_pred 2
Note: This was chosen as an example of the best of the pre-transform generation of video formats, the vector quantization codecs. ffmpeg was used because the Apple version is atrocious, and constant quality was used because ffmpeg doesn’t have bitrate mode for SVQ1 (Newton-Raphson was used to get a matching bitrate with constant quality mode).
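For the curious, matching a target bitrate with a constant-quality knob is just root-finding. The sketch below is a Newton-Raphson iteration with a finite-difference slope; it is only an illustration of the idea, not the exact procedure used for the test. `toy_bitrate` is a made-up stand-in for "run an encode at this qscale and measure the bitrate", and its constants are arbitrary.

```python
def newton_match(measure, target, guess, step=0.5, tol=0.5, max_iter=20):
    """Find q such that measure(q) is within tol of target.

    The derivative is approximated by a forward difference, since a real
    encoder gives no analytic slope.
    """
    q = guess
    for _ in range(max_iter):
        error = measure(q) - target
        if abs(error) <= tol:
            return q
        slope = (measure(q + step) - measure(q)) / step
        q -= error / slope          # Newton-Raphson update
    raise RuntimeError("did not converge")

def toy_bitrate(qscale):
    """Hypothetical bitrate model in kbps; a real run would encode the clip."""
    return 4000.0 / qscale ** 0.9

target_kbps = 250.0
q = newton_match(toy_bitrate, target_kbps, guess=10.0)
print(f"qscale {q:.2f} gives {toy_bitrate(q):.1f} kbps")
```

In practice every call to `measure` is a full encode, so the finite-difference slope doubles the number of encodes per iteration; reusing the previous iteration's measurement (the secant method) would be the cheaper variant.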

I was going to include Dirac as well, but I couldn’t figure out how to set the keyint to 250, and it would be very unfair to include Dirac with the default low keyint (40).

And now for the results… since the graph is far too large to post here, I’ll link it instead:

The Graph

The color coding:
Blue: x264
Red: Non-x264 H.264 encoders
Green: ffmpeg encoders
Yellow: Open source encoders that don’t fall into any of the above
Purple: Proprietary encoders that don’t fall into any of the above

There are a lot of surprises here, so I’ll go over them one at a time.

1. x264 Baseline Profile beats everything non-x264.

I didn’t see this one coming.  Despite there being a 55% difference between High and Baseline, x264 still edges ahead of Elecard with Baseline.

2. Even when optimizing for PSNR, x264 beats the hell out of everything.

I expected AQ to be a huge factor in x264’s win, since the test is an SSIM test; but apparently it isn’t.

3.  ffmpeg’s encoders are shockingly good.

With kitchen-sink-level atrociously slow options, ffmpeg does quite well; its MPEG-4 encoding beats WMV and its MPEG-2 beats Theora.  Even the FLV1 nearly ties Badaboom.

4.  The gap between bad and good H.264 encoders is a factor of four.

Of course, a lot of things weren’t surprising either: Apple and Badaboom have atrociously bad H.264 implementations.  We pretty much already knew that.

5. WMV did really badly.

WMV is crippled in this test by the fact that it doesn’t allow transforms other than 8×8 in intra blocks, but this still doesn’t explain why ffmpeg MPEG-4 was able to beat it.

Now for the gotchas:

1.  As noted above, this is a test on anime, nothing else; it inherently biases towards video formats with smaller transform size (like H.264), so the results are not generalizable to non-animated material.  Additionally, some of these encoders are built for speed, and so comparing their quality to extremely slow encoders is not entirely fair.

2.  I had a number of issues with decodes getting out of sync and giving absurdly low SSIM values.  I think I resolved all such issues, but I can’t guarantee it.

3.  The only way I was able to get perfect sync was using the ffmpeg Real and WMV decoders, which may not be entirely bit-exact, so the results may be slightly off for these tests.

4.  Theora’s ratecontrol apparently had some issues on this clip (according to a Theora dev), so it could significantly improve in future tests.

5.  Indeo 5.1 dropped frames, which is why its SSIM is so low.  Of course, if the encoder really ran out of bits so badly that it had to drop frames, we can’t do much about it.

Above all, this graph shows the enormous importance of good implementations: a bad H.264 implementation (Apple) can lose to a good implementation (ffmpeg) of a last-gen format (MPEG-4 ASP).

For reference, almost all (I might have missed one or two) of the encoded clips can be found here.

Postscript re Theora (October 6th):

With the release of Theora 1.1, I re-tested Theora due to a request by a reader.  The results are summarized as follows (and I will explain why they are not on the graph):

Theora has not improved significantly since the beta version I tested.  Specifically, the score I got on the beta was 17.35 and the score on the final version was 16.56; if anything a slight negative change.  There was one improvement: the encoder achieved within 2% of the target bitrate, rather than me having to try a dozen bitrates to get it to reach the target as previously.

Anyways, I did much more detailed investigation this time, and discovered the problem…

The encoder was dropping frames, even in two-pass mode.  To be exact, 124 frames.

This is of course a sure path to quality disaster and suggests some serious bugs.  I did a constant-quality encode and found that -v 4.1 was needed to get the bitrate necessary–which clearly means that the encoder wasn’t running out of bits during the two-pass encode and makes the dropped frames seem even more weird.  But what if we ignore the dropped frames, sync up the remaining frames, and measure their quality?  Obviously, this unfairly biases in favor of Theora as it assumes the dropped frames wouldn’t have cost any bits, but let’s do it for argument’s sake.

The result is a score on the graph of 23.13, putting Theora a bit below ffmpeg FLV1. This makes much more sense given that the Theora format itself is quite similar to H.263 (which FLV1 is based off of).  Of course, until the encoder is fixed such that it doesn’t drop frames, even that number is dubious.

Thanks to xiphmont for the nailfps plugin for mplayer, which was able to sync up the video despite the large number of dropped frames.

38 Responses to “Encoding animation”

  1. CryptWizard Says:

    Thanks for the explanation and all those tests.
    I can tell you put a lot of effort into it.

  2. Oipo Says:

    The settings for x264 ultrafast and x264 medium don’t differ. Typo/thinko? (both have the veryfast preset and two-pass/ssim-tune switches)

    Otherwise, interesting results.

  3. Liorithiel Says:

    I guess some of these apply also to encoding screencasts?

  4. Ricardo Says:

    Hi

    could you publish the whole x264 command lines like you did for ffmpeg/mpeg2?

  5. Dark Shikari Says:

    @Oipo: Fixed.

    @Ricardo: I did publish the whole x264 commandline.

  6. A Says:

    Ok, now do it again with other kinds of source material (not that I expect much different results other than a mild flattening of the differences) :)

    You talk about sharp edges and how they’re hard. Unless I’m mistaken, H.264 has one thing that sets it rather ahead here: intra prediction in image (not transformed) space. Plus with a 4×4 transform, it’s hard to talk about “sharp” since most content is “anti-aliased” in a way that color changes take about 3 pixels or so, so to speak. So I don’t think this trips H.264 too bad – in fact, its design seems to be ideal for this kind of content (compared with everything else). I didn’t know WMV couldn’t use 8×4/4×8/4×4 transforms for intra, that’s a very unfortunate choice for the format (hardly the only one though).

    That brings me (out of curiosity) to a question about H.264 – now that you’re familiar with it, what do you think of it? I mean, there’s some horrible stuff (interlacing, some picture management stuff…), some overconvoluted stuff (many prediction modes, two entropy coders the weakest of which also has rather complicated coefficient encoding…), some questionably useless stuff (4×4 motion partitions which x264 doesn’t even implement for B-frames…), some apparently weak stuff (the transforms come to mind, with 8×8 added late and missing 8×4 and 4×8 that WMV has – also they need to be “managed” using aq to produce satisfactory results wasting bits in qp changes – MS designed them and while I think it’s far-fetched they might have been the victim of a conflict of interest)…

    So, ignoring the bloat for a moment (stuff that nobody uses such as those in the extended profile), do you think H.264 is “good” design-wise (as in reasonably complex for the achieved efficiency, and not having many design mistakes or low-hanging fruits that would have improved compression easily)?

  7. Dark Shikari Says:

    @A

    You’re entirely right about intra prediction–it does help a lot, though if you look at the DCT coefficients in a stream analyzer, it doesn’t help as much as one would like; the edges it predicts are often “far enough” from the real edge that there’s still a lot of residual, especially with curved edges.

    I would say H.264 is a pretty good design if you ignore MBAFF; it’s very versatile and overall has fewer flaws than most previous specs. If I had to point out the most egregious issues:

    1. Accumulating rounding error (qpel + bipred). This might be losing up to 5-10% efficiency in some cases, maybe even more with multi-level b-pyramid.
    2. There’s a lot of redundancy in CAVLC syntax elements, and even a bit in CABAC (CBFs and delta-QP are suboptimal).
    3. H.264 could probably benefit from mixing transforms within macroblocks, which doesn’t cost any speed (in a test I did, it gave ~1% improvement).
    4. The small partitions are mostly useless and should probably go. They exist because H.264 was originally designed for CIF video…

    There’s a ton of other things, but those are the main things I can think of that don’t increase complexity at all.

    I wouldn’t say that CAVLC is overcomplicated by the way; the residual coder is quite a refreshing break from the usual 2D/3D VLC system.

  8. Mat Says:

    BTW, lots of fansubs now use H.264 for encoding anime.

    Here [1] is the config used for the last anime I watched (1280×720, 23.976 fps). Yes, they used x264 :)

    [1]
    x264 - core 67 r1162M f7bfcfa - H.264/MPEG-4 AVC codec - Copyleft 2003-2009 - http://www.videolan.org/x264.html - options: cabac=1 ref=9 deblock=1:-1:-1 analyse=0x3:0x113 me=umh subme=9 psy_rd=1.0:0.0 mixed_ref=1 me_range=24 chroma_me=1 trellis=2 8x8dct=1 cqm=0 deadzone=21,11 chroma_qp_offset=-2 threads=6 nr=0 decimate=1 mbaff=0 bframes=6 b_pyramid=1 b_adapt=2 b_bias=0 direct=1 wpredb=1 keyint=250 keyint_min=25 scenecut=40 rc=2pass bitrate=1773 ratetol=1.0 qcomp=0.60 qpmin=10 qpmax=51 qpstep=4 cplxblur=20.0 qblur=0.5 ip_ratio=1.40 pb_ratio=1.30 aq=1:0.80

  9. Chengbin Says:

    Dark Shikari, great test! Thank you so much for your effort.

    One thing caught my mind. Why is DivX not included? Is it because it is not a free encoder?

  10. Chengbin Says:

    I forgot to mention, why is AQ strength at 1 instead of 0.6?

  11. Ben Waggoner Says:

    Expression Encoder by default will save a Settings.dat which is an XML file of the settings actually used. Could you share the codec-related section of that?

    “Best Quality” tweaks some parameters of the underlying setting, but doesn’t do anything particularly magical. Anime like this may need some additional tuning beyond the presets (I-frame DQuant may be helpful, for example, or 2 B-frames instead of one).

    I’ll create an Expression Encoder 2 Main Profile encode as well, just out of curiosity, and a Smooth Streaming VBR variable sized VC-1 encode.

  12. Dark Shikari Says:

    @Chengbin

    The rules say no source-specific tuning.

    About DivX; their encoder currently doesn’t allow keyframe intervals of over 96 due to a bug, so it would be unfair to pit it against the other H.264 encoders which are set to a much longer keyframe interval.

    @Benwaggoner

    I just loaded the Best setting, I don’t think I touched any of it, so you should be able to replicate it easily.

  13. wolf550e Says:

    Why is FFMPEG MPEG-4 better than Xvid? And could you give the cpu time and elapsed time (for multithreaded encoders), or at least indicate when there’s an order of magnitude difference?

  14. Ricardo Says:

    I can’t encode anything with the command lines you used. It’s the first time I’ve used an x264 build since you introduced presets, so I guess I must be doing something wrong since I’m not used to them. One example:

    x264.exe –preset placebo –tune ssim –rc-lookahead 250, two-pass “teste.mp4″ “teste.avs”

    I’ve looked here but can’t understand what I’m doing wrong… thanks anyway, keep up the good work

  15. Ricardo Says:

    forgot to include the link related to presets
    http://forum.doom9.org/showthread.php?t=148149

  16. Wyti Says:

    Hi, I’m really interested in trying ffmpeg because of its amazing MPEG-4 ASP performance!

    But I’m on Windows, and I want to know if something like ffdshow (which uses ffmpeg) can give me about the same results?
    Or do I have to try to compile ffmpeg for Windows to test this?

  17. Fruit Says:

    Wyti: you can find ffmpeg builds for Windows, or you can use mencoder. ffdshow installs VfW encoders for the ASP codec, but I don’t know how up to date it is and whether it allows all the commandline options.

    You will probably need to look into the documentation of ffmpeg/mencoder to understand all the options…

  18. Dark Shikari Says:

    @Ricardo

    Make sure you have the latest version of x264.

    @Wyti

    You can get the latest ffmpeg builds at http://ffmpeg.arrozcru.com/autobuilds/ . Note that if you actually want to use ffmpeg in reality, you should probably use somewhat less insane settings; 8 B-frames with RD-optimal B-frame decision and an exhaustive motion search is probably not practical for most real encodes.

    @wolf

    This test, as should be pretty obvious, inherently benefits encoders that have more advanced (and slower) analysis options. It would be interesting to see whether Xvid wins in a test where all encoders have to go the same speed, but such a test is incredibly difficult to do (since everyone will always complain that you aren’t using optimal settings for some speed). ffmpeg does have a few unique features that Xvid doesn’t have which might have been useful, e.g. QNS.

  19. Ricardo Says:

    I’m using the same build you used, not an old one. I went through CMD and x264 displays the following error:

    Rawyuv input requires a resolution

    Whatever I try to convert (mp4 or avi), I always receive that message.

    I’m using the following avs:
    Directshowresize(“teste.avi)

    The same avs and video work with an older build from before you implemented presets in x264.

    I can view the avs in mpc and virtualdub.

  20. Wyti Says:

    Sorry to come back with this again…
    First, thanks for the links; I can use ffmpeg now, which is a good thing. But it’s strange: you’re using options which aren’t in the doc at all… like -flags mv4+qpel+mv0+cbp+aic -dia_size 1040 -b_strategy 2 -cmp sad -subcmp rd -precmp sad -last_pred 4 -subq 8 -bidir_refine 4 -trellis 1 -qns 3 (yeah, it’s a lot…).
    The only doc I found is this (http://ffmpeg.org/ffmpeg-doc.html) and the files in the doc folder.
    Being on Windows, I don’t have access to the man page or anything else that could give me some more settings…
    Where can I find complete ffmpeg switch documentation? (And if those switches were explained in a way that lets me understand what they do, that would be great.)

  21. Dark Shikari Says:

    @Ricardo

    This isn’t really a question for the blog here; you need to learn how to use x264. Your mistake is that you forgot to add ConvertToYV12() at the end of your script.

    @Wyti

    There is no man page.

    ffmpeg -h

    Also note a couple of the options are completely undocumented, e.g. dia_size, which is only explained in the code itself. Yes, the ffmpeg interface is incredibly baroque. For everyone’s information, -dia_size 1040 means “exhaustive search of radius 16”, or in x264, --me esa --merange 16.

  22. Ben Waggoner Says:

    And here’s the WMV and H.264 versions from Expression Encoder I promised.

    http://cid-bee3c9ac9541c85b.skydrive.live.com/browse.aspx/.Public/Maikaze

    This may have revealed a bug in EEv3, as SAD and Hadamard motion match are producing identical output in identical time.

  23. nurbs Says:

    @ricardo
    1st pass:
    x264.exe –-preset placebo –-tune ssim –-rc-lookahead 250 –pass 1 –output NUL “teste.avs”
    2nd pass:
    x264.exe –-preset placebo –-tune ssim –-rc-lookahead 250 –pass 2 –output “teste.mp4″ “teste.avs”

    BTW x264 comes with a very extensive help that can be accessed via the –longhelp switch, so if you have trouble with the options again this is probably the best thing to do.

  24. nurbs Says:

    Either I’m completely incapable of typing or this board sometimes converts two dashes (–) to one dash (-).

  25. JoeH Says:

    It would be interesting to see how MainConcept compares – I use their codec sometimes as implemented into TMPGEncXpress 4 and usually get pretty good results if I set the settings pretty high.

  26. Multimedia Mike Says:

    @nurbs: Yeah, the WP forum, or perhaps the theme/stylesheet, has an obnoxious habit of changing double dashes to single, long dashes.

    About the benchmark– wait, did you say that FFmpeg’s SVQ1 encoder is better than Apple’s? I never really found that to be the case (and I helped write the FFmpeg encoder).

  27. Pengvado Says:

    Re SVQ1: Apple’s encoder spent all of its bits on I-frames, and then had nothing left for any motion. It dropped frames, or coded any new object DC-only, or used Skip blocks in places where one frame doesn’t at all match the previous. Whereas even if FFmpeg had to go to really high lambdas, it still spent enough of its bits on mvs to accurately code anything that could be motion compensated.

    Screenshot Apple SVQ1, FFmpeg SVQ1.

    Video Apple SVQ1, and Dark Shikari posted the FFmpeg one.

  28. Multimedia Mike Says:

    @Pengvado: Interesting. All of those old SVQ1-encoded movie trailers looked fine. I wonder what’s different? Did they throw more bitrate and/or less video resolution at the problem?

    BTW, “DC-only” should probably be “mean only” for SVQ1. Though I forget — could the quantized DC coefficient of a block be similar to a mathematical mean of all the samples?

  29. ASP Says:

    I’ve always wondered why ASP performs so much better than MPEG1/2. Compared with MPEG2, in this comparison:

    * It doesn’t force one slice per MB row (that’s >10% of the bitrate down the drain, still not that bad)
    * It has qpel, not that it should matter much
    * It has 8×8 MVs, again not really something that could justify a 100% gain

    That’s it, right? The VLC tables might be tweaked for lower bitrates though, but otherwise I don’t see where the difference comes from…

  30. Pengvado Says:

    DC = mean. quantized DC = quantized mean.

  31. Dark Shikari Says:

    @ASP

    Better VLC tables, for one, especially at low bitrates. And of course there’s the fact that at this low a bitrate, a ton of blocks will be skipped or similar, so more efficient syntax coding gives a great improvement. There’s also the AC intra prediction. The qpel probably does help a lot, given that it is anime, despite the fact that the ASP qpel filter is generally retarded.

    I’m a bit surprised as well though; I didn’t expect that large a difference.

  32. compn Says:

    you try xvid’s cartoon mode?

  33. Dark Shikari Says:

    @Compn

    Yes, it was used during the test, should have mentioned it.

  34. foxyshadis Says:

    You and pengvado should look into getting a voice on the next mpeg standardization committee. ;)

    Also, my thoughts on next-gen transforms:
    Fit directional wavelets (or curvelets, ridgelets, etc) to the edges as a pre-pass, then apply a 16×16 DCT or wavelet to everything left. You avoid the complexity of multiscale and get roughly the same benefit with inloop deblocking.

    After a lot of experimentation and research surveys, I’ve come to the conclusion that wavelets have no real advantage, and I was a big proponent of them years ago. Every advantage to wavelet standards came from better coding (e.g., arithmetic), larger transforms, multiscale, and better implementations, all of which can also be done with DCT. AVC proved that it could be. The biggest advantage for most companies was that they got there first and could patent the hell out of it.

    While I’m ranting, a survey of current research should be required before being allowed to publish tests and results for compression research, which seems to be dominated by people claiming amazing gains from a novel algorithm, compared to their first braindead attempt.

  35. Dark Shikari Says:

    @foxyshadis

    The problem is that such a thing is effectively a matching-pursuits transform, which has enormous computational complexity: an optimal solution is actually NP-hard, and it’s not really certain that a greedy algorithm would do very well either (and it would still be 100+ times slower than the DCT).

    I recall a paper showed, however, that the optimal number of zero coefficients per transform is larger than that supplied by a complete transform, so an overcomplete matching-pursuits transform is not at all unreasonable as an option; there’s just the speed issue and the question of forward transform algorithms.

  36. Patrick Says:

    I always thought that SSIM 1.0 means that the result is identical to the source. Looking at your graph, that can’t be true, because there are results > 1.0. Can you point me to some good explanation about SSIM?

  37. Dark Shikari Says:

    @Patrick:

    Notice that the Y axis is 1/(1-SSIM), not SSIM itself. This turns SSIM, which is an inverse metric (0.98 is twice as good as 0.96 is twice as good as 0.92 …) into a linear metric.

  38. onitake Says:

    did you have a look at the cartoon mode of xvid?
    the relevance might be limited with the speed and quality of current h.264 encoders, but it should offer a little quality gain.
    i played with it recently, but didn’t make any comparisons with the standard mode.
    in case you’re interested, i made a trivial ffmpeg patch that adds +chromaopt and +cartoon to flags2.
