Diary Of An x264 Developer

07/23/2010 (4:01 pm)

Announcing the world’s fastest VP8 decoder: ffvp8

Filed under: ffmpeg, google, speed, VP8

Back when I originally reviewed VP8, I noted that the official decoder, libvpx, was rather slow.  While there was no particular reason that it should be much faster than a good H.264 decoder, it shouldn’t have been that much slower either!  So, I set out with Ronald Bultje and David Conrad to make a better one in FFmpeg.  This one would be community-developed and free from the beginning, rather than the proprietary code-dump that was libvpx.  A few weeks ago the decoder was complete enough to be bit-exact with libvpx, making it the first independent free implementation of a VP8 decoder.  Now, with the first round of optimizations complete, it should be ready for primetime.  I’ll go into some detail about the development process, but first, let’s get to the real meat of this post: the benchmarks.

We tested on two 1080p clips: Parkjoy, a live-action clip, and the Sintel trailer, a CGI clip.  Testing was done using “time ffmpeg -vcodec {libvpx or vp8} -i input -vsync 0 -an -f null -”.  All tests used the latest SVN FFmpeg at the time of this posting; the last revision optimizing the VP8 decoder was r24471.

[Graphs: Parkjoy and Sintel decoding benchmarks, ffvp8 vs. libvpx; raw numbers in the appendix below]

As these benchmarks show, ffvp8 is clearly much faster than libvpx, particularly on 64-bit.  It’s even faster by a large margin on Atom, despite the fact that we haven’t even begun optimizing for it.  In many cases, ffvp8’s extra speed can make the difference between a video that plays and one that doesn’t, especially in modern browsers with software compositing engines taking up a lot of CPU time.  Want to get faster playback of VP8 videos?  The next versions of FFmpeg-based players, like VLC, will include ffvp8.  Want to get faster playback of WebM in your browser?  Lobby your browser developers to use ffvp8 instead of libvpx.  I expect Chrome to switch first, as they already use libavcodec for most of their playback system.

Keep in mind ffvp8 is not “done” — we will continue to improve it and make it faster.  We still have a number of optimizations in the pipeline that aren’t committed yet.

Developing ffvp8

The initial challenge, primarily pioneered by David and Ronald, was constructing the core decoder and making it bit-exact to libvpx.  This was rather challenging, especially given the lack of a real spec.  Many parts of the spec were outright misleading and contradicted libvpx itself.  It didn’t help that the suite of official conformance tests didn’t even cover all the features used by the official encoder!  We’ve already started adding our own conformance tests to deal with this.  But I’ve complained enough in past posts about the lack of a spec; let’s get onto the gritty details.

The next step was adding SIMD assembly for all of the important DSP functions.  VP8’s motion compensation and deblocking filter are by far the most CPU-intensive parts, much the same as in H.264.  Unlike H.264’s, VP8’s deblocking filter relies on a lot of internal saturation steps, which are free in SIMD but costly in a normal C implementation, making the plain C code even slower.  Of course, none of this is a particularly large problem; any sane video decoder has all this stuff in SIMD.

I tutored Ronald in x86 SIMD and wrote most of the motion compensation, intra prediction, and some inverse transforms.  Ronald wrote the rest of the inverse transforms and a bit of the motion compensation.  He also did the most difficult part: the deblocking filter.  Deblocking filters are always a bit difficult because every one is different.  Motion compensation, by comparison, is usually very similar regardless of video format; a 6-tap filter is a 6-tap filter, and most of the variation going on is just the choice of numbers to multiply by.

The biggest challenge in a SIMD deblocking filter is avoiding unpacking, that is, going from 8-bit to 16-bit.  Many operations in deblocking filters naively appear to require more than 8 bits of precision.  A simple example on x86 is abs(a-b), where a and b are 8-bit unsigned integers.  The result of “a-b” requires a 9-bit signed integer (it can be anywhere from -255 to 255), so it doesn’t fit in 8 bits.  But this is quite possible to do without unpacking: (satsub(a,b) | satsub(b,a)), where “satsub” performs a saturating subtract on the two values: if the difference is positive, it yields the difference; if negative, it yields zero.  ORing the two together yields the desired absolute difference.  This requires 4 ops on x86; unpacking would probably require at least 10, including the unpack and pack steps.
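
For the curious, here’s a minimal sketch of that trick using SSE2 intrinsics (the function name and the intrinsics form are mine for illustration; the actual ffvp8 code is hand-written assembly):

    #include <emmintrin.h>  /* SSE2 */

    /* abs(a-b) on 16 unsigned bytes at once, with no unpacking to 16-bit.
     * _mm_subs_epu8 is the "satsub" described above: per byte, it computes
     * max(a-b, 0).  One of the two saturating subtracts holds the positive
     * difference and the other saturates to zero, so ORing them together
     * gives the absolute difference. */
    static inline __m128i abs_diff_u8(__m128i a, __m128i b)
    {
        return _mm_or_si128(_mm_subs_epu8(a, b), _mm_subs_epu8(b, a));
    }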

After the SIMD came optimizing the C code, which still took a significant portion of the total runtime.  One of my biggest optimizations was adding aggressive “smart” prefetching to reduce cache misses.  ffvp8 prefetches the reference frames (PREVIOUS, GOLDEN, and ALTREF)… but only the ones that have been used reasonably often this frame.  This lets us prefetch everything we need without prefetching things we probably won’t use.  libvpx very often encodes frames that almost never (but not quite never) use GOLDEN or ALTREF, so this optimization greatly reduces time spent prefetching in a lot of real videos.  We made countless other optimizations as well, too many to list here, such as David’s entropy decoder optimizations.  I’d also like to thank Eli Friedman for his invaluable help in benchmarking a lot of these changes.
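
To make the “smart” prefetch idea concrete, here’s a hedged C sketch using GCC’s __builtin_prefetch (the struct, field names, and usage threshold are all hypothetical, not ffvp8’s actual code):

    #include <stddef.h>
    #include <stdint.h>

    enum { REF_PREVIOUS, REF_GOLDEN, REF_ALTREF, NUM_REFS };

    /* Hypothetical decoder state: reference planes plus per-frame
     * counters of how often each reference has actually been used. */
    typedef struct {
        const uint8_t *ref_plane[NUM_REFS]; /* luma plane of each ref */
        ptrdiff_t      stride;
        int            ref_used[NUM_REFS];  /* MBs that used this ref  */
        int            mb_count;            /* MBs decoded this frame  */
    } DecoderState;

    /* Prefetch the region of each reference frame that the current
     * macroblock is about to read, skipping references that have been
     * used rarely so far this frame (threshold chosen arbitrarily). */
    static void prefetch_refs(const DecoderState *s, int mb_x, int mb_y)
    {
        for (int ref = 0; ref < NUM_REFS; ref++) {
            if (s->ref_used[ref] * 16 < s->mb_count)
                continue; /* used in <1/16 of MBs: probably not needed */
            const uint8_t *p = s->ref_plane[ref] +
                               16 * mb_y * s->stride + 16 * mb_x;
            __builtin_prefetch(p, 0, 1);                 /* upper half */
            __builtin_prefetch(p + 8 * s->stride, 0, 1); /* lower half */
        }
    }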

What next?  AltiVec (PPC) assembly is almost nonexistent, with the only functions being David’s motion compensation code.  NEON (ARM) is completely nonexistent: we’ll need that to be fast on mobile devices as well.  Of course, all this will come in due time — and, as always, patches are welcome!

Appendix: the raw numbers

Here are the raw numbers (in fps) for the graphs at the start of this post, with standard error values:

Core i7 620QM (1.6GHz), Windows 7, 32-bit:
Parkjoy ffvp8: 44.58 +/- 0.44
Parkjoy libvpx: 33.06 +/- 0.23
Sintel ffvp8: 74.26 +/- 1.18
Sintel libvpx: 56.11 +/- 0.96

Core i5 520M (2.4GHz), Linux, 64-bit:
Parkjoy ffvp8: 68.29 +/- 0.06
Parkjoy libvpx: 41.06 +/- 0.04
Sintel ffvp8: 112.38 +/- 0.37
Sintel libvpx: 69.64 +/- 0.09

Core 2 T9300 (2.5GHz), Mac OS X 10.6.4, 64-bit:
Parkjoy ffvp8: 54.09 +/- 0.02
Parkjoy libvpx: 33.68 +/- 0.01
Sintel ffvp8: 87.54 +/- 0.03
Sintel libvpx: 52.74 +/- 0.04

Core Duo (2GHz), Mac OS X 10.6.4, 32-bit:
Parkjoy ffvp8: 21.31 +/- 0.02
Parkjoy libvpx: 17.96 +/- 0.00
Sintel ffvp8: 41.24 +/- 0.01
Sintel libvpx: 29.65 +/- 0.02

Atom N270 (1.6GHz), Linux, 32-bit:
Parkjoy ffvp8: 15.29 +/- 0.01
Parkjoy libvpx: 12.46 +/- 0.01
Sintel ffvp8: 26.87 +/- 0.05
Sintel libvpx: 20.41 +/- 0.02

132 Responses to “Announcing the world’s fastest VP8 decoder: ffvp8”

  1. Markus Says:

    Why would Chrome switch to ffvp8? Both Chrome and libvpx are by Google. That would mean that Google is admitting its stupidity.

  2. Dark Shikari Says:

    @Markus

    Does it make me stupid if someone else can write faster code than me? I doubt it. Furthermore, Google has already been referring people to libavcodec’s VP8 decoder as documentation on the VP8 video format, so I doubt they have anything against it.

  3. Amadeus Says:

    Incredible.

    I wonder: since the initial release of libvpx, Google has been improving the code on a weekly basis.

    Can these improvements, and future ones, be merged into ffvp8, or are the code bases so different now that improvements in libvpx can’t be merged into ffvp8?

  4. Dark Shikari Says:

    @Amadeus

    We’ve already adopted most optimizations that looked interesting. Some examples include the merged SSE chroma deblocking and the unrolled-tree coefficient decoding. There might be a couple left, but we wouldn’t be this fast unless we had gone to a pretty good effort to combine all the best ideas.

    You can’t “merge” changes like that anyways; ffvp8 is not a fork of libvpx, but a completely new implementation.

  5. kl Says:

    Awesome work! Thanks!

    I don’t see anything wrong with Google using this implementation. libvpx is the reference codec, but that doesn’t mean it’s the best in performance.

  6. Amadeus Says:

    @Dark Shikari

    So from now on, improvements made in libvpx aren’t really interesting from an ffmpeg developer’s point of view?

  7. Dark Shikari Says:

    @Amadeus

    Of course they’re interesting. If it’s an idea we haven’t thought of before, we can consider how it would apply to ffvp8.

  8. anonymous Says:

    Interesting indeed. It would make sense if Mozilla and Opera switched to ffvp8; they have absolutely no excuse whatsoever for not using the better, more cleanly written decoder instead of the inferior one, given that ffvp8 is completely free software.

  9. I.K. Says:

    I couldn’t really tell if it was a joke or not, but can Google use ffvp8 for Chrome and Chrome OS? Both are closed source.

    I also read that the ffmpeg decoder is only 1400 lines of code. Would that mean that Google would have to ship H.264 code in order to use ffvp8?

    Does libvpx really compile on PPC and ARM as of this point?

  10. Dark Shikari Says:

    @I.K.

    Chrome is open source, as Chromium. Chrome is just the name for their official version. Furthermore, the LGPL, the license ffmpeg uses, allows linking from closed-source software anyways.

    The ffmpeg decoder is about 1700 lines of code… plus 500 lines of DSP code, a thousand lines of headers (mostly tables and the VP5/6/7/8 arithmetic coder), and around 3000 lines of asm.

    Yes, some is shared with the H.264 decoder, but that’s because those parts do exactly the same thing. If that counts as “shipping H.264 code”, shipping libvpx is “shipping H.264 code”. Also, Chrome already ships ffmpeg’s H.264 decoder anyways.

    Yes, libvpx supports PPC and ARM, and has optimizations for them.

  11. anonymous Says:

    Could it be that some vendors are afraid of shipping ffvp8 due to the fear of ffvp8 not supporting new revisions of the format in a timely manner compared to libvpx, whose development is kept in sync with the experimental branch changes?

  12. Dark Shikari Says:

    @anonymous

    The experimental branch is not intended to be used for actual video distribution. If and when Google wants, they’ll announce a VP8.1 or VP9 or whatnot. There’s no guarantee of compatibility, backwards or forwards, even within libvpx, with the experimental branch.

  13. I.K. Says:

    @Dark:

    What I meant by Chrome being closed source was that Chrome contains the Flash plugin, and maybe also the PDF viewer…?

    But given that they could just link to ffmpeg, as you wrote, that part is solved :)

    Okay, I guess that doesn’t count as H.264 code then.

    About Chrome using ffmpeg’s H.264 decoder. Besides it being fast, doesn’t the licensee get the full H.264 source code when paying MPEG LA?

  14. Dark Shikari Says:

    @I.K.

    I have no idea what you mean by “full H.264 source code”. That sentence makes no logical sense.

  15. Ed McManus Says:

    Impressive work, as always. Did these optimizations make it into the encoder?

  16. Dark Shikari Says:

    @Ed

    ffmpeg doesn’t have a VP8 encoder (besides the wrapper around libvpx). Encoder optimizations are generally different from decoder ones anyways; the functions that are bottlenecks in decoders are often almost negligible in encoders.

  17. Alereon Says:

    Just a quick note that it doesn’t appear that your Core i7 numbers are correct. I’m assuming that’s meant to be the Core i7 720QM, as there is no i7 620QM, and the i7 620M is a 2.67GHz dual-core CPU. It also looks like Turbo Mode may have been disabled, because if the workload isn’t sufficiently multi-threaded to take good advantage of the two additional cores, the CPU should have clocked itself up to 2.4GHz, turning in numbers identical to the i5 520M (not counting any effects from the i7 having twice the L3 cache).

    Sorry to nitpick on your otherwise great post; thanks very much for all your hard work to help make VP8 a potentially viable Free alternative!

    Unrelated postscript: Intel can go to hell for making their processor naming scheme as confusing as it is on the Core i(n)-series.

  18. Dark Shikari Says:

    @Alereon

    You’re right, it’s the lowest end QM (the 1.6GHz one).

    Turbo mode is on, but my laptop’s cooling is awful, so it generally never triggers.

  19. Brian Mingus Says:

    Great work. If I may ask, how is it funded? I have been helped by “Dark Shikari” in IRC, so I assume that some of the work is volunteer, but are any ffmpeg devs paid?

  20. Dark Shikari Says:

    @Brian

    Some devs are paid to work on specific tasks in ffmpeg, usually by companies who need said features. The VP8 decoder was purely volunteer, as far as I know.

  21. QQ Says:

    Doesn’t anybody have an AMD chip to benchmark things with?

  22. Markus Says:

    Dark Shikari Says:
    July 23rd, 2010 at 4:38 pm

    “@Markus

    Does it make me stupid if someone else can write faster code than me?”

    Well, if you were trying to develop a competitive codec for many years (=On2/Google) and someone (=you guys) shows up and outclasses you within a matter of weeks, it makes you look very stupid, especially if you acted as the only professionals around (On2 always did that).

  23. lauri Says:

    You should really run the same test with the Atom N450, as it is the successor to the N270 and is 64-bit.

  24. Total Emptiness Says:

    Markus: You just don’t get it, do you? Google is the smartest operating company in the whole world. They wrote a prototype and then let others optimize it for free. Not to mention the free publicity. And you still think Google is stupid.

  25. xcomcmdr Says:

    @Dark:
    Did you devs write a spec document based on your investigations of VP8, to avoid (in the future) the traps of the misleading code-that-is-the-spec?

    Just curious ;-)

  26. Amadeus Says:

    Btw, exceedingly cool trick with satsub()! Thanks for sharing it.

  27. Dude Says:

    Did you try using intrinsics rather than assembly first? I’ve found the latest versions of gcc/icc get within 5% of hand-tuned SSE assembly (with obvious implications for readability/maintainability).

  28. Philip Jägenstedt Says:

    Thanks for writing to tell us the tales of your optimizing adventures. It’d be interesting to hear a bit more about why you are faster. Have you found some new areas of improvement that libvpx hasn’t, or is it just the sum of being a little bit faster in many places? (If you tell me libvpx unpacks for abs(a-b), I’ll cry.)

    Anyway, I doubt we’ll see this in Firefox due to the license. We (Opera) could theoretically use it via GStreamer, but I haven’t checked out the code or anything yet. Good work in any case!

  29. zub Says:

    @Markus
    I think you’re stupid. I’ll leave it up to you to figure out why.

    Good job FFMPEG

  30. josth Says:

    great work, congrats. win7 x64 results would certainly have been interesting tho.

  31. Boris Says:

    @anonymous:

    For what it’s worth, Mozilla’s license (the MPL part, not the GPL/LGPL part) may well in fact prevent it from using an LGPL-only library, depending on details of how the library needs to be used.

  32. nona Says:

    Would using David Schleef’s Orc make sense instead of hand-rolling SIMD assembler for every CPU? Or would Orc just get in the way?

  33. qubit Says:

    @xcomcmdr, RE:code-that-are-the-specs:
    He did mention that “We’ve already started adding our own conformance tests to deal with this.”

    Hopefully those tests will make it upstream into whatever test suites Google/On2 is maintaining.

  34. rikhard Says:

    Hello all, thanks for this :) I would like to ask how we can give some $$$ to the developers. Do you guys have a PayPal donation button somewhere? Thanks

  35. Jonathan Lin Says:

    @Markus But wouldn’t admitting your stupidity up front and adopting someone else’s superior solution mean that you’re not so stupid after all? The real stupid move is to turn on the NIH blinders and refuse to accept that someone else can code better than you can.

    It’s no shame to admit that maybe the job could have been done better – after all, that’s probably part of the reason they open-sourced the thing in the first place – so people can improve it and make it shine.

    They probably knew the decoder wasn’t that great to begin with.

    Humility begets greater things.

  36. DaVince Says:

    @Markus: however, On2 created the format and codec in the first place, and Google devs themselves have been trying to improve the original slow codec, which basically already IS sort of admitting it wasn’t really fast.

    I can see Google switching to ffvp8. As far as I’m aware they always try to use the best technology around (when possible, and this is possible).

  37. Lennie Says:

    Are you going to rename your blog now? ;-)

  38. Martijn Says:

    Thanks for the great work!

    How does this new codec compare to H.264?

  39. FarmerBob Says:

    @Markus

    Speaking in generalities, faster != better. The On2 code might be slower because it is more maintainable. (On the other hand, they might just be idiots.)

    Personally, I don’t think I want to maintain code littered with performance hacks like (satsub(a,b) | satsub(b,a)) instead of abs(a-b), clever as that might be. Probably fine if you’re the only one ever maintaining the code, not so good if you’re a company that has to consider employee turnover and code maintenance.

  40. Dark Shikari Says:

    @Philip

    One of the main reasons ffvp8 is faster is because of libvpx’s exceedingly bad cache access patterns. Like basically every piece of On2 software ever written (this is an old habit of theirs — hard to break), libvpx does everything in passes. That is, it decodes the bitstream in one pass over the frame, then it does inter prediction, then it does idct… each time doing a full pass over the frame. ffvp8 only makes one pass, doing everything in that pass. At high resolutions this gives it a significant speed benefit.

    Of course, even without this fact, it’s still significantly faster — this is just the biggest reason for the speed gap. It’s not the only one.
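
    To illustrate the difference, here is a heavily simplified sketch in C; the stage functions are placeholders, not the actual API of either decoder:

        /* Placeholder per-macroblock stages. */
        void decode_coeffs(int mb);
        void predict(int mb);
        void idct_add(int mb);
        void deblock(int mb);

        /* libvpx-style: each stage sweeps the whole frame, so at 1080p a
         * macroblock's data is long gone from cache by the next stage. */
        void decode_frame_multipass(int nb_mbs)
        {
            for (int mb = 0; mb < nb_mbs; mb++) decode_coeffs(mb);
            for (int mb = 0; mb < nb_mbs; mb++) predict(mb);
            for (int mb = 0; mb < nb_mbs; mb++) idct_add(mb);
            for (int mb = 0; mb < nb_mbs; mb++) deblock(mb);
        }

        /* ffvp8-style: finish each macroblock while its data is still hot. */
        void decode_frame_singlepass(int nb_mbs)
        {
            for (int mb = 0; mb < nb_mbs; mb++) {
                decode_coeffs(mb);
                predict(mb);
                idct_add(mb);
                deblock(mb);
            }
        }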

  41. D Says:

    Great Job everyone! The power of an open source community. Still I think Google owes you…

  42. Orthochronous Says:

    @nona,

    I’ve been looking at Orc and the underlying problem, and while Orc has the (rare) ability to generate and JIT code while the program is running, I think it has a problem: since its model is that of an assembler, it doesn’t do instruction scheduling or register allocation. So the user has to write “machine-independent source code” that fixes the scheduling and register allocation at the “generic level” before transliteration into machine-dependent code.

    If machine-independent SIMD is used, I think it is likely better overall to be slightly higher level (for ease of programming), since performance likely improves when “compiler-ish” passes such as instruction scheduling and register allocation are machine-dependent. (I’m actually working on something like this, but partly because you only get one “project announcement”, I’m waiting until more concepts are working before publicising.)

  43. Matt Says:

    Are ffvp8 and libvpx multi-threaded? How well do they scale compared to each other?

  44. dave Says:

    Apparently the On2/Google employees have already started a rewrite of their decoder, codenamed “dixie”. The first goal they list for that project is “Increase speed by paying more attention to data locality and cache layout, and by eliminating redundant work in general.” so it would appear they agree with DS’s identification of that as a major problem area.

    http://groups.google.com/a/webmproject.org/group/codec-devel/browse_thread/thread/097fd7d08d104bc6

    Any chance of a comparison to decoders of other formats? It would appear, based on numbers you gave in your initial VP8 analysis and the speed difference documented here, that the ffmpeg VP8 decoder is now faster than the ffmpeg H.264 decoder and roughly as fast as the ffmpeg Theora and CoreAVC H.264 decoders.

  45. Dark Shikari Says:

    @dave

    The numbers I gave for the difference between ffh264 and libvpx decoding speed were for x86_32 on my Windows 7 machine. As you can tell from the graph, the gain isn’t as large on my Windows machine as on the x86_64 machines. I’m guessing ffvp8 is probably similar in speed to ffh264, or marginally faster, at least on my own machine.

  46. unxed Says:

    Are there any plans on developing a native VP8 encoder for FFmpeg?

  47. Dark Shikari Says:

    @unxed

    That’s a lot more work, and honestly, I doubt it will happen. ffmpeg only has one significant lossy video encoder — its core mpeg encoder. A VP8 encoder probably wouldn’t fit in the framework provided by this, at least not easily (much in the same way that an H.264 encoder wouldn’t).

    But of course, if someone wants to try…

  48. unxed Says:

    @Dark Shikari
    > ffmpeg only has one significant lossy video encoder — its core mpeg encoder

    And what about WMV 7/8 and H.263/H.261? Are they too simple to be called “significant”?

  49. Relgoshan Says:

    FarmerBob: Fast + Simple + (Same Result) == STFU. It is, in a sense, the very definition of Better. A person may as well explore the full power of a modern CPU; otherwise what is the point? And 1080p video especially has been a challenge for VP8, as it will be for another couple of years.

    Orthochronous: What, like perl and other interpreted languages? Platform-independent code run by platform-optimized interpreters? Not very good for a video codec, I think. The video itself should be the only platform-independent code being interpreted.

  50. Dark Shikari Says:

    @unxed

    They all use the core mpeg encoder. They’re the same code, with slight variations for each format.

    Keep in mind that an encoder is a very complicated program with many parts — by comparison, the only large differences between many of the formats you mentioned are probably the entropy coder and the headers. These can be swapped out trivially without changing much of the rest of the encoder. That’s why ffmpeg has so many “encoders” that all come out of its core mpeg encoder — they’re all incredibly similar (8×8-DCT-based MPEG-alikes), so most of the same code can be used for them.

    Equally, this is why ffmpeg has no WMV9 encoder — it’s different enough that it can’t easily be retrofitted into the core MPEG encoder.

  51. unxed Says:

    @Dark Shikari

    Thanks for your great work and rapid and informative answers!

  52. unxed Says:

    @Philip

    > Anyway, I doubt we’ll see this in Firefox due to the license.

    Can you explain that?

  53. unxed Says:

    @Dark Shikari

    The graph says “Core 2 T9300 (2.5GHz), Mac OS X 10.6.4, 64-bit”, but the raw stats say “Core 2 T9300 (2.5GHz), Linux, 64-bit”.

    So was it Mac OS X or Linux?

  54. Dark Shikari Says:

    Mac OS X. My mistake, fixed.

  55. craag Says:

    Awesome work, looking forward to the optimisations for Atom so I can watch HD video on my netbook!
    :-D

  56. Bobster Says:

    I don’t have the Sintel trailer, but I encoded parkjoy using ffmpeg git-f43faf4 (July 24/2010), and at 1920×1080 I got:
    frame= 500 fps= 74 q=0.0 Lsize= -0kB time=10.00 bitrate= -0.0kbits/s
    video:0kB audio:0kB global headers:0kB muxing overhead -inf%

    real 0m6.781s
    so 74 frames per second on a Core i7-920, Linux 64-bit (kernel 2.6.35-rc6-git1), stock clock (2.66 GHz)

  57. Ron Overdrive Says:

    Just out of curiosity, how does ffvp8 compare to libvpx in resource usage? Currently on my machine libvpx is a CPU hog.

  58. James Says:

    Making optimizations for x86 is nice, but it would be nicer if I could browse YouTube without my PPC laptop choking.
    Why I can play back 480p H.264 without dropping frames but not 120p 3fps Flash/VP8 is a mystery.

  59. Pengvado Says:

    @Dude
    5% isn’t good enough; we don’t optimize solely for the latest version of gcc, and even if it works now, we don’t trust the next gcc not to pessimize it. Performance is definitely not monotonic in gcc version, which makes the total effect on maintenance burden not so obvious.

    Intrinsics are sometimes more readable, and sometimes not, due to yasm’s better preprocessor.

  60. gfxkiller Says:

    I must say that what you’ve done is impressive indeed. I understand not using proprietary Microsoft DirectCompute or NVIDIA’s CUDA APIs, but why not use OpenCL, unlock the power of the GPU, and kick it up an order of magnitude or two?

  61. Rob Says:

    How would these speeds compare to a decoder that uses a lot fewer resources, like the WMV-HD/VC-1 decoder, which has similar quality video, especially on low-end hardware?

  62. SirDaniel Says:

    My test :)
    Sempron Xp 2600+
    test.webm, ~3Mb/s, 1280×640, 23.976 fps, 2min 6s.
    timecodec.exe, renderer: null

    FFdshow r3512 13 Jul >>>dfps 29,0
    Webm filter 0.9.9.0 >>>dfps 38,4
    MPCVideoDec.ax r2151 >>>dfps 53,6 (svn ffmpeg)

    Very nice improvements even on an old CPU. I’d say competitive with the best H.264 decoders.

  63. Dark Shikari Says:

    @gfxkiller

    GPUs are extremely unsuited to video decoding for a variety of reasons. As far as I’ve seen, GPU decoders basically don’t exist — even the “GPU acceleration” used for H.264 and so forth is almost always an ASIC, not the GPU itself, because the GPU is far too slow.

  64. Dark Shikari Says:

    @Rob

    VC-1 is nowhere near VP8 or H.264 in terms of compression or quality, and is not much faster either. Furthermore, no open source encoder exists, Microsoft’s is a pile of balls, and ffmpeg’s decoder is slow.

  65. Anonymous Says:

    Dark Shikari is tsundere for VP8.

  66. SirDaniel Says:

    Another benchmark. This time with Parkjoy and Sintel Trailer. Same machine and decoders:

    Parkjoy:
    FFDshow >>>dfps 9,2
    Webm filter>>>dfps 11,8
    MPCVideoDec >>>dfps 16,4

    Sintel Trailer:
    FFdshow >>>dfps 14,0
    Webm filter >>dfps 18,8
    MPCVideoDec >>>dfps 27,9

    So, can I expect full-speed 1080p VP8 decoding in the near future? ;) Sintel playback was *not so bad*.

  67. unxed Says:

    @Dark Shikari

    Another question: can ffvp8 also decode VP4, 5, 6, and 7?

  68. Dark Shikari Says:

    FYI, it seems some of the asm is very badly optimized for Phenom; in 5 minutes I wrote up a patch that made Phenom about 6% faster, so more should be coming there too.

    @unxed

    VP5 and VP6. Not 7, as nobody’s reverse-engineered it yet. It’ll probably happen now that 8 is out, as I have a hunch that most of 8 is identical to 7. I’m not quite sure 4 ever existed.

  69. Dark Shikari Says:

    @Anonymous{65}

    I di–didn’t mean to do this for you. I just h-had an extra bit of asm lying around, th-that’s all.

  70. unxed Says:

    @SirDaniel

    Can you share the latest SVN builds of MPC-HC and FFDshow tryouts?

  71. heh Says:

    unxed: ffmpeg (which ffvp8 is part of) can decode VP5 and VP6.

  72. SirDaniel Says:

    XhmikosR has added the ffmpeg VP8 code into the newest MPC-HC and compiled it. Download it from here: http://xhmikosr.1f0.de/ I use the standalone filter for timecodec.
    FFdshow uses libavcodec; I suppose an old one, from before the optimizations were done. I just saw that the newest revision, 3515, has updated VP8 too.

  73. unxed Says:

    @Dark Shikari

    > We’ve already started adding our own conformance tests to deal with this.

    Do you plan to release those tests to the public or send them to the Google WebM team?

  74. Dark Shikari Says:

    @unxed

    All new tests will be added to the FATE2 set of ffmpeg tests; check cvslog for commits to that.

  75. unxed Says:

    @SirDaniel

    > from here http://xhmikosr.1f0.de/

    Thanks a lot!

  76. unxed Says:

    @SirDaniel

    And how did you manage to load video in an .ivf container? What demuxer did you use?

  77. meerkat Says:

    I’d like to write my own H.264 encoder and decoder, in order to understand the H.264 encoding and decoding process.

    I’ve been working on DirectShow projects and I’ve found that DirectShow has some limitations.

    Where would you recommend that I start, in order to understand both the H.264 encoding and decoding process?

    I have several years experience in commercial C++ development, but I’m fairly new to video (about 13 months DirectShow).

  78. SirDaniel Says:

    I used MKVMerge from MKVToolnix to mux it into WebM.

  79. Matyas Says:

    Where is the ffvp8 code available from? I tried this: I got the latest ffmpeg via svn, but I did not see ffvp8* filenames. A google search of “ffvp8 source download” did not turn up anything useful. Thanks!

  80. Dark Shikari Says:

    @Matyas

    libavcodec/vp8.c
    libavcodec/vp8dsp.c
    etc

  81. TheGZeus Says:

    @Dark Shikari
    Chrome is Open Core.
    Chromium is open; Chrome is based on Chromium, but with proprietary code added.

    They might have difficulties if they integrated someone else’s code from another project, but they would probably have no issue.

  82. Dark Shikari Says:

    @TheGZeus

    What part of it is proprietary?

  83. Alex Says:

    @TheGZeus The point is moot. Both Chrome and Chromium already ship FFmpeg for all the other formats the browsers support.

  84. Amadeus Says:

    @81

    The Flash player is built directly into Chrome.

  85. Alex Says:

    @83 To me it seems to be more of a bundled-with than a part-of. It still talks to Chrome via a plugin API and can be not just disabled but deleted entirely.

  86. dyf Says:

    Will these advances in the codec make playback and editing of HDSLR video any faster on the desktop/workstation, or is VP8 all about web delivery?

  87. Huulivoide Says:

    For all who don’t know, Chromium can be built to use an external ffmpeg instead of the one that comes with the Chromium source code. Also, is this in the main git, or is it in some “private” branch? I can’t find it there, but maybe I just don’t know what to look for.

  88. Joe P Says:

    libvpx is a reference copy. Reference copies must be cleanly written, easy to read, and understandable. Thus, speed isn’t required for reference copies.

    As Google identifies generic optimizations that can be applied to the reference, they will do so. Hand-coded assembly language has no place in a reference copy.

    If a reference system is the fastest version, then the developers doing actual working systems need help…

  89. Relgoshan Says:

    …and even bundling their own known-good it is still choppy and crashy. *grml*

    Dark: So you are saying that a format would need to be explicitly designed for stream processors from the ground up? And when video is hardware-accelerated, it uses a separate piece of hardware on the board?

  90. Multimedia Mike Says:

    @Dark: VP4 does exist. VfW codec and sample:

    http://samples.mplayerhq.hu/V-codecs/VP4/

    No one cared then and fewer care now.

  91. Dark Shikari Says:

    @Joe P

    Then libvpx isn’t a reference copy according to your own definition. It has tens of thousands of lines of hand-written assembly, enormous amounts of impenetrable code created via macros to maximize speed, and so forth.

  92. pafnucy Says:

    65. Anonymous Says:
    Dark Shikari is tsundere for VP8.

    69. Dark Shikari Says:
    @Anonymous{65}
    I di–didn’t mean to do this for you. I just h-had an extra bit of asm lying around, th-that’s all.

    It made my day.

  93. Clark Mills Says:

    Not a coder but would like to tip hat with pizza. Where’s your “donate” button? :)

  94. Fruit Says:

    It almost seems you guys should set up a donation interface for that ‘SSE4’ MacBook that Ronald (the other developer behind this) needs :)

  95. Blue_MiSfit Says:

    Nice work as always, D_S, and the rest of the ffmpeg devs that worked on this!

    Also
    @Relgoshan:
    Absolutely, the decoding all happens on a special ASIC.

    Derek

  96. TGM Says:

    Very awesome, especially when some of us want WebM as the HTML5 standard… Donate button?

  97. Michel S. Says:

    I have an N470 Atom netbook that runs 64-bit Linux — if you want, I can provide benchmark numbers; just send me a mail.

  98. Michael Miller Says:

    This seems like a good place to ask my question; if there’s a better venue, please let me know. We have a MediaPointe system which captures S-Video and DVI inputs into MP4 format. After editing one of these videos with QuickTime X and saving to a MOV format, it plays fine from the local hard drive, but when served from our Darwin Streaming Server, there is some smearing of the video in the upper right and the audio is stuttered to the point of being useless. It reminds me of how scanning professional photographs gives a smeared result, but we shouldn’t have any DRM issues since we created the content. You can see the video here: http://dss-vm.ncsa.illinois.edu/numerical_libraries.mov Any ideas what’s causing this? Why would this occur when streaming from a server and not from the local HD? Any pointers are greatly appreciated.

    Michael

  99. Relgoshan Says:

    The answer is that you are using QuickTime X and the Darwin server. Not to place any hate on QuickTime, but it happens that QuickTime is an awful solution. You may as well encode with something that uses the x264 library, and install VLC on client systems.

  100. NM64 Says:

    @Dark Shikari{69}

    Hah, that explains everything! I was wondering why after that one VP8-bashing post that you still ended up working on VP8.

    Of course as a real-life tsundere, I really should’ve picked up on that…

  101. lu_zero Says:

    @98 You might try feng and see if the output stream is similar, or have a look with Wireshark.

  102. ashaw Says:

    How difficult would it be to do a similar job with x264 and VP8 as is being done with ffmpeg and VP8?

  103. IgorC Says:

    Dark Shikari,

    It might be an uncomfortable question, but here it is.
    Is there any chance you’d consider writing a VP8 encoder, as it’s very similar to H.264? Until now Google has kept an experimental branch (VPx?) as an option, so useful tools such as AQ can be added. The most important of x264’s quality tools could also be applicable to VP8: VAQ, mbtree, psychovisual enhancements.

    Thank you for fastest VP8 decoder.

  104. TGM Says:

    @IgorC (103)

    Is VP8 like H.264? Links?

    The only people who could really comment on this are people who work on both codecs. And if this is true, there’d be implications for Google courtesy of the MPEG group…

  105. shmerl Says:

    Thanks for your great work! I hope Firefox will adopt ffvp8 as a video decoder for WebM.

  106. Orthochronous Says:

    @ Relgoshan at 49

    Sorry I’ve been away for a couple of days. About “platform independence”: You’re probably right about video codecs because, as Dark Shikari has pointed out elsewhere in this thread, the core algorithms are very similar in many codecs, and there are by definition comparatively few video codecs with real backing (e.g., formats for a big online video repository), and they aren’t moving targets. As such, there’s probably enough manpower around to do the coding and regression testing (you do regression test all code on all supported platforms, don’t you?).

    I’m thinking more about upcoming applications in augmented reality and computer vision, where there are many, many more core algorithms. As such, there simply isn’t the manpower available to do everything in highly hand-optimised assembler for all supported platforms. Orc arose precisely because there weren’t enough PEOPLE writing SIMD-optimised inner loop patches for gstreamer. Orc is a good attempt at a solution to a real problem; my issue with Orc is purely that it assumes that literal transliteration of platform-independent SIMD is sufficient for efficiency, which I think it isn’t.

    But this is going off-topic for this blog, so I’ll stop now.

  107. Dark Shikari Says:

    @IgorC

    If I write a VP8 encoder, that encoder will be called x264, and there will be a commandline option --vp8.

  108. Abass Says:

    @Dark Shikari

    are you guys going to use CUDA for the VP8 decoder?

  109. NM64 Says:

    Cuda? You mean that Nvidia-only stuff?

    Hate to break it to ya, but even if ATI’s OpenCL drivers suck, you still gotta remember that Intel is the current leader in GPU sales.

  110. tom Says:

    Thank you for the excellent job you’ve done!

    Dark Shikari, what about this encoder comparison: http://x264dev.multimedia.cx/?p=372

    It would be good to hear some news about it.

  111. shmerl Says:

    NM64: However, Nvidia is the indisputable leader on Unix/Linux desktops. ATI is way behind there.

  112. shmerl Says:

    The Mozilla guys seem to be reluctant to use an LGPL library with MPL code:
    https://bugzilla.mozilla.org/show_bug.cgi?id=581773
    But it’s not exactly clear what the problem is.

  113. TGM Says:

    @111 (Shmerl):

    There was a mention on Groklaw recently of the MPL being rewritten to be more GPL-friendly.

    Link here: http://www.groklaw.net/article.php?story=20100718112719569

  114. Relgoshan Says:

    Orthochronous: Oh, I can see where you are coming from on that front, but ideally this is more useful with scripted languages. In fact, it constitutes much of the development cycle for next-generation web browsers. Similarly, more and more video games can be freely scripted with plaintext commands. As for per-platform conditional optimization of a codec, the question is mostly one of manpower, which you propose to solve with an automated tuning step. In my mind, this at least should not be done on the fly (but done once at installation).

    Dark: 108-110 intrigue me. I’ve been setting up new computers for retail, based on the ION family of chipsets. The generic accelerator certainly does improve video performance, and it appears that such accelerators will become super-common on all platforms eventually. So if you know someone who does mobile opts, how much of VP8 could be accelerated through such a chip? I had heard previously that the six-tap filter (was it luma?) in decoding was too complex for current hardware acceleration. But then again, people once said that x87 was disabled in Long Mode. So what’s the outlook?

  115. amar Says:

    I think the support ffmpeg/ffvp8 is providing for VP8 does not include encoding. For now, only decoding of WebM is supported, whereas libvpx supports both encoding and decoding.

  116. SirDaniel Says:

    Don’t know if it’s valuable at all, but I have some nice numbers with the new update, and I like testing ;) :
    Tested on: Sempron 1.8GHz, mpc-hc standalone filters, time codec:
    Parkjoy
    MPCVideoDec >>>went from 16,4 to 18,5 dfps

    Sintel Trailer
    MPCVideoDec >>>went from 27,9 to 31,6 dfps

  117. SirDaniel Says:

    Mhh… tried the newest FFdshow. It somehow has a better VP8 implementation… It goes almost 34 frames with Sintel. Just 2 or 3 more frames and this trailer reaches smooth playback on my stupid CPU.

  118. Relgoshan Says:

    I take it that you think ffvp8 is awesome, then? I certainly do, what with using a netbook. More optimization (if possible) would be great, but it is already more stable than libvpx.

  119. Chris White Says:

    @ Dark Shikari{107}:
    “If I write a VP8 encoder, that encoder will be called x264, and there will be a commandline option --vp8.”

    Hehe, so true. I’ve read through all of your articles and all the comments on everything on your blog, and you’ve certainly made it clear that, like most modern MPEG-like codecs, VP8 uses most of the same encoding methods as H.264 and is basically a crippled H.264 variation. In that respect, adding support for VP8 modifications to the x264 encoder makes the most sense if you were to actually make an encoder.

    Seriously though, dude, you need a break. Either that or some of those hundreds of millions of dollars Google paid for On2’s crap. You’re like the ambassador for VP8 now, stoically highlighting its flaws and fixing them. That needs to be recognized and rewarded… Google, are you listening? It also annoys me to no end that Google was so quick to set the buggy codec’s “spec” in stone rather than having a year or so of development and improvement time before finalizing it. Ugh. It really could be a nice codec if you were just allowed to improve it! Although, considering all the latest hoopla about the VP8 flaws, maybe they will reconsider. Heck, they don’t even have a REAL spec. Google, again, if you are reading this, get your head out of your … and put the codec through a year of serious improvements.

    I feel honored that we had some good irc chats many years ago (no, you won’t remember me, but yes we did discuss Touhou :-) .

  120. Gideon "Gnafu" Mayhak Says:

    @107. Dark Shikari:

    I realize you may have just said this in jest or to prove a point, but I have to ask:

    Is this something you might actually consider? With your knowledge of VP8 (after analyzing it and helping optimize FFmpeg’s decoder), how feasible would it be to add a VP8 output option to x264? Not in a million years, or a possible side project? Maybe a Summer of Code project for next year?

  121. Matthew Raymond Says:

    Note that the FFmpeg team has been publicly congratulated for their work on ffvp8 on the official WebM Project Blog:

    http://webmproject.blogspot.com/2010/08/ffmpeg-vp8-decoder-implementation.html

  122. Ricardo Santos Says:

    Hi everyone.

    I’ve read some comments about Firefox not being able to use the ffvp8 decoder for license reasons. Although I’m not an expert, can’t a plugin be made to enable WebM decoding through ffvp8? Firefox, Opera, and Chrome can be “upgraded” through plugins, so why not do it now? It would certainly (I think) accelerate the adoption of the WebM format. Or is everyone waiting for Adobe to include a VP8 decoder in the Flash player? They said they would include a VP8 decoder, but nothing so far…

  123. wiak Says:

    Great work! It will now play really well on my AMD Vision laptop with an Athlon II M300 2GHz, 4GB DDR2, and Mobility Radeon HD 4200, even on battery!

    MPC-HC 2099 was slow and dropped a lot of frames; MPC-HC 2397 drops nearly no frames and is a lot faster.

  124. Captain Redbeard Says:

    It would be interesting to update the results using current svn version of ffvp8 and libvpx 0.9.2 ( https://groups.google.com/a/webmproject.org/group/codec-devel/browse_thread/thread/affaf0069c199ca4 ) which both offer some improvements.

  125. Kavan Says:

    @DS This may be a long question, and may not be relevant, but how does the decoder handle packet loss?

  126. CruNcher Says:

    Nvidia and AMD are both working on implementing VP8 in the next VPX and UVD generations :)
    At least that’s what they said when Google released it; maybe the coming UVD in the HD6000 series by AMD (ATI is going to die, the fusion is done) will already have this VP8 decoding support :)
    Also, Adobe is most probably going to support this by then via their GPU interface for both vendors, as well as via libvpx software-based playback.
    Though it has gotten very quiet on these things ;) one reason is the MPEG LA announcement of keeping H.264 free for consumers for longer; I guess both AMD and Nvidia are currently re-evaluating the usefulness (cost vs. income) of these VP8 plans, also in view of Google TV’s importance.

    Btw, great performance optimization work, Dark :)

  127. Vlad Says:

    Could you please write a blog post about HEVC, a.k.a. H.265? It would be very interesting to know your opinions about the various technologies and algorithms being considered for HEVC (and tested in KTA). Something like a preliminary overview.
    Sorry for the off-topic comment.

  128. Dark Shikari Says:

    @Gideon

    It may happen. I’ve thought about it… possibly just to prove a point ;)

  129. M Says:

    DS: with the new SDK release, you should redo the benchmarks :)

  130. VPX Says:

    http://blog.webmproject.org/2010/10/vp8-codec-sdk-aylesbury-release.html

    “20-40% (average 28%) improvement in libvpx decoder speed”

  131. Axe Says:

    @ Dark Shikari :

    I was looking at the code of ffmpeg’s VP8 implementation, and I would have loved to see more commenting in the code, as it seems very difficult to follow the flow.

    Can you please give me more details about “avpicture_layout” and what it is doing?

  132. Gideon "Gnafu" Mayhak Says:

    @128. Dark Shikari:

    I know this blog post is a bit older now, but I still felt this was the best place to comment about something I’ve been trying to read up on over the last couple days.

    I’ve been discovering mentions and discussions regarding what seems to be a semi-official “fork” of x264 for VP8 encoding called xvp8 (led by Ronald/BBB). As I’m very excited about VP8 and WebM, I’m very excited that there is work being done on a VP8 “version” of x264. In my searching, I also discovered another project referring to itself as x262 (an MPEG-2 encoder based on x264).

    Now, in light of this, I had a thought; and perhaps it has floated in the minds of you, Ronald, and others: do you see the possibility of a future project, perhaps called xencoder (if that’s not already taken by something), that would incorporate all these things into a single encoder? While I know many people will still want a separate xvp8 that they feel has no patent-encumbered parts, a unified “xencoder” would be awesome for the larger community.

    Or will x264 always be x264, with or without a --vp8-output option, and xvp8 will always be a separate project? And while x262 seems perhaps farther removed, have you seen anything about it that has made you interested in incorporating that into x264’s main codebase as well?

    Thank you for any light you can shed on this, as I’m just wanting to get as much information as I can. This all has me very excited :-) .

    P.S. Can you shed any further light on xvp8 at this point? Is Ronald’s x264 repository on GitHub really xvp8, or just x264?
