Announcing the world’s fastest VP8 decoder: ffvp8
Back when I originally reviewed VP8, I noted that the official decoder, libvpx, was rather slow. While there was no particular reason that it should be much faster than a good H.264 decoder, it shouldn’t have been that much slower either! So, I set out with Ronald Bultje and David Conrad to make a better one in FFmpeg. This one would be community-developed and free from the beginning, rather than the proprietary code-dump that was libvpx. A few weeks ago the decoder was complete enough to be bit-exact with libvpx, making it the first independent free implementation of a VP8 decoder. Now, with the first round of optimizations complete, it should be ready for primetime. I’ll go into some detail about the development process, but first, let’s get to the real meat of this post: the benchmarks.
We tested on two 1080p clips: Parkjoy, a live-action clip, and the Sintel trailer, a CGI clip. Testing was done using “time ffmpeg -vcodec {libvpx or vp8} -i input -vsync 0 -an -f null -”. We all used the latest SVN FFmpeg at the time of this posting; the last revision optimizing the VP8 decoder was r24471.
As these benchmarks show, ffvp8 is clearly much faster than libvpx, particularly on 64-bit. It’s even faster by a large margin on Atom, despite the fact that we haven’t even begun optimizing for it. In many cases, ffvp8’s extra speed can make the difference between a video that plays and one that doesn’t, especially in modern browsers with software compositing engines taking up a lot of CPU time. Want to get faster playback of VP8 videos? The next versions of FFmpeg-based players, like VLC, will include ffvp8. Want to get faster playback of WebM in your browser? Lobby your browser developers to use ffvp8 instead of libvpx. I expect Chrome to switch first, as they already use libavcodec for most of their playback system.
Keep in mind ffvp8 is not “done” — we will continue to improve it and make it faster. We still have a number of optimizations in the pipeline that aren’t committed yet.
Developing ffvp8
The initial challenge, primarily pioneered by David and Ronald, was constructing the core decoder and making it bit-exact to libvpx. This was rather difficult, especially given the lack of a real spec: many parts of the existing documentation were outright misleading and contradicted libvpx itself. It didn’t help that the suite of official conformance tests didn’t even cover all the features used by the official encoder! We’ve already started adding our own conformance tests to deal with this. But I’ve complained enough in past posts about the lack of a spec; let’s get onto the gritty details.
The next step was adding SIMD assembly for all of the important DSP functions. VP8’s motion compensation and deblocking filter are by far the most CPU-intensive parts, much the same as in H.264. Unlike H.264, the deblocking filter relies on a lot of internal saturation steps, which are free in SIMD but costly in a normal C implementation, making the plain C code even slower. Of course, none of this is a particularly large problem; any sane video decoder has all this stuff in SIMD.
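To make the cost of those saturation steps concrete, here is a scalar C sketch of one edge adjustment, loosely modeled on the edge-adjustment step of VP8's loop filter (as later described in RFC 6386). It is an illustration, not ffvp8's code, and the function names are our own. Every `clamp_s8` call below is a pair of compares in C, whereas SIMD saturating instructions such as SSE2's `paddsb`/`psubsb` perform the same clamping at no extra cost on 16 pixels at once.

```c
#include <assert.h>
#include <stdint.h>

/* Saturate to the signed 8-bit range. SIMD gets this for free from
 * saturating instructions; scalar C pays for two compares every time. */
static int clamp_s8(int v)
{
    return v < -128 ? -128 : (v > 127 ? 127 : v);
}

/* Map pixels to signed values centred on zero, and back again. */
static int u2s(uint8_t v) { return (int)v - 128; }
static uint8_t s2u(int v) { return (uint8_t)(clamp_s8(v) + 128); }

/* Adjust the two pixels on either side of an edge (p0 | q0), optionally
 * using the outer pixels p1 and q1. Returns the amount subtracted from q0.
 * Assumes >> on a negative int is an arithmetic shift, as on mainstream
 * compilers (strictly, C leaves this implementation-defined). */
static int edge_adjust(int use_outer_taps, uint8_t p1, uint8_t *p0,
                       uint8_t *q0, uint8_t q1)
{
    assert(p0 && q0);
    int sp1 = u2s(p1), sp0 = u2s(*p0), sq0 = u2s(*q0), sq1 = u2s(q1);
    int outer = use_outer_taps ? clamp_s8(sp1 - sq1) : 0;
    int a = clamp_s8(outer + 3 * (sq0 - sp0));
    int b = clamp_s8(a + 3) >> 3;   /* adjustment applied to p0 */
    a = clamp_s8(a + 4) >> 3;       /* adjustment applied to q0 */
    *q0 = s2u(sq0 - a);
    *p0 = s2u(sp0 + b);
    return a;
}
```

For example, with p1 = p0 = 100, q0 = q1 = 110 and outer taps enabled, the step moves p0 to 102 and q0 to 107, shrinking the jump across the edge from 10 to 5. With an extreme edge (0 against 255) the intermediate value saturates at 127, which is exactly the kind of step that costs branches in C.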
I tutored Ronald in x86 SIMD and wrote most of the motion compensation, intra prediction, and some inverse transforms. Ronald wrote the rest of the inverse transforms and a bit of the motion compensation. He also did the most difficult part: the deblocking filter. Deblocking filters are always a bit difficult because every one is different. Motion compensation, by comparison, is usually very similar regardless of video format; a 6-tap filter is a 6-tap filter, and most of the variation going on is just the choice of numbers to multiply by.
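As a rough illustration of “a 6-tap filter is a 6-tap filter”, here is a generic scalar horizontal 6-tap interpolation in C where the format-specific part is only the tap array and the final shift. The tap values are VP8's half-pel taps `{3, -16, 77, 77, -16, 3}` (summing to 128) and H.264's luma half-pel taps `{1, -5, 20, 20, -5, 1}` (summing to 32); the function name and the padding convention are assumptions for this sketch, and real decoders do this in SIMD.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Generic 6-tap horizontal interpolation. Output i is computed from
 * src[i-2] .. src[i+3], so the caller must provide 2 pixels of padding
 * before src and 3 after src[width-1]. The taps must sum to 1 << shift. */
static void filter6_h(const uint8_t *src, uint8_t *dst, int width,
                      const int taps[6], int shift)
{
    assert(shift > 0);
    for (int i = 0; i < width; i++) {
        int sum = 1 << (shift - 1);          /* rounding offset */
        for (int k = 0; k < 6; k++)
            sum += taps[k] * src[i + k - 2];
        /* Negative sums clip to 0; this also avoids shifting a negative int. */
        dst[i] = sum < 0 ? 0 : clip_u8(sum >> shift);
    }
}

/* Half-pel taps: only the numbers differ between the two formats. */
static const int vp8_halfpel[6]  = { 3, -16, 77, 77, -16, 3 };  /* sum 128, shift 7 */
static const int h264_halfpel[6] = { 1, -5, 20, 20, -5, 1 };    /* sum 32, shift 5 */
```

Interpolating halfway between a run of 0s and a run of 255s gives 128 with either table, which is the point: the loop is identical and only the constants change.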
The biggest challenge in a SIMD deblocking filter is to avoid unpacking, that is, going from 8-bit to 16-bit. Many operations in deblocking filters would naively appear to require more than 8-bit precision. A simple example in the case of x86 is abs(a-b), where a and b are 8-bit unsigned integers. The result of “a-b” requires a 9-bit signed integer (it can be anywhere from -255 to 255), so it can’t fit in 8 bits. But this is quite possible to do without unpacking: (satsub(a,b) | satsub(b,a)), where “satsub” performs a saturating subtract on the two values. If the true difference is positive, satsub yields it; if it is negative, satsub yields zero. Since at most one of the two terms is nonzero, ORing them together yields the desired result. This requires 4 ops on x86; unpacking would probably require at least 10, including the unpack and pack steps.
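A scalar C model of the trick, checked exhaustively against the obvious widened formula, might look like this. `satsub_u8` emulates one lane of an unsigned saturating subtract; the optional SSE2 variant uses the real `_mm_subs_epu8` (`psubusb`) and `_mm_or_si128` (`por`) intrinsics to handle 16 pixels at a time. The function names are our own.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of an unsigned saturating subtract: a - b, clamped at 0. */
static uint8_t satsub_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a > b ? a - b : 0);
}

/* |a - b| without ever leaving 8 bits: at most one of the two saturating
 * subtracts is nonzero, so OR-ing them gives the absolute difference. */
static uint8_t absdiff_u8(uint8_t a, uint8_t b)
{
    return satsub_u8(a, b) | satsub_u8(b, a);
}

/* Exhaustively compare against the widened formula for all 65536 pairs. */
static void absdiff_check_all(void)
{
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            assert(absdiff_u8((uint8_t)a, (uint8_t)b) == (a > b ? a - b : b - a));
}

#ifdef __SSE2__
#include <emmintrin.h>
/* The same trick on 16 pixels at once: psubusb, psubusb, por. */
static __m128i absdiff_epu8(__m128i a, __m128i b)
{
    return _mm_or_si128(_mm_subs_epu8(a, b), _mm_subs_epu8(b, a));
}
#endif
```

Because x86 SSE instructions overwrite one of their operands, the SIMD version also needs a register copy, which is how the count reaches the 4 ops mentioned above.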
After the SIMD came optimizing the C code, which still took a significant portion of the total runtime. One of my biggest optimizations was adding aggressive “smart” prefetching to reduce cache misses. ffvp8 prefetches the reference frames (PREVIOUS, GOLDEN, and ALTREF)… but only the ones which have been used reasonably often this frame. This lets us prefetch everything we need without prefetching things that we probably won’t use. libvpx very often encodes frames that almost never (but not quite never) use GOLDEN or ALTREF, so this optimization greatly reduces time spent prefetching in a lot of real videos. There are, of course, countless other optimizations we made that are too numerous to list here, such as David’s entropy decoder optimizations. I’d also like to thank Eli Friedman for his invaluable help in benchmarking a lot of these changes.
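A minimal sketch of the idea in C, assuming the decoder keeps a running count of how many macroblocks in the current frame have used each reference. The 1/32 threshold, the names, and the use of GCC/Clang's `__builtin_prefetch` are illustrative assumptions, not ffvp8's actual code.

```c
#include <assert.h>
#include <stdint.h>

enum { REF_PREVIOUS, REF_GOLDEN, REF_ALTREF, NUM_REFS };

/* Hypothetical threshold: a reference counts as "used reasonably often"
 * if more than 1/32 of the macroblocks decoded so far have used it. */
#define USAGE_SHIFT 5

/* Returns a bitmask with bit r set if reference r should be prefetched. */
static unsigned refs_worth_prefetching(const int ref_count[NUM_REFS],
                                       int mbs_decoded)
{
    assert(mbs_decoded >= 0);
    unsigned mask = 0;
    for (int r = 0; r < NUM_REFS; r++)
        if (ref_count[r] > (mbs_decoded >> USAGE_SHIFT))
            mask |= 1u << r;
    return mask;
}

/* Issue prefetches only for the selected references, e.g. for the rows that
 * upcoming macroblocks' motion vectors are likely to read from. */
static void prefetch_refs(const uint8_t *const ref_rows[NUM_REFS], unsigned mask)
{
    for (int r = 0; r < NUM_REFS; r++) {
        if (!(mask & (1u << r)))
            continue;
#if defined(__GNUC__) || defined(__clang__)
        __builtin_prefetch(ref_rows[r]);
#else
        (void)ref_rows[r];   /* no portable prefetch; it is only a hint anyway */
#endif
    }
}
```

In a frame where GOLDEN and ALTREF are almost never used, the mask usually reduces to PREVIOUS alone, so no memory bandwidth is spent pulling in the other two frames.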
What next? Altivec (PPC) assembly is almost nonexistent, with the only functions being David’s motion compensation code. NEON (ARM) is completely nonexistent: we’ll need that to be fast on mobile devices as well. Of course, all this will come in due time — and as always — patches welcome!
Appendix: the raw numbers
Here are the raw numbers (in fps) for the graphs at the start of this post, with standard error values:
Core i7 620QM (1.6Ghz), Windows 7, 32-bit:
Parkjoy ffvp8: 44.58 +/- 0.44
Parkjoy libvpx: 33.06 +/- 0.23
Sintel ffvp8: 74.26 +/- 1.18
Sintel libvpx: 56.11 +/- 0.96
Core i5 520M (2.4Ghz), Linux, 64-bit:
Parkjoy ffvp8: 68.29 +/- 0.06
Parkjoy libvpx: 41.06 +/- 0.04
Sintel ffvp8: 112.38 +/- 0.37
Sintel libvpx: 69.64 +/- 0.09
Core 2 T9300 (2.5Ghz), Mac OS X 10.6.4, 64-bit:
Parkjoy ffvp8: 54.09 +/- 0.02
Parkjoy libvpx: 33.68 +/- 0.01
Sintel ffvp8: 87.54 +/- 0.03
Sintel libvpx: 52.74 +/- 0.04
Core Duo (2Ghz), Mac OS X 10.6.4, 32-bit:
Parkjoy ffvp8: 21.31 +/- 0.02
Parkjoy libvpx: 17.96 +/- 0.00
Sintel ffvp8: 41.24 +/- 0.01
Sintel libvpx: 29.65 +/- 0.02
Atom N270 (1.6Ghz), Linux, 32-bit:
Parkjoy ffvp8: 15.29 +/- 0.01
Parkjoy libvpx: 12.46 +/- 0.01
Sintel ffvp8: 26.87 +/- 0.05
Sintel libvpx: 20.41 +/- 0.02


July 23rd, 2010 at 4:34 pm
Why would Chrome switch to ffvp8? Both Chrome and libvpx are by Google. That would mean that Google is admitting its stupidity.
July 23rd, 2010 at 4:38 pm
@Markus
Does it make me stupid if someone else can write faster code than me? I doubt it. Furthermore, Google has already been referring people to libavcodec’s VP8 decoder as documentation on the VP8 video format, so I doubt they have anything against it.
July 23rd, 2010 at 5:23 pm
Incredible.
I wonder: since the initial release of libvpx, Google has improved the code on a weekly basis.
Can these, and future, improvements be merged into ffvp8, or are the code bases so different now, that improvements in libvpx can’t be merged to ffvp8?
July 23rd, 2010 at 5:25 pm
@Amadeus
We’ve already adopted most optimizations that looked interesting. Some examples include the merged SSE chroma deblocking and the unrolled-tree coefficient decoding. There might be a couple left, but we wouldn’t be this fast unless we went to a pretty good effort to combine all the best ideas.
You can’t “merge” changes like that anyways; ffvp8 is not a fork of libvpx, but a completely new implementation.
July 23rd, 2010 at 5:31 pm
Awesome work! Thanks!
I don’t see anything wrong with Google using this implementation. libvpx is the reference codec, but that doesn’t mean it’s the best in performance.
July 23rd, 2010 at 5:38 pm
@Dark Shikari
So from now on, improvements made in libvpx aren’t really interesting from an FFmpeg developer’s point of view?
July 23rd, 2010 at 5:40 pm
@Amadeus
Of course they’re interesting. If it’s an idea we haven’t thought of before, we can consider how it would apply to ffvp8.
July 23rd, 2010 at 5:43 pm
Interesting indeed. It would make sense if Mozilla and Opera switched to ffvp8; they have absolutely no excuse whatsoever for not using the better, more cleanly written decoder instead of the inferior one; ffvp8 is completely free software.
July 23rd, 2010 at 5:50 pm
I couldn’t really tell if it was a joke or not, but can Google use ffvp8 for Chrome and Chrome OS? Both are closed source.
I also read that the ffmpeg decoder is only 1400 lines of code. Would that mean that Google would have to ship H.264 code in order to use ffvp8?
Does libvpx really compile on PPC and ARM as of this point?
July 23rd, 2010 at 5:54 pm
@I.K.
Chrome is open source, as Chromium. Chrome is just the name for their official version. Furthermore, the LGPL, the license ffmpeg uses, allows linking from closed-source software anyways.
The ffmpeg decoder is about 1700 lines of code… plus 500 lines of DSP code, a thousand lines of headers (mostly tables and the VP5/6/7/8 arithmetic coder), and around 3000 lines of asm.
Yes, some is shared with the H.264 decoder, but that’s because those parts do exactly the same thing. If that counts as “shipping H.264 code”, shipping libvpx is “shipping H.264 code”. Also, Chrome already ships ffmpeg’s H.264 decoder anyways.
Yes, libvpx supports PPC and ARM, and has optimizations for them.
July 23rd, 2010 at 6:01 pm
Could it be that some vendors are afraid of shipping ffvp8 due to the fear of ffvp8 not supporting new revisions of the format in a timely manner compared to libvpx, whose development is kept in sync with the experimental branch changes?
July 23rd, 2010 at 6:02 pm
@anonymous
The experimental branch is not intended to be used for actual video distribution. If and when Google wants, they’ll announce a VP8.1 or VP9 or whatnot. There’s no guarantee of compatibility, backwards or forwards, even within libvpx, with the experimental branch.
July 23rd, 2010 at 6:08 pm
@Dark:
What I meant by Chrome being closed source was that Chrome contains the Flash plugin, and maybe also the PDF viewer…?
But given that they could just link to ffmpeg, as you wrote, that part is solved.
Okay, I guess that doesn’t count as H.264 code then.
About Chrome using ffmpeg’s H.264 decoder. Besides it being fast, doesn’t the licensee get the full H.264 source code when paying MPEG LA?
July 23rd, 2010 at 6:18 pm
@I.K.
I have no idea what you mean by “full H.264 source code”. That sentence makes no logical sense.
July 23rd, 2010 at 6:38 pm
Impressive work, as always. Did these optimizations make it into the encoder?
July 23rd, 2010 at 6:44 pm
@Ed
ffmpeg doesn’t have a VP8 encoder (besides the wrapper around libvpx). Encoder optimizations are generally different from decoder ones anyways; the functions that are bottlenecks in decoders are often almost negligible in encoders.
July 23rd, 2010 at 9:22 pm
Just a quick note that it doesn’t appear that your Core i7 numbers are correct. I’m assuming that’s meant to be the Core i7 720QM, as there is no i7 620QM, and the i7 620M is a 2.67Ghz dual-core CPU. It also looks like Turbo Mode may have been disabled, because if the workload isn’t sufficiently multi-threaded to take good advantage of the two additional cores, the CPU should have clocked itself up to 2.4Ghz, turning in numbers identical to the i5 520M (not counting any effects from the i7 having twice the L3 cache).
Sorry to nitpick on your otherwise great post, thanks very much for all your hard work to help make VP8 a potentially viable Free alternative!
Unrelated postscript: Intel can go to hell for making their processor naming scheme as confusing as it is on the Core i(n)-series.
July 23rd, 2010 at 9:25 pm
@Alereon
You’re right, it’s the lowest end QM (the 1.6Ghz one).
Turbo mode is on, but my laptop’s cooling is awful, so it generally never triggers.
July 23rd, 2010 at 10:42 pm
Great work. If I may ask, how is it funded? I have been helped by “Dark Shikari” in IRC, so I assume that some of the work is volunteer, but are any ffmpeg devs paid?
July 23rd, 2010 at 10:46 pm
@Brian
Some devs are paid to work on specific tasks in ffmpeg, usually by companies who need said features. The VP8 decoder was purely volunteer, as far as I know.
July 23rd, 2010 at 11:35 pm
Doesn’t anybody have an AMD chip to benchmark things with?
July 24th, 2010 at 2:57 am
Dark Shikari Says:
July 23rd, 2010 at 4:38 pm
“@Markus
Does it make me stupid if someone else can write faster code than me?”
Well, if you were trying to develop a competitive codec for many years (=On2/Google) and someone (=you guys) shows up and outclasses you within a matter of weeks, it makes you look very stupid, especially if you acted as the only professionals around (On2 always did that).
July 24th, 2010 at 3:21 am
You should really make the same test with Atom N450 as it is the successor for N270 and it is 64-bit
July 24th, 2010 at 4:13 am
Markus: You just don’t get it, do you? Google is the smartest operating company in the whole world. They wrote a prototype and then let others optimize it for free. Not to mention the free publicity. And you still think Google is stupid.
July 24th, 2010 at 5:08 am
@Dark:
Did you devs write some specs document based on your investigations of VP8, to avoid (in the future) the traps of the misleading code-that-are-the-specs ?
Just curious
July 24th, 2010 at 6:40 am
Btw. Exceeding cool trick with satsub()! Thanks for sharing this trick.
July 24th, 2010 at 7:54 am
Did you try using intrinsics rather than assembly first? I’ve found the latest versions of gcc/icc get within 5% of hand-tuned SSE assembly (with obvious implications for readability/maintainability).
July 24th, 2010 at 7:56 am
Thanks for writing to tell us the tales of your optimizing adventures. It’d be interesting to hear a bit more of why you are faster. Have you found some new areas of improvement that libvpx hasn’t, or is it just the sum of being a little bit faster in many places? (If you tell me libvpx unpacks for abs(a-b), I’ll cry.)
Anyway, I doubt we’ll see this in Firefox due to the license. We (Opera) could theoretically use it via GStreamer, but I haven’t checked out the code or anything yet. Good work in any case!
July 24th, 2010 at 8:07 am
@Markus
I think you’re stupid. I’ll leave it up to you to figure out why.
Good job FFMPEG
July 24th, 2010 at 8:22 am
great work, congrats. win7 x64 results would certainly have been interesting, though.
July 24th, 2010 at 8:42 am
@anonymous:
For what it’s worth, Mozilla’s license (the MPL part, not the GPL/LGPL part) may well in fact prevent it from using an LGPL-only library, depending on details of how the library needs to be used.
July 24th, 2010 at 8:45 am
Would using David Schleef’s Orc make sense instead of hand-rolling SIMD assembler for every CPU? Or would Orc just get in the way?
July 24th, 2010 at 9:00 am
@xcomcmdr, RE:code-that-are-the-specs:
He did mention that “We’ve already started adding our own conformance tests to deal with this.”
Hopefully those tests will make it upstream into whatever test suites Google/On2 is maintaining.
July 24th, 2010 at 9:00 am
hello all, thanks for this
i would like to ask how can we give some $$$ to the developers? do you guys have a paypal donation button somewhere? thanks
July 24th, 2010 at 9:18 am
@Markus But wouldn’t admitting your stupidity up front and adopting someone else’s superior solution mean that you’re not so stupid after all? The real stupid move is to turn on the NIH blinders and refuse to accept that someone else can code better than you can.
It’s no shame to admit that maybe the job could have been done better – afterall, that’s probably part of the reason they opened sourced the thing in the first place – so people can improve it and make it shine.
They probably knew the decoder wasn’t that great to begin with.
Humility begets greater things.
July 24th, 2010 at 9:27 am
@Markus: however, On2 created the format and codec in the first place, and Google devs themselves have been trying to improve the original slow codec, which basically already IS sort of admitting it wasn’t really fast.
I can see Google switching to ffvp8. As far as I’m aware they always try to use the best technology around (when possible, and this is possible).
July 24th, 2010 at 10:01 am
Are you going to rename your blog now ?
July 24th, 2010 at 10:03 am
Thanks for the great work!
How does this new codec compare to H264?
July 24th, 2010 at 11:18 am
@Markus
Speaking in generalities, faster != better. The On2 code might be slower because it is more maintainable. (On the other hand, they might just be idiots.)
Personally, I don’t think I want to maintain code littered with performance hacks like (satsub(a,b) | satsub(b,a)) instead of abs(a – b), clever as that might be. Probably fine if you’re the only one ever maintaining the code, not so good if you’re a company that has to consider employee turnover and code maintenance.
July 24th, 2010 at 12:04 pm
@Philip
One of the main reasons ffvp8 is faster is because of libvpx’s exceedingly bad cache access patterns. Like basically every piece of On2 software ever written (this is an old habit of theirs — hard to break), libvpx does everything in passes. That is, it decodes the bitstream in one pass over the frame, then it does inter prediction, then it does idct… each time doing a full pass over the frame. ffvp8 only makes one pass, doing everything in that pass. At high resolutions this gives it a significant speed benefit.
Of course, even without this fact, it’s still significantly faster — this is just the biggest reason for the speed gap. It’s not the only one.
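The difference in loop order can be sketched in C. The three stage functions below are hypothetical stand-ins for bitstream decoding, prediction, and the inverse transform. Both drivers produce identical output, but the multi-pass one streams the whole ~3 MB frame through the cache once per stage, while the single-pass one finishes each macroblock while it is still in cache. This is a simplification: a real decoder also has row-level dependencies, such as the deblocking filter, that this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MB_COUNT 8160   /* 1920x1088 frame: 120 x 68 macroblocks */
#define MB_BYTES 384    /* 16x16 luma + two 8x8 chroma blocks, 8-bit */

/* Hypothetical stand-ins for the real stages; each touches one macroblock. */
static void stage_parse(unsigned char *mb, size_t i)   { memset(mb, (int)(i & 0xff), MB_BYTES); }
static void stage_predict(unsigned char *mb, size_t i) { (void)i; for (size_t k = 0; k < MB_BYTES; k++) mb[k] += 1; }
static void stage_idct(unsigned char *mb, size_t i)    { (void)i; for (size_t k = 0; k < MB_BYTES; k++) mb[k] ^= 0x55; }

typedef void (*mb_stage)(unsigned char *mb, size_t index);
static const mb_stage stages[] = { stage_parse, stage_predict, stage_idct };
#define NUM_STAGES (sizeof stages / sizeof stages[0])

/* "Passes": each stage sweeps the whole frame before the next begins, so by
 * the time a later stage returns to macroblock 0 it has left the cache. */
static void decode_multipass(unsigned char *frame)
{
    assert(frame != NULL);
    for (size_t s = 0; s < NUM_STAGES; s++)
        for (size_t i = 0; i < MB_COUNT; i++)
            stages[s](frame + i * MB_BYTES, i);
}

/* Single pass: every stage runs on a macroblock while it is still hot. */
static void decode_singlepass(unsigned char *frame)
{
    assert(frame != NULL);
    for (size_t i = 0; i < MB_COUNT; i++)
        for (size_t s = 0; s < NUM_STAGES; s++)
            stages[s](frame + i * MB_BYTES, i);
}
```

Only the nesting of the two loops changes; the work done is the same, which is why the gain shows up mostly at high resolutions, where a full frame no longer fits in cache.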
July 24th, 2010 at 12:25 pm
Great Job everyone! The power of an open source community. Still I think Google owes you…
July 24th, 2010 at 1:04 pm
@nona,
I’ve been looking at orc and the underlying problem. Orc has the rare ability to actually generate and JIT code whilst the program is running, but I think it has the problem that, since its model is that of an assembler, it doesn’t do instruction scheduling or register allocation. So the user has to write “machine-independent source code” that fixes the scheduling and register allocation at the “generic level” before transliteration into machine-dependent code.
If machine independent SIMD is used, I think it is likely to be better overall to be slightly higher level (for programming ease) since it likely increases performance to have “compiler-ish” passes such as instruction scheduling and register allocation be machine-dependent. (I’m actually working on something like this, but partly because you only get one “project announcement” I’m waiting until more concepts are working before publicising.)
July 24th, 2010 at 1:16 pm
Are ffvp8 and libvpx multi-threaded? How well do they scale compared to each other?
July 24th, 2010 at 1:49 pm
Apparently the On2/Google employees have already started a rewrite of their decoder, codenamed “dixie”. The first goal they list for that project is “Increase speed by paying more attention to data locality and cache layout, and by eliminating redundant work in general.” so it would appear they agree with DS’s identification of that as a major problem area.
http://groups.google.com/a/webmproject.org/group/codec-devel/browse_thread/thread/097fd7d08d104bc6
Any chance of a comparison to decoders of other formats? It would appear, based on numbers you gave in your initial VP8 analysis and the speed difference documented here that the ffmpeg VP8 decoder is now faster than the ffmpeg H.264 decoder and roughly as fast as the ffmpeg Theora and CoreAVC H.264 decoders.
July 24th, 2010 at 1:51 pm
@dave
The numbers I gave for difference between ffh264 and libvpx for decoding speed were for x86_32 on my Windows 7 machine. As you can tell by the graph, the gain isn’t as large on my Windows machine as on the x86_64 machines. I’m guessing ffvp8 is probably similar speed to ffh264, or marginally faster, at least on my own machine.
July 24th, 2010 at 1:54 pm
Are there any plans on developing native vp8 encoder for FFmpeg?
July 24th, 2010 at 2:43 pm
@unxed
That’s a lot more work, and honestly, I doubt it will happen. ffmpeg only has one significant lossy video encoder — its core mpeg encoder. A VP8 encoder probably wouldn’t fit in the framework provided by this, at least not easily (much in the same way that an H.264 encoder wouldn’t).
But of course, if someone wants to try…
July 24th, 2010 at 3:00 pm
@Dark Shikari
> ffmpeg only has one significant lossy video encoder — its core mpeg encoder
And what about WMV 7,8 and H.263/261? Are they too simple to be called “significant”?
July 24th, 2010 at 3:01 pm
FarmerBob: Fast + Simple + (Same Result) == STFU. It is, in a sense the very definition of Better. A person may as well explore the full power of a modern CPU, otherwise what is the point? And especially 1080p video has been a challenge for VP8, as it will be for another couple of years.
orthocronous: What, like perl and other interpreted languages? Platform-independent code run by platform-optimized interpreters? Not very good for a video codec, I think. The video itself should be the only platform-independent code being interpreted.
July 24th, 2010 at 3:04 pm
@unxed
They all use the core mpeg encoder. They’re the same code, with slight variations for each format.
Keep in mind that an encoder is a very complicated program with many parts — by comparison, the only large differences between many of the formats you mentioned is probably the entropy coder and the headers. These can be swapped out trivially without changing much of the rest of the encoder. That’s why ffmpeg has so many “encoders” that all come out of its core mpeg encoder — they’re all incredibly similar (8×8-DCT-based MPEG-alikes), so most of the same code can be used for them.
Equally, this is why ffmpeg has no WMV9 encoder — it’s different enough that it can’t easily be retrofitted into the core MPEG encoder.
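A rough C sketch of that structure, with entirely hypothetical names and toy output: a shared encoding core does the real work (here reduced to quantizing one row of coefficients), and each format only supplies a header writer and a coefficient writer. This is not ffmpeg's mpegvideo API and the “formats” are invented; it only illustrates why formats that differ mainly in entropy coding and headers are cheap to add.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy output buffer standing in for a real bit writer. */
typedef struct { char buf[128]; size_t len; } bitstream;

static void put(bitstream *bs, const char *s)
{
    size_t n = strlen(s);
    assert(bs->len + n < sizeof bs->buf);
    memcpy(bs->buf + bs->len, s, n + 1);   /* copies the terminating NUL too */
    bs->len += n;
}

/* The only per-format parts: headers and (toy) entropy coding. */
typedef struct {
    void (*write_header)(bitstream *bs);
    void (*write_level)(bitstream *bs, int level);
} format_ops;

static void fmt_a_header(bitstream *bs) { put(bs, "A|"); }
static void fmt_a_level(bitstream *bs, int level)
{
    char t[16];
    snprintf(t, sizeof t, "%d,", level);
    put(bs, t);
}
static void fmt_b_header(bitstream *bs) { put(bs, "B|"); }
static void fmt_b_level(bitstream *bs, int level)
{
    char t[16];
    snprintf(t, sizeof t, "%x;", (unsigned)level);
    put(bs, t);
}

static const format_ops format_a = { fmt_a_header, fmt_a_level };
static const format_ops format_b = { fmt_b_header, fmt_b_level };

/* Shared core: in a real encoder this is DCT, motion search, quantization,
 * rate control and so on. Here it just quantizes one 8-coefficient row. */
static void encode_row(const format_ops *fmt, bitstream *bs,
                       const int coeffs[8], int qscale)
{
    fmt->write_header(bs);
    for (int i = 0; i < 8; i++)
        fmt->write_level(bs, coeffs[i] / qscale);
}
```

Adding a third MPEG-alike format means writing two small functions; adding something as different as WMV9 would mean changing `encode_row` itself.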
July 24th, 2010 at 3:24 pm
@Dark Shikari
Thanks for your great work and rapid and informative answers!
July 24th, 2010 at 4:04 pm
@Philip
> Anyway, I doubt we’ll see this in Firefox due to the license.
Can you explain that?
July 24th, 2010 at 4:08 pm
@Dark Shikari
Graphics says “Core 2 T9300 (2.5Ghz), Mac OS X 10.6.4, 64-bit”, but the raw stats say “Core 2 T9300 (2.5Ghz), Linux, 64-bit”.
So was it Mac OS X or Linux?
July 24th, 2010 at 4:40 pm
Mac OS X. My mistake, fixed.
July 24th, 2010 at 6:05 pm
Awesome work, looking forward to the optimisations for Atom so I can watch HD video on my netbook!
July 24th, 2010 at 6:42 pm
I don’t have the Sintel trailer, but I encoded parkjoy, using ffmpeg git-f43faf4 (July 24/2010) and at 1920×1080 I got:
frame= 500 fps= 74 q=0.0 Lsize= -0kB time=10.00 bitrate= -0.0kbits/s
video:0kB audio:0kB global headers:0kB muxing overhead -inf%
real 0m6.781s
so 74 frames per second on a Corei7-920 Linux-64 bit (kernel 2.6.35-rc6-git1) stock clock (2.66 GHz)
July 24th, 2010 at 7:34 pm
Just out of curiosity, how does ffvp8 compare to libvpx in resource usage? Currently on my machine libvpx is a CPU hog.
July 24th, 2010 at 7:50 pm
Making optimizations for x86 is nice, but it would be nicer if I could browse Youtube without my PPC laptop choking.
Why I can playback 480p h.264 without dropping frames but not 120p 3fps flash/vp8 is a mystery.
July 24th, 2010 at 8:40 pm
@Dude
5% isn’t good enough, we don’t optimize solely for the latest version of gcc, and even if it works now we don’t trust the next gcc not to pessimize it. Performance is definitely not monotonic in gcc version, which makes the total effect on maintenance burden not so obvious.
Intrinsics are sometimes more readable, and sometimes not, due to yasm’s better preprocessor.
July 25th, 2010 at 1:24 am
I must say that what you’ve done is impressive indeed. I understand not using proprietary MS DirectCompute or NVIDIA’s CUDA APIs, but why not use OpenCL, unlock the power of the GPU, and kick it up an order of magnitude or two?
July 25th, 2010 at 3:59 am
How would these speeds compare, especially on low-end hardware, to a decoder that uses far fewer resources, like the WMV-HD/VC-1 decoder, which has similar video quality?
July 25th, 2010 at 4:07 am
My test
Sempron Xp 2600+
test.webm, ~3Mb/s, 1280×640, 23.976 fps, 2min 6s.
timecodec.exe, renderer: null
FFdshow r3512 13 Jul >>>dfps 29,0
Webm filter 0.9.9.0 >>>dfps 38,4
MPCVideoDec.ax r2151 >>>dfps 53,6 (svn ffmpeg)
Very nice improvements even on old CPU. I’d say competitive to best h264 decoders.
July 25th, 2010 at 4:13 am
@gfxkiller
GPUs are extremely unsuited to video decoding for a variety of reasons. As far as I’ve seen, GPU decoders basically don’t exist — even the “GPU acceleration” used for H.264 and so forth is almost always an ASIC, not the GPU itself, because the GPU is far too slow.
July 25th, 2010 at 4:14 am
@Rob
VC-1 is nowhere near VP8 or H.264 in terms of compression or quality, and is not much faster either. Furthermore, no open source encoder exists, Microsoft’s is a pile of balls, and ffmpeg’s decoder is slow.
July 25th, 2010 at 4:46 am
Dark Shikari is tsundere for VP8.
July 25th, 2010 at 4:47 am
Another benchmark. This time with Parkjoy and Sintel Trailer. Same machine and decoders:
Parkjoy:
FFDshow >>>dfps 9,2
Webm filter>>>dfps 11,8
MPCVideoDec >>>dfps 16,4
Sintel Trailer:
FFdshow >>>dfps 14,0
Webm filter >>dfps 18,8
MPCVideoDec >>>dfps 27,9
So, can i expect full speed 1080p vp8 decoding in near future?
Sintel playing was *not so bad*.
July 25th, 2010 at 5:40 am
@Dark Shikari
Another question: can ffvp8 also decode vp4,5,6 and 7?
July 25th, 2010 at 5:49 am
FYI, it seems some of the asm is very badly optimized for Phenom; in 5 minutes I wrote up a patch that made Phenom about ~6% faster, so more should be coming there too.
@unxed
VP 5 and 6. Not 7, as nobody’s reverse-engineered it yet. It’ll probably happen now that 8 is out, as I have a hunch that most of 8 is identical to 7. I’m not quite sure 4 ever existed.
July 25th, 2010 at 5:50 am
@Anonymous{65}
I di–didn’t mean to do this for you. I just h-had an extra bit of asm lying around, th-that’s all.
July 25th, 2010 at 5:51 am
@SirDaniel
Can you share the lastest SVN builds of MPC HC and FFDshow tryouts?
July 25th, 2010 at 6:00 am
unxed: ffmpeg (which ffvp8 is part of) can decode VP5 and VP6.
July 25th, 2010 at 6:01 am
XhmikosR has added the ffmpeg VP8 code into the newest MPC-HC and compiled it. Download it from here: http://xhmikosr.1f0.de/ I use the standalone filter for timecodec.
FFdshow used libavcodec, I suppose an old version from before the optimizations were done. I just saw that the newest revision, 3515, has updated VP8 too.
July 25th, 2010 at 6:20 am
@Dark Shikari
> We’ve already started adding our own conformance tests to deal with this.
Do you plan to release those tests to the public or send them to the Google WebM team?
July 25th, 2010 at 6:25 am
@unxed
All new tests will be added to the FATE2 set of ffmpeg tests, check cvslog for commits to that.
July 25th, 2010 at 6:27 am
@SirDaniel
> from here http://xhmikosr.1f0.de/
Thanks a lot!
July 25th, 2010 at 6:45 am
@SirDaniel
And how did you manage to load video in .ivf container? What demuxer did you use?
July 25th, 2010 at 7:56 am
I’d like to write my own H.264 encoder and decoder, in order to understand the H.264 encoding and decoding process.
I’ve been working on DirectShow projects and I’ve found that DirectShow has some limitations.
Where would you recommend that I start, in order to understand both the H.264 encoding and decoding process?
I have several years experience in commercial C++ development, but I’m fairly new to video (about 13 months DirectShow).
July 25th, 2010 at 10:48 am
I used MKVMerge from MKVToolnix to mux it into webm.
July 25th, 2010 at 11:23 am
Where is the ffvp8 code available from? I tried this: I got the latest ffmpeg via svn, but I did not see any ffvp8* filenames. A google search of “ffvp8 source download” did not turn up anything useful. Thanks!
July 25th, 2010 at 12:06 pm
@Matyas
libavcodec/vp8.c
libavcodec/vp8dsp.c
etc
July 25th, 2010 at 12:09 pm
@Dark Shikari
Chrome is Open Core.
Chromium is open, Chrome is based on Chromium, but with proprietary code added.
They might have difficulties if they integrated someone else’s code from another project, but would probably have no issue.
July 25th, 2010 at 12:25 pm
@TheGZeus
What part of it is proprietary?
July 25th, 2010 at 5:07 pm
@TheGZeus The point is moot. Both Chrome and Chromium already ship FFmpeg for all the other formats the browsers support.
July 25th, 2010 at 5:15 pm
@81
The Flash player is built directly into Chrome.
July 25th, 2010 at 5:36 pm
@83 to me its seems to be more of a bundled-with than part-of. It still talks to Chrome via a plugin API and can be not just disabled but deleted entirely.
July 26th, 2010 at 3:00 am
Will these advances in codec make play back and editing of HDSLR video any faster on the Desktop/Workstation or is VP8 all about web delivery?
July 26th, 2010 at 4:59 am
For all who don’t know: Chromium can be built to use an external ffmpeg instead of the one that comes with the Chromium source code. Also, is this in the main git, or is it in some “private” branch? I can’t find it there, but maybe I just don’t know what to look for.
July 26th, 2010 at 6:19 am
libvpx is a reference copy. Reference copies must be cleanly written, easy to read, and understandable. Thus, speed isn’t required for reference copies.
As google identifies generic optimizations that can be applied to the reference they will do so. Hand-coded assembly language has no place in a reference copy.
If a reference system is the fastest version, then the developers doing actual working systems need help…
July 26th, 2010 at 10:03 am
…and even bundling their own known-good it is still choppy and crashy. *grml*
Dark: So you are saying that a format would need to be explicitly designed for stream processors from the ground-up? When video is hardware-accelerated it uses a separate piece of hardware on the board?
July 26th, 2010 at 11:40 am
@Dark: VP4 does exist. VfW codec and sample:
http://samples.mplayerhq.hu/V-codecs/VP4/
No one cared then and fewer care now.
July 26th, 2010 at 12:35 pm
@Joe P
Then libvpx isn’t a reference copy according to your own definition. It has tens of thousands of lines of hand-written assembly, enormous amounts of impenetrable code created via macros to maximize speed, and so forth.
July 26th, 2010 at 1:41 pm
65. Anonymous Says:
Dark Shikari is tsundere for VP8.
69. Dark Shikari Says:
@Anonymous{65}
I di–didn’t mean to do this for you. I just h-had an extra bit of asm lying around, th-that’s all.
It made my day.
July 26th, 2010 at 2:20 pm
Not a coder but would like to tip hat with pizza. Where’s your “donate” button?
July 26th, 2010 at 3:10 pm
It almost seems you guys should set up a donation interface for that ‘SSE4’ MacBook that Ronald (the other developer behind this) needs
July 27th, 2010 at 1:20 pm
Nice work as always, D_S, and the rest of the ffmpeg devs that worked on this!
Also
@Relgoshan:
Absolutely, the decodes all happen on a special ASIC.
Derek
July 28th, 2010 at 12:12 am
Very awesome, especially when some of us want WebM as HTML5 standard… Donate button?
July 28th, 2010 at 2:45 am
I have an N470 Atom netbook that runs 64-bit Linux — if you want, I can provide benchmark numbers; just send me a mail.
July 28th, 2010 at 8:11 am
This seems like a good place to ask my question, if there’s a better venue, please let me know. We have a MediaPointe system which captures s-video and DVI inputs into mp4 format. After editing one of these videos with QuickTimeX and saving to an mov format, it plays fine from the local hard drive but when served from our Darwin Streaming Server, there is some smearing of the video in the upper right and the audio is stuttered to the point of being useless. It reminds me of how scanning professional photographs gives a smeared result, but we shouldn’t have any DRM issues since we created the content. You can see the video here: http://dss-vm.ncsa.illinois.edu/numerical_libraries.mov Any ideas what’s causing this? Why would this occur when streaming from a server and not from the local HD? Any pointers are greatly appreciated.
Michael
July 28th, 2010 at 9:47 am
The answer is that you are using QuickTime X and Darwin Streaming Server. Not to place any hate on QuickTime, but it happens that QuickTime is an awful solution. You may as well encode with something that uses the x264 library, and install VLC on client systems.
July 28th, 2010 at 5:02 pm
@Dark Shikari{69}
Hah, that explains everything! I was wondering why after that one VP8-bashing post that you still ended up working on VP8.
Of course as a real-life tsundere, I really should’ve picked up on that…
July 29th, 2010 at 4:15 am
@98: you might try feng and see if the output stream is similar, or have a look with Wireshark.
July 30th, 2010 at 12:02 am
How difficult would it be to do a similar job with x264 and VP8 as is being done with ffmpeg and VP8?
July 30th, 2010 at 3:01 am
Dark Shikari,
It might be an uncomfortable question, but here it is.
Is there any chance you would consider coding a VP8 encoder, since it’s very similar to H.264? So far Google has kept an experimental branch (VPx?) as an option, so useful tools such as AQ could be added. The most important x264 quality tools, such as VAQ, mbtree, and psychovisual enhancements, could also be applicable to VP8.
Thank you for fastest VP8 decoder.
July 30th, 2010 at 4:45 am
@IgorC (102)
Is VP8 like H.264? Links?
The only people who could really comment on this are people who work on both codecs. And if this is true, there’d be implications for Google courtesy of the MPEG group…
July 30th, 2010 at 8:54 am
Thanks for your great work! I hope Firefox will adopt ffvp8 as a video decoder for WebM.
July 30th, 2010 at 10:36 am
@ Relgoshan at 49
Sorry I’ve been away for a couple of days. About “platform independence”: You’re probably right about video codecs because, as Dark Shikari has pointed out elsewhere in this thread, the core algorithms are very similar in many codecs and there are by definition comparatively few video codecs with real backing (eg, formats for a big online video repository) and they aren’t moving targets. As such, there’s probably enough manpower around to do the coding and regression testing (you do regression test all code on all supported platforms, don’t you?).
I’m thinking more about upcoming applications in augmented reality and computer vision, where there are many, many more core algorithms. As such, there simply isn’t the manpower available to do everything in highly hand-optimised assembler for all supported platforms. Orc arose precisely because there weren’t enough PEOPLE writing SIMD-optimised inner loop patches for GStreamer. Orc is a good attempt at solving a real problem; my issue with Orc is purely that it assumes a literal transliteration of platform-independent SIMD is sufficient for efficiency, which I don’t think it is.
But this is going off-topic for this blog, so I’ll stop now.
July 30th, 2010 at 1:30 pm
@IgorC
If I write for a VP8 encoder, that encoder will be called x264, and there will be a commandline option --vp8.
July 30th, 2010 at 2:26 pm
@Dark Shikari
Are you guys going to use CUDA for the VP8 decoder?
July 30th, 2010 at 9:17 pm
Cuda? You mean that Nvidia-only stuff?
Hate to break it to ya, but even if ATI’s OpenCL drivers suck, you still gotta remember that Intel is the current leader in GPU sales.
August 1st, 2010 at 11:59 am
NM4: However, Nvidia is the indisputable leader on Unix/Linux desktops. ATI is way behind there.
August 1st, 2010 at 12:05 pm
The Mozilla developers seem to be reluctant to use an LGPL library with MPL code:
https://bugzilla.mozilla.org/show_bug.cgi?id=581773
But it’s not exactly clear what the problem is.
August 1st, 2010 at 12:43 pm
@111 (Shmerl):
There was a mention on Groklaw recently of the MPL being rewritten to be more GPL-friendly.
Link here: http://www.groklaw.net/article.php?story=20100718112719569
August 3rd, 2010 at 9:39 am
Orthocronous: Oh, I can see where you are coming from on that front, but ideally this is more useful with scripted languages. In fact, it constitutes much of the development cycle for next-generation web browsers. Similarly, more and more video games can be freely scripted with plaintext commands. As for per-platform conditional optimization of a codec, the question is mostly one of manpower, which you propose to solve with an automated tuning step. In my mind, this should at least not be done on the fly, but once at installation.
Dark: 108-110 intrigue me. I’ve been setting up new computers for retail, based on the ION family of chipsets. The generic accelerator certainly does improve video performance, and it appears that such accelerators will become super-common on all platforms eventually. So if you know someone who does mobile opts, how much of VP8 could be accelerated through such a chip? I had heard previously that the six-tap filter (was it luma?) in decoding was too complex for current hardware acceleration. But then again, people once said that x87 was disabled in Long Mode. So what’s the outlook?
August 6th, 2010 at 11:42 am
Don’t know if it’s valuable at all, but I have some nice numbers with the new update, and I like testing:
Tested on: Sempron 1.8 GHz, MPC-HC standalone filters, time codec:
Park joy
MPCVideoDec >>> went from 16.4 to 18.5 dfps
Sintel Trailer
MPCVideoDec >>> went from 27.9 to 31.6 dfps
August 6th, 2010 at 1:59 pm
Hmm… tried the newest ffdshow. It somehow has a better VP8 implementation… It gets almost 34 frames per second with Sintel. Just 2 or 3 frames more and this trailer reaches smooth playback on my stupid CPU.
August 10th, 2010 at 8:01 pm
I take it that you think ffvp8 is awesome, then? I certainly do, what with using a netbook. More optimization (if possible) would be great, but it is already more stable than libvpx.