This is going to be a much longer post than usual, since it covers a lot of ground.
The internet has been filled for quite some time with an enormous number of blog posts complaining about how Flash sucks; so much, in fact, that it’s starting to sound as if the entire internet is crying wolf. But, of course, despite the incessant complaining, they’re right: Flash has terrible performance on anything other than Windows x86, and Adobe doesn’t seem to care at all. But rather than repeat this ad nauseam, let’s be a bit more intellectual and try to figure out what happened.
Flash became popular because of its power and flexibility. At the time it was the only option for animated vector graphics and interactive content (stuff like VRML hardly counts). Furthermore, before Flash, the primary video options were Windows Media, Real, and Quicktime: all of which were proprietary, had no free software encoders or decoders, and (except for Windows Media) required the user to install a clunky external application, not merely a plugin. Given all this, it’s clear why Flash won: it supported open multimedia formats like H.263 and MP3, used an ultra-simple container format that anyone could write (FLV), and worked far more easily and reliably than any alternative.
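To illustrate just how simple FLV is, here is a minimal Python sketch that writes the 9-byte FLV file header and a single tag; the field layout follows the public FLV specification, while the function names and the payload bytes are hypothetical placeholders of my own.

```python
import struct

def flv_header(has_audio=True, has_video=True):
    """The 9-byte FLV file header, followed by the mandatory PreviousTagSize0."""
    flags = (0x04 if has_audio else 0) | (0x01 if has_video else 0)
    return b"FLV" + bytes([1, flags]) + struct.pack(">I", 9) + struct.pack(">I", 0)

def flv_tag(tag_type, timestamp_ms, payload):
    """One FLV tag: type byte, 24-bit size, 24+8-bit timestamp, stream ID 0."""
    size = len(payload)
    header = bytes([tag_type])                        # 8=audio, 9=video, 18=script
    header += size.to_bytes(3, "big")                 # payload size, 24-bit
    header += (timestamp_ms & 0xFFFFFF).to_bytes(3, "big")
    header += bytes([(timestamp_ms >> 24) & 0xFF])    # extended timestamp byte
    header += (0).to_bytes(3, "big")                  # stream ID, always 0
    # Each tag is followed by its own total size (11-byte tag header + payload).
    return header + payload + struct.pack(">I", 11 + size)

# A hypothetical video tag with a dummy two-byte payload:
data = flv_header() + flv_tag(9, 0, b"\x17\x00")
```

A complete muxer needs little more than this plus codec-specific payload formatting, which is why third-party FLV writers were so easy to produce.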
Thus, Adobe (actually, at the time, Macromedia) got their 98% install base. And with that, they began to become complacent. Any suggestion of a competitor was immediately shrugged off; how could anyone possibly compete with Adobe, given their install base? It’d be insane; nobody would be able to do it. They committed the cardinal sin of software development: believing that a competitor being better is excusable. At x264, if we find a competitor that does something better, we immediately look into putting ourselves back on top. This is why x264 is the best video encoder in the world. But at Adobe, this attitude clearly faded after they became the monopoly. This is the true danger of monopolies: they stymie development, because the monopolist has no incentive to improve its product.
In short, they drank their own Kool-Aid. But they were wrong about a few critical points.
The first mistake was assuming that Linux and OS X didn’t matter. Linux is an operating system used by a tiny, tiny minority of end-users, yet those users make up a huge portion of the world’s software developers and web developers. Merely going by user count suggests that Linux isn’t worth optimizing for; accordingly, Adobe allocated just one developer, one single developer, to the entire Linux platform. As for OS X, Macs have become much more popular in recent years, especially among that same group of developers. Furthermore, Apple is a huge company; Flash performing terribly on their platform is a very good incentive for Apple to position themselves in opposition. Thus, Adobe made enemies of Apple and developers alike.
The second mistake was attacking free software. Practically all the websites on the internet use free software solutions on their servers, and not merely LAMP-like stacks: Youtube, Facebook, Hulu, and Vimeo all use ffmpeg and x264. Adobe’s H.264 encoder in Flash Media Encoder is so utterly awful that it is far worse than ffmpeg’s H.263 or Theora; they’re practically assuming users will go use x264 instead. For actual server software, the free software Red5 is extraordinarily popular for RTMP-based systems. And yet, despite all this, Adobe served cease-and-desist orders on sites hosting RTMPdump, claiming (absurdly) that it violated the DMCA by allowing users to save video streams to their hard disks. RTMPdump didn’t die, of course, and it was just one program, but the attack lingered in the minds of developers worldwide. It made clear to them that Adobe was no friend of free software.
The third mistake was not supporting a free software Flash implementation. The lack of a good free software Flash client is not really Adobe’s fault; it has become clear that the Gnash folks are completely incompetent, and nobody else seems interested. Cody Brocious wrote his own Flash rendering code in a matter of days for the purposes of a Flash->iPhone app converter; he only stopped because Adobe released their own mere days before he had intended to release his. The Flash spec is open, and there are existing free software implementations of every single codec in Flash: there’s really nothing stopping a good free implementation. But Adobe’s mistake is one of inaction: they didn’t push for one because it wasn’t important to them.
By comparison, look at Moonlight, the free software implementation of Silverlight. Microsoft has actively worked with the free software community to help produce Moonlight. Think about how absurd that sounds; Microsoft — the bane of free software, if one goes by Slashdot — has been actively supporting an LGPL free software project, while Adobe has not! The biggest problem this creates is one of monopoly: people feel insecure using Flash because there is only one implementation, leaving them at the mercy of Adobe. In any situation, once there are multiple popular implementations of a file format, it’s far more difficult for any one party to commit abuse. Of course, this is intentional by Adobe: they wanted to have that power of abuse, which is why they didn’t support an alternative implementation.
Now it becomes clear why Flash is so disliked. It’s nowhere near the most insecure of popular browser plugins; Java has had far more vulnerabilities according to Secunia. It’s certainly not the least reliable, nor is it completely proprietary; as previously mentioned, the spec is public. Yet because of the above three mistakes, Adobe has made enemies of developers worldwide.
So, what now? Flash is crap, we hate Flash, but how do we get rid of Flash, at least for purposes of internet video?
Let’s start with HTML5 <video>. It’s quite clear that, barring an act of God (or Google, more on that later), if Flash is replaced in the near future, this will be how. But at the moment there are many serious problems, most of which must be solved for it to even have a chance:
1. Missing features. Developers who haven’t worked with Flash often underestimate its capabilities and assume that displaying video is as simple as displaying images. But there are many things that are useful to control. Flash lets you tell the client how long to buffer before playing a stream (critical for reliable playback of any live video). It provides signalling back to the server of packet loss rates, so that the server can throttle bandwidth accordingly.
There are dozens more; these are just a few. But this is the core problem mentioned at the start of this article, the problem that hit Adobe so hard: “believing that a competitor being better is excusable”. Many free software advocates promote HTML5 while declaring that these missing features are not a big issue and that we can do without them. This is not excusable! If you want to outcompete Adobe, you need to provide a superset of all commonly used features.
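To make the first of those features concrete, here is a toy Python model of the buffer-ahead behaviour Flash exposes through NetStream.bufferTime; the class and its event-driven model are my own invention for illustration, and only the buffer-before-playing concept comes from Flash.

```python
class BufferController:
    """Toy model of Flash's bufferTime: don't start playback until `target`
    seconds of media are buffered; on underrun, stop and rebuffer."""

    def __init__(self, target=2.0):
        self.target = target     # seconds to buffer before playing
        self.buffered = 0.0      # seconds of media currently buffered
        self.playing = False

    def on_data(self, seconds):
        """The network delivered `seconds` of media."""
        self.buffered += seconds
        if not self.playing and self.buffered >= self.target:
            self.playing = True  # enough runway: start playback

    def on_playback(self, seconds):
        """Playback consumed `seconds` of media."""
        if self.playing:
            self.buffered = max(0.0, self.buffered - seconds)
            if self.buffered == 0.0:
                self.playing = False  # underrun: go back to buffering
```

The real Flash stack also signals buffer and loss state back to the server, as mentioned above; this sketch models only the client-side half.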
2. Video/audio/container format issues. Theora is a seriously hard sell to most companies, given that its compression is much worse than x264’s and that H.264 carries no royalties for web video; as before, it’s hard to market something with at most nebulous benefits. Being “patent-free” may sound nice to free software advocates like us, but most business types only care about the bottom line, and if being “patent-free” doesn’t benefit the bottom line, they’re probably not going to care. (NB: a commenter noted that H.264 is only royalty-free for free content, not paid content. This probably is not a big issue, since the royalty percentage is very small for paid content, and if you’re charging for content, you can probably afford to pay the small fees. But it obviously is a slightly different situation.)
But even if you ignore the compression issue, most companies don’t like storing multiple versions of every video — they still need H.264 for iPhone support. As a side note, Dirac is a potential patent-free option as well, and may provide better compression, but is slower to decode than Theora. It’s definitely an option to consider though, and one which is way too often ignored when considering formats for HTML5 video.
Youtube, for example, has thrown away petabytes of bandwidth in the pursuit of fewer versions of each video: the default “low quality H.264” format, which now uses x264, is Baseline Profile-only. Providing a High Profile alternative could save them 30-50% bandwidth for desktop users, who make up the vast majority of Youtube users. But it would require storing yet another copy of the video (since Baseline is needed for iPhones), which is too costly for them. Duplicating each video again would require some serious benefit to doing so, and Google apparently believes that a 30-50% compression improvement is not sufficient, though there seems to be something weird going on with the 360p/480p madness they recently rolled out (neither is High Profile, though).
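As a back-of-the-envelope illustration of the trade-off Youtube faces, with every number hypothetical except the 30-50% savings range from above:

```python
# Hypothetical monthly figures; only the 30-50% saving range comes from the text.
total_egress_tb = 1000.0      # monthly video bandwidth in TB (made up)
desktop_share = 0.90          # fraction of views from desktop browsers (assumed)
high_profile_saving = 0.40    # midpoint of the 30-50% High Profile saving

bandwidth_saved_tb = total_egress_tb * desktop_share * high_profile_saving
storage_cost = "one extra full copy of every video"   # the price of the saving
print(f"{bandwidth_saved_tb:.0f} TB/month saved, at the cost of {storage_cost}")
```

Whether that trade is worth it depends entirely on the ratio of bandwidth cost to storage cost, which is why Google can rationally decline a 30-50% compression win.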
Of course, despite having $50 million dumped on their doorstep each year by Google, Mozilla will never pay the H.264 license fees, nor will they probably ever support users installing their own codecs. Thus, we are at an impasse. If Microsoft supports HTML5 <video> in IE9, which is quite possible, they will almost certainly support H.264 and probably not Theora. Thus, even ignoring the case of mobile devices like the iPhone, neither H.264 nor Theora will span the whole market. So which will web companies pick? Most likely, neither — they’ll see the split market as a reason to avoid HTML5 altogether, and drop back to Flash.
3. Ubiquity. Flash has the 98% market installation base on its side, a powerful force. Until Internet Explorer becomes a low-popularity browser (unlikely in the near term) or supports HTML5, Flash simply won’t be replaced. Furthermore, this effectively forces websites into using H.264: if they want to support both Flash and HTML5, using Theora would force them to store two redundant copies of the video, since Flash can’t play Theora.
4. Quality of implementations. Existing HTML5 implementations range from “bad” to “atrocious”; despite years of developers ragging on Flash, many of the existing implementations are still far slower than native media players, use terrible pixelated scaling (esp. Chrome), are outright buggy, or some combination of the above. Not only are the implementations often bad, but they’re inconsistently bad! Even if some work well, it does no good if many others don’t.
With all these problems, HTML5 <video> looks to be in serious danger despite its promise. And this brings us to the main topic: what about Google, On2, and VP8? If, as the FSF frantically pleads, Google opens VP8, what problems does it solve and what problems does it create? And what benefits would this bring Google?
VP8 solves the compression problem: while still probably not as good as x264 (see the Addendum at the end for more details on this prediction), the gap is far smaller than with Theora, enough so that compression is far less of an issue. But it also brings up a host of new problems.
1. A few years ago, Microsoft re-released the proprietary WMV9 as the open VC-1, which they claimed to be royalty-free. Only months later, dozens of companies had come out of the woodwork claiming patents on VC-1. Within a year, a VC-1 licensing company was set up, and the “patent-free” claim was no more. Any assumption that VP8 is completely free of patents is likely premature. Even if this does not immediately happen, many companies will not want to blindly include VP8 decoders in their software until they are confident that it isn’t infringing. Theora has been around for 6 years and there are still many companies (notably Nokia and Apple) who refuse to include it! Of course, this attitude may seem absurd, but one must understand who one is marketing to. One cannot get rid of businesspeople scared of patents by ignoring them.
2. VP8 is proprietary, and thus even if opened, would still have many of the problems of a proprietary format. There may be bugs in the format that were never uncovered because only one implementation was ever written (see RealVideo for an atrocious example of this). There will be only one implementation for quite some time; Theora has been around for 6 years now and there’s still only one encoder. Lack of competing implementations breeds complacency and stagnates progress. And given the quality of On2’s source releases in the past, I don’t have much hope for the actual source code of VP8; it will likely have to be completely rewritten to get a top-quality free software implementation.
3. It does nothing to solve the problems of hardware compatibility: most mobile devices use ASICs for video decoding, most of which probably cannot be easily repurposed for VP8. This might be less of a problem if they’re targeting software implementations, though; while it would eat more battery and be limited to mobile devices with powerful CPUs, it would not be unreasonable to play back VP8 on a fast ARM chip (see the Addendum for more on this).
The big advantage of VP8 is that it solves a problem that is unsolvable for Theora: Theora is forever crippled by its outdated technology and weak feature set. With state-of-the-art RD and psy optimization, as in x264, Theora could likely become competitive with Xvid or maybe even WMV9, but probably not x264. The only way to fix this would be a “Theora 2”, and attempting to preserve Theora’s “patent-free” status while adding new features would be extraordinarily difficult in today’s software patent environment. VP8, on the other hand, offers an immediate jump to what is hopefully an H.264-comparable level of compression.
But now for the big question: why would Google want to open VP8, and if they did, how would they do it? Google probably doesn’t pay a cent in license fees for Youtube; H.264 is free until at least 2016 for internet distribution and encoder fees only apply if you have more than 100,000 encoding servers. The cost of the license fees for Chrome are minimal (a few million dollars a year, capped). But despite that, there are actually some very good reasons.
1. Control. Google may view the control of other companies over H.264 as a threat: even though H.264 is licensed under RAND terms (Reasonable and Non-Discriminatory, they legally cannot be anti-competitive), there are many reasons for Google to want more control. If they push VP8, they not only compete with Flash via HTML5, but they also prevent Flash from playing their video streams. As it is unlikely (for the reasons mentioned at the start of the article) that Adobe will immediately jump ship to VP8, this creates a window of opportunity for Google to steal control from Adobe.
2. Blitzkrieg. The most risky, but most powerful thing Google could do is switch Youtube over to exclusively VP8 and roll out a new browser plugin to play it (but support HTML5 if available). Given Youtube’s popularity, this would likely get them 80%+ install base in a matter of a month or two; effectively a “blitzkrieg” targeting Adobe’s market share. This would be powerful because it wouldn’t rely on waiting for existing browsers (especially Internet Explorer) to switch over to VP8.
3. Trump card. Google may be worried about the future; if H.264 does succeed in eliminating all competition in the web marketplace, it would be quite possible that MPEG-LA would attempt to abuse their position and start charging fees for web usage. Perhaps MPEG-LA needs a good “scare” to make sure they never consider such a thing. Software monoculture is dangerous.
These seem like good enough reasons, albeit somewhat insidious ones (especially 2), for Google to launch such an attack. Do we really want Google having this much control? I’m not sure, but it’s sure as hell a better option than Adobe. Will it actually happen? Quite possibly; the only other sane purpose of a $100 million acquisition would be to acquire patents to use as leverage in patent lawsuits. Would it succeed? That depends on how they do it and what other companies they rope into their plan. It also depends on what their target is: would they try to push hardware support too?
Where does x264 fit in all this? H.264 is certainly not going away, not for quite a while. In most sane parts of the world, software patents are a non-issue. But in the end, none of it matters for x264: we will continue our quest to create the best video compression software on Earth. Unlike Adobe, we don’t sit complacent when we are the best; we keep trying to become better. We add new features, improve compression, support new platforms, improve performance, and there’s far more to come. We don’t care that many H.264 encoders are so bad that they can be beaten by Theora or Xvid. We don’t care if VP8 comes out; that’s just another encoder to beat. We are here to ensure that the best choice is always free software by constantly making free software better.
But, of course, we wholeheartedly support the quest for royalty-free, free-software multimedia formats. There are many use-cases in which being free of patents is more important than compression, quality, performance, or even features. Bink Video is a staggeringly popular example: used in tens of thousands of games despite having compression 10 times or more worse than modern video formats — almost entirely because of its royalty-free (albeit proprietary) nature. If the day comes when Bink is replaced by a free software alternative, we will know the quest for a widely-accepted, free software, patent-free video format has succeeded. Until then — I wish luck to those pursuing such a goal.
Addendum: VP8’s feature set and compression capabilities
Many people have been wondering what the reality behind VP8 is, behind the usual marketing bullshit that these sorts of companies put out. As there is no public specification and even the encoder itself still isn’t released, this is all an educated guess based on what information I do have.
VP8 has been marketed in press releases as basically an “improved VP7”, primarily with the intent of being faster to decode in software on mobile devices, especially those without SIMD (e.g. ARM11). Thus, a reasonable way to approach VP8 is to first comment on VP7. VP7 was released around 2003, about the same time as H.264. It made waves by being dramatically better than practically all H.264 encoders at the time. The reason wasn’t that VP7 was better than H.264, but rather that On2 had a far more mature codebase: they had been developing their encoder for years, while most H.264 encoders were slapped together in the months following the finalization of the specification. However, VP7 never caught on, because it was completely proprietary; nobody wants to rely on a proprietary codec anymore. Over the years, as far as I can tell, On2 never updated VP7, and the best H.264 encoders, like x264, moved well ahead. VP7 was mostly forgotten except by a few apps, like Skype, that licensed it.
Now let’s look at VP7 technically. While I don’t know too much about the internals, VP7 is notable in relying very heavily on strong postprocessing filters. This is not unique; practically all On2 codecs have this. Even Theora has an optional postprocessing filter that it inherited from VP3, in addition to its in-loop deblocker. On2’s postprocessing filters usually fall into three categories: deblocking, sharpening, and dithering/grain. The dithering filter is useful for avoiding blocking in dark and flat areas, similar to the effects of the gradfun mplayer filter. The sharpening filter helps compensate for the natural blurring effect of the quantizer in encoders that are not very psychovisually optimized. The deblocking filter is notorious for blurring out tremendous amounts of texture and detail (example: vp7, x264). But this also provides a significant advantage: by moving many features of the codec into postprocessing, the video format becomes scalable; a decoder can do “less work” while still playing back the file, albeit at a lower quality. This doesn’t work if all the steps are mandatory.
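To show the basic idea behind a threshold-based deblocking filter, here is a generic sketch over a single 1-D row of pixels; this is my own illustration of the technique, not On2’s actual (undocumented) algorithm.

```python
def deblock_row(pixels, block=8, strength=8):
    """Smooth pixel pairs across block boundaries, but only when the step is
    small enough to look like a quantization artifact rather than a real edge.
    Generic sketch of the technique, not On2's actual filter."""
    out = list(pixels)
    for b in range(block, len(out), block):
        p, q = out[b - 1], out[b]       # pixels on either side of the boundary
        if abs(p - q) < strength:       # small step: likely a blocking artifact
            avg = (p + q) // 2
            out[b - 1] = (p + avg) // 2  # pull both sides toward the average
            out[b] = (q + avg) // 2
        # large step: assume it's a real edge and leave it alone
    return out
```

The threshold is exactly what makes such filters destructive: set it too high and real texture gets averaged away along every block boundary, which is the over-blurring complained about above.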
Since VP8 is marketed as less complex than VP7, it likely still does not contain arithmetic coding, B-frames, or other computationally intensive features. We know from marketing material that one of the big selling points was allowing the encoder to pick between interpolation modes so that decoding can be faster if necessary. Clearly, VP8 is big on speed, which means they likely have not added many new compression-related features. If it improved greatly over VP7, it would most likely be due to psychovisual optimizations in the encoder. But given their last press release showing a “comparison with x264”, it’s clear that they haven’t done this. Their “VP8” image is a blurry disaster with nearly no detail at all, as opposed to the artifacty-but-detailed x264 image, which actually looked better to many commenters at Doom9, despite the obviously staged test.
Overall, I expect VP8 to be clearly better than MPEG-4 ASP (Xvid/DivX) and WMV9/VC-1. It will likely be nearly as good as MainConcept’s H.264 encoder (one of the best non-x264 encoders), but, assuming they still believe that blurring out the entire image is a good idea, probably still significantly inferior to x264.
Update: According to gmaxwell, a Theora dev, this seems quite likely: “What I’d heard from ex-on2 folks was that there is some philosophical disagreement about how to optimize [encoder] tuning, and the tune for PSNR camp mostly won out.” Apparently, around the time of VP6, On2 went all-in on PSNR, optimizing purely for that metric and completely ignoring visual considerations. This explains quite well why VP7 looked so blurry and ugly.
If there’s anything to take away from this, it’s that psy optimizations are the single most critical feature of a modern video encoder. They’re the reason that Vorbis beat MP3 for audio, and now they’re just as important for video.