<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Diary Of An x264 Developer</title>
	<atom:link href="http://x264dev.multimedia.cx/feed" rel="self" type="application/rss+xml" />
	<link>http://x264dev.multimedia.cx</link>
	<description></description>
	<lastBuildDate>Sun, 23 Oct 2011 19:21:31 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>The neutering of Google Code-In 2011</title>
		<link>http://x264dev.multimedia.cx/archives/658</link>
		<comments>http://x264dev.multimedia.cx/archives/658#comments</comments>
		<pubDate>Sun, 23 Oct 2011 19:09:15 +0000</pubDate>
		<dc:creator>Dark Shikari</dc:creator>
				<category><![CDATA[development]]></category>
		<category><![CDATA[GCI]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[x264]]></category>

		<guid isPermaLink="false">http://x264dev.multimedia.cx/?p=658</guid>
		<description><![CDATA[Posting this from the Google Summer of Code Mentor Summit, at a session about Google Code-In! Google Code-In is the most innovative open-source program I&#8217;ve ever seen.  It provided a way for students who had never done open source &#8212; or never even done programming &#8212; to get involved in open source work.   It made [...]]]></description>
			<content:encoded><![CDATA[<p>Posting this from the Google Summer of Code Mentor Summit, at a session about Google Code-In!</p>
<p><a href="http://code.google.com/opensource/gci/2010-11/index.html">Google Code-In</a> is the most innovative open-source program I&#8217;ve ever seen.  It provided a way for students who had never done open source &#8212; or never even done programming &#8212; to get involved in open source work.   It made it easy for people who weren&#8217;t sure of their ability, who didn&#8217;t know whether they could do open source, to get involved and realize that yes, they too could do amazing work &#8212; whether code useful to millions of people, documentation to make the code useful, translations to make it accessible, and more.  Hundreds of students had a great experience, learned new things, and many stayed around in open source projects afterwards because they enjoyed it so much!</p>
<p>x264 benefited greatly from Google Code-In.  Most of the high bit depth assembly code was written through GCI &#8212; literally man-weeks of work for a professional developer, done by high-schoolers who had never written assembly before!  Furthermore, we got loads of bugs fixed in ffmpeg/libav, a regression test tool, and more.  And best of all, we gained a new developer: Daniel Kang, who is now a student at MIT, an x264 and libav developer, and has gotten paid work applying the skills he learned in Google Code-In!</p>
<p>Some students in GCI complained about the system being &#8220;unfair&#8221;.  Task difficulties were inconsistent and there were many ways to game the system to get lots of points.  Some people complained about Daniel &#8212; he was completing a staggering number of tasks, so they argued those tasks must be too easy.  Yet many of the other students considered these tasks too hard.  I mean, I&#8217;m asking high school students to write hundreds of lines of complicated assembly code in one of the world&#8217;s most complicated instruction sets, and optimize it to meet extremely strict code-review standards!  Of course, there may have been valid complaints about other projects: I did hear from many students talking about gaming the system and finding the easiest, most &#8220;profitable&#8221; tasks.  Still, with the payout capped at $500, the only prize for gaming the system is a high rank on the points list.</p>
<p>According to people at the session, in an effort to make GCI more &#8220;fair&#8221;, Google has decided to change the system.  There are two big changes they&#8217;re making.</p>
<p>Firstly, Google is requiring projects to submit tasks on only two dates: the start, and the halfway point.  But in Google Code-In, we certainly had no idea at the start which types of tasks would be the most popular, or what new ideas would come up over time.  Often students would come up with ideas for tasks, which we could then add!  A waterfall-style plan-everything-in-advance model does not work for real-world coding.  The halfway-point submission may solve this somewhat, but it is still going to dramatically reduce the number of ideas that can be proposed as tasks.</p>
<p>Secondly, Google is requiring projects to submit at least 5 tasks in each category just to apply: quality assurance, translation, documentation, coding, outreach, training, user interface, and research.  For large projects like Gnome, this is easy: they can certainly come up with 5 tasks per category on such a large, general project.  But for a small, focused project, some of these categories are often completely irrelevant.  This rules out a huge number of smaller projects that just don&#8217;t have relevant work in all these categories.  x264 may be saved here: as we work under the Videolan umbrella, we&#8217;ll likely be able to fudge enough tasks from Videolan to cover the gaps.  But hundreds of other organizations are going to be out of luck.  It would make more sense to require, say, 5 out of the 8 categories, to allow some flexibility while still encouraging interesting non-coding tasks.</p>
<p>For example, what&#8217;s &#8220;user interface&#8221; for a software library with a stable API, say, a libc?  Can you make 5 tasks out of it that are actually useful?</p>
<p>If x264 applied on its own, could you come up with 5 real, meaningful tasks in each category for it?  It might be possible, but it&#8217;d require a lot of stretching.</p>
<p>How many smaller or more-focused projects do you think are going to give up and not apply because of this?</p>
<p>Is GCI supposed to be something for everyone, or just for Gnome, KDE, and other megaprojects?</p>
]]></content:encoded>
			<wfw:commentRss>http://x264dev.multimedia.cx/archives/658/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Summer of Code (in space)</title>
		<link>http://x264dev.multimedia.cx/archives/655</link>
		<comments>http://x264dev.multimedia.cx/archives/655#comments</comments>
		<pubDate>Sun, 24 Jul 2011 22:28:06 +0000</pubDate>
		<dc:creator>Dark Shikari</dc:creator>
				<category><![CDATA[development]]></category>
		<category><![CDATA[SOCIS]]></category>

		<guid isPermaLink="false">http://x264dev.multimedia.cx/?p=655</guid>
		<description><![CDATA[There&#8217;s apparently another Summer of Code in town.  x264 has been accepted into the ESA Summer of Code in Space.  Just like Google Summer of Code, work on x264 over the summer and get paid!  Watch out though; only some countries are allowed, so check first if you&#8217;re allowed to participate.  The application deadline is [...]]]></description>
			<content:encoded><![CDATA[<p>There&#8217;s apparently another Summer of Code in town.  x264 has been accepted into the ESA <a title="Summer of Code in Space" href="http://wiki.videolan.org/SOCIS_x264_2011" target="_blank">Summer of Code in Space</a>.  Just like Google Summer of Code, work on x264 over the summer and get paid!  Watch out though; only some countries are allowed, <a href="http://sophia.estec.esa.int/socis2011/?q=faq#socis_elig_org_who" target="_blank">so check first if you&#8217;re allowed to participate</a>.  The application deadline is July 27, 11AM (UTC); sorry for the short notice this time around!</p>
]]></content:encoded>
			<wfw:commentRss>http://x264dev.multimedia.cx/archives/655/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>You should apply for x264 Google Summer of Code</title>
		<link>http://x264dev.multimedia.cx/archives/648</link>
		<comments>http://x264dev.multimedia.cx/archives/648#comments</comments>
		<pubDate>Sun, 03 Apr 2011 06:33:09 +0000</pubDate>
		<dc:creator>Dark Shikari</dc:creator>
				<category><![CDATA[development]]></category>
		<category><![CDATA[GSOC]]></category>
		<category><![CDATA[x264]]></category>

		<guid isPermaLink="false">http://x264dev.multimedia.cx/?p=648</guid>
		<description><![CDATA[Want to do some fun open source work and get paid?  You should apply for GSOC.  Check out our ideas page and the official Google page. (And yes, I&#8217;ll get around to approving the queued comments and writing more real posts.  Eventually!  I promise!)]]></description>
			<content:encoded><![CDATA[<p>Want to do some fun open source work and get paid?  You should apply for GSOC.  Check out our <a href="http://wiki.videolan.org/SoC_x264_2011" target="_blank">ideas page</a> and the <a href="http://www.google-melange.com/gsoc/homepage/google/gsoc2011" target="_blank">official Google page</a>.</p>
<p>(And yes, I&#8217;ll get around to approving the queued comments and writing more real posts.  Eventually!  I promise!)</p>
]]></content:encoded>
			<wfw:commentRss>http://x264dev.multimedia.cx/archives/648/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Direct from the Blu-ray disc</title>
		<link>http://x264dev.multimedia.cx/archives/643</link>
		<comments>http://x264dev.multimedia.cx/archives/643#comments</comments>
		<pubDate>Sun, 05 Dec 2010 23:35:43 +0000</pubDate>
		<dc:creator>Dark Shikari</dc:creator>
				<category><![CDATA[blu-ray]]></category>
		<category><![CDATA[x264]]></category>

		<guid isPermaLink="false">http://x264dev.multimedia.cx/?p=643</guid>
		<description><![CDATA[A MediaInfo from the Warner Brothers&#8217; Blu-ray &#8220;The Town&#8220;: General Complete name : 00020.m2ts Format : BDAV Format/Info : Advanced Video Codec File size : 528 KiB Duration : 900ms Overall bit rate : 4 745 Kbps Maximum Overall bit rate : 15.0 Mbps Video ID : 4113 (0x1011) Menu ID : 1 (0x1) Format [...]]]></description>
			<content:encoded><![CDATA[<p>A MediaInfo from the Warner Brothers&#8217; Blu-ray &#8220;<a href="http://www.imdb.com/title/tt0840361/" target="_blank">The Town</a>&#8220;:</p>
<div><code>General</code></div>
<div><code>Complete name                    : 00020.m2ts</code></div>
<div><code>Format                           : BDAV</code></div>
<div><code>Format/Info                      : Advanced Video Codec</code></div>
<div><code>File size                        : 528 KiB</code></div>
<div><code>Duration                         : 900ms</code></div>
<div><code>Overall bit rate                 : 4 745 Kbps</code></div>
<div><code>Maximum Overall bit rate         : 15.0 Mbps</code></div>
<div><code>Video</code></div>
<div><code>ID                               : 4113 (0x1011)</code></div>
<div><code>Menu ID                          : 1 (0x1)</code></div>
<div><code>Format                           : AVC</code></div>
<div><code>Format/Info                      : Advanced Video Codec</code></div>
<div><code>Format profile                   : High@L4.0</code></div>
<div><code>Format settings, CABAC           : Yes</code></div>
<div><code>Format settings, ReFrames        : 3 frames</code></div>
<div><code>Codec ID                         : 27</code></div>
<div><code>Duration                         : 1s 1ms</code></div>
<div><code>Bit rate mode                    : Variable</code></div>
<div><code>Bit rate                         : 5 000 Kbps</code></div>
<div><code>Maximum bit rate                 : 24.0 Mbps</code></div>
<div><code>Width                            : 1 920 pixels</code></div>
<div><code>Height                           : 1 080 pixels</code></div>
<div><code>Display aspect ratio             : 16:9</code></div>
<div><code>Frame rate                       : 23.976 fps</code></div>
<div><code>Color space                      : YUV</code></div>
<div><code>Chroma subsampling               : 4:2:0</code></div>
<div><code>Bit depth                        : 8 bits</code></div>
<div><code>Scan type                        : Progressive</code></div>
<div><code>Bits/(Pixel*Frame)               : 0.101</code></div>
<div><code>Stream size                      : 611 KiB</code></div>
<div><span style="color: #ff0000;"><strong><code>Writing library                  : x264 core 104 r1683 62997d6</code></strong></span></div>
<p>(Yes, it&#8217;s just a menu.  But <a href="http://www.x264bluray.com/" target="_blank">good things</a> start small!)</p>
]]></content:encoded>
			<wfw:commentRss>http://x264dev.multimedia.cx/archives/643/feed</wfw:commentRss>
		<slash:comments>20</slash:comments>
		</item>
		<item>
		<title>Announcing TMPGEnc 4: now with x264!</title>
		<link>http://x264dev.multimedia.cx/archives/584</link>
		<comments>http://x264dev.multimedia.cx/archives/584#comments</comments>
		<pubDate>Fri, 26 Nov 2010 04:35:47 +0000</pubDate>
		<dc:creator>Dark Shikari</dc:creator>
				<category><![CDATA[commercial]]></category>
		<category><![CDATA[japan]]></category>
		<category><![CDATA[licensing]]></category>
		<category><![CDATA[x264]]></category>

		<guid isPermaLink="false">http://x264dev.multimedia.cx/?p=584</guid>
		<description><![CDATA[A few months ago, we announced a commercial licensing program so that even companies unable to use GPL software in their products have a chance to use the open source x264 instead of proprietary alternatives.  The system worked on two basic concepts.  First, all licensees would still be required to give their changes to x264 [...]]]></description>
			<content:encoded><![CDATA[<p>A few months ago, we <a href="http://mailman.videolan.org/pipermail/x264-devel/2010-July/007508.html" target="_blank">announced a commercial licensing program</a> so that even companies unable to use GPL software in their products have a chance to use the open source x264 instead of proprietary alternatives.  The system worked on two basic concepts.  First, all licensees would still be required to give their changes to x264 back to us: x264 must forever remain free, with no useful contributions kept hidden from the community.  Second, all the profits would go directly back to x264, primarily to the developers who&#8217;ve made the most significant contributions to x264 over the years, but also to funding future development, bounties for new features, as well as contributing to other related projects (e.g. Videolan and ffmpeg).</p>
<p>Over the past couple of months, we&#8217;ve gotten an enormous response; over 40 companies have inquired about licensing, with more contacting us every day.  Due to the sheer volume of interest, we&#8217;ve partnered with <a href="http://corecodec.com/" target="_blank">CoreCodec</a>, the creators of the <a href="http://www.matroska.org/" target="_blank">free Matroska container format</a> and developers of <a href="http://corecodec.com/products/coreavc" target="_blank">CoreAVC</a>, to make x264 as widely available in the world of commercial software as it is in the world of open source.  All of this is already benefiting x264 users, with many bugs being reported by commercial licensees as well as some code contributed.</p>
<p>Today, we announce the first commercial consumer encoding software to switch to x264: <a href="http://tmpgenc.pegasys-inc.com/en/product/te4xp.html" target="_blank">Pegasys Inc.&#8217;s TMPGEnc</a>.  Expect many more to follow: with x264 now available commercially as well as freely, there are few excuses left to use any other H.264 encoder.  Vendors of overpriced, underpowered proprietary competitors should begin looking for new jobs.</p>
<p>(Pegasys press release: <a href="http://tmpgenc.pegasys-inc.com/en/press/10_1125.html" target="_blank">English</a>, <a href="http://tmpgenc.pegasys-inc.com/ja/press/10_1126.html" target="_blank">Japanese</a>)</p>
]]></content:encoded>
			<wfw:commentRss>http://x264dev.multimedia.cx/archives/584/feed</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Patent skullduggery: Tandberg rips off x264 algorithm</title>
		<link>http://x264dev.multimedia.cx/archives/589</link>
		<comments>http://x264dev.multimedia.cx/archives/589#comments</comments>
		<pubDate>Thu, 25 Nov 2010 21:30:45 +0000</pubDate>
		<dc:creator>Dark Shikari</dc:creator>
				<category><![CDATA[patents]]></category>
		<category><![CDATA[ripoffs]]></category>
		<category><![CDATA[x264]]></category>

		<guid isPermaLink="false">http://x264dev.multimedia.cx/?p=589</guid>
		<description><![CDATA[Update: Tandberg claims they came up with the algorithm independently: to be fair, I can actually believe this to some extent, as I think the algorithm is way too obvious to be patented.  Of course, they also claim that the algorithm isn&#8217;t actually identical, since they don&#8217;t want to lose their patent application. I still [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Update: Tandberg claims they came up with the algorithm independently: to be fair, I can actually believe this to some extent, as I think the algorithm is way too obvious to be patented.  Of course, they also claim that the algorithm isn&#8217;t actually identical, since they don&#8217;t want to lose their patent application.</strong></p>
<p><strong>I still don&#8217;t trust them, but it&#8217;s possible this is merely bad research (and thus unawareness of prior art) as opposed to anything malicious.  Furthermore, word from within their office suggests they&#8217;re quite possibly being honest: supposedly the development team does not read x264 code at all.  So this might just all be very bad luck.</strong></p>
<p><strong>Regardless, the patent is still complete tripe, and should never have been filed.</strong></p>
<p><strong>Most importantly, stop harassing the guy whose name is on the patent (Lars): he&#8217;s just a programmer, not the management or lawyers responsible for filing the patent.  This is stupid and unnecessary.  I&#8217;ve removed the original post because of this; it can be found <a href="http://x264.nl/developers/Dark_Shikari/tandberg.html" target="_blank">here</a> for those who want to read it.</strong></p>
<p><span style="color: #000000;"><strong><span id="more-589"></span></strong></span><strong>Appendix: the details of the patent:</strong></p>
<p>I figure I&#8217;ll go over the exact correspondence between the patent and my code here.</p>
<p><em>1. A method for calculating run and level representations of quantized transform coefficients representing pixel values included in a block of a video picture, the method comprising:</em></p>
<p>Translation: It&#8217;s a run-level coder.</p>
<p><em>packing, at a video processing apparatus, each quantized transform coefficients in a value interval [Max, Min] by setting all quantized transform coefficients greater than Max equal to Max, and all quantized transform coefficients less than Min equal to Min</em></p>
<p>The quantized coefficients are clipped to a certain valid range to allow them to be packed into bytes (they start as 16-bit values).</p>
<p><em>reordering, at the video processing apparatus, the quantized transform coefficients according to a predefined order depending on respective positions in the block resulting in an array C of reordered quantized transform coefficients</em></p>
<p>This is the zigzag pattern used in H.264 (and most formats) for reordering DCT coefficients.  In x264, this is done before the run-level coding step.</p>
<p><em>masking, at the video processing apparatus, C by generating an array M  containing ones in positions corresponding to positions of C having  non-zero values, and zeros in positions corresponding to positions of C  having zero values</em></p>
<p>This creates a bitmask from the coefficient values: the pmovmskb step.</p>
<p><em>generating, at the video processing apparatus, for each position containing a one in M, a run and a level representation by setting the level value equal to an occurring value in a corresponding position of C; and setting, at the video processing apparatus, for each position containing a one in M, the run value equal to the number of proceeding positions relative to a current position in M since a previous occurrence of one in M.</em></p>
<p>This is the process of creating run/level values from the bitmask.</p>
<p>Now into the detailed claims:</p>
<p><em>2. The method according to Claim 1, wherein the masking further includes, creating an array C&#8242; from C where positions corresponding to positions of nonzero values in C are filled with ones, and positions corresponding to positions of zero values in C are filled with zeros, and creating M from C&#8242; by extracting the most significant bit from values in respective position of C&#8242; and inserting the bits in corresponding positions in M.</em></p>
<p>They&#8217;re extracting the most significant bit of the values to create a bitmask.  This is exactly what the pmovmskb in my algorithm does.</p>
<p><em>3. The method according to Claim 2, wherein the creating of the array C&#8242; is executed by a C++ function PCMPGTB, and the creating of M from C&#8242; is executed by a C++ function PMOVMSKB.</em></p>
<p>And here they use pcmpgtb (they call it a C++ function for some reason, but it&#8217;s an SSE instruction) to do the clipping of the input values.  This is exactly the same method I used in decimate_score.  They also use pmovmskb as mentioned.</p>
<p><em>4. The method according to Claim 1, wherein the generating of the run and level representation further includes determining positions containing non-zero values in C by corresponding positions containing ones in M.</em></p>
<p><em>5. The method according to Claim 4, wherein the determining of positions containing non-zero values in C is executed by a C++ function BSF.</em></p>
<p>Here they iterate over the bitmask of transform coefficients using a &#8220;BSF&#8221; function to find runs, which is exactly what I did.  Of course, BSF isn&#8217;t a function, it&#8217;s an x86 instruction.</p>
<p><em>6. The method according to Claim 1, wherein Max is 256 and Min is 0.</em></p>
<p>This is almost surely a typo or mistake of some sort.  They mean the Max should be 255, not 256: 256 doesn&#8217;t fit in a uint8_t.</p>
<p><em>7. The method according to Claim 1, wherein the predefined order follows a zigzag path of transform coefficient positions in the block starting in an upper left corner heading towards a lower right corner.</em></p>
<p>This is a description of the typical DCT zigzag pattern (like in H.264, MPEG-2, Theora, etc).</p>
<p>Everything after this part is just repeating itself with the phrase &#8220;an apparatus&#8221; added in order to make the USPTO listen to them.</p>
]]></content:encoded>
			<wfw:commentRss>http://x264dev.multimedia.cx/archives/589/feed</wfw:commentRss>
		<slash:comments>61</slash:comments>
		</item>
		<item>
		<title>How to contribute to open source, for companies</title>
		<link>http://x264dev.multimedia.cx/archives/576</link>
		<comments>http://x264dev.multimedia.cx/archives/576#comments</comments>
		<pubDate>Mon, 18 Oct 2010 01:50:36 +0000</pubDate>
		<dc:creator>Dark Shikari</dc:creator>
				<category><![CDATA[development]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[x264]]></category>

		<guid isPermaLink="false">http://x264dev.multimedia.cx/?p=576</guid>
		<description><![CDATA[I have seen many nigh-incomprehensible attempts by companies to contribute to open source projects, including x264.  Developers are often simply boggled, wondering why the companies seem incapable of proper communication.  The companies assume the developers are being unreceptive, while the developers assume the companies are being incompetent, idiotic, or malicious.  Most of this seems to [...]]]></description>
			<content:encoded><![CDATA[<p>I have seen many nigh-incomprehensible attempts by companies to contribute to open source projects, including x264.  Developers are often simply boggled, wondering why the companies seem incapable of proper communication.  The companies assume the developers are being unreceptive, while the developers assume the companies are being incompetent, idiotic, or malicious.  Most of this seems to boil down to a basic lack of understanding of how open source works, resulting in a wide variety of misunderstandings.  Accordingly, this post will cover the <em>do</em>s and <em>don&#8217;t</em>s of corporate contribution to open source.</p>
<p><span id="more-576"></span><strong>Do: contact the project using their preferred medium of communication.</strong></p>
<p>Most open source projects use public methods of communication, such as mailing lists and IRC.  It&#8217;s not the end of the world if you mistakenly make contact with the wrong people or via the wrong medium, but be prepared to switch to the correct one once informed!  You may not be experienced using whatever form of communication the project uses, but if you refuse to communicate through proper channels, they will likely not be as inclined to assist you.  Larger open source projects are often much like companies in that they have different parts to their organization with different roles.  Don&#8217;t assume that everyone is a major developer!</p>
<p>If you don&#8217;t know what to do, a good bet is often to just ask someone.</p>
<p><strong>Don&#8217;t: contact only one person.</strong></p>
<p>Open source projects are a communal effort.  Major contributions are looked over by multiple developers and are often discussed by the community as a whole.  Yet many companies tend to contact only a single person in lieu of dealing with the project proper.  This has many flaws: to begin with, it forces a single developer (who isn&#8217;t paid by you) to act as your liaison, adding yet another layer between what you want and the people you want to talk to.  Contribution to open source projects should not be a game of telephone.</p>
<p>Of course, there are exceptions to this: sometimes a single developer is in charge of the entirety of some particular aspect of a project that you intend to contribute to, in which case this might not be so bad.</p>
<p><strong>Do: make clear exactly what it is you are contributing.</strong></p>
<p>Are you contributing code?  Development resources?  Money?  API documentation?  Make it as clear as possible, from the start!  How developers react, which developers get involved, and their expectations will depend heavily on what they <em>think</em> you are providing.  Make sure their expectations match reality.  Great confusion can result when they do not.</p>
<p>This also applies in the reverse &#8212; if there&#8217;s something you need from the project, such as support or assistance with development of your patch, make that explicitly clear.</p>
<p><strong>Don&#8217;t: code dump.</strong></p>
<p>Code does not have intrinsic value: it is only useful as part of a working, living project.  Most projects react very negatively to large &#8220;dumps&#8221; of code without associated human resources.  That is, they expect you to work with them to finalize the code until it is ready to be committed.  Of course, it&#8217;s better to work with the project from the start: this avoids the situation of writing 50,000 lines of code independently and then finding that half of it needs to be rewritten.  Or, worse, writing an enormous amount of code only to find it completely unnecessary.</p>
<p>Of course, the reverse option &#8212; keeping such code to yourself &#8212; is often even more costly, as it forces you to maintain the code instead of the official developers.</p>
<p><strong>Do: ignore trolls.</strong></p>
<p>As mentioned above, many projects use public communication methods &#8212; which, of course, allow anyone to communicate, by nature of being public.  Not everyone on a project&#8217;s IRC channel or mailing list is necessarily qualified to officially represent the project.  It is not uncommon for a prospective corporate contributor to be turned off by the uninviting words of someone they wrongly assumed was involved in the project.  Make sure you&#8217;re dealing with the right people before drawing conclusions.</p>
<p><strong>Don&#8217;t: disappear.</strong></p>
<p>If you are going to try to be involved in a project, you need to stay in contact.  We&#8217;ve had all too many companies who simply disappear after the initial introduction.  Some tell us that we&#8217;ll need an NDA, then never provide it or send status updates.  <em>You</em> may know why you&#8217;re not in contact &#8212; political issues at the company, product launch crunches, a nice vacation to the Bahamas &#8212; but <em>we </em>don&#8217;t!  If you disappear, we will assume that you gave up.</p>
<p>Above all, don&#8217;t assume that being at a large successful company makes you immune to these problems.  If anything, these problems seem to be the most common at the largest companies.  I didn&#8217;t name any names in this post, but practically every single one of these rules has been violated at some point by companies looking to contribute to x264.  In the larger scale of open source, these problems happen constantly.  Don&#8217;t fall into the same traps that many other companies have.</p>
<p>If you&#8217;re an open source developer reading this post, remember it next time you see a company acting seemingly nonsensically in an attempt to contribute: it&#8217;s quite possible they just don&#8217;t know what to do.  And just because they&#8217;re doing it wrong doesn&#8217;t mean that it isn&#8217;t your responsibility to try to help them do it right.</p>
]]></content:encoded>
			<wfw:commentRss>http://x264dev.multimedia.cx/archives/576/feed</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>H.264 and VP8 for still image coding: WebP?</title>
		<link>http://x264dev.multimedia.cx/archives/541</link>
		<comments>http://x264dev.multimedia.cx/archives/541#comments</comments>
		<pubDate>Fri, 01 Oct 2010 02:48:23 +0000</pubDate>
		<dc:creator>Dark Shikari</dc:creator>
				<category><![CDATA[google]]></category>
		<category><![CDATA[H.264]]></category>
		<category><![CDATA[psychovisual optimizations]]></category>
		<category><![CDATA[VP8]]></category>

		<guid isPermaLink="false">http://x264dev.multimedia.cx/?p=541</guid>
		<description><![CDATA[Update: post now contains a Theora comparison as well; see below. JPEG is a very old lossy image format.  By today&#8217;s standards, it&#8217;s awful compression-wise: practically every video format since the days of MPEG-2 has been able to tie or beat JPEG at its own game.  The reasons people haven&#8217;t switched to something more modern [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Update: </strong>post now contains a Theora comparison as well; see below.</p>
<p>JPEG is a very old lossy image format.  By today&#8217;s standards, it&#8217;s awful compression-wise: practically every video format since the days of MPEG-2 has been able to tie or beat JPEG at its own game.  The reasons people haven&#8217;t switched to something more modern practically always boil down to a simple one &#8212; it&#8217;s just not worth the hassle.  Even if JPEG can be beaten by a factor of 2, convincing the entire world to change image formats after 20 years is nigh impossible.  Furthermore, JPEG is fast, simple, and practically guaranteed to be free of any intellectual property worries.  It&#8217;s been tried before: first JPEG 2000, then Microsoft&#8217;s JPEG XR tried to unseat JPEG.  Neither got much of anywhere.</p>
<p>Now Google is trying to dump yet another image format on us, &#8220;WebP&#8221;.  But really, it&#8217;s just a VP8 intra frame.  There are some obvious practical problems with this new image format in comparison to JPEG; it doesn&#8217;t even support all of JPEG&#8217;s features, let alone many of the much-wanted features JPEG was missing (<a href="http://x264dev.multimedia.cx/?p=541#comment-6176" target="_blank"><span style="text-decoration: line-through;">alpha channel support</span></a>, lossless support).  It only supports 4:2:0 chroma subsampling, while JPEG can handle 4:2:2 and 4:4:4.  Google doesn&#8217;t seem interested in adding any of these features either.</p>
<p>But let&#8217;s get to the meat and see how these encoders stack up on compressing still images.  <a href="http://x264dev.multimedia.cx/?p=377" target="_blank">As I explained in my original analysis</a>, VP8 has H.264-style intra prediction, which is one of the primary reasons H.264 does so well at intra compression.  It only has i4x4 and i16x16 modes, not i8x8, so it&#8217;s not quite as fancy as H.264&#8217;s, but it comes close.</p>
<p><span id="more-541"></span>The test files are all around 155KB; download them for the exact filesizes.  For all three, I did a binary search of quality levels to get the file sizes close.  For x264, I encoded with <code>--tune stillimage --preset placebo</code>.  For libvpx, I encoded with <code>--best</code>.  For JPEG, I encoded with ffmpeg, then applied <a href="http://akuvian.org/src/jpgcrush.tar.gz" target="_blank">jpgcrush</a>, a lossless jpeg compressor.  I suspect there are better JPEG encoders out there than ffmpeg; if you have one, feel free to test it and post the results.  The <a href="http://x264.nl/developers/Dark_Shikari/imagecoding/source.png" target="_blank">source image</a> is the 200th frame of Parkjoy, from <a href="http://media.xiph.org/video/derf/" target="_blank">derf&#8217;s page</a> (fun fact: this video was shot <a href="http://maps.google.com/maps?f=q&amp;source=s_q&amp;hl=en&amp;geocode=&amp;q=djurg%C3%A5rden+stockholm&amp;sll=56.607885,17.138672&amp;sspn=40.475203,55.019531&amp;ie=UTF8&amp;hq=&amp;hnear=Djurg%C3%A5rden&amp;ll=59.328625,18.135724&amp;spn=0.004482,0.006716&amp;t=h&amp;z=17&amp;layer=c&amp;cbll=59.328625,18.135724&amp;cbp=12,0,,0,5&amp;photoid=po-1607419" target="_blank">here</a>!  More info on the video <a href="http://media.xiph.org/video/derf/vqeg.its.bldrdoc.gov/HDTV/SVT_MultiFormat/SVT_MultiFormat_v10.pdf" target="_blank">here</a>.).</p>
<p>Files: (<a href="http://x264.nl/developers/Dark_Shikari/imagecoding/output.h264">x264</a> [154KB], <a href="http://x264.nl/developers/Dark_Shikari/imagecoding/output.ivf">vp8</a> [155KB], <a href="http://x264.nl/developers/Dark_Shikari/imagecoding/output.jpg" target="_blank">jpg</a> [156KB])</p>
<p>Results (decoded to PNG): (<a href="http://x264.nl/developers/Dark_Shikari/imagecoding/x264.png" target="_blank">x264</a>, <a href="http://x264.nl/developers/Dark_Shikari/imagecoding/vp8.png" target="_blank">vp8</a>, <a href="http://x264.nl/developers/Dark_Shikari/imagecoding/jpeg.png">jpg</a>)</p>
<p>This seems rather embarrassing for libvpx.  Personally I think VP8 looks by far the worst of the bunch, despite JPEG&#8217;s blocking.  What&#8217;s going on here?  VP8 certainly has better entropy coding than JPEG does (by far!).  It has better intra prediction (JPEG has just DC prediction).  How could VP8 look worse?  Let&#8217;s investigate.</p>
<p>VP8 uses a 4&#215;4 transform, which tends to blur and lose more detail than JPEG&#8217;s 8&#215;8 transform.  But that alone certainly isn&#8217;t enough to create such a dramatic difference.  Let&#8217;s test a hypothesis: that libvpx is optimizing for PSNR and ignoring psychovisual considerations when encoding the image.  To see what that looks like, I&#8217;ll encode with <code>--tune psnr --preset placebo</code> in x264, turning off all psy optimizations.</p>
<p>Files: (<a href="http://x264.nl/developers/Dark_Shikari/imagecoding/output_psnr.h264" target="_blank">x264, optimized for PSNR</a> [154KB]) [<em>Note for the technical people: because adaptive quantization is off, to get the filesize on target I had to use a CQM here.</em>]</p>
<p>Results (decoded to PNG): (<a href="http://x264.nl/developers/Dark_Shikari/imagecoding/x264_psnr.png" target="_blank">x264, optimized for PSNR</a>)</p>
<p>What a blur!  Only somewhat better than VP8, and still worse than JPEG.  And that&#8217;s using the same encoder and the same level of analysis &#8212; the only thing done differently is dropping the psy optimizations.  Thus we come back to the conclusion I&#8217;ve made over and over on this blog &#8212; the encoder matters more than the video format, and good psy optimizations are more important than anything else for compression.  libvpx, a much more powerful encoder than ffmpeg&#8217;s jpeg encoder, loses because it tries too hard to optimize for PSNR.</p>
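<p>For reference, here is the standard definition of PSNR for 8-bit samples.  The terms are the source samples <em>x<sub>i</sub></em>, the decoded samples <em>y<sub>i</sub></em>, and the sample count <em>N</em>:</p>

```latex
\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - y_i\right)^2,
\qquad
\mathrm{PSNR} = 10\log_{10}\frac{255^2}{\mathrm{MSE}}\ \text{dB}
```

<p>PSNR depends on nothing but the mean squared error.  An encoder tuned purely for it is therefore minimizing squared pixel error, with no term that rewards keeping texture or grain.  That missing term is roughly the gap psy optimizations try to fill.</p>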
<p>These results raise an obvious question &#8212; is Google nuts?  I could understand the push for &#8220;WebP&#8221; if it were better than JPEG.  And sure, technically as a file format it is, and an encoder could be made for it that&#8217;s better than JPEG.  But note the word &#8220;could&#8221;.  Why announce it <em>now</em> when libvpx is still such an awful encoder?  You&#8217;d have to be nuts to try to replace JPEG with this blurry mess as-is.  Now, I don&#8217;t expect libvpx to be able to compete with x264, the best encoder in the world &#8212; but surely it should be able to beat an image format released in 1992?</p>
<p>Earth to Google: make the encoder good first, <em>then</em> promote it as better than the alternatives.  The reverse doesn&#8217;t work quite as well.</p>
<p><strong>Addendum </strong>(added Oct. 2, 03:51)<strong>:</strong></p>
<p>maikmerten gave me a Theora-encoded image to compare as well.  Here&#8217;s the <a href="http://x264.nl/developers/Dark_Shikari/imagecoding/theora.png" target="_blank">PNG</a> and the <a href="http://x264.nl/developers/Dark_Shikari/imagecoding/output.ogv">source</a> (155KB).  And yes, that&#8217;s Theora 1.2 (Ptalarbvorm) beating VP8 handily.  Now <em>that</em> is embarrassing.  Guess what the main new feature of Ptalarbvorm is?  Psy optimizations&#8230;</p>
<p><strong>Addendum (added Apr. 20, 23:33):</strong></p>
<p>There&#8217;s a new webp encoder out, written from scratch by skal (available in libwebp).  It&#8217;s significantly better than libvpx &#8212; not like that says much &#8212; but it should probably beat JPEG much more readily now.  The encoder design is rather unique &#8212; it basically uses K-means for a large part of the encoding process.  It still loses to x264, but that was expected.</p>
]]></content:encoded>
			<wfw:commentRss>http://x264dev.multimedia.cx/archives/541/feed</wfw:commentRss>
		<slash:comments>126</slash:comments>
<enclosure url="http://x264.nl/developers/Dark_Shikari/imagecoding/output.ogv" length="159972" type="video/ogg" />
		</item>
		<item>
		<title>Announcing the world&#8217;s fastest VP8 decoder: ffvp8</title>
		<link>http://x264dev.multimedia.cx/archives/499</link>
		<comments>http://x264dev.multimedia.cx/archives/499#comments</comments>
		<pubDate>Fri, 23 Jul 2010 23:01:54 +0000</pubDate>
		<dc:creator>Dark Shikari</dc:creator>
				<category><![CDATA[ffmpeg]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[speed]]></category>
		<category><![CDATA[VP8]]></category>

		<guid isPermaLink="false">http://x264dev.multimedia.cx/?p=499</guid>
		<description><![CDATA[Back when I originally reviewed VP8, I noted that the official decoder, libvpx, was rather slow.  While there was no particular reason that it should be much faster than a good H.264 decoder, it shouldn&#8217;t have been that much slower either!  So, I set out with Ronald Bultje and David Conrad to make a better [...]]]></description>
			<content:encoded><![CDATA[<p>Back when I <a href="http://x264dev.multimedia.cx/?p=377" target="_blank">originally reviewed VP8</a>, I noted that the official decoder, libvpx, was rather slow.  While there was no particular reason that it should be much faster than a good H.264 decoder, it shouldn&#8217;t have been that much slower either!  So, I set out with Ronald Bultje and David Conrad to make a better one in FFmpeg.  This one would be community-developed and free from the beginning, rather than the proprietary code-dump that was libvpx.  A few weeks ago the decoder was complete enough to be bit-exact with libvpx, making it the first independent free implementation of a VP8 decoder.  Now, with the first round of optimizations complete, it should be ready for primetime.  I&#8217;ll go into some detail about the development process, but first, let&#8217;s get to the real meat of this post: the benchmarks.</p>
<p style="text-align: left;"><span id="more-499"></span>We tested on two 1080p clips: <a href="http://x264.nl/developers/Dark_Shikari/parkjoy.ivf" target="_blank">Parkjoy</a>, a live-action clip, and the <a href="http://x264.nl/developers/Dark_Shikari/sintel_trailer_1080p_vp8_vorbis.webm" target="_blank">Sintel trailer</a>, a CGI clip.  Testing was done using <code>time ffmpeg -vcodec {libvpx or vp8} -i input -vsync 0 -an -f null -</code>.  We all used the latest SVN FFmpeg at the time of this posting; the last revision optimizing the VP8 decoder was r24471.</p>
<p style="text-align: center;"><a href="http://x264.nl/developers/Dark_Shikari/parkjoy.png" target="_blank"><img class="aligncenter" title="Parkjoy graph" src="http://x264.nl/developers/Dark_Shikari/parkjoy.png" alt="Parkjoy graph" width="647" height="384" /></a><a href="http://x264.nl/developers/Dark_Shikari/sintel.png" target="_blank"><img class="aligncenter" title="Sintel graph" src="http://x264.nl/developers/Dark_Shikari/sintel.png" alt="Sintel graph" width="645" height="375" /></a></p>
<p>As these benchmarks show, ffvp8 is clearly much faster than libvpx, particularly on 64-bit.  It&#8217;s even faster by a large margin on Atom, despite the fact that we haven&#8217;t even begun optimizing for it.  In many cases, ffvp8&#8217;s extra speed can make the difference between a video that plays and one that doesn&#8217;t, especially in modern browsers with software compositing engines taking up a lot of CPU time.  Want to get faster playback of VP8 videos?  The next versions of FFmpeg-based players, like VLC, will include ffvp8.  Want to get faster playback of WebM in your browser?  Lobby your browser developers to use ffvp8 instead of libvpx.  I expect Chrome to switch first, as they already use libavcodec for most of their playback system.</p>
<p>Keep in mind ffvp8 is not &#8220;done&#8221; &#8212; we will continue to improve it and make it faster.  We still have a number of optimizations in the pipeline that aren&#8217;t committed yet.</p>
<h3>Developing ffvp8</h3>
<p>The initial challenge, primarily pioneered by David and Ronald, was constructing the core decoder and making it bit-exact to libvpx.  This was rather challenging, especially given the <a href="http://x264dev.multimedia.cx/?p=486" target="_blank">lack of a real spec</a>.  Many parts of the spec were outright misleading and contradicted libvpx itself.  It didn&#8217;t help that the suite of official conformance tests didn&#8217;t even cover all the features used by the official encoder!  We&#8217;ve already started adding our own conformance tests to deal with this.  But I&#8217;ve complained enough in past posts about the lack of a spec; let&#8217;s get onto the gritty details.</p>
<p>The next step was adding <a href="http://en.wikipedia.org/wiki/SIMD" target="_blank">SIMD</a> assembly for all of the important <a href="http://en.wikipedia.org/wiki/Digital_signal_processing" target="_blank">DSP</a> functions.  VP8&#8217;s motion compensation and deblocking filter are by far the most CPU-intensive parts, much the same as in H.264.  Unlike H.264&#8217;s, VP8&#8217;s deblocking filter relies on a lot of internal saturation steps, which are free in SIMD but costly in a normal C implementation, making the plain C code even slower.  Of course, none of this is a particularly large problem; any sane video decoder has all this stuff in SIMD.</p>
<p>I tutored Ronald in x86 SIMD and wrote most of the motion compensation, intra prediction, and some inverse transforms.  Ronald wrote the rest of the inverse transforms and a bit of the motion compensation.  He also did the most difficult part: the deblocking filter.  Deblocking filters are always a bit difficult because every one is different.  Motion compensation, by comparison, is usually very similar regardless of video format; a 6-tap filter is a 6-tap filter, and most of the variation going on is just the choice of numbers to multiply by.</p>
<p>The biggest challenge in an SIMD deblocking filter is to avoid unpacking, that is, going from 8-bit to 16-bit.  Many operations in deblocking filters would naively appear to require more than 8-bit precision.  A simple example in the case of x86 is abs(a-b), where a and b are 8-bit unsigned integers.  The result of &#8220;a-b&#8221; requires a 9-bit signed integer (it can be anywhere from -255 to 255), so it can&#8217;t fit in 8-bit.  But this is quite possible to do without unpacking: (satsub(a,b) | satsub(b,a)), where &#8220;satsub&#8221; performs a saturating subtract on the two values.  If the difference is positive, satsub yields it; if it is negative, satsub yields zero.  Since at most one of the two terms is nonzero, ORing them together yields the desired result.  This requires 4 ops on x86; unpacking would probably require at least 10, including the unpack and pack steps.</p>
<p>After the SIMD came optimizing the C code, which still took a significant portion of the total runtime.  One of my biggest optimizations was adding aggressive &#8220;smart&#8221; prefetching to reduce cache misses.  ffvp8 prefetches the reference frames (PREVIOUS, GOLDEN, and ALTREF)&#8230; but only the ones which have been used reasonably often this frame.  This lets us prefetch everything we need without prefetching things that we probably won&#8217;t use.  libvpx very often encodes frames that almost never (but not quite never) use GOLDEN or ALTREF, so this optimization greatly reduces time spent prefetching in a lot of real videos.  There are, of course, countless other optimizations we made that are too numerous to list here, such as David&#8217;s entropy decoder optimizations.  I&#8217;d also like to thank Eli Friedman for his invaluable help in benchmarking a lot of these changes.</p>
<p>What next?  Altivec (PPC) assembly is almost nonexistent, with the only functions being David&#8217;s motion compensation code.  NEON (ARM) is completely nonexistent: we&#8217;ll need that to be fast on mobile devices as well.  Of course, all this will come in due time &#8212; and as always &#8212; patches welcome!</p>
<h3>Appendix: the raw numbers</h3>
<p>Here are the raw numbers (in fps) for the graphs at the start of this post, with <a href="http://en.wikipedia.org/wiki/Standard_error_%28statistics%29" target="_blank">standard error</a> values:</p>
<p><strong>Core i7 620QM (1.6GHz), Windows 7, 32-bit:</strong><br />
Parkjoy ffvp8: 44.58 +/- 0.44<br />
Parkjoy libvpx: 33.06 +/- 0.23<br />
Sintel ffvp8: 74.26 +/- 1.18<br />
Sintel libvpx: 56.11 +/- 0.96</p>
<p><strong>Core i5 520M (2.4GHz), Linux, 64-bit:</strong><br />
Parkjoy ffvp8: 68.29 +/- 0.06<br />
Parkjoy libvpx: 41.06 +/- 0.04<br />
Sintel ffvp8: 112.38 +/- 0.37<br />
Sintel libvpx: 69.64 +/- 0.09</p>
<p><strong>Core 2 T9300 (2.5GHz), Mac OS X 10.6.4, 64-bit:</strong><br />
Parkjoy ffvp8: 54.09 +/- 0.02<br />
Parkjoy libvpx: 33.68 +/- 0.01<br />
Sintel ffvp8: 87.54 +/- 0.03<br />
Sintel libvpx: 52.74 +/- 0.04</p>
<p><strong>Core Duo (2GHz), Mac OS X 10.6.4, 32-bit:</strong><br />
Parkjoy ffvp8: 21.31 +/- 0.02<br />
Parkjoy libvpx: 17.96 +/- 0.00<br />
Sintel ffvp8: 41.24 +/- 0.01<br />
Sintel libvpx: 29.65 +/- 0.02</p>
<p><strong>Atom N270 (1.6GHz), Linux, 32-bit:</strong><br />
Parkjoy ffvp8: 15.29 +/- 0.01<br />
Parkjoy libvpx: 12.46 +/- 0.01<br />
Sintel ffvp8: 26.87 +/- 0.05<br />
Sintel libvpx: 20.41 +/- 0.02</p>
]]></content:encoded>
			<wfw:commentRss>http://x264dev.multimedia.cx/archives/499/feed</wfw:commentRss>
		<slash:comments>132</slash:comments>
		</item>
		<item>
		<title>VP8: a retrospective</title>
		<link>http://x264dev.multimedia.cx/archives/486</link>
		<comments>http://x264dev.multimedia.cx/archives/486#comments</comments>
		<pubDate>Tue, 13 Jul 2010 10:06:49 +0000</pubDate>
		<dc:creator>Dark Shikari</dc:creator>
				<category><![CDATA[DCT]]></category>
		<category><![CDATA[speed]]></category>
		<category><![CDATA[VP8]]></category>

		<guid isPermaLink="false">http://x264dev.multimedia.cx/?p=486</guid>
		<description><![CDATA[I&#8217;ve been working the past few weeks to help finish up the ffmpeg VP8 decoder, the first community implementation of On2&#8217;s VP8 video format.  Now that I&#8217;ve written a thousand or two lines of assembly code and optimized a good bit of the C code, I&#8217;d like to look back at VP8 and comment on [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been working the past few weeks to help finish up the ffmpeg VP8 decoder, the first community implementation of On2&#8217;s VP8 video format.  Now that I&#8217;ve written a thousand or two lines of assembly code and optimized a good bit of the C code, I&#8217;d like to look back at VP8 and comment on a variety of things &#8212; both good and bad &#8212; that slipped the net the first time, along with things that have changed since the time of that blog post.</p>
<p>These are mostly not issues related to compression &#8212; that issue has been beaten to death, particularly in MSU&#8217;s recent comparison, where x264 <a href="http://www.compression.ru/video/codec_comparison/h264_2010/appendixes.html#Appendix_8" target="_blank">beat the crap out of VP8</a> and the VP8 developers pulled a Pinocchio in the developer comments.  But that was expected and isn&#8217;t particularly interesting, so I won&#8217;t go into that.  VP8 doesn&#8217;t have to be the best in the world in order to be useful.</p>
<p>When the ffmpeg VP8 decoder is complete (just a few more asm functions to go), we&#8217;ll hopefully be able to post some benchmarks comparing it to libvpx.</p>
<p><span id="more-486"></span><strong>1.  The spec, er, I mean, bitstream guide.</strong></p>
<p>Google has reneged on their claim that a spec existed at all and renamed it a &#8220;bitstream guide&#8221;.  This was probably because it was found that, not only was it incomplete, but at least a dozen places in the spec differed wildly from what was actually in their own encoder and decoder software!  The deblocking filter, motion vector clamping, probability tables, and many more parts simply disagreed flat-out with the spec.  Fortunately, Ronald Bultje, one of the main authors of the ffmpeg VP8 decoder, is rather skilled at reverse-engineering, so we were able to put together a matching implementation regardless.</p>
<p>Most of the differences aren&#8217;t particularly important &#8212; they don&#8217;t have a huge effect on compression or anything &#8212; but they make it vastly more difficult to implement a &#8220;working&#8221; VP8 decoder, or for that matter, decide what &#8220;working&#8221; really is.  For example, Google&#8217;s decoder will, if told to &#8220;swap the ALT and GOLDEN reference frames&#8221;, overwrite both with GOLDEN, because it first sets ALT = GOLDEN, and then sets GOLDEN = ALT.  Is this a bug?  Or is this how it&#8217;s supposed to work?  It&#8217;s hard to tell &#8212; there isn&#8217;t a spec to say so.  Google says that whatever libvpx does is right, but I doubt they intended this.</p>
<p>I expect a spec will eventually be written, but it was a bit obnoxious of Google &#8212; both to the community and to their own developers &#8212; to release so early that they didn&#8217;t even have their own documentation ready.</p>
<p><strong>2.  The TM intra prediction mode.</strong></p>
<p>One thing I glossed over in the original piece was that On2 had added an extra intra prediction mode to the standard batch that H.264 came with &#8212; they replaced Planar with &#8220;TM pred&#8221;.  For i4x4, which didn&#8217;t have a Planar mode, they just added it without replacing an old one, resulting in a total of 10 modes to H.264&#8217;s 9.  After understanding and writing assembly code for TM pred, I have to say that it is quite a cool idea.  Here&#8217;s how it works:</p>
<p>1.  Let us take a block of size 4&#215;4, 8&#215;8, or 16&#215;16.</p>
<p>2.  Define the pixels bordering the top of this block (starting from the left) as T[0], T[1], T[2]&#8230;</p>
<p>3.  Define the pixels bordering the left of this block (starting from the top) as L[0], L[1], L[2]&#8230;</p>
<p>4.  Define the pixel above the top-left of the block as TL.</p>
<p>5.  Predict every pixel &lt;X,Y&gt; in the block to be equal to clip3( T[X] + L[Y] - TL, 0, 255 ).</p>
<p>It&#8217;s effectively a generalization of gradient prediction to the block level &#8212; predict each pixel based on the gradient between its top and left pixels, and the topleft.  According to the VP8 devs, it&#8217;s chosen by the encoder quite a lot of the time, which isn&#8217;t surprising; it seems like a pretty good idea.  As just one more intra pred mode, it&#8217;s not going to do magic for compression, but it&#8217;s a cool idea and elegantly simple.</p>
<p><strong>3.  Performance and the deblocking filter.</strong></p>
<p>On2 advertised for quite some time that VP8&#8217;s goal was to be significantly faster to decode than H.264.  When I saw the spec, I waited for the punchline, but apparently they were <em>serious</em>.  There&#8217;s nothing wrong with being of similar speed or a bit slower &#8212; but I was rather confused that their design didn&#8217;t match their stated goal at all.  What apparently happened is they had multiple profiles of VP8 &#8212; high and low complexity profiles.  They marketed the performance of the low complexity ones while touting the quality of the high complexity ones, a tad dishonest.  More importantly though, practically nobody is using the low complexity modes, so anyone writing a decoder has to be prepared to handle the high complexity ones, which are the default.</p>
<p>The primary time-eater here is the deblocking filter.  VP8, being an H.264 derivative, has much the same problem as H.264 does in terms of deblocking &#8212; it spends an absurd amount of time there.  As I write this post, we&#8217;re about to finish some of the deblocking filter asm code, but before it&#8217;s committed, up to 70% or more of total decoding time is spent in the deblocking filter!  Like H.264, it suffers from the 4&#215;4 transform problem: a 4&#215;4 transform requires a total of 8 length-16 and 8 length-8 loopfilter calls per macroblock, while Theora, with only an 8&#215;8 transform, requires half that.</p>
<p>This problem is aggravated in VP8 by the fact that the deblocking filter isn&#8217;t strength-adaptive; if even one 4&#215;4 block in a macroblock contains coefficients, every single edge has to be deblocked.  Furthermore, the deblocking filter itself is quite complicated; the &#8220;inner edge&#8221; filter is a bit more complex than H.264&#8217;s and the &#8220;macroblock edge&#8221; filter is vastly more complicated, having two entirely different codepaths chosen on a per-pixel basis.  Of course, in SIMD, this means you have to do both and mask them together at the end.</p>
<p>There&#8217;s nothing wrong with a good-but-slow deblocking filter.  But given the amount of deblocking one needs to do in a 4&#215;4-transform-based format, it might have been a better choice to make the filter simpler.  It&#8217;s pretty difficult to beat H.264 on compression, but it&#8217;s certainly not hard to beat it on speed &#8212; and yet it seems VP8 missed a perfectly good chance to do so.  Another option would have been to pick an 8&#215;8 transform instead of 4&#215;4, reducing the amount of deblocking by a factor of 2.</p>
<p>And yes, there&#8217;s a simple filter available in the low complexity profile, but it doesn&#8217;t help if nobody uses it.</p>
<p><strong>4.  Tree-based arithmetic coding.</strong></p>
<p>Binary arithmetic coding has become the standard entropy coding method for a wide variety of compressed formats, ranging from LZMA to VP6, H.264 and VP8.  It&#8217;s simple, relatively fast compared to other arithmetic coding schemes, and easy to make adaptive.  The problem with this is that you have to come up with a method for converting non-binary symbols into a list of binary symbols, and then choose what probabilities to use to code each one.  Here&#8217;s an example from H.264, the P-slice sub-partition mode symbol, which is either 8&#215;8, 8&#215;4, 4&#215;8, or 4&#215;4.  encode_decision( context, bit ) writes a binary decision (bit) into a numbered context (context).</p>
<p>8&#215;8: encode_decision( 21, 1 );</p>
<p>8&#215;4: encode_decision( 21, 0 ); encode_decision( 22, 0 );</p>
<p>4&#215;8: encode_decision( 21, 0 ); encode_decision( 22, 1 ); encode_decision( 23, 1 );</p>
<p>4&#215;4: encode_decision( 21, 0 ); encode_decision( 22, 1 ); encode_decision( 23, 0 );</p>
<p>As can be seen, this is clearly like a Huffman tree.  Wouldn&#8217;t it be nice if we could represent this in the form of an actual tree data structure instead of code?  On2 thought so &#8212; they designed a simple system in VP8 that allowed <strong>all</strong> binarization schemes in the entire format to be represented as simple tree data structures.  This greatly reduces the complexity &#8212; not speed-wise, but implementation-wise &#8212; of the entropy coder.  Personally, I quite like it.</p>
<p><strong>5.  The inverse transform ordering.</strong></p>
<p>I should at some point write a post about common mistakes made in video formats that <strong>everyone keeps making</strong>.  These are not issues that are patent worries or huge issues for compression &#8212; just stupid mistakes that are repeatedly made in new video formats, probably because someone just never asked the guy next to him &#8220;does this look stupid?&#8221; before sticking it in the spec.</p>
<p>One common mistake is the problem of transform ordering.  Every sane 2D transform is &#8220;separable&#8221; &#8212; that is, it can be done by doing a 1D transform vertically and doing the 1D transform again horizontally (or vice versa).  The original iDCT as used in JPEG, H.263, and MPEG-1/2/4 was an &#8220;idealized&#8221; iDCT: implementations did not have to match each other exactly; they only had to give results very close to those of a reference implementation.  This ended up resulting in a <a href="http://guru.multimedia.cx/the-mpeg124-and-h26123-idct/" target="_blank">lot of practical problems</a>.  It was also slow; the only way to get an accurate enough iDCT was to do all the intermediate math in 32-bit.</p>
<p>Practically every modern format, accordingly, has specified an exact iDCT.  This includes H.264, VC-1, RV40, Theora, VP8, and many more.  Of course, with an exact iDCT comes an exact ordering &#8212; while the &#8220;real&#8221; iDCT can be done in any order, an exact iDCT usually requires an exact order.  That is, it specifies horizontal and then vertical, or vertical and then horizontal.</p>
<p>All of these transforms end up being implemented in SIMD.  In SIMD, a vertical transform is generally the only option, so a transpose is added to the process instead of doing a horizontal transform.  Accordingly, there are two ways to do it:</p>
<p>1.  Transpose, vertical transform, transpose, vertical transform.</p>
<p>2.  Vertical transform, transpose, vertical transform, transpose.</p>
<p>These may seem to be equally good, but there&#8217;s one catch &#8212; if the transpose is done first, it can be completely eliminated by merging it into the coefficient decoding process.  On many modern CPUs, particularly x86, transposes are very expensive, so eliminating one of the two gives a pretty significant speed benefit.</p>
<p>H.264 did it way 1).</p>
<p>VC-1 did it way 1).</p>
<p>Theora (inherited from VP3) did it way 1).</p>
<p>But no.  VP8 has to do it way 2), where you can&#8217;t eliminate the transpose.  Bah.  It&#8217;s not a huge deal; probably only ~1-2% overall at most speed-wise, but it&#8217;s just a needless waste.  What really bugs me is that VP3 got it right &#8212; why in the world did they screw it up this time around if they got it right beforehand?</p>
<p>RV40 is the other modern format I know that made this mistake.</p>
<p>(NB: You can do transforms without a transpose, but it&#8217;s generally not worth it unless the intermediate needs 32-bit math, as in the case of the &#8220;real&#8221; iDCT.)</p>
<p><strong>6.  Not supporting interlacing</strong>.</p>
<p><strong>THANK YOU THANK YOU THANK YOU THANK YOU THANK YOU THANK YOU THANK YOU.</strong></p>
<p>Interlacing was the scourge of H.264.  It weaseled its way into every nook and cranny of the spec, making every decoder a thousand lines longer.  H.264 even included a highly complicated &#8212; and effective &#8212; dedicated interlaced coding scheme, MBAFF.  The mere existence of MBAFF, despite its usefulness for broadcasters and others still stuck in the analog age with their 1080i, 576i, and 480i content, was a blight upon the video format.</p>
<p>VP8 has once and for all avoided it.</p>
<p>And if anyone suggests adding interlaced support to the experimental VP8 branch, find a straitjacket and padded cell for them before they cause any real damage.</p>
]]></content:encoded>
			<wfw:commentRss>http://x264dev.multimedia.cx/archives/486/feed</wfw:commentRss>
		<slash:comments>52</slash:comments>
		</item>
	</channel>
</rss>

<!-- Dynamic page generated in 0.386 seconds. -->
<!-- Cached page generated by WP-Super-Cache on 2012-05-17 05:38:42 -->

