Diary Of An x264 Developer

05/26/2009 (5:21 pm)

The art of commit messages

Filed under: assembly, development, speed, x264

The commit message is one of the most important tools a developer has: in just a few lines, he can communicate a great deal of information to a great variety of people.  This audience includes a vast swath of eager but relatively nontechnical users who merely want to know what was improved in the most recent update.  It includes technical users who may look at the code from time to time and perhaps submit an occasional patch.  It includes the other developers, who on a larger project may not be entirely aware of everything being worked on.  It even includes the developer himself, as he will probably not remember today’s change in detail a year from now.

So what can a developer do to make a commit message relatively succinct but still satisfy the needs of all of these people?  Let’s take the commit message I wrote for this year’s most significant patch so far, Holger’s overhaul of a large part of x264’s most important assembly code.

Title: Vastly faster SATD/SA8D/Hadamard_AC/SSD/DCT/IDCT

This title describes the patch in a space small enough to display in gitweb without being cut off.  It’s definitely technical, but if you make the title completely non-technical, it will be useless to the developers and technical users, who cannot pinpoint the revision they’re interested in merely by using the non-technical words “vastly faster.”  However, the title does tell the non-technical user one thing: it is faster.  The revision doesn’t affect features or quality; it affects speed.

Heavily optimized for Core 2 and Nehalem, but performance should improve on all modern x86 CPUs.

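This tells the less-technical users about the development process and how it relates to them: the code was primarily optimized for Core 2 and Nehalem CPUs, so those systems should benefit the most, but other modern x86 CPUs should see an improvement as well.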

16×16 SATD: +18% speed on K8(64bit), +22% on K10(32bit), +42% on Penryn(64bit), +44% on Nehalem(64bit), +50% on P4(32bit), +98% on Conroe(64bit)

This provides two pieces of information: it gives technical users specific information about the most important DSP function in x264, but it also gives less-technical users a sort of scale: using this and the number below, any user should be able to roughly estimate the performance benefit of the patch on their system.

Similar performance boosts in SATD-like functions (SA8D, hadamard_ac) and somewhat less in DCT/IDCT/SSD.

This continues the explanation: without wasting space on dozens of other numbers, it describes the performance changes in the other mentioned functions.

Overall performance boost is up to ~15% on 64-bit Conroe.

This provides a succinct number: the most important thing to a user who only cares about “exactly how much do I get out of this change?”
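One rough way to relate the per-function figure to the overall figure is Amdahl's law. The sketch below is only an illustration in C: the function names `overall_speedup` and `implied_fraction` are made up for this example, and it assumes that every affected function sped up by the same factor, which the message itself says is not quite true (DCT/IDCT/SSD gained somewhat less).

```c
/* Amdahl's law: overall speedup when a fraction of the runtime
 * (0.0 to 1.0) becomes local_speedup times faster.
 * Example: "+98% speed" corresponds to local_speedup = 1.98. */
double overall_speedup(double fraction, double local_speedup)
{
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup);
}

/* Inverse of the above: the share of runtime that the sped-up code
 * must have occupied to turn a given local speedup into a given
 * overall speedup. Assumes local_speedup > 1.0. */
double implied_fraction(double overall, double local_speedup)
{
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / local_speedup);
}
```

With the Conroe numbers from the message, `implied_fraction(1.15, 1.98)` comes out to roughly 0.26. That is pure arithmetic under the uniform-speedup assumption above, not a measurement of x264. A user on another CPU could plug their own per-function figure into `overall_speedup` with that fraction to get a ballpark estimate.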

Now, of course, this commit message is not perfect.  In particular, it omits a lot of the most technical information.  My excuse for this would be the fact that x264 is a small project and we only have three people with any hope of understanding the black magic behind a lot of the changes in this patch: it would take pages of documentation to explain the methodology even to an assembly guru.  And indeed it did: Holger wrote an entire Master’s thesis that primarily discussed this patch and the methods he used to develop the algorithms behind it.

But what if we continued this commit message to its logical conclusion and added the technical information?  It would look something like this:

New mostly-transposeless Hadamard transform algorithm.  Mixes butterflies with sumsubs to avoid the huge stall inherent in a real transpose step.  For SSE2, uses DEINTERLEAVE algorithm for loading with width-16 (only on x86_64 due to register requirements).  For SSSE3, uses pmaddubsw to perform the horizontal transform and limit the unpacks necessary during loading.  For SSE4, uses pblendw to speed up the partial transpose steps.

For DCT/IDCT, just reorder the functions to take advantage of out-of-order execution to minimize stall due to transpose.  A transposeless DCT might be possible too, but that would require using an internally-untransposed DCT, which would require modifying the zigzag functions as well, so this hasn’t been implemented yet.

We don’t have nearly enough room in the commit message to describe the entire algorithm in sufficient detail to reimplement it or even understand it, but we do have enough room to give basic information and leave information uncovered during development for future reinvestigation (the idea of the untransposed DCT).  To some extent, the purpose of a commit message for a sufficiently complicated change should be to inspire curiosity.  Since one clearly cannot expect to explain an enormous patch in such a small space, the message should “hook” technical users into investigating the change further.  Anything beyond this level of detail should really be reserved for comments in the code.
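For readers unfamiliar with the terms in the extended message, here is a minimal scalar sketch in C of a 4×4 SATD (sum of absolute transformed differences) built from Hadamard butterflies. It is not x264's code and not Holger's algorithm: the names `hadamard4` and `satd_4x4_ref` are hypothetical, and the final halving is an assumed normalization convention. Each butterfly is a sum/difference pair. The second pass has to walk columns instead of rows, and a SIMD version typically handles that with a transpose between the two passes, which is the step the message describes avoiding.

```c
#include <stdint.h>
#include <stdlib.h>

/* One 4-point Hadamard pass: two layers of sum/difference butterflies. */
static void hadamard4(int v[4])
{
    int s01 = v[0] + v[1], d01 = v[0] - v[1];
    int s23 = v[2] + v[3], d23 = v[2] - v[3];
    v[0] = s01 + s23;
    v[1] = d01 + d23;
    v[2] = s01 - s23;
    v[3] = d01 - d23;
}

/* Scalar reference 4x4 SATD between blocks a and b.
 * stride_a / stride_b are the distances in bytes between rows. */
int satd_4x4_ref(const uint8_t *a, int stride_a,
                 const uint8_t *b, int stride_b)
{
    int d[4][4];
    int sum = 0;

    /* Residual: per-pixel difference of the two blocks. */
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            d[y][x] = a[y * stride_a + x] - b[y * stride_b + x];

    /* Horizontal pass: transform each row in place. */
    for (int y = 0; y < 4; y++)
        hadamard4(d[y]);

    /* Vertical pass: gather each column (the scalar stand-in for a
     * transpose), transform it, and accumulate absolute values. */
    for (int x = 0; x < 4; x++) {
        int col[4] = { d[0][x], d[1][x], d[2][x], d[3][x] };
        hadamard4(col);
        for (int i = 0; i < 4; i++)
            sum += abs(col[i]);
    }

    /* Assumed normalization: halve the raw sum. */
    return sum / 2;
}
```

Calling `satd_4x4_ref(a, 4, b, 4)` on two identical, tightly packed 4×4 blocks returns 0. The order of the coefficients that `hadamard4` produces does not matter here, because only the sum of their absolute values is used.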

7 Responses to “The art of commit messages”

  1. Logan Capaldo Says:

    Link is broken, should be http://git.videolan.org/?p=x264.git;a=commit;h=2dca5f5413051a26cbba4e20f3c77ff69b694ba3

  2. Dark Shikari Says:


    Oddly enough, it got worse when I tried to fix it, prefixing the link with the local website address even when I edited the HTML manually. I’m just going to blame WordPress.

  3. cb Says:

    Is this Holger Lubitz thesis public?

  4. Denis Fortin Says:

    While i overlooked the commit message (being an infrequent reader of x264 changelog), this entry got me “hooked”. Is Holger’s Masters’ Thesis available somewhere ?

  5. David Pethes Says:

    And I thought that I was smart when I figured out that the output of my 4×4 hadamard assembly can come in reversed row order. Anyway, it inspired my curiosity, thanks ;) Maybe I could ask Holger if he’d give me a peek at his thesis, it sounds like interesting reading.

  6. Dark Shikari Says:

    Ask Holger on #x264 on Freenode; he’ll probably know ;)

  7. Siggy Brentrup Says:

    Last Sunday, during the Pirates’ election party here in Oldenburg, Germany, Holger allowed me to have a peek at the only printed copy; he isn’t allowed to publish it in any form before it has been officially accepted, a step he has been waiting on for 6 months now.

    It got me hooked immediately (I’m coming from a CA background); expect an interesting read.
