Diary Of An x264 Developer

10/18/2009 (3:04 am)

Open source collaboration done right

Filed under: benchmark, linux, speed, x264

For years I’ve run into all sorts of horrific situations when dealing with open source.  Like software modules written by different teams on a badly managed commercial project, open source projects tend to program defensively around each other’s flaws rather than actually submitting patches to fix them.  There are even entire projects built around providing API wrappers that simplify usage and fix bugs present in the original library.

In many cases people don’t even submit bug reports.  Sometimes they outright patch each other’s libraries–and don’t submit the patches back to the original project.  At best this leads to tons of bugs and security vulnerabilities being overlooked in the original project.  At worst this leads to situations like the Debian OpenSSL fiasco, in which the people patching the code don’t know enough about it to safely work with it (and don’t even talk to the people who do).

But enough ranting–let me talk about a success story.

Some of you may know of the recent drama over BFS (Brain Fuck Scheduler) written by Con Kolivas.  Its primary purpose was to reduce latency for ordinary desktop applications (potentially at the cost of absolute throughput).  Unsurprisingly, someone soon tested x264 with BFS–and the results were absurd.  BFS trashed CFS, the existing kernel scheduler, by enormous margins–up to 80%.  Something was up with these results: if a scheduler A can get 80% better performance than scheduler B on a load as simple as x264, scheduler B must be seriously bugged.  This theory was further bolstered by the fact that BFS is a very simple scheduler while CFS is a very complex one; one of the heuristics in CFS could be causing problems for x264.

So I tentatively submitted a test case to the Linux kernel mailing list.  I didn’t know what to expect; maybe more flames carrying over from the BFS debate?  Instead, I got “Thanks a bunch for the nice repeatable testcase!” This is one of the few times I’ve seen this outside of what I attempt to do with x264: a developer happy to see someone report a bug in his code and apparently eager to jump to fixing it.  It certainly sounded good so far, but would anything result from this?

Answer: yes.  Up to a 70% increase in performance, committed the next day.  But the kernel devs weren’t done yet: a quick grep of Linux kernel mails over the next weeks showed x264 popping up in quite a few scheduler benchmarks; they had added it as a regular test case.  And just recently we got another 10% in performance.

The morals of the story?

1.  Talk to upstream.  They know more about it than you do, full stop.  Don’t blindly complain about problems with X or try to fix it yourself: talk to the people who know what they’re doing.  Of course, if that fails, feel free to do it yourself: there are plenty of projects notorious for completely ignoring serious bug reports for years (e.g. GCC).

2.  If you are upstream, listen to bug reports.  “Patches welcome” is only a reasonable doctrine for feature requests, not for bug reports.  A sufficiently good test case for producing a bug should always result in an investigation into the problem by real developers.  I try to make this my doctrine at all times–if anyone reports anything weird with x264, at an absolute minimum I want to know why said weird behavior is occurring.  A large number of bug fixes (and also some algorithmic changes, such as with VBV) result from user issue reports.

3.  If you want x264 to run a lot faster, upgrade your kernel to tip, or at least upgrade on the next release.  You’ll get an enormous benefit with 4 or more cores.

http://saintdevelopment.com/codecs/bfs-vs-cfs.txt

16 Responses to “Open source collaboration done right”

  1. Mathias Says:

    You really are the best! But awesome improvement. I wonder what I get with my dual core. And I also wonder, if there was that high penalty, how did Windows compare to encoding on Linux and how does it compare now? And this is all coming with 2.6.32?

  2. Samuel Says:

    Nice post. I think it is great when people communicate and issues are solved.

  3. Ana Says:

    In the Debian-OpenSSL case, before patching the code, the Debian developer asked the OpenSSL people; see:
    http://marc.info/?l=openssl-dev&m=114651085826293&w=2

    The thread is discussed at:
    http://lwn.net/Articles/282038/

  4. Dark Shikari Says:

    @Ana

    Ah, didn’t realize that. Either way, sounds like a problem due to lack of proper collaboration, whichever side you want to blame ;)

    @Mathias

    Yes, it was probably slower than Windows. This explains a number of benchmarks I’ve seen recently where Windows trashes Linux at the same applications (when there’s really no good reason for it to do so).

    And yes, it’s all coming in 2.6.32.

  5. saintdev Says:

    I would also like to say that originally when I did the tests this was not intended to be a comparison of BFS vs. CFS. I was actually doing the tests to determine if BFS changed the ideal value to pass to --threads (which led to some interesting conclusions). I just happened to run the tests first on CFS to get a baseline, and was astonished to find that BFS was much faster. I will be rerunning the tests once 2.6.32 is released.

  6. Pegasus Says:

    How far back does this CFS problem go? If I’m using an early 2.6 kernel, say 2.6.9, am I wasting CPU time?

    Right now it seems I just have to test this …

  7. Dark Shikari Says:

    @Pegasus

    We have no idea. CFS is really quite a monstrosity and it’s hard to tell at what point any regression was introduced without testing it explicitly.

  8. saintdev Says:

    @Pegasus

    CFS wasn’t introduced until 2.6.23. 2.6.9 uses an O(1) scheduler that was ok at best. Also, in 2.6.23 CFS was very good, it’s when they started ‘optimizing’ it that things slowly got worse and worse.
    For a good example, look at
    http://ck.kolivas.org/patches/bfs/old/epicmakej4.png
    SD is Staircase Deadline, Con’s old scheduler.
    The tip/master is after they started making changes; I don’t know if that included the above commits or not.

  9. james Says:

    I wish the pulseaudio gripers would listen to this advice. Instead we have people talking about building ANOTHER sound library for Linux. Insane.

  10. Soundless Says:

    @james

    .. it requires the developer(s) to listen as well, which does not seem to be the case with pulseaudio. See for example digital passthrough (Ticket/167) or the insane amount of CPU consumed for just playing normal audio.

  11. Kevin83 Says:

    I noticed this post and decided to build a custom 2.6.32-rc6 kernel for my ubuntu 9.10 setup to see how much speed changed. To make a very long story short, my average fps went from 26.7 to 40.4 on one source and 18.2 to 30.6 on another. (cpu is an athlon II x2 240)

    Thanks much to the people who helped bring this about.

  12. Anonymous Says:

    @Ana

    If you read that exchange, one of the people he asks also tells him how to do what he’s trying to do correctly, and he ignores this.

  13. harlekyn Says:

    After reading this post and the comments, I tried it myself and compiled 2.6.32-rc6 on Ubuntu Karmic.

    Using Handbrake, I got ~39 FPS instead of ~25 FPS on my Phenom X4 9750. All 4 cores are fully saturated now, before they lingered at around 60%. Good stuff!

  14. Iqbal Qasim Says:

    As someone who used to have the job of accepting bug reports and has since moved on to holding the hands of customers who encounter bugs from their vendors – I wanted to emphasize that creating succinct, reproducible testcases is absolutely mandatory. If you can’t give the developers a reasonable testcase, the chances of the bug getting fixed plummet to near zero.

    It’s not the developer’s fault either – without a testcase or even with a testcase that is too complex there are just too many variables for a human to identify root cause.

    I realize you mentioned the testcase in your example, it just seemed a little bit glossed over for what is the #1 requirement of a good bug report.

  15. Dark Shikari Says:

    @Iqbal

    Absolutely. I will refuse to investigate a bug if I cannot get enough information to construct a test case. Of course, I’ll tell the user what he needs to do to get me that test case–99% of the time, they do.

  16. Foo Says:

    @Anonymous #12: “one of the people he asks also tells him how to do what he’s trying to do correctly” — If you’re referring to Geoff’s advice to use -DPURIFY (linked in the URL field of this comment), that was pretty clearly given tongue-in-cheek. The SSL code has different behavior with and without -DPURIFY, because passing -DPURIFY causes a lot of lines to be simply #ifdef’ed out — which is essentially what the Debian guys ended up doing, as I understand it.
