Open source collaboration done right
For years I’ve run into all sorts of horrific situations when dealing with open source. Like software modules written by different teams on a badly managed commercial project, different open source projects tend to defensively program around each other’s flaws rather than actually submitting patches to fix them. There are even entire projects built around providing API wrappers that simplify usage and fix bugs present in the original library.
In many cases people don’t even submit bug reports. Sometimes they outright patch each other’s libraries–and don’t submit the patches back to the original project. At best this leads to tons of bugs and security vulnerabilities being overlooked in the original project. At worst this leads to situations like the Debian OpenSSL fiasco, in which the people patching the code don’t know enough about it to safely work with it (and don’t even talk to the people who do).
But enough ranting–let me talk about a success story.
Some of you may know of the recent drama over BFS (Brain Fuck Scheduler) written by Con Kolivas. Its primary purpose was to reduce latency for ordinary desktop applications (potentially at the cost of absolute throughput). Unsurprisingly, someone soon tested x264 with BFS–and the results were absurd. BFS trashed CFS, the existing kernel scheduler, by enormous margins–up to 80%. Something was up with these results: if a scheduler A can get 80% better performance than scheduler B on a load as simple as x264, scheduler B must be seriously bugged. This theory was further bolstered by the fact that BFS is a very simple scheduler while CFS is a very complex one; one of the heuristics in CFS could be causing problems for x264.
So I tentatively submitted a test case to the Linux kernel mailing list. I didn’t know what to expect; maybe more flames carrying over from the BFS debate? Instead, I got “Thanks a bunch for the nice repeatable testcase!” This is one of the few times I’ve seen this outside of what I attempt to do with x264: a developer happy to see someone report a bug with his code and apparently eager to jump to fixing it. It certainly sounded good so far, but would anything result from this?
Answer: yes. Up to a 70% increase in performance, committed the next day. But the kernel devs weren’t done yet: a quick grep of Linux kernel mails over the next weeks showed x264 popping up in quite a few scheduler benchmarks; they had added it as a regular test case. And just recently we got another 10% in performance.
The morals of the story?
1. Talk to upstream. They know more about it than you do, full stop. Don’t blindly complain about problems with X or try to fix it yourself: talk to the people who know what they’re doing. Of course, if that fails, feel free to do it yourself: there are plenty of projects notorious for completely ignoring serious bug reports for years (e.g. GCC).
2. If you are upstream, listen to bug reports. “Patches welcome” is only a reasonable doctrine for feature requests, not for bug reports. A sufficiently good test case for producing a bug should always result in an investigation into the problem by real developers. I try to make this my doctrine at all times–if anyone reports anything weird with x264, at an absolute minimum I want to know why said weird behavior is occurring. A large number of bug fixes (and also some algorithmic changes, such as with VBV) result from user issue reports.
3. If you want x264 to run a lot faster, upgrade your kernel to tip, or at least upgrade on the next release. You’ll get an enormous benefit with 4 or more cores.
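A repeatable test case for this kind of problem can be as simple as timing the same command several times under each kernel and comparing the results. A minimal Python sketch, assuming a placeholder command; it is not the test case that was sent to the mailing list, and `time_command` is a name made up for illustration.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=3):
    """Run cmd `runs` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True makes a failing command raise instead of being timed silently.
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Placeholder command: substitute the encode you actually want to measure.
    command = [sys.executable, "-c", "pass"]
    results = time_command(command, runs=3)
    print(f"median: {statistics.median(results):.3f}s over {len(results)} runs")
```

Running the same script on the same input under two kernels gives numbers that someone else can reproduce, which is what makes a report actionable.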
October 18th, 2009 at 4:35 am
You really are the best! But awesome improvement. I wonder what I get with my dual core. And I also wonder, if there was that high penalty, how did Windows compare to encoding on Linux and how does it compare now? And this is all coming with 2.6.32?
October 18th, 2009 at 5:33 am
Nice post. I think it is great when people communicate and issues are solved.
October 18th, 2009 at 9:48 am
In the Debian-OpenSSL case, the Debian developer asked the OpenSSL people before patching the code; see:
http://marc.info/?l=openssl-dev&m=114651085826293&w=2
This thread is discussed at:
http://lwn.net/Articles/282038/
October 18th, 2009 at 12:25 pm
@Ana
Ah, didn’t realize that. Either way, sounds like a problem due to lack of proper collaboration, whichever side you want to blame.
@Mathias
Yes, it was probably slower than Windows. This explains a number of benchmarks I’ve seen recently where Windows trashes Linux at the same applications (when there’s really no good reason for it to do so).
And yes, it’s all coming in 2.6.32.
October 18th, 2009 at 6:34 pm
I would also like to say that originally when I did the tests this was not intended to be a comparison of BFS vs. CFS. I was actually doing the tests to determine if BFS changed the ideal value to pass to --threads (which led to some interesting conclusions). I just happened to run the tests first on CFS to get a baseline, and was astonished to find that BFS was much faster. I will be rerunning the tests once 2.6.32 is released.
October 19th, 2009 at 12:33 am
How far back does this CFS problem go? If I’m using an early 2.6 kernel, say 2.6.9, am I wasting CPU time?
Right now it seems I just have to test this …
October 19th, 2009 at 1:42 am
@Pegasus
We have no idea. CFS is really quite a monstrosity and it’s hard to tell at what point any regression was introduced without testing it explicitly.
October 19th, 2009 at 9:29 am
@Pegasus
CFS wasn’t introduced until 2.6.23. 2.6.9 uses an O(1) scheduler that was ok at best. Also, in 2.6.23 CFS was very good; it was when they started ‘optimizing’ it that things slowly got worse and worse.
For a good example, look at
http://ck.kolivas.org/patches/bfs/old/epicmakej4.png
SD is Staircase Deadline, Con’s old scheduler.
The tip/master is after they started making changes; I don’t know if that included the above commits or not.
October 28th, 2009 at 12:33 pm
I wish the pulseaudio gripers would listen to this advice. Instead we have people talking about building ANOTHER sound library for Linux. Insane.
October 29th, 2009 at 8:43 pm
@james
.. it requires the developer(s) to listen too, which does not seem to be the case with pulseaudio. See for example digital passthrough (Ticket/167) or the insane amount of CPU consumed for just playing normal audio.
November 4th, 2009 at 8:21 pm
I noticed this post and decided to build a custom 2.6.32-rc6 kernel for my ubuntu 9.10 setup to see how much speed changed. To make a very long story short, my average fps went from 26.7 to 40.4 on one source and 18.2 to 30.6 on another. (cpu is an athlon II x2 240)
Thanks much to the people who helped bring this about.
November 9th, 2009 at 9:13 am
@Ana
If you read that exchange, one of the people he asks also tells him how to do what he’s trying to do correctly, and he ignores this.
November 9th, 2009 at 5:02 pm
[...] further research I came across an article by an x264 developer who was able to achieve an 80% performance gain with BFS. He attributed this to a bug that in the current development kernel [...]
November 11th, 2009 at 3:08 am
After reading this post and the comments, I tried it myself and compiled 2.6.32-rc6 on Ubuntu Karmic.
Using Handbrake, I got ~39 FPS instead of ~25 FPS on my Phenom X4 9750. All 4 cores are fully saturated now; before, they lingered at around 60%. Good stuff!
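For comparison with the percentages quoted in the post, the frame rates reported in these comments can be turned into relative gains. A small Python sketch; the function name is illustrative, and the Handbrake figures were only given approximately.

```python
def speedup_percent(before_fps, after_fps):
    """Return the relative throughput gain, in percent, from before_fps to after_fps."""
    return (after_fps / before_fps - 1.0) * 100.0


# Frames per second as reported by commenters (before, after the kernel upgrade).
reports = {
    "Athlon II X2 240, source 1": (26.7, 40.4),
    "Athlon II X2 240, source 2": (18.2, 30.6),
    "Phenom X4 9750, Handbrake (approximate)": (25.0, 39.0),
}

for label, (before, after) in reports.items():
    print(f"{label}: {speedup_percent(before, after):.0f}% faster")
```

That works out to roughly 51%, 68%, and 56% respectively, on a dual core and a quad core.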
December 17th, 2009 at 2:26 pm
As someone who used to have the job of accepting bug reports and has since moved on to holding the hands of customers who encounter bugs from their vendors – I wanted to emphasize that creating succinct, reproducible testcases is absolutely mandatory. If you can’t give the developers a reasonable testcase, the chances of the bug getting fixed plummet to near zero.
It’s not the developer’s fault either – without a testcase or even with a testcase that is too complex there are just too many variables for a human to identify root cause.
I realize you mentioned the testcase in your example; it just seemed a little bit glossed over for what is the #1 requirement of a good bug report.
December 18th, 2009 at 2:41 pm
@Iqbal
Absolutely. I will refuse to investigate a bug if I cannot get enough information to construct a test case. Of course, I’ll tell the user what he needs to do to get me that test case–99% of the time, they do.