Diary Of An x264 Developer

07/24/2011 (3:28 pm)

Summer of Code (in space)

Filed under: development,SOCIS ::

There’s apparently another Summer of Code in town.  x264 has been accepted into the ESA Summer of Code in Space.  Just like Google Summer of Code, work on x264 over the summer and get paid!  Watch out though; only some countries are allowed, so check first if you’re allowed to participate.  The application deadline is July 27, 11AM (UTC); sorry for the short notice this time around!

04/02/2011 (11:33 pm)

You should apply for x264 Google Summer of Code

Filed under: development,GSOC,x264 ::

Want to do some fun open source work and get paid?  You should apply for GSOC.  Check out our ideas page and the official Google page.

(And yes, I’ll get around to approving the queued comments and writing more real posts.  Eventually!  I promise!)

10/17/2010 (6:50 pm)

How to contribute to open source, for companies

Filed under: development,open source,x264 ::

I have seen many nigh-incomprehensible attempts by companies to contribute to open source projects, including x264.  Developers are often simply boggled, wondering why the companies seem incapable of proper communication.  The companies assume the developers are being unreceptive, while the developers assume the companies are being incompetent, idiotic, or malicious.  Most of this seems to boil down to a basic lack of understanding of how open source works, resulting in a wide variety of misunderstandings.  Accordingly, this post will cover the dos and don’ts of corporate contribution to open source.

Read More…

05/25/2010 (11:01 pm)

Anatomy of an optimization: H.264 deblocking

Filed under: assembly,development,H.264,speed,x264 ::

As mentioned in the previous post, H.264 has an adaptive deblocking filter.  But what exactly does that mean — and more importantly, what does it mean for performance?  And how can we make it as fast as possible?  In this post I’ll try to answer these questions, particularly in relation to my recent deblocking optimizations in x264.

H.264’s deblocking filter has two steps: strength calculation and the actual filter.  The first step calculates the parameters for the second step.  The filter runs on all the edges in each macroblock.  That’s 4 vertical edges of length 16 pixels and 4 horizontal edges of length 16 pixels.  The vertical edges are filtered first, from left to right, then the horizontal edges, from top to bottom (order matters!).  The leftmost edge is the one between the current macroblock and the left macroblock, while the topmost edge is the one between the current macroblock and the top macroblock.
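
The edge ordering described above can be sketched in C.  This is only an illustration: the `mb_edge` struct and `mb_edge_order` function are hypothetical names, not x264's actual code, and the sketch assumes luma edges spaced 4 pixels apart within a 16x16 macroblock.

```c
/* Hypothetical description of one 16-pixel-long edge in a macroblock. */
typedef struct {
    int vertical; /* 1 = vertical edge, 0 = horizontal edge */
    int offset;   /* distance in pixels from the left (or top) of the macroblock */
} mb_edge;

/* Fill `out` with the 8 edges in filtering order and return the count.
 * Offset 0 is the edge shared with the left (or top) neighbouring macroblock. */
static int mb_edge_order(mb_edge out[8])
{
    int n = 0;
    /* Vertical edges first, left to right. */
    for (int x = 0; x < 16; x += 4) {
        out[n].vertical = 1;
        out[n].offset = x;
        n++;
    }
    /* Then horizontal edges, top to bottom. */
    for (int y = 0; y < 16; y += 4) {
        out[n].vertical = 0;
        out[n].offset = y;
        n++;
    }
    return n;
}
```

The order matters because the horizontal pass reads pixels that the vertical pass has already modified.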

Here’s the formula for the strength calculation in progressive mode. The highest strength that applies is always selected.

If we’re on the edge between an intra macroblock and any other macroblock: Strength 4
If we’re on an internal edge of an intra macroblock: Strength 3
If either side of a 4-pixel-long edge has residual data: Strength 2
If the motion vectors on opposite sides of a 4-pixel-long edge are at least a pixel apart (in either x or y direction) or the reference frames aren’t the same: Strength 1
Otherwise: Strength 0 (no deblocking)

These values are then thrown into a lookup table depending on the quantizer: higher quantizers have stronger deblocking.  Then the actual filter is run with the appropriate parameters.  Note that Strength 4 is actually a special deblocking mode that performs a much stronger filter and affects more pixels.
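
The strength rules listed above can be sketched as a small C function.  This is a simplified illustration, not x264's implementation: the `block_info` struct and its fields are hypothetical, each block is assumed to have a single reference frame and a single motion vector, and motion vectors are assumed to be stored in quarter-pixel units (so "a pixel apart" means a difference of 4).

```c
#include <stdlib.h>

/* Hypothetical per-block state for one side of a 4-pixel-long edge. */
typedef struct {
    int is_intra;     /* block belongs to an intra macroblock */
    int has_residual; /* block has residual data */
    int ref;          /* reference frame index */
    int mv_x, mv_y;   /* motion vector, quarter-pixel units (assumed) */
} block_info;

/* p and q are the blocks on either side of the edge; is_mb_edge is nonzero
 * when the edge lies on a macroblock boundary.  Rules are checked from the
 * strongest down, so the highest strength that applies is returned. */
static int deblock_strength(const block_info *p, const block_info *q, int is_mb_edge)
{
    if (p->is_intra || q->is_intra)
        return is_mb_edge ? 4 : 3;
    if (p->has_residual || q->has_residual)
        return 2;
    if (p->ref != q->ref ||
        abs(p->mv_x - q->mv_x) >= 4 ||
        abs(p->mv_y - q->mv_y) >= 4)
        return 1;
    return 0; /* no deblocking */
}
```

Written this way, the branchy per-edge logic is easy to follow, though this form would be evaluated for every 4-pixel-long edge in every macroblock.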

Read More…

03/18/2010 (10:29 pm)

Announcing x264 Summer of Code 2010!

Filed under: development,google,GSOC,x264 ::

With the announcement of Google Summer of Code 2010 and the acceptance of our umbrella organization, Videolan, we are proud to announce the third x264 Summer of Code!  After two years of progressively increasing success, we expect this year to be better than ever.  Last year’s successes include ARM support and weighted P-frame prediction.  This year we have a wide variety of projects of varying difficulty, including some old ones and a host of new tasks.  The qualification tasks are tough, so if you want to get involved, the sooner the better!

Interested in getting started?  Check out the wiki page, hop on #x264 on Freenode IRC, and say hi to the gang!  No prior experience or knowledge in video compression necessary: just dedication and the willingness to ask questions and experiment until you figure things out.

01/17/2010 (10:40 pm)

What’s coming next in x264 development

As seen in the previous post, a whole category of x264 improvements has now been completed and committed.  So, many have asked–what’s next?  Many major features have, from the perspective of users, seemingly come out of nowhere, so hopefully this post will give a better idea as to what’s coming next.

Big thanks to everyone who’s been helping with a lot of the changes outside of the core encoder.  This, by the way, is one of the easiest ways to get involved in x264 development; while learning how the encoder internals work is not necessarily that difficult, understanding enough to contribute useful changes often takes a lot of effort, especially since by and large the existing developers have already eliminated most of the low-hanging fruit.  By comparison, one can work on x264cli or parts of the libx264 outside the analysis sections with significantly less domain knowledge.  This doesn’t mean the work is any less difficult–only that it has a lower barrier to entry.

For most specific examples given below, I’ve put down an estimated time scale as an exercise in project estimation.  This is no guarantee as to when it will be done; just a wild guess by me.  Though it might serve as a personal motivator for the tasks that I’ve assigned to myself.  Don’t harass any of the other developers based on my bad guesses ;)

Do also note that a project having a time scale doesn’t necessarily mean that it will be finished at all: not everything that we plan ends up happening.  Many features end up sitting on the TODO list for months or even years before someone decides that they’re important enough to implement.  If your company is particularly interested in one of these features, I might be able to offer you a contract to make sure it gets done much sooner and in a way that best fits your use-case; contact me via email.

Read More…

12/06/2009 (1:15 am)

A curious SIMD assembly challenge: the zigzag

Filed under: assembly,development,speed,x264 ::

Most SIMD assembly functions are implemented in a rather straightforward fashion.  An experienced assembly programmer can spend 2 minutes looking at C code and either give a pretty good guess at how one would write SIMD for it–or equally–rule out SIMD as an optimization technique for that code.  There might be a nonintuitive approach that’s somewhat better, but one can usually get very good results merely by following the most obvious method.

But in some rare cases there is no “most obvious method”, even for functions that would seem extraordinarily simple.  These kinds of functions present an unusual situation for the assembly programmer: they find themselves looking at some embarrassingly simple algorithm–one which simply cries out for SIMD–and yet they can’t see an obvious way to do it!  So let’s jump into the fray here and look at one of these cases.

Read More…

09/10/2009 (6:36 pm)

iDCT rounding

The quantization process in modern video encoders tends to make a lot of assumptions.  A common one is that of continuity and uniform step size–that, for example, if we are quantizing the value 2.5, both 2 and 3 will give equal distortion, being exactly 0.5 off from the correct value.  But this isn’t always true; in reality, we are working with an 8-bit range in each channel.  The inverse transform has to round our high-precision internal values to a small output range.

Normally, this isn’t a problem.  Since AC coefficients have (by definition) different output values for each output pixel, they serve to effectively dither the output of the iDCT.  But what happens when we don’t have any AC coefficients?
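
A small C sketch shows how rounding breaks the uniform-step assumption in the DC-only case.  The final stage of the H.264 4x4 inverse transform rounds with `(x + 32) >> 6`, and with only a DC coefficient every pixel of the block receives the same value.  The function names and the step size used in the usage note are hypothetical, chosen only for illustration.

```c
/* Final rounding of the H.264 4x4 inverse transform for a DC-only block:
 * every residual pixel in the block equals (dc + 32) >> 6.
 * Assumes a non-negative dequantized DC value. */
static int idct_dc_only_pixel(int dequantized_dc)
{
    return (dequantized_dc + 32) >> 6;
}

/* Hypothetical uniform dequantizer: level times a fixed step,
 * expressed in the transform's 1/64-pixel units. */
static int reconstruct_dc(int level, int step)
{
    return idct_dc_only_pixel(level * step);
}
```

With an assumed step of 40, levels 0 through 4 reconstruct to 0, 1, 1, 2, 3: levels 1 and 2 produce the same pixel value, so the output steps are no longer uniform even though the quantizer's are.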

Read More…

08/06/2009 (11:36 pm)

A tree of thought

Filed under: development,GSOC,ratecontrol,x264 ::

“There is nothing like looking, if you want to find something… You certainly usually find something, if you look, but it is not always quite the something you were after.”

– J.R.R. Tolkien

About a year and a half ago, I had an idea: what if we made a graph of how each block of the video referenced other blocks temporally and used this graph to increase quality on blocks which are referenced a lot and lower it on those which are referenced less?  Clearly this would greatly improve average quality… but when I thought through it, the problem became messier and messier.  I decided to put it off until later.  I ended up making it a Google Summer of Code project for 2008, but that student disappeared after a few weeks of relative non-work and a hardly-working initial patch.  I mostly forgot about it; it was in the same category as explicit weighted prediction and MBAFF: messy things that might help, but that I didn’t want to do.  This idea in particular got filed away under the name “MB-tree.”

Read More…

07/16/2009 (8:47 pm)

Cacheline splits, take two

It has been well over a year since the original cacheline-split patch and my subsequent cacheline-split patch for qpel interpolation.  I never implemented it for chroma, despite the potential benefit, because it required four extra registers, something that chroma MC was in serious short supply of.  Furthermore, chroma was only width-8 and width-4, and the lower the width, the lower the percentage of loads which crossed cachelines, so the less the overall possible benefit relative to the overhead of cacheline-split detection.
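
The relationship between load width and split frequency can be made concrete with a short C sketch.  It assumes load addresses are uniformly distributed over the offsets within a cacheline, which is a simplification; the function name is hypothetical.

```c
/* Count the start offsets within a cacheline at which a load of `width`
 * bytes spills into the next cacheline. */
static int split_offsets(int width, int cacheline)
{
    int count = 0;
    for (int offset = 0; offset < cacheline; offset++)
        if (offset + width > cacheline)
            count++;
    return count;
}
```

For an assumed 64-byte cacheline, a 16-byte load splits at 15 of 64 offsets (about 23%), an 8-byte load at 7 of 64 (about 11%), and a 4-byte load at 3 of 64 (about 5%), so narrower loads leave less to gain from split handling.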

The cacheline split implementations, as can be seen in the original post, vary greatly, but they all have one thing in common: they perform two aligned loads, one on either side of the split, and then use shifts (or palignr) to merge the data together accordingly.  However, there is another possible trick that can be used here.
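
The common technique, two aligned loads merged with shifts, can be illustrated in scalar C.  This is a sketch on 8-byte words rather than the SIMD code the post refers to; the function name is hypothetical, and it assumes a little-endian machine and that the aligned word following the data is readable.

```c
#include <stdint.h>
#include <string.h>

/* Read 8 bytes from an arbitrary address using only 8-byte-aligned loads.
 * Assumes little-endian byte order. */
static uint64_t load64_split(const uint8_t *src)
{
    unsigned offset = (unsigned)((uintptr_t)src & 7); /* misalignment in bytes */
    const uint8_t *base = src - offset;               /* aligned address below src */
    uint64_t lo, hi;

    memcpy(&lo, base, 8);          /* aligned load before the split */
    if (offset == 0)
        return lo;                 /* already aligned: no merge needed */
    memcpy(&hi, base + 8, 8);      /* aligned load after the split */

    /* Drop the unwanted low bytes of lo, then fill the top from hi. */
    return (lo >> (8 * offset)) | (hi << (64 - 8 * offset));
}
```

The early return for the aligned case also avoids a shift by 64 bits, which is undefined in C.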

Read More…
