Diary Of An x264 Developer

01/17/2010 (10:40 pm)

What’s coming next in x264 development

As seen in the previous post, a whole category of x264 improvements has now been completed and committed.  So, many have asked–what’s next?  Many major features have, from the perspective of users, seemingly come out of nowhere, so hopefully this post will give a better idea as to what’s coming next.

Big thanks to everyone who’s been helping with a lot of the changes outside of the core encoder.  This, by the way, is one of the easiest ways to get involved in x264 development; while learning how the encoder internals work is not necessarily that difficult, understanding enough to contribute useful changes often takes a lot of effort, especially since by and large the existing developers have already eliminated most of the low-hanging fruit.  By comparison, one can work on x264cli or parts of the libx264 outside the analysis sections with significantly less domain knowledge.  This doesn’t mean the work is any less difficult–only that it has a lower barrier to entry.

For most specific examples given below, I’ve put down an estimated time scale as an exercise in project estimation.  This is no guarantee as to when it will be done; just a wild guess by me.  Though it might serve as a personal motivator for the tasks that I’ve assigned to myself.  Don’t harass any of the other developers based on my bad guesses ;)

Do also note that the fact that projects have time scales doesn’t necessarily mean that they will be finished at all: not everything that we plan ends up happening.  Many features end up sitting on the TODO list for months or even years before someone decides that they’re important enough to implement.  If your company is particularly interested in one of these features, I might be able to offer you a contract to make sure it gets done much sooner and in a way that best fits your use-case; contact me via email.

Category 1:  Algorithmic improvements

First, let me digress for a moment on the difference between algorithm and feature-related improvements.  When I refer to a feature in x264, I primarily mean something in the H.264 spec.  Most importantly, it’s a self-contained concept: it’s “X” as opposed to “a way of doing X”. It doesn’t strictly have to be part of the spec: periodic intra refresh is a feature because its primary purpose is to serve a particular use case, as opposed to improving speed or compression in the general case.

An algorithm can be part of a feature: for example, Weighted P-frame Prediction was a feature, but it also involved an algorithm: the weight-calculation code written by Dylan.  Of course, some algorithms don’t involve any particular feature: Psy-RD, for example, is simply a better method of picking modes, and could be applied in basically any encoder or use-case.

Algorithmic improvements therefore are changes that improve speed, compression, or both, but don’t add a new feature.  Various algorithmic improvements are upcoming, though most are things that I notice on the spur of the moment and simply implement on the spot, so it’s a bit hard to predict exactly what will be coming.  One recent algorithmic improvement is this change to weightp motion search, which added some shortcuts to speed up searching of duplicate reference frames.

One big potential improvement may come out of a patch by Loren Merritt which stores chroma data interleaved, resulting in a single plane of UVUVUV… (the layout used by the NV12 colorspace) instead of one plane of U and one plane of V.  This may significantly speed up deblocking, chroma interpolation, and chroma motion estimation.
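To make the layout change concrete, here is a minimal Python sketch of interleaving separate U and V planes into a single UVUV… plane.  It is only an illustration of the memory layout, not Loren’s patch; the function name interleave_chroma and the use of one bytes object per row are assumptions made for the example.

```python
def interleave_chroma(u_plane, v_plane):
    """Merge separate U and V planes into one interleaved UVUV... plane.

    Each plane is a list of rows, and each row is a bytes object holding
    one chroma sample per byte. (Hypothetical helper, for illustration only.)
    """
    if len(u_plane) != len(v_plane):
        raise ValueError("U and V planes must have the same number of rows")

    uv_plane = []
    for u_row, v_row in zip(u_plane, v_plane):
        if len(u_row) != len(v_row):
            raise ValueError("U and V rows must have the same width")
        row = bytearray(2 * len(u_row))
        row[0::2] = u_row  # even byte offsets hold U samples
        row[1::2] = v_row  # odd byte offsets hold V samples
        uv_plane.append(bytes(row))
    return uv_plane
```

With this layout, a routine that walks one row touches both chroma channels in a single pass, which is presumably where the hoped-for speedups would come from.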

Category 2: x264 the frontend

One of our long-term goals that is beginning to come into focus is making the x264 commandline interface into a general-purpose encoding tool that handles video, audio, muxing, sync, and so forth.  The recent LAVF+FFMS input patch was the first step towards this goal; many more will follow.  Some features that we intend to have, many of which are in development now:

1.  Resizing support (plus cropping, padding, etc) using libswscale.
(Time scale: 3-5 weeks)

2.  Intelligent --device support: dozens of device compatibility options to pick from, with the ability to combine multiple devices.  This will be quite a bit more powerful than any current GUI or x264 frontend.  Here’s an example:

Let’s say we have a 640×480 video that we want to encode.
--device ipod will encode a 320×240 stream that is iPod compatible.
--device psp will encode a 364×272 stream that is PSP compatible.
--device ipod,psp will encode a 364×272 stream that is both iPod and PSP compatible.

What happened?  The iPod has an optimal resolution–320×240, the resolution of its screen.  But it also has a maximum resolution, 640×480.  So if we’re encoding for both the iPod and the PSP, we should use a resolution as close to the PSP optimal (480×272) as we can without exceeding the iPod’s max.
(Time scale: 3-5 weeks)

3.  Audio encoding support through libavcodec, probably using libvorbis (for MKV) and ffmpeg aac (for MP4).
(Time scale: 2-4 months)

4.  IVTC and deinterlacing through decomb.
(Time scale: 4-6 months)

5.  An open source, DVB-compatible TS muxer (thanks to Kieran Kunhya).
(Time scale: 1-3 months)

6.  Ability to specify multiple --tune options.
(Time scale: 1 week)
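As a rough illustration of the --device combination logic from item 2 above, here is a Python sketch.  This is not the planned implementation: the Device class, the DEVICES table, and the helper names are all hypothetical.  The iPod numbers come from the example above, while the PSP maximum, the mod-4 rounding, and the rule “aim for the largest optimal resolution, clipped to every device’s maximum” are assumptions, chosen only because they reproduce the numbers in the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    optimal: tuple  # (width, height) the device displays best
    maximum: tuple  # (width, height) the device can handle at most


# iPod numbers are from the post; the PSP maximum is an assumption
# (set equal to its 480x272 optimal resolution).
DEVICES = {
    "ipod": Device(optimal=(320, 240), maximum=(640, 480)),
    "psp": Device(optimal=(480, 272), maximum=(480, 272)),
}


def round_to(value, mod):
    """Round to the nearest multiple of mod (mod 4 is an assumption)."""
    return int(value / mod + 0.5) * mod


def fit(source, box, mod=4):
    """Scale source to fit inside box, keeping aspect ratio, never upscaling.

    Assumes the box dimensions are themselves multiples of mod.
    """
    sw, sh = source
    bw, bh = box
    if sw * bh <= sh * bw:  # height is the limiting dimension
        h = min(bh, sh)
        w = sw * h / sh
    else:  # width is the limiting dimension
        w = min(bw, sw)
        h = sh * w / sw
    return round_to(w, mod), round_to(h, mod)


def pick_resolution(source, names):
    """Pick an output resolution compatible with every named device."""
    devices = [DEVICES[name] for name in names]
    # No device's maximum may be exceeded.
    max_w = min(d.maximum[0] for d in devices)
    max_h = min(d.maximum[1] for d in devices)
    # Assumed rule: aim for the largest optimal resolution (by area),
    # clipped to the common maximum.
    opt_w, opt_h = max((d.optimal for d in devices), key=lambda r: r[0] * r[1])
    box = (min(opt_w, max_w), min(opt_h, max_h))
    return fit(source, box)
```

For a 640×480 source, pick_resolution((640, 480), ["ipod"]) gives (320, 240), while ["psp"] and ["ipod", "psp"] both give (364, 272), matching the example.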

Category 3: Feature improvements

Despite x264 recently getting a lot of new features, we’re nowhere near done.  Here’s some of what’s to come:

1.  CBR and VBR NAL-HRD with VFR support.  Once we get this, x264 will officially support Blu-ray encoding!  Thanks to Kieran Kunhya, Alex Giladi, and Lamont Alston for working on this.
(Time scale: 1-3 months)

2.  Open-GOP support.  Thanks to Lamont Alston for working on this.
(Time scale: 1-3 months)

3.  VFR ratecontrol support: reliably set VBV and ABR/2-pass modes even with VFR input.
(Time scale: 1-2 months)

4.  Various API enhancements.  libx264 will be getting various new API functions that calling applications may find useful: in particular, I plan to move presets, tunes, devices, and profiles into the API, though not as normal commandline options.  For example, the caller will be able to specify x264_param_default( param, "fast", "film,fastdecode" ) or similar.  If you need a particular API call for your use-case, contact us on #x264dev IRC on Freenode and we’ll be happy to consider it.
(Time scale: 1-3 months)

5.  disable_deblocking_filter_idc=2 support, aka “disable deblocking on slice edges”.  This allows greater decoder and encoder parallelism when using slice-based threading, even further lowering latency.
(Time scale: 2-3 months)

6.  Ratecontrol reconfiguration support: change the VBV, CRF, and bitrate settings on the fly!  Perfect for adaptive streaming to iPhone or similar.
(Time scale: 2-3 months)

38 Responses to “What’s coming next in x264 development”

  1. julius666 Says:

    Wow! Very impressive list (although I doubt I’ll ever use audio encoding support for example).
    But where is aq-mode 4? Is it ready to commit?

  2. Dark Shikari Says:

    @Julius

    AQ mode 4 would be an algorithmic improvement, and if it’s committed, it will probably simply overwrite the current AQ mode 2. No need to keep multiple experimental AQ modes around.

  3. Steinar H. Gunderson Says:

    I see none of these really touch on interlaced encoding (except, well, deinterlacing :-) )… Is that too obscure to be interesting?

    If nothing else, I hope the BFF/TFF signalling from the various NAL-HRD patches will be pulled in. :-)

    /* Steinar */

  4. Bruce Says:

    That is a very exciting list, I just wish I had the skills to contribute. ;)

    I’m curious, though, what do you mean by “general-purpose encoding tool?” Would that be a base GUI, or more of a stand-alone application? Or am I misunderstanding? Either way, it would be very cool to see, but seems like a lot to take on, in addition to everything else being planned.

  5. Dark Shikari Says:

    @Steinar

    TFF/BFF/pulldown will be part of NAL HRD.

    @bruce

    It means that x264 will serve the same sort of role as ffmpeg, albeit simpler and with more limited output options (H.264 only for example, obviously). Making a GUI on top of it will become trivial, since the GUI would literally only be an interface, as opposed to having to handle all the aspects of the encoding process.

  6. Bruce Says:

    DS, thanks for the response. I look forward to seeing what comes of your efforts in this regard.

  7. Carl Eugen Hoyos Says:

    Shouldn’t the DVB-compatible TS muxer be implemented in libavformat (i.e. the existing one extended and/or fixed)?

  8. Dark Shikari Says:

    @Carl

    Libavformat already has a pretty good TS muxer. We’re getting one in x264 separately for a few reasons:

    1) The TS muxer will need some pretty detailed integration with x264’s internal VBV model; something that libavformat, as far as I know, doesn’t have the API to do.

    2) Kieran wanted to write one. This is also why we have an FLV muxer for seemingly no good reason ;)

  9. Cogman Says:

    Sounds pretty interesting. I’m especially interested in the “all in one” aspect of the encoder. How are you planning to implement your audio encoding section? Will it be an AAC encoder, or some other special sauce?

    Audio encoding is difficult IMO because, unlike video, it is much harder to nail down what sounds good. (What I mean by that is, with a video, it is easier for anyone to compare two videos frame by frame, zoom in, see the differences, etc. But with audio, it is pretty much purely preference.)

    Either way, sounds exciting. Hopefully MPEG-5 isn’t released before these features are implemented :P .

  10. Dark Shikari Says:

    @Cogman

    As in the article, we’ll support Vorbis and AAC using existing libraries; we’re not going to reimplement our own audio encoder ;)

  11. Steinar H. Gunderson Says:

    @Dark Shikari: Sure, and that’s good, but I’m still waiting for, say, weighted P-frames for interlaced encoding, or complete MBAFF support.

  12. Dark Shikari Says:

    @Steinar

    Weightp for interlaced is waiting on Dylan, the author of weightp, who has that (plus chroma weightp plus K-means analysis) on his list. I don’t know any ETA though.

    Complete MBAFF support… I may put that on the plate for Summer of Code. Do note that there is at least $7500 at stake, potentially 2-3 times that, for anyone who can implement full MBAFF and get it committed.

  13. n Says:

    How about DVB-T output support via VGA port?
    http://bellard.org/dvbt/
    http://en.wikipedia.org/wiki/DVB-T
    new VGA port configuration way (xrandr instead of xorg.conf):
    http://bk.gnarf.org/creativity/vgasig/html/#SECTION00056000000000000000

  14. n Says:

    also I think algorithm considering luv/lab color space may be great.

  15. John Says:

    libavcodec encodes AAC? That’s news; I thought the built-in one was highly experimental (maybe non-compliant, certainly low quality), and that normally it just wrapped around FAAC (mediocre quality, license issues?)

    I wonder if either of these two can beat LAME MP3 at the same bitrate.

    It sucks that there are no good free AAC encoders (well, neroaacenc is “free” but not the free you care about).

  16. Dark Shikari Says:

    @John

    ffmpeg’s AAC is much better now, though still a bit worse than FAAC, which is a tad unfortunate. We’re hoping it will get better, but that’s why we intend to default to libvorbis if possible.

  17. mc Says:

    Hi,

    What about AC3 audio? Are there plans for linking against liba52 or maybe using something else for AC3?

  18. Igor Says:

    Does anybody think that it’s too late to develop an open source AAC encoder (unfortunately)?
    The development of x264 was started at the right time (after approval of the H.264 standard).

    The first AAC standard was in 1997. Until now there has been no competitive OSS AAC encoder that comes anywhere close to the commercial encoders.

    As has been demonstrated, OSS plays an important role in standardization: Xvid (ASP), LAME (MP3) and of course x264 (H.264).
    That can explain why AAC has a hard time competing with the widespread MP3 standard.

    Also, an OSS AAC encoder could raise the quality as LAME did http://www.hydrogenaudio.org/forums/index.php?showtopic=58724
    The new HE-AAC encoder shows that there is still BIG room for improvement.

    But it will require at least 2-3 years from now. And it may actually happen that nobody will care about improving audio quality, as it’s only ~10-15% of the whole video+audio bitrate.

    And MPEG is also already working on a new audio coding standard, USAC.

  19. Igor Says:

    When I mentioned the new HE-AAC encoder, I meant the Apple encoder.

    And MPEG Surround seems to be more attractive as it reduces bitrates a lot.

  20. n Says:

    @mc

    liba52 is a decoder, but an open source AC3 encoder is available here.
    http://aften.sourceforge.net/

  21. emma Says:

    hi, you did a lot of work to improve the subjective quality recently, and now I have a question about how to evaluate the subjective quality of a sequence.
    We always use psnr or ssim to evaluate the objective quality of a sequence, but sometimes it looks worse when its psnr/ssim is higher. How can we evaluate a sequence more efficiently, rather than just looking at it ourselves?

  22. Dark Shikari Says:

    @emma

    If there was a good way to do that, we’d be all over it ;)

  23. TEB Says:

    AAC or Mpeg1L2, DVB-TS output, IP-Multicast out and UDP inbound would make this the world’s first h264 transcoder/RT encoder that I’m sure will crush any Tandberg/Sci-Atl RT encoder.. Looking forward to testing it. ;)

  24. GA Says:

    “--device psp will encode a 364×272 stream that is PSP compatible.”
    Typo or bad idea.

    And what happened to quantizer noise shaping? That would drastically improve quality in baseline, almost giving it trellis-like qualities without any of the added complexity of cabac.

  25. Shevach Riabtsev Says:

    What do you think about developing a utility which converts a CABAC stream into a CAVLC one and vice versa?
    This utility is actually a transcoder and can’t be a part of x264, but it is required by the market.
    Indeed, two main H.264 markets exist:
    1) mobile video and video conferencing, where H.264 streams are usually coded in CAVLC mode.
    2) Video broadcast, where H.264 streams are usually coded in CABAC mode.

    Therefore sometimes it is required to convert a stream from video broadcasting market to mobile one.

  26. beyondtheeyes Says:

    @Shevach

    Interesting idea but wouldn’t it also require H.264/AVC profile transcoding? Mobile video and video conferencing applications mainly use Baseline profile, while Broadcast / IPTV usually uses High profile. Besides, formats (resolution & framerate) are also likely different, meaning a full decode / re-encode would be necessary. More advanced techniques do exist, including input stream data reuse (GOP structures, motion vectors, etc.) but this needs to be counterbalanced with overall architecture design & cost in comparison with a full decode / re-encode.

  27. shon3i Says:

    Finally official Blu-ray support. Many thanks for this, also for the OpenGOP feature

  28. pip Says:

    Shevach, if your mobile video device can’t handle High Profile CABAC streams then stop buying them or advocating them to others.

    the artificial PR separation of the two markets is not needed Nor wanted today, that’s why x264 always defaults to High Profile, it’s the Only Option you Need today with all the cheap ARM A8/A9 and other High Profile mobile PMP and related kit coming to market today.

    it’s simple, if you care about long term visual quality, buy your mobile kit from those that support and play High Profile Today.

  29. beyondtheeyes Says:

    @pip

    I believe H.264/AVC profiles do exist for a bunch of good reasons, including some markets needing a profile that meets their requirements, such as low delay operation modes and low power consumption (or equivalently longer battery life), and not only best effort or offline video playback. However, I do agree that the trend seems to be in favor of High profile. Therefore the pressure in terms of technical specifications put on handheld devices for accessing video over the Internet is increasing.

  30. Emanem Says:

    Hi, just be careful that ffmpeg aac support doesn’t work for PS3/PSP.

    My 2 cents,
    Cheers,

  31. PowerGamer Says:

    “1. Resizing support (plus cropping, padding, etc) using libswscale.”

    Assuming x264 gets as input H.264 raw stream demuxed from Bluray, will it be possible to crop non-even amount of pixels from both top and bottom (or left and right) simultaneously (for ex., crop 1080 video by 21 pixels from top and bottom to make it 1038)? Currently to achieve such a crop in Avisynth I use Spline16Resize(1920,1038,0,21,0,-21) because Crop(0,21,0,-21) fails with “YV12 images can only be cropped by even numbers” error.

  32. Nikolay Says:

    How about vfr-output?

    I mean if we have on input 60fps video. I want ability to skip “same” frames on “slowmotion” scenes.

    thanks.

  33. Leon Says:

    Avoid using ffmpeg’s internal vorbis encoder. The quality that it produces is really terrible.

  34. Dark Shikari Says:

    @Leon

    I meant using libvorbis through the ffmpeg interface (as a generic interface). The *actual* “ffmpeg vorbis encoder” is an abortion.

  35. Leon Says:

    @Dark Shikari: Ahh ok, I can now sleep at night :)

    Is it possible to implement the reference AAC-HE encoder from 3GPP for low bitrates or are there legal implications of using it?

    http://www.3gpp.org/ftp/Specs/html-info/26410.htm

    The quality was quite good compared to Nero’s AAC-HE encoder

  36. Dark Shikari Says:

    @Leon

    Usually reference encoders/decoders don’t have GPL-compatible licenses. See FAAC, which uses code from the reference encoder.

  37. Niktesla Says:

    will you link everything, related to sound support to x264.exe, or it’ll be a pack of dlls ?
    What do you think about supporting calls to CT AAC+ dll (the one coming with winamp) ?

  38. Maximilien Noal Says:

    CT AAC encoder beats every other AAC encoder, when you want very good HE-AAC (at like 48 or 64 kbps) sound. (I use it with MeGUI and aac_encplus.exe, it just needs nscrt.dll and enc_aacplus.dll from Winamp v5.11)

    But that’s not an entirely free (as in speech) solution.. :/

    FAAC is still way behind for such encodes.

    Vorbis is a very good alternative option for low-bitrate encodes, assuming you’ll be using aoTuV Vorbis ? (it’s like LAME for the MP3 format). AoTuV Vorbis can even beat CT AAC sometimes (the other times it’s equal to the ear)! =)

    more on aoTuV : http://en.wikipedia.org/wiki/Vorbis
