For the past few years, various improvements on H.264 have been periodically proposed, ranging from larger transforms to better intra prediction. These finally came together in the JCT-VC meeting this past April, where over two dozen proposals were made for a next-generation video coding standard. Of course, all of these were in very rough-draft form; it will likely take years to filter it down into a usable standard. In the process, they’ll pick the most useful features (hopefully) from each proposal and combine them into something a bit more sane. But, of course, it all has to start somewhere.
A number of features were common: larger block sizes, larger transform sizes, fancier interpolation filters, improved intra prediction schemes, improved motion vector prediction, increased internal bit depth, new entropy coding schemes, and so forth. A lot of these are potentially quite promising and resolve a lot of complaints I’ve had about H.264, so I decided to try out the proposal that appeared the most interesting: the Samsung+BBC proposal (A124), which claims compression improvements of around 40%.
The proposal combines a bouillabaisse of new features, ranging from a 12-tap interpolation filter to 12thpel motion compensation and transforms as large as 64×64. Overall, I would say it’s a good proposal and I don’t doubt their results given the sheer volume of useful features they’ve dumped into it. I was a bit worried about complexity, however, as 12-tap interpolation filters don’t exactly scream “fast”.
I prepared myself for the slowness of an unoptimized encoder implementation, compiled their tool, and started a test encode with their recommended settings.
I waited. The first frame, an I-frame, completed.
I took a nap.
I waited. The second frame, a P-frame, was done.
I played a game of Settlers.
I waited. The third frame, a B-frame, was done.
I worked on a term paper.
I waited. The fourth frame, a B-frame, was done.
After a full 6 hours, 8 frames had encoded. Yes, at this rate, it would take a full two weeks to encode 10 seconds of HD video. On a Core i7. This is not merely slow; this is over 1000 times slower than x264 on “placebo” mode. This is so slow that it is not merely impractical; it is impossible to even test. This encoder is apparently designed for some sort of hypothetical future computer from space. And word from other developers is that the Intel proposal is even slower.
This has led me to suspect that there is a great deal of cheating going on in the H.265 proposals. The goal of the proposals, of course, is to pick the best feature set for the next generation video compression standard. But there is an extra motivation: organizations whose features get accepted get patents on the resulting standard, and thus income. With such large sums of money in the picture, dishonesty becomes all the more profitable.
There is a set of rules, of course, to limit how the proposals can optimize their encoders. If different encoders use different optimization techniques, the results will no longer be comparable — remember, they are trying to compare compression features, not methods of optimizing encoder-side decisions. Thus all encoders are required to use a constant quantizer, specified frame types, and so forth. But there are no limits on how slow an encoder can be or what algorithms it can use.
It would be one thing if the proposed encoder was a mere 10 times slower than the current reference; that would be reasonable, given the low level of optimization and higher complexity of the new standard. But this is beyond ridiculous. With the prize given to whoever can eke out the most PSNR at a given quantizer at the lowest bitrate (with no limits on speed), we’re just going to get an arms race of slow encoders, with every company trying to use the most ridiculous optimizations possible, even if they involve encoding the frame 100,000 times over to choose the optimal parameters. And the end result will be as I encountered here: encoders so slow that they are simply impossible to even test.
Such an arms race certainly does little good in optimizing for reality where we don’t have 30 years to encode an HD movie: a feature that gives great compression improvements is useless if it’s impossible to optimize for in a reasonable amount of time. Certainly once the standard is finalized practical encoders will be written — but it makes no sense to optimize the standard for a use-case that doesn’t exist. And even attempting to “optimize” anything is difficult when encoding a few seconds of video takes weeks.
Update: The people involved have contacted me and insist that there was in fact no cheating going on. This is probably correct; the problem appears to be that the rules that were set out were simply not strict enough, making many changes that I would intuitively consider “cheating” to be perfectly allowed, and thus everyone can do it.
I would like to apologize if I implied that the results weren’t valid; they are — the Samsung-BBC proposal is definitely one of the best, which is why I picked it to test with. It’s just that I think any situation in which it’s impossible to test your own software is unreasonable, and thus the entire situation is an inherently broken one, given the lax rules, slow baseline encoder, and no restrictions on compute time.