A tree of thought
“There is nothing like looking, if you want to find something… You certainly usually find something, if you look, but it is not always quite the something you were after.”
– J.R.R. Tolkien
About a year and a half ago, I had an idea: what if we made a graph of how each block of the video referenced other blocks temporally and used this graph to increase quality on blocks which are referenced a lot and lower it on those which are referenced less? Clearly this would greatly improve average quality… but when I thought through it, the problem became messier and messier. I decided to put it off until later. I ended up making it a Google Summer of Code project for 2008, but that student disappeared after a few weeks of relative non-work and a hardly-working initial patch. I mostly forgot about it; it was in the same category as explicit weighted prediction and MBAFF: messy things that might help, but that I didn’t want to do. This idea in particular got filed away under the name “MB-tree.”
About a year ago I noticed that Mainconcept’s encoder, though much worse than x264 overall, seemed to make significantly better I-frame quantizer decisions than x264 did in a 1000-frame segment of Elephants Dream. My guess was that it had a simple 2-pass heuristic for I-frame quantizers: low-motion scenes would get higher quality I-frames and high-motion scenes would get lower quality I-frames. It seemed pretty straightforward. The way I decided to do it was via a concept I called “propagation.”
I added a new value to the stats file for ratecontrol: the ratio between the inter and intra cost of a frame. My logic was that this value roughly represented the amount of information propagated across the frame: if the frame costs 70% less as an inter frame than as an intra frame, then about 30% of the frame is new information and 70% is old information. Using this, I set up a simple loop to propagate the effect of the I-frame into future frames and try to guess the optimal I-frame quantizer.
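As a rough illustration of that reasoning (a sketch only, not the actual x264 code), the share of an I-frame that survives into each later frame can be estimated by repeatedly multiplying by the "old information" fraction, 1 minus the inter/intra cost ratio. The function names here are hypothetical, and the step that turns this estimate into a quantizer is deliberately left out, since the post does not give that formula.

```python
def propagated_shares(inter_intra_ratios):
    """Estimate how much of the I-frame is still present in each later frame.

    inter_intra_ratios[k] is the (inter cost / intra cost) ratio of the k-th
    frame after the I-frame. A ratio of 0.3 is read as "30% new information,
    70% carried over from the previous frame".
    """
    surviving = 1.0
    shares = []
    for ratio in inter_intra_ratios:
        ratio = min(max(ratio, 0.0), 1.0)  # clamp: inter cost can exceed intra cost
        surviving *= 1.0 - ratio           # only the "old" part is carried forward
        shares.append(surviving)
    return shares


def propagation_total(inter_intra_ratios):
    """Single number summarising how far the I-frame's data propagates."""
    return sum(propagated_shares(inter_intra_ratios))


low_motion = [0.1] * 10   # each frame is 90% old information
high_motion = [0.8] * 10  # each frame is only 20% old information
print(propagation_total(low_motion) > propagation_total(high_motion))
```

A larger total suggests the I-frame's bits are reused by more of the following frames, which is the case where spending extra bits on it should pay off.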
It worked pretty well. Across a large set of test clips, it was neutral or gave improvements most of the time. But it seemed hacky: I was able to derive part of the formula with a bit of calculus, but the expression also contained a constant factor that was purely experimental and that I had no theoretical justification for. There were also a few test clips where it made things worse. Then the thought hit me…
… what if we use this for P-frames too? Surely it makes sense that all frames’ quality should depend on how much the information contained within propagates to future frames. But before we go further, let’s jump back in time almost a decade.
Quantizer curve compression, or “qcomp”, is a very old and simple idea: lower the quality in areas of the video with high complexity and raise the quality in areas with low complexity. The original implementation dates back at least to the original libavcodec ratecontrol. There are three justifications for this:
1. High-complexity scenes generally have high motion, and one is less likely to notice quality loss in a high motion scene, since fine detail is impossible to see in motion anyways.
2. High-complexity scenes are extraordinarily costly bit-wise, so even if it makes quality somewhat worse, it might be worth it to save those bits to use elsewhere.
3. Each frame in a high-complexity scene is not referenced very far into the future since there is a great deal of change between each frame, so even from a PSNR perspective, one should allocate fewer bits to those frames as opposed to frames which are nearly static.
x264 uses qcomp pretty much unchanged, except that it performs a Gaussian blur over the frame complexities to avoid too much local fluctuation between quantizers.
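A minimal sketch of the qcomp idea, assuming the commonly described form in which the quantizer scale is proportional to the blurred complexity raised to the power (1 - qcomp). The blur radius, sigma, and function names are illustrative assumptions, not x264's actual values or API, and the final scaling that hits a target bitrate is omitted.

```python
import math


def gaussian_blur(values, sigma=2.0, radius=4):
    """Gaussian-weighted moving average, renormalised at the sequence edges."""
    blurred = []
    for i in range(len(values)):
        total = 0.0
        weight_sum = 0.0
        for j in range(max(0, i - radius), min(len(values), i + radius + 1)):
            w = math.exp(-((j - i) ** 2) / (2.0 * sigma ** 2))
            total += w * values[j]
            weight_sum += w
        blurred.append(total / weight_sum)
    return blurred


def qcomp_qscales(complexities, qcomp=0.6):
    """Relative quantizer scale per frame (higher means lower quality).

    complexities are assumed to be positive per-frame cost estimates.
    qcomp = 1.0 gives every frame the same scale (constant quantizer);
    qcomp = 0.0 makes the scale track complexity fully, so bits per frame
    are roughly constant.
    """
    return [c ** (1.0 - qcomp) for c in gaussian_blur(complexities)]


complexities = [1.0] * 10 + [100.0] * 10  # calm scene followed by a busy one
qscales = qcomp_qscales(complexities)
print(qscales[0] < qscales[-1])  # the busy scene gets the coarser quantizer
```

The blur means a single spiky frame inside a calm scene does not get a wildly different quantizer from its neighbours.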
Clearly, the propagation method is just another way of implementing the basic concept of qcomp: frames whose data doesn’t propagate far are basically high complexity, and frames whose data does propagate far are basically low complexity. As a result, I disabled qcomp when testing this idea. And the tests bore me out: there was a significant improvement across most test clips! But on a few clips, especially anime, there was a very significant loss of quality. Why?
As one might expect, in anime, the vast majority of complexity is usually confined to a small portion of the frame–for example, a character walking across an otherwise-static frame. Furthermore, the sharp lines making up the character are much more “complex” than the static background. Thus, it can appear to the propagation algorithm that a series of frames is complex, when in reality only the character’s motion is complex, while the rest of the frame is static. The algorithm then lowers the quality of all the frames, including the background, greatly decreasing quality, despite the fact that it should only have lowered quality on the moving character instead. If only we could apply this propagation algorithm to individual blocks instead of the whole frame…
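To make the anime case concrete, here is a small sketch with invented numbers (they are not measurements from any clip). It compares the inter/intra ratio computed over the whole frame with the same ratio computed per block; the function names and the cost values are assumptions for illustration only.

```python
def frame_ratio(blocks):
    """Inter/intra cost ratio for the frame as a whole.

    blocks is a list of (inter_cost, intra_cost) pairs, one per block.
    """
    total_inter = sum(inter for inter, _ in blocks)
    total_intra = sum(intra for _, intra in blocks)
    return total_inter / total_intra


def block_ratios(blocks):
    """Inter/intra cost ratio computed separately for each block."""
    return [inter / intra for inter, intra in blocks]


# Invented example: 90 flat, static background blocks (cheap to code, almost
# all old information) and 10 sharp-edged, moving character blocks (expensive,
# almost all new information).
background = [(0.2, 10.0)] * 90
character = [(180.0, 200.0)] * 10
blocks = background + character

print(round(frame_ratio(blocks), 3))
print(sum(1 for r in block_ratios(blocks) if r < 0.05))
```

With these numbers the whole-frame ratio comes out above 0.6, so the frame looks mostly "new", even though 90 of its 100 blocks carry over almost entirely from the previous frame. A per-block measure keeps those two populations separate.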
… hmm, wait, where have we seen that idea before?
But this time, I decided that I would actually do it. I would write the simplest possible MB-tree, the one that I could hack in as quickly as possible. To avoid having to implement a statsfile, I wrote it in lookahead and used CRF only. I made it only support P-frames–not even I-frames–which I dealt with by making the first frame of my test clip black and setting the keyframe interval to a very large number. This initial hacky attempt took me just 80 minutes to write–and yet I had spent a year and a half not writing it! So then I tested it on LosslessTouhou2.mkv… and got over 60% improved SSIM.
The rest, as they say, is history.
“It is the job that is never started that takes longest to finish.”
– J.R.R. Tolkien
August 7th, 2009 at 3:45 am
Hi,
After being so active in the field, you start to have new ideas regularly (some are good, others not so much). The problem is having time to try them all, and that can only be achieved by creating clones of yourself, because those ideas are usually born in a foggy environment (not well defined) and are difficult to explain to someone else.
The three justifications that you present are quite valid and I believe that there is a lot of potential to exploit there. Best of all, those three points that you highlight are not codec dependent and can be used in both future and old codecs.
A couple of months ago I had an idea that I believe can be combined with yours. Since I don’t have a clone of myself, I archived it for future work. Now it’s time to release it.
In a panning situation (moving the camera to the sides, e.g. stefan), most of the macroblocks that are close to the frame edge that is leaving the frame are not used in the following frames as a reference. This is because those pixels are not present in the following frames and therefore are not used for reference any more. The idea is to apply heavier compression to those boundary blocks that represent image areas about to leave the frame. In order to know which blocks should be subject to this heavier compression, I would suggest using their motion trend, assuming that the motion direction and speed are constant. So, if the motion vector is bigger than the block size, that means that those pixels should be out of the frame in the following frame.
Note: In addition to the heavier compression in those blocks, INTRA coding techniques should be used only if strictly necessary; otherwise it would be a waste of bits that will not bring any benefit in future frames.
Can you give me any comments on this idea?
Thanks
August 7th, 2009 at 9:05 am
@Sandro
MB-tree naturally does this.
August 8th, 2009 at 4:31 pm
“I would write the simplest possible MB-tree, the one that I could hack in as quickly as possible.”
Does that mean that this MB-tree can be enhanced to be even better than it is now, or have you finished your work on it?
August 8th, 2009 at 6:35 pm
@Wyti
No, I was referring to the original one I wrote that took just over an hour–an incredibly hacked together piece of crap designed for a single purpose: to see if I could get it to work.
After that, I spent a week making it good. Of course, nothing says it can’t get better.
August 11th, 2009 at 8:34 am
The mbtree files this generates are huge – they can rival low encode bitrates by themselves. They can be easily compressed about 1/3, not sure if that would be worth it though.
Also, I miss a lossless preset/tune, since it’s not obvious which settings are good for that case (no bframes, all partitions, etc…)
August 26th, 2009 at 8:57 pm
[...] isn’t a part of this release was the recent enhancements to x264 – the MB-tree enhancement. The Handbrake devs rolled back to a previous version of x264 until more testing can be done and [...]