“There is nothing like looking, if you want to find something… You certainly usually find something, if you look, but it is not always quite the something you were after.”
– J.R.R. Tolkien
About a year and a half ago, I had an idea: what if we made a graph of how each block of the video referenced other blocks temporally and used this graph to increase quality on blocks which are referenced a lot and lower it on those which are referenced less? Clearly this would greatly improve average quality… but when I thought through it, the problem became messier and messier. I decided to put it off until later. I ended up making it a Google Summer of Code project for 2008, but that student disappeared after a few weeks of relative non-work and a hardly-working initial patch. I mostly forgot about it; it was in the same category as explicit weighted prediction and MBAFF: messy things that might help, but that I didn’t want to do. This idea in particular got filed away under the name “MB-tree.”
About a year ago I noticed that Mainconcept’s encoder, though much worse than x264 overall, seemed to have significantly better I-frame quantizer decisions than x264 did in a 1000-frame segment of Elephant’s Dream. My guess was that it had a simple 2-pass heuristic for I-frame quantizers: low-motion scenes got higher-quality I-frames and high-motion scenes got lower-quality I-frames. It seemed straightforward enough to do the same in x264, and the way I decided to do it was via a concept I called “propagation.”
I added a new value to the stats file for ratecontrol: the ratio between the inter and intra cost of a frame. My logic was that this value roughly represented the amount of information propagated across the frame: if the frame costs 70% less as an inter frame than as an intra frame, then about 30% of the frame is new information and 70% is old information. Using this, I set up a simple loop to propagate the effect of the I-frame into future frames and try to guess the optimal I-frame quantizer.
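To make the idea concrete, here is a minimal sketch of frame-level propagation. The structure, the names, and the log2-based QP offset are all illustrative rather than x264’s actual code; as the next paragraph notes, the real formula also contained an experimentally determined constant.

```c
/* Hypothetical sketch of frame-level "propagation" for I-frame quantizer
 * decision. Names and the log2-based offset are illustrative, not x264's
 * actual code. */
#include <math.h>
#include <stdio.h>

typedef struct {
    double intra_cost;  /* estimated cost of coding the frame as intra */
    double inter_cost;  /* estimated cost of coding the frame as inter */
} frame_stats;

/* Fraction of a frame that is "old" information carried from its reference:
 * if inter costs 30% of intra, roughly 70% of the frame is propagated data. */
static double propagate_fraction(const frame_stats *f)
{
    double r = 1.0 - f->inter_cost / f->intra_cost;
    return r < 0 ? 0 : r;
}

/* Walk forward from an I-frame, accumulating how much of its information
 * survives into later frames; the amount carried shrinks each step by that
 * frame's propagate fraction. Returns a QP offset for the I-frame
 * (negative = higher quality). */
static double iframe_qp_offset(const frame_stats *frames, int n_frames,
                               double strength /* tuning constant */)
{
    double carried = frames[0].intra_cost; /* information in the I-frame */
    double total   = carried;              /* total cost it "pays for"   */
    for (int i = 1; i < n_frames; i++) {
        carried *= propagate_fraction(&frames[i]);
        total   += carried;
    }
    /* The more future cost the I-frame accounts for, the lower its QP. */
    return -strength * log2(total / frames[0].intra_cost);
}

int main(void)
{
    frame_stats gop[5] = {
        {1000, 1000}, {1000, 300}, {1000, 300}, {1000, 250}, {1000, 400}
    };
    printf("I-frame QP offset: %.2f\n", iframe_qp_offset(gop, 5, 2.0));
    return 0;
}
```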
It worked pretty well. Across a large set of test clips, it was neutral or gave improvements most of the time. But it seemed hacky; while I was able to derive part of the formula with a bit of calculus, the expression contained a constant factor that had no theoretical justification and was purely experimental. There were also a few test clips where it made things worse. Then the thought hit me…
… what if we use this for P-frames too? Surely it makes sense that all frames’ quality should depend on how much the information contained within propagates to future frames. But before we go further, let’s jump back in time almost a decade.
Quantizer curve compression, or “qcomp”, is a very old and simple idea: lower the quality in areas of the video with high complexity and raise the quality in areas with low complexity. The original implementation dates back at least to the original libavcodec ratecontrol. There are three justifications for this:
1. High-complexity scenes generally have high motion, and one is less likely to notice quality loss in a high motion scene, since fine detail is impossible to see in motion anyways.
2. High-complexity scenes are extraordinarily costly bit-wise, so even if it makes quality somewhat worse, it might be worth it to save those bits to use elsewhere.
3. Each frame in a high-complexity scene is not referenced very far into the future since there is a great deal of change between each frame, so even from a PSNR perspective, one should allocate fewer bits to those frames as opposed to frames which are nearly static.
x264 uses qcomp pretty much unchanged, except that it performs a Gaussian blur over the frame complexities to avoid too much local fluctuation between quantizers.
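For reference, here is a rough sketch of what qcomp-style curve compression looks like, assuming the common formulation qscale = complexity^(1 − qcomp), where qcomp = 1 gives a constant quantizer and qcomp = 0 spreads bits evenly across frames. The blur radius and the names are arbitrary choices for illustration.

```c
/* Rough sketch of qcomp-style quantizer curve compression, assuming the
 * common formulation qscale = complexity^(1 - qcomp). Kernel width and
 * function names are arbitrary. */
#include <math.h>
#include <stdio.h>

/* Smooth per-frame complexities with a small Gaussian kernel so quantizers
 * don't fluctuate wildly between neighboring frames. */
static void blur_complexities(const double *in, double *out, int n, double sigma)
{
    int radius = (int)(sigma * 3);
    for (int i = 0; i < n; i++) {
        double sum = 0, wsum = 0;
        for (int j = -radius; j <= radius; j++) {
            int k = i + j;
            if (k < 0 || k >= n)
                continue;
            double w = exp(-(double)(j * j) / (2 * sigma * sigma));
            sum  += w * in[k];
            wsum += w;
        }
        out[i] = sum / wsum;
    }
}

/* Map a blurred complexity to an (unnormalized) qscale; a second pass would
 * rescale these so the whole encode hits the requested size. */
static double qcomp_qscale(double blurred_complexity, double qcomp)
{
    return pow(blurred_complexity, 1.0 - qcomp);
}

int main(void)
{
    double complexity[8] = {200, 220, 1800, 2100, 1900, 240, 210, 205};
    double blurred[8];
    blur_complexities(complexity, blurred, 8, 1.0);
    for (int i = 0; i < 8; i++)
        printf("frame %d: qscale %.1f\n", i, qcomp_qscale(blurred[i], 0.6));
    return 0;
}
```

The high-complexity frames in the middle end up with a noticeably higher qscale (lower quality) than the nearly-static ones, which is exactly the behavior the three justifications above argue for.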
Clearly, the propagation method is just another way of implementing the basic concept of qcomp: frames whose data doesn’t propagate far are basically high complexity, and frames whose data does propagate far are basically low complexity. As a result, I disabled qcomp when testing this idea. And the tests bore me out: there was a significant improvement across most test clips! But on a few clips, especially anime, there was a very significant loss of quality. Why?
As one might expect, in anime the vast majority of complexity is usually confined to a small portion of the frame–for example, a character walking across an otherwise-static frame. Furthermore, the sharp lines making up the character are much more “complex” than the static background. Thus, it can appear to the propagation algorithm that a series of frames is complex, when in reality only the character’s motion is complex while the rest of the frame is static. The algorithm then lowers the quality of all the frames, including the background, greatly decreasing quality, when it should have lowered quality only on the moving character. If only we could apply this propagation algorithm to individual blocks instead of the whole frame…
… hmm, wait, where have we seen that idea before?
But this time, I decided that I would actually do it. I would write the simplest possible MB-tree, the one that I could hack in as quickly as possible. To avoid having to implement a statsfile, I wrote it in the lookahead and used CRF only. I made it support only P-frames–not even I-frames–which I dealt with by making the first frame of my test clip black and setting the keyframe interval to a very large number. This initial hacky attempt took me just 80 minutes to write–and yet I had spent a year and a half not writing it! So then I tested it on LosslessTouhou2.mkv… and got over 60% improved SSIM.
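For the curious, here is roughly what a block-level propagation pass can look like. This is an illustrative reconstruction under simplifying assumptions (whole-block motion vectors, zero-initialized propagate costs, P-frames only, toy frame dimensions), not x264’s actual implementation; among other things, a real lookahead has to split propagated cost across the blocks a motion vector partially overlaps.

```c
/* Illustrative sketch of block-level propagation ("MB-tree") over a
 * lookahead, simplified to whole-block motion vectors and P-frames only.
 * Names are hypothetical; this is not x264's actual code. */
#include <math.h>
#include <stdio.h>

#define MB_W 4                  /* blocks per row (toy dimensions) */
#define MB_H 3                  /* block rows                      */
#define N_MB (MB_W * MB_H)

typedef struct {
    int   intra_cost[N_MB];     /* lookahead intra cost per block            */
    int   inter_cost[N_MB];     /* lookahead inter (predicted) cost          */
    int   mv_ref[N_MB];         /* block index referenced in previous frame  */
    float propagate_cost[N_MB]; /* cost of future blocks that depend on this */
} lookahead_frame;

/* Walk the lookahead backwards: each block hands the cost it "pays for"
 * (its own intra cost plus whatever was propagated onto it) down to the
 * block it references, scaled by how much of it is inherited rather than new. */
static void mbtree_propagate(lookahead_frame *frames, int n_frames)
{
    for (int f = n_frames - 1; f > 0; f--) {
        lookahead_frame *cur = &frames[f];
        lookahead_frame *ref = &frames[f - 1];
        for (int mb = 0; mb < N_MB; mb++) {
            float fraction = 1.0f - (float)cur->inter_cost[mb] / cur->intra_cost[mb];
            if (fraction < 0)
                fraction = 0;
            float amount = (cur->intra_cost[mb] + cur->propagate_cost[mb]) * fraction;
            ref->propagate_cost[cur->mv_ref[mb]] += amount;
        }
    }
}

/* Heavily referenced blocks get a negative QP offset (higher quality);
 * blocks nothing refers to stay at the baseline CRF quantizer. */
static float mbtree_qp_offset(const lookahead_frame *f, int mb, float strength)
{
    return -strength * log2f((f->intra_cost[mb] + f->propagate_cost[mb])
                             / (float)f->intra_cost[mb]);
}

int main(void)
{
    lookahead_frame frames[3] = {0};
    for (int f = 0; f < 3; f++)
        for (int mb = 0; mb < N_MB; mb++) {
            frames[f].intra_cost[mb] = 1000;
            frames[f].inter_cost[mb] = (mb == 5) ? 900 : 200; /* block 5 = "moving character" */
            frames[f].mv_ref[mb]     = mb;                    /* zero motion for simplicity   */
        }
    mbtree_propagate(frames, 3);
    for (int mb = 0; mb < N_MB; mb++)
        printf("block %2d: qp offset %+.2f\n", mb, mbtree_qp_offset(&frames[0], mb, 2.0f));
    return 0;
}
```

Note how the static background blocks, which later frames keep reusing, get a much larger negative offset than the “moving character” block–which is exactly the behavior the anime case above needed and the frame-level version couldn’t provide.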
“It is the job that is never started that takes longest to finish.”
– J.R.R. Tolkien