A curious SIMD assembly challenge: the zigzag
Most SIMD assembly functions are implemented in a rather straightforward fashion. An experienced assembly programmer can spend two minutes looking at C code and either give a pretty good guess at how one would write SIMD for it or, just as readily, rule out SIMD as an optimization technique for that code. There might be a nonintuitive approach that’s somewhat better, but one can usually get very good results merely by following the most obvious method.
But in some rare cases there is no “most obvious method”, even for functions that would seem extraordinarily simple. These kinds of functions present an unusual situation for the assembly programmer: they find themselves looking at some embarrassingly simple algorithm, one which simply cries out for SIMD, and yet they can’t see an obvious way to do it! So let’s jump into the fray here and look at one of these cases.