Diary Of An x264 Developer

08/24/2009 (7:54 pm)

Announcing ARM support

Filed under: ARM, assembly, GSOC, speed, x264

Thanks to our Google Summer of Code student David Conrad (aka Yuvi), we now have ARM support in x264, along with a significant amount of SIMD acceleration via NEON, available on the Cortex A8 and A9 chips.  Yes, that’s right, x264 can now run on an iPhone.  Total performance increase from the NEON optimizations (so far) is about 280% on default settings.

With low power becoming more important and ARM chips increasing in speed dramatically (multi-core chips are already hitting silicon), being able to do high-quality, high-speed realtime video encoding on ARM chips will matter more and more.  Staying ahead of the game as always, x264 will be the premier encoder on ARM as well.

One situation showing the usefulness of low-power encoding was brought up a month or two ago: a remote-control airplane enthusiast wanted to make his airplane broadcast camera footage over the cell network so that he can remote control it many miles away from his current location.  The cell network is generally low bandwidth, so he needs a high-efficiency video encoder.  But he can’t afford a powerful system; his airplane is already extremely low power and he needs an encoder that is both low-power and low-weight.  The ARM chip is perfect: it uses a fraction of a watt, almost no space, and now, he can run x264 on it.

Special thanks to Mans Rullgard for helping with lots of assembly questions and contributing the NEON deblocking code, originally used in the ffmpeg H.264 decoder.

Want to play with x264 on an ARM?  Get a Beagleboard.

Commits: 1 2 3 4 5 6 7 8 9 10

18 Responses to “Announcing ARM support”

  1. Joshua Haberman Says:

    Another option for playing with ARM at home is SheevaPlug, which has Ethernet instead of USB/audio/video.

  2. Vardas Says:

    ^ I somehow doubt the SheevaPlug supports NEON, as it comes from Marvell (ex-Intel).

  3. Give_me_the_data Says:

    It appears that the Marvell® 88F6281 SoC with Sheeva™, based on Kirkwood, does not have NEON SIMD capabilities.

    There may be versions out there with it included by third parties, of course, now or in time, if you spend some time researching it.

    The SheevaPlug does come with some impressive capabilities of its own that we AV/x264 people might make use of,

    not least the Audio and MPEG Transport Stream Interface:

    http://www.marvell.com/files/products/embedded_processors/kirkwood/88F6281-004_ver1.pdf

    The TI and ARM Cortex optimization speed improvements on page 11 of this seem impressive, and things are better today, of course, as time has passed: clock speeds have increased since then, and so has the number of third parties adding other video/audio IP to this core block as Lego-style plug-in SoC blocks.

    http://www.arm.com/miscPDFs/23881.pdf

    and you can find some of the developer info here
    http://www.plugcomputer.org/index.php/us/component/search/SheevaPlug?ordering=&searchphrase=all

    SheevaPlug Development Kit README-Rev1.2

    x264 might be multithreading on the available cores, but it’s a shame there’s no generic way to spread parts of an encode across multiple processes on these or other GigE devices/plugs. One day, perhaps for fun, someone might try to find a good way.

  4. Christopher Friedt Says:

    Support for iWMMXt would be pretty sweet.

  5. Dark Shikari Says:

    @Chris

    iWMMXt is basically deprecated at this point, so there’s no real point, especially since most ARM chips that support it are way too slow for video encoding anyways…

  6. Chris Templeton Says:

    Is there an ARMv5 version in the works?

  7. Dark Shikari Says:

    @Chris

    Probably not; ARMv5 is way too slow to do serious video encoding.

  8. Chris Templeton Says:

    What would it take to port the current ARMv6 asm code to get it working on ARMv5? My app is low frame rate, so it’s OK if it doesn’t perform great.

  9. yuvi Says:

    armv5 has no simd, so there is no significant advantage to be gained from asm: the only advantage would be fixing compiler stupidities.

    The existing v6/neon asm functions don’t really help in writing v5 asm; you’d have to rewrite most everything since the key simd instructions aren’t there.

  10. Ad Says:

    Hi,

    I was wondering if you could give me pointers on where/how to get started in compiling and using baseline x264 for a very slow ARM7TDMI. I know it is a bad MCU option for video. I am trying to do this for a purely experimental purpose as well as research purposes. I intend to play with CIF image sizes.

    Thanks a lot for your time,
    Ad.

  11. Dark Shikari Says:

    @Ad

    If you’re looking for assistance, drop by IRC (#x264 or #x264dev on Freenode). Talk to Yuvi, he can likely help you with ARM-related issues.

  12. CocoBongo Says:

    Hi!

    I read about the performance boost and that 280% is enticing, but can we get some actual numbers? Like what frame rates can we expect on different frame sizes… I have no way right now to try it for myself, but I would really like to know this!

    Thanks!

  13. Dark Shikari Says:

    @CocoBongo

    You can get roughly 32fps on absolute fastest settings (constant QP, --preset ultrafast) with CIF resolution video on a 500MHz Cortex A8.

    The A9 is a lot faster and the clock speed will rise as well, so VGA encoding at ~15fps isn’t out of the question in the near future.

  14. pip Says:

    http://www.dailywireless.org/2010/02/15/mwc-2010-really-big-show/

    has some news on dual-core 1.2GHz ARM Cortex-A9 MPCore chips coming to a mobile phone soon.

    ST-Ericsson’s U8500 platform
    http://www.businesswire.com/portal/site/home/permalink/?ndmViewId=news_view&newsId=20100215005149&newsLang=en
    Not too sure about the Mali-400™ graphics processor’s capabilities as regards High Profile, though!

    Has anyone got one to try, and to report its HP@L4.0 abilities etc.?

  15. pip Says:

    in other news people might find a good use for this too while on the move.

    http://www.theregister.co.uk/2010/02/15/wi_fi_sim/

    a wifi SIM installed in your A9 cluster of mobile phones and PMPs, encoding an x264 job or two; just a clustered x264 patch or two away in the future, perhaps ;)

  16. analyzer Says:

    The 280% improvement from NEON is impressive, but it is a relative performance number. Do you have any absolute performance numbers for given A8/A9-based hardware platforms?

  17. RIM Says:

    Hmm, I wonder if RIM will provide some free PlayBooks to x264 ARM devs before their retail release next quarter? :)

    http://www.engadget.com/2010/09/27/rim-introduces-playbook-the-blackberry-tablet/

    “a Cortex A9-based, dual-core 1GHz CPU (the company calls it the “fastest tablet ever”)

    7-inch LCD, 1024 x 600, WSVGA, capacitive touch screen with full multi-touch and gesture support

    BlackBerry Tablet OS with support for symmetric multiprocessing

    1 GHz dual-core processor

    1 GB RAM

    Dual HD cameras (3 MP front facing, 5 MP rear facing), supports 1080p HD video recording

    Video playback: 1080p HD Video, H.264, MPEG, DivX, WMV

    Audio playback: MP3, AAC, WMA

    HDMI video output

    Wi-Fi – 802.11 a/b/g/n

    Bluetooth 2.1 + EDR

    Connectors: microHDMI, microUSB, charging contacts

    Open, flexible application platform with support for WebKit/HTML-5, Adobe Flash Player 10.1, Adobe Mobile AIR, Adobe Reader, POSIX, OpenGL, Java

    Ultra thin and portable:

    Measures 5.1″x7.6″x0.4″ (130mm x 193mm x 10mm)

    Weighs less than a pound (approximately 0.9 lb or 400g)

    RIM intends to also offer 3G and 4G models in the future.”

    Also, is there any benefit for the ARM x264 devs to check for these types of memcpy speed improvements, or doesn’t it affect the x264 ARM codebase? On the face of it, given the charts, he seems to get a lot of extra cycles back in testing.

    http://projects.powerdeveloper.org/project/imx515/795

    See the “While on the subject of memcpy..” entry posted by martin krastev on 23rd September 2010.

  18. Ali Mirtar Says:

    Hi,

    I am working on the power consumption of x264 on the BeagleBoard. I need information about Cortex-A8 utilization and Cortex-A8 NEON utilization to study power consumption.

    Do you have any data about these two utilization figures when running x264?

    Thanks

    I would highly appreciate it if you could email me your answer.
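
The throughput figures quoted in response 13 (roughly 32fps at CIF on a 500MHz Cortex A8, and a hoped-for ~15fps at VGA) can be put on a common footing by converting both to pixels per second. The following is a minimal sketch, not a benchmark: it assumes the standard 352x288 CIF and 640x480 VGA frame sizes, and it assumes encoding cost scales roughly linearly with pixel count, which is only an approximation. The helper name `pixels_per_second` is made up for illustration.

```python
# Rough pixel-throughput comparison for the frame rates quoted in the comments.
# Assumption: standard CIF (352x288) and VGA (640x480) frame sizes.
CIF = (352, 288)
VGA = (640, 480)


def pixels_per_second(size, fps):
    """Return luma pixels processed per second for a frame size and rate."""
    width, height = size
    return width * height * fps


# ~32fps at CIF, as quoted for a 500MHz Cortex A8 on fastest settings.
cif_rate = pixels_per_second(CIF, 32)

# ~15fps at VGA, the target mentioned for faster chips.
vga_rate = pixels_per_second(VGA, 15)

# How much more throughput the VGA target needs, under the linear assumption.
ratio = vga_rate / cif_rate

print(f"CIF @ 32fps: {cif_rate} pixels/s")
print(f"VGA @ 15fps: {vga_rate} pixels/s")
print(f"VGA target needs about {ratio:.2f}x the CIF throughput")
```

Under those assumptions, VGA at 15fps needs roughly 1.4 times the pixel throughput of CIF at 32fps, so the gap to the stated target is modest.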
