woodruffw 9 months ago

Capstone supports an impressive breadth of architectures. However, if all you need is x86/AMD64 decoding and disassembly, there are libraries out there that are much higher quality in terms of decoding accuracy.

I wrote a differential fuzzer for x86 decoders a few years ago, and XED and Zydis generally performed far better (in terms of accuracy) than Capstone[1]. And on the Rust side, yaxpeax and iced-x86 perform very admirably.

[1]: https://blog.trailofbits.com/2019/10/31/destroying-x86_64-in...
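
For readers unfamiliar with the idea, differential fuzzing here roughly means feeding the same random bytes to several decoders and flagging any input on which they disagree. Below is a minimal sketch of that loop, not the fuzzer described above. The two "decoders" and their two-opcode encoding are made up for illustration (this is not real x86), and `decoder_b` carries a deliberate bug so the harness has something to find.

```python
import random

# Toy, made-up encoding (NOT real x86), assumed purely for illustration:
# opcode 0x01 takes a 1-byte operand, opcode 0x02 takes a 2-byte operand,
# and every other leading byte is invalid.

def decoder_a(buf: bytes):
    """Toy reference decoder: returns (mnemonic, length) or None if invalid."""
    if not buf:
        return None
    if buf[0] == 0x01 and len(buf) >= 2:
        return ("op1", 2)
    if buf[0] == 0x02 and len(buf) >= 3:
        return ("op2", 3)
    return None

def decoder_b(buf: bytes):
    """Second toy decoder with a deliberate bug: it accepts a truncated op2."""
    if not buf:
        return None
    if buf[0] == 0x01 and len(buf) >= 2:
        return ("op1", 2)
    if buf[0] == 0x02 and len(buf) >= 2:  # bug: should require 3 bytes
        return ("op2", 3)
    return None

def differential_fuzz(decoders, iterations=10_000, seed=0):
    """Feed identical random inputs to every decoder and collect disagreements."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    mismatches = []
    for _ in range(iterations):
        # Small alphabet (bytes 0..3) and short inputs (1..4 bytes) so the
        # interesting opcodes are hit often in this toy setting.
        buf = bytes(rng.randrange(4) for _ in range(rng.randrange(1, 5)))
        results = [decode(buf) for decode in decoders]
        if any(r != results[0] for r in results[1:]):
            mismatches.append((buf, results))
    return mismatches

if __name__ == "__main__":
    # Print a handful of inputs on which the two toy decoders disagree.
    for buf, results in differential_fuzz([decoder_a, decoder_b])[:5]:
        print(buf.hex(), results)
```

In a real setup the toy functions would be replaced by thin wrappers around actual decoders, each normalized to a common result shape (validity, length, and so on) so that the comparison is meaningful.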

  • monads 9 months ago

    At my previous job, I worked on a project that required disassembling large amounts of x86/AMD64 instructions (several billion instructions per run was very common). I also found that Zydis is much faster than Capstone.

  • meisel 9 months ago

    How is there any discrepancy in accuracy? Isn’t it just a matter of following the spec?

    • woodruffw 9 months ago

      The spec is very large, not particularly well written, and is not “total” (in the sense that AMD64 and IA32e and other x86-64 flavors are all subtly different). There are a lot of ways to get it wrong; even XED (the reference decoder from Intel) has bugs.

      If I remember correctly, the Intel SDM alone is over 3000 pages long.

    • saagarjha 9 months ago

      lol, no. For one, Capstone has a lot of bugs (it uses some old version of LLVM as its base). But the whole question of how to decode things is also complicated, because there are a lot of pitfalls and inconsistencies that different disassemblers handle differently. And what the hardware does is a different question entirely: it may not match the spec, or even other processors with the same ISA.

      • xvilka 9 months ago

        It was just updated to nearly the latest LLVM, so that argument is void: https://github.com/capstone-engine/capstone/blob/next/docs/c...

        • saagarjha 9 months ago

          I'll believe it when I see it. If I can go a few years without wasting time during a CTF because of an incorrect decode, I'll change my tune.

          • woodruffw 9 months ago

            This has been my experience as well. I’ve had to rip Capstone out of more research projects than I care to admit.

  • canucker2016 9 months ago

    Did you mean x86/x64 decoding?

    Looking at the libs, none of them seem to mention ARM64 inst. decoding.

    • woodruffw 9 months ago

      Yep, I meant AMD64, fixed.

jstrieb 9 months ago

Capstone is very useful!

Someone (not me) has also cross-compiled Capstone to WebAssembly so it can be used in client-side browser applications.

https://alexaltea.github.io/capstone.js/

I've used this in a couple of projects to support disassembly in static web apps with no back end.

__alexander 9 months ago

If you find Capstone interesting, check out the Unicorn Engine.

https://github.com/unicorn-engine/unicorn

Also, if anyone is interested in an example of using capstone for basic disassembly and analysis, here is a link to my capstool project.

https://github.com/alexander-hanel/capstool

  • emmanueloga_ 9 months ago

    Right, three related multi-platform and multi-architecture frameworks from the same people:

    * Capstone: disassembly.

    * Keystone: assembler.

    * Unicorn: CPU emulator.

  • the_biot 9 months ago

    Unicorn is fantastic. I used it to emulate an SoC's boot environment to get around a very weird HAL, and it worked perfectly. Awesome tool!

nicolodev 9 months ago

Another good replacement for capstone/keystone based on LLVM is nyxstone https://github.com/emproof-com/nyxstone

  • xvilka 9 months ago

    It's just a wrapper around LLVM. So any project would also be forced to ship the corresponding LLVM version if it's not present on the system, e.g. on Windows or in embedded applications. That's a bit much for a simple disassembler, so it's not a direct replacement for Capstone.

  • ashvardanian 9 months ago

    It looks pretty promising! How would you compare the strengths/weaknesses?

    • stuxnot 9 months ago

      Full disclosure, I'm one of the nyxstone developers - so I might be biased.

      In comparison to Capstone, nyxstone lacks instruction decomposition and does not report which registers are read or written. In addition, nyxstone directly interfaces with LLVM and thus is expected to be a lot slower than Capstone, which uses instruction tables generated by a modified LLVM.

      I want to note here that Nyxstone is intended more as a replacement for Keystone than Capstone. We added the disassembler mainly because we could. Compared to Keystone, nyxstone allows precise definition of target triple and ISA extensions, allows definition of external labels, supports structured output with instruction details (address, bytes, assembly), rejects partial and invalid inputs and rejects instructions not supported by the specific core (for example UMAAL is supported by Cortex-M4, but not by Cortex-M3), and is more up to date. Nyxstone does not require patches in the LLVM source tree, and thus is (I'd argue) more maintainable and easier to keep up to date.

  • saagarjha 9 months ago

    That's basically what Capstone is? Except not vendoring its own LLVM.

    • xvilka 9 months ago

      Capstone doesn't vendor LLVM either. It just contains some pieces of LLVM-ish infrastructure that were converted from C++ to pure C and are pretty lean, without any external dependencies.