Links for 2017-08-12

  • Hyperscan

    a high-performance multiple regex matching library. It follows the regular expression syntax of the commonly-used libpcre library, yet functions as a standalone library with its own API written in C. Hyperscan uses hybrid automata techniques to allow simultaneous matching of large numbers (up to tens of thousands) of regular expressions, as well as matching of regular expressions across streams of data. Hyperscan is typically used in a DPI library stack. Hyperscan began in 2008, and evolved from a commercial closed-source product 2009-2015. First developed at Sensory Networks Incorporated, and later acquired and released as open source software by Intel in October 2015.  Hyperscan is under a 3-clause BSD license. We welcome outside contributors.
    This is really impressive: the state of the art in parallel regexp matching has improved quite a lot since I last looked at it. (via Tony Finch)

    (tags: via:fanf regexps regular-expressions text matching pattern-matching intel open-source bsd c dpi scanning sensory-networks)
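Hyperscan's own engine (the "hybrid automata" mentioned above) and its C API are not shown here. As a toy illustration of the core idea, which is matching many patterns in one pass over the input instead of rescanning the input once per pattern, here is a minimal sketch of the classic bit-parallel Shift-And algorithm extended to several patterns. Each pattern gets its own range of bits in a single integer, so one shift, one OR and one AND per input character advance every pattern at once. This is an assumption-laden simplification: it handles plain literal strings only, not regular expressions, and the names `MultiShiftAnd`, `compile` and `scan` are made up for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class MultiShiftAnd:
    """Hypothetical multi-literal Shift-And matcher; not Hyperscan's API or algorithm."""

    char_masks: Dict[str, int] = field(default_factory=dict)  # bit set where a pattern char == ch
    start_mask: int = 0  # first bit of every pattern
    end_mask: int = 0    # last bit of every pattern
    end_owner: Dict[int, int] = field(default_factory=dict)  # end bit index -> pattern id

    @classmethod
    def compile(cls, patterns: Sequence[str]) -> "MultiShiftAnd":
        m = cls()
        offset = 0
        for pid, pat in enumerate(patterns):
            if not pat:
                raise ValueError("empty patterns are not supported")
            m.start_mask |= 1 << offset
            for j, ch in enumerate(pat):
                m.char_masks[ch] = m.char_masks.get(ch, 0) | (1 << (offset + j))
            last = offset + len(pat) - 1
            m.end_mask |= 1 << last
            m.end_owner[last] = pid
            offset += len(pat)  # next pattern starts in the following bit range
        return m

    def scan(self, text: str) -> List[Tuple[int, int]]:
        """Single left-to-right pass; returns (pattern_id, end_offset) pairs."""
        state = 0
        matches: List[Tuple[int, int]] = []
        for pos, ch in enumerate(text):
            # Extend every partial match by one character and let every
            # pattern start at this position. A bit shifted out of one
            # pattern's range lands on the next pattern's start bit, which
            # start_mask sets anyway, so patterns never contaminate each other.
            state = ((state << 1) | self.start_mask) & self.char_masks.get(ch, 0)
            hits = state & self.end_mask
            while hits:
                low = hits & -hits  # isolate the lowest set bit
                matches.append((self.end_owner[low.bit_length() - 1], pos))
                hits ^= low
        return matches


matcher = MultiShiftAnd.compile(["he", "she", "his", "hers"])
print(matcher.scan("ushers"))  # [(0, 3), (1, 3), (3, 5)]
```

In C the state would normally live in a fixed-width machine word, or a SIMD register to fit more patterns, which is where the word-level data parallelism comes from; Python's arbitrary-precision integers remove that size limit at the cost of speed.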



  1. Posted August 14, 2017 at 01:35

    (Hyperscan project chief architect)

    Note that Hyperscan makes a lot of use of SIMD parallelism and data parallelism, but it is not really “parallel regex matching” as a purist would understand the term. Anyone interested in a proper parallel algorithm as a research project should contact us; we are interested in this area but don’t really have the resources to push such a development, especially given how difficult it would be to use parallel matching for our core use case (primarily network security scanning). I think there’s a fantastically interesting problem there, and it would be fun to work on it.

  2. Posted August 14, 2017 at 08:45

    Thanks for the comment, Geoff! You’re right, “parallel regex matching” isn’t the correct term; I meant the use case of matching a large set of regular expressions in a single pass, rather than one by one, which it looks like Hyperscan does. Hyperscan looks very impressive, and if I were still working in the field I’d be investigating using it right now!