Cross-posting from reddit:
The PR has more details, but here are a few ad hoc benchmarks using ripgrep on my M2 Mac mini while searching a 5.5GB file.
This one is just a case insensitive search. A case insensitive regex expands to something like (ignoring Unicode) [Ss][Hh][Ee][Rr]..., which means that it has multiple literal prefixes. In fact, you can enumerate them! As long as the set is small enough, this is something that the new SIMD acceleration on aarch64 can handle (and has done for a long time on x86-64):

    $ time rg-before-teddy-aarch64 -i -c 'Sherlock Holmes' OpenSubtitles2018.half.en
    3055

    real    8.208
    user    7.731
    sys     0.467
    maxmem  5600 MB
    faults  191

    $ time rg-after-teddy-aarch64 -i -c 'Sherlock Holmes' OpenSubtitles2018.half.en
    3055

    real    1.137
    user    0.695
    sys     0.430
    maxmem  5904 MB
    faults  203
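To make the "you can enumerate them" point concrete, here is a minimal Python sketch of expanding a literal into its ASCII case variants. It is an illustration only and not code from ripgrep or the regex crate: the function name `case_variants` and the cutoff of 64 are assumptions made for this example.

```python
from itertools import product
from typing import List, Optional


def case_variants(literal: str, limit: int = 64) -> Optional[List[str]]:
    """Enumerate every ASCII case variant of `literal`.

    Returns None when the set would exceed `limit` (an arbitrary cutoff
    chosen for this sketch).
    """
    # Each character contributes one or two alternatives, like a [Ss] class.
    classes = []
    for ch in literal:
        if ch.isascii() and ch.isalpha():
            classes.append((ch.upper(), ch.lower()))
        else:
            classes.append((ch,))

    # The set doubles with every letter, so bail out before it explodes.
    total = 1
    for alternatives in classes:
        total *= len(alternatives)
        if total > limit:
            return None

    return ["".join(combo) for combo in product(*classes)]


# A three-letter prefix is small enough to enumerate in full.
print(case_variants("She"))
# The full phrase has 14 letters, far too many variants for this cutoff.
print(case_variants("Sherlock Holmes"))
```

The second call returns None because 14 letters give 2^14 variants, which is why the "small enough" condition matters. How a real engine trims the set to stay under its limit is not covered by this sketch.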
And of course, using multiple literals explicitly also uses this optimization:

    $ time rg-before-teddy-aarch64 -c 'Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty' OpenSubtitles2018.half.en
    3804

    real    9.055
    user    8.580
    sys     0.474
    maxmem  4912 MB
    faults  11

    $ time rg-after-teddy-aarch64 -c 'Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty' OpenSubtitles2018.half.en
    3804

    real    1.121
    user    0.697
    sys     0.422
    maxmem  4832 MB
    faults  11
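For a rough feel of how a small set of literals can be searched in one pass, below is a scalar Python sketch of a "fingerprint, then verify" scan in the spirit of the Teddy algorithm mentioned further down. It is heavily simplified and is not the actual implementation: it assumes at most 8 patterns with one pattern per bucket, fingerprints only the first byte, and processes one byte at a time where a SIMD version would handle a whole vector of bytes per step. The names `build_masks` and `find_all` are made up for this example.

```python
def build_masks(patterns):
    """Build low/high nibble tables for the first byte of each pattern.

    Bit i of lo[n] (or hi[n]) is set when pattern i's first byte has n as
    its low (or high) nibble.
    """
    assert 0 < len(patterns) <= 8 and all(patterns), "sketch handles 1..8 non-empty patterns"
    lo = [0] * 16
    hi = [0] * 16
    for bucket, pat in enumerate(patterns):
        first = pat[0]
        lo[first & 0x0F] |= 1 << bucket
        hi[first >> 4] |= 1 << bucket
    return lo, hi


def find_all(patterns, haystack):
    """Return (offset, pattern) pairs for every occurrence, in haystack order."""
    lo, hi = build_masks(patterns)
    matches = []
    for i, byte in enumerate(haystack):
        # Two table lookups and an AND give the set of candidate buckets.
        candidates = lo[byte & 0x0F] & hi[byte >> 4]
        bucket = 0
        while candidates:
            # The fingerprint only covers the first byte, so verify the rest.
            if candidates & 1 and haystack.startswith(patterns[bucket], i):
                matches.append((i, patterns[bucket]))
            candidates >>= 1
            bucket += 1
    return matches


names = [b"Sherlock Holmes", b"John Watson", b"Irene Adler"]
print(find_all(names, b"Dr. John Watson met Sherlock Holmes."))
```

Most haystack bytes produce an empty candidate set, so the comparatively expensive verification step runs rarely. That is the property that makes this kind of scan worth vectorizing.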
And it doesn’t just work for prefixes; it works for inner literals too:

    $ time rg-before-teddy-aarch64 -c '\w+\s+(Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty)\s+\w+' OpenSubtitles2018.half.en
    773

    real    9.065
    user    8.586
    sys     0.477
    maxmem  6384 MB
    faults  11

    $ time rg-after-teddy-aarch64 -c '\w+\s+(Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty)\s+\w+' OpenSubtitles2018.half.en
    773

    real    1.124
    user    0.702
    sys     0.421
    maxmem  6784 MB
    faults  11
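The inner literal case can be sketched as a prefilter: any line that matches the regex must contain one of the names, so a cheap literal check can rule out most lines before the regex engine runs. The Python below is a minimal illustration of that idea, assuming line-by-line processing. The function name `count_matching_lines` is hypothetical, and plain substring checks stand in for the SIMD multi-literal search.

```python
import re

NAMES = [
    "Sherlock Holmes",
    "John Watson",
    "Irene Adler",
    "Inspector Lestrade",
    "Professor Moriarty",
]
# The same pattern as in the benchmark above.
PATTERN = re.compile(r"\w+\s+(" + "|".join(NAMES) + r")\s+\w+")


def count_matching_lines(text: str) -> int:
    """Count lines matching PATTERN, like `rg -c`, using a literal prefilter."""
    count = 0
    for line in text.splitlines():
        # Cheap check first: a line without any required literal cannot match.
        if not any(name in line for name in NAMES):
            continue
        # Only candidate lines pay for the full regex search.
        if PATTERN.search(line):
            count += 1
    return count


sample = "I met Sherlock Holmes today\nSherlock Holmes\nnothing here\nsaw John Watson leave\n"
print(count_matching_lines(sample))
```

The prefilter never changes the result. It only skips work, because containing a literal is a necessary condition for a match and not a sufficient one (the second sample line contains a name but has no surrounding words, so the regex rejects it).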
If you’re curious about how the SIMD stuff works, you can read my description of Teddy here. I ported this algorithm out of the Hyperscan project several years ago, and it has been one of the killer ingredients for making ripgrep fast in a lot of common cases. But it only worked on x86-64. With the rise and popularity of aarch64 and Apple silicon, I was motivated to port it over. I just recently finished analogous work for the memchr crate as well.

This sounds really great and will probably have quite an impact on a lot of users. So, nice work!