Graph of model tokenization speed before and after this PR, illustrating the speed-up from this contribution.

Increasing tokenization speed across spaCy’s core languages

One of my first core contributions to the spaCy open-source NLP library!

This PR makes tokenization 2–3× faster across languages, with no loss in accuracy, by refactoring the tokenizer's regular expressions and replacing the third-party `regex` package with Python's built-in `re` module.
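As a minimal sketch of the kind of refactor involved (these are illustrative patterns, not spaCy's actual tokenizer rules): collapsing an alternation of single characters into a character class is one classic rewrite that lets a simpler engine like `re` match quickly.

```python
import re

# Hypothetical example patterns, not taken from spaCy itself.
# An alternation of escaped single characters...
slow_style = re.compile(r"^(?:\.|\,|\;|\:|\!|\?)+")
# ...can be rewritten as an equivalent character class,
# which regex engines typically match much faster.
fast_style = re.compile(r"^[.,;:!?]+")

def strip_leading_punct(text: str) -> str:
    """Strip leading punctuation, as a tokenizer prefix rule might."""
    return fast_style.sub("", text)

print(strip_leading_punct("!!hello"))  # -> hello
```

Both patterns match the same strings; the win comes purely from giving the engine a cheaper form to execute.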

→  Code: GitHub