Graph of model tokenization speed before and after this PR, illustrating the speed-up from this contribution.

Increasing tokenization speed across spaCy’s core languages

One of my first core contributions to the spaCy open-source NLP library!

This PR makes tokenization 2–3× faster across languages, with no loss in accuracy, by refactoring the tokenizer's regular expressions and replacing the third-party `regex` package with Python's built-in `re` module.
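As a minimal sketch of the kind of refactor involved (these are illustrative patterns, not spaCy's actual tokenizer rules): collapsing an alternation of single characters into a character class is one classic rewrite that lets a simpler engine like `re` match quickly.

```python
import re

# Hypothetical example patterns, not taken from spaCy itself.
# An alternation of escaped single characters...
slow_style = re.compile(r"^(?:\.|\,|\;|\:|\!|\?)+")
# ...can be rewritten as an equivalent character class,
# which regex engines typically match much faster.
fast_style = re.compile(r"^[.,;:!?]+")

def strip_leading_punct(text: str) -> str:
    """Strip leading punctuation, as a tokenizer prefix rule might."""
    return fast_style.sub("", text)

print(strip_leading_punct("!!hello"))  # -> hello
```

Both patterns match the same strings; the win comes purely from giving the engine a cheaper form to execute.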

→  Code: GitHub