If your NLP project involves disambiguating textual mentions, that is, resolving each mention to the unique ID of the meaning it refers to, this new video tutorial is for you! I use spaCy, an open-source library for advanced Natural Language Processing in Python, to implement and train a custom Entity Linking (EL) model. I showcase the functionality on an example use case: disambiguating mentions of the person “Emerson” to unique identifiers in Wikidata. I accomplish this by first annotating some data with our tool Prodigy, and then training a machine learning model from scratch. Near the end of the video, I show how to use the trained model on unseen text and evaluate its performance.
In summary, these are the steps to successfully implement Entity Linking:
- Run Named Entity Recognition to recognize the textual entities
- Create a custom Knowledge Base (KB) that holds information about unique identifiers and likely aliases
- Annotate some training text where you manually perform the disambiguation of mentions to their correct KB identifiers
- Train a new Entity Linking component on your training data
- Test its performance on a held-out test dataset
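To make the steps above more concrete, here is a minimal sketch in plain Python. It is not spaCy's API and not the code from the video: `ToyKB`, `link`, the placeholder IDs `Q_TENNIS` and `Q_WRITER`, the descriptions, and the prior probabilities are all hypothetical, and a simple word-overlap heuristic stands in for the trained machine learning model. It only illustrates how a KB with aliases yields candidates for a mention, how context picks among them, and how accuracy is measured on held-out examples.

```python
from dataclasses import dataclass, field


@dataclass
class ToyKB:
    """Toy stand-in for a knowledge base (hypothetical, not spaCy's KnowledgeBase)."""
    descriptions: dict = field(default_factory=dict)  # entity ID -> description
    aliases: dict = field(default_factory=dict)       # alias -> {entity ID: prior}

    def add_entity(self, entity_id, description):
        self.descriptions[entity_id] = description

    def add_alias(self, alias, priors):
        # Every candidate must already be a known entity in the KB.
        unknown = set(priors) - set(self.descriptions)
        if unknown:
            raise ValueError(f"unknown entity IDs: {sorted(unknown)}")
        # Prior probabilities for one alias may not sum to more than 1.
        if sum(priors.values()) > 1.0 + 1e-9:
            raise ValueError("prior probabilities sum to more than 1")
        self.aliases[alias] = dict(priors)

    def candidates(self, mention):
        # Candidates for a mention, most likely first; empty if alias is unknown.
        items = self.aliases.get(mention, {}).items()
        return sorted(items, key=lambda pair: pair[1], reverse=True)


def link(kb, mention, context):
    """Return the candidate whose description shares the most words with
    the context, using the prior as a tie-breaker. Returns None when the
    mention has no candidates. A heuristic, not a trained model."""
    context_words = set(context.lower().split())
    best_id, best_score = None, None
    for entity_id, prior in kb.candidates(mention):
        description_words = set(kb.descriptions[entity_id].lower().split())
        score = (len(context_words & description_words), prior)
        if best_score is None or score > best_score:
            best_id, best_score = entity_id, score
    return best_id


# Build the KB. IDs, descriptions, and priors are illustrative placeholders.
kb = ToyKB()
kb.add_entity("Q_TENNIS", "Australian tennis player")
kb.add_entity("Q_WRITER", "American essayist and poet")
kb.add_alias("Emerson", {"Q_TENNIS": 0.5, "Q_WRITER": 0.5})

# Held-out examples: (mention, sentence, gold KB identifier).
test_data = [
    ("Emerson", "Emerson won the tennis final", "Q_TENNIS"),
    ("Emerson", "the poet Emerson wrote many essays", "Q_WRITER"),
]

# Accuracy: fraction of mentions linked to their gold identifier.
correct = sum(link(kb, m, text) == gold for m, text, gold in test_data)
accuracy = correct / len(test_data)
print(f"accuracy: {accuracy:.2f}")
```

In a real project the heuristic in `link` would be replaced by the trained Entity Linking component, and the test sentences would come from annotated data that the model did not see during training.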
Hope you have fun implementing Entity Linking with spaCy!
→ Video: YouTube
→ Code (spaCy v2): GitHub
→ Code (spaCy v3): GitHub
→ Blog post: LinkedIn