
Training a custom Entity Linking model with spaCy 

If your NLP project involves disambiguating textual mentions to different meanings (linked to unique IDs), this new video tutorial is for you! I use spaCy, an open-source library for advanced Natural Language Processing in Python, to implement and train a custom Entity Linking (EL) model. I showcase the functionality on an example use case: disambiguating mentions of the person “Emerson” to unique identifiers in Wikidata. I accomplish this by first annotating some data with our tool Prodigy, and then training a machine learning model from scratch. Near the end of the video, I show how to use the trained model on unseen text and evaluate its performance.

In summary, these are the steps to successfully implement Entity Linking:

  • Named Entity Recognition to recognize the textual entities
  • Create a custom Knowledge Base (KB) that holds information about unique identifiers and likely aliases
  • Annotate some training text where you manually perform the disambiguation of mentions to their correct KB identifiers
  • Train a new Entity Linking component on your training data
  • Test its performance on a held-out test dataset
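
To make the roles of the Knowledge Base and the evaluation step more concrete, here is a minimal, library-agnostic sketch in plain Python. It does not use spaCy's actual API and is not the code from the video: the `TinyKB` class, the `link` and `accuracy` functions, the IDs `E1` to `E3`, the descriptions, the prior probabilities and the example sentences are all hypothetical stand-ins. The word-overlap scoring is also just a toy replacement for a trained model.

```python
from collections import defaultdict


class TinyKB:
    """Toy knowledge base: entity IDs with descriptions, plus aliases with priors."""

    def __init__(self):
        self.descriptions = {}            # entity ID -> short description
        self.aliases = defaultdict(dict)  # alias -> {entity ID: prior probability}

    def add_entity(self, entity_id, description):
        self.descriptions[entity_id] = description

    def add_alias(self, alias, priors):
        unknown = set(priors) - set(self.descriptions)
        if unknown:
            raise ValueError(f"Unknown entity IDs: {sorted(unknown)}")
        if sum(priors.values()) > 1.0 + 1e-9:
            raise ValueError("Prior probabilities for an alias must sum to at most 1")
        self.aliases[alias].update(priors)

    def candidates(self, mention):
        """Return (entity ID, prior) pairs for a mention, highest prior first."""
        found = self.aliases.get(mention, {})
        return sorted(found.items(), key=lambda pair: pair[1], reverse=True)


def link(kb, mention, sentence):
    """Pick the candidate whose description shares the most words with the sentence.

    The prior is added to the overlap count, so it only breaks ties.
    Returns None when the KB has no candidate for the mention.
    """
    words = set(sentence.lower().split())
    best_id, best_score = None, -1.0
    for entity_id, prior in kb.candidates(mention):
        description_words = set(kb.descriptions[entity_id].lower().split())
        score = len(words & description_words) + prior
        if score > best_score:
            best_id, best_score = entity_id, score
    return best_id


def accuracy(kb, examples):
    """Fraction of (mention, sentence, gold ID) examples linked correctly."""
    correct = sum(link(kb, m, s) == gold for m, s, gold in examples)
    return correct / len(examples)


def build_emerson_kb():
    kb = TinyKB()
    # Hypothetical IDs and descriptions; a real KB would use Wikidata identifiers.
    kb.add_entity("E1", "tennis player")
    kb.add_entity("E2", "essayist and philosopher")
    kb.add_entity("E3", "keyboard player and composer")
    # Hypothetical, equal priors: the alias alone does not favor any entity.
    kb.add_alias("Emerson", {"E1": 0.3, "E2": 0.3, "E3": 0.3})
    return kb


# Hypothetical held-out examples: (mention, sentence, gold entity ID).
HELD_OUT = [
    ("Emerson", "Emerson won the tennis final", "E1"),
    ("Emerson", "The philosopher Emerson wrote essays", "E2"),
    ("Emerson", "Emerson was a composer and keyboard player", "E3"),
]

if __name__ == "__main__":
    kb = build_emerson_kb()
    for mention, sentence, gold in HELD_OUT:
        print(sentence, "->", link(kb, mention, sentence), "(gold:", gold + ")")
    print("accuracy:", accuracy(kb, HELD_OUT))
```

The point of the sketch is the division of labor: the KB only proposes candidates for an alias, while the disambiguation step has to use the surrounding context to choose among them. In the actual workflow, that second step is handled by the trained Entity Linking component rather than by word overlap.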

Hope you have fun implementing Entity Linking with spaCy!

→ Video: YouTube

→ Code (spaCy v2): GitHub

→ Code (spaCy v3): GitHub

→  Blog post: LinkedIn