Sofie’s Blog

spaCy: A customizable NLP toolkit designed for developers

In this talk, I first give an overview of the built-in functionality available in spaCy, using pretrained supervised models. I showcase how linguistic information such as part-of-speech tags and dependency parses can help you identify interesting patterns or phrases in your documents and ultimately perform document classification or other information retrieval tasks.
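As a minimal sketch of how part-of-speech tags can surface interesting phrases, the snippet below uses spaCy's rule-based `Matcher` to find adjective + noun pairs. To stay self-contained it assumes a blank English pipeline and a hand-annotated `Doc`; the example sentence, its tags, and the pattern name `ADJ_NOUN` are illustrative choices, not material from the talk. In practice the tags would come from a pretrained pipeline such as `en_core_web_sm`.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Doc

# Blank pipeline: tokenizer and vocab only, no trained components to download.
nlp = spacy.blank("en")

# Hand-annotated example (assumption): a pretrained tagger would normally
# predict these coarse-grained part-of-speech tags.
words = ["The", "new", "phone", "has", "a", "great", "camera", "."]
pos = ["DET", "ADJ", "NOUN", "VERB", "DET", "ADJ", "NOUN", "PUNCT"]
doc = Doc(nlp.vocab, words=words, pos=pos)

# Rule: an adjective immediately followed by a noun.
matcher = Matcher(nlp.vocab)
matcher.add("ADJ_NOUN", [[{"POS": "ADJ"}, {"POS": "NOUN"}]])

# Each match is a (match_id, start, end) triple of token indices.
phrases = [doc[start:end].text for _, start, end in matcher(doc)]
print(phrases)
```

Phrases extracted this way could then serve as features for document classification or as candidates for an information retrieval index.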

In the second part of the talk, I switch gears and showcase how Large Language Models (LLMs) can be integrated into your NLP pipelines. Due to their impressive natural language capabilities, recent LLMs like GPT-4 are paving the way for fast prototyping of NLP applications in any business domain. Most practical use cases, however, will benefit from a structured pipeline approach in which LLMs are complemented by supervised models or even rule-based approaches. I showcase how to build such pipelines for a realistic business application, using spaCy and its recently published extension ‘spacy-llm’.

Finally, I discuss how to balance different (and often conflicting) requirements of your NLP solutions, such as accuracy, speed, memory usage, reliability, maintainability and customizability, and how you can transform a quick prototype into a robust, production-ready solution.

→  Venue: ODSC Europe (London, UK)

→  Slides: Speakerdeck