Overview
The Apache OpenNLP library is a machine learning-based toolkit for processing natural language text. It supports the most common NLP tasks, such as language detection, tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing and coreference resolution.
In this instructor-led, live training, participants will learn how to create models for processing text-based data using OpenNLP. Sample training data as well as customized data sets will be used as the basis for the lab exercises.
By the end of this training, participants will be able to:
- Install and configure OpenNLP
- Download existing models as well as create their own
- Train the models on various sets of sample data
- Integrate OpenNLP with existing Java applications
Audience
- Developers
- Data scientists
Format of the course
- Part lecture, part discussion, exercises and heavy hands-on practice
Requirements
- Java programming experience
Course Outline
Introduction to Machine Learning and Natural Language Processing
Installing and Configuring OpenNLP
Overview of OpenNLP’s Library Structure
Downloading Existing Models
Calling OpenNLP’s APIs
Sentence Detection and Tokenization
Part-of-Speech (POS) Tagging
Phrase Chunking
Parsing
Name Finding
English Coreference Resolution
Training the Tools
Creating a Model from Scratch
Extending OpenNLP
Closing Remarks