NLP_Quickbook

Natural Language Processing Notebooks

Available as a Book: NLP in Python - Quickstart Guide

Written for Practicing Engineers

This work builds on the outstanding existing literature on Natural Language Processing, ranging from classics like Jurafsky's Speech and Language Processing to more modern work such as The Deep Learning Book by Ian Goodfellow et al.

While those are great introductory textbooks for college students, this guide is intended for practitioners who want to quickly read, skim, select what is useful, and move on. The notebooks are divided into 7 logical themes.

Each section builds on ideas and code from previous notebooks, but you can fill in the gaps mentally and jump directly to what interests you.

Chapter 01

Introduction To Text Processing, with Text Classification

  • Perfect for Getting Started! We learn better with code-first approaches
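To give a flavour of the code-first approach, here is a minimal text classifier in plain Python. It is a sketch, not the notebook's code: it assumes a multinomial Naive Bayes model with add-one (Laplace) smoothing, naive whitespace tokenization, and a made-up four-sentence dataset; the helper names `train_naive_bayes` and `predict` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive tokenizer: lowercase and split on whitespace (real pipelines do more)
    return text.lower().split()


def train_naive_bayes(docs, labels):
    """Fit a multinomial Naive Bayes model with add-one smoothing."""
    class_counts = Counter(labels)          # documents per label
    word_counts = defaultdict(Counter)      # label -> word frequencies
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, doc):
    """Return the label with the highest log P(label) + sum log P(word | label)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(doc):
            if token not in vocab:
                continue  # skip words never seen during training
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


docs = [
    "great movie loved it",
    "what a wonderful film",
    "terrible plot boring acting",
    "awful movie hated it",
]
labels = ["pos", "pos", "neg", "neg"]
model = train_naive_bayes(docs, labels)
print(predict(model, "loved this wonderful movie"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.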

Chapter 02

  • Text Cleaning notebook: code-first approaches with supporting explanations, covering simple ideas like:
    • Stop words removal
    • Lemmatization
  • Spell Correction covers almost everything you need to get started with spell correction, similar-word problems, and related tasks
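Two of these ideas are easy to show in a few lines: stop-word removal and spell correction. The sketch below is an illustration under stated assumptions rather than the notebook's implementation: the stop-word list and the reference corpus are tiny hypothetical stand-ins, and the corrector follows the well-known "one edit away" candidate approach popularised by Peter Norvig. Lemmatization is left out because it needs a dictionary-backed tool such as spaCy.

```python
from collections import Counter

# Hypothetical, tiny stop-word list; libraries such as spaCy or NLTK ship fuller ones
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in"}


def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]


# Word frequencies from a toy reference corpus; in practice use a large corpus
CORPUS = (
    "the spelling of a word is checked against the words in a corpus "
    "and the most frequent candidate wins"
)
WORD_FREQ = Counter(CORPUS.split())
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one deletion, transposition, replacement, or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct(word):
    """Keep known words; otherwise pick the most frequent known word one edit away."""
    if word in WORD_FREQ:
        return word
    candidates = [w for w in edits1(word) if w in WORD_FREQ]
    if not candidates:
        return word  # nothing close enough: leave the word unchanged
    return max(candidates, key=WORD_FREQ.get)


print(remove_stop_words("the cat is on a mat".split()))
print(correct("speling"), correct("wrod"))
```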

Chapter 03

Leveraging Linguistics is an important part of any practitioner's toolkit. Using spaCy and textacy, we look at two interesting challenges and how to tackle them:

  • Redacting names
    • Named Entity Recognition
  • Question and Answer Generation
    • Part of Speech Tagging
    • Dependency Parsing
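Once a Named Entity Recognition model has found the entities, redacting names is mostly string surgery. The sketch below assumes entities arrive as non-overlapping `(start_char, end_char, label)` tuples, which is what spaCy gives you via `[(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]`. The `redact` helper and the `[REDACTED]` placeholder are hypothetical choices, and the offsets in the example are written by hand rather than produced by a model.

```python
def redact(text, entities, labels_to_redact=("PERSON",)):
    """Replace entity spans whose label is in labels_to_redact with a placeholder.

    entities: non-overlapping (start_char, end_char, label) tuples, e.g. built
    from a spaCy Doc as [(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents].
    """
    pieces = []
    cursor = 0
    for start, end, label in sorted(entities):
        if label not in labels_to_redact:
            continue  # keep entities we were not asked to hide (e.g. places)
        pieces.append(text[cursor:start])
        pieces.append("[REDACTED]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "Alice met Bob in Paris."
# Hand-written entity spans standing in for NER model output
entities = [(0, 5, "PERSON"), (10, 13, "PERSON"), (17, 22, "GPE")]
print(redact(text, entities))  # [REDACTED] met [REDACTED] in Paris.
```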

Chapter 04

Text Representations is about converting text into numerical representations, a.k.a. vectors

  • Covers popular celebrities: word2vec, fastText, and doc2vec, plus document similarity using them
  • Programmer's Guide to gensim
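Document similarity with word vectors can be illustrated without any library. Note that this sketch swaps in a simpler technique than doc2vec: it averages word vectors into a document vector and compares documents with cosine similarity. The three-dimensional `WORD_VECTORS` table is hypothetical; real vectors would come from a trained word2vec or fastText model, for example via gensim.

```python
import math

# Hypothetical 3-dimensional word vectors; real ones come from word2vec, fastText, etc.
WORD_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "pet": [0.85, 0.15, 0.05],
    "stock": [0.0, 0.1, 0.95],
    "market": [0.05, 0.0, 0.9],
}


def doc_vector(text):
    """Average the vectors of known words; return None if no word is known."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


a = doc_vector("my cat is a good pet")
b = doc_vector("the dog")
c = doc_vector("stock market news")
print(cosine(a, b) > cosine(a, c))  # True
```

Averaging ignores word order, which is one of the limitations doc2vec tries to address by learning a vector per document.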

Chapter 05

Modern Methods for Text Classification is simple, exploratory and talks about:

  • Simple classifiers from scikit-learn and how to optimize them
  • How to combine and ensemble them for increased performance
  • Builds intuition for ensembling - so that you can write your own ensembling techniques
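Ensembling can start as simply as letting classifiers vote. The sketch below assumes each model has already produced a list of predicted labels aligned to the same samples (for example from a scikit-learn estimator's `predict`); `majority_vote` and `weighted_vote` are hypothetical helpers showing hard voting and per-model weights, two building blocks for writing your own ensembling schemes.

```python
from collections import Counter, defaultdict


def majority_vote(predictions_per_model):
    """Hard voting: each model gets one vote per sample.

    Ties go to the label that appears first, i.e. the one predicted by the
    earliest model in the list (Counter.most_common keeps insertion order).
    """
    ensembled = []
    for sample_preds in zip(*predictions_per_model):
        ensembled.append(Counter(sample_preds).most_common(1)[0][0])
    return ensembled


def weighted_vote(predictions_per_model, weights):
    """Like majority_vote, but each model's vote counts with its own weight."""
    ensembled = []
    for sample_preds in zip(*predictions_per_model):
        scores = defaultdict(float)
        for label, weight in zip(sample_preds, weights):
            scores[label] += weight
        ensembled.append(max(scores, key=scores.get))
    return ensembled


model_a = ["pos", "neg", "pos"]
model_b = ["pos", "pos", "neg"]
model_c = ["neg", "neg", "neg"]
print(majority_vote([model_a, model_b, model_c]))
print(weighted_vote([model_a, model_b, model_c], weights=[1.0, 1.0, 3.0]))
```

Weights would typically be chosen from each model's validation performance; giving one model a weight larger than all others combined makes it the de facto decision-maker.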

Chapter 06

Deep Learning for NLP is less about fancy data modeling and more about engineering for Deep Learning

  • From scratch code tutorial with Text Classification as an example
  • Using PyTorch and torchtext
  • Write our own data loaders, pre-processing, training loop and other utilities
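The core of a hand-written text data loader is building a vocabulary, turning tokens into integer ids, and padding each batch to a common length. This plain-Python sketch assumes whitespace tokenization and reserves ids 0 and 1 for `<pad>` and `<unk>`; the helper names are hypothetical, and in a PyTorch pipeline each padded batch would then be wrapped with `torch.tensor` before being fed to the model.

```python
from collections import Counter

PAD, UNK = "<pad>", "<unk>"


def build_vocab(texts, min_freq=1):
    """Map tokens to integer ids; ids 0 and 1 are reserved for padding and unknown tokens."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    itos = [PAD, UNK] + sorted(tok for tok, c in counts.items() if c >= min_freq)
    return {tok: i for i, tok in enumerate(itos)}


def numericalize(text, stoi):
    # Unknown tokens fall back to the <unk> id
    return [stoi.get(tok, stoi[UNK]) for tok in text.lower().split()]


def batches(texts, labels, stoi, batch_size):
    """Yield (padded_token_ids, labels), padding each batch to its longest sequence."""
    for start in range(0, len(texts), batch_size):
        batch_texts = texts[start:start + batch_size]
        batch_labels = labels[start:start + batch_size]
        ids = [numericalize(t, stoi) for t in batch_texts]
        max_len = max(len(seq) for seq in ids)
        padded = [seq + [stoi[PAD]] * (max_len - len(seq)) for seq in ids]
        yield padded, batch_labels


texts = ["good movie", "bad", "really good film"]
labels = [1, 0, 1]
stoi = build_vocab(texts)
for token_ids, batch_labels in batches(texts, labels, stoi, batch_size=2):
    print(token_ids, batch_labels)
```

Padding per batch rather than to a global maximum keeps short batches cheap, which is also why bucketing sentences of similar length together is a common trick.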

Chapter 07

Building your own Chatbot from scratch in 30 minutes. We use this to explore unsupervised learning and put together several of the ideas we have already seen.

  • a simpler, direct problem formulation instead of the complicated chatbot tutorials commonly seen
  • intents, responses, and templates in chatbot parlance
  • hacking a word-based similarity engine to work with little to no training samples
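In this formulation, a chatbot reduces to matching a user message against example phrases for each intent and filling in that intent's response template. The sketch below uses plain word-overlap (Jaccard) similarity as an assumed stand-in for the word-based similarity engine, together with a hypothetical `INTENTS` table and a 0.2 match threshold. It needs no training data beyond a few example phrases per intent.

```python
# Hypothetical intents: a few example phrases and one response template each
INTENTS = {
    "greeting": {
        "examples": ["hello there", "hi", "good morning"],
        "response": "Hello! How can I help you today?",
    },
    "order_status": {
        "examples": ["where is my order", "track my order", "order status"],
        "response": "Let me check the status of order {order_id}.",
    },
}


def jaccard(a, b):
    """Word-overlap similarity: |shared words| / |all distinct words|."""
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def match_intent(message, threshold=0.2):
    """Return the intent whose closest example phrase overlaps most with the message."""
    best_intent, best_score = None, 0.0
    for name, intent in INTENTS.items():
        score = max(jaccard(message, ex) for ex in intent["examples"])
        if score > best_score:
            best_intent, best_score = name, score
    return best_intent if best_score >= threshold else None


def respond(message, **slots):
    intent = match_intent(message)
    if intent is None:
        return "Sorry, I did not understand that."
    # Fill template slots such as {order_id} from keyword arguments
    return INTENTS[intent]["response"].format(**slots)


print(respond("hi"))
print(respond("can you track my order", order_id="A12"))
print(respond("what is the weather"))
```

Messages below the threshold fall back to a default reply, and a template with a slot such as `{order_id}` must receive that slot as a keyword argument. Punctuation is not stripped here, so "hi!" would not match "hi" without extra cleaning.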