Generalization in NLP
Natural language processing is still at a point closer to ELIZA than to HAL.
“The Shallowness of Google Translate” points out a number of cases where the algorithms fall short of human-level nuance.
The Gradient has some interesting reading on the brittleness of NLP models and how that’s being addressed. The article “NLP’s generalization problem, and how researchers are tackling it” names three strategies:
- using more inductive bias,
- working towards imbuing NLP models with common sense,
- working with unseen distributions and unseen tasks.
How to evaluate generalization isn’t entirely clear. One proposal: “Every paper, together with evaluation on held-out test sets, should evaluate on a novel distribution or on a novel task”.
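To make that concrete, here is a minimal sketch of what reporting both numbers could look like. The toy data and the movie-vs-appliance domain split are made up; the point is simply evaluating the same model on its held-out test set and on a novel distribution.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Toy sentiment data standing in for a real benchmark.
train_texts, train_labels = ["great film", "terrible film"] * 50, [1, 0] * 50
indist_texts, indist_labels = ["a great movie", "a terrible movie"] * 10, [1, 0] * 10
# "Novel distribution": same task, different domain (here, product reviews).
ood_texts, ood_labels = ["great toaster, works well", "terrible toaster, broke fast"] * 10, [1, 0] * 10

vec = TfidfVectorizer().fit(train_texts)
clf = LogisticRegression().fit(vec.transform(train_texts), train_labels)

# Report accuracy on the held-out set and on the novel distribution side by side.
for name, texts, labels in [("held-out (in-distribution)", indist_texts, indist_labels),
                            ("novel distribution", ood_texts, ood_labels)]:
    acc = accuracy_score(labels, clf.predict(vec.transform(texts)))
    print(f"{name}: accuracy = {acc:.2f}")
```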
Transfer learning has been very successful in computer vision. Applying pretrained networks based on the ImageNet corpus is now a standard starting point for most image-related tasks.
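In PyTorch terms, that standard starting point looks roughly like the sketch below; the 10-class head is an arbitrary placeholder for whatever the downstream task needs.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a network pretrained on ImageNet.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Freeze the pretrained feature extractor ...
for param in model.parameters():
    param.requires_grad = False

# ... and swap in a new classification layer for the target task
# (10 classes is just a placeholder).
model.fc = nn.Linear(model.fc.in_features, 10)

# Only the new head is trained; the ImageNet features are reused as-is.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```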
A pair of articles describes work to bring transfer learning into the NLP domain. Word embeddings were the first step in this direction. Now, entire networks are being pretrained on language modeling, i.e. predicting the next word in a sequence.
- NLP’s ImageNet moment has arrived
- Introducing state of the art text classification with universal language models
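The underlying recipe is simple enough to sketch. The following is not the ULMFiT or ELMo code, just a toy PyTorch illustration on random token data: pretrain an encoder by predicting the next word, then reuse that encoder under a small classification head.

```python
import torch
import torch.nn as nn

VOCAB, EMB, HID, SEQ, BATCH = 1000, 64, 128, 20, 32

class Encoder(nn.Module):
    """Shared text encoder: embeddings + LSTM."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.lstm = nn.LSTM(EMB, HID, batch_first=True)

    def forward(self, tokens):
        out, _ = self.lstm(self.emb(tokens))
        return out  # (batch, seq, HID)

# Step 1: pretrain as a language model (predict the next token).
encoder = Encoder()
lm_head = nn.Linear(HID, VOCAB)
lm_opt = torch.optim.Adam(list(encoder.parameters()) + list(lm_head.parameters()))
loss_fn = nn.CrossEntropyLoss()

for _ in range(100):  # toy loop; real pretraining uses a large text corpus
    tokens = torch.randint(0, VOCAB, (BATCH, SEQ))  # placeholder for real text
    inputs, targets = tokens[:, :-1], tokens[:, 1:]
    logits = lm_head(encoder(inputs))
    loss = loss_fn(logits.reshape(-1, VOCAB), targets.reshape(-1))
    lm_opt.zero_grad()
    loss.backward()
    lm_opt.step()

# Step 2: reuse the pretrained encoder for a downstream classifier.
clf_head = nn.Linear(HID, 2)  # e.g. binary sentiment; 2 classes is a placeholder
clf_opt = torch.optim.Adam(list(encoder.parameters()) + list(clf_head.parameters()), lr=1e-4)

for _ in range(20):  # fine-tuning loop on (toy) labelled data
    tokens = torch.randint(0, VOCAB, (BATCH, SEQ))
    labels = torch.randint(0, 2, (BATCH,))
    features = encoder(tokens)[:, -1, :]  # last hidden state as a document summary
    loss = loss_fn(clf_head(features), labels)
    clf_opt.zero_grad()
    loss.backward()
    clf_opt.step()
```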
Also featuring pretraining in NLP: “Efficient Contextualized Representation” and “Phrase-Based & Neural Unsupervised Machine Translation”.
> Unsupervised Machine Translation: Build word embeddings & language models (LM) from monolingual data, use LMs to correct naive monolingually-trained models while generating fake parallel data, then iterate entire process. Great start for MT from mono data. https://t.co/4RarBppCuj
>
> — Reza Zadeh (@Reza_Zadeh) September 10, 2018
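The loop the tweet describes (build language models from monolingual data, use the current models to generate fake parallel data by back-translation, retrain on it, then repeat) can be sketched schematically. Everything below, the ToyTranslator class and its methods included, is a hypothetical stand-in, not the actual system from the paper.

```python
# Schematic sketch of the iterative back-translation loop described above.
class ToyTranslator:
    """Stands in for a src->tgt translation model."""
    def __init__(self, name):
        self.name = name

    def translate(self, sentences):
        # Placeholder: a real model would decode target-language text here.
        return [f"<{self.name}> {s}" for s in sentences]

    def train(self, parallel_pairs):
        # Placeholder: a real model would run supervised NMT training here.
        print(f"training {self.name} on {len(parallel_pairs)} synthetic pairs")


def unsupervised_mt(mono_src, mono_tgt, rounds=3):
    # Step 1: in the real system, word embeddings and language models built
    # from monolingual data would initialize both directions (omitted here).
    src2tgt = ToyTranslator("src->tgt")
    tgt2src = ToyTranslator("tgt->src")

    for _ in range(rounds):
        # Step 2: generate "fake" parallel data by translating monolingual text
        # with the current (noisy) models, i.e. back-translation.
        fake_tgt = src2tgt.translate(mono_src)  # (src, fake_tgt) pairs
        fake_src = tgt2src.translate(mono_tgt)  # (fake_src, tgt) pairs

        # Step 3: retrain each direction on the other direction's synthetic data,
        # with the language models (not shown) keeping outputs fluent.
        src2tgt.train(list(zip(fake_src, mono_tgt)))
        tgt2src.train(list(zip(fake_tgt, mono_src)))
        # Step 4: iterate; each round the synthetic data gets less noisy.

    return src2tgt, tgt2src


unsupervised_mt(["the cat sat"], ["le chat s'est assis"])
```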