I’ve been researching abstractive summarization for a substantial amount of time now. This series of articles is meant to help someone get started in the field and follow the key intuitive concepts of the papers presented since the advent of the pointer-generator network [1]. One thing that sets abstractive summarization a little apart from its counterparts in NLP is the heavy use of reinforcement learning (RL) to optimize directly on ROUGE scores.
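
As a quick reference point, here is a simplified sketch of what a ROUGE-style overlap score looks like (ROUGE-1 recall, computed by hand rather than with the official toolkit); later articles in the series come back to models that optimize such scores directly.

```python
# Simplified ROUGE-1 recall: unigram overlap between a candidate summary and a
# reference summary, divided by the number of unigrams in the reference.
# Illustrative only; use the official ROUGE toolkit for real evaluation.
from collections import Counter

def rouge1_recall(candidate: str, reference: str) -> float:
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clip each word's overlap count by how often it appears in the candidate.
    overlap = sum(min(cand_counts[w], ref_counts[w]) for w in ref_counts)
    return overlap / max(sum(ref_counts.values()), 1)

print(rouge1_recall("the cat sat on the mat", "the cat lay on the mat"))  # 5/6 ≈ 0.83
```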

Summarization is the task of taking a long piece of text and producing a shorter version that is relevant, informative, and gives the reader a fair idea of the original text without having to read it completely. There are two kinds of summarization: extractive and abstractive. Extractive summarization creates a summary by highlighting the key sentences in the text and calling these “important” highlighted sentences the summary. Although the selected sentences don’t encompass the entire article, the attempt is to select the ones that form the crux of the text. Abstractive summarization is what we as humans do: reading a long piece of text, understanding it, and then writing it in our own words in clear, concise and coherent language.

Viewing these tasks from an NLP perspective, extractive summarization is the task of selecting relevant and important sentences from the original text and simply combining them, while abstractive summarization involves natural language understanding to comprehend the text and then natural language generation to produce text from that understanding. Most pre-neural summarization focused on the extractive setting, since it was largely rule-based. In the neural era, the focus has shifted to abstractive summarization, which is a very active area of research with a lot of scope for improvement.
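
To make the extractive setting concrete, here is a toy sentence-selection sketch of my own (not taken from any particular paper): it scores each sentence by the frequency of its content words and keeps the top few, which is roughly the flavour of heuristic that pre-neural extractive systems were built on.

```python
# A minimal, illustrative extractive summarizer: score each sentence by the
# document-level frequency of the words it contains and keep the top-k
# sentences in their original order. A toy heuristic, not a published system.
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "are", "was", "were", "of", "to", "in",
             "and", "for", "on", "it", "this", "that", "with", "as", "by"}

def extractive_summary(text: str, k: int = 2) -> str:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    freqs = Counter(words)

    def score(sentence: str) -> float:
        tokens = [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]
        return sum(freqs[w] for w in tokens) / max(len(tokens), 1)

    # Pick the k highest-scoring sentences, then restore document order.
    top = sorted(sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k])
    return " ".join(sentences[i] for i in top)

doc = ("The protests drew huge crowds in the capital. Organisers said over "
       "100000 people turned up for the protests. Traffic was diverted around "
       "the city centre. Police reported no major incidents during the protests.")
print(extractive_summary(doc, k=2))
```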

But why is abstractive summarization so difficult in the first place? Mostly because neural networks are still pretty bad at generating text, even when given context. Only recently, with the advent of GPT-2 [2], have they started getting better, but those are models trained on large corpora of text without any specific task in mind, although attempts to fine-tune them for tasks that require language generation are underway. Secondly, coming down to the nitty-gritty of the task itself, neural networks attempting to generate text suffer from repetition (getting stuck on a word, for example “i had the the the …”); they find it very difficult to handle factual details (the original text might say “100000 people turned up for the protests”, and you would want the model to copy this fact correctly); and finally, a fundamental problem in NLP models is that we constrain training to a vocabulary of 50,000–200,000 words, while English has at least a million words, and infinitely many once you consider derivational morphology, so handling out-of-vocabulary words is a hard task for neural networks.
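
To see why the fixed vocabulary hurts, here is a small illustrative sketch (my own, not from [1]): with a closed vocabulary, every token the model has never seen, such as the number “100000” above, collapses to a single [UNK] id, so the model cannot reproduce it in its output.

```python
# Toy illustration of the out-of-vocabulary problem: a closed vocabulary maps
# every unseen token to [UNK], so rare words and numbers cannot be reproduced.
UNK, PAD = "[UNK]", "[PAD]"

class Vocab:
    def __init__(self, tokens):
        # In practice the vocabulary is the 50k-200k most frequent training words.
        self.itos = [PAD, UNK] + sorted(set(tokens))
        self.stoi = {t: i for i, t in enumerate(self.itos)}

    def encode(self, sentence):
        return [self.stoi.get(tok, self.stoi[UNK]) for tok in sentence.split()]

    def decode(self, ids):
        return " ".join(self.itos[i] for i in ids)

# Hypothetical tiny training vocabulary; "100000" was never seen in training.
vocab = Vocab("people turned up for the protests in the capital".split())
ids = vocab.encode("100000 people turned up for the protests")
print(ids)                 # the id for "100000" is the [UNK] id
print(vocab.decode(ids))   # "[UNK] people turned up for the protests"
```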