C-DAC Pune
Saranshak (The Summarizer)

Natural Language Based Summarizer

Saranshak is a natural language based summarizer. In today's fast-paced life, people hardly have time to read through pages of a document in order to reach a conclusion. People keep abreast of world affairs by listening to news bites. They base investment decisions on stock market updates. They even choose which movies to see largely on the basis of reviews they have read. With summaries, they can make effective decisions in less time. This has created a tremendous need for summarizers.

Summarization is the art of abstracting key content from one or more information sources.

Summarization approaches fall into two categories:

  • Extraction based: selects original pieces (typically sentences) from the source document and concatenates them to yield a shorter text. This approach does little to ensure that the summary is coherent, which can make the result hard to read.
  • Abstraction based: paraphrases, in more general terms, what the text is about.
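To make the extraction-based approach concrete, here is a minimal sketch that scores each sentence by the average frequency of its content words, keeps the `k` best, and concatenates them in their original order. The word-frequency score, the tiny stopword list, and the names `split_sentences`, `words`, and `extract_summary` are illustrative assumptions; this is not Saranshak's own algorithm.

```python
import re
from collections import Counter

# A tiny, illustrative stopword list (an assumption; real systems use larger lists).
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are",
             "it", "this", "that", "on", "for", "with", "as", "by", "was"}


def split_sentences(text: str) -> list[str]:
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def words(sentence: str) -> list[str]:
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def extract_summary(text: str, k: int) -> str:
    """Hypothetical extractive summarizer: top-k sentences by word frequency."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in words(s))

    def score(sentence: str) -> float:
        tokens = words(sentence)
        # Average frequency, so long sentences are not automatically favoured.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    # Pick the indices of the k best-scoring sentences (ties keep source order)...
    best = sorted(range(len(sentences)),
                  key=lambda i: score(sentences[i]), reverse=True)[:k]
    # ...and concatenate the chosen sentences in their original order.
    return " ".join(sentences[i] for i in sorted(best))
```

For example, `extract_summary("Cats sleep a lot. Cats like fish. Dogs bark. Fish swim in water.", 2)` returns `"Cats sleep a lot. Cats like fish."`. Nothing links the selected sentences to each other, which is exactly the coherence weakness noted above.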

Salient Features
Currently, the system uses a concept- and name-based information extraction approach.
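The page does not describe how names are detected, so the following is only a hypothetical illustration of a name-based signal. It treats capitalized words that do not start a sentence as candidate names (a crude stand-in for a real named-entity recognizer) and rewards sentences that mention names recurring elsewhere in the document. The heuristic and the function names `candidate_names` and `name_scores` are assumptions, not Saranshak's method.

```python
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def candidate_names(sentence: str) -> set[str]:
    """Capitalized tokens that are not the first word of the sentence.

    Hypothetical heuristic: it misses names that open a sentence and
    misfires on other capitalized words.
    """
    tokens = re.findall(r"[A-Za-z][A-Za-z'-]*", sentence)
    return {t for t in tokens[1:] if t[0].isupper()}


def name_scores(text: str) -> list[tuple[str, int]]:
    """Pair each sentence with a score based on recurring candidate names."""
    sentences = split_sentences(text)
    names_per_sentence = [candidate_names(s) for s in sentences]
    # Number of sentences that mention each candidate name.
    doc_freq = Counter(n for names in names_per_sentence for n in names)
    # Only names seen in more than one sentence contribute to the score.
    return [(s, sum(doc_freq[n] for n in names if doc_freq[n] > 1))
            for s, names in zip(sentences, names_per_sentence)]
```

A score like this would normally be one input among several rather than a ranking on its own.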

The Saranshak tool:

  • Automatically extracts the most relevant sentences from a document
  • Creates a summary of the document from these sentences
  • Uses a set of ranking strategies at the sentence and word levels to calculate the relevance of each sentence to the document
  • Lets the user set the length of the summary
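The features listed above (word-level and sentence-level ranking, with a user-chosen length) can be sketched by combining two scores. The specific strategies here are assumptions chosen for illustration, not Saranshak's actual ones: normalized word frequency at the word level, earlier position at the sentence level, mixed with hypothetical weights `w_word` and `w_position`. The summary length is controlled by a hypothetical `ratio` parameter (the fraction of sentences to keep).

```python
import math
import re
from collections import Counter

# Illustrative stopword list (an assumption).
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are",
             "it", "this", "that", "on", "for", "with", "as", "by", "was"}


def split_sentences(text: str) -> list[str]:
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def words(sentence: str) -> list[str]:
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def summarize(text: str, ratio: float = 0.3,
              w_word: float = 0.7, w_position: float = 0.3) -> str:
    """Keep about `ratio` of the sentences, ranked by a weighted score.

    The two strategies and their weights are illustrative assumptions.
    """
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    sentences = split_sentences(text)
    if not sentences:
        return ""
    freq = Counter(w for s in sentences for w in words(s))
    top = max(freq.values(), default=1)

    def word_score(s: str) -> float:
        # Word level: mean normalized frequency of the sentence's content words.
        tokens = words(s)
        return sum(freq[w] / top for w in tokens) / len(tokens) if tokens else 0.0

    def position_score(i: int) -> float:
        # Sentence level: earlier sentences score higher (1.0 for the first).
        return 1.0 - i / len(sentences)

    scores = [w_word * word_score(s) + w_position * position_score(i)
              for i, s in enumerate(sentences)]
    # User-controlled length: round up, but always keep at least one sentence.
    k = max(1, math.ceil(ratio * len(sentences)))
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    # Present the selected sentences in source order.
    return " ".join(sentences[i] for i in sorted(best))
```

Keeping the strategies as separate functions makes it easy to add another signal (for instance a name-based score) as one more weighted term.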