
Automatic Text Summarization Using Deep Learning Methods
  • Tasmiul Alam Shopnil,
  • Dr Arijit Das,
  • Diganta Saha
Tasmiul Alam Shopnil
Jadavpur University
Dr Arijit Das
Jadavpur University

Corresponding Author: [email protected]

Diganta Saha
Jadavpur University

Abstract

Text document summarization is critical for managing today's vast volume of textual data. This paper presents an approach to text document summarization that does not rely on word embedding techniques. Instead, our method proceeds in four steps: sentence segmentation, sentence embedding, K-means clustering, and summary generation. First, the input text is segmented into individual sentences using an NLP tool such as NLTK's sentence tokenizer. Next, we extract a contextual embedding for each sentence using the Sentence Transformer method; these embeddings capture the meaning of each sentence within the context of the surrounding text. The sentence embeddings are then grouped by K-means clustering, so that each cluster represents a set of semantically related sentences. To generate the summary, we choose from each cluster the sentence closest to the cluster centroid and order the selected sentences as they appeared in the original text. We implemented the summarizer and evaluated it on the DUC 2007 dataset, a collection of news articles with summaries written by human experts. The results show that our summarizer produces informative and concise summaries, surpassing a baseline that simply extracts the top-ranked sentences from the input text. Our work thus offers a viable alternative for generating high-quality summaries without word embedding techniques. Further research can explore enhancements to the approach and its application in other domains where text summarization is essential.
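The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the model name `all-MiniLM-L6-v2`, the Euclidean distance, the number of clusters `k`, and all function names are assumptions, since the abstract does not specify them. K-means is written out in plain NumPy to keep the sketch dependency-light; a library implementation such as scikit-learn's `KMeans` would be the more usual choice.

```python
import numpy as np


def split_sentences(text):
    """Segment text with NLTK's sentence tokenizer (needs the 'punkt' data)."""
    from nltk.tokenize import sent_tokenize  # imported lazily; optional dependency
    return sent_tokenize(text)


def embed_sentences(sentences, model_name="all-MiniLM-L6-v2"):
    """Embed sentences with sentence-transformers. The model name is an assumption."""
    from sentence_transformers import SentenceTransformer  # optional dependency
    return np.asarray(SentenceTransformer(model_name).encode(sentences))


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm. Returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Recompute labels so they match the final centroids.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dists.argmin(axis=1)


def select_summary(sentences, embeddings, k, seed=0):
    """Pick the sentence nearest each centroid, returned in original order."""
    X = np.asarray(embeddings, dtype=float)
    centroids, labels = kmeans(X, k, seed=seed)
    chosen = []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        if len(idx) == 0:
            continue
        # Euclidean distance to the centroid (the distance metric is an assumption).
        d = np.linalg.norm(X[idx] - centroids[j], axis=1)
        chosen.append(int(idx[d.argmin()]))
    return [sentences[i] for i in sorted(chosen)]
```

With NLTK and sentence-transformers installed, a full run would chain the steps as `sents = split_sentences(text)` followed by `select_summary(sents, embed_sentences(sents), k=5)`, where `k` sets the number of sentences in the summary. Sorting the chosen indices is what restores the original document order.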
10 Mar 2024: Submitted to TechRxiv
18 Mar 2024: Published in TechRxiv