A Method for Creating Structural Models of Text Documents Using Neural Networks

Authors

  • Dmitriy V. Berezkin
  • Ilya A. Kozlov
  • Polina A. Martynyuk
  • Artyom M. Panfilkin

Abstract

The article describes modern BERT-based neural network models and considers their application to natural language processing (NLP) tasks such as question answering and named entity recognition. It presents a method for automatically creating structural models of text documents. The proposed method is hybrid: it combines several NLP models. It builds a structural model of a document by extracting the sentences that correspond to the document's various aspects. Information is extracted with a BERT Question Answering model, using questions prepared separately for each aspect. The answers are filtered with a BERT Named Entity Recognition model and then used to generate the contents of each field of the structural model. The article proposes two algorithms for generating field content: the Exclusive answer choosing algorithm, used for short fields, and the Generalizing answer forming algorithm, used for voluminous fields. The article also describes the software implementation of the proposed method and discusses the results of experiments conducted to evaluate its quality.
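The abstract outlines a pipeline: per-aspect questions are sent to a QA model, the answers are filtered by an NER model, and each field is filled by one of two algorithms. The abstract does not specify how either algorithm selects or combines answers, so the Python sketch below is illustrative only. `toy_qa` and `toy_ner` are hypothetical offline stand-ins for the BERT models (word-overlap scoring and a four-digit-year rule), and the rules in `exclusive_choice` (the most confident answer wins) and `generalizing_form` (distinct answers above a threshold, joined in question order) are assumptions, not the authors' definitions.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

@dataclass
class Answer:
    text: str     # sentence returned by the QA model
    score: float  # model confidence in [0, 1]

# Hypothetical interfaces standing in for the BERT QA and BERT NER models.
QAModel = Callable[[str, str], Answer]  # (question, document) -> best answer
NERModel = Callable[[str], Set[str]]    # text -> set of entity labels

def _words(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def toy_qa(question: str, document: str) -> Answer:
    """Toy QA (assumption): pick the sentence sharing the most words with the question."""
    sentences = re.split(r"(?<=\.)\s+", document.strip())
    q = _words(question)
    scored = [Answer(s, len(q & _words(s)) / len(q)) for s in sentences]
    return max(scored, key=lambda a: a.score)

def toy_ner(text: str) -> Set[str]:
    """Toy NER (assumption): tag text containing a four-digit year as DATE."""
    return {"DATE"} if re.search(r"\b\d{4}\b", text) else set()

def exclusive_choice(answers: List[Answer]) -> str:
    """Short fields: keep only the single most confident answer (assumed rule)."""
    return max(answers, key=lambda a: a.score).text if answers else ""

def generalizing_form(answers: List[Answer], min_score: float = 0.5) -> str:
    """Voluminous fields: join distinct confident answers in order (assumed rule)."""
    parts: List[str] = []
    for a in answers:
        if a.score >= min_score and a.text not in parts:
            parts.append(a.text)
    return " ".join(parts)

# Hypothetical field spec: name -> (questions, required entity labels, "short" or "long")
FieldSpec = Dict[str, Tuple[List[str], Set[str], str]]

def build_model(document: str, spec: FieldSpec,
                qa: QAModel = toy_qa, ner: NERModel = toy_ner) -> Dict[str, str]:
    model: Dict[str, str] = {}
    for field, (questions, labels, kind) in spec.items():
        answers = [qa(q, document) for q in questions]
        if labels:  # NER filter: drop answers lacking any required entity type
            answers = [a for a in answers if ner(a.text) & labels]
        model[field] = (exclusive_choice(answers) if kind == "short"
                        else generalizing_form(answers))
    return model

doc = ("The contract was signed on 12 May 2021. "
       "The supplier will deliver servers. The supplier will provide support.")
spec: FieldSpec = {
    "date": (["When was the contract signed?"], {"DATE"}, "short"),
    "obligations": (["What will the supplier deliver?",
                     "What support will the supplier provide?",
                     "What will the supplier provide?"], set(), "long"),
}
print(build_model(doc, spec))
```

Here the short `date` field keeps a single NER-validated sentence, while the voluminous `obligations` field merges answers from three questions, two of which return the same sentence and are deduplicated. In a real system `qa` and `ner` would wrap trained BERT models; passing them as parameters keeps the field-generation logic independent of the model backend.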

Author Biographies

  • Dmitriy V. Berezkin
    Cand. Sc. (Engineering), Associate Professor, Department of Computer Systems and Networks
  • Ilya A. Kozlov
    M.Sc., Junior Researcher, Research and Educational Complex "Informatics and Control Systems"
  • Polina A. Martynyuk
    Master's student, Department of Computer Systems and Networks
  • Artyom M. Panfilkin
    Master's student, Department of Computer Systems and Networks

Published

2023-03-23

Section

Informatics, Computers and Control