Abstract

Context and relevance. Modern intelligent systems for processing and analyzing the semantic content of industry texts work with textual information containing thousands of object classes and an unlimited number of semantic relationships. All this knowledge about high-tech industries can be captured in ontological reference books. Automating the generation of such tools is an urgent task, not least to ensure researchers' access to scientific resources. However, manually creating ontologies for high-tech industries is extremely time-consuming, while using "raw" Large Language Models (LLMs) leads to "hallucinations" and data heterogeneity. Hybrid methods for the automated construction of industry ontologies are therefore needed, combining the accuracy of linguistic analysis with the scalability of LLMs.

Objective. To investigate existing methods for creating and comparing ontologies, to examine the problems that arise in doing so, and to propose new methods for the automated creation of industry-specific ontologies of scientific subject areas.

Hypothesis. Integrating linguistic statistical methods, conceptual phraseological analysis, and specialized LLMs into a single cycle will increase the completeness and accuracy of the automatic extraction of concepts and the generic relationships between them from scientific and technical documentation, while reducing "hallucinations" and the heterogeneity of the resulting ontologies.

Methods and materials. The research is based on an analysis of four groups of methods: lexico-syntactic patterns (LSP), statistical frequency analysis, deep learning methods (including LLMs), and hybrid approaches. To verify the proposed approach, a corpus of 3,000 full-text documents on aviation topics was collected.

Results. The analysis of existing methods for constructing ontologies identified their key disadvantages. Preliminary studies show that the proposed method for the automated creation of industry-specific ontological reference books is workable and makes it possible to solve the task with minimal effort. An algorithm for creating industry-specific ontologies has been developed in which LLMs are used not in isolation but in conjunction with linguistic analysis rules.

Conclusions. It is shown that none of the existing classes of methods for automatic ontology construction is universal: linguistic patterns require manual adaptation, statistical methods have limited accuracy, and neural network models suffer from "hallucinations". Unlike known solutions, the developed hybrid approach applies LLMs not as a "black box" but in conjunction with linguistic rules, providing a closed loop of iterative ontology refinement. Prospects for further work include scaling the approach to other scientific fields and integrating it with existing knowledge bases. The results will be implemented in the technological process of VINITI RAS to solve text analysis problems in various subject areas.
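The abstract does not spell out the algorithm. As a rough illustration of using LLMs "in conjunction with linguistic rules" rather than as a black box, the Python sketch below checks hypothetical LLM-proposed generic (hyponym, hypernym) pairs against evidence from a single Hearst-style lexico-syntactic pattern ("X such as Y, Z and W"). The pattern, the function names `pattern_evidence` and `filter_llm_relations`, the `min_support` threshold, the sample sentences, and the hard-coded candidate list (standing in for real LLM output) are all assumptions made for illustration, not the authors' method.

```python
import re
from collections import Counter

# Hypothetical Hearst-style pattern: "<hypernym> such as <hyponym>, <hyponym> and <hyponym>".
# Crude approximation: the hypernym is one or two lowercase words right before "such as",
# and the hyponym list runs until the next character outside [a-z ,-] (e.g. "." or ";").
SUCH_AS = re.compile(r"\b(?P<hyper>[a-z]+(?: [a-z]+)?) such as (?P<hypos>[a-z ,\-]+)")
# Separators inside the hyponym list: ", ", ", and ", " and ", " or ".
LIST_SEP = re.compile(r",\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+")


def pattern_evidence(sentences):
    """Count how often each (hyponym, hypernym) pair is matched by the pattern."""
    counts = Counter()
    for sentence in sentences:
        for match in SUCH_AS.finditer(sentence.lower()):
            hyper = match.group("hyper")
            for hypo in LIST_SEP.split(match.group("hypos")):
                hypo = hypo.strip()
                if hypo:
                    counts[(hypo, hyper)] += 1
    return counts


def filter_llm_relations(candidates, evidence, min_support=1):
    """Accept candidate pairs with enough pattern evidence; flag the rest for review."""
    accepted, flagged = [], []
    for pair in candidates:
        # Counter returns 0 for unseen pairs, so unsupported candidates are flagged.
        target = accepted if evidence[pair] >= min_support else flagged
        target.append(pair)
    return accepted, flagged


corpus = [
    "Aircraft systems such as autopilots, flight computers and weather radars.",
    "Avionics includes aircraft systems such as autopilots.",
    "Turbine engines such as turbofans and turboprops.",
]
# Stand-in for LLM output (hypothetical); the last pair has no support in the corpus.
llm_candidates = [
    ("autopilots", "aircraft systems"),
    ("turbofans", "turbine engines"),
    ("turbofans", "aircraft systems"),
]

evidence = pattern_evidence(corpus)
accepted, flagged = filter_llm_relations(llm_candidates, evidence)
print("accepted:", accepted)
print("flagged:", flagged)
```

In a closed refinement loop, flagged pairs would plausibly be sent back for re-prompting or expert review rather than discarded. A single English pattern is, of course, far narrower than the combination of statistical and conceptual phraseological analysis the abstract describes.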

Keywords

methods, ontologies, analysis, linguistic, automated
