Humanities Review

Słowa znaczące, słowa kluczowe, słowozbiory – o statystycznych metodach wyszukiwania wyrazów istotnych [Significant words, keywords, word sets: on statistical methods of finding relevant words]

2016, 60, No. 3

Polska Akademia Nauk, Uniwersytet Pedagogiczny im. Komisji Edukacji Narodowej w Krakowie, Instytut Języka Polskiego PAN, Wydział Filologiczny

Publication date

01.09.2016

Publishing model

open access

Field

arts and humanities

Discipline

philosophy, history, archeology, linguistics, literary studies, culture and religion studies, arts studies, Polish studies

Language of publication

Polish

Abstract

This article discusses the automatic extraction of relevant words from sets of texts. The author briefly presents three methods for extracting from a corpus either words distinguished by their frequency or words whose co-occurrence is not random. He first focuses on keyword analysis and then discusses the Zeta method developed by John Burrows and Hugh Craig. The third method covered is topic modelling, which has recently become very popular and consists in finding clusters of words that co-occur in similar contexts. Topic modelling was originally intended for quick content search in large collections of documents. Using a corpus of 100 Polish novels, the article shows how this method can be applied in linguistic studies.
