• In linguistics and natural language processing, a corpus (pl.: corpora) or text corpus is a dataset, consisting of natively digital and older, digitalized...
    8 KB (858 words) - 09:48, 14 November 2024
  • begin being deciphered. Large collections of parallel texts are called parallel corpora (see text corpus). Alignments of parallel corpora at sentence level...
    12 KB (1,182 words) - 13:40, 27 July 2024
  • Corpus linguistics is an empirical method for the study of language by way of a text corpus (plural corpora). Corpora are balanced, often stratified collections...
    20 KB (2,335 words) - 01:45, 24 May 2025
  • Habeas corpus (/ˈheɪbiəs ˈkɔːrpəs/; from Medieval Latin, lit. 'you should have the body') is a legal procedure by which a report can be made to a court...
    76 KB (9,363 words) - 12:59, 25 May 2025
  • The Lancaster-Oslo/Bergen (LOB) Corpus is a one-million-word collection of British English texts which was compiled in the 1970s in collaboration between...
    3 KB (230 words) - 02:09, 26 March 2025
  • Text corpora (singular: text corpus) are large and structured sets of texts, which have been systematically collected. Text corpora are used by both AI...
    23 KB (2,470 words) - 10:37, 24 May 2025
  • The Electronic Text Corpus of Sumerian Literature (ETCSL) is an online digital library of texts and translations of Sumerian literature that was created...
    4 KB (368 words) - 11:40, 17 March 2024
  • The Neo-Assyrian Text Corpus Project is an international scholarly project aimed at collecting and publishing ancient Assyrian texts of the Neo-Assyrian...
    10 KB (117 words) - 00:24, 25 February 2025
  • University Standard Corpus of Present-Day American English, better known as simply the Brown Corpus, is an electronic collection of text samples of American...
    11 KB (1,270 words) - 02:43, 26 March 2025
  • British National Corpus (BNC) is a 100-million-word text corpus of samples of written and spoken English from a wide range of sources. The corpus covers British...
    31 KB (3,894 words) - 01:18, 14 June 2024
  • The Oxford English Corpus (OEC) is a text corpus of 21st-century English, used by the makers of the Oxford English Dictionary and by Oxford University...
    4 KB (348 words) - 21:01, 11 January 2025
  • Spanish. Each estimate comes from an analysis of a different text corpus. A text corpus is a large collection of samples of written and/or spoken language...
    27 KB (750 words) - 08:51, 5 May 2025
  • The AsoSoft text corpus is the first large-scale Kurdish text corpus, collected and processed by the AsoSoft research and development group. It contains...
    1 KB (132 words) - 18:09, 24 November 2023
  • is also called the corpus cavernosum urethrae in older texts. The proximal part of the corpus spongiosum is expanded to form the urethral bulb, and lies...
    4 KB (405 words) - 05:31, 2 May 2025
  • 2019, the corpus had grown to 560 million words. As of November 2021, the Corpus of Contemporary American English is composed of 485,202 texts. According...
    9 KB (1,135 words) - 14:04, 24 May 2025
  • Look up corpus, corpora, or corpuses in Wiktionary, the free dictionary. Corpus (plural corpora) is Latin for "body". It may refer to: Text corpus, in linguistics...
    2 KB (317 words) - 00:15, 8 March 2025
  • "The Electronic Text Corpus of Sumerian Literature". Etcsl.orinst.ox.ac.uk. Retrieved 30 December 2018. "The Electronic Text Corpus of Sumerian Literature"...
    20 KB (2,183 words) - 14:28, 28 April 2025
  • The Scottish Corpus of Texts & Speech (SCOTS) is an ongoing project to build a corpus of modern-day (post-1940) written and spoken texts in Scottish English...
    3 KB (349 words) - 01:27, 28 May 2025
  • Corpus (OEC), a massive text corpus that is written in the English language. In total, the texts in the Oxford English Corpus contain more than 2 billion...
    16 KB (872 words) - 06:35, 28 April 2025
  • The Feast of Corpus Christi (Ecclesiastical Latin: Dies Sanctissimi Corporis et Sanguinis Domini Iesu Christi, lit. 'Day of the Most Holy Body and Blood...
    48 KB (5,128 words) - 10:18, 15 April 2025
  • The American National Corpus (ANC) is a text corpus of American English containing 22 million words of written and spoken data produced since 1990. Currently...
    5 KB (605 words) - 10:56, 26 January 2025
  • analysis within a given text corpus, and is used in corpus linguistics to investigate genealogies and evolution of languages and texts. A word which appears...
    27 KB (2,849 words) - 03:54, 27 May 2025
  • Corpus of Sumerian Literature. Archived from the original on 2012-05-15. Retrieved 2010-02-20. "A balbale to Nanna (Nanna B)". Electronic Text Corpus...
    40 KB (4,064 words) - 00:32, 25 May 2025
  • The Corpus Hermeticum is a collection of 17 Greek writings whose authorship is traditionally attributed to the legendary Hellenistic figure Hermes Trismegistus...
    11 KB (1,200 words) - 13:20, 14 March 2025
  • The Corpus Juris (or Iuris) Civilis ("Body of Civil Law") is the modern name for a collection of fundamental works in jurisprudence, enacted from 529 to...
    22 KB (2,736 words) - 20:39, 8 May 2025
  • Sumerian literature constitutes the earliest known corpus of recorded literature, including the religious writings and other traditional stories maintained...
    9 KB (1,026 words) - 04:33, 26 October 2024
  • The Habeas Corpus Suspension Act, 12 Stat. 755 (1863), entitled An Act relating to Habeas Corpus, and regulating Judicial Proceedings in Certain Cases...
    37 KB (4,826 words) - 15:24, 11 May 2025
  • This is a list of Amarna letters–Text corpus, categorized by: Amarna letters–localities and their rulers. It includes countries, regions, and the cities...
    9 KB (156 words) - 22:24, 24 October 2024
  • canonical measure of the performance of an LLM is its perplexity on a given text corpus. Perplexity measures how well a model predicts the contents of a dataset;...
    114 KB (11,876 words) - 06:36, 29 May 2025
  • (ESA) is a vectoral representation of text (individual words or entire documents) that uses a document corpus as a knowledge base. Specifically, in ESA...
    9 KB (1,036 words) - 19:19, 23 March 2024