  • In linguistics and natural language processing, a corpus (pl.: corpora) or text corpus is a dataset, consisting of natively digital and older, digitalized...
    8 KB (858 words) - 09:48, 14 November 2024
  • Parallel text
    begin being deciphered. Large collections of parallel texts are called parallel corpora (see text corpus). Alignments of parallel corpora at sentence level...
    12 KB (1,182 words) - 13:40, 27 July 2024
  • British National Corpus (BNC) is a 100-million-word text corpus of samples of written and spoken English from a wide range of sources. The corpus covers British...
    31 KB (3,894 words) - 01:18, 14 June 2024
  • Corpus linguistics is an empirical method for the study of language by way of a text corpus (plural corpora). Corpora are balanced, often stratified collections...
    20 KB (2,335 words) - 16:18, 24 April 2025
  • Brown Corpus
    University Standard Corpus of Present-Day American English, better known as simply the Brown Corpus, is an electronic collection of text samples of American...
    11 KB (1,270 words) - 02:43, 26 March 2025
  • The Lancaster-Oslo/Bergen (LOB) Corpus is a one-million-word collection of British English texts which was compiled in the 1970s in collaboration between...
    3 KB (230 words) - 02:09, 26 March 2025
  • Text corpora (singular: text corpus) are large and structured sets of texts, which have been systematically collected. Text corpora are used by both AI...
    23 KB (2,470 words) - 20:07, 8 March 2025
  • The Oxford English Corpus (OEC) is a text corpus of 21st-century English, used by the makers of the Oxford English Dictionary and by Oxford University...
    4 KB (348 words) - 21:01, 11 January 2025
  • Electronic Text Corpus of Sumerian Literature
    The Electronic Text Corpus of Sumerian Literature (ETCSL) is an online digital library of texts and translations of Sumerian literature that was created...
    4 KB (368 words) - 11:40, 17 March 2024
  • Habeas corpus (/ˈheɪbiəs ˈkɔːrpəs/; from Medieval Latin, lit. 'you should have the body') is a legal procedure by which a report can be made to a court...
    76 KB (9,351 words) - 14:03, 11 May 2025
  • Spanish. Each estimate comes from an analysis of a different text corpus. A text corpus is a large collection of samples of written and/or spoken language...
    27 KB (750 words) - 08:51, 5 May 2025
  • The Neo-Assyrian Text Corpus Project is an international scholarly project aimed at collecting and publishing ancient Assyrian texts of the Neo-Assyrian...
    10 KB (117 words) - 00:24, 25 February 2025
  • The AsoSoft text corpus is the first large-scale Kurdish text corpus, collected and processed by the AsoSoft research and development group. It contains...
    1 KB (132 words) - 18:09, 24 November 2023
  • Corpus spongiosum (penis)
    is also called the corpus cavernosum urethrae in older texts. The proximal part of the corpus spongiosum is expanded to form the urethral bulb, and lies...
    4 KB (405 words) - 05:31, 2 May 2025
  • Look up corpus, corpora, or corpuses in Wiktionary, the free dictionary. Corpus (plural corpora) is Latin for "body". It may refer to: Text corpus, in linguistics...
    2 KB (317 words) - 00:15, 8 March 2025
  • Quranic Arabic Corpus
    The Quranic Arabic Corpus (Arabic: المدونة القرآنية العربية, romanized: al-modwana al-Qurʾāni al-ʿArabiyya) is an annotated linguistic resource consisting...
    6 KB (599 words) - 01:25, 28 March 2025
  • 2019, the corpus had grown to 560 million words. As of November 2021, the Corpus of Contemporary American English is composed of 485,202 texts. According...
    9 KB (1,135 words) - 05:28, 17 March 2025
  • Corpus (OEC), a massive text corpus that is written in the English language. In total, the texts in the Oxford English Corpus contain more than 2 billion...
    16 KB (872 words) - 06:35, 28 April 2025
  • Corpus Juris Civilis
    The Corpus Juris (or Iuris) Civilis ("Body of Civil Law") is the modern name for a collection of fundamental works in jurisprudence, enacted from 529 to...
    22 KB (2,736 words) - 20:39, 8 May 2025
  • Word list (category Articles lacking in-text citations from December 2023)
    analysis within a given text corpus, and is used in corpus linguistics to investigate genealogies and evolution of languages and texts. A word which appears...
    27 KB (2,849 words) - 08:23, 25 April 2025
  • transliterations and translations of texts in a given corpus, and many offer supplementary material such as an introduction to the corpus, discussion of its historical...
    10 KB (399 words) - 23:07, 12 May 2024
  • Aratta
    "The Electronic Text Corpus of Sumerian Literature". Etcsl.orinst.ox.ac.uk. Retrieved 30 December 2018. "The Electronic Text Corpus of Sumerian Literature"...
    20 KB (2,183 words) - 14:28, 28 April 2025
  • Sumerian religion
    Corpus of Sumerian Literature. Archived from the original on 2012-05-15. Retrieved 2010-02-20. "A balbale to Nanna (Nanna B)". Electronic Text Corpus...
    40 KB (4,064 words) - 15:21, 14 April 2025
  • The Scottish Corpus of Texts & Speech (SCOTS) is an ongoing project to build a corpus of modern-day (post-1940) written and spoken texts in Scottish English...
    3 KB (349 words) - 10:55, 26 January 2025
  • Habeas Corpus Suspension Act (1863)
    The Habeas Corpus Suspension Act, 12 Stat. 755 (1863), entitled An Act relating to Habeas Corpus, and regulating Judicial Proceedings in Certain Cases...
    37 KB (4,826 words) - 15:24, 11 May 2025
  • (ESA) is a vectoral representation of text (individual words or entire documents) that uses a document corpus as a knowledge base. Specifically, in ESA...
    9 KB (1,036 words) - 19:19, 23 March 2024
  • Sketch Engine (category Corpus linguistics)
    Sketch Engine is a corpus manager and text analysis software developed by Lexical Computing since 2003. Its purpose is to enable people studying language...
    16 KB (1,418 words) - 09:45, 30 April 2025
  • canonical measure of the performance of an LLM is its perplexity on a given text corpus. Perplexity measures how well a model predicts the contents of a dataset;...
    114 KB (11,945 words) - 09:37, 17 May 2025
  • Corpus callosum
    The corpus callosum (Latin for "tough body"), also callosal commissure, is a wide, thick nerve tract, consisting of a flat bundle of commissural fibers...
    32 KB (3,648 words) - 12:02, 6 February 2025
  • A speech corpus (or spoken corpus) is a database of speech audio files and text transcriptions. In speech technology, speech corpora are used, among other...
    5 KB (474 words) - 15:44, 13 March 2025