In linguistics and natural language processing, a corpus (pl.: corpora) or text corpus is a dataset consisting of natively digital and older, digitized...
8 KB (858 words) - 09:48, 14 November 2024
begin being deciphered. Large collections of parallel texts are called parallel corpora (see text corpus). Alignments of parallel corpora at sentence level...
12 KB (1,182 words) - 13:40, 27 July 2024
British National Corpus (BNC) is a 100-million-word text corpus of samples of written and spoken English from a wide range of sources. The corpus covers British...
31 KB (3,894 words) - 01:18, 14 June 2024
Corpus linguistics is an empirical method for the study of language by way of a text corpus (plural corpora). Corpora are balanced, often stratified collections...
20 KB (2,335 words) - 16:18, 24 April 2025
University Standard Corpus of Present-Day American English, better known as simply the Brown Corpus, is an electronic collection of text samples of American...
11 KB (1,270 words) - 02:43, 26 March 2025
The Lancaster-Oslo/Bergen (LOB) Corpus is a one-million-word collection of British English texts which was compiled in the 1970s in collaboration between...
3 KB (230 words) - 02:09, 26 March 2025
Text corpora (singular: text corpus) are large, structured sets of texts that have been systematically collected. Text corpora are used by both AI...
23 KB (2,470 words) - 20:07, 8 March 2025
The Oxford English Corpus (OEC) is a text corpus of 21st-century English, used by the makers of the Oxford English Dictionary and by Oxford University...
4 KB (348 words) - 21:01, 11 January 2025
The Electronic Text Corpus of Sumerian Literature (ETCSL) is an online digital library of texts and translations of Sumerian literature that was created...
4 KB (368 words) - 11:40, 17 March 2024
Habeas corpus (/ˈheɪbiəs ˈkɔːrpəs/; from Medieval Latin, lit. 'you should have the body') is a legal procedure by which a report can be made to a court...
76 KB (9,351 words) - 14:03, 11 May 2025
Spanish. Each estimate comes from an analysis of a different text corpus. A text corpus is a large collection of samples of written and/or spoken language...
27 KB (750 words) - 08:51, 5 May 2025
The Neo-Assyrian Text Corpus Project is an international scholarly project aimed at collecting and publishing ancient Assyrian texts of the Neo-Assyrian...
10 KB (117 words) - 00:24, 25 February 2025
The AsoSoft text corpus is the first large-scale Kurdish text corpus, collected and processed by the AsoSoft research and development group. It contains...
1 KB (132 words) - 18:09, 24 November 2023
is also called the corpus cavernosum urethrae in older texts. The proximal part of the corpus spongiosum is expanded to form the urethral bulb, and lies...
4 KB (405 words) - 05:31, 2 May 2025
Look up corpus, corpora, or corpuses in Wiktionary, the free dictionary. Corpus (plural corpora) is Latin for "body". It may refer to: Text corpus, in linguistics...
2 KB (317 words) - 00:15, 8 March 2025
The Quranic Arabic Corpus (Arabic: المدونة القرآنية العربية, romanized: al-modwana al-Qurʾāni al-ʿArabiyya) is an annotated linguistic resource consisting...
6 KB (599 words) - 01:25, 28 March 2025
2019, the corpus had grown to 560 million words. As of November 2021, the Corpus of Contemporary American English is composed of 485,202 texts. According...
9 KB (1,135 words) - 05:28, 17 March 2025
Corpus (OEC), a massive English-language text corpus. In total, the texts in the Oxford English Corpus contain more than 2 billion...
16 KB (872 words) - 06:35, 28 April 2025
The Corpus Juris (or Iuris) Civilis ("Body of Civil Law") is the modern name for a collection of fundamental works in jurisprudence, enacted from 529 to...
22 KB (2,736 words) - 20:39, 8 May 2025
Word list (category Articles lacking in-text citations from December 2023)
analysis within a given text corpus, and is used in corpus linguistics to investigate genealogies and evolution of languages and texts. A word which appears...
27 KB (2,849 words) - 08:23, 25 April 2025
transliterations and translations of texts in a given corpus, and many offer supplementary material such as an introduction to the corpus, discussion of its historical...
10 KB (399 words) - 23:07, 12 May 2024
"The Electronic Text Corpus of Sumerian Literature". Etcsl.orinst.ox.ac.uk. Retrieved 30 December 2018. "The Electronic Text Corpus of Sumerian Literature"...
20 KB (2,183 words) - 14:28, 28 April 2025
Corpus of Sumerian Literature. Archived from the original on 2012-05-15. Retrieved 2010-02-20. "A balbale to Nanna (Nanna B)". Electronic Text Corpus...
40 KB (4,064 words) - 15:21, 14 April 2025
The Scottish Corpus of Texts & Speech (SCOTS) is an ongoing project to build a corpus of modern-day (post-1940) written and spoken texts in Scottish English...
3 KB (349 words) - 10:55, 26 January 2025
The Habeas Corpus Suspension Act, 12 Stat. 755 (1863), entitled An Act relating to Habeas Corpus, and regulating Judicial Proceedings in Certain Cases...
37 KB (4,826 words) - 15:24, 11 May 2025
(ESA) is a vectoral representation of text (individual words or entire documents) that uses a document corpus as a knowledge base. Specifically, in ESA...
9 KB (1,036 words) - 19:19, 23 March 2024
Sketch Engine (category Corpus linguistics)
Sketch Engine is a corpus manager and text analysis software developed by Lexical Computing since 2003. Its purpose is to enable people studying language...
16 KB (1,418 words) - 09:45, 30 April 2025
canonical measure of the performance of an LLM is its perplexity on a given text corpus. Perplexity measures how well a model predicts the contents of a dataset;...
114 KB (11,945 words) - 09:37, 17 May 2025
The corpus callosum (Latin for "tough body"), also callosal commissure, is a wide, thick nerve tract, consisting of a flat bundle of commissural fibers...
32 KB (3,648 words) - 12:02, 6 February 2025
A speech corpus (or spoken corpus) is a database of speech audio files and text transcriptions. In speech technology, speech corpora are used, among other...
5 KB (474 words) - 15:44, 13 March 2025