The German Reference Corpus (original: Deutsches Referenzkorpus; short: DeReKo) is an electronic archive of text corpora of contemporary written German....
4 KB (537 words) - 20:49, 27 January 2023
The Lancaster-Oslo/Bergen (LOB) Corpus is a one-million-word collection of British English texts which was compiled in the 1970s in collaboration between...
3 KB (230 words) - 02:09, 26 March 2025
Corpus linguistics is an empirical method for the study of language by way of a text corpus (plural corpora). Corpora are balanced, often stratified collections...
20 KB (2,335 words) - 10:40, 25 June 2025
The Brown University Standard Corpus of Present-Day American English, better known as simply the Brown Corpus, is an electronic collection of text samples...
11 KB (1,270 words) - 02:43, 26 March 2025
The Enron Corpus is a database of over 600,000 emails generated by 158 employees of the Enron Corporation in the years leading up to the company's collapse...
7 KB (725 words) - 03:40, 16 April 2025
The Oxford English Corpus (OEC) is a text corpus of 21st-century English, used by the makers of the Oxford English Dictionary and by Oxford University...
4 KB (348 words) - 21:01, 11 January 2025
The Cambridge International Corpus (CIC) is a collection of over 2 billion words of real spoken and written English. The texts are stored in a database...
8 KB (1,028 words) - 00:21, 18 January 2025
The Europarl Corpus is a corpus (set of documents) that consists of the proceedings of the European Parliament from 1996 to 2012. In its first release...
6 KB (800 words) - 11:02, 15 September 2022
and corpus, accusative singular of corpus "body". In reference to more than one person, the phrase is habeas corpora. The writ of habeas corpus was described...
67 KB (8,173 words) - 23:55, 20 July 2025
PropBank (redirect from PropBank Corpus)
is a corpus that is annotated with verbal propositions and their arguments—a "proposition bank". Although "PropBank" refers to a specific corpus produced...
4 KB (390 words) - 18:00, 28 June 2025
The British National Corpus (BNC) is a 100-million-word text corpus of samples of written and spoken English from a wide range of sources. The corpus covers British...
31 KB (3,894 words) - 01:18, 14 June 2024
The American National Corpus (ANC) is a text corpus of American English containing 22 million words of written and spoken data produced since 1990. Currently...
5 KB (605 words) - 10:56, 26 January 2025
The Corpus of Contemporary American English (COCA) is a one-billion-word corpus of contemporary American English. It was created by Mark Davies, retired...
9 KB (1,135 words) - 14:04, 24 May 2025
The Switchboard Telephone Speech Corpus is a corpus of spoken English consisting of almost 260 hours of speech. It was created in 1990 by Texas...
4 KB (459 words) - 18:16, 28 June 2025
Bank of English (category Corpus linguistics stubs)
French, German and Spanish corpora. Corpus of Contemporary American English (COCA); British National Corpus (BNC); The Collins Corpus; COBUILD Reference...
1 KB (153 words) - 18:12, 28 June 2025
lemmatization of the Greek corpus (2006) – a substantial undertaking, given the highly inflected nature of Greek and the complexity of the corpus, covering more than...
5 KB (599 words) - 20:04, 26 August 2024
COBUILD (category Articles lacking reliable references from December 2023)
have been the creation and analysis of an electronic corpus of contemporary text, the Collins Corpus, later leading to the development of the Bank of English...
2 KB (181 words) - 18:11, 28 June 2025
(Danish web corpus), deTenTen (German web corpus), elTenTen (Greek web corpus), enTenTen (English web corpus), esTenTen (Spanish web corpus with European/American...
12 KB (1,204 words) - 06:39, 22 November 2024
The Czech National Corpus (CNC) (Czech: Český národní korpus) is a large electronic corpus of written and spoken Czech language, developed by the Institute...
4 KB (466 words) - 11:24, 12 July 2025
Sketch Engine (category Corpus linguistics)
Sketch Engine is a corpus manager and text analysis software developed by Lexical Computing since 2003. Its purpose is to enable people studying language...
16 KB (1,437 words) - 13:48, 10 July 2025
VerbNet (category Corpus linguistics stubs)
Corpus, German Reference Corpus, Hamshahri Corpus, National Corpus of Polish, Neo-Assyrian Text Corpus Project, Persian Speech Corpus, Quranic Arabic Corpus, Russian...
1 KB (96 words) - 02:16, 17 May 2025
The Feast of Corpus Christi (Ecclesiastical Latin: Dies Sanctissimi Corporis et Sanguinis Domini Iesu Christi, lit. 'Day of the Most Holy Body and Blood...
48 KB (5,104 words) - 16:30, 12 July 2025
The Quranic Arabic Corpus (Arabic: المدونة القرآنية العربية, romanized: al-modwana al-Qurʾāni al-ʿArabiyya) is an annotated linguistic resource consisting...
6 KB (599 words) - 01:25, 28 March 2025
The Arabic Speech Corpus is a Modern Standard Arabic (MSA) speech corpus for speech synthesis. The corpus contains phonetic and orthographic transcriptions...
4 KB (388 words) - 18:44, 27 July 2023
The Bijankhan corpus (Persian: پیکرهٔ بیجنخان) is a tagged corpus that is suitable for natural language processing (NLP) research on the Persian language...
2 KB (158 words) - 12:41, 15 June 2025
The Russian National Corpus (Russian: Национальный корпус русского языка, lit. 'National Corpus of the Russian Language') is a corpus of the Russian language...
4 KB (379 words) - 18:21, 29 October 2024
lexicographic references for language learners. The JMdict Japanese-English dictionary selects its example sentences from the Tatoeba Corpus. OpenRussian...
23 KB (2,075 words) - 19:12, 23 June 2025
List of text corpora (category Corpus linguistics)
Corpus, Slovenian National Corpus, Czech National Corpus, National Corpus of Polish, Slovak National Corpora, German Reference Corpus (DeReKo): More than 4 billion...
23 KB (2,460 words) - 20:27, 20 June 2025
The International Corpus of English (ICE) is a set of text corpora representing varieties of English from around the world. Over twenty countries or groups...
11 KB (1,229 words) - 00:56, 27 February 2025
The Hamshahri Corpus (Persian: پیکره همشهری) is a sizable Persian corpus based on the Iranian newspaper Hamshahri, one of the first online Persian-language...
3 KB (327 words) - 20:27, 20 June 2025