• Data set (redirect from DataSeT)
    Loading datasets using Python:
        $ pip install datasets
        from datasets import load_dataset
        dataset = load_dataset(NAME OF DATASET)
    List of datasets for machine-learning...
    10 KB (922 words) - 11:17, 2 June 2025
  • These datasets are used in machine learning (ML) research and have been cited in peer-reviewed academic journals. Datasets are an integral part of the...
    266 KB (15,006 words) - 03:49, 7 June 2025
  • Google Dataset Search is a search engine from Google that helps researchers locate online data that is freely available for use. The company launched...
    4 KB (383 words) - 21:45, 14 August 2023
  • The Pile is an 886.03 GB diverse, open-source dataset of English text created as a training dataset for large language models (LLMs). It was constructed...
    14 KB (1,280 words) - 12:06, 1 July 2025
  • Democracy-Dictatorship Index
    index of democracy and dictatorship or simply the DD index or the DD datasets was the binary measure of democracy and dictatorship whose publication...
    32 KB (1,706 words) - 04:04, 5 July 2025
  • Apache Spark
    followed by the Dataset API. In Spark 1.x, the RDD was the primary application programming interface (API), but as of Spark 2.x use of the Dataset API is encouraged...
    30 KB (2,752 words) - 06:54, 10 June 2025
  • EPSG Geodetic Parameter Dataset
    EPSG Geodetic Parameter Dataset (also EPSG registry) is a public registry of geodetic datums, spatial reference systems, Earth ellipsoids, coordinate...
    5 KB (444 words) - 20:42, 28 January 2025
  • The Worldwide Atrocities Dataset is a dataset collected by the Computational Event Data System at Pennsylvania State University and sponsored by the Political...
    4 KB (442 words) - 05:15, 20 June 2025
  • This is a list of datasets for machine learning research. It is part of the list of datasets for machine-learning research. These datasets consist primarily...
    127 KB (7,858 words) - 10:04, 7 July 2025
  • MNIST database (redirect from MNIST dataset)
    original datasets. The creators felt that since NIST's training dataset was taken from American Census Bureau employees, while the testing dataset was taken...
    32 KB (3,254 words) - 10:53, 30 June 2025
  • The National Hydrography Dataset (NHD) is a digital database of surface water features used to make maps. It contains features such as lakes, ponds, streams...
    3 KB (394 words) - 18:03, 8 October 2024
  • CORA dataset
    database ReAnalysis) is a global oceanographic temperature and salinity dataset produced and maintained by the French institute IFREMER. Most of those...
    7 KB (571 words) - 21:48, 25 September 2023
  • of widespread internet access, researchers began compiling massive text datasets from the web ("web as corpus") to train statistical language models. Following...
    131 KB (13,790 words) - 21:58, 6 July 2025
  • The National Elevation Dataset (NED) consists of high precision topography or ground surface elevation data (digital elevation model) for the United States...
    2 KB (227 words) - 01:30, 18 December 2023
  • The UAH satellite temperature dataset, developed at the University of Alabama in Huntsville, infers the temperature of various atmospheric layers from...
    14 KB (1,341 words) - 16:58, 4 June 2024
  • A national lidar dataset refers to a high-resolution lidar dataset comprising most—and ideally all—of a nation's terrain. Datasets of this type typically...
    4 KB (97 words) - 16:43, 16 February 2025
  • method of measuring how many different types (e.g. species) there are in a dataset (e.g. a community). Diversity indices are statistical representations of...
    25 KB (3,462 words) - 06:28, 24 June 2025
  • high-profile datasets that describe qualities of different governments, annually published and publicly available for free. These datasets are used by...
    11 KB (905 words) - 21:06, 25 May 2025
  • coordinating efforts across multiple agencies towards a National LIDAR Dataset. The first meeting, a National LIDAR Initiative Strategy Meeting, was held...
    19 KB (395 words) - 06:35, 28 June 2025
  • Interlinked Datasets (VoID) is an RDF vocabulary, and a set of instructions, that enables the discovery and usage of linked data sets. A linked dataset is a...
    2 KB (136 words) - 11:34, 28 February 2023
  • a sheep if located on a grassland. Statistical classification List of datasets for machine learning research Hierarchical classification Ron Kohavi; Foster...
    20 KB (2,212 words) - 08:39, 27 May 2025
  • or tagging relevant metadata within a dataset to enable machines to interpret the data accurately. The dataset can take various forms, including images...
    7 KB (675 words) - 14:20, 3 July 2025
  • a property of a statistic computed on a sample dataset in relation to a parametric model of the dataset. It is opposed to the concept of an ancillary statistic...
    10 KB (1,548 words) - 16:15, 10 January 2025
  • The European Climate Assessment and Dataset (ECA&D) is a database of daily meteorological station observations across Europe and is gradually being extended...
    17 KB (1,973 words) - 00:56, 29 June 2024
  • Iris flower data set
    The iris data set is widely used as a beginner's dataset for machine learning purposes. The dataset is included in R base and Python in the machine learning...
    18 KB (954 words) - 23:38, 16 April 2025
  • Cross-validation (statistics)
    problem, a model is usually given a dataset of known data on which training is run (training dataset), and a dataset of unknown data (or first seen data)...
    44 KB (5,784 words) - 14:10, 9 July 2025
  • COVID-19 datasets are public databases for sharing case data and medical information related to the COVID-19 pandemic. Johns Hopkins Coronavirus Resource...
    13 KB (880 words) - 07:16, 9 March 2025
  • Thus the mean s(i) over all data of the entire dataset is a measure of how appropriately the data have been clustered. If there...
    14 KB (2,220 words) - 20:29, 20 June 2025
  • the dataset. The algorithm is most effective when the underlying dataset is imbalanced. It exploits the structures of conditional imbalanced datasets more...
    6 KB (951 words) - 13:19, 22 August 2022
  • allowed for that attribute. An example of random partitioning in a 2D dataset of normally distributed points is shown in the first figure for a non-anomalous...
    37 KB (4,553 words) - 03:02, 16 June 2025