• Apache Hadoop (/həˈduːp/) is a collection of open-source software utilities for reliable, scalable, distributed computing. It provides a software framework...
    48 KB (4,939 words) - 19:00, 7 May 2025
  • MapReduce (redirect from Hadoop map)
    implementation that has support for distributed shuffles is part of Apache Hadoop. The name MapReduce originally referred to the proprietary Google technology...
    46 KB (5,480 words) - 18:47, 12 December 2024
  • the benefits of dimensional models on Hadoop and similar big data frameworks. However, some features of Hadoop require us to slightly adapt the standard...
    13 KB (1,656 words) - 07:08, 4 April 2025
  • Apache Parquet (category Hadoop)
    storage format in the Apache Hadoop ecosystem. It is similar to RCFile and ORC, the other columnar-storage file formats in Hadoop, and is compatible with most...
    13 KB (1,135 words) - 20:16, 19 May 2025
  • Data lake
    enterprises were "starting to extract and place data for analytics into a single, Hadoop-based repository." Many companies use cloud storage services such as Google...
    9 KB (1,033 words) - 18:24, 14 March 2025
  • Apache Avro
    procedure call and data serialization framework developed within Apache's Hadoop project. It uses JSON for defining data types and protocols, and serializes...
    13 KB (1,326 words) - 05:49, 25 February 2025
  • Apache Hadoop is an open source software project sponsored by The Apache Software Foundation which implements the MapReduce architecture. Hadoop now encompasses...
    25 KB (3,139 words) - 02:30, 22 December 2024
  • Sqoop (category Hadoop)
    interface application for transferring data between relational databases and Hadoop. The Apache Sqoop project was retired in June 2021 and moved to the Apache...
    6 KB (439 words) - 19:04, 17 July 2024
  • Apache Hive (category Hadoop)
    Hive is a data warehouse software project. It is built on top of Apache Hadoop for providing data query and analysis. Hive gives an SQL-like interface...
    21 KB (2,300 words) - 01:15, 14 March 2025
  • Apache Mahout (category Hadoop)
    linear algebra. In the past, many of the implementations used the Apache Hadoop platform; however, today it is primarily focused on Apache Spark. Mahout...
    8 KB (654 words) - 14:33, 29 May 2025
  • Python-based open source implementation of a software forge Ambari: makes Hadoop cluster provisioning, managing, and monitoring dead simple Ant: Java-based...
    38 KB (4,300 words) - 16:50, 29 May 2025
  • Actian Vector
    processing version of Vector, in Hadoop with storage in HDFS. Actian Vortex was later renamed to Actian Vector in Hadoop. The basic architecture and design...
    28 KB (2,221 words) - 04:30, 23 November 2024
  • source, massively parallel, relational database engine supporting OLTP for Hadoop using Apache HBase as its backing store. Phoenix provides a JDBC driver...
    5 KB (306 words) - 16:50, 29 May 2025
  • Doug Cutting
    manages both projects. Cutting and Cafarella were also co-founders of Apache Hadoop. Cutting graduated from Stanford University in 1985 with a bachelor's degree...
    8 KB (686 words) - 15:33, 27 July 2024
  • Cloudera (category Hadoop)
    in 2009 by Doug Cutting, a co-founder of Hadoop. Cloudera originally offered a free product based on Hadoop, earning revenue by selling support and consulting...
    15 KB (1,093 words) - 19:33, 20 April 2025
  • The following figures show how CSDs can be utilized in an Apache Hadoop cluster and on a Message Passing Interface-based distributed environment...
    10 KB (1,287 words) - 11:06, 27 May 2025
  • Ali Ghodsi
    resource management and scheduling design in distributed systems such as Hadoop. In 2013, he co-founded Databricks, a company that commercializes Spark...
    6 KB (441 words) - 15:02, 29 March 2025
  • Quantcast File System (category Hadoop)
    batch-processing workloads. It was designed as an alternative to the Apache Hadoop Distributed File System (HDFS), intended to deliver better performance and...
    4 KB (458 words) - 20:37, 3 February 2024
  • Aiyara cluster
    literally an elephant to reflect its underlying software stack, which is Apache Hadoop. Like Beowulf, an Aiyara cluster does not define a particular software stack...
    3 KB (443 words) - 14:36, 19 April 2023
  • Amazon S3. Amazon EMR deploys open source, big data frameworks like Apache Hadoop, Spark, Presto, HBase, and Flink. Amazon Redshift fully manages petabyte-scale...
    4 KB (431 words) - 13:10, 4 August 2024
  • Hue (software) (redirect from Hue (Hadoop))
    Hue (Hadoop User Experience) is an open-source SQL Cloud Editor, licensed under the Apache License 2.0. Hue is an open-source SQL Assistant for querying...
    2 KB (119 words) - 17:42, 17 May 2023
  • Apache Impala (category Hadoop)
    (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop. Impala has been described as the open-source equivalent of Google F1, which...
    6 KB (555 words) - 13:30, 13 April 2025
  • works with Apache Hadoop and other distributed file systems and Revolution Analytics has partnered with IBM to further integrate Hadoop into Revolution...
    18 KB (1,625 words) - 00:26, 2 June 2025
  • XGBoost
    single machine, as well as the distributed processing frameworks Apache Hadoop, Apache Spark, Apache Flink, and Dask. XGBoost gained much popularity and...
    14 KB (1,322 words) - 00:11, 20 May 2025
  • Apache Kylin
    designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Hadoop and Alluxio supporting extremely large datasets. It was originally developed...
    6 KB (472 words) - 20:41, 22 December 2023
  • in-house development (according to LexisNexis). It is an alternative to Hadoop and other Big data platforms. The HPCC system architecture includes two...
    11 KB (1,116 words) - 18:56, 30 April 2025
  • create C-Store, a column-oriented database, and HadoopDB, a hybrid of relational databases and Hadoop. Both database systems were commercialized by companies...
    8 KB (583 words) - 21:36, 6 April 2025
  • integration: HBase and Rcfile__HadoopSummit2010". 2010-06-30. "Facebook has the world's largest Hadoop cluster!". 2010-05-09. "Apache Hadoop India Summit 2011 talk...
    12 KB (1,445 words) - 17:50, 2 August 2024
  • Cirata (category Hadoop)
    technology that moves large Internet of Things (IoT) datasets, edge data, and Hadoop. The company is dual-headquartered in Sheffield, England and San Ramon,...
    6 KB (413 words) - 20:39, 14 May 2025
  • Lambda architecture
    updates completely replacing existing precomputed views. By 2014, Apache Hadoop was estimated to be a leading batch-processing system. Later, other, relational...
    10 KB (1,145 words) - 02:33, 11 February 2025