• Apache Hadoop (/həˈduːp/) is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving...
    49 KB (5,094 words) - 23:30, 26 April 2024
  • Apache ZooKeeper
    Apache Hadoop Apache Accumulo Apache HBase Apache Hive Apache Kafka Apache Drill Apache Solr Apache Spark Apache NiFi Apache Druid Apache Helix Apache Pinot...
    8 KB (714 words) - 15:45, 24 October 2023
  • Apache Parquet is a free and open-source column-oriented data storage format in the Apache Hadoop ecosystem. It is similar to RCFile and ORC, the other...
    9 KB (740 words) - 21:39, 3 January 2024
  • Apache Hive
    Apache Hive is a data warehouse software project, built on top of Apache Hadoop for providing data query and analysis. Hive gives an SQL-like interface...
    21 KB (2,300 words) - 02:11, 16 April 2024
  • Apache Avro
    remote procedure call and data serialization framework developed within Apache's Hadoop project. It uses JSON for defining data types and protocols, and serializes...
    13 KB (1,326 words) - 18:53, 24 April 2024
  • Java. It is developed as part of Apache Software Foundation's Apache Hadoop project and runs on top of HDFS (Hadoop Distributed File System) or Alluxio...
    10 KB (818 words) - 02:06, 12 April 2024
  • Apache Impala is an open source massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop. Impala...
    7 KB (577 words) - 03:15, 17 October 2022
  • platforms such as Apache Spark Beam, an uber-API for big data Bigtop: a project for the development of packaging and tests of the Apache Hadoop ecosystem. Bloodhound:...
    41 KB (4,600 words) - 22:48, 17 April 2024
  • past, many of the implementations used the Apache Hadoop platform; however, today it is primarily focused on Apache Spark. Mahout also provides Java/Scala...
    8 KB (649 words) - 11:14, 4 September 2023
  • Apache Pig is a high-level platform for creating programs that run on Apache Hadoop. The language for this platform is called Pig Latin. Pig can execute...
    11 KB (979 words) - 18:51, 15 July 2022
  • Apache Spark
    testing), Hadoop YARN, Apache Mesos or Kubernetes. For distributed storage, Spark can interface with a wide variety, including Alluxio, Hadoop Distributed...
    30 KB (2,732 words) - 02:20, 12 April 2024
  • Apache ORC
    Hadoop ecosystem such as RCFile and Parquet. It is used by most of the data processing frameworks Apache Spark, Apache Hive, Apache Flink and Apache Hadoop...
    4 KB (222 words) - 05:54, 12 January 2024
  • Apache Oozie is a server-based workflow scheduling system to manage Hadoop jobs. Workflows in Oozie are defined as a collection of control flow and action...
    3 KB (204 words) - 20:30, 27 March 2023
  • Apache Nutch
    have been spun out into their own subproject, called Hadoop. In January 2005, Nutch joined the Apache Incubator, from which it graduated to become a subproject...
    13 KB (625 words) - 22:52, 19 February 2024
  • MapReduce (redirect from Hadoop map)
    implementation that has support for distributed shuffles is part of Apache Hadoop. The name MapReduce originally referred to the proprietary Google technology...
    46 KB (5,491 words) - 08:05, 19 December 2023
  • Hortonworks (category Hadoop)
    Platform (HDP): based on Apache Hadoop, Apache Hive, Apache Spark Hortonworks DataFlow (HDF): based on Apache NiFi, Apache Storm, Apache Kafka Hortonworks DataPlane...
    6 KB (474 words) - 19:49, 3 April 2023
  • The Apache Ambari project intends to simplify the management of Apache Hadoop clusters using a web UI. It also integrates with other existing applications...
    2 KB (106 words) - 00:30, 12 April 2024
  • Apache Cassandra
    6, released Apr 12 2010, added support for integrated caching, and Apache Hadoop MapReduce 0.7, released Jan 08 2011, added secondary indexes and online...
    25 KB (2,256 words) - 00:11, 22 February 2024
  • MapR (category Hadoop)
    single computer cluster, including big data workloads such as Apache Hadoop and Apache Spark, a distributed file system, a multi-model database management...
    7 KB (526 words) - 16:44, 13 January 2024
  • Apache Drill
    include: All Hadoop distributions (HDFS API 2.3+), including Apache Hadoop, MapR, CDH and Amazon EMR NoSQL: MongoDB, Apache HBase, Apache Cassandra Online...
    7 KB (700 words) - 02:01, 12 April 2024
  • Apache Solr
    more advanced customization. Apache Solr is developed in an open, collaborative manner by the Apache Solr project at the Apache Software Foundation. In 2004...
    15 KB (1,446 words) - 01:51, 12 April 2024
  • Open source
    Apache Software Foundation, which supports community projects such as the open-source framework Apache Hadoop and the open-source HTTP server Apache HTTP...
    106 KB (11,859 words) - 13:00, 28 April 2024
  • Apache Phoenix is an open source, massively parallel, relational database engine supporting OLTP for Hadoop using Apache HBase as its backing store. Phoenix...
    5 KB (306 words) - 19:56, 30 March 2024
  • Apache Kudu
    Apache Kudu is a free and open source column-oriented data store of the Apache Hadoop ecosystem. It is compatible with most of the data processing frameworks...
    4 KB (323 words) - 12:51, 23 December 2023
  • Doug Cutting
    are now managed through the Apache Software Foundation. Cutting and Cafarella are also the co-founders of Apache Hadoop. Cutting graduated from Stanford...
    8 KB (688 words) - 14:35, 19 February 2024
  • Apache Accumulo is a highly scalable sorted, distributed key-value store based on Google's Bigtable. It is a system built on top of Apache Hadoop, Apache...
    6 KB (586 words) - 21:28, 16 April 2023
  • Cloudera (category Hadoop)
    Hadoop Development". The New York Times. VentureBeat. October 27, 2010. Rao, Leena (7 November 2011). "Ignition, Accel, Greylock Put $40M In Apache Hadoop...
    15 KB (1,071 words) - 23:18, 13 March 2024
  • Cascading (software) (category Hadoop)
    abstraction layer for Apache Hadoop and Apache Flink. Cascading is used to create and execute complex data processing workflows on a Hadoop cluster using any...
    10 KB (776 words) - 19:08, 23 June 2023
  • Oracle NoSQL Database
    from OND natively into Hadoop MapReduce jobs. One use for this class is to read NoSQL database records into Oracle Loader for Hadoop. Oracle Big Data SQL...
    19 KB (2,000 words) - 00:24, 5 December 2023
  • Lambda architecture
    data warehouse, Yahoo has taken a similar approach, also using Apache Storm, Apache Hadoop, and Druid. The Netflix Suro project has separate processing...
    11 KB (1,158 words) - 02:29, 12 April 2024