Apache Hadoop (/həˈduːp/) is a collection of open-source software utilities for reliable, scalable, distributed computing. It provides a software framework...
48 KB (4,947 words) - 02:29, 8 June 2025
MapReduce (redirect from Hadoop map)
implementation that has support for distributed shuffles is part of Apache Hadoop. The name MapReduce originally referred to the proprietary Google technology...
46 KB (5,480 words) - 18:47, 12 December 2024
the benefits of dimensional models on Hadoop and similar big data frameworks. However, some features of Hadoop require us to slightly adapt the standard...
13 KB (1,656 words) - 07:08, 4 April 2025
procedure call and data serialization framework developed within Apache's Hadoop project. It uses JSON for defining data types and protocols, and serializes...
13 KB (1,326 words) - 05:49, 25 February 2025
enterprises were "starting to extract and place data for analytics into a single, Hadoop-based repository." Many companies use cloud storage services such as Google...
9 KB (1,033 words) - 18:24, 14 March 2025
Apache Parquet (category Hadoop)
storage format in the Apache Hadoop ecosystem. It is similar to RCFile and ORC, the other columnar-storage file formats in Hadoop, and is compatible with most...
13 KB (1,135 words) - 20:16, 19 May 2025
Data-intensive computing (section Hadoop)
Apache Hadoop is an open source software project sponsored by The Apache Software Foundation which implements the MapReduce architecture. Hadoop now encompasses...
25 KB (3,139 words) - 02:30, 22 December 2024
Sqoop (category Hadoop)
interface application for transferring data between relational databases and Hadoop. The Apache Sqoop project was retired in June 2021 and moved to the Apache...
6 KB (439 words) - 19:04, 17 July 2024
Apache Hive (category Hadoop)
Hive is a data warehouse software project. It is built on top of Apache Hadoop for providing data query and analysis. Hive gives an SQL-like interface...
21 KB (2,300 words) - 01:15, 14 March 2025
resource management and scheduling design in distributed systems such as Hadoop. In 2013, he co-founded Databricks, a company that commercializes Spark...
6 KB (441 words) - 15:02, 29 March 2025
Hortonworks (category Hadoop)
that developed and supported open-source software (primarily around Apache Hadoop) designed to manage big data and associated processing. Hortonworks software...
7 KB (512 words) - 21:42, 17 January 2025
Python-based open source implementation of a software forge Ambari: makes Hadoop cluster provisioning, managing, and monitoring dead simple Ant: Java-based...
38 KB (4,300 words) - 16:50, 29 May 2025
Cloudera (category Hadoop)
in 2009 by Doug Cutting, a co-founder of Hadoop. Cloudera originally offered a free product based on Hadoop, earning revenue by selling support and consulting...
15 KB (1,093 words) - 19:33, 20 April 2025
streaming (unofficial); UDPCast (unofficial); Play Framework web server (unofficial); Hadoop NameNode default port (unofficial); PHP-FPM default port (unofficial); qBittorrent's...
320 KB (13,096 words) - 12:18, 8 June 2025
Actian Vector (section Actian Vector in Hadoop)
processing version of Vector, in Hadoop with storage in HDFS. Actian Vortex was later renamed to Actian Vector in Hadoop. The basic architecture and design...
28 KB (2,221 words) - 04:30, 23 November 2024
Amazon S3. Amazon EMR deploys open source, big data frameworks like Apache Hadoop, Spark, Presto, HBase, and Flink. Amazon Redshift fully manages petabyte-scale...
4 KB (431 words) - 13:10, 4 August 2024
The following figures show how CSDs can be utilized in an Apache Hadoop cluster and on a Message Passing Interface-based distributed environment...
10 KB (1,287 words) - 11:06, 27 May 2025
in-house development (according to LexisNexis). It is an alternative to Hadoop and other big data platforms. The HPCC system architecture includes two...
12 KB (1,124 words) - 02:36, 8 June 2025
a Hadoop subproject devoted to large-scale log collection and analysis. Chukwa is built on top of HDFS and MapReduce framework and inherits Hadoop's scalability...
334 bytes (78 words) - 12:15, 16 October 2020
Apache ZooKeeper (category Hadoop)
large distributed systems (see Use cases). ZooKeeper was a sub-project of Hadoop but is now a top-level Apache project in its own right. ZooKeeper's architecture...
8 KB (733 words) - 13:35, 18 May 2025
Hue (software) (redirect from Hue (Hadoop))
Hue (Hadoop User Experience) is an open-source SQL Cloud Editor, licensed under the Apache License 2.0. Hue is an open-source SQL Assistant for querying...
2 KB (119 words) - 17:42, 17 May 2023
Kellerman, Jim. "HBase: structured storage of sparse data for Hadoop" (PDF). Retrieved 20 February 2016. java - Cassandra - transaction support...
12 KB (319 words) - 23:30, 13 March 2025
manages both projects. Cutting and Cafarella were also co-founders of Apache Hadoop. Cutting graduated from Stanford University in 1985 with a bachelor's degree...
8 KB (686 words) - 15:33, 27 July 2024
integration: HBase and Rcfile__HadoopSummit2010". 2010-06-30. "Facebook has the world's largest Hadoop cluster!". 2010-05-09. "Apache Hadoop India Summit 2011 talk...
12 KB (1,445 words) - 17:50, 2 August 2024
for use in big data applications with cluster computing frameworks like Hadoop and Apache Spark, as a compressed block can be decompressed without having...
22 KB (2,859 words) - 05:43, 24 January 2025
designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Hadoop and Alluxio supporting extremely large datasets. It was originally developed...
6 KB (472 words) - 20:41, 22 December 2023
of records. A 2011 conference presentation on SciDB promoted it as "not Hadoop". Marilyn Matz became chief executive of Paradigm4 in 2014. Comparison of object...
3 KB (202 words) - 07:09, 8 January 2025
source, massively parallel, relational database engine supporting OLTP for Hadoop using Apache HBase as its backing store. Phoenix provides a JDBC driver...
5 KB (306 words) - 16:50, 29 May 2025
file systems (DFS) of this type are the Google File System (GFS) and the Hadoop Distributed File System (HDFS). The file systems of both are implemented...
60 KB (7,502 words) - 19:51, 4 June 2025
Cascading (software) (category Hadoop)
abstraction layer for Apache Hadoop and Apache Flink. Cascading is used to create and execute complex data processing workflows on a Hadoop cluster using any JVM-based...
10 KB (776 words) - 21:37, 30 April 2025