Hadoop Search Results

Apache Hadoop

Apache Hadoop (/həˈduːp/) is a collection of open-source software utilities for reliable, scalable, distributed computing. It provides a software framework...

48 KB (4,947 words) - 02:29, 8 June 2025

MapReduce (redirect from Hadoop map)

implementation that has support for distributed shuffles is part of Apache Hadoop. The name MapReduce originally referred to the proprietary Google technology...

46 KB (5,480 words) - 18:47, 12 December 2024

Dimensional modeling (section Dimensional models, Hadoop, and big data)

the benefits of dimensional models on Hadoop and similar big data frameworks. However, some features of Hadoop require us to slightly adapt the standard...

13 KB (1,656 words) - 07:08, 4 April 2025

Apache Avro

procedure call and data serialization framework developed within Apache's Hadoop project. It uses JSON for defining data types and protocols, and serializes...

13 KB (1,326 words) - 05:49, 25 February 2025

Data-intensive computing (section Hadoop)

Apache Hadoop is an open source software project sponsored by The Apache Software Foundation which implements the MapReduce architecture. Hadoop now encompasses...

25 KB (3,139 words) - 02:30, 22 December 2024

Data lake

enterprises were "starting to extract and place data for analytics into a single, Hadoop-based repository." Many companies use cloud storage services such as Google...

9 KB (1,033 words) - 18:24, 14 March 2025

Apache Parquet (category Hadoop)

storage format in the Apache Hadoop ecosystem. It is similar to RCFile and ORC, the other columnar-storage file formats in Hadoop, and is compatible with most...

13 KB (1,135 words) - 20:16, 19 May 2025

Sqoop (category Hadoop)

interface application for transferring data between relational databases and Hadoop. The Apache Sqoop project was retired in June 2021 and moved to the Apache...

6 KB (439 words) - 19:04, 17 July 2024

Ali Ghodsi

resource management and scheduling design in distributed systems such as Hadoop. In 2013, he co-founded Databricks, a company that commercializes Spark...

6 KB (441 words) - 15:02, 29 March 2025

List of Apache Software Foundation projects

Python-based open source implementation of a software forge Ambari: makes Hadoop cluster provisioning, managing, and monitoring dead simple Ant: Java-based...

38 KB (4,300 words) - 16:50, 29 May 2025

Hortonworks (category Hadoop)

that developed and supported open-source software (primarily around Apache Hadoop) designed to manage big data and associated processing. Hortonworks software...

7 KB (512 words) - 21:42, 17 January 2025

In-situ processing

The following figures (from ) show how CSDs can be utilized in an Apache Hadoop cluster and on a Message Passing Interface-based distributed environment...

10 KB (1,287 words) - 11:06, 27 May 2025

Chukwa

a Hadoop subproject devoted to large-scale log collection and analysis. Chukwa is built on top of HDFS and MapReduce framework and inherits Hadoop's scalability...

334 bytes (78 words) - 12:15, 16 October 2020

List of TCP and UDP port numbers

streaming Unofficial UDPCast Unofficial Play Framework web server Unofficial Hadoop NameNode default port Unofficial PHP-FPM default port Unofficial qBittorrent's...

320 KB (13,096 words) - 12:18, 8 June 2025

HPCC

in-house development (according to LexisNexis). It is an alternative to Hadoop and other Big data platforms. The HPCC system architecture includes two...

12 KB (1,124 words) - 02:36, 8 June 2025

Cloudera (category Hadoop)

in 2009 by Doug Cutting, a co-founder of Hadoop. Cloudera originally offered a free product based on Hadoop, earning revenue by selling support and consulting...

14 KB (1,093 words) - 21:20, 9 June 2025

Apache Hive (category Hadoop)

Hive is a data warehouse software project. It is built on top of Apache Hadoop for providing data query and analysis. Hive gives an SQL-like interface...

21 KB (2,300 words) - 01:15, 14 March 2025

Doug Cutting

manages both projects. Cutting and Cafarella were also co-founders of Apache Hadoop. Cutting graduated from Stanford University in 1985 with a bachelor's degree...

8 KB (686 words) - 15:33, 27 July 2024

Cloud analytics

Amazon S3. Amazon EMR deploys open source, big data frameworks like Apache Hadoop, Spark, Presto, HBase, and Flink. Amazon Redshift fully manages petabyte-scale...

4 KB (431 words) - 13:10, 4 August 2024

Bzip2

for use in big data applications with cluster computing frameworks like Hadoop and Apache Spark, as a compressed block can be decompressed without having...

22 KB (2,859 words) - 05:43, 24 January 2025

Comparison of structured storage software

Kellerman, Jim. "HBase: structured storage of sparse data for Hadoop" (PDF). Retrieved 20 February 2016. java - Cassandra - transaction support...

12 KB (319 words) - 23:30, 13 March 2025

Actian Vector (section Actian Vector in Hadoop)

processing version of Vector, in Hadoop with storage in HDFS. Actian Vortex was later renamed to Actian Vector in Hadoop. The basic architecture and design...

28 KB (2,207 words) - 04:30, 23 November 2024

Hue (software) (redirect from Hue (Hadoop))

Hue (Hadoop User Experience) is an open-source SQL Cloud Editor, licensed under the Apache License 2.0. Hue is an open-source SQL Assistant for querying...

2 KB (119 words) - 17:42, 17 May 2023

XGBoost

single machine, as well as the distributed processing frameworks Apache Hadoop, Apache Spark, Apache Flink, and Dask. XGBoost gained much popularity and...

14 KB (1,322 words) - 00:11, 20 May 2025

RCFile

integration: HBase and Rcfile__HadoopSummit2010". 2010-06-30. "Facebook has the world's largest Hadoop cluster!". 2010-05-09. "Apache Hadoop India Summit 2011 talk...

12 KB (1,445 words) - 17:50, 2 August 2024

Sector/Sphere

architecture two to four times better performance than the competitor Hadoop, which is written in Java, a statement supported by an Aster Data Systems...

8 KB (955 words) - 20:45, 10 October 2024

Apache Kylin

designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Hadoop and Alluxio supporting extremely large datasets. It was originally developed...

6 KB (472 words) - 20:41, 22 December 2023

Distributed file system for cloud (section Hadoop distributed file system)

file systems (DFS) of this type are the Google File System (GFS) and the Hadoop Distributed File System (HDFS). The file systems of both are implemented...

60 KB (7,502 words) - 19:51, 4 June 2025

SciDB

of records. A 2011 conference presentation on SciDB promoted it as "not Hadoop". Marilyn Matz became chief executive of Paradigm4 in 2014. Comparison of object...

3 KB (202 words) - 07:09, 8 January 2025

GPFS (section Compared to Hadoop Distributed File System (HDFS))

heterogeneous cluster, disaster recovery, security, DMAPI, HSM and ILM. Hadoop's HDFS filesystem is designed to store similar or greater quantities of...

15 KB (1,679 words) - 12:38, 18 December 2024