Apache Arrow

Developer	Apache Software Foundation
Initial release	10 October 2016
Latest release	19.0.1 / 16 February 2025
Repository	https://github.com/apache/arrow
Programming languages	C, C++, C#, Go, Java, JavaScript, MATLAB, Python, R, Ruby, Rust
Type	Data format, algorithms
License	Apache License 2.0
Website	arrow.apache.org

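Apache Arrow is a language-agnostic software framework for developing data analytics applications that process columnar data. It contains a standardized column-oriented memory format that can represent flat and hierarchical data for efficient analytic operations on modern CPU and GPU hardware^[2]^[3]^[4]^[5]^[6]. This reduces or eliminates factors that limit the feasibility of working with large datasets, such as the cost, volatility, or physical constraints of DRAM^[7].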
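To make the idea of a standardized columnar memory format more concrete, here is a minimal, stdlib-only Python sketch of how a nullable 64-bit integer column and a nullable string column can be laid out as flat buffers in the style of the Arrow format: a validity bitmap (least-significant bit first, 1 = non-null), a fixed-width values buffer, and, for strings, an int32 offsets buffer plus a data buffer. The helper names are hypothetical and this is not the pyarrow API; the sketch also ignores the alignment and padding rules of the real specification.

```python
import struct
from typing import List, Optional, Tuple

# Hypothetical helper names; a simplified illustration of an Arrow-style layout,
# not the pyarrow API. Alignment and padding rules of the real spec are ignored.


def build_validity_bitmap(values: List[Optional[object]]) -> bytes:
    """Bit i (least-significant bit first) is 1 when values[i] is not None."""
    bitmap = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v is not None:
            bitmap[i // 8] |= 1 << (i % 8)
    return bytes(bitmap)


def build_int64_column(values: List[Optional[int]]) -> Tuple[bytes, bytes]:
    """Return (validity bitmap, values buffer) for a nullable int64 column.

    Every slot, including null ones, takes 8 bytes, so slot i starts at byte 8*i.
    Null slots are filled with 0 here purely for convenience.
    """
    validity = build_validity_bitmap(values)
    data = struct.pack(f"<{len(values)}q", *(0 if v is None else v for v in values))
    return validity, data


def build_utf8_column(values: List[Optional[str]]) -> Tuple[bytes, bytes, bytes]:
    """Return (validity bitmap, int32 offsets buffer, data buffer).

    String i is data[offsets[i]:offsets[i + 1]]; a null has equal offsets.
    """
    validity = build_validity_bitmap(values)
    offsets = [0]
    data = bytearray()
    for v in values:
        if v is not None:
            data += v.encode("utf-8")
        offsets.append(len(data))
    return validity, struct.pack(f"<{len(offsets)}i", *offsets), bytes(data)


def read_int64(validity: bytes, data: bytes, i: int) -> Optional[int]:
    """O(1) random access: test one bit, then read one fixed-width slot."""
    bit = (validity[i // 8] >> (i % 8)) & 1
    if bit == 0:
        return None
    return struct.unpack_from("<q", data, 8 * i)[0]


validity, data = build_int64_column([1, None, 3])
print(validity, read_int64(validity, data, 1), read_int64(validity, data, 2))  # b'\x05' None 3
```

Because every value sits at a position computable from its index, an analytic kernel can scan or index a column without parsing per-row records.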

Interoperability

Arrow can be used with Apache Parquet, Apache Spark, NumPy, PySpark, pandas, and other data processing libraries.

The project includes native software libraries for C, C++, C#, Go, Java, JavaScript, Julia, MATLAB, Python, R, Ruby, and Rust. Arrow allows zero-copy reads and fast data access and interchange between these languages and systems without serialization overhead^[2].
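Zero-copy access means a consumer reads the producer's buffers in place instead of deserializing them into new objects. The following stdlib-only Python sketch is an in-process analogy chosen for illustration: it is not how Arrow moves data between languages, which relies on a shared buffer layout exchanged through mechanisms such as the Arrow C data interface or the IPC format. The function names are hypothetical.

```python
import json
import struct

# In-process analogy only: real Arrow exchanges buffers across languages via
# mechanisms such as the C data interface or the IPC format, not memoryview.


def produce_column(values):
    """Producer: pack int64 values (native byte order) into one contiguous buffer."""
    return bytearray(struct.pack(f"={len(values)}q", *values))


def consume_zero_copy(buf):
    """Consumer: reinterpret the producer's bytes as int64 values in place."""
    return memoryview(buf).cast("q")


def consume_with_serialization(buf):
    """Contrast: encode to JSON text and parse it back, creating new objects."""
    return json.loads(json.dumps(list(memoryview(buf).cast("q"))))


buf = produce_column([10, 20, 30, 40])
shared = consume_zero_copy(buf)
copied = consume_with_serialization(buf)

# Overwrite the first value in the producer's buffer (same length, so no resize).
buf[0:8] = struct.pack("=q", 99)
print(shared[0], copied[0])  # 99 10
```

Only the zero-copy view observes the update, because it refers to the same memory; the serialized path paid for an encode, a parse, and a full copy.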

Applications

Arrow has been used in diverse domains, including data analytics^[8], genomics^[9]^[7], and cloud computing^[10].

Comparison with Apache Parquet and ORC

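Apache Parquet and Apache ORC are popular examples of on-disk columnar data formats. Arrow is designed to complement these formats for processing data in memory^[11]. The hardware resource engineering trade-offs for in-memory processing differ from those associated with on-disk storage^[12]. The Arrow and Parquet projects include libraries that allow data to be read and written between the two formats^[13].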
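As a simplified illustration of one such trade-off (our own example, not taken from the cited sources): on-disk columnar formats commonly apply compact encodings such as run-length encoding (RLE), which both Parquet and ORC support, because storage space and I/O dominate the cost there. In memory, the priority shifts toward layouts in which any element can be located cheaply. The stdlib-only Python sketch below uses hypothetical helper names and contrasts the size of an RLE-encoded column with the extra work needed to read one element back.

```python
from bisect import bisect_right
from itertools import accumulate, groupby
from typing import List, Tuple

# Hypothetical helpers contrasting a storage-oriented encoding with a flat array.


def rle_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Run-length encode: collapse each run of equal values into (value, length)."""
    return [(v, sum(1 for _ in group)) for v, group in groupby(values)]


def rle_get(runs: List[Tuple[int, int]], i: int) -> int:
    """Find element i in RLE data: build run end positions, then binary-search.

    Building the prefix sums costs O(number of runs) on every call here; caching
    them would reduce a lookup to O(log runs), still more work than a flat array.
    """
    ends = list(accumulate(length for _, length in runs))  # exclusive end of each run
    return runs[bisect_right(ends, i)][0]


column = [7] * 1000 + [8] * 1000          # flat layout: 2000 slots, column[i] is O(1)
runs = rle_encode(column)                  # encoded layout: only 2 (value, length) pairs
print(len(column), len(runs))              # 2000 2
print(column[1500], rle_get(runs, 1500))   # 8 8
```

The encoded form is far smaller, which matters when reading from disk, while the flat form answers `column[i]` with a single index computation, which matters for repeated in-memory access.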

Governance

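Apache Arrow was announced by The Apache Software Foundation on 17 February 2016^[14], and its development is led by a coalition of developers from other open-source data analytics projects^[15]^[16]^[6]^[17]^[18]. The initial codebase and the Java library were based on code from Apache Drill^[14].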

References

[1] “Release 19.0.1”. 16 February 2025. Retrieved 20 February 2025.
[2] “Apache Arrow and Distributed Compute with Kubernetes” (13 December 2018). Retrieved 5 March 2025.
[3] Baer (17 February 2016). “Apache Arrow: Lining Up The Ducks In A Row... Or Column”. Seeking Alpha. Retrieved 5 March 2025.
[4] Baer (25 February 2019). “Apache Arrow: The little data accelerator that could”. ZDNet. Retrieved 5 March 2025.
[5] Hall (23 February 2016). “Apache Arrow's Columnar Layouts of Data Could Accelerate Hadoop, Spark”. The New Stack. Retrieved 5 March 2025.
[6] “Apache Arrow aims to speed access to big data | InfoWorld”. web.archive.org (19 August 2016). Retrieved 5 March 2025.
[7] Tanveer Ahmad (2019). “ArrowSAM: In-Memory Genomics Data Processing through Apache Arrow Framework”. bioRxiv: 741843. doi:10.1101/741843.
[8] Dinsmore T.W. (2016). “In-Memory Analytics: Satisfying the Need for Speed”. Disruptive Analytics. Apress, Berkeley, CA. pp. 97–116. doi:10.1007/978-1-4842-1311-7_5. ISBN 978-1-4842-1312-4.
[9] “Scalable genomics: from raw data to aligned reads on Apache YARN”. IEEE International Conference on Big Data (2016): 1232–1241.
[10] “Return of the runtimes: rethinking the language runtime system for the cloud 3.0 era”. Proceedings of the 16th Workshop on Hot Topics in Operating Systems (ACM) (2017): 138–143. doi:10.1145/3102980.3103003.
[11] Le Dem. “Apache Arrow and Apache Parquet: Why We Needed Different Projects for Columnar Data, On Disk and In-Memory”. KDnuggets.
[12] “Apache Arrow vs. Parquet and ORC: Do we really need a third Apache project for columnar data representation?” (31 October 2017).
[13] “PyArrow: Reading and Writing the Apache Parquet Format”.
[14] “The Apache® Software Foundation Announces Apache Arrow™ as a Top-Level Project”. The Apache Software Foundation Blog (17 February 2016). Archived from the original on 13 March 2016.
[15] Martin (17 February 2016). “Apache Foundation rushes out Apache Arrow as top-level project”. The Register.
[16] “Big data gets a new open-source project, Apache Arrow: It offers performance improvements of more than 100x on analytical workloads, the foundation says.” (17 February 2016). Archived from the original on 27 July 2016. Retrieved 31 January 2018.
[17] Le Dem (28 November 2016). “The first release of Apache Arrow”. SD Times.
[18] “Julien Le Dem on the Future of Column-Oriented Data Processing with Apache Arrow”.

External links

Apache Arrow project website
Apache Arrow source code on GitHub

Apache Software Foundation
Top-level projects	Accumulo, ActiveMQ, Airflow, Ambari, Ant, Aries, Arrow, Apache HTTP Server, APR, Avro, Axis, Axis2, Beam, Bloodhound, Brooklyn, Buildr, Calcite, Camel, Cassandra, Cayenne, Chemistry, CloudStack, Cocoon, Cordova, CouchDB, cTAKES, CXF, Derby, Directory, Drill, Druid, Empire-db, Felix, Flex, Flink, Flume, FreeMarker, Geronimo, Giraph, Gump, Hadoop, HBase, Helix, Hive, Impala, Jackrabbit, James, Jena, Jini, JMeter, Kafka, Kudu, Kylin, Lucene, Mahout, Maven, MINA, mod_perl, MyFaces, NetBeans, Nutch, NuttX, OFBiz, Oozie, OpenEJB, OpenJPA, OpenNLP, OpenOffice, ORC, PDFBox, Parquet, Phoenix, POI, Pig, Pinot, Pivot, Qpid, Roller, RocketMQ, Samza, ServiceMix, Shiro, SINGA, Sling, Solr, Spark, Storm, SpamAssassin, Struts 1, Struts 2, Subversion, Apache Superset, SystemDS, Tapestry, Thrift, Tika, Tomcat, Traffic Server, Turbine, UIMA, Velocity, Wicket, Xalan, Xerces, XMLBeans, Yetus, ZooKeeper
Commons	BCEL, BSF, Collections, Daemon, DBUtils, Email, IO, Jelly, Lang, Apache Commons Logging, Math
Incubator	MXNet, Taverna
Other projects	Apache Batik, Chainsaw, FOP, Ivy, log4j
Attic	Abdera, Apex, AxKit, Beehive, Bluesky, iBATIS, C++ Standard Library, Cactus, Click, Continuum, Deltacloud, Etch, Excalibur, Forrest, Hama, Harmony, HiveMind, Jakarta, Lenya, Marmotta, ODE, Shale, Slide, Shindig, Stanbol, Tuscany, Wave, Wink, XML
License	Apache License