Published in

Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing - HPDC '10

DOI: 10.1145/1851476.1851594

Pydoop

Proceedings article published in 2010 by Simone Leo and Gianluigi Zanetti
This paper was not found in any repository; the policy of its publisher is unknown or unclear.

Full text: Unavailable

Preprint: archiving allowed
Postprint: archiving allowed
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

MapReduce has become increasingly popular as a simple and efficient paradigm for large-scale data processing. One of the main reasons for its popularity is the availability of a production-level open source implementation, Hadoop, written in Java. There is considerable interest, however, in tools that enable Python programmers to access the framework, due to the language's high popularity. Here we present a Python package that provides an API for both the MapReduce and the distributed file system sections of Hadoop, and show its advantages with respect to the other available solutions for Hadoop Python programming, Jython and Hadoop Streaming.
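
For readers unfamiliar with the paradigm the abstract refers to, the following is a minimal sketch of a MapReduce-style word count in plain Python. It is not Pydoop's API and does not use Hadoop: the function names (map_words, reduce_counts, run_wordcount) are hypothetical, and everything runs in a single process, with an in-memory sort standing in for the framework's shuffle phase.

```python
from itertools import groupby
from operator import itemgetter


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1


def reduce_counts(word, counts):
    """Reduce step: sum all the counts emitted for a single word."""
    return word, sum(counts)


def run_wordcount(lines):
    """Run map, shuffle (sort and group by key), and reduce in one process."""
    # Map: apply map_words to every line and collect all (word, 1) pairs.
    pairs = [pair for line in lines for pair in map_words(line)]
    # Shuffle: sorting by key stands in for Hadoop's shuffle-and-sort phase
    # (an assumption of this sketch; the real phase is distributed).
    pairs.sort(key=itemgetter(0))
    # Reduce: call reduce_counts once per distinct word.
    return dict(
        reduce_counts(word, (count for _, count in group))
        for word, group in groupby(pairs, key=itemgetter(0))
    )


if __name__ == "__main__":
    print(run_wordcount(["a b a", "B c"]))
```

In a real deployment the map and reduce functions would run on different nodes and the framework would handle grouping by key; the sketch only shows the shape of the two user-supplied functions.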