Playing with Apache Hive, MongoDB and the MTA

Apache Hive is a popular data warehouse system for Hadoop that lets you run SQL queries on top of Hadoop by translating them into Map/Reduce jobs. Because of the high latency Hadoop incurs when executing Map/Reduce jobs, Hive cannot be used in applications that require fast access to data. A common technique is to use Hive to pre-aggregate raw logs stored in HDFS and then sync the aggregated data to a data warehouse.

In this post we’re going to describe how to install Hive and then, as New York City straphangers, load subway train movement data from the MTA into HDFS, run Hive queries to compute the average number of daily train movements per line, and store the result in MongoDB.
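To give a feel for the kind of query involved, here is a minimal HiveQL sketch of that aggregation; the table name mta_movements and its columns line_id and movement_date are hypothetical stand-ins for however the raw MTA feed ends up being mapped in Hive:

-- Average number of train movements per line per day (table and columns are illustrative)
SELECT line_id,
       AVG(daily_count) AS avg_daily_movements
FROM (
  SELECT line_id, movement_date, COUNT(*) AS daily_count
  FROM mta_movements
  GROUP BY line_id, movement_date
) daily
GROUP BY line_id;

The output of a query like this can then be exported to MongoDB as a separate step.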

Using Hadoop Pig with MongoDB

In this post, we’ll see how to install MongoDB support for Pig and illustrate it with an example where we join two MongoDB collections with Pig and store the result in a new collection.

Requirements

Building Mongo Hadoop

We’re going to use the Git project developed by 10gen, with a slight modification of our own. Because the Pig language doesn’t support variables that start with an underscore (e.g., _id), which MongoDB uses, we added the ability to access such fields by replacing the _ prefix with u__, so _id becomes u__id.
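For example, here is roughly how a script using this modified adapter would refer to the _id field; the collection name and the extra field are made up for illustration, and MongoLoader’s exact constructor arguments may vary between versions:

-- With the modified adapter, _id is exposed as u__id (collection and fields are illustrative)
people = LOAD 'mongodb://localhost:27017/test.people'
         USING com.mongodb.hadoop.pig.MongoLoader('u__id:chararray, name:chararray');
ids = FOREACH people GENERATE u__id;
DUMP ids;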

First get the source:

$ git clone https://github.com/darthbear/mongo-hadoop

Compile the core and Pig parts of it and copy the resulting jars to a local directory:

$ ./sbt package
$ ./sbt mongo-hadoop-core/package
$ ./sbt mongo-hadoop-pig/package
$ mkdir ~/pig_libraries
$ cp ./pig/target/mongo-hadoop-pig-1.1.0-SNAPSHOT.jar \
./target/mongo-hadoop-core-1.1.0-SNAPSHOT.jar ~/pig_libraries

Running a join query with Pig on MongoDB collections

One thing you can’t do in MongoDB is join two collections. So let’s see how we can do it simply with a Pig script.
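As a preview, here is a minimal sketch of what such a join script might look like. The database and collection names (test.users, test.orders, test.user_orders), the field lists, the jar paths and the MongoDB Java driver version are all illustrative, so adjust them to your setup:

-- Register the jars built earlier plus the MongoDB Java driver (paths/versions are illustrative)
REGISTER /home/hadoop/pig_libraries/mongo-hadoop-core-1.1.0-SNAPSHOT.jar;
REGISTER /home/hadoop/pig_libraries/mongo-hadoop-pig-1.1.0-SNAPSHOT.jar;
REGISTER /home/hadoop/pig_libraries/mongo-2.7.3.jar;

-- Load the two collections (field lists are made up for this example)
users  = LOAD 'mongodb://localhost:27017/test.users'
         USING com.mongodb.hadoop.pig.MongoLoader('user_id:chararray, name:chararray');
orders = LOAD 'mongodb://localhost:27017/test.orders'
         USING com.mongodb.hadoop.pig.MongoLoader('user_id:chararray, amount:double');

-- Join on the shared user_id field and keep a few columns
joined = JOIN users BY user_id, orders BY user_id;
result = FOREACH joined GENERATE users::user_id AS user_id, users::name AS name, orders::amount AS amount;

-- Write the joined records into a new collection
STORE result INTO 'mongodb://localhost:27017/test.user_orders'
      USING com.mongodb.hadoop.pig.MongoStorage();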