linux | Chimpler

Installing and comparing MySQL/MariaDB, MongoDB, Vertica, Hive and Impala (Part 1)

2013/05/10 25 Comments

Data analysts spend much of their day running aggregations, typically summing and averaging columns under different filters. When tables grow to hundreds of millions or billions of rows, these operations become extremely expensive and the choice of database engine becomes crucial. The more queries an analyst can run in a day, the better they can understand the data.

In this post, we’re going to install five popular databases on Ubuntu Linux 12.04:

MySQL / MariaDB 10.0: Row-based database
MongoDB 2.4: NoSQL database
Vertica Community Edition 6: Columnar database (similar to Infobright, InfiniDB, …)
Hive 0.10: Data warehouse built on top of HDFS using MapReduce
Impala 1.0: Database implemented on top of HDFS (compatible with Hive) based on Dremel that can use different data formats (raw CSV format, Parquet columnar format, …)

Then we’ll provide scripts to populate them with test data, run some simple aggregation queries and measure the response times. The tests run on a single box without any tuning, using a relatively small dataset (160 million rows); we plan to run more thorough tests in the cloud later with much bigger datasets (billions of rows). The goal here is only to give a general idea of the performance of each database.
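To make the populate, aggregate and measure loop concrete, here is a minimal sketch in Python. It is not one of the scripts from the post: it uses SQLite from the standard library as a stand-in for the five databases, and the `events` table, its columns and the function names are all hypothetical, chosen only for illustration.

```python
import random
import sqlite3
import time


def build_test_table(conn, n_rows, seed=0):
    """Create a hypothetical `events` table and fill it with random rows."""
    rng = random.Random(seed)
    conn.execute(
        "CREATE TABLE events (country TEXT, clicks INTEGER, revenue REAL)"
    )
    countries = ["US", "FR", "DE", "JP"]
    rows = (
        (rng.choice(countries), rng.randint(0, 100), rng.random() * 10)
        for _ in range(n_rows)
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    conn.commit()


def timed_aggregation(conn, min_clicks):
    """Run a filtered SUM/AVG aggregation; return (rows, elapsed_seconds)."""
    query = (
        "SELECT country, COUNT(*), SUM(revenue), AVG(clicks) "
        "FROM events WHERE clicks >= ? "
        "GROUP BY country ORDER BY country"
    )
    start = time.perf_counter()
    rows = conn.execute(query, (min_clicks,)).fetchall()
    return rows, time.perf_counter() - start


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_test_table(conn, n_rows=100_000)
    rows, elapsed = timed_aggregation(conn, min_clicks=50)
    for row in rows:
        print(row)
    print(f"aggregation took {elapsed:.4f} s")
```

The same pattern (load, run a filtered GROUP BY, time the call) carries over to any engine with a Python driver, although a single timing like this says little without repeated runs and a realistic data volume.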

Filed under hadoop Tagged with columnar, comparison, dremel, hadoop, hive, impala, installation, linux, mariadb, mysql, parquet, ubuntu, vertica

Installing Storm on Ubuntu

2013/01/25 1 Comment

Storm is an open source ETL created by Nathan Marz in late 2011. Unlike Hadoop, where data is processed offline in big batches, Storm aggregates streaming data on the fly so that the aggregated data is immediately available. It is scalable and fault tolerant (it guarantees no data loss), and the benchmarks showed that each node can process over a million tuples per second.
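The difference from batch processing can be illustrated with a small sketch of on-the-fly aggregation. This is plain Python, not Storm's API: the `RunningAggregate` class and its methods are hypothetical names, and the code only shows the idea of updating aggregates per tuple so that results are readable at any moment.

```python
class RunningAggregate:
    """Keep per-key counts and sums that are updated one tuple at a time."""

    def __init__(self):
        self.count = {}
        self.total = {}

    def update(self, key, value):
        # Each incoming tuple adjusts the aggregate immediately,
        # so no batch job is needed before results can be read.
        self.count[key] = self.count.get(key, 0) + 1
        self.total[key] = self.total.get(key, 0.0) + value

    def mean(self, key):
        # Return None for a key that has not been seen yet.
        if key not in self.count:
            return None
        return self.total[key] / self.count[key]


if __name__ == "__main__":
    agg = RunningAggregate()
    stream = [("page_a", 2.0), ("page_b", 5.0), ("page_a", 4.0)]
    for key, value in stream:
        agg.update(key, value)
        print(key, agg.count[key], agg.mean(key))
```

A real Storm topology distributes this kind of stateful update across many workers; the sketch keeps everything in one process.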

Below we describe the steps to install Storm on Ubuntu Linux, along with the issues we ran into during the process.

Filed under Installation Tagged with bigdata, etl, installation, linux, realtime, storm, ubuntu
