Playing with Hazelcast, a distributed data grid, on Amazon EC2 with jclouds-cli

Hazelcast is an open-source in-memory data grid. It stores data in memory, distributed across a cluster of servers, and it can execute distributed tasks. It can be used as an in-memory database that can be queried with SQL-like queries or with any filter you can implement in Java. To prevent data loss, the in-memory data can be backed by persistent storage (file, relational database, NoSQL database, …). Data can be persisted synchronously, at the moment it is written to the in-memory database (write-through), or asynchronously, so that writes can be batched (write-behind).
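The following minimal plain-Java sketch makes the difference between write-through and write-behind concrete. It does not use Hazelcast: `BackingStore` and `CachedMap` are hypothetical classes that stand in for the persistent storage and the in-memory map. In Hazelcast itself, persistence is plugged in by implementing the `MapStore` interface. A `write-delay-seconds` of 0 selects write-through, and a positive value selects write-behind.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class WritePolicyDemo {

    /** Hypothetical stand-in for the persistent store (file, RDBMS, NoSQL, ...). */
    static class BackingStore {
        final Map<String, String> rows = new HashMap<>();
        int writeCalls = 0; // number of round trips to the store

        void storeAll(Map<String, String> batch) {
            writeCalls++;
            rows.putAll(batch);
        }
    }

    /** In-memory map persisted synchronously (write-through) or in batches (write-behind). */
    static class CachedMap {
        private final Map<String, String> memory = new HashMap<>();
        private final Map<String, String> pending = new LinkedHashMap<>();
        private final BackingStore store;
        private final boolean writeBehind;

        CachedMap(BackingStore store, boolean writeBehind) {
            this.store = store;
            this.writeBehind = writeBehind;
        }

        void put(String key, String value) {
            memory.put(key, value); // reads are always served from memory
            if (writeBehind) {
                pending.put(key, value); // queued; repeated keys collapse into one write
            } else {
                store.storeAll(Map.of(key, value)); // persisted before put() returns
            }
        }

        String get(String key) {
            return memory.get(key);
        }

        /** Write-behind only: push all queued entries to the store in one batch. */
        void flush() {
            if (!pending.isEmpty()) {
                store.storeAll(new HashMap<>(pending));
                pending.clear();
            }
        }
    }

    public static void main(String[] args) {
        BackingStore throughStore = new BackingStore();
        BackingStore behindStore = new BackingStore();
        CachedMap through = new CachedMap(throughStore, false);
        CachedMap behind = new CachedMap(behindStore, true);

        for (int i = 0; i < 5; i++) {
            through.put("k" + i, "v" + i);
            behind.put("k" + i, "v" + i);
        }
        System.out.println("write-through: " + throughStore.writeCalls + " store calls");
        System.out.println("write-behind before flush: " + behindStore.rows.size() + " rows persisted");
        behind.flush();
        System.out.println("write-behind after flush: " + behindStore.writeCalls + " store call, "
                + behindStore.rows.size() + " rows");
    }
}
```

When you run it, the write-through map makes one store call per `put`. The write-behind map persists nothing until `flush()`, and then it writes all five entries in a single batch. The price of batching is a window during which the written data exists only in memory.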

In read- and write-intensive applications, relying on a relational database server (or a master/slave configuration) can be very inefficient, because the database often becomes a bottleneck and a single point of failure. With data in memory, reads and writes are very fast. Because the data is distributed and replicated, there is no single point of failure. For instance, with a replication factor of 3, each piece of data has a primary copy and 2 backup copies on different nodes. If one node of the cluster goes down, the other nodes take over and the data is reassigned to them. In the catastrophic event that the database goes down, writes to the cache are queued in a log file, so they can be persisted in the database once it is back up.
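The replication scheme described above can be sketched as a partition table. This is a simplified Java model, not Hazelcast's actual placement algorithm. Each key hashes to one of 271 partitions (Hazelcast's default partition count). Every partition has a primary and two backups on distinct members. When the primary fails, the first backup is promoted and another live member receives a new copy. The member names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionFailoverDemo {

    static final int PARTITION_COUNT = 271;  // Hazelcast's default partition count
    static final int REPLICATION_FACTOR = 3; // 1 primary + 2 backups, as in the text

    /** Partition a key belongs to: its hash mapped into [0, PARTITION_COUNT). */
    static int partitionOf(String key) {
        return Math.floorMod(key.hashCode(), PARTITION_COUNT);
    }

    /** Simplified placement: primary chosen round-robin, backups on the next members. */
    static List<String> replicasOf(int partition, List<String> members) {
        int copies = Math.min(REPLICATION_FACTOR, members.size());
        List<String> replicas = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            replicas.add(members.get((partition + i) % members.size()));
        }
        return replicas; // index 0 is the primary, the rest are backups
    }

    /** Drop the failed member: the first surviving backup becomes primary,
     *  then another live member is added as a new backup if one is available. */
    static List<String> failover(List<String> replicas, String failed, List<String> live) {
        List<String> result = new ArrayList<>(replicas);
        result.remove(failed);
        int wanted = Math.min(REPLICATION_FACTOR, live.size());
        for (String member : live) {
            if (result.size() >= wanted) break;
            if (!result.contains(member)) result.add(member); // receives a copy from the primary
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> members = new ArrayList<>(List.of("node-a", "node-b", "node-c", "node-d"));
        int p = partitionOf("customer:42");
        List<String> before = replicasOf(p, members);
        System.out.println("partition " + p + ": " + before);

        String failed = before.get(0); // the primary goes down
        members.remove(failed);
        List<String> after = failover(before, failed, members);
        System.out.println(failed + " down, partition " + p + ": " + after);
    }
}
```

The backup already holds the data, so promoting it is just a change of ownership. Only the newly added backup needs to be copied over the network.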

Other products offer features similar to Hazelcast's:

  • Oracle Coherence: a very robust and popular data grid solution used by many financial companies and by systems that handle a large number of transactions. It also has an active community.
  • VMware GemFire: used by some financial companies, it provides most of the features Coherence has, but its community is much smaller, so documentation is harder to find.
  • GigaSpaces XAP: provides a lot of features. Among other things, it can dynamically instantiate services on multiple servers and handles service failover.

In this tutorial we are going to deploy Hazelcast on an EC2 cluster, run some operations in the data grid, and finally stop the cluster.


Deploying Hadoop on EC2 with Whirr

Apache Whirr is a set of tools for deploying cloud services. It can be used on Amazon Elastic Compute Cloud (EC2), Rackspace Cloud and many other cloud providers.

Requirements

You need an Amazon EC2 account. If you don't have one yet, that's good news: you are eligible for the AWS Free Tier, which gives you 750 hours of micro instance computing per month for free for 12 months. The example below uses micro instances, so you won't pay anything as long as the combined running time of all your instances stays under 750 hours per month.
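The 750 hours are pooled across all micro instances in the account. A multi-node cluster therefore uses up the allowance faster than a single server. The short Java sketch below does the arithmetic. The cluster sizes are hypothetical, because the number of nodes used in this post is set in the Whirr configuration.

```java
public class FreeTierBudget {

    // Free Tier allowance quoted in the post: 750 micro instance hours per month.
    static final int FREE_HOURS_PER_MONTH = 750;

    /** Wall-clock hours a cluster of n instances can run before using up the shared allowance. */
    static double clusterHours(int instances) {
        if (instances <= 0) throw new IllegalArgumentException("instances must be positive");
        return (double) FREE_HOURS_PER_MONTH / instances;
    }

    public static void main(String[] args) {
        int hoursIn31Days = 24 * 31; // one instance running around the clock
        System.out.println("1 instance, 31 days: " + hoursIn31Days + " h of " + FREE_HOURS_PER_MONTH);
        for (int n : new int[] {1, 3, 5}) { // hypothetical cluster sizes
            System.out.printf("%d instance(s): %.0f h of cluster uptime%n", n, clusterHours(n));
        }
    }
}
```

A single instance running all month (744 hours) fits within the allowance. A hypothetical 3-node cluster, however, can run for only 250 hours, so it is worth stopping the cluster when you are done.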

Make sure that you have Java JDK 6 or 7 installed on your machine.

Installation

You can download Whirr from http://www.apache.org/dyn/closer.cgi/whirr/

Uncompress the archive:

tar xvfz whirr-0.8.1.tar.gz

Now we are going to write a config file that tells Whirr how to deploy Hadoop on Amazon EC2. Create the file ~/hadoop-ec2.properties with the following content: