Playing with HazelCast, a distributed datagrid on Amazon EC2 with jclouds-cli

datagridHazelcast is an open-source in-memory datagrid that allows to store data in memory distributed across a cluster of servers and to execute distributed tasks. It can be used as an in-memory database that can be queried using SQL-like queries or any filter that you can implement in Java. To prevent data loss, data in memory can be backed by a persistent storage (file, relational database, NoSQL database, …). Data can be persisted synchronously when the data are written to the in-memory database (write through) or asynchronously to batch the writes (write behind).

In applications which are read and write intensive, relying on a relational database server (or a master/slaves configuration) can be very inefficient as it often becomes a bottleneck and a single point of failure. With data in memory, reads and writes are very fast and as data is distributed and replicated there is no single point of failure. Indeed, if we consider a replication factor of 3, we have a primary and 2 backups nodes so if one node of the cluster were to go down, other nodes of the network can take over and get reassigned the data. In the catastrophic event where the database goes down, writes in the cache are queued in a log file so the writes can be persisted in the database once it is backed up.

There are other products offering similar features than Hazelcast:

  • Oracle Coherence: it is a very robust and popular data grid solution used by many financial companies and systems having to deal with a lot of transaction. It also has an active community.
  • VMWare Gemfire: It is used by some financial companies and provides most of the features Coherence has but the community is much smaller so it’s harder to find documentation.
  • GigaSpaces XAP: The system provides a lot of features. It allows among other things to dynamically instantiate services on multiple servers and handles services failover.

In this tutorial we are going to deploy hazelcast on an EC2 cluster. Then we will run some operations in the datagrid and finally we will stop the cluster.

Requirement

To do this tutorial, we would need:

  • An amazon web services account(works with Free Tier)
  • JDK 1.6 or 1.7
  • git to fetch the hazelcast code example
  • Maven (to compile the project)
  • Jclouds-cli (to deploy to the cloud)

Compiling the project

To fetch the project:

$ git clone https://github.com/fredang/hazelcast-jclouds-example.git

The project is divided into 3 modules:

  • common: common classes uses by server and client
  • server: code to start a hazelcast node locally on a ec2 instance
  • client: code to run some operations on the hazelcast cluster

To compile the project, type:

$ cd hazelcast-jclouds-example
$ mvn clean install

Then create the server package that we are going to deploy on the cluster:

$ cd server
$ mvn assembly:single -Daws.accessKey=[AWS_ACCESS_KEY] -Daws.secretKey=[AWS_SECRET_KEY]

Then create the client package that we are going to use to run some operations on the cluster:

$ cd client
$ mvn assembly:single -Daws.accessKey=[AWS_ACCESS_KEY] -Daws.secretKey=[AWS_SECRET_KEY]

Hazelcast

Server

In this tutorial, we are using two distributed maps: person and budget-account. We define them in the file hazelcast.xml. In this config file we also specify the group and the password to use to access to the datagrid. Note that we use the tag <aws enamed=”true”> so that HazelCast can be aware of the nodes in the EC2 cluster by querying the Amazon AWS API. We also define some indexes on the distributed map person to speed up queries.

<hazelcast
	xsi:schemaLocation="http://www.hazelcast.com/schema/config hazelcast-config-2.0.xsd"
	xmlns="http://www.hazelcast.com/schema/config" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
	<group>
		<name>dev</name>
		<password>dev-pass</password>
	</group>
	<network>
		<join>
			<tcp-ip conn-timeout-seconds="20" enabled="false">
			</tcp-ip>
			<multicast enabled="false">
				<multicast-group>224.2.2.3</multicast-group>
				<multicast-port>54327</multicast-port>
			</multicast>
			<tcp-ip enabled="false">
				<interface>192.168.1.2</interface>
			</tcp-ip>
			<aws enabled="true">
			    <!-- aws.accessKey and aws.secretKey will be replaced by maven -->
				<access-key>${aws.accessKey}</access-key>
				<secret-key>${aws.secretKey}</secret-key>
				<region>us-east-1</region>
				<security-group-name>jclouds#hazelcast</security-group-name>
			</aws>
		</join>
	</network>
	<map name="budget-account">
		<backup-count>2</backup-count>
		<eviction-policy>NONE</eviction-policy>
	</map>

	<map name="person">
		<backup-count>2</backup-count>
		<eviction-policy>NONE</eviction-policy>

		<indexes>
			<index ordered="false">ssn</index>
			<index ordered="true">age</index>
			<index ordered="false">company.name</index>
			<index ordered="false">company.address.state</index>
			<index ordered="false">address.state</index>
		</indexes>
	</map>

</hazelcast>

The code to start a hazelcast node that will be part of the hazelcast cluster is quite straightforward:

public class NodeStarter {

	public static void main(String args[]) throws Exception {
		Config config = new ClasspathXmlConfig("hazelcast.xml");
		config.setInstanceName(InetAddress.getLocalHost().getHostName());
		Hazelcast.newHazelcastInstance(config);
	}
}

Clients

We wrote some clients that will write some data in those distributed maps, read data, listen to the changes, and run sql-like queries on those.
You can look at the sourcecode in PersonClient.java and AccountClient.java.

To initialize a hazelcast client:

	public static HazelcastInstance initHazelcastClient() throws Exception {
		ClientConfig hazelCastClientConfig = new ClientConfig();
		hazelCastClientConfig.getGroupConfig().setName("dev").setPassword("dev-pass");

		AWSCredentialsProvider awsCredentialProvider = new ClasspathPropertiesFileCredentialsProvider("aws.properties");
		AmazonEC2Client ec2 = new AmazonEC2Client(awsCredentialProvider);
        DescribeAvailabilityZonesResult availabilityZonesResult = ec2.describeAvailabilityZones();
        System.out.println("You have access to " + availabilityZonesResult.getAvailabilityZones().size() +
        		" Availability Zones.");

        DescribeInstancesResult describeInstancesRequest = ec2.describeInstances();

        for (Reservation reservation: describeInstancesRequest.getReservations()) {
        	for(Instance instance: reservation.getInstances()) {
        		for(GroupIdentifier group: instance.getSecurityGroups()) {
        			if (group.getGroupName().equals("jclouds#hazelcast")) {
                        System.out.println("EC2 instance " + instance.getPublicIpAddress());
                		hazelCastClientConfig.addAddress(instance.getPublicIpAddress(),
                				instance.getPublicIpAddress() + ":5701");
        			}
        		}
        	}
        }
        HazelcastInstance hazelCastClient = HazelcastClient.newHazelcastClient(hazelCastClientConfig);

        return hazelCastClient;
	}

We use the Amazon AWS api to get the list of instances belonging to the hazelcast group. Then we use the ip addresses of those instances to initialize the hazelcast client.

To write a data in the distributed map, the code is pretty straightforward:

HazelcastInstance instance = initHazelcastClient();
IMap<String, Person> personMap = instance.getMap("person");

Person person = new Person();
person.setAccountId("chimpler");
person.setBudget(123.45);
personMap.add(person.getAccountId(), person);

To get a data from the distributed map:

HazelcastInstance instance = initHazelcastClient();
IMap<String, Person> personMap = instance.getMap("person");
Person person = personMap.get("chimpler");

To run a query on the distributed map:

HazelcastInstance instance = initHazelcastClient();
IMap<String, Person> personMap = instance.getMap("person");
Collection<Person> persons = personMap.values(new SqlPredicate("address.state=NY AND age > 20"));

Initializing the EC2 cluster

To initialize the EC2 cluster, we are using jclouds-cli to communicate with cloud providers(Amazon EC2, Rackspace, …) to instantiate and configure servers.

In this tutorial, we will use jclouds-cli in interactive mode. We’ll show in the last section how to use it in non-interactive mode (useful for scripting).

Download jclouds-cli version 1.5.3 instead of the latest version 1.5.7_1 as the command to run a script on a group of nodes is not working.

Decompress the archive:

$ tar xvfz Downloads/jclouds-cli-1.5.3.tar.gz

Now we need to set some environment variables to run jclouds-cli:

$ export JCLOUDS_COMPUTE_PROVIDER="aws-ec2"
$ export JCLOUDS_COMPUTE_IDENTITY=[AWS_ACCESS_KEY]
$ export JCLOUDS_COMPUTE_CREDENTIAL=[AWS_SECRET_KEY]
$ export JCLOUDS_BLOBSTORE_PROVIDER="aws-s3"
$ export JCLOUDS_BLOBSTORE_IDENTITY=[AWS_ACCESS_KEY]
$ export JCLOUDS_BLOBSTORE_CREDENTIAL=[AWS_SECRET_KEY]

$ cd jclouds-cli-1.5.3
$ bin/jclouds-cli

We also need to setup a rsa key on your local machine so you’ll be able to ssh to the ec2 instances:

$ ssh-keygen -t rsa -P ''

Save the key in the default location by pressing enter.

Run the following command to create 2 micro-instances on EC2 with the group hazelcast:

jclouds> node-create --hardwareId t1.micro --imageId us-east-1/ami-0145d268 --adminAccess --locationId us-east-1 hazelcast 2

You can check that the nodes are up by typing:

jclouds> node-list
[id] [location] [hardware] [group] [status]
us-east-1/i-36f62545 us-east-1d t1.micro hazelcast RUNNING
us-east-1/i-166ebd65 us-east-1c t1.micro hazelcast RUNNING

When creating instances, jclouds-cli also configure the ssh access of those instances so you can connect to them from your machine(and only from your machine). Also with the option –adminAccess you can sudo on those instances without password. To know the ip address of an ec2 instance simply type:

jclouds> node-info us-east-1/i-36f62545

To install HazelCast on both nodes, we are going to upload our HazelCast jclouds example server package to s3 and make all the ec2 instances download the file from s3.

If you don’t have a bucket in s3, you can create one by typing (you have to choose a bucket name which is not used by someone else, otherwise you’ll get an error):

jclouds> blobstore-create [BUCKET_NAME]

And then upload hazelcast to S3, the file in this bucket will be private and will not be accessible by other users:

jclouds> blobstore-write [BUCKET_NAME] hazelcast-jclouds-example-server-1.0-jar-with-dependencies.jar [HAZELCAST_JCLOUDS_EXAMPLE_DIR]/server/target/hazelcast-jclouds-example-server-1.0-jar-with-dependencies.jar

Then we need to install the amazon s3 tools on all the instances:

jclouds> group-runscript -d 'sudo nohup apt-get -qy install libnet-amazon-s3-tools-perl' hazelcast

We have to use nohup with sudo otherwise the apt-get will not be executed (there is some issues with the file descriptors closed by sudo).

Now we can get the hazelcast jclouds example server package from the cloud:

jclouds> group-runscript -d 's3get --access-key [ACCESS_KEY] --secret-key [SECRET_KEY] [BUCKET_NAME]/hazelcast-jclouds-example-server-1.0-jar-with-dependencies.jar > ~/hazelcast-jclouds-example-server-1.0-jar-with-dependencies.jar' hazelcast

In order to run it, we need to install JRE 1.7:

jclouds> group-runscript -d 'sudo nohup apt-get update' hazelcast
jclouds> group-runscript -d 'sudo nohup apt-get -y install openjdk-7-jre' hazelcast

Now we can run the hazelcast nodes:

jclouds> group-runscript -d 'java -jar ~/hazelcast-jclouds-example-server-1.0-jar-with-dependencies.jar &' hazelcast

Running Hazelcast clients

First, we would need to open the port 5701 of the EC2 instances so they can be accessed by other machines outside the cluster. To do so, go to the amazon EC2 console at https://console.aws.amazon.com/ec2/home?region=us-east-1#s=SecurityGroups

Then select the group jclouds#hazelcast and in the bottom pane, select Inbound and define the following rule:

Create a new rule: Custom TCP rule
Port range:  5701
Source: 0.0.0.0/0

And click on “Apply Rule Changes”.

We can now run an hazelcast client on our local machine that will communicate with the Hazelcast EC2 cluster.
Note that those examples don’t show the real speed of HazelCast. Everytime we run a command, we have to do an API call to amazon AWS to get the list of hazelcast instances. Then we need to connect to the hazelcast cluster, then we execute the command, and finally we close the connection to the HazelCast datagrid. In real world applications, the initialization will happen only once.

Account Client

This client uses the budget-account distributed map.
This map associate an account id to a BudgetAccount.

BudgetAccount:
    - accountId(string)
    - budget(double)

You can start by running a listener(to see what happens in the distributed map budget-account):

$ cd client
$ java -cp target/hazelcast-jclouds-example-client-1.0-jar-with-dependencies.jar com.chimpler.example.hazelcast.AccountClient listen

In another shell, you can add some money to an account:

$ cd client
$ java -cp target/hazelcast-jclouds-example-client-1.0-jar-with-dependencies.jar com.chimpler.example.hazelcast.AccountClient add-money chimpler 100

And run a continuous spender on this account:

$ cd client
$ java -cp target/hazelcast-jclouds-example-client-1.0-jar-with-dependencies.jar com.chimpler.example.hazelcast.AccountClient continuous-spend chimpler

You can check in the shell that ran the listener that the account is updated.

You can look at other possible actions you can do with this client by typing:

$ java -cp target/hazelcast-jclouds-example-client-1.0-jar-with-dependencies.jar com.chimpler.example.hazelcast.AccountClient

Person client

This client uses the person distributed map. It is used to test sql-like queries on the distributed map.
The distributed map person associate to a SSN string a Person object.

Person:
- ssn(string)
- firstName(string)
- lastName(string)
- age(integer)
- company
    - name(string)
    - address
        - line(string)
        - city(string)
        - state(string)
- address
    - line(string)
    - city(string)
    - state(string)

First we need to add some fake data to the person map:

$ cd client
$ java -cp target/hazelcast-jclouds-example-client-1.0-jar-with-dependencies.jar com.chimpler.example.hazelcast.PersonClient add-random-data 100

Now you can run a query to get all persons older than 80 years:

$ java -cp target/hazelcast-jclouds-example-client-1.0-jar-with-dependencies.jar com.chimpler.example.hazelcast.PersonClient query "age > 80"

Or get all persons living in the New York State:

$ java -cp target/hazelcast-jclouds-example-client-1.0-jar-with-dependencies.jar com.chimpler.example.hazelcast.PersonClient query "address.state=NY"

Or get all persons living in the New York State and working in the Illinois State:

$ java -cp target/hazelcast-jclouds-example-client-1.0-jar-with-dependencies.jar com.chimpler.example.hazelcast.PersonClient query "address.state=NY AND company.address.state=IL"

You can use the ‘LIKE’ operator:

$ java -cp target/hazelcast-jclouds-example-client-1.0-jar-with-dependencies.jar com.chimpler.example.hazelcast.PersonClient query "company.name LIKE '%Sirius%'"

You can look at other actions that you can run with the PersonClient by typing:

$ java -cp target/hazelcast-jclouds-example-client-1.0-jar-with-dependencies.jar com.chimpler.example.hazelcast.PersonClient

For more information on the queries you can do with Hazelcast, look at the Distributed Query page on Hazelcast Wiki.

When you are done with your tests, you can stop the nodes:

jclouds> node-destroy-all

And delete the file in S3:

jclouds> blobstore-delete [BUCKET_NAME] hazelcast-jclouds-example-client-1.0-jar-with-dependencies.jar

Deploying in non interactive mode

Instead of using the program jclouds-cli, you can use the program jclouds to run commands in non interactive mode. It’s very useful if you want to integrate those commands in a shell script. We wrote below the same commands that we used before but with jclouds:

export JCLOUDS_COMPUTE_PROVIDER="aws-ec2"
export JCLOUDS_COMPUTE_IDENTITY=[AWS_ACCESS_KEY]
export JCLOUDS_COMPUTE_CREDENTIAL=[AWS_SECRET_KEY]

export JCLOUDS_BLOBSTORE_PROVIDER="aws-s3"
export JCLOUDS_BLOBSTORE_BUCKET=[BUCKET_NAME]
export JCLOUDS_BLOBSTORE_IDENTITY=[AWS_ACCESS_KEY]
export JCLOUDS_BLOBSTORE_CREDENTIAL=[AWS_SECRET_KEY]

# start 2 ec2 instances
jclouds node create --hardwareId t1.micro --imageId us-east-1/ami-0145d268 --adminAccess --locationId us-east-1 hazelcast 2

# create a bucket (if you don't have one yet)
jclouds blobstore create $JCLOUDS_BLOBSTORE_BUCKET

# copy server package to amazon s3
jclouds blobstore write $JCLOUDS_BLOBSTORE_BUCKET/hazelcast-jclouds-example-server-1.0-jar-with-dependencies.jar [HAZELCAST_JCLOUDS_EXAMPLE_DIR]/server/target/hazelcast-jclouds-example-server-1.0-jar-with-dependencies.jar

# install amazon s3 tools on the ec2 instances
jclouds group runscript "-d 'sudo nohup apt-get -qy install libnet-amazon-s3-tools-perl' hazelcast"

# copy hazelcast server package from s3 to ec2 instances
jclouds group runscript "-d 's3get --access-key $JCLOUDS_BLOBSTORE_IDENTITY --secret-key $JCLOUDS_BLOBSTORE_CREDENTIAL $JCLOUDS_BLOBSTORE_BUCKET/hazelcast-jclouds-example-server-1.0-jar-with-dependencies.jar > ~/hazelcast-jclouds-example-server-1.0-jar-with-dependencies.jar' hazelcast"

# install java on ec2 instances
jclouds group runscript "-d 'sudo nohup apt-get -y update' hazelcast"
jclouds group runscript "-d 'sudo nohup apt-get -y install openjdk-7-jre' hazelcast"

# run hazelcast nodes
jclouds group runscript "-d 'java -jar ~/hazelcast-jclouds-example-server-1.0-jar-with-dependencies.jar &' hazelcast"

# stop hazelcast nodes
jclouds group runscript -d "'killall -9 java' hazelcast"

# stop the cluster
jclouds node destroy-all

# delete the files in amazon s3
jclouds blobstore delete $JCLOUDS_BLOBSTORE_BUCKET hazelcast-jclouds-example-client-1.0-jar-with-dependencies.jar

Conclusion

We showed in this tutorial how to use jclouds-cli to deploy HazelCast on Amazon EC2 and how to use a distributed map on HazelCast. For simplicity we didn’t mention other features that Hazelcast offers, in addition to distributed maps, it supports distributed queues, topics, locks, atomic numbers and many other data structures. It can also run distributed tasks and integrates well with application frameworks: Spring, JBoss, … It is fault tolerant and allows at any time any nodes to dynamically join or leave the cluster. This is pretty useful when we want to spin off more EC2 instances when the cluster load is high and kill some instances when the load is low. For more information about HazelCast, you can check their documentation on their website.

Advertisements

About chimpler
http://www.chimpler.com

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: