Analyzing your audience location with Twitter Streams and Heat Maps

With the spread of GPS and IP geolocation in portable devices (laptops, tablets, phones, Internet of Things devices, …), more and more data containing geolocation information is becoming available. Most major web applications now use geolocation to improve their services. For instance, social networks, transportation network companies and dating sites can use your current location to show potential matches around you. Search engines can personalize search results based on your location, and ad networks can better target their audience. With this geolocated data available in real time, applications such as Swarm and Foursquare can now notify users when friends are nearby or when events are happening in their neighborhood.

In this post we will describe how to listen to tweet streams and represent their positions on a world map.

Introduction

A geolocation is described by three values:

  • the latitude: the angular distance of the place from the Earth’s equator (ranges from -90 to 90 degrees)
  • the longitude: the angular distance of the place from the Greenwich meridian (ranges from -180 to 180 degrees)
  • the elevation: the height above sea level

To represent the Earth’s surface on a two-dimensional plane, we can use different map projections (Mercator, equirectangular, …), each having its own advantages and drawbacks in terms of distance, area and angle distortions.

We are going to use the equirectangular projection (also known as plate carrée), which is quite popular because of its simplicity. On a 2D map, the x-coordinate is proportional to the longitude and the y-coordinate is proportional to the latitude.

cartesian2d

To convert a point given by its latitude and longitude into pixel coordinates on the screen or in an image:

  • PIXEL\_X = IMAGE\_WIDTH * \frac{LONGITUDE}{360} + \frac{IMAGE\_WIDTH}{2}
    Note that we divide the longitude by 360 because it spans 360 degrees (from -180 to 180), and we multiply the result by IMAGE_WIDTH to scale it to the image width. Finally we add IMAGE_WIDTH / 2 because the (0, 0) pixel of an image is in the top-left corner, whereas longitude 0 is in the middle of the map.
  • PIXEL\_Y = - IMAGE\_HEIGHT * \frac{LATITUDE}{180} + \frac{IMAGE\_HEIGHT}{2}
    Note that unlike the computation of X, we have a minus sign at the beginning of the formula because in an image, the y axis is oriented from top to bottom.
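
As a quick check of the two formulas, take the coordinates 48.8567° N, 2.3508° E (central Paris) and assume a 3600 × 1800 pixel image, a size chosen here only to keep the arithmetic simple:

PIXEL\_X = 3600 * \frac{2.3508}{360} + \frac{3600}{2} = 23.508 + 1800 = 1823.508

PIXEL\_Y = - 1800 * \frac{48.8567}{180} + \frac{1800}{2} = -488.567 + 900 = 411.433

Truncated to integers, the point lands at pixel (1823, 411): slightly to the right of the image center, as expected for a small positive longitude, and in the upper half, as expected for a northern latitude.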

 

Requirements

To run the code in this project, you need:

  • Scala (> 2.10)
  • sbt (to build the application)
  • git
  • a Twitter account and API keys (we will describe how to get them)

To get the code used in this post, type:

git clone https://github.com/chimpler/tweet-heatmap.git

Getting geolocalized tweets

Twitter provides an API to continuously listen to a stream of tweets. In order to use the API, you need Twitter API keys and access tokens.

To get those, log in at https://apps.twitter.com/

Click on “Create New App”. Fill in the Name, Description, Website, Callback URL and click on “Create your twitter application”.

Now go to the “API Keys” tab and click on “Create my access token”.

To run the program that listens to a tweet stream and writes the tweets to disk, you need to create the file twitter-credentials.txt with the following content:

TWITTER_API_KEY=<API KEY>
TWITTER_API_SECRET=<API SECRET>
TWITTER_ACCESS_TOKEN=<ACCESS TOKEN>
TWITTER_ACCESS_TOKEN_SECRET=<ACCESS TOKEN SECRET>
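
A typo in one of the four key names is easy to make, and it can be worth checking the file before connecting. The following is a minimal sketch of such a check; the object CredentialsSketch and its parse function are hypothetical helpers written for this illustration and are not part of the tweet-heatmap project.

```scala
import java.io.StringReader
import java.util.Properties

// Hypothetical helper (not part of the project): checks that the four
// expected keys are present and non-empty in the credentials content.
object CredentialsSketch {
  val requiredKeys: Seq[String] = Seq(
    "TWITTER_API_KEY",
    "TWITTER_API_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET"
  )

  // Returns Left(missing keys) or Right(key -> trimmed value).
  def parse(content: String): Either[Seq[String], Map[String, String]] = {
    val properties = new Properties()
    properties.load(new StringReader(content))
    val values = requiredKeys.map { key =>
      key -> Option(properties.getProperty(key)).map(_.trim).filter(_.nonEmpty)
    }
    val missing = values.collect { case (key, None) => key }
    if (missing.nonEmpty) Left(missing)
    else Right(values.collect { case (key, Some(value)) => key -> value }.toMap)
  }
}
```

Reading the file content into a string first (for instance with scala.io.Source) and passing it to parse gives either the list of missing keys or the four trimmed values.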

The program FetchTweets connects to Twitter, listens to a tweet stream and saves the tweets to a file as they are received:


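import java.io.{FileInputStream, FileWriter, Writer}
import java.util.Properties

import twitter4j.{FilterQuery, StallWarning, Status, StatusDeletionNotice, StatusListener, TwitterStreamFactory}

object FetchTweets extends App {
  // the credential file and the output file are required; keywords are optional
  if (args.length < 2) {
    sys.error("Arguments: <credential_file> <output_file> [keywords...]")
  }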

  val properties = new Properties()
  properties.load(new FileInputStream(args(0)))

  val apiKey = properties.getProperty("TWITTER_API_KEY").trim
  val apiSecret = properties.getProperty("TWITTER_API_SECRET").trim
  val accessToken = properties.getProperty("TWITTER_ACCESS_TOKEN").trim
  val accessTokenSecret = properties.getProperty("TWITTER_ACCESS_TOKEN_SECRET").trim

  val twitterConfig = new twitter4j.conf.ConfigurationBuilder()
    .setOAuthConsumerKey(apiKey)
    .setOAuthConsumerSecret(apiSecret)
    .setOAuthAccessToken(accessToken)
    .setOAuthAccessTokenSecret(accessTokenSecret)
    .build

  val twitterStream = new TwitterStreamFactory(twitterConfig).getInstance()

  val outputFile = args(1)
  val fileWriter = new FileWriter(outputFile)

  val geoStatusListener = new GeoStatusListener(fileWriter)
  twitterStream.addListener(geoStatusListener)

  val queryKeywords = args.slice(2, args.length)

  var query = new FilterQuery()

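  // the Twitter streaming API combines location and keyword filters with OR,
  // so we cannot ask for tweets that match both a location and a keyword
  if (queryKeywords.isEmpty) {
    // cover the whole globe so that we only get tweets with geolocation information
    // (bounding box as (longitude, latitude) pairs: south-west corner, then north-east corner)
    val locations = Array(Array(-180d, -90d), Array(180d, 90d))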
    query = query.locations(locations)
  } else {
    query = query.track(queryKeywords)
  }

  twitterStream.filter(query)
}

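class GeoStatusListener(writer: Writer) extends StatusListener {
  // called for every tweet received; tweets without a geolocation are ignored
  def onStatus(status: Status): Unit = {
    val geoLocation = status.getGeoLocation
    if (geoLocation != null) {
      // keep each tweet on a single line of the output file
      val text = status.getText.replaceAll("[\r\n]", " ")
      val line = s"${geoLocation.getLatitude},${geoLocation.getLongitude},$text\n"
      print(line)
      writer.write(line)
      writer.flush()
    }
  }

  // the other notifications are not needed here
  def onDeletionNotice(statusDeletionNotice: StatusDeletionNotice): Unit = {}
  def onTrackLimitationNotice(numberOfLimitedStatuses: Int): Unit = {}
  // report errors instead of silently ignoring them
  def onException(ex: Exception): Unit = ex.printStackTrace()
  def onScrubGeo(userId: Long, upToStatusId: Long): Unit = {}
  def onStallWarning(warning: StallWarning): Unit = {}
}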

You can run the following command to collect the tweets containing a word related to drinks:

sbt "run-main com.chimpler.example.twitter.FetchTweets twitter-credentials.txt tweets.csv \
	redbull schweppes coke cola pepsi fanta orangina soda \
	coffee cafe expresso latte tea \
	alcohol booze alcoholic whiskey tequila vodka booze cognac baccardi \
	drink beer rhum liquor gin ouzo brandy mescal alcoholic wine drink"

When you have enough tweets, stop the program by pressing Ctrl+C.

Unfortunately, only a fraction of all tweets carry geolocation information (the author has to tweet from a phone and has to opt in to share their position). So you might need to wait several hours (even days if the words are not popular) to get enough tweets to draw on a map. For your convenience, we provide the file tweets_drink.csv, which already contains the tweets that we collected using those keywords.

Each line of the tweet file contains the latitude, the longitude and the message of a tweet:

14.653564,121.034568,Chilling (@ The Coffee Bean &amp; Tea Leaf - @cbtlph) http://t.co/jVTcLv1OAI http://t.co/eGo7mqIgYP
43.589288,-116.379348,That's cute, I remember when I had my first beer.
47.261988,-122.62807,Fire and wine on my last night in Washington. http://t.co/ItdkV420Ov
3.133039,101.687747,I'm at Coffee Planet (KL Sentral, Kuala Lumpur) http://t.co/8JYLsXfVqu
-6.88755,107.5781,lewat aja (with wansa at Take Ichi japanese cafe) — https://t.co/YE17FAQwTu
14.558974,121.006336,Afternoon...one gloomy aftie deserves a cool drink with a kick! #happysunday http://t.co/n8IJm99SKy
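
Because the message itself can contain commas, a line should be split into at most three fields. Here is a minimal sketch of how such a line could be parsed; the Tweet case class and the parseLine function are assumptions made for this illustration and may differ from what the project does.

```scala
import scala.util.Try

// Hypothetical parser for one line of the tweet file (latitude,longitude,message).
object TweetLineSketch {
  case class Tweet(latitude: Double, longitude: Double, text: String)

  def parseLine(line: String): Option[Tweet] =
    // limit = 3 keeps any comma in the message inside the third field
    line.split(",", 3) match {
      case Array(lat, lon, text) =>
        // a malformed number yields None instead of an exception
        Try(Tweet(lat.toDouble, lon.toDouble, text)).toOption
      case _ => None
    }
}
```

Lines that do not have three fields, or whose first two fields are not numbers, are simply skipped by returning None.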

Drawing tweets on a map

To draw the tweets on the map, we just need to convert the latitude and longitude coordinates into pixel coordinates:

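  def toImageCoordinates(latitude: Double, longitude: Double, imageWidth: Int, imageHeight: Int): (Int, Int) = {
    (
      // x grows with the longitude; longitude 0 is in the middle of the image
      (imageWidth * (0.5 + longitude / 360)).toInt,
      // y grows downwards in an image, hence the minus sign for the latitude
      (imageHeight * (0.5 - latitude / 180)).toInt
    )
  }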
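
One edge case is worth keeping in mind with this conversion: a longitude of exactly 180 maps to x = imageWidth and a latitude of exactly -90 maps to y = imageHeight, which are one pixel past the last valid index. A possible variant, shown here only as a sketch (the name toImageCoordinatesClamped is ours, not the project's), clamps the result to the image bounds:

```scala
object CoordinatesSketch {
  // Same conversion as above, with the result clamped to valid pixel indices.
  def toImageCoordinatesClamped(latitude: Double, longitude: Double,
                                imageWidth: Int, imageHeight: Int): (Int, Int) = {
    val x = (imageWidth * (0.5 + longitude / 360)).toInt
    val y = (imageHeight * (0.5 - latitude / 180)).toInt
    (
      math.min(math.max(x, 0), imageWidth - 1),
      math.min(math.max(y, 0), imageHeight - 1)
    )
  }
}
```

For a 100 × 50 image, the point at latitude -90 and longitude 180 then falls on the bottom-right pixel (99, 49) instead of (100, 50).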

To generate an image, the program GenerateMap will do the following:

  • keep only the tweets matching at least one of the keywords
  • draw a satellite image of the Earth as the background and apply a dark filter to it
  • for each tweet, convert its latitude and longitude into pixel coordinates and draw it on the map

To run the program:

sbt "run-main com.chimpler.example.twitter.GenerateMap tweets_drink.csv /tmp/map_drinks.png coke"

The arguments are:

  • tweet filename
  • image output filename
  • keyword list filter (only tweets containing at least one of those keywords are drawn on the map)
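
The keyword filtering and drawing steps can be sketched as follows. This is a simplified illustration and not the GenerateMap source: it uses a plain black background instead of a darkened satellite image, and the names MapSketch, Tweet, matchesAny and render are assumptions made for the example.

```scala
import java.awt.Color
import java.awt.image.BufferedImage

// Simplified sketch: black background instead of a satellite image.
object MapSketch {
  case class Tweet(latitude: Double, longitude: Double, text: String)

  // Case-insensitive check that the text contains at least one keyword.
  def matchesAny(text: String, keywords: Seq[String]): Boolean = {
    val lower = text.toLowerCase
    keywords.exists(keyword => lower.contains(keyword.toLowerCase))
  }

  def toImageCoordinates(latitude: Double, longitude: Double,
                         imageWidth: Int, imageHeight: Int): (Int, Int) =
    ((imageWidth * (0.5 + longitude / 360)).toInt,
     (imageHeight * (0.5 - latitude / 180)).toInt)

  def render(tweets: Seq[Tweet], keywords: Seq[String],
             width: Int, height: Int): BufferedImage = {
    val image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB)
    val graphics = image.createGraphics()
    graphics.setColor(Color.BLACK)
    graphics.fillRect(0, 0, width, height)
    graphics.setColor(Color.GREEN)
    for (tweet <- tweets if matchesAny(tweet.text, keywords)) {
      val (x, y) = toImageCoordinates(tweet.latitude, tweet.longitude, width, height)
      // draw a 3x3 square centered on the tweet position
      graphics.fillRect(x - 1, y - 1, 3, 3)
    }
    graphics.dispose()
    image
  }
}
```

The resulting image can be written to disk with javax.imageio.ImageIO.write(image, "png", file).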

map_drinks

However, with this map, it can be difficult to clearly see the density of the tweets in the image.

Drawing the heat map

We implemented a simple heat map algorithm that computes, for each pixel of the image, the sum of the heat it receives from each tweet. The heat received from a tweet is inversely proportional to the square of the distance between the pixel and the tweet:

pixel\_heat(x, y) = \sum_{t \in tweets}\frac{1}{(t.x - x)^2 + (t.y - y)^2}

  private def computeHeat(x: Int, y: Int, tweetGeos: mutable.Buffer[(Double, Double)], imageWidth: Int, imageHeight: Int): Double = {
    var heat = 0.0d
    val intensity = 1.0d
    val maxDistance = imageWidth / 5
    for (tweetGeo <- tweetGeos) {
      // we compute the Cartesian distance between the pixel and the tweet on the image;
      // this is not the real distance on the globe, only an approximation
      val (tweetX, tweetY) = Utils.toImageCoordinates(tweetGeo._1, tweetGeo._2, imageWidth, imageHeight)
      // if the tweet is too far from the pixel, we skip it to save some CPU
      if (Math.abs(tweetX - x) < maxDistance && Math.abs(tweetY - y) < maxDistance) {
        val distanceSquare = ((tweetX - x) * (tweetX - x)) + ((tweetY - y) * (tweetY - y))
        if (distanceSquare == 0) {
          heat += intensity
        } else {
          heat += intensity / distanceSquare
        }
      }
    }
    heat
  }

  def computeHeatMap(tweetGeos: mutable.Buffer[(Double, Double)], imageWidth: Int, imageHeight: Int) = {
    val imageHeatMap = Array.ofDim[Double](imageWidth, imageHeight)
    for (x <- 0 until imageWidth; y <- 0 until imageHeight) {
      val heat = computeHeat(x, y, tweetGeos, imageWidth, imageHeight)
      imageHeatMap(x)(y) = heat
    }
    imageHeatMap
  }
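
The code above converts every tweet to pixel coordinates again for every pixel of the image. One possible optimization, given here as a sketch under our own assumptions rather than as the project's implementation, is to convert each tweet once and to add its heat only to the pixels within the cutoff distance. The object name HeatSketch is hypothetical; the results should match the code above up to floating-point rounding, since the order of the additions differs.

```scala
object HeatSketch {
  def toImageCoordinates(latitude: Double, longitude: Double,
                         imageWidth: Int, imageHeight: Int): (Int, Int) =
    ((imageWidth * (0.5 + longitude / 360)).toInt,
     (imageHeight * (0.5 - latitude / 180)).toInt)

  def computeHeatMap(tweetGeos: Seq[(Double, Double)],
                     imageWidth: Int, imageHeight: Int): Array[Array[Double]] = {
    val maxDistance = imageWidth / 5
    // convert each tweet to pixel coordinates only once
    val tweetPixels = tweetGeos.map { case (latitude, longitude) =>
      toImageCoordinates(latitude, longitude, imageWidth, imageHeight)
    }
    val heatMap = Array.ofDim[Double](imageWidth, imageHeight)
    for {
      (tx, ty) <- tweetPixels
      // only visit pixels with |tx - x| < maxDistance and |ty - y| < maxDistance
      x <- math.max(0, tx - maxDistance + 1) to math.min(imageWidth - 1, tx + maxDistance - 1)
      y <- math.max(0, ty - maxDistance + 1) to math.min(imageHeight - 1, ty + maxDistance - 1)
    } {
      val dx = tx - x
      val dy = ty - y
      val distanceSquare = dx * dx + dy * dy
      // same rule as above: full intensity on the tweet's own pixel
      heatMap(x)(y) += (if (distanceSquare == 0) 1.0 else 1.0 / distanceSquare)
    }
    heatMap
  }
}
```

With a single tweet at latitude 0 and longitude 0 on a 20 × 10 image, the tweet's own pixel (10, 5) receives a heat of 1.0, the pixel two columns to the right receives 1/4, and pixels four or more columns away receive nothing because of the cutoff (20 / 5 = 4).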

The heat value can vary a lot depending on the total number of tweets and the way they are clustered together, so we let the user define a maximum heat. For instance, in the example below, we use a maximum heat of 0.5: if the heat of a pixel is greater than 0.5, we consider it to be 0.5. Then we normalize the heat:

normalized\_heat = \frac{min(heat, max\_heat)}{max\_heat}
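
The capping and normalization described above can be sketched in a couple of lines. The function name normalizeHeat is ours and only serves the illustration:

```scala
object NormalizeSketch {
  // Caps the heat at maxHeat, then scales it to the range [0, 1].
  def normalizeHeat(heat: Double, maxHeat: Double): Double =
    math.min(heat, maxHeat) / maxHeat
}
```

With a maximum heat of 0.5, a pixel heat of 0.25 is normalized to 0.5, and any pixel heat of 0.5 or more is normalized to 1.0.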

Then we associate a color with each normalized heat value. In this example, it goes from black (0) to green (1.0).

  private class SimpleColorizer(red: Int, green: Int, blue: Int) extends Colorizer {
    override def getColor(weight: Double): Color = {
      val r = (red * weight).toInt
      val g = (green * weight).toInt
      val b = (blue * weight).toInt

      new Color(
        normalizeColorComponent(r),
        normalizeColorComponent(g),
        normalizeColorComponent(b),
        200
      )
    }
  }

To generate the heat map, type:

sbt "run-main com.chimpler.example.twitter.GenerateHeatMap tweets_drink.csv /tmp/heatmap_coke.png green 0.5 coke"

The arguments are:

  • tweet filename
  • image output filename
  • color scheme (see below)
  • max heat
  • keyword list

heatmap_coke

With the heat map, we can see more clearly where most of the tweets containing the word ‘coke’ are coming from.

We have implemented several color schemes, which you can find in the file Colorizer.scala:

  • red
  • green
  • blue
  • yellow
  • multi

You may have seen heat maps that use red, yellow, green and blue to represent different heat ranges. We have implemented this in the multi color scheme (you can look at the class MultiColorColorizer):

  • from 0 to 0.25 => blue
  • from 0.25 to 0.50 => yellow
  • from 0.50 to 0.75 => green
  • from 0.75 to 1 => red

sbt "run-main com.chimpler.example.twitter.GenerateHeatMap tweets_drink.csv /tmp/heatmap_coke_multi.png multi 0.5 coke"

heatmap_coke_multi
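
The four ranges of the multi scheme listed above can be sketched as a simple lookup on the normalized heat. This is only an illustration of the ranges and not the MultiColorColorizer implementation, which may for example interpolate between colors; the name bandColor and the choice of which range owns each boundary value are assumptions.

```scala
import java.awt.Color

object MultiColorSketch {
  // Maps a normalized heat in [0, 1] to the color of its range.
  // Assumption: each boundary value belongs to the upper range.
  def bandColor(normalizedHeat: Double): Color =
    if (normalizedHeat < 0.25) Color.BLUE
    else if (normalizedHeat < 0.50) Color.YELLOW
    else if (normalizedHeat < 0.75) Color.GREEN
    else Color.RED
}
```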

 

Last but not least, it can be interesting to compare the locations of tweets containing one word with those of tweets containing another word, for instance to see the distribution of brands across the world.

We can show on a map the tweets containing the word ‘coke’ versus the tweets containing the word ‘pepsi’:

sbt "run-main com.chimpler.example.twitter.GenerateMultiHeatMap tweets_drink.csv /tmp/heatmap_coke.png 0.5 coke pepsi"

heatmap_coke_pepsi

On this map, we can clearly see that the word ‘coke’ (in green) is used much more than the word ‘pepsi’ (in red) in the tweets. Some places are yellow, which means that there are both tweets about coke and tweets about pepsi there (yellow = red + green). Interestingly enough, coke is not mentioned much in South America, unlike pepsi, which is mentioned in Brazil and Argentina.
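
The "yellow = red + green" effect follows from putting each word on its own color channel. A minimal sketch of this idea, assuming normalized heats in [0, 1] (the function combine is hypothetical and is not taken from GenerateMultiHeatMap):

```scala
import java.awt.Color

object TwoWordSketch {
  // Green channel for the first word, red channel for the second word.
  def combine(firstHeat: Double, secondHeat: Double): Color = {
    def channel(heat: Double): Int =
      (255 * math.min(math.max(heat, 0.0), 1.0)).toInt
    new Color(channel(secondHeat), channel(firstHeat), 0)
  }
}
```

A pixel that is hot for both words gets full red and full green, which is displayed as yellow; a pixel that is hot for only one word stays green or red.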

Conclusion

We have seen how the Twitter streaming API works and how to convert a geolocation into pixel coordinates. Representing data on a map can be challenging because points can be grouped very tightly together and we want to convey this information visually. A heat map is one way to achieve that. Other techniques, such as clustering, make it possible to find hot spots automatically. We will study those techniques in future posts.

FAQ

How can I get the latitude and longitude of a city?

Just ask Google: New York coordinates

newyork_coordinates

 

How can I know what is at a given latitude and longitude?

Again, just ask Google: 48.8567° N, 2.3508° E

How can I find map images that use the equirectangular projection?

You can search for them using Google Images: Equirectangular projection map

What is the file tweets_tech.csv?

This file was generated by listening to a set of keywords related to the tech world by using the following command:

sbt "run-main com.chimpler.example.twitter.FetchTweets twitter-credentials.txt tweets_tech.csv \
	android samsung htc motorola acer nokia apple iphone ipad tablet phablet  microsoft surface \
	windows blackberry kindle linux macbook ibm lenovo dell inspiron motox motog google htcone \
	facebook bing firephone fitbit jawbone fuelband ebay amazon yahoo linkedin tumblr itunes appstore"

You can use this data set to see the presence of the brands all over the world.

Can I use other formulas for the heat calculation?

We are using an inverse distance weighting formula; you can customize it to fit your needs.
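
One way to make the formula easy to swap is to express it as a function of the squared distance. The sketch below is not part of the project: the names Kernel, inverseSquare, gaussian and heatAt are ours, and the Gaussian falloff is just one example of an alternative you might try.

```scala
object KernelSketch {
  // A kernel maps the squared pixel distance to the heat received.
  type Kernel = Double => Double

  // Same rule as in the post: 1 / d^2, and 1 on the tweet's own pixel.
  val inverseSquare: Kernel =
    distanceSquare => if (distanceSquare == 0.0) 1.0 else 1.0 / distanceSquare

  // Alternative (assumption): smooth Gaussian falloff with radius sigma, in pixels.
  def gaussian(sigma: Double): Kernel =
    distanceSquare => math.exp(-distanceSquare / (2 * sigma * sigma))

  // Sum of the heat received at pixel (x, y) from tweets given in pixel coordinates.
  def heatAt(x: Int, y: Int, tweetPixels: Seq[(Int, Int)], kernel: Kernel): Double =
    tweetPixels.map { case (tx, ty) =>
      val dx = (tx - x).toDouble
      val dy = (ty - y).toDouble
      kernel(dx * dx + dy * dy)
    }.sum
}
```

Passing a different kernel to heatAt changes how quickly the heat fades with distance without touching the rest of the computation.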

About chimpler
http://www.chimpler.com
