Playing with OpenCV in Scala to do face detection with a Haar cascade classifier using a webcam

Detecting objects in images is used in many applications: auto-tagging pictures (e.g. Facebook, Phototime), counting the number of people in a street (e.g. Placemeter), classifying pictures, and even in Bistro, a device to feed cats.

This post is a small introduction to OpenCV, an open source computer vision library, using Scala.
The OpenCV library provides several features to manipulate images (apply filters, transformations), detect faces and recognize faces in images.
In this post, we are going to implement two small Scala programs:

  • read an image and run the Haar cascade classifier to detect the faces in the image
  • use the webcam and detect faces in real time

To detect faces, we use the Haar feature-based cascade classifier, a method for detecting objects in an image. You can find a good introduction on the OpenCV website and in the Facial Recognition YouTube video by Tom Neumark.

Note that detecting faces and recognizing faces (identifying whose face it is) are two different problems and use two different approaches. In this post we are only going to look at face detection.

Prerequisites

You can fetch the code used in this post from GitHub:

git clone https://github.com/chimpler/blog-scala-javacv.git

Detecting faces in an image

In this section we are going to detect the faces in the Skyfall movie cast picture below using a Haar cascade classifier:

James Bond Skyfall photocall - London

We start by reading the image using OpenCV:
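
A minimal sketch of this read-and-detect step, assuming the official OpenCV Java bindings (3.x) rather than the JavaCV wrapper used in the companion repo; the file names are placeholders:

import org.opencv.core.{Core, MatOfRect, Point, Scalar}
import org.opencv.imgcodecs.Imgcodecs
import org.opencv.imgproc.Imgproc
import org.opencv.objdetect.CascadeClassifier

object FaceDetector extends App {
  // Load the native OpenCV library (must be on java.library.path)
  System.loadLibrary(Core.NATIVE_LIBRARY_NAME)

  // Placeholder paths: adjust to your local image and cascade file
  val image = Imgcodecs.imread("skyfall-cast.jpg")
  val classifier = new CascadeClassifier("haarcascade_frontalface_alt.xml")

  // Run the Haar cascade over the image; detected faces come back as rectangles
  val faces = new MatOfRect()
  classifier.detectMultiScale(image, faces)

  // Draw a green rectangle around each detected face and save the result
  for (rect <- faces.toArray) {
    Imgproc.rectangle(image,
      new Point(rect.x, rect.y),
      new Point(rect.x + rect.width, rect.y + rect.height),
      new Scalar(0, 255, 0))
  }
  Imgcodecs.imwrite("skyfall-cast-faces.jpg", image)
}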

Classifying documents using Naive Bayes on Apache Spark / MLlib

In recent years, Apache Spark has gained popularity as a faster alternative to Hadoop, and it reached a major milestone last month by releasing the production-ready version 1.0.0. It claims to be up to 100 times faster by leveraging the distributed memory of the cluster and by not being tied to the multi-stage execution of Map/Reduce. Like Hadoop, it offers a similar ecosystem with a SQL engine (Shark), a machine learning library (MLlib), a graph library (GraphX) and many other tools built on top of Spark. Finally, Spark integrates well with Scala: one can manipulate distributed collections just like regular Scala collections, and Spark takes care of distributing the processing to the different workers.

In this post, we describe how we used Spark / MLlib to classify HTML documents, using as a training set the popular Reuters 21578 collection of documents that appeared on the Reuters newswire in 1987.
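
As a taste of the MLlib API, here is a minimal sketch of training and evaluating a Naive Bayes model; the input format (one "label,tf1 tf2 ..." line per document) and the file name are assumptions for illustration, not the post's actual pipeline:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.NaiveBayes
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

object ReutersClassifier extends App {
  val sc = new SparkContext(new SparkConf().setAppName("reuters-naive-bayes").setMaster("local[*]"))

  // Assumed input: one document per line, "label,tf1 tf2 tf3 ..." (term frequencies)
  val data = sc.textFile("reuters-features.csv").map { line =>
    val Array(label, features) = line.split(",", 2)
    LabeledPoint(label.toDouble, Vectors.dense(features.split(' ').map(_.toDouble)))
  }

  // Hold out 30% of the documents to measure the accuracy of the model
  val Array(training, test) = data.randomSplit(Array(0.7, 0.3))
  val model = NaiveBayes.train(training, lambda = 1.0)

  val accuracy = test.map(p => (model.predict(p.features), p.label))
    .filter { case (predicted, actual) => predicted == actual }
    .count.toDouble / test.count
  println(s"accuracy = $accuracy")
}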

Using the Mahout Naive Bayes Classifier to automatically classify Twitter messages (part 2: distribute classification with hadoop)

In this post, we are going to categorize tweets by distributing the classification over the Hadoop cluster. This can make the classification faster when there is a huge number of tweets to classify.

To go through this tutorial, you need to have run the commands from the post Using the Mahout Naive Bayes Classifier to automatically classify Twitter messages.

To distribute the classification on the Hadoop nodes, we are going to define a MapReduce job (a Scala sketch of the mapper follows the list):

  • the CSV file containing the tweets to classify is split into several chunks
  • each chunk is sent to a Hadoop node that processes it by running the map class
  • the map class loads the Naive Bayes model and some document/word frequencies into memory
  • for each tweet of the chunk, it computes the best matching category and writes the result to the output file. We are not using a reducer class as we don’t need to do any aggregations.
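
A sketch of what such a mapper can look like in Scala, using Mahout's NaiveBayesModel and StandardNaiveBayesClassifier; the "model.path" configuration key, the dictionary and the CSV layout are assumptions for illustration:

import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.Mapper
import org.apache.mahout.classifier.naivebayes.{NaiveBayesModel, StandardNaiveBayesClassifier}
import org.apache.mahout.math.{RandomAccessSparseVector, Vector}

class TweetClassifierMapper extends Mapper[LongWritable, Text, Text, Text] {
  private var classifier: StandardNaiveBayesClassifier = _
  // Placeholder: word -> index dictionary produced by the training job in part 1
  private var dictionary: Map[String, Int] = Map.empty

  override def setup(ctx: Mapper[LongWritable, Text, Text, Text]#Context): Unit = {
    // Load the Naive Bayes model once per mapper ("model.path" is an assumed conf key)
    val conf = ctx.getConfiguration
    val model = NaiveBayesModel.materialize(new Path(conf.get("model.path")), conf)
    classifier = new StandardNaiveBayesClassifier(model)
  }

  override def map(key: LongWritable, value: Text,
                   ctx: Mapper[LongWritable, Text, Text, Text]#Context): Unit = {
    // Assumed CSV layout: "tweetId,text"
    val Array(id, text) = value.toString.split(",", 2)
    val scores: Vector = classifier.classifyFull(vectorize(text))
    // The index with the highest score is the best matching category
    ctx.write(new Text(id), new Text(scores.maxValueIndex.toString))
  }

  // Build a simple term-frequency vector from the tweet text
  private def vectorize(text: String): Vector = {
    val v = new RandomAccessSparseVector(math.max(dictionary.size, 1))
    for (word <- text.toLowerCase.split("\\W+"); idx <- dictionary.get(word))
      v.setQuick(idx, v.getQuick(idx) + 1)
    v
  }
}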

To download the code used in this post, you can fetch it from github:

$ git clone https://github.com/fredang/mahout-naive-bayes-example2.git

To compile the project:

$ mvn clean package assembly:single


Finding association rules with Mahout Frequent Pattern Mining

Association rule learning is a method to find relations between variables in a database. For instance, using shopping receipts, we can find associations between items: bread is often purchased with peanut butter, or chips and beer are often bought together. In this post, we are going to use the Mahout Frequent Pattern Mining implementation to find associations between items in a list of shopping transactions. For details on the algorithms (Apriori and FP-Growth) used to find frequent patterns, you can look at “The comparative study of apriori and FP-growth algorithm” by Deepti Pawar.
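
The standard metrics behind association rules are easy to state in code. Here is a toy sketch of support, confidence and lift over in-memory transactions (made-up data, not the Mahout implementation):

object AssociationMetrics extends App {
  // Toy transactions: each receipt is a set of items
  val transactions = List(
    Set("bread", "peanut butter"),
    Set("bread", "peanut butter", "milk"),
    Set("chips", "beer"),
    Set("bread", "milk"),
    Set("chips", "beer", "bread")
  )
  val n = transactions.size.toDouble

  // Fraction of transactions containing all the given items
  def support(items: Set[String]): Double =
    transactions.count(t => items.subsetOf(t)) / n

  // How often B appears in transactions that contain A
  def confidence(a: Set[String], b: Set[String]): Double =
    support(a ++ b) / support(a)

  // lift > 1 means A and B occur together more often than if they were independent
  def lift(a: Set[String], b: Set[String]): Double =
    support(a ++ b) / (support(a) * support(b))

  println(f"lift(chips => beer) = ${lift(Set("chips"), Set("beer"))}%.2f") // 2.50
}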

EDIT 2014-01-08: updated the link to the data sample marketbasket.csv (the old link was dead). Corrected the lift computation. Thanks to Felipe F. for pointing out the error in the formula.

Generating eigenfaces with Mahout SVD to recognize people's faces

In this tutorial, we are going to describe how to generate and use eigenfaces to recognize people's faces.
Eigenfaces are a set of eigenvectors derived from the covariance matrix of the probability distribution of the high-dimensional vector space of possible human faces. They can be used to identify a face in a picture very quickly by matching it against a database of known faces. In this post we won't go into much detail on the mathematical aspects, but if you are interested in those, you can look at the excellent post Face Recognition using Eigenfaces and Distance Classifiers: A Tutorial on the Onionesque Reality Blog.
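
To make the idea concrete, here is a small in-memory sketch of the eigenface computation using Breeze's SVD (the post itself uses Mahout's distributed SVD; the random matrix stands in for real image data):

import breeze.linalg._

object EigenfacesSketch extends App {
  val (n, p) = (50, 64 * 64)                  // 50 images of 64x64 pixels
  val faces = DenseMatrix.rand[Double](n, p)  // placeholder for real image data

  // Center the data: subtract the mean face from every image
  val meanFace = (0 until n).map(i => faces(i, ::).t).reduce(_ + _) / n.toDouble
  val centered = faces(*, ::) - meanFace

  // The right singular vectors of the centered matrix are the eigenvectors
  // of its covariance matrix, i.e. the eigenfaces
  val svd.SVD(_, _, vt) = svd(centered)
  val k = 10
  val eigenfaces = vt(0 until k, ::)

  // A face is identified by projecting it into eigenface space and finding
  // the nearest projection among the known faces
  def project(face: DenseVector[Double]): DenseVector[Double] =
    eigenfaces * (face - meanFace)
}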


Using the Mahout Naive Bayes Classifier to automatically classify Twitter messages

Classification algorithms can be used to automatically classify documents and images, implement spam filters, and solve problems in many other domains. In this tutorial we are going to use Mahout to classify tweets using the Naive Bayes classifier. The algorithm works by using a training set, which is a set of documents already associated with a category. Using this set, the classifier determines, for each word, the probability that it makes a document belong to each of the considered categories. To compute the probability that a document belongs to a category, it multiplies together the individual probabilities of each of its words in this category. The category with the highest probability is the one the document most likely belongs to.
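
A toy illustration of this scoring step (hard-coded probabilities, not Mahout's actual implementation; note that in practice log-probabilities are summed instead of multiplying raw probabilities, to avoid numerical underflow):

object NaiveBayesToy extends App {
  // Made-up word probabilities that a training phase would normally produce
  val pWordGivenCat = Map(
    "sport"    -> Map("game" -> 0.20, "team" -> 0.15, "election" -> 0.01),
    "politics" -> Map("game" -> 0.02, "team" -> 0.03, "election" -> 0.25)
  )
  val pCat = Map("sport" -> 0.5, "politics" -> 0.5)

  // Score each category as log P(category) + sum of log P(word|category),
  // with a tiny default probability for unseen words
  def classify(words: Seq[String]): String =
    pWordGivenCat.keys.maxBy { cat =>
      math.log(pCat(cat)) +
        words.map(w => math.log(pWordGivenCat(cat).getOrElse(w, 1e-6))).sum
    }

  println(classify(Seq("game", "team", "game"))) // sport
}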

To get more details on how the Naive Bayes classifier is implemented, you can look at the Mahout wiki page.

This tutorial gives a step-by-step description of how to create a training set, train the Naive Bayes classifier, and then use it to classify new tweets.
