Build Hadoop Cluster with 5 clicks

This is the first post in a new series about BigData clusters on OSX and Raspberry PI 2.

  1. Build Hadoop Cluster with 5 clicks.
  2. Apache Spark 1.5.2 on Raspberry PI 2 cluster
  3. The Next Version of PocketCluster

* update: Dec 5, 2015. PocketCluster has been updated to 0.1.3 with Apache Spark support!
** update: Oct 2, 2017. The next version of PocketCluster is coming soon!

It was back in February of this year that the first Raspberry PI 2 (henceforth RPI2) cluster was built and Apache Spark ran on it. Back then, it was mostly an experiment and a bit of fun, since only a handful of folks gave a second thought to putting RPIs to that sort of use.

After I published the RPI2 cluster case (part 1, part 2, open schematic), things changed a little. Many had a good laugh at the cluster, and some were enthusiastic about its possibilities.

In fact, Makezine and MagPi (p. 28) have covered the case, and many have retweeted and downloaded the schematics. Nonetheless, one question rang out so strongly that it motivated me to finish what I had started back in July.

The question suggests its author believes there is no clear way to put the power of multiple RPI2s to good use. I, for one, was a bit shocked, since a good use case was right in front of me. If I may be so bold, I’d like to shift that viewpoint a little by presenting a dead-simple way to build a BigData cluster on a MacBook with RPI2s for experiments and experiences.

Let’s first go through the existing solutions; we don’t want to reinvent the wheel. One could easily point out that we already have Cloudera, Hortonworks, MapR, and Apache Ambari to automate the tedious setup process.

I would agree, within a specific domain. Say we are working with rack after rack of powerful datacenter-grade nodes and need enterprise-level support; then they are definitely the answer, and I would encourage you to follow the links. They are, however, designed for powerful machines, not for a MacBook or RPI2s.

Someone else could give us plenty of links on how to install Hadoop and Spark after five seconds of Google searching. Here is one you can actually try on RPI2. (Thank you, Jonas!) But it isn’t much fun to edit text file after text file on RPI2 after RPI2.

Shouldn’t there be something that just works, is lightweight, and gets the tedious installation process out of your way when you just want to play with a BigData cluster on a MacBook and RPI2s?



Here comes PocketCluster (ver 0.1.3).

It builds you a Hadoop cluster with 3 slave nodes on Vagrant/VirtualBox, or up to 6 slave nodes on Raspberry PI 2, within 5 mouse clicks. (Yes, PocketCluster builds a Hadoop cluster on two different platforms.) It takes no command-line configuration to install and run.

Seeing is believing. Watch the videos below and chill with me.

Vagrant Cluster (3-Nodes)

The Vagrant cluster is designed to work without a single Raspberry PI 2. PocketCluster creates 3 slave nodes based on Vagrant + VirtualBox and uses OSX as the master. This variation is for those of you who have a MacBook and want to carry a multi-node environment around all the time. By the way, you need at least 3 GB of memory and 9 GB of free disk space. (A Mac with at least 8 GB of memory is recommended.)
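Before starting, you can check the disk and memory numbers from a terminal. The following is a minimal Python sketch, not part of PocketCluster: the helper names are hypothetical, it reads total memory with OSX’s `sysctl -n hw.memsize`, and it treats the recommended 8 GB as the memory bar, which is an assumption since the post does not say whether the 3 GB minimum means free or total memory.

```python
import shutil
import subprocess
import sys

GIB = 1024 ** 3

# Thresholds from the post: 9 GB of free disk, and a Mac with 8 GB of memory
# recommended. Using 8 GB of *physical* memory as the bar is an assumption.
MIN_FREE_DISK_GB = 9
RECOMMENDED_MEMORY_GB = 8


def free_disk_gb(path="/"):
    """Free space, in GiB, on the volume that holds `path`."""
    return shutil.disk_usage(path).free / GIB


def physical_memory_gb():
    """Total physical memory in GiB via `sysctl -n hw.memsize` (OSX only)."""
    out = subprocess.run(
        ["sysctl", "-n", "hw.memsize"],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.strip()) / GIB


def meets_vagrant_requirements(memory_gb, disk_gb):
    """Hypothetical helper: compare measured values against the thresholds above."""
    return memory_gb >= RECOMMENDED_MEMORY_GB and disk_gb >= MIN_FREE_DISK_GB


if __name__ == "__main__" and sys.platform == "darwin":
    mem, disk = physical_memory_gb(), free_disk_gb("/")
    print(f"memory: {mem:.1f} GB, free disk: {disk:.1f} GB")
    print("OK" if meets_vagrant_requirements(mem, disk) else "below the recommended minimums")
```

Run it with `python3` on the Mac that will act as the master; on other platforms it only defines the helpers.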

Install the following prerequisites first, and make sure the Remote Login service is enabled. Copy PocketCluster into the Applications folder, and you are all set to go.

  1. Java 1.8
  2. Homebrew
  3. VirtualBox 5.0.10
  4. Vagrant 1.7.4

All of these are commonly used, and many people already have them installed on their Macs. I strongly recommend updating them to the latest versions. The actual installation can take up to 15 minutes, mostly spent downloading files, so make sure your internet connection is solid. 😉
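To confirm the prerequisites above are in place, a short Python sketch can look for each tool’s command-line entry point on your PATH (`brew` for Homebrew, `VBoxManage` for VirtualBox, `vagrant` for Vagrant). This is an illustration under assumptions, not something PocketCluster ships: the helper names are hypothetical, only presence is checked (not the exact versions listed), and Java 1.8 is checked through OSX’s `/usr/libexec/java_home -v 1.8`.

```python
import shutil
import subprocess
import sys

# Command-line entry points assumed for the prerequisites:
# Homebrew -> brew, VirtualBox -> VBoxManage, Vagrant -> vagrant.
VAGRANT_CLUSTER_TOOLS = ["brew", "VBoxManage", "vagrant"]


def missing_tools(tools):
    """Return the tools that cannot be found on PATH (presence only, not version)."""
    return [tool for tool in tools if shutil.which(tool) is None]


def has_java_8():
    """Ask OSX's java_home helper whether a 1.8 JDK is installed.

    Looking up `java` alone is a weak check on OSX, because /usr/bin/java can
    exist as a stub even when no JDK is installed. Returns False elsewhere.
    """
    if sys.platform != "darwin":
        return False
    try:
        result = subprocess.run(
            ["/usr/libexec/java_home", "-v", "1.8"],
            capture_output=True, text=True,
        )
    except FileNotFoundError:
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("missing:", missing_tools(VAGRANT_CLUSTER_TOOLS) or "none")
    print("Java 1.8 found:", has_java_8())
```

For the RPI2 cluster, which needs only Java 1.8 and Homebrew, `missing_tools(["brew"])` together with `has_java_8()` covers the list.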

Raspberry PI 2 Cluster (Up to 6 Nodes)

Here comes the fun part. The RPI2 cluster does not require Vagrant/VirtualBox since it uses *real* nodes, which execute distributed jobs and store data. PocketCluster sets up as many as 6 Raspberry PI 2s as slave nodes and uses your Mac as the master. Whatever challenges you experience in this environment are ones you will also encounter in a datacenter, a cloud, and/or an in-house cluster.

(*Older-generation Raspberry PIs are not supported. Only Raspberry PI 2 is supported at this point.)

Operating the RPI2 cluster does not need as much memory or disk space as the Vagrant cluster. I’d say it is safe to run the cluster on a Mac with 4 GB of memory.

The RPI2 cluster requires an Ethernet connection. (It does not work over Wi-Fi.) If PocketCluster runs on a MacBook, you’ll need an Ethernet adapter. (iMac, Mac mini, and Mac Pro do not need an adapter.)

Make sure all the RPI2s are behind the same router as the Mac, and that the Remote Login service is enabled with access allowed for your user account.
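Both conditions can be spot-checked from the Mac. The Python sketch below assumes you know the Mac’s address and prefix length (for example `192.168.1.10/24`); the RPI2 addresses are hypothetical placeholders. Being on the same subnet serves as a rough proxy for "behind the same router", and since Remote Login on OSX is an SSH server, a successful TCP connection to port 22 stands in for "Remote Login is enabled".

```python
import ipaddress
import socket


def off_subnet_nodes(mac_interface, node_ips):
    """Return the node addresses that fall outside the Mac's subnet.

    `mac_interface` is the Mac's address with prefix length, e.g. "192.168.1.10/24".
    """
    network = ipaddress.ip_interface(mac_interface).network
    return [ip for ip in node_ips if ipaddress.ip_address(ip) not in network]


def ssh_reachable(host, port=22, timeout=2.0):
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or name not resolved
        return False


if __name__ == "__main__":
    # Hypothetical addresses: replace with your Mac's and your RPI2s' real ones.
    mac = "192.168.1.10/24"
    nodes = ["192.168.1.21", "192.168.1.22", "192.168.2.5"]
    print("outside the Mac's subnet:", off_subnet_nodes(mac, nodes) or "none")
    # Remote Login on OSX runs an SSH server, so probe the Mac itself on port 22.
    print("Remote Login (SSH) on this Mac:", ssh_reachable("127.0.0.1"))
```

Once the nodes are powered up, calling `ssh_reachable()` on each RPI2 address works the same way.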

First, download the RPI2 Ubuntu image and write it to an SD card (an SD card of at least 8 GB is recommended). Everything you need is already installed and configured. Slide the flashed SD cards into the RPI2 slots and power up the RPI2s.

YOU DO NOT NEED TO CONFIGURE A SINGLE THING ON ANY OF THE RASPBERRY PIs!

Then install the following prerequisites and copy PocketCluster into the Applications folder. You are good to go.

  1. Java 1.8
  2. Homebrew

The actual installation can take 15 minutes or longer depending on your internet connection, so make sure it is solid.


Here is the download page again: PocketCluster (ver 0.1.3).

This is indeed a continuation of my previous post. The point of having a BigData cluster on your Mac boils down to the following:

  • Friendlier environment.
  • Experiments and experiences.
  • Multi-node cluster.

The Vagrant cluster is better suited when you literally want to carry a cluster around in your pocket. The RPI2 cluster, on the other hand, gives you a real-life environment and real-life challenges.

Since this is only version 0.1.2, there are lots of improvements to be made and more BigData software packages to be added. (Apache Spark is planned for the next release!) If you have any suggestions or questions, please leave a comment below. You can also tweet me @stkim1.


10 thoughts on “Build Hadoop Cluster with 5 clicks”

    • Hi Suki,

      Thanks for the compliment! I needed to hit a number of RPI2s somewhere between economically effective and practically functional. I tried a few numbers, and 6 was the best. I’d love to support more, but the effort that goes on behind the scenes isn’t negligible, sadly. 😦


      • So if I plug in 8 nodes, will the application still recognise only 6 nodes? I ask because I am planning to buy more RPis for a small internal project. If the app does not recognise RPis beyond 6, then I’ll just order 6 Pis.

        Anyway, this is still handy for me. Thanks for making this happen, and I wish you a happy Lunar New Year!


  1. @Suki
    No. PocketCluster does not recognize more than 6 devices at this time. If more people need that capacity, I’ll work something out. 😉 Good luck with your project!


  2. Instead of the Ubuntu image, what would happen if I used a Raspbian image?

    Also, when PocketCluster installs onto the Pi, does it wipe everything previously on it?


    • @simon,

      It simply would not work if you were to use a Raspbian image.

      I’m not sure I understand your question.

      PocketCluster should be installed on clean slave nodes, so there should be nothing to erase.

