The Next Version of PocketCluster

It has been quite some time since the PocketCluster application and the Raspberry PI image disappeared from the download page, and many have asked when they will be available again.

Let’s rewind the clock a bit. The original version of PocketCluster was written to build an Apache Hadoop + Spark cluster with Raspberry PIs and a Mac. Back in late 2015, when Google TensorFlow became available to the public, it also became clear that PocketCluster needed to handle more than one cluster framework so that its users could properly execute whatever task they had at hand. For example, PocketCluster Index tracks 129 frameworks, let alone 1,200+ libraries, toolsets, and models.

When installing Apache Spark with PocketCluster could take several hours in the worst case, the new requirement on the horizon was simply a stretch too far. Business-as-usual patchwork upgrades here and there could resolve the issue only if there were a sound foundation. Otherwise, that kind of duct-tape measure would not land PocketCluster anywhere near where you would want it. It badly needed to be rebuilt with certain goals in mind, and that work has been under way for some time.

The following are the changes made so far. We can talk about the goals in a later post.

Same Simple Installation

drag-drop

Your Mac is the master node of your cluster, and Raspberry Pi (RPI) devices are slave nodes. Bake the provided image onto SD cards and boot up the RPIs. Drag and drop PocketCluster into the Applications folder. Then double-click it: no black-and-white terminal, no commands to copy and paste, no wall of text. This is 2017 after all.

 

All-In-One, Ready-made, Out-of-the-Box Package

spark-console

jupyter-ex

Cluster frameworks such as Apache Spark or Hadoop come in a package that needs no extra configuration steps. Spend your time on your main task, and let PocketCluster take care of all the small, tedious chores.

Drastically Reduced Installation Time

package-install

With the previous versions, installing a package could take up to several hours in some cases, and it often failed completely. The new version will complete installing a package of Hadoop + Spark + Jupyter across your entire cluster within half an hour.

Secure Network Connection

Most of PocketCluster’s network connections are encrypted. This is done not only out of the necessity of protecting your cluster from malicious infiltration attempts, but also to provide you with a shielded environment. Multiple clusters can therefore exist in the same workplace or home, while your cluster operates just for you and nobody else.

All 64 bit Kernel

1254383-arm-aarch-64

PocketCluster runs the RPI3 with a 64-bit kernel. The previous versions ran on a 32-bit kernel, which significantly hampered the ability to handle large data. One might argue that the RPI3 only has 1 GB of memory, so there is no point in deploying such a memory-hungry kernel.

Shifting the kernel comes with a plan, of course. Besides the RPI3, three more single-board computer models in about the same price range have been evaluated, and one or two will be added to the supported devices. They will have significantly more memory and I/O capacity to enhance your experience with PocketCluster.
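To put the 32-bit limitation in numbers, here is a small arithmetic sketch. It only computes the theoretical size of a flat address space for a given pointer width. What a real kernel leaves to a single process is smaller, and that split is not modelled here. The function name is made up for illustration.

```python
def address_space_gib(pointer_bits: int) -> float:
    """Theoretical size of a flat address space, in GiB (2**30 bytes)."""
    return 2 ** pointer_bits / 2 ** 30


print(address_space_gib(32))  # 4.0
print(address_space_gib(64))  # 17179869184.0
```

With 32-bit pointers, nothing larger than 4 GiB can be addressed at once regardless of how much memory a future board carries, which is the ceiling the 64-bit kernel removes.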

A few words on the missing regular updates

whynopost

It is not that posting regular progress updates has the lowest priority; it is exactly the opposite. The rebuilding so far, however, strongly resembles repairing a road with thousands of small but deep cracks. You have to fill them all and make the surface smooth enough for cars to fly over without the drivers feeling a bump. There have been many moments when a crack suddenly went far deeper than foreseen. It was rather difficult to write updates in those moments, and the weekly round-ups served as a substitute for progress updates, like a sign of life.

Even at this point, many corners have been cut to meet a tentative timeline of releasing the new version before Christmas of this year. As there is now a strong and sound foundation, however, the experience with PocketCluster is expected to improve accordingly, and all those corners will be revisited and reworked eventually.

Thank you very much for keeping your interest in PocketCluster, and stay tuned.


1000+ tools, frameworks and libraries indexed at PocketCluster Index!
Looking into adding your repo? tweet to @stkim1!

E-mail Subscription
Subscribe for upcoming posts!
Join Slack
Join the channel!

Raspberry PI 3 has arrived, finally!

rpi3

Four Raspberry PI 3 boards arrived today. I’ve seen some benchmarks here and there, and it’s thrilling to finally get my own hands on these goodies!

Of all the features the fresh upgrade brings to the table, 64-bit support (Broadcom 64-bit ARMv8 Cortex-A53) is probably the most exciting. (I told you 64-bit support could arrive anytime!) Things are finally getting very serious. Stay tuned!

 


Apache Spark 1.5.2 on Raspberry PI 2 cluster

This is the second post in a series about a BigData cluster for OS X and Raspberry PI 2.

  1. Build Hadoop Cluster with 5 clicks.
  2. Apache Spark 1.5.2 on Raspberry PI 2 cluster
  3. The Next Version of PocketCluster

** update: Oct 2, 2017. The next version of PocketCluster is coming soon!

Once we have a foundation on which to build a Big Data analytics stack, the next thing that should come is Apache Spark. Spark gives a speed boost to MapReduce-based algorithms by executing such computations in memory and performing DAG optimization. You can read more details in this paper.

Spark helps analytics computation tremendously because such computations are iterative in nature. Imagine reading and writing 100 GB of data on disk again and again, and how painfully slow that would be. (Some folks handle a couple of petabytes every day. Let’s not go there yet.) Spark, by contrast, handles such operations in memory. You really ought to experience the difference. If you do anything with Big Data analytics, Spark is the one thing you will encounter whichever direction you go.
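A back-of-the-envelope sketch of why repeated disk passes hurt an iterative job. The throughput figures and the iteration count are assumptions picked for illustration, not measurements from any cluster, and the sketch ignores whether the data actually fits in the cluster’s aggregate memory.

```python
def pass_seconds(data_gb: float, throughput_mb_s: float) -> float:
    """Time to stream data_gb once at the given throughput (1 GB = 1000 MB)."""
    return data_gb * 1000 / throughput_mb_s


def iterative_job_seconds(data_gb, iterations, disk_mb_s, memory_mb_s):
    """Return (disk_every_iteration, load_once_then_memory) in seconds."""
    # Strategy 1: every iteration streams the whole data set from disk.
    disk_every_time = iterations * pass_seconds(data_gb, disk_mb_s)
    # Strategy 2: read from disk once, then run the remaining passes in memory.
    load_once = pass_seconds(data_gb, disk_mb_s) + (iterations - 1) * pass_seconds(
        data_gb, memory_mb_s
    )
    return disk_every_time, load_once


# Assumed figures: 100 GB of data, 10 iterations, 100 MB/s disk, 10,000 MB/s memory.
disk, cached = iterative_job_seconds(100, 10, 100, 10_000)
print(disk, cached)  # 10000.0 1090.0
```

Under these assumed numbers, the first pass from disk dominates the cached run, and each extra iteration adds seconds rather than minutes.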

In fact, the very first two posts of this blog are about running Spark on Raspberry PI 2 (henceforth RPI2). Nevertheless, it wasn’t much of a joy to build and run such a thing. If you’ve been with me, you know we’ve crossed some serious creeks. Now, here comes an OS X application that deploys Apache Spark & Hadoop with a few mouse clicks.

icon_256x256

PocketCluster 0.1.3

Just like in the previous post, I’m going to play a video and talk about a few more details.

First of all, PocketCluster supports Vagrant and Raspberry PI 2 at the same time. If you want to carry a multi-node Big Data environment with you all the time, the Vagrant version is definitely the one to go with. The installation and operation process is exactly the same as the one shown in the video.

Meanwhile, I would recommend going with RPI2 if you’re working in a stationary environment. Six RPI2s could provide roughly the same amount of computing power as one Intel i7 processor. You’d be able to 1) quickly test your hypothesis or 2) debug your prototype in a real, multi-node environment.

(*Old generation Raspberry PI is not supported. Only Raspberry PI 2 is supported at the moment.)

Secondly, for PocketCluster to install and operate smoothly, you need a solid internet connection. All the software is downloaded and configured at run time, so a jumpy connection could really ruin your experience. I am working on improving this.

Third, Spark supports five different modes of operation: 1) Standalone, 2) Pseudo cluster, 3) Standalone cluster, 4) YARN client, and 5) Mesos client. PocketCluster supports Standalone cluster mode only at the moment.
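In standalone cluster mode, a Spark application addresses the master through a spark://host:port URL, with 7077 as the default master port. The helper below is a hypothetical sketch: the function name and the host name are made up for illustration, and this is not PocketCluster’s own code.

```python
def standalone_master_url(host: str, port: int = 7077) -> str:
    """Build the master URL a Spark application would use in standalone cluster mode."""
    if not host:
        raise ValueError("host must not be empty")
    if not 0 < port < 65536:
        raise ValueError("port must be between 1 and 65535")
    return f"spark://{host}:{port}"


# "my-mac.local" is a made-up host name standing in for the Mac master node.
print(standalone_master_url("my-mac.local"))  # spark://my-mac.local:7077
```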

Lastly, as soon as the Spark installation completes, SparkR is configured to run across the slave nodes. It is just that the Homebrew installation of R takes forever, since it needs to compile gcc to provide Fortran for R. Hence, should you need SparkR, just type the following in a Terminal or iTerm shell on your Mac after installation. (*It could take about 40 minutes.)

brew tap homebrew/science && brew install r && brew untap homebrew/science

For more detailed instructions about PocketCluster installation, please go to my previous post.

Here comes  PocketCluster 0.1.3  again. I’m looking for the next package to install. If you have a suggestion, please leave a comment below or tweet me @stkim1.


 

Build Hadoop Cluster with 5 clicks

This is the first post of a new series about BigData cluster for OSX and Raspberry PI 2.

  1. Build Hadoop Cluster with 5 clicks.
  2. Apache Spark 1.5.2 on Raspberry PI 2 cluster
  3. The Next Version of PocketCluster

* update: Dec 5, 2015. PocketCluster is updated to 0.1.3 for Apache Spark!
** update: Oct 2, 2017. The next version of PocketCluster is coming soon!

It was back in February of this year when the first Raspberry PI 2 (henceforth RPI2) cluster was built and Apache Spark ran on it. Back then, it was mostly an experiment and fun, since only a handful of folks gave a second thought to putting RPIs to that sort of use.

After opening up the RPI2 cluster case (part 1, part 2, open schematic), things have changed a tiny bit. Many had a good laugh at the cluster, and some were enthusiastic about the possibilities it offered.

In fact, Makezine and MagPi (pg #28) have covered the case, and many have retweeted and downloaded the schematics. Nonetheless, one question rang so strongly that it motivated me to finalize what I started back in July.

The question suggests that he believes there is no clear way to put the power of multiple RPI2s to good use. I, for one, was a bit shocked since there was indeed a good use case in front of me. If I may be so daring, I’d like to change the viewpoint just a bit by presenting a dead-simple way to build a BigData cluster on a MacBook with RPI2s for experiments and experiences.

Let’s first go through the existing solutions. We don’t want to reinvent the wheel. One can easily point out that we already have Cloudera, Hortonworks, MapR, and Apache Ambari to automate the tedious setup process.

I would agree within a specific domain. Say we are working with rack after rack of powerful datacenter-grade nodes and need enterprise-level support. Then they are definitely the answer, and I would encourage you to follow the links. They are, however, designed to work on powerful machines, not on the likes of a MacBook or RPI2s.

Others can give us multiple links on how to install Hadoop and Spark after 5 seconds of Google searching. Here is actually one you can try on RPI2. (Thank you Jonas!) It wouldn’t be much fun, though, if you had to open text file after text file on RPI2 after RPI2.

Shouldn’t there be something that just works, is lightweight, and gets the tedious installation process out of your way when you just want to play with a BigData cluster on a MacBook and RPI2s?


Screen Shot 2015-10-21 at 8.37.46 PM

Here comes PocketCluster (ver 0.1.3).

It builds you a Hadoop cluster with 3 slave nodes on Vagrant/VirtualBox, and up to 6 slave nodes on Raspberry PI 2, within 5 mouse clicks. (Yes, PocketCluster builds you a Hadoop cluster on two different platforms.) It takes no command line configuration to install and run.

Seeing is believing. Watch the videos below and chill with me.

Vagrant Cluster (3-Nodes)

Vagrant Cluster is designed to work without a single Raspberry PI 2. PocketCluster will create 3 slave nodes based on Vagrant + VirtualBox, and utilize OSX as the master. This variation is for those who have a MacBook and want to carry a multi-node environment all the time. By the way, you need at least 3 GB of memory and 9 GB of free disk space. (A Mac with at least 8 GB of memory is recommended.)

Install the following pre-requisites first, and make sure the remote login service is enabled. Copy PocketCluster into the Applications folder. Then you are all set to go.

  1. Java 1.8
  2. Homebrew
  3. VirtualBox 5.0.10
  4. Vagrant 1.7.4

All the requirements are commonly used, and many of you have already installed them on your Macs. I strongly recommend updating your installation to the latest version. The actual installation could take up to 15 minutes. Most of that time goes to downloading files, so make sure your internet connection is solid. 😉
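The 9 GB free disk requirement for the Vagrant cluster can be checked before you start. This is a minimal sketch using Python’s standard shutil module. The function name is hypothetical, and treating "GB" as 10^9 bytes is an assumption. It does not check memory.

```python
import shutil

GB = 10 ** 9  # assumption: "GB" in the requirement means 10^9 bytes


def has_free_disk(path: str = "/", required_gb: float = 9.0) -> bool:
    """Return True if the volume holding `path` has at least required_gb free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * GB


if __name__ == "__main__":
    print("Enough free disk for the Vagrant cluster:", has_free_disk())
```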

Raspberry PI 2 Cluster (Up to 6-Nodes)

Here comes the fun part. The RPI2 cluster does not require Vagrant/VirtualBox since it uses *real* nodes, which execute distributed jobs and store data. PocketCluster will set up as many as 6 Raspberry PI 2s as slave nodes and use your Mac as the master. Whatever challenges you experience in this environment, you will also encounter in a datacenter, a cloud, and/or an in-house cluster.

(*Old generation Raspberry PI is not supported. Only Raspberry PI 2 is supported at this point.)

Operating the RPI2 cluster does not need as much memory or disk space as the Vagrant cluster. I’d say it is safe to operate the cluster on a Mac with 4 GB of memory.

The RPI2 cluster requires an Ethernet connection. (It does not work with Wi-Fi.) If PocketCluster runs on a MacBook, you’d need one of the goodies below. (iMac, Mac mini, and Mac Pro do not need an adapter.)

Make sure all the RPI2s are behind the same router as the Mac, and that the remote login service is enabled just for you.
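One way to sanity-check the "same router" requirement is to confirm that every node’s address falls inside the Mac’s subnet. This sketch uses Python’s standard ipaddress module. It assumes that being behind the same router means sharing one subnet, and all addresses and the /24 prefix are made-up examples.

```python
import ipaddress


def nodes_outside_subnet(master_cidr, node_ips):
    """Return the node addresses that are NOT in the master's subnet.

    master_cidr is the Mac's address with its prefix length, e.g. "192.168.1.10/24".
    """
    network = ipaddress.ip_interface(master_cidr).network
    return [ip for ip in node_ips if ipaddress.ip_address(ip) not in network]


# Made-up addresses: the second node sits on a different subnet.
print(nodes_outside_subnet("192.168.1.10/24", ["192.168.1.21", "192.168.2.22"]))
# ['192.168.2.22']
```

An empty result means every listed node shares the Mac’s subnet.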

First, download the RPI2 Ubuntu image below and bake it onto an SD card (an SD card of at least 8 GB is recommended). Everything you’d need is already installed and configured. Slide the baked SD cards into the RPI2 slots, and power up the RPI2s.

YOU DO NOT NEED TO CONFIGURE A SINGLE THING FOR ALL RASPBERRY PIs!

Then install the following pre-requisites and copy PocketCluster into the Applications folder. After that, you are good to go.

  1. Java 1.8
  2. Homebrew

Actual installation could take up to 15 minutes or longer depending on your internet connection. Make sure your internet connection is solid.


Here comes the download page again. PocketCluster (ver 0.1.3).

This is indeed a continuation of my previous post. The points of having a Big Data cluster on your Mac boil down to the following.

  • Friendlier Environment.
  • Experiments and Experiences.
  • Multi-node cluster.

The Vagrant cluster is better suited when you literally want to carry a cluster around in your pocket. On the other hand, the RPI2 cluster will give you a real-life environment and real-life challenges.

Since this is barely version 0.1.2, there are lots of improvements to be made, and more BigData software packages to be added. (Apache Spark is planned to be added next time!) If you have another suggestion or question, please leave a comment below. You can also tweet me @stkim1.


[Free Schematic] Raspberry PI 2 Cluster Assembly Tutorial

This is the third post of a series about Raspberry PI 2 BigData cluster case.

  1. Raspberry PI 2 Cluster Case pt1
  2. Raspberry PI 2 Cluster Case pt2
  3. Raspberry PI 2 Cluster Assembly Tutorial
  4. Build Hadoop Cluster with 5 clicks

 

Since my last post, Raspberry PI 2 Cluster Case pt2, readers have asked if I could open the schematics of the cluster case with detailed assembly instructions. So, here come the schematics under the Solderpad Hardware License. The license is modeled on the Apache License, meaning you can make and distribute royalty-free. I would encourage you to contribute in the form of improvements and extensions though.

Preparation

1. Boards

pannel-description

Let’s first prepare the boards. Download the schematics. Please cut out one piece (1) of each board and two pieces (2) of board-middle. You can use a wood, synthetic, or acrylic panel to cut out the shapes above. Whatever the material, you can use it as long as the panel is no thicker than 3 mm and strong enough to hold a USB charger.

comp-pannel-unwrap

If you cut the boards from an acrylic panel, they will come with protective films on. Peel those off first.

comp-pannel
Two pieces of board-middle and one piece of each.

2. Power and Cables

comp-power-cable

  1. One Photive 6-port 50-watt USB charger.
  2. Six 1 ft (30 cm) Cat6 Ethernet cables.
  3. Six 1 ft (30 cm) 90-degree right-angled Micro USB to USB cables.

com-cable

Take a look at the cable. The Micro-USB-to-USB cable must be right-angled, not left-angled. You may also want to check the thickness of the wire, and make sure your cable is thick enough to deliver enough power. The one available to me comes with AWG 26 wire. Several different brands are available, for example at Amazon, eBay, and Alibaba.
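A quick power-budget check for the charger and boards above. The only inputs taken from the parts list are the 50 watt rating and the 6 ports. The 5 V figure is the standard USB voltage, and the per-board draw passed to has_headroom is an assumption you should replace with your own measurement. A real charger may also cap the current on each individual port, which this sketch does not model.

```python
USB_VOLTS = 5.0  # standard USB supply voltage


def per_port_budget(total_watts: float, ports: int):
    """Return (watts, amps) available per port if the load is shared evenly."""
    watts = total_watts / ports
    return watts, watts / USB_VOLTS


def has_headroom(total_watts: float, boards: int, board_watts: float) -> bool:
    """True if `boards` boards drawing board_watts each stay within the charger rating."""
    return boards * board_watts <= total_watts


watts, amps = per_port_budget(50, 6)
print(round(watts, 2), round(amps, 2))  # 8.33 1.67
print(has_headroom(50, 6, 5.0))  # True, assuming 5 W per board
```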

3. Screws

You then need screws. Buying the right-sized screws is the hardest part of the whole process, at least for me. Here’s the list of them.

comp-screws-0

  1. 2 x M3 Hex Nuts.
  2. 22 x M3 25mm Pillar Screws.
  3. 4 x M3 Whirled Hex Nuts. (could be replaced with four of #1 plain M3 nuts.)
  4. 6 x M3 4mm Screws.
  5. 24 x M2.5 5mm Screws.
  6. 24 x M2.5 Hex Nuts
  7. 24 x M2.5 5mm Pillar Screws.

The four supporting holes in Raspberry PI 2 are 2.7 mm in diameter, and, ideally, you would want to use M2.6 screws and nuts for fixing the Raspberry Pi 2 since they fit rather tightly. The problem was that M2.6 pillar screws were hard to come by, and I had to custom-order them. 😦

The ones linked here are therefore M2.5 screws and nuts. They are available off the shelf and only slightly smaller in diameter than the holes on the Raspberry Pi, so they should not cause much of an issue. I hope these holes get bigger in the future so that I could simply use good old plain M3 screws.
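The fit can be put in numbers. The 2.7 mm hole diameter, the four holes per board, and the six boards come from this post. The nominal diameter of a metric screw is its M-number in millimetres. The function names are made up for illustration.

```python
HOLE_MM = 2.7  # mounting hole diameter on the Raspberry PI 2, from the text above


def clearance_mm(screw_nominal_mm: float) -> float:
    """Gap between hole and screw diameter; negative means the screw will not fit."""
    return round(HOLE_MM - screw_nominal_mm, 2)


def mount_screws_needed(boards: int, holes_per_board: int = 4) -> int:
    """Number of mounting screws needed to use every hole on every board."""
    return boards * holes_per_board


for size in (2.5, 2.6, 3.0):
    print(f"M{size}: {clearance_mm(size)} mm")
# M2.5: 0.2 mm
# M2.6: 0.1 mm
# M3.0: -0.3 mm

print(mount_screws_needed(6))  # 24
```

The count of 24 matches the quantities of M2.5 screws, nuts, and pillar screws in the list above.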

4. Raspberry PI 2 Model B

comp-raspberry

  1. 6 x Raspberry PI 2 Model B

We do not need to say more here. 🙂 Let’s call them RPI henceforth.

5. Network Switch (Optional)

In case you want to fit a network switch in the cluster, pick something slim. I picked the one below.

  1. D-Link 8-Port Gigabit Desktop Switch

One thing to remember is that you can choose not to include a network switch at all. You can mount only the Raspberry PIs and connect them to an existing network. The switch is just there to give the cluster its own network.

Assembly

Once you have the components and materials, it’s rather straightforward. First, put the M2.5 5mm Pillar Screws on the boards and tighten ’em with M2.5 Hex Nuts like below. You’d have to do this for 6 sets.

assem-1

assem-2

Then fix RPI on the boards with M2.5 Screws.

assem-3

Once you’re done, you’d have six sets. You’d see that the RPIs on the right side are all flipped. This way, all the RPIs’ power inputs sit right next to the USB charger, which reduces the cluster volume. The big round holes you’d also see are for ventilation.

assem-5

Apply two pieces of double-sided tape on a USB charger supporter. This is to give a bit more strength to the board to support the charger.

assem-6

Place it on the board named board-middle-end like below.

assem-7

assem-8

Then it is time to stack up the RPIs. Look carefully at how each piece is stacked.

assem-9

assem-10

Once you complete stacking up RPIs, plug in Micro-USB-to-USB cables.

assem-13

assem-14

Once you plug in the cables, place your USB charger in the middle, and close the cluster with board-top. Don’t forget to tighten up with M3 4mm Screws.

assem-15

Now plug the USB ends into the USB charger. See how the boards hold the charger and leave the entrance clear for the USB ends? 🙂

assem-17

Place the network switch at the bottom layer and close it with board-bottom and four M3 Hex Nuts like below.

assem-18

Then connect your RPIs and the network switch with Cat6 Ethernet Cables.

assem-19

Once you supply power to the charger, all the RPIs will light up!

assem-20

assem-21

This is how a 6-node RPI cluster is assembled. You can replace as many parts as you wish, and modify as much as you want. All these instructions and schematics are here as a template.

If you get to build one, please share yours with the #rpicluster hashtag on Twitter so that we all can see! Now, here come the schematics again, and let the fun begin!
 


Raspberry PI 2 Cluster Case pt2

This is the second post of a series about Raspberry PI 2 BigData cluster case.

  1. Raspberry PI 2 Cluster Case pt1
  2. Raspberry PI 2 Cluster Case pt2
  3. Raspberry PI 2 Cluster Assembly Tutorial
  4. Build Hadoop Cluster with 5 clicks

 

In Raspberry PI 2 Cluster Case pt1, I drew up a rough sketch with certain goals in mind. They are:

  1. Able to fit in 6 RPI2 in a small, contained volume.
  2. Has to be stackable.
  3. Able to fit in a power supply and an 8-port network switch for the 6 RPI2.
  4. Able to cool off the cluster with no fan.
  5. Has to be cheap.

Prototyping

I then had an acrylic panel cut out at a nearby laser-cutting shop.

Two RPI2 on a cluster panel

The panel has two big round holes for ventilation, 8 holes for RPI2 mounts, and 8 more holes for pillar screws.

Once I put the pillars on, it looked like this.

Pillar Screws and mounts on a cluster panel

I put a giant square hole in the center to hold a USB charger as the power supply. You can see the two big round holes for heat ventilation.

At this point, I was horrified by the fact that I could not plug a USB-to-MicroUSB cable into the USB charger in the center. See how the square hole is completely closed off? I found that the panel blocked the USB charger’s USB sockets, rendering the charger useless. What a bummer…

Gah! No way!

Of course, my happy story did not end there. I then put an RPI on to see if it fit. To my disappointment, the mount was just too high. See how the USB ports touch the upper-level panel, and there is space underneath the RPI? That’s bad.

Long Mount
High Mountain

In fact, the mount was just too long. I figured a mount should be around 4 to 5 mm in length. On top of that, I had put in too many pillar screws. I needed to use only the minimum number required to maintain structural strength.

All in all, three problems.

  1. Entrance to USB charger.
  2. Appropriate Mount screw.
  3. Minimum # of pillar screws.

The Final Result

After fixing the three issues above, I now have this.

6 Nodes RPI2 Cluster
pt-9
The cluster from different angle.

Those USB-to-MicroUSB cables are only 30 centimeters in length. They still stand out and take up quite a bit of space. Does anyone know where I can get a shorter cable?

8 Ports Network Switch at the bottom stack

Of course, you can put the entire cluster on your desk.

A cluster on a desk

Assessment

I’ve run the cluster for close to two months so far. I’ve experienced neither heat issues nor performance hits. Not a single node has gone down while running Apache Spark/Hadoop in cluster mode. It has been amazingly stable and easy to operate. There are, however, a few more issues I would like to fix later. (Especially the USB charger cable.)

Let’s move on to Java on Raspberry PI next time!
 
