The Next Version of PocketCluster

It has been quite some time since the PocketCluster application and the Raspberry Pi image disappeared from the download page, and many have asked when they would be available again.

Let’s rewind the clock a bit. The original version of PocketCluster was written to build an Apache Hadoop + Spark cluster with Raspberry Pis and a Mac. Back in late 2015, when Google TensorFlow became available to the public, it also became clear that PocketCluster needed to handle more than one cluster framework so that its users could properly carry out whatever task they had at hand. For example, PocketCluster Index tracks 129 frameworks, not to mention more than 1,200 libraries, toolsets, and models.

When installing Apache Spark with PocketCluster could take several hours in the worst case, the new requirement on the horizon was simply a stretch too far. Business-as-usual patchwork upgrades here and there could resolve the issue only if there were a sound foundation. Otherwise, that kind of duct-tape measure would not land PocketCluster anywhere near where you would want it to be. It badly needed to be rebuilt with certain goals in mind, and that rebuild has been under way for some time.

The following are the changes made so far. We can talk about the goals in a later post.

Same Simple Installation


Your Mac is the master node of your cluster, and Raspberry Pi (RPI) devices are slave nodes. Write the provided image to SD cards and boot up the RPIs. Drag and drop PocketCluster to the Applications folder. Then give it a double-click; no black-and-white terminal, no commands to copy & paste, no wall of text. This is 2017 after all.


All-In-One, Ready-made, Out-of-the-Box Package



Cluster frameworks such as Apache Spark or Hadoop come in a package that needs no extra configuration steps. Spend your time focusing on your main task, and let PocketCluster take care of all the small, tedious chores.

Drastically Reduced Installation Time

With the previous versions, installing a package could take up to several hours in some cases, and it often failed completely. The new version will complete installing a package of Hadoop + Spark + Jupyter across your entire cluster within half an hour.

Secure Network Connection

Most PocketCluster network connections are securely encrypted. This is done not only out of the necessity to protect your cluster from malicious infiltration attempts, but also to provide you with a shielded environment. Multiple clusters can therefore exist in the same workplace or home, while your cluster operates just for you and nobody else.

All 64-bit Kernel


PocketCluster runs the RPI3 with a 64-bit kernel. The previous versions operated on a 32-bit kernel, which significantly hampered the ability to handle large data. One might argue that the RPI3 has only 1GB of memory, so there is no point in deploying such a memory-hungry kernel.

The kernel shift does come with a plan. Besides the RPI3, three more single-board computer models in about the same price range have been evaluated, and one or two will be added to the supported devices. They will have significantly more memory and I/O capacity to enhance your experience with PocketCluster.

A Few Words on the Missing Regular Updates


It is not that posting regular progress updates has been the lowest priority; quite the opposite. The rebuilding so far, however, strongly resembles repairing a road with thousands of small but deep cracks. You have to fill them all and make the surface smooth enough for cars to fly through without drivers feeling a single bump. There have been many moments when a crack suddenly went far deeper than foreseen. It was rather difficult to write updates at those moments, and the weekly round-ups served as a substitute for progress updates, like a sign of life.

Even at this point, many corners have been cut to meet a tentative timeline of releasing the new version before Christmas of this year. As there is now a strong and sound foundation, however, the experience with PocketCluster will improve accordingly, and all those corners will be revisited and rebuilt eventually.

Thank you very much for keeping your interest in PocketCluster, and stay tuned.

1000+ tools, frameworks and libraries indexed at PocketCluster Index!
Looking to add your repo? Tweet @stkim1!


Weekly Roundup – Dec. 20 2015


New frameworks, services, and toolsets have been added to the Index!


Logistic Regression with Dropout

This is an extension of Spark MLlib, implementing logistic regression with dropout regularization.


CSV data source for Spark SQL and DataFrames



Velox Modelserver

Velox is a system for serving machine learning predictions.



Luigi is a Python module that helps you build complex pipelines of batch jobs.



spark-notebook – 0.6.2

Use Apache Spark straight from the Browser

spark-dataflow – 0.4.2

Provides a Spark backend for executing Dataflow pipelines.


Live-updating Spark UI built with Meteor


REST job server for Apache Spark


Gaussian Mixture Model Implementation in Pyspark


Pig on Apache Spark


You can find a lot more tools, frameworks and libraries at PocketCluster Index. Go check it out! Looking to add your repo? Tweet @stkim1!


PocketCluster Index is Now Online!




PocketCluster Index is now online. The site collects Big Data tools, services such as Hadoop, and examples for you to look through and use as convenient references.

There are a couple of good reasons for including examples. Most Big Data software has its own way of doing things. For example, Hadoop wants you to build a Mapper and a Reducer to properly interact with YARN/HDFS. Spark wants you to start with a SparkContext and do things in terms of .map() and .collect(). There are not-so-subtle details like these all over the place, but a newcomer to this field would not know about them until bumping into a related issue. (Well, at least that was the case for me.)
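To make the contrast a little more concrete, here is a small word-count sketch in plain Python. It does not use Hadoop or Spark at all; the names `mapper`, `reducer`, `run_mapreduce`, and `run_chained` are made up for illustration, and a real job would be written against each framework's own API. It only shows the shape of the two ways of thinking.

```python
from collections import defaultdict


def mapper(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def reducer(word, counts):
    """Combine all counts emitted for a single word into one total."""
    return word, sum(counts)


def run_mapreduce(lines):
    """Hypothetical driver: map every line, group pairs by key, reduce each group."""
    grouped = defaultdict(list)
    for line in lines:
        for word, count in mapper(line):
            grouped[word].append(count)
    return dict(reducer(word, counts) for word, counts in grouped.items())


def run_chained(lines):
    """The same job written as a chain of transformations, the style that
    Spark's .map() and .collect() encourage (plain Python here, not Spark)."""
    words = [w.lower() for line in lines for w in line.split()]
    pairs = map(lambda w: (w, 1), words)
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


if __name__ == "__main__":
    sample = ["spark and hadoop", "hadoop and pi"]
    print(run_mapreduce(sample))
    print(run_chained(sample))
```

Both functions produce the same counts; the difference is only in how the work is expressed, which is exactly the kind of detail that good examples teach.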

To be honest, it won’t be like working with an awesome mentor, but going through good examples can give you a good share of the industry practices their authors have acquired over a long period of time. Examples usually show how work processes should flow, tips for particular situations, and even the philosophies behind them. After all, I believe providing and going through examples is a very effective way of handing accumulated knowledge to another member of the community.

Some might wonder how collecting examples relates to running a Big Data stack on Raspberry Pi. When you are learning, you simply want something to mess around with and to start over on when everything breaks. What better place to run and break examples of Hadoop or Spark than PocketCluster?

PocketCluster started with the idea of having fun through experiment and experience. It is a long way from a rigid, don’t-touch-this-or-that environment. Go break it and start over whenever you want.

I will relentlessly add more entries as we go along. If you believe something is missing or would like to add yours, just tweet me @stkim1.

Don’t forget to check out PocketCluster!

