Thursday, May 23, 2013

Distributed Large-Scale Object-Oriented Parallel Processing Framework

In 2007 I developed a prototype of a large-scale computing framework capable of solving the kinds of problems that Map-Reduce addresses, as well as a range of perhaps more general computing problems and services. (Feel free to contact me for more real-world examples of the computing tasks this framework was designed to support.)

The design for this framework evolved informally in my head from 2002 to 2007, until I finally got around to creating a default implementation in late 2007.

Why I'm sharing this work

Last year I completed the Coursera Machine Learning course taught by Andrew Ng, and I am currently taking the Introduction to Data Science course taught by Bill Howe of UW. These two courses gave me my first exposure to Map-Reduce/Hadoop.

Since the distributed parallel computing framework I describe in this posting is similar to Map-Reduce, and perhaps more flexible in some ways, I thought I'd share this earlier work from my startup of the time.

Quick Introduction

This link (http://goo.gl/55xYj) has a few more details, along with some unit test results that include examples of how wildcarding could be exploited. Here is a high-level overview, mostly about how the novel message bus functioned:

• I designed and created a novel message bus similar to JMS.

Every message was an RPC request.

• Each message's recipient(s) could be addressed via multiple fields:

Namespace: Any unique string, but usually a hierarchical path (wildcards supported)

Class name: The leaf class name, or any ancestor class or implemented Java interface (wildcards supported)

Uuid: A globally-unique identifier (wildcards supported)

Method name: The method name to be invoked by the recipient

• Each RPC request could be unicast or broadcast (one-to-many) by using wildcarding. For example, setting the uuid to '*' would cause all instances of the specified class in the specified namespace (which could itself be '*') to receive and invoke the RPC request in parallel. Similarly, wildcards could appear within the hierarchical namespace path.

• The potentially very large network of message servers was loosely coupled and dynamically configured into small, local groups/clusters for routing/switching.

• Messages were sent over HTTP, with platform-specific header fields added to each HTTP message header. These fields were used:

1) By the message servers to route the messages

2) By the message recipients to invoke the method addressed by the RPC header fields

3) To specify where to send the results of the computation

• While most RPC requests were asynchronous, synchronous RPC requests were also supported.

• All message servers routed their messages via dynamically configurable DecisionTree routing objects, which could read the HTTP header fields to compute the next hop for each message. This made the routing methodology quite flexible. The last example in the unit test output file dumps out one example DecisionTree object.
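
The header-driven routing described in the last bullet can be sketched roughly as follows. This is a minimal illustration in Java, not the original DecisionTree class: the header names (X-Rpc-Namespace, X-Rpc-Uuid), the next-hop labels, and the shape of the tree are all assumptions made up for the example.

```java
import java.util.Map;
import java.util.function.Predicate;

public class RoutingSketch {

    /** A tree node: given a message's header fields, it answers with a next hop. */
    interface Node {
        String route(Map<String, String> headers);
    }

    /** Leaf node: always answers with one fixed next hop. */
    static Node leaf(String nextHop) {
        return headers -> nextHop;
    }

    /** Branch node: tests the headers, then delegates to one of two subtrees. */
    static Node branch(Predicate<Map<String, String>> test, Node ifTrue, Node ifFalse) {
        return headers -> test.test(headers) ? ifTrue.route(headers) : ifFalse.route(headers);
    }

    // Hypothetical header names; the real platform-specific names are not given in the post.
    static final String NS_HEADER = "X-Rpc-Namespace";
    static final String UUID_HEADER = "X-Rpc-Uuid";

    /** An example tree (assumed): keep system traffic local, forward everything else upstream. */
    static Node exampleTree() {
        return branch(
            h -> h.getOrDefault(NS_HEADER, "").startsWith("/autogeny/sys"),
            branch(
                h -> "*".equals(h.get(UUID_HEADER)),
                leaf("local-cluster-broadcast"),
                leaf("local-cluster-unicast")),
            leaf("upstream-server"));
    }

    public static void main(String[] args) {
        Map<String, String> headers = Map.of(NS_HEADER, "/autogeny/sys", UUID_HEADER, "*");
        System.out.println(exampleTree().route(headers)); // prints local-cluster-broadcast
    }
}
```

Because each node only looks at a map of header fields, a server could swap in a different tree at runtime without touching the message format, which is one plausible reading of "dynamically configurable" here.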

Example Unit Test Output

Here is an excerpt from the unit test output (note that the uuid is wildcarded):

cmd selected: MsgServer1,2.getUuid()

RPC Request:

ns = /autogeny/sys
cn = net.autogeny.sockproto.MsgServerImpl
uuid = *
mn = getUuid

Responses:
MsgServer1
MsgServer2

(ns=namespace, cn=classname, mn=method name)
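
To make the wildcard behaviour in this excerpt concrete, here is a small Java sketch of how a request with uuid = * might fan out to both message servers. The Recipient class, the registry, the third "Worker1" entry, and the matching rules (a bare '*' matches anything; a trailing '/*' matches any path under a prefix) are assumptions for illustration only. The original bus also matched ancestor classes and implemented interfaces, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;

public class AddressMatchSketch {

    /** Hypothetical registry entry: one object reachable over the bus. */
    static final class Recipient {
        final String namespace;
        final String className;
        final String uuid;

        Recipient(String namespace, String className, String uuid) {
            this.namespace = namespace;
            this.className = className;
            this.uuid = uuid;
        }
    }

    /** Assumed rules: "*" matches anything, "prefix/*" matches paths under prefix, else exact match. */
    static boolean fieldMatches(String pattern, String value) {
        if (pattern.equals("*")) {
            return true;
        }
        if (pattern.endsWith("/*")) {
            // Keep the trailing slash so "/autogeny/*" does not match "/autogenyX".
            return value.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(value);
    }

    /** Returns the uuid of every registered recipient whose three address fields all match. */
    static List<String> resolve(String ns, String cn, String uuid, List<Recipient> registry) {
        List<String> hits = new ArrayList<>();
        for (Recipient r : registry) {
            if (fieldMatches(ns, r.namespace)
                    && fieldMatches(cn, r.className)
                    && fieldMatches(uuid, r.uuid)) {
                hits.add(r.uuid);
            }
        }
        return hits;
    }

    /** Two servers as in the excerpt, plus one made-up recipient that should not match. */
    static List<Recipient> exampleRegistry() {
        String msgServer = "net.autogeny.sockproto.MsgServerImpl";
        List<Recipient> registry = new ArrayList<>();
        registry.add(new Recipient("/autogeny/sys", msgServer, "MsgServer1"));
        registry.add(new Recipient("/autogeny/sys", msgServer, "MsgServer2"));
        registry.add(new Recipient("/autogeny/app", "example.Worker", "Worker1")); // hypothetical
        return registry;
    }

    public static void main(String[] args) {
        List<String> hits = resolve(
            "/autogeny/sys", "net.autogeny.sockproto.MsgServerImpl", "*", exampleRegistry());
        System.out.println(hits); // prints [MsgServer1, MsgServer2]
    }
}
```

Setting the uuid argument to "MsgServer1" instead of "*" turns the same call into a unicast that resolves to a single recipient.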

-----------------------------

Okay, that's a quick wrap... Any comments or discussion are welcome, both private and public. Thanks for reading :)