Distributed Large-Scale Object-Oriented Parallel Processing Framework
In 2007 I developed a prototype implementation of a large-scale computing framework capable of solving the kinds of problems that Map-Reduce addresses, as well as many, perhaps more general, computing problems and services.  (Feel free to contact me for more real-world examples of the computing tasks this framework was designed to support.)
The design for this framework evolved informally in my head from 2002 to 2007, until I actually got around to creating a default implementation in late 2007.
Why I'm sharing this work
Last year, I completed the Coursera Machine Learning course taught by Andrew Ng, and I am currently taking the Introduction to Data Science course taught by Bill Howe of UW.  These two courses gave me my first exposure to Map-Reduce/Hadoop.
Since the distributed parallel computing framework I describe in this posting is similar to Map-Reduce, and perhaps more flexible in some ways, I thought I'd share this earlier work from my startup at the time.
Quick Introduction
This link (http://goo.gl/55xYj) has a few more details, along with a few unit test results, including examples of how wildcarding could be exploited. Here is a high-level overview, mostly about how the novel message bus functioned:
• I designed and created a novel message bus similar to JMS.
• Every message was an RPC request.
• Each message's recipient(s) could be addressed via multiple fields:
Namespace: Any unique string, but usually a hierarchical path (wildcards supported)
Class name: The leaf class name, or any ancestor class or implemented Java interface (wildcards supported)
Uuid: A globally-unique identifier (wildcards supported)
Method name: The method name to be invoked by the recipient
• Each RPC request could be unicast or broadcast (one-to-many) by using wildcarding.  (For example, the uuid could be set to '*', which would result in all instances of a specific class in the specified namespace (which could be '*' as well) receiving and invoking the RPC request in parallel.  Similarly, the hierarchical namespace could have wildcards in its hierarchical path.)
• The potentially very large network of message servers was loosely coupled and dynamically configured into small, local groups/clusters for routing/switching.
• Messages were sent using the HTTP protocol. Platform-specific header fields were added to each HTTP message header and were used:
1) By message servers to route the messages
2) By the message recipients to invoke the method addressed by the RPC header fields
3) To specify where to send the results of the computation
• While most RPC requests were asynchronous, synchronous RPC requests were also supported.
• All message servers routed their messages via dynamically-configurable DecisionTree routing objects, which could access the HTTP header fields to compute next-hop message routing.  So, the routing methodology was quite flexible.  The last example in the unit test output file dumps out one example DecisionTree object.
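To make the header-driven routing idea a bit more concrete, here is a minimal Java sketch. It is not the original DecisionTree implementation: the class names, the header name `X-Rpc-Namespace`, the prefix test, and the next-hop labels are all assumptions made up for illustration. It only shows the general shape of a tree whose branches inspect header fields and whose leaves name a next hop.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of header-driven next-hop routing; every name here is hypothetical. */
class RoutingSketch {

    /** A node either tests a header field or names a next hop. */
    interface Node {
        String nextHop(Map<String, String> headers);
    }

    /** Leaf: always answers with a fixed next-hop label. */
    static final class Leaf implements Node {
        private final String hop;

        Leaf(String hop) { this.hop = hop; }

        public String nextHop(Map<String, String> headers) { return hop; }
    }

    /** Branch: checks whether one header value starts with a given prefix. */
    static final class PrefixTest implements Node {
        private final String field;
        private final String prefix;
        private final Node ifMatch;
        private final Node otherwise;

        PrefixTest(String field, String prefix, Node ifMatch, Node otherwise) {
            this.field = field;
            this.prefix = prefix;
            this.ifMatch = ifMatch;
            this.otherwise = otherwise;
        }

        public String nextHop(Map<String, String> headers) {
            String value = headers.get(field);
            boolean matches = value != null && value.startsWith(prefix);
            return (matches ? ifMatch : otherwise).nextHop(headers);
        }
    }

    /** Builds a small example tree keyed on an assumed namespace header. */
    static Node exampleTree() {
        return new PrefixTest("X-Rpc-Namespace", "/autogeny/sys",
                new Leaf("local-cluster"),
                new PrefixTest("X-Rpc-Namespace", "/autogeny/",
                        new Leaf("regional-server"),
                        new Leaf("default-gateway")));
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put("X-Rpc-Namespace", "/autogeny/sys");
        headers.put("X-Rpc-Method", "getUuid");
        System.out.println(exampleTree().nextHop(headers)); // prints "local-cluster"
    }
}
```

Because the tree is ordinary data, a server could swap in a different tree at runtime, which is one plausible way to read "dynamically-configurable" above.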
Example Unit Test Output
Here is an excerpt from the unit test output (note uuid is wildcarded):
cmd selected: MsgServer1,2.getUuid()
RPC Request:
ns = /autogeny/sys
cn = net.autogeny.sockproto.MsgServerImpl
uuid = *
mn = getUuid
Responses:
MsgServer1
MsgServer2
(ns=namespace, cn=classname, mn=method name)
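The broadcast in this excerpt can be imitated with a short Java sketch. This is an illustration under assumptions, not the framework's code: `BroadcastSketch`, its nested `MsgServer` stand-in, the in-memory registry, and the simple "exact, `*`, or trailing `*`" wildcard rule are all hypothetical. The sketch also calls recipients one after another in a single process, whereas the real system delivered requests over HTTP and invoked them in parallel.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch of wildcard-addressed broadcast RPC; the types here are hypothetical. */
class BroadcastSketch {

    /** Stand-in for a registered recipient such as a message server. */
    public static final class MsgServer {
        private final String namespace;
        private final String uuid;

        public MsgServer(String namespace, String uuid) {
            this.namespace = namespace;
            this.uuid = uuid;
        }

        public String getNamespace() { return namespace; }

        public String getUuid() { return uuid; }
    }

    /** Exact match, unless the pattern is "*" or ends with "*" (prefix wildcard). */
    static boolean matches(String pattern, String value) {
        if (pattern.equals("*")) return true;
        if (pattern.endsWith("*")) {
            return value.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(value);
    }

    /** Matches the class itself, its ancestor classes, and their directly implemented interfaces. */
    static boolean typeMatches(String pattern, Class<?> type) {
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            if (matches(pattern, c.getName())) return true;
            for (Class<?> i : c.getInterfaces()) {
                if (matches(pattern, i.getName())) return true;
            }
        }
        return false;
    }

    /** Invokes the named no-argument method on every recipient the address selects. */
    static List<Object> broadcast(List<MsgServer> registry,
                                  String ns, String cn, String uuid, String mn) {
        List<Object> responses = new ArrayList<>();
        for (MsgServer target : registry) {
            boolean selected = matches(ns, target.getNamespace())
                    && typeMatches(cn, target.getClass())
                    && matches(uuid, target.getUuid());
            if (!selected) continue;
            try {
                Method method = target.getClass().getMethod(mn);
                responses.add(method.invoke(target));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot invoke " + mn, e);
            }
        }
        return responses;
    }

    public static void main(String[] args) {
        List<MsgServer> registry = Arrays.asList(
                new MsgServer("/autogeny/sys", "MsgServer1"),
                new MsgServer("/autogeny/sys", "MsgServer2"),
                new MsgServer("/autogeny/app", "MsgServer3"));
        // uuid = "*" selects every MsgServer in the /autogeny/sys namespace.
        System.out.println(broadcast(registry, "/autogeny/sys",
                MsgServer.class.getName(), "*", "getUuid")); // prints [MsgServer1, MsgServer2]
    }
}
```

Replacing the `"*"` uuid with `"MsgServer2"` turns the same call into a unicast request, and a namespace pattern such as `"/autogeny/*"` widens it to all three servers.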
-----------------------------
Okay, that's a quick wrap...  Any comments or discussion are welcome, both private and public.  Thanks for reading :)
Thursday, May 23, 2013