Distributed Large-Scale Object-Oriented Parallel Processing Framework
In
2007 I developed a prototype implementation of a large-scale computing
framework capable of solving problems similar to Map-Reduce, as well as
many perhaps more general computing problems and services. (Feel free to contact me for more real-world examples of the computing tasks this framework was designed to support.)
The
design for this framework evolved informally in my head from 2002-2007
until I actually got around to creating a default implementation in late
2007.
Why I'm sharing this work
Last year, I completed the +Coursera Machine Learning course taught by +Andrew Ng and also I am currently taking the Introduction to Data Science course taught by +Bill Howe of UW. These two courses gave me my first exposure to Map-Reduce/Hadoop.
Since
the distributed parallel computing framework I describe in this posting
is similar to and perhaps more flexible in some ways than Map-Reduce, I
thought I'd share this old/prior work from my startup of the time.
Quick Introduction
This link (http://goo.gl/55xYj) has a few more details along with a few unit test
results including examples of how wildcarding could be exploited, but
here is a high-level overview, mostly about how the novel message bus
functioned:
• I designed and created a novel message bus similar to JMS.
• Every message was an RPC request.
• Each message's recipient(s) could be addressed via multiple fields:
Namespace: Any unique string, but usually a hierarchical path (wildcards supported)
Class name: The leaf class name, or any ancestor class or implemented java interface (wildcards supported)
Uuid: A globally-unique identifier (wildcards supported)
Method name: The method name to be invoked by the recipient
• Each RPC request could be unicast or broadcast (one-to-many) by using wildcarding. (For
example, the uuid could be set to '*' which would result in all
instances of a specific class in the specified namespace (which could be
'*' as well) receiving and invoking the RPC request in parallel.
Similarly, the hierarchical namespace could have wildcards in its
hierarchical path.)
• The potentially very large network of
message servers were loosely-coupled and dynamically configured into
small, local groups/clusters for routing/switching.
• Messages
were sent using the HTTP protocol, and platform-specific header fields
were added to each HTTP message header which were used:
1) By message servers to route the messages
2) By the message recipients to invoke their RPC header-fields-addressed method
3) To specify where to send the results of the computation
• While most RPC requests were asynchronous, synchronous RPC requests were also supported.
•
All message servers routed their messages via dynamically-configurable
DecisionTree routing objects which could access the HTTP header fields
to compute next-hop message routing. So, the routing methodology was
quite flexible. The last example in the unit test output file dumps out
one example DecisionTree object.
Example Unit Test Output
Here is an excerpt from the unit test output (note uuid is wildcarded):
cmd selected: MsgServer1,2.getUuid()
RPC Request:
ns = /autogeny/sys
cn = net.autogeny.sockproto.MsgServerImpl
uuid = *
mn = getUuid
Responses:
MsgServer1
MsgServer2
(ns=namespace, cn=classname, mn=method name)
-----------------------------
Okay, that's a quick wrap... Any comments/discussion are welcome, both private and public. Thanks for reading:)
Thursday, May 23, 2013
Subscribe to:
Posts (Atom)