jGrid: Grid Management Software for Java

Current release: v0.53.

What is jGrid?

Aside from being a play on the name "Hagrid", jGrid is software that manages parallel processing across a collection of Java VMs. The software employs a very simple multithreading model--all large processing jobs must be broken up into smaller chunks by jGrid clients before being submitted to the grid manager. The grid manager is then responsible for scheduling these jobs for execution on one of the workers on the grid, and for communicating the results of that execution back to the grid client. It is also responsible for handling failures of any number of worker machines.

Author's Note: jGrid is a proof-of-concept. Thus, it lacks some of the professional touches such as real documentation and proper object separation. It also lacks serious authentication and a facility to prevent tampering with data, code, or results. These could be added to a future version, should there prove to be enough interest.

Design

Below is a discussion of how the various pieces of jGrid work together. The jGrid package is split into four main components--jobs, workers, clients, and managers. Each section below discusses one of these components.

This design reminds me of the Command pattern...

Using jGrid

As laid out earlier, there are three different components that need to be set up before a jGrid can be put online. Generally, all three components have these platform requirements:

NOTE: The details of remote class loading have not yet been worked out. For now, class files for all submitted jobs will have to be placed in some publicly accessible location. One suggestion is to put them on a networked filesystem that client, manager, and worker can all access. Another would be to put them on a web server and let the RMI classloader work out the details of downloading them over HTTP; use -Djava.rmi.server.codebase=some_URL to tell the system where the classes can be found.
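The codebase property can also be set from code instead of on the command line. The sketch below is only an illustration: CodebaseSetup is a hypothetical helper that is not part of jGrid, and the URL is a placeholder. It assumes the property is set before any RMI objects are exported.

```java
// Hypothetical helper; not part of the jGrid distribution.
final class CodebaseSetup {
    /** Sets java.rmi.server.codebase unless it was already supplied with -D. */
    static String ensureCodebase(String fallbackUrl) {
        String current = System.getProperty("java.rmi.server.codebase");
        if (current == null || current.isEmpty()) {
            System.setProperty("java.rmi.server.codebase", fallbackUrl);
            return fallbackUrl;
        }
        return current;
    }

    public static void main(String[] args) {
        // Placeholder URL (an assumption). A codebase URL that names a
        // directory must end with a trailing slash.
        String url = ensureCodebase("http://example.com/jgrid-classes/");
        System.out.println("RMI codebase: " + url);
    }
}
```

A value passed with -Djava.rmi.server.codebase on the command line takes precedence, since the helper only fills in the property when it is missing.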

To install jGrid, simply download the tarball and decompress it. If you are on Unix/OSX/Linux, use tar -xzf jGrid.tar.gz; Windows users can use WinZip. This will create a directory jGrid/. All instructions below assume that you have run cd jGrid/. And yes, you can run all three components on the same machine.

Managers

Since the Manager controls the operation of the entire grid, it must be started before all the other pieces. As with all other parts of jGrid, there is a console-only and a GUI version of this program. The console version will give you raw output and other debugging information; the GUI program tidies up all of the information and presents it as a neat table describing each job, its status, and who (if anyone) is working on it.

NOTE: Other machines MUST be able to connect to TCP/IP ports on this machine! Configure your firewall/router as appropriate, if necessary. Otherwise, the grid will not work.
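One quick way to check the firewall configuration from a worker or client machine is to attempt a plain TCP connection to the manager. The sketch below is a hypothetical helper, not part of jGrid. It assumes the default RMI registry port, 1099; the ports jGrid actually uses are not documented here, and RMI may also open additional ports for exported objects, so a successful check is necessary but may not be sufficient.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper; not part of the jGrid distribution.
final class ManagerPortCheck {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable: treat all as unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        // 1099 is the default RMI registry port; using it here is an assumption.
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 1099;
        boolean ok = canConnect(host, port, 2000);
        System.out.println(host + ":" + port + (ok ? " is reachable" : " is NOT reachable"));
    }
}
```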

To start the command line manager, do this:

$ java -cp . -Djava.security.policy=java.policy grid.manager.Main

To start the GUI manager, do this:

$ java -cp . -Djava.security.policy=java.policy grid.manager.App

Please note the name or IP address of this machine. You will need it when you want to connect workers and clients to this grid's manager. Now let's move on to adding some workers.

Workers

There are three ways to run a worker: command-line, application, and applet. This is to maximize flexibility in getting workers onto the grid.

To start the command-line program, run:

$ java -cp . grid.worker.Main
worker> add servername 1

This creates one worker on your machine and connects it to the manager. To exit the program, type quit at any time.

To start the GUI application, run:

$ java -cp . grid.worker.App

You will see a program window appear with a prompt to add a worker. Fill out the text fields appropriately, and click Add.

[Screenshot: Add Worker dialog]

Now there is a worker thread, and you can move on to starting a client.

Clients

This is perhaps the most difficult part of setting up the grid--programming it to do what you want. There are several steps involved:

  1. You must create a job object that implements the grid.Workable interface, and the execute method that goes with it.
  2. Next, create a client callback class that extends UnicastRemoteObject and implements JobFinishedCallback and Serializable. All of these heavy declarations are necessary because this callback will be given to the manager when jobs are submitted, and the manager has to call back to this client. You must therefore fill in the handler code so that it does something when each job finishes.
  3. Finally, create a class that connects to the manager service, creates grid.manager.Job objects, and invokes scheduleJob to submit them. Note that you can put the jobs into a List and send them as one big batch.
  4. Build the code and run it. Watch your jobs head off!
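As a rough illustration of step 1 and of batching jobs into a List, here is a minimal sketch. Everything in it is an assumption made for illustration: the Workable interface below is a local stand-in, and the real grid.Workable may declare execute with a different signature. SumRangeJob and ClientSketch are hypothetical names. The jobs are run locally rather than submitted to a manager, so the callback and scheduleJob steps are not shown.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Local stand-in for grid.Workable; the real interface's signature may differ.
interface Workable extends Serializable {
    Object execute();
}

/** Hypothetical job: sums the integers in the half-open range [from, to). */
final class SumRangeJob implements Workable {
    private static final long serialVersionUID = 1L;
    private final long from;
    private final long to;

    SumRangeJob(long from, long to) {
        this.from = from;
        this.to = to;
    }

    @Override
    public Object execute() {
        long sum = 0;
        for (long i = from; i < to; i++) {
            sum += i;
        }
        return sum; // autoboxed to Long
    }
}

final class ClientSketch {
    /** Splits [0, limit) into jobs of at most chunkSize numbers each. */
    static List<Workable> splitIntoJobs(long limit, long chunkSize) {
        List<Workable> jobs = new ArrayList<>();
        for (long start = 0; start < limit; start += chunkSize) {
            jobs.add(new SumRangeJob(start, Math.min(start + chunkSize, limit)));
        }
        return jobs;
    }

    public static void main(String[] args) {
        List<Workable> jobs = splitIntoJobs(1_000, 250);
        long total = 0;
        // Run locally here; on a real grid the manager would hand each job to a worker.
        for (Workable job : jobs) {
            total += (Long) job.execute();
        }
        System.out.println(jobs.size() + " jobs, total = " + total);
    }
}
```

Keeping each job's state to a few serializable fields, as here, keeps the objects cheap to send across the wire.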

Here is sample client source code if you want to get a head-start.

Test-Drive

If you want to turn your computer into a worker bee, there is a rough web applet to do so. Load jGrid as a Web App.

Future Steps

jGrid has several areas that need improvements. Keep in mind that the software exists as it is now merely to prove that it can be done, and not necessarily done *well*. That said, I have a list of things that may or may not be integrated into the package. Generally speaking, these improvements are to enhance the security or the reliability of the package; aside from that, I would prefer to keep the code body as small and efficient as possible.

The first area of improvement concerns the RPC facility. The current design of the worker requires that, when idle, it continuously poll the manager. Several revisions ago, the worker would instead call into the manager, where the call would block until a job became available. RMI, alas, has no way of cancelling the server-side code if the client should disconnect. This needs to be fixed. Moreover, RMI requires an RMI registry to be running on the manager, and the user may not be able to run one.
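The idle-polling behavior described above can be sketched roughly as follows. This is not jGrid's actual worker code: PollingWorkerSketch is a hypothetical class, and an in-memory queue stands in for the remote manager so that the example runs on its own.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch; an in-memory queue stands in for the remote manager.
final class PollingWorkerSketch {
    /**
     * Asks the "manager" for a job up to maxAttempts times, sleeping between
     * attempts. Returns the job, or null if none arrived in time.
     */
    static Runnable pollForJob(Queue<Runnable> manager, int maxAttempts, long sleepMillis)
            throws InterruptedException {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Runnable job = manager.poll(); // non-blocking; null when the queue is empty
            if (job != null) {
                return job;
            }
            Thread.sleep(sleepMillis); // idle wait before asking again
        }
        return null;
    }

    public static void main(String[] args) throws InterruptedException {
        Queue<Runnable> manager = new ConcurrentLinkedQueue<>();
        manager.add(() -> System.out.println("job executed"));
        Runnable job = pollForJob(manager, 5, 100);
        if (job != null) {
            job.run();
        }
    }
}
```

Because each request returns immediately, a worker that disappears leaves nothing blocked on the manager side; the cost is the extra traffic and latency of repeated requests.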

Security is another feature that is quite high on the to-do list. jGrid does not provide ANY security outside of whatever the Java security model affords. There is no way to authenticate workers or clients, and no encryption is provided to protect job objects or job results as they pass across the wire.

Performance-wise, the grid software does pretty well. My informal testing has indicated that if a clever programmer makes the job processing time large relative to the time required to talk to the server, then cluster utilization can scale nearly linearly. Unfortunately, there exists no way of matching a job's size to a cluster node's capability to process the job. Were this possible, one could submit jobs of varying size and have the manager map them to an appropriately powerful machine. Currently, a worker will receive whatever job happens to be at the front of the list of unprocessed jobs. Lastly, there is no way to control CPU usage on the node, except at the level of the underlying OS.
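To see why a large ratio of processing time to communication time matters, consider a deliberately simple model in which a worker alternates between talking to the server and computing. This model and the numbers in it are assumptions for illustration only; they are not measurements of jGrid, and UtilizationEstimate is a hypothetical class.

```java
// Hypothetical back-of-the-envelope model; not a measurement of jGrid.
final class UtilizationEstimate {
    /** Fraction of a worker's time spent computing rather than communicating. */
    static double utilization(double computeMillis, double commMillis) {
        return computeMillis / (computeMillis + commMillis);
    }

    public static void main(String[] args) {
        double commMillis = 50; // assumed fixed per-job communication cost
        for (double computeMillis : new double[] {50, 500, 5000}) {
            System.out.printf("compute %.0f ms, comm %.0f ms -> utilization %.3f%n",
                    computeMillis, commMillis, utilization(computeMillis, commMillis));
        }
    }
}
```

Under this model, a job that computes for as long as it communicates keeps a worker busy only half the time, while a job that computes a hundred times longer keeps it busy about 99% of the time.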

Conclusion

So that is my cluster software for the Java platform. Feel free to send me feedback, suggestions, patches, or other encouragement. I hope you enjoy it!

Copyright ©1996-2024, Darrick Wong. All Rights Reserved. Send feedback.