Holumbus-MapReduce: a distributed MapReduce framework

[ distributed-computing, library, program ] [ Propose Tags ]
Versions [RSS] 0.0.1, 0.1.0, 0.1.1
Dependencies base (>=3), binary (>=0.4), bytestring (>=0.9), containers (>=0.1), directory (>=1.0), haskell98 (>=1), Holumbus-Distribution (>=0.0.1), Holumbus-Storage (>=0.0.1), hslogger (>=1.0), hxt (>=8.2), network (>=2.1), time (>=1.1), unix (>=2.3) [details]
License LicenseRef-OtherLicense
Copyright Copyright (c) 2008 Uwe Schmidt, Stefan Schmidt
Author Uwe Schmidt, Stefan Schmidt
Maintainer Stefan Schmidt <sts@holumbus.org>
Category Distributed Computing
Home page http://holumbus.fh-wedel.de
Uploaded by StefanSchmidt at 2009-03-08T13:04:14Z
Distributions
Reverse Dependencies 1 direct, 0 indirect [details]
Executables Master
Downloads 3101 total (7 in the last 30 days)
Rating (no votes yet) [estimated by Bayesian average]
Your Rating
  • λ
  • λ
  • λ
Status Docs uploaded by user
Build status unknown [no reports yet]

Readme for Holumbus-MapReduce-0.0.1

[back to package description]
This is the Holumbus-MapReduce Framework

Version 0.0.1

Stefan Schmidt sts@holumbus.org

http://holumbus.fh-wedel.de


About
-----

Holumbus is a set of Haskell libraries. This package contains the 
Holumbus-MapReduce library for building and running distributed MapReduce
systems.

This library depends on the Holumbus-Distributed and Holumbus-Storage libraries. 
If you want to run some of the examples, e.g. the distributed Web-Crawler 
and Indexer, the the Holumbus-Searchengine library must also be installed.


Contents
--------

Examples  Some example applications and utilities.
Programs  The applications you need to run a distributed MapReduce system.
source    Source code of the Holumbus-MapReduce library.


Requirements
------------

So far, this library is only tested under Linux, please let us know, if there are 
any problems under Windows or other OSes.
The Holumbus-MapReduce library requires at least GHC 6.10 and the 
following packages (available via Hackage).

  containers
  hslogger
  directory
  network
  time
  bytestring
  binary
  hxt
  Holumbus-Distribution


Installation
------------

A Cabal file is provided, therefore Holumbus-MapReduce can be installed using
the standard Cabal way:

$ runhaskell Setup.hs configure
$ runhaskell Setup.hs build
$ runhaskell Setup.hs install # with root privileges

If you prefer to do it the old way there with make:

$ make build
$ make install # with root privileges


Steps to make the system running
--------------------------------

1. Compile and install the Holumbus-Distribution library.

2. Compile and install the Holumbus-MapReduce framework.
   If this is done with cabal, step 3. can be skipped, the
   master will be build and installed with cabal

$ make build
$ make install       # this with root privileges

3. Compile the Master-Program.
   It is located in Programs/

$ make programs

4. Start the PortRegistry from the Holumbus-Distribution library.

5. Start the Master.
   If you want to start the master on another machine than the PortRegistry,
   then you have to copy the file "registry.xml" from the tmp directory to the
   the tmp directory on the machine the master should run on.

$ cd Programs/Master
$ ./Master

   Alternatively start the Master program, installed with cabal

Now, the system itself is running, but as you might wonder, you only have 
started the registry for communication and the master for the MapReduce system.
To run your own system, you have to create a MapReduce Client and a MapReduce
Worker. Here, we'll show you to run the examples, they'll show you how to
create you own system.

6. Compile the Examples. 
   For some examples (e.g. the Crawler), this requires that you have installed 
   the Holumbus-Searchengine framework. You can get it from the Holumbus 
   homepage (http://holumbus.fh-wedel.de), please read its documentation for further
   instructions and details. You'll only need the Searchegine library installed
   for the MapReduce examples, not the library itself.

$ make examples    # some examples require the Searchengine lib

7. Start the Workers.
   At this time every worker has to run in its own directory, so create for
   each worker, you want to run, its own subdirectory and copy the executeable
   to it. If you want to run the workers on different machines, you have to
   copy the file "registry.xml" from your tmp directory to the tmp directory
   of the machine the worker will work on.

8. Start the Client.
   Now you can start the client and wait the things to come. If you want to run
   the client on another machine than the PortRegistry, you have to copy the
   registry.xml file from the tmp directory, too.