sparkle: Distributed Apache Spark applications in Haskell

[ bsd3, distributed-computing, ffi, java, jvm, library, program ]

See README.md


Versions 0.1, 0.1.0.1, 0.2, 0.3, 0.4, 0.4.0.1, 0.4.0.2, 0.5, 0.5.0.1, 0.6, 0.7, 0.7.1, 0.7.2, 0.7.2.1, 0.7.3, 0.7.4
Dependencies base (>=4.8 && <5), binary (>=0.7), bytestring (>=0.10), distributed-closure (>=0.3), filepath (>=1.4), inline-java (>=0.1), process (>=1.2), regex-tdfa (>=1.2), singletons (>=2.0), sparkle, text (>=1.2), vector (>=0.11), zip-archive (>=0.2)
License BSD-3-Clause
Copyright 2016 EURL Tweag
Author Tweag I/O
Maintainer alp.mestanogullari@tweag.io
Revised Revision 1 made by AlpMestanogullari at Mon Jun 20 17:09:11 UTC 2016
Category FFI, JVM, Java, Distributed Computing
Source repo head: git clone https://github.com/tweag/sparkle (sparkle)
Uploaded by AlpMestanogullari at Mon Jun 20 15:57:23 UTC 2016
Distributions NixOS:0.7.4
Executables sparkle
Downloads 4865 total (230 in the last 30 days)
Docs not available [build log]
All reported builds failed as of 2016-11-23 [all 3 reports]

Modules

  • Control
    • Distributed
      • Control.Distributed.Spark
        • Control.Distributed.Spark.Closure
        • Control.Distributed.Spark.Context
        • ML
          • Feature
            • Control.Distributed.Spark.ML.Feature.CountVectorizer
            • Control.Distributed.Spark.ML.Feature.RegexTokenizer
            • Control.Distributed.Spark.ML.Feature.StopWordsRemover
          • Control.Distributed.Spark.ML.LDA
        • Control.Distributed.Spark.PairRDD
        • Control.Distributed.Spark.RDD
        • SQL
          • Control.Distributed.Spark.SQL.Context
          • Control.Distributed.Spark.SQL.DataFrame
          • Control.Distributed.Spark.SQL.Row

Downloads

Note: This package has metadata revisions in the cabal description newer than included in the tarball. To unpack the package including the revisions, use 'cabal get'.


Readme for sparkle-0.2


Sparkle: Apache Spark applications in Haskell


Sparkle [spär′kəl]: a library for writing resilient analytics applications in Haskell that scale to thousands of nodes, using Spark and the rest of the Apache ecosystem under the hood.

This is an early tech preview, not production ready.

Getting started

The tl;dr, using the hello app as an example and running on your local machine:

$ stack build hello
$ stack exec sparkle package sparkle-example-hello
$ spark-submit --master 'local[1]' sparkle-example-hello.jar

Requirements:

  • the Stack build tool;
  • either the Nix package manager,
  • or OpenJDK, Gradle and Spark >= 1.6 installed from your distro.

To run a Spark application the process is as follows:

  1. create an application in the apps/ folder, in-repo or as a submodule;
  2. add your app to stack.yaml;
  3. build the app;
  4. package your app into a deployable JAR container;
  5. submit it to a local or cluster deployment of Spark.

If you run into issues, read the Troubleshooting section below first.

To build:

$ stack [--nix] build

You can optionally pass --nix to all Stack commands to ask Nix to make Spark and Gradle available in a local sandbox, which makes builds more reproducible. Otherwise you'll need these installed through your OS distribution's package manager for the next steps (and you'll need to tell Stack how to find the JVM header files and shared libraries).

To package your app (omit the square bracket part entirely if you're not using --nix):

$ [stack --nix exec --] sparkle package <app-executable-name>

Finally, to run your application, for example locally:

$ [stack --nix exec --] spark-submit --master 'local[1]' <app-executable-name>.jar

See here for other options, including launching a whole cluster from scratch on EC2.
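
For a local run without --nix, the build, package and submit commands shown above can be chained in a small wrapper. This is only a sketch: the function names run and sparkle_run_local and the DRY_RUN variable are hypothetical and not part of sparkle. Only the three commands themselves come from this README.

```shell
# Run a command, or just print it when DRY_RUN is set to a non-empty value.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Build everything, package one app into a JAR, and submit it to a local Spark.
# Usage: sparkle_run_local <app-executable-name>
sparkle_run_local() {
  sparkle_app="$1"
  run stack build &&
    run stack exec sparkle package "$sparkle_app" &&
    run spark-submit --master 'local[1]' "$sparkle_app.jar"
}
```

Setting DRY_RUN=1 before calling sparkle_run_local sparkle-example-hello prints the three commands instead of running them, which is a cheap way to check what would be executed.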

Troubleshooting

jvm library or header files not found

You'll need to tell Stack where to find your local JVM installation. Something like the following in your ~/.stack/config.yaml should do the trick, but check that the paths match what's on your system:

extra-include-dirs: [/usr/lib/jvm/java-7-openjdk-amd64/include]
extra-lib-dirs: [/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64/server]

Or use --nix: since it won't use your globally installed JDK, it will have no trouble finding its own locally installed one.
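
If you are not using --nix and are unsure which paths to put in extra-include-dirs and extra-lib-dirs, searching for jni.h and libjvm.so usually finds them. The helper below is a sketch, not part of sparkle: the name find_jvm_dirs is hypothetical, and the default root /usr/lib/jvm is an assumption that matches the example paths above but may differ on your distro.

```shell
# Print candidate directories for extra-include-dirs and extra-lib-dirs.
# Usage: find_jvm_dirs [JVM_ROOT]
# JVM_ROOT defaults to /usr/lib/jvm (an assumption; adjust for your system).
find_jvm_dirs() {
  jvm_root="${1:-/usr/lib/jvm}"
  # The directory holding jni.h is a candidate for extra-include-dirs.
  find -L "$jvm_root" -name jni.h -exec dirname {} \; 2>/dev/null |
    sed 's/^/include: /'
  # The directory holding libjvm.so is a candidate for extra-lib-dirs.
  find -L "$jvm_root" -name libjvm.so -exec dirname {} \; 2>/dev/null |
    sed 's/^/lib: /'
}
```

Calling find_jvm_dirs with no argument searches /usr/lib/jvm. If several JDKs are installed, it prints one line per match, and you pick the pair that belongs to the JDK you want Stack to use.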

Can't build sparkle on OS X

OS X is not a supported platform for now. Several issues must be resolved before sparkle works on OS X; they are tracked in this ticket.

Gradle <= 2.12 incompatible with JDK 9

If you're using JDK 9, note that you'll need to either downgrade to JDK 8 or update your Gradle version, since Gradle versions up to and including 2.12 are not compatible with JDK 9.
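
To check whether an installed Gradle is past the 2.12 cutoff, a version comparison along these lines can help. It is a sketch under two assumptions: the function name gradle_newer_than_2_12 is hypothetical, and the comparison relies on sort -V (version sort), which GNU coreutils provides but some other sort implementations may not.

```shell
# Succeed (exit status 0) if the given Gradle version is newer than 2.12.
# Usage: gradle_newer_than_2_12 <version>
gradle_newer_than_2_12() {
  gradle_version="$1"
  # 2.12 itself is not new enough.
  [ "$gradle_version" != "2.12" ] &&
    # With version sort, the newer of the two versions ends up last.
    [ "$(printf '%s\n' "2.12" "$gradle_version" | sort -V | tail -n 1)" = "$gradle_version" ]
}

# Example use (requires gradle on the PATH):
#   v=$(gradle --version | awk '/^Gradle /{print $2}')
#   gradle_newer_than_2_12 "$v" || echo "Gradle $v is too old for JDK 9"
```

The check compares version strings only, so a spelling such as 2.12.0 would count as newer than 2.12. Running java -version alongside it shows which JDK is actually in use.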

License

Copyright (c) 2015-2016 EURL Tweag.

All rights reserved.

Sparkle is free software, and may be redistributed under the terms specified in the LICENSE file.

About

Tweag I/O

Sparkle is maintained by Tweag I/O.

Have questions? Need help? Tweet at @tweagio.