scheduler: Work stealing scheduler.

[ bsd3, concurrency, library, parallelism ] [ Propose Tags ]

A work stealing scheduler that is primarly developed for massiv array librarry, but it is general enough to be useful for any computation that fits the model of few workers many jobs.


[Skip to Readme]
Versions [faq] 1.0.0, 1.1.0, 1.2.0, 1.3.0, 1.4.0
Dependencies atomic-primops, base (>=4.9 && <5), deepseq, exceptions, unliftio-core [details]
License BSD-3-Clause
Copyright 2018-2019 Alexey Kuleshevich
Author Alexey Kuleshevich
Maintainer alexey@kuleshevi.ch
Category Parallelism, Concurrency
Home page https://github.com/lehins/haskell-scheduler
Source repo head: git clone https://github.com/lehins/massiv
Uploaded by lehins at Wed Mar 27 01:24:38 UTC 2019
Distributions NixOS:1.4.0, Stackage:1.4.0
Downloads 264 total (146 in the last 30 days)
Rating 2.25 (votes: 2) [estimated by rule of succession]
Your Rating
  • λ
  • λ
  • λ
Status Hackage Matrix CI
Docs available [build log]
Last success reported on 2019-03-27 [all 1 reports]

Modules

[Index] [Quick Jump]

Downloads

Maintainer's Corner

For package maintainers and hackage trustees


Readme for scheduler-1.0.0

[back to package description]

scheduler

This is a work stealing scheduler, which is very useful for tasks parallelization.

Whenever you have many actions you'd like to perform in parallel, but would only like to use a few threads to do the actual computation, this package is for you.

| Language | Travis | AppVeyor | Hackage | Nightly | LTS | |:--------:|:------:|:--------:|:-------:|:-------:|:---:| | GitHub top language | Travis | AppVeyor | Hackage| Nightly | Nightly |

QuickStart

A few examples in order to get up and running quickly.

Schedule simple actions

Work scheduling that does some side effecty stuff and discards the results:

interleaveFooBar :: IO ()
interleaveFooBar = do
  withScheduler_ (ParN 2) $ \ scheduler -> do
    putStrLn "Scheduling 1st job"
    scheduleWork scheduler (putStr "foo")
    putStrLn "Scheduling 2nd job"
    scheduleWork scheduler (putStr "bar")
    putStrLn "Awaiting for jobs to be executed:"
  putStrLn "\nDone"

In the example above two workers will be created to handle the only two jobs that have been scheduled. Printing with putStr is not thread safe, so the output that you would get with above function is likely be interleaved:

λ> interleaveFooBar
Scheduling 1st job
Scheduling 2nd job
Awaiting for jobs to be executed:
foboar
Done

Important to note that only when inner action supplied to the withScheduler_ exits will the scheduler start executing scheduled jobs.

Keeping the results of computation

Another common scenario is to schedule some jobs that produce useful results. In the example below four works will be spawned off. Due to ParOn each of the workers will be pinned to a particular core.

scheduleSums :: IO [Int]
scheduleSums =
  withScheduler (ParOn [1..4]) $ \ scheduler -> do
    scheduleWork scheduler $ pure (10 + 1)
    scheduleWork scheduler $ pure (20 + 2)
    scheduleWork scheduler $ pure (30 + 3)
    scheduleWork scheduler $ pure (40 + 4)
    scheduleWork scheduler $ pure (50 + 5)

Despite that the fact that sums are computed in parallel, the results of computation will appear in the same order they've been scheduled:

λ> scheduleSums
[11,22,33,44,55]

Exceptions

Whenever any of the scheduled jobs result in an exception, all of the workers will be killed and the exception will get re-thrown in the scheduling thread:

infiniteJobs :: IO ()
infiniteJobs = do
  withScheduler_ (ParN 5) $ \ scheduler -> do
    scheduleWork scheduler $ putStrLn $ repeat 'a'
    scheduleWork scheduler $ putStrLn $ repeat 'b'
    scheduleWork scheduler $ putStrLn $ repeat 'c'
    scheduleWork scheduler $ pure (4 `div` (0 :: Int))
    scheduleWork scheduler $ putStrLn $ repeat 'd'
  putStrLn "\nDone"

Note, that if there was no exception, printing would never stop.

λ> infiniteJobs
aaaaaaaaabcdd*** Exception: divide by zero

Nested jobs

Scheduling actions can themselves schedule actions indefinitely. That of course means that order of results produced is no longer deterministic, which is to be expected.

nestedJobs :: IO ()
nestedJobs = do
  withScheduler_ (ParN 5) $ \ scheduler -> do
    scheduleWork scheduler $ putStr $ replicate 10 'a'
    scheduleWork scheduler $ do
      putStr $ replicate 10 'b'
      scheduleWork scheduler $ do
        putStr $ replicate 10 'c'
        scheduleWork scheduler $ putStr $ replicate 10 'e'
      scheduleWork scheduler $ putStr $ replicate 10 'd'
    scheduleWork scheduler $ putStr $ replicate 10 'f'
  putStrLn "\nDone"

The order in which characters appear is important, since it directly relates to the actual order in which jobs are being scheduled and executed:

  • c, d and e characters will always appear after b
  • e will always appear after c
λ> nestedJobs
abbafbafbafbafbafbafbafbafbaffcdcdcdcdcdcdcdcdcdcdeeeeeeeeee
Done

Nested parallelism

Nothing really prevents you from having a scheduler within a scheduler. Of course, having multiple schedulers at the same time seems like an unnecessary overhead, which it is, but if you do have a use case for it, don't make me stop you, it is OK to go that route.

nestedSchedulers :: IO ()
nestedSchedulers = do
  withScheduler_ (ParN 2) $ \ outerScheduler -> do
    scheduleWork outerScheduler $ putStr $ replicate 10 'a'
    scheduleWork outerScheduler $ do
      putStr $ replicate 10 'b'
      withScheduler_ (ParN 2) $ \ innerScheduler -> do
        scheduleWork innerScheduler $ do
          putStr $ replicate 10 'c'
          scheduleWork outerScheduler $ putStr $ replicate 10 'e'
        scheduleWork innerScheduler $ putStr $ replicate 10 'd'
    scheduleWork outerScheduler $ putStr $ replicate 10 'f'
  putStrLn "\nDone"

Note that the inner scheduler's job schedules a job for the outer scheduler, which is a bit crazy, but totally safe.

λ> nestedSchedulers
aabababababababababbffffffffffcccccccdcdcdcdddededededeeeeee
Done

Single worker schedulers

If we only have one worker, than everything becomes sequential and deterministic. Consider the same example from before, but with Seq computation strategy.

nestedSequentialSchedulers :: IO ()
nestedSequentialSchedulers = do
  withScheduler_ Seq $ \ outerScheduler -> do
    scheduleWork outerScheduler $ putStr $ replicate 10 'a'
    scheduleWork outerScheduler $ do
      putStr $ replicate 10 'b'
      withScheduler_ Seq $ \ innerScheduler -> do
        scheduleWork innerScheduler $ do
          putStr $ replicate 10 'c'
          scheduleWork outerScheduler $ putStr $ replicate 10 'e'
        scheduleWork innerScheduler $ putStr $ replicate 10 'd'
    scheduleWork outerScheduler $ putStr $ replicate 10 'f'
  putStrLn "\nDone"

No more interleaving, everything is done in the same order each time the function is invoked.

λ> nestedSchedulers
aaaaaaaaaabbbbbbbbbbccccccccccddddddddddffffffffffeeeeeeeeee
Done

Avoiding deadlocks

Any sort of concurrency primitives such as mutual exclusion, semaphores, etc. can easily lead to deadlocks, starvation and other common problems. Try to avoid them and be careful if you do end up using them.