karps-0.2.0.0: Haskell bindings for Spark Dataframes and Datasets

Safe Haskell: None
Language: Haskell2010

Spark.Core.Internal.ContextInternal

Documentation

inputSourcesRead :: ComputeGraph -> [HdfsPath]

Returns the list of file sources requested by the compute graph.
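
Example (a minimal usage sketch, not part of the module): checking whether a graph reads from a given path. It assumes the relevant types are in scope and that HdfsPath has an Eq instance.

    -- Minimal sketch: does this graph read from the given HDFS path?
    -- Assumes 'HdfsPath' has an 'Eq' instance.
    readsFrom :: HdfsPath -> ComputeGraph -> Bool
    readsFrom path graph = path `elem` inputSourcesRead graph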

prepareComputation :: ComputeGraph -> SparkStatePure (Try Computation)

Given a context for the computation and a computation graph, builds a Computation object.

buildComputationGraph :: ComputeNode loc a -> Try ComputeGraph

Builds the computation graph by expanding a single node until the transitive closure of its dependencies is reached.

It performs naming, node deduplication, and cycle detection.

TODO(kps) use the caching information to have a correct fringe
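
Example (a minimal sketch, not part of the module): expanding a single node and listing the HDFS inputs of the resulting graph, assuming Try is an Either-like error type and hence a Functor.

    -- Minimal sketch: expand a node into its compute graph and list the
    -- HDFS inputs it would read. Failures from graph construction are
    -- passed through unchanged (assuming 'Try' is a Functor).
    nodeInputs :: ComputeNode loc a -> Try [HdfsPath]
    nodeInputs node = fmap inputSourcesRead (buildComputationGraph node)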

performGraphTransforms :: SparkSession -> ComputeGraph -> Try ComputeGraph

Performs all the operations that are done on the compute graph:

  • fulfilling autocache requests
  • checking the cache/uncache pairs
  • pruning of observed successful computations
  • deconstruction of unions (in the future)

This could all be done on the server side at this point.
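
Example (a minimal sketch, not part of the module): chaining graph construction with the session-level transforms, assuming Try is an Either-like error type with a Monad instance so that a failure from either step short-circuits.

    -- Minimal sketch: expand a node and apply the session's graph transforms.
    -- Assumes 'Try' is an Either-like error type with a Monad instance.
    buildAndTransform :: SparkSession -> ComputeNode loc a -> Try ComputeGraph
    buildAndTransform session node =
      buildComputationGraph node >>= performGraphTransforms session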

getObservables :: Computation -> [UntypedLocalData]

Retrieves all the observables from a computation.
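
Example (a minimal sketch, not part of the module): preparing a computation from a graph and listing its observables on success. It assumes SparkStatePure is a monad (a pure state monad over the session state) and that Try is a Functor.

    -- Minimal sketch: prepare a computation and, if preparation succeeds,
    -- return its observables; otherwise the error is passed through.
    prepareAndList :: ComputeGraph -> SparkStatePure (Try [UntypedLocalData])
    prepareAndList graph = do
      tcomp <- prepareComputation graph
      return (fmap getObservables tcomp)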

insertSourceInfo :: ComputeGraph -> [(HdfsPath, DataInputStamp)] -> Try ComputeGraph

Exposed for debugging.

Inserts the source information into the graph.

Note: the node IDs may change as a result, but the names and the paths are preserved.
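
Example (a minimal sketch, not part of the module): stamping all the input sources of a graph. The resolver fetchStamp is a hypothetical argument standing in for whatever produces a DataInputStamp for a path (for instance, a call to the file system); it is not part of this package.

    -- Minimal sketch: look up a stamp for every input source of the graph
    -- and insert the resulting source information back into the graph.
    -- 'fetchStamp' is a hypothetical resolver supplied by the caller.
    stampSources :: (HdfsPath -> IO DataInputStamp)
                 -> ComputeGraph
                 -> IO (Try ComputeGraph)
    stampSources fetchStamp graph = do
      let paths = inputSourcesRead graph
      stamps <- mapM fetchStamp paths
      return (insertSourceInfo graph (zip paths stamps))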

updateCache :: ComputationID -> [(NodeId, NodePath, DataType, PossibleNodeStatus)] -> SparkStatePure ([(NodeId, Try Cell)], [(NodePath, NodeCacheStatus)])

Updates the cache and returns the updates, if any.

The updates are split into final results and general status updates (scheduled, running, etc.).
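
Example (a minimal sketch, not part of the module): recording a batch of status updates for a computation and keeping only the final results, assuming SparkStatePure is a monad. The ComputationID and the status tuples would come from the backend's status reports.

    -- Minimal sketch: push a batch of node statuses into the cache and
    -- return only the final results, discarding the intermediate
    -- status updates (scheduled, running, ...).
    recordResults :: ComputationID
                  -> [(NodeId, NodePath, DataType, PossibleNodeStatus)]
                  -> SparkStatePure [(NodeId, Try Cell)]
    recordResults cid statuses = do
      (results, _statusUpdates) <- updateCache cid statuses
      return results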