| Safe Haskell | None |
|---|---|
| Language | Haskell2010 |
- dataset :: (ToSQL a, SQLTypeable a, HasCallStack) => [a] -> Dataset a
- dataframe :: DataType -> [Cell] -> DataFrame
- constant :: (ToSQL a, SQLTypeable a) => a -> LocalData a
- asLocalObservable :: ComputeNode LocLocal a -> LocalFrame
- asDouble :: (Num a, SQLTypeable a) => LocalData a -> LocalData Double
- (.+) :: forall a1 a2. (Num a1, Num a2, GeneralizedHomo2 a1 a2) => a1 -> a2 -> GeneralizedHomoReturn a1 a2
- (.-) :: forall a1 a2. (Num a1, Num a2, GeneralizedHomo2 a1 a2) => a1 -> a2 -> GeneralizedHomoReturn a1 a2
- (./) :: (Fractional a1, Fractional a2, GeneralizedHomo2 a1 a2) => a1 -> a2 -> GeneralizedHomoReturn a1 a2
- div' :: forall a1 a2. (Num a1, Num a2, GeneralizedHomo2 a1 a2) => a1 -> a2 -> GeneralizedHomoReturn a1 a2
- (@@) :: CanRename a txt => a -> txt -> a
- _1 :: FixedProjection1
- _2 :: FixedProjection2
- collect :: forall ref a. SQLTypeable a => Column ref a -> LocalData [a]
- collect' :: DynColumn -> LocalFrame
- count :: forall a. Dataset a -> LocalData Int
- identity :: ComputeNode loc a -> ComputeNode loc a
- autocache :: Dataset a -> Dataset a
- cache :: Dataset a -> Dataset a
- uncache :: ComputeNode loc a -> ComputeNode loc a
- joinInner :: Column ref1 key -> Column ref1 value1 -> Column ref2 key -> Column ref2 value2 -> Dataset (key, value1, value2)
- joinInner' :: DynColumn -> DynColumn -> DynColumn -> DynColumn -> DataFrame
- broadcastPair :: Dataset a -> LocalData b -> Dataset (a, b)
Creation
dataset :: (ToSQL a, SQLTypeable a, HasCallStack) => [a] -> Dataset a Source #
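Builds a distributed dataset from a local list of values. For illustration, a minimal sketch; the module names in the imports are assumptions based on the karps package layout, and the Int instances of ToSQL and SQLTypeable are assumed to exist:

```haskell
import Spark.Core.Dataset (Dataset)    -- assumed module layout
import Spark.Core.Functions (dataset)  -- assumed module layout

-- A small distributed dataset built from a local list of integers.
ints :: Dataset Int
ints = dataset [1, 2, 3, 4]
```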
dataframe :: DataType -> [Cell] -> DataFrame Source #
Creates a dataframe from a list of cells and a datatype.
Will fail if the content of the cells is not compatible with the data type.
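The Cell and DataType constructors are not documented on this page, so the sketch below keeps them abstract: intType and intCell are hypothetical stand-ins for the library's actual integer data type value and cell constructor.

```haskell
-- 'intType' and 'intCell' are hypothetical placeholders; substitute the
-- library's real DataType value and Cell constructor for integers.
mkDf :: DataType -> (Int -> Cell) -> DataFrame
mkDf intType intCell = dataframe intType [intCell 1, intCell 2, intCell 3]
```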
Standard conversions
asLocalObservable :: ComputeNode LocLocal a -> LocalFrame Source #
Converts a local node to a local frame. This always works.
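For instance, a constant observable can be wrapped into an untyped LocalFrame; a sketch, assuming LocalData a is the ComputeNode LocLocal a specialization (as the signatures on this page suggest):

```haskell
-- Dropping the static type of a local observable; this conversion always works.
obs :: LocalFrame
obs = asLocalObservable (constant (1 :: Int))
```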
asDouble :: (Num a, SQLTypeable a) => LocalData a -> LocalData Double Source #
Casts local data to a double.
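A sketch combining count (listed in the synopsis above) with asDouble, assuming the SQLTypeable Int instance:

```haskell
-- The number of rows, first as an Int observable, then cast to Double.
n :: LocalData Int
n = count (dataset [1, 2, 3 :: Int])

nD :: LocalData Double
nD = asDouble n
```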
Arithmetic operations
(.+) :: forall a1 a2. (Num a1, Num a2, GeneralizedHomo2 a1 a2) => a1 -> a2 -> GeneralizedHomoReturn a1 a2 Source #
A generalization of addition for the Karps types.
(.-) :: forall a1 a2. (Num a1, Num a2, GeneralizedHomo2 a1 a2) => a1 -> a2 -> GeneralizedHomoReturn a1 a2 Source #
A generalization of subtraction for the Karps types.
(./) :: (Fractional a1, Fractional a2, GeneralizedHomo2 a1 a2) => a1 -> a2 -> GeneralizedHomoReturn a1 a2 Source #
A generalization of division for the Karps types.
div' :: forall a1 a2. (Num a1, Num a2, GeneralizedHomo2 a1 a2) => a1 -> a2 -> GeneralizedHomoReturn a1 a2 Source #
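An illustrative sketch of the generalized operators on constant observables; whether these exact uses resolve depends on the library's Num, Fractional and GeneralizedHomo2 instances for LocalData, so treat the snippet as a shape rather than a guaranteed compile:

```haskell
x = constant (2 :: Int)
y = constant (3 :: Int)

s = x .+ y                    -- generalized addition
d = x .- y                    -- generalized subtraction
q = asDouble x ./ asDouble y  -- generalized division (Fractional operands)
```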
Utilities
Standard library
collect :: forall ref a. SQLTypeable a => Column ref a -> LocalData [a] Source #
Collects all the elements of a column into a list.
NOTE: The returned list is sorted in the canonical ordering of the data type: no matter how the data is stored by Spark, the result will always come back in the same order. This is a departure from Spark, which does not guarantee an ordering on the returned data.
collect' :: DynColumn -> LocalFrame Source #
See the documentation of collect.
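Column accessors are not documented on this page, so the sketch below only fixes the types: given any Int column, collect produces a local list observable (assuming the SQLTypeable Int instance).

```haskell
-- Gather a (distributed) column into a single local list.
collectInts :: Column ref Int -> LocalData [Int]
collectInts = collect
```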
identity :: ComputeNode loc a -> ComputeNode loc a Source #
The identity function.
Returns a compute node with the same datatype and the same content as the previous node. If the operation of the input has a side effect, this side effect is *not* reevaluated.
This operation is typically used to establish an ordering between operations such as caching or side effects, along with logicalDependencies.
autocache :: Dataset a -> Dataset a Source #
Automatically caches the dataset on an as-needed basis, and performs deallocation when the dataset is no longer required.
This function marks a dataset as eligible for the default caching level in Spark. The current implementation performs caching only if it can be established that the dataset is going to be involved in more than one shuffling or aggregation operation.
If the dataset has no observable child, no uncaching operation is added: the autocache operation is equivalent to unconditional caching.
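A sketch of the eligibility condition described above: a dataset feeding more than one aggregation is a candidate for actual caching.

```haskell
ds = autocache (dataset [1, 2, 3 :: Int])

-- Two aggregations consume ds, so the current implementation may
-- actually cache it between their evaluations.
n1 = count ds
n2 = count ds
```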
cache :: Dataset a -> Dataset a Source #
Caches the dataset.
This function instructs Spark to cache a dataset with the default persistence level in Spark (MEMORY_AND_DISK).
Note that the dataset will have to be evaluated first for the caching to take effect, so it is usual to call count or another aggregator to force the caching to occur.
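A sketch of this usual pattern: cache, then force evaluation with an aggregator such as count.

```haskell
cached :: Dataset Int
cached = cache (dataset [1, 2, 3])

-- Evaluating this observable forces the dataset and triggers the caching.
n :: LocalData Int
n = count cached
```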
uncache :: ComputeNode loc a -> ComputeNode loc a Source #
Uncaches the dataset.
This function instructs Spark to unmark the dataset as cached, so that the disk and memory it uses can be reclaimed by Spark in the future.
Unlike Spark, Karps is stricter with the uncaching operation:
- the argument of uncache must be a cached dataset
- once a dataset is uncached, its cached version cannot be used again (i.e. it must be recomputed)
Karps performs escape analysis and will refuse to run programs with caching issues.
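A sketch of this stricter discipline:

```haskell
ds = dataset [1, 2, 3 :: Int]
c  = cache ds    -- c is the cached version of ds
n  = count c     -- work against the cached dataset
u  = uncache c   -- valid: c is cached; after this, c must not be used again
```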
joinInner :: Column ref1 key -> Column ref1 value1 -> Column ref2 key -> Column ref2 value2 -> Dataset (key, value1, value2) Source #
Explicit inner join. The rows of the two datasets are matched on their key columns, producing (key, value1, value2) tuples.
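A type-level sketch, assuming joinInner carries no constraints beyond the printed signature (the column accessors used to obtain the arguments are not documented on this page):

```haskell
-- Join two datasets on Int keys, pairing their Double values.
joinByKey :: Column ref1 Int -> Column ref1 Double
          -> Column ref2 Int -> Column ref2 Double
          -> Dataset (Int, Double, Double)
joinByKey = joinInner
```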