karps-0.2.0.0: Haskell bindings for Spark Dataframes and Datasets

Safe Haskell: None
Language: Haskell2010

Spark.Core.Internal.OpStructures

Contents

Description

A description of the operations that can be performed on nodes and columns.

Synopsis

Documentation

type SqlFunctionName = Text Source #

The name of a SQL function.

It is one of the predefined SQL functions available in Spark.

type UdafClassName = Text Source #

The classpath of a UDAF.

type OperatorName = Text Source #

The name of an operator defined in Karps.

data HdfsPath Source #

A path in the Hadoop File System (HDFS).

These paths are usually not created by the user directly.

Constructors

HdfsPath Text 

data DataInputStamp Source #

A stamp that defines some notion of uniqueness of the data source.

The general contract is that:

- stamps can be extracted fast (no need to scan the whole dataset)
- if the data gets changed, the stamp will change

Stamps are used to perform aggressive operation caching, so it is better to conservatively update stamps if one is unsure about the freshness of the dataset. For regular files, stamps are computed using the file system time stamps.
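As a sketch of this contract, a stamp for a regular file can be derived from its modification time. The `fileStamp` helper below is illustrative and not part of the Karps API; the real constructor wraps `Text`, simplified here to `String` to keep the example self-contained.

```haskell
import System.Directory (getModificationTime)

-- Stand-in for the real newtype, which wraps Text.
newtype DataInputStamp = DataInputStamp String
  deriving (Eq, Show)

-- Hypothetical helper: stamp a regular file by its modification time.
-- Fast to extract (no scan of the data), and changes whenever the file
-- is rewritten, as the contract above requires.
fileStamp :: FilePath -> IO DataInputStamp
fileStamp path = DataInputStamp . show <$> getModificationTime path
```

Note that an mtime-based stamp is conservative: it may change even when the bytes do not, which is the safe direction for cache invalidation.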

Constructors

DataInputStamp Text 

data TransformInvariant Source #

The invariant respected by a transform.

Depending on the value of the invariant, different optimizations may be available.

Constructors

Opaque

This operator has no special property. It may depend on the partitioning layout, the number of partitions, the order of elements in the partitions, etc. This sort of operator is unwelcome in Karps...

PartitioningInvariant

This operator respects the canonical partition order, but may not have the same number of elements. For example, this could be a flatMap on an RDD (filter, etc.). This operator can be used locally with the signature a -> [a]

DirectPartitioningInvariant

The strongest invariant. It respects the canonical partition order and it outputs the same number of elements. This is typically a map. This operator can be used locally with the signature a -> a
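The two partitioning-friendly invariants correspond to the local signatures mentioned above. A minimal sketch of what they mean on a single partition (illustrative, not the Karps API):

```haskell
-- DirectPartitioningInvariant: shape a -> a, e.g. a map. Preserves both
-- the canonical partition order and the number of elements.
applyDirect :: (a -> a) -> [a] -> [a]
applyDirect = map

-- PartitioningInvariant: shape a -> [a], e.g. a filter or flatMap.
-- Preserves the canonical order but may change the element count.
applyPartitioning :: (a -> [a]) -> [a] -> [a]
applyPartitioning = concatMap
```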

data Locality Source #

The dynamic value of locality. There is still a tag on it, but it can be easily dropped.

Constructors

Local

The data associated with this node is local. It can be materialized and accessed by the user.

Distributed

The data associated with this node is distributed or not accessible locally. It cannot be accessed by the user.

PHYSICAL OPERATORS

data StandardOperator Source #

An operator defined by default in the release of Karps. All other physical operators can be converted to standard operators.

data ScalaStaticFunctionApplication Source #

A Scala method of a singleton object.

data ColOp Source #

The different kinds of column operations that are understood by the backend.

These operations describe the physical operations on columns as supported by Spark SQL. They can operate column -> column, column -> row, or row -> row. Of course, not all operators are valid for each configuration.

Constructors

ColExtraction !FieldPath

A projection onto a single column. An extraction is always direct.

ColFunction !SqlFunctionName !(Vector ColOp)

A function of other columns. In this case, the other columns may matter. TODO(kps): indicate whether this function is partition-invariant; it should be the case most of the time.

ColLit !DataType !Value

A constant defined for each element. The type should be the same as that of the column. A literal is always direct.

ColStruct !(Vector TransformField)

A structure.
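To make the shape of these constructors concrete, here is a simplified, self-contained mirror of ColOp (field paths as `[String]`, the literal case specialized to `Int`, structs omitted; none of this is the actual Karps API) building a column expression such as plus(x, 1):

```haskell
-- Simplified mirror of the ColOp constructors documented above.
data ColOp
  = ColExtraction [String]      -- projection onto a single field path
  | ColFunction String [ColOp]  -- a SQL function of other columns
  | ColLitInt Int               -- stand-in for ColLit !DataType !Value
  deriving (Eq, Show)

-- The column expression x + 1, assembled as a tree of column operations:
plusOne :: ColOp
plusOne = ColFunction "plus" [ColExtraction ["x"], ColLitInt 1]
```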

Instances

Eq ColOp Source # 

Methods

(==) :: ColOp -> ColOp -> Bool #

(/=) :: ColOp -> ColOp -> Bool #

Show ColOp Source # 

Methods

showsPrec :: Int -> ColOp -> ShowS #

show :: ColOp -> String #

showList :: [ColOp] -> ShowS #

data UdafApplication Source #

When applying a UDAF, determines whether it should perform only the algebraic portion of the UDAF (initialize + update + merge), or also the final, non-algebraic step.

Constructors

Algebraic 
Complete 
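A classic illustration, assuming an averaging UDAF: the Algebraic application runs only initialize/update/merge over an intermediate buffer (which can be merged across partitions), while Complete also runs the final, non-algebraic evaluation. The names below are illustrative, not the Karps API.

```haskell
-- Intermediate buffer for an averaging UDAF: (running sum, running count).
type AvgBuffer = (Double, Double)

initialize :: AvgBuffer
initialize = (0, 0)

update :: AvgBuffer -> Double -> AvgBuffer
update (s, c) x = (s + x, c + 1)

-- merge is associative and commutative, so partial buffers can be
-- combined in any order across partitions: the algebraic portion.
merge :: AvgBuffer -> AvgBuffer -> AvgBuffer
merge (s1, c1) (s2, c2) = (s1 + s2, c1 + c2)

-- The final, non-algebraic step performed only by a Complete application.
evaluate :: AvgBuffer -> Double
evaluate (s, c) = s / c
```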

data AggField Source #

A field in the resulting aggregation transform.

Constructors

AggField 

Fields

data SemiGroupOperator Source #

The representation of a semi-group law in Spark.

This is the basic law used in universal aggregators. It is a function on observables that must respect the following laws:

f :: X -> X -> X

- commutative
- associative

A neutral element is not required for the semi-group laws. However, if used in the context of a universal aggregator, such an element implicitly exists and corresponds to the empty dataset.
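The two laws can be written down directly. The checkers below test them at given sample points (a property-based test would quantify over all inputs); they are a sketch, not part of this module.

```haskell
-- A semi-group operator f :: X -> X -> X must satisfy, for all x, y, z:
--   commutativity:  f x y == f y x
--   associativity:  f (f x y) z == f x (f y z)
commutative :: Eq a => (a -> a -> a) -> a -> a -> Bool
commutative f x y = f x y == f y x

associative :: Eq a => (a -> a -> a) -> a -> a -> a -> Bool
associative f x y z = f (f x y) z == f x (f y z)
```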

Constructors

OpaqueSemiGroupLaw !StandardOperator

A standard operator that happens to respect the semi-group laws.

UdafSemiGroupOperator !UdafClassName

The merging portion of a UDAF.

ColumnSemiGroupLaw !SqlFunctionName

A SQL operator that happens to respect the semi-group laws.

DATASET OPERATORS

OBSERVABLE OPERATORS

AGGREGATION OPERATORS

data Pointer Source #

A pointer to a node that is assumed to be already computed.

data NodeOp Source #

Constructors

NodeLocalOp StandardOperator

An operation between local nodes: [Observable] -> Observable

NodeLocalLit !DataType !Value

An observable literal

NodeBroadcastJoin

A special join that broadcasts a value along a dataset.

NodeOpaqueAggregator StandardOperator

Some aggregator that does not respect any particular invariant.

NodeGroupedReduction !AggOp 
NodeReduction !AggTransform 
NodeAggregatorReduction UniversalAggregatorOp

A universal aggregator.

NodeAggregatorLocalReduction UniversalAggregatorOp 
NodeStructuredTransform !ColOp

A structured transform, performed either on a local node or a distributed node.

NodeDistributedLit !DataType !(Vector Value)

A distributed dataset (with no partition information)

NodeDistributedOp StandardOperator

An opaque distributed operator.

NodePointer Pointer 

Instances

makeOperator :: Text -> SQLType a -> StandardOperator Source #

Makes a standard operator with no extra value.