time-series- Time series analysis.

Safe HaskellNone




Scratches and notes.



From: 2.2.3 Random Projection

sketch_of_t :: [Double]Source

Size 2 sketch vector of t constructed from random vectors v1 and v2.

>>> sketch_of_t

dotp :: Num a => [a] -> [a] -> aSource

From: 2.3 Matching Pursuit


"Given a vector pool containing n time series V = (v_1,v_2,...,v_n), a target vector v_t, tolerated error \eps, and approximating vector set V_a = \theta. Define cos\theta a_1 = ṽ_t * ṽ_1 as the cosine between v_t and a vector v_1 in V. Here vector v = v / ‖v‖.

  1. Set i = 1;
  2. Search the pool V and find the vector v_1 whose |costheta_1| with respect to v_t is maximal;
  3. Compute the residue r = v_t - c_1v_1 where c_1 = (‖v_t‖/‖v_1‖) cos\theta. V_a = V_a ⋃ {v_1}
  4. If ‖r‖ < epsilon terminate, return V_a.
  5. Else set i = i + 1 and v_t = r, go back to 2."

From: 3.4.1 The Sketch Approach

Quote from the pdf:

" ... Quantitatively, given a point x ∈ Rᵐ, we compute its dot product with d random vectors rᵢ ∈ {1, -1}ᵐ. The first random projection of x is given by:

  • y1 = (x*r_1, x*r_2, ..., x*r_d)

We compute 2b more such random projections y_1,...,y_2b_{+1}. If w is another point in Rᵐ and z_1,...,z_2b_{+1} are its projections using dot products with the same random vectors then the median of

  • ‖y_1-z_1‖, ‖y_2-z_2‖, ..., ‖y_2b_{+1}-z_2b_{+1}‖

is a good estimate of ‖x-w‖. It lies within a theta(1/d) factor of ‖x-w‖ with probability 1 - (1/2)ᵇ. ... "


"...Our approach is to use a \"structured\" random vector. The apparently oxymoronic idea is to form each structured random vector r from the concatenation of nb = sw/nb random vectors: r = s_1,...s_{nb}, where each si has length bw. Further each si is either u or -u, and u is a random vector in {1, -1}ᵐ. This choice is determined by a random binary nb-vector b: if b = 1, si = u and if bi = 0, si = -u. The structured approach leads to an asymptonic performance of O(nb) integer additions and O(log bw) floating point operations per datum and per random vector..."

norm :: Floating a => [a] -> aSource

dist_direct :: [Double] -> [Double] -> DoubleSource

Direct distance.



:: Int

Sketch size d.

-> [Double]

First vector.

-> [Double]

Second vector.

-> Double 

Sketch distance.

chunks :: Int -> [a] -> [[a]]Source

direct_div_by_sketch_distance :: Integer -> Int -> DoubleSource

d / s, where d is direct distance and s is sketch distance.

print_dist_ratios :: IO ()Source

Print direct_div_by_sketch with different sketch sizes.

From: 3.4.2 Partitioning Sketch Vectors

Quote from the pdf:

"... Note that we use correlation and distance more or less interchangebly because one can be computed from the other once the data is normalized. Pearson correlation is related to Euclidean distance as follows:

 D^2(x̂,ŷ) = 2(1 - corr(x,y))

Here x̂ and ŷ are obtained from the raw time sereis by computing:"

      x - avg(x)
 x̂ = ———————————–

corr_from_direct_distance :: CorrFuncSource

From above:

 corr(x,y) = 1 - ———————

corr_from_distance :: DistanceFunc -> CorrFuncSource

Using normalized input vector X and Y.

From: 3.4.3 Algorithmic Framework

"... Suppose we are seeking points within some distance d in the original time series space.

  • Partition each sketch vector s of size N into groups of some size g.
  • The ith group of each sketch vector s is placed in the ith grid structure of dimension g.
  • If two sketch vectors s1 and s2 are within distance c ✕ d in more than a fraction f of the groups, then the corresponding windows are candidate highly correlated windows and will be checked exactly."

From: 3.5 The Issues in Implementation

"... data normalization and its sketch within a sliding window are as follows (k = sw):

       X_k - avg(X_k)
 X̂_k = ——————————————

 Xsk^k = X̂_k⋅R_k

... difficulty lies in that avg(X_k) and var(X_k) change over each basic window .... we will show that the updating is trivial and the sketch needs to be computed only once."



:: Integer

Random seed

-> Int

bw, basic window size

-> Int

sw, sliding window size

-> [Double]

Input data

-> [([Double], Double, Double)]

Sliding window contents, avg(Xₖ), and var(Xₖ) (k = sw)

Update avg and var in a manner described in section 3.5.

compare_naive_and_incremental_avg_and_var :: Integer -> Int -> Int -> [Double] -> IO [Double]Source

Self descriptive function to show the comparison of naive and incremental avg and var.

comparison_of_avg_and_var :: IO ()Source

Sample comparison of avg and var, as shown in the function name.

From: 3.6.3 Performance Tests

"... To make the comparison concrete, we should specify our software architecture a bit more. The multi-dimentional search structure we use is in fact a grid structure. The reason we have rejected more sophisticated structures is that we are asking a radius query: which windows (represented as points) are within a certain distance of a given point? A multi-scale structure such as a quadtree or R-tree would not help in this case. Moreover, the grid structure can be stored densely in a hash table so empty cells take up no space. .."

From: Appendix B: Structured Random Projection for Sliding Window

"Given a sliding window length sw and a basic window length bw, we show how to compute the sketches for the sliding windows starting at each timepoint ... we will use a far more efficient approach based on the convolution between "structured random vector" and a data vector of length sw."

And from 3.5: The Issues in Implementation

$ "... To normalize the whole sliding window based on the same average and variance, the sketch of the basic window Xbwⁱ,i=0,1,...,nb will be updated as follows.

         (Xbwⁱ⋅R) - avgsw  ∑  Ri
 Xskⁱ = ——————————————————————————

where avg will be updated by removing the oldest basic window and adding the new arrival Xnb, ...."



:: Integer

Random seed.

-> Int

nb, sliding window size / basic window size.

-> Int

bw, basic window size.

-> RandomVector 

Random vector rbw:

  • rbw = (r₀,r₁,...,rbw-1

and control vector b:

  • b = (b₀,b₁,...,bnb-1



:: Integer

Random seed.

-> Int

nb, sliding window size / basic window size.

-> Int

bw, basic window size.

-> [Double]


Random vector for sliding window, built from unit random vector and control vector:

  • r = (rbw⋅b₀,rbw⋅b₁,...,rbw⋅bnb-1

data RandomVector Source

Random vector used for structured sketches.




rvUnit :: [Double]

Random values treated as unit.

rvControl :: [Double]

Random values treated as controlling vector.

print_comparisons_of_single_sketch :: IO ()Source

Print comparisions of single sketch.

Showing sliding window contents, sketch from structured vector, sketch from direct dot product, diff of the two, and ratio of the two.



:: Integer

Random seed.

-> Int

Size of sketch.

-> Int

bw, basic window size.

-> Int

sw, sliding window size.

-> [Double]

Input data stream.

-> [([Double], [Double])]

Sliding window contents and sketches.

Scratch of: Figure B.6: Structured convolution procedure every basic window. Still has "Curse of Dimensionality".

This time, sketch size is controlled with given argument.

XXX: Normalization is improper.

push :: [a] -> [a] -> [a]Source

Push the contents of first list to second list, returning list having same length as second list.

print_comparisons_incr :: IO ()Source

Show ratio of correlation value computed with direct function and computed with standardized convolved sketch distance.

From: Appendix C: An Upper Bound of the Grid Size


"Now let's examine how to embed this sketch filter in a grid structure. At first, we assume our parameter group is (N,g,c,f) and therfore there are totally ng = N/g groups. We will assign one grid structure to each sketch group. For each grid structure the critical parameter is the largest value A and diameter a. Now let's bound the size of A...".


len :: Num b => [a] -> bSource

mean :: Fractional a => [a] -> aSource

prg64Doubles :: Integer -> [Double]Source

Generate list of doubles between 0 to 1 with PRG64.

sq :: Floating a => a -> aSource