{-
the following (currently 5) sparse formats will live here


DirectSparse 1dim



one subtlety and a seemingly subtle point will be
that contiguous / inner contiguous sparse arrays
in  2dim  (and  1dim) will have an ``inner dimension" shift int.
This is so that slices can  be zero copy on *BOTH* the array of values,
and the Format indexing array machinery.

Note that in the 2dim case, it still wont quite be zero copy, because the
offsets into the inner dimension lookup table (not quite the right word)
will have to change when a general slice is used rather than a slice
that acts only on the outermost dimension.
-}



-- {-# LANGUAGE PolyKinds   #-}
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE CPP #-}
{-# LANGUAGE StandaloneDeriving#-}
{-# LANGUAGE FlexibleInstances  #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE UndecidableInstances #-}

#if defined(__GLASGOW_HASKELL__) && __GLASGOW_HASKELL__ >= 707
 {-# LANGUAGE AutoDeriveTypeable #-}
#endif
module Numerical.Array.Layout.Sparse(
  Layout(..)
  ,DirectSparse
  ,CSR
  ,CSC
  ,CompressedSparseRow
  ,CompressedSparseColumn --  FIX ME, re add column support later
  ,Format(FormatDirectSparseContiguous
      ,FormatContiguousCompressedSparseRow
      ,FormatInnerContiguousCompressedSparseRow
      ,FormatContiguousCompressedSparseColumn
      ,FormatInnerContiguousCompressedSparseColumn)
  ,ContiguousCompressedSparseMatrix(..)
  ,InnerContiguousCompressedSparseMatrix(..)
  ,module Numerical.Array.Layout.Base
  ) where

import Data.Data
import Data.Bits (unsafeShiftR)
import Control.Applicative
import Numerical.Array.Layout.Base
--import Numerical.Array.Shape
import Numerical.InternalUtils
import qualified  Data.Vector.Generic as V
import Prelude hiding (error )


data CompressedSparseRow
  deriving (Typeable)

type CSR = CompressedSparseRow

data CompressedSparseColumn
    deriving (Typeable)

type CSC = CompressedSparseColumn

data DirectSparse
    deriving (Typeable)



data instance Format DirectSparse 'Contiguous ('S 'Z) rep =
    FormatDirectSparseContiguous {
      _logicalShapeDirectSparse:: {-# UNPACK#-} !Int
      ,_logicalBaseIndexShiftDirectSparse::{-# UNPACK#-} !Int
      ,_indexTableDirectSparse :: ! (BufferPure rep Int )  }


deriving instance Show  (BufferPure rep Int )  => Show (Format DirectSparse 'Contiguous ('S 'Z) rep)



{-
for some listings of the design space of Sparse matrices
as found in other tools,
see < https://software.intel.com/en-us/mkl_11.2_ref >
and then navigate to the section  "Sparse Matrix Storage Formats"  within
"BLAS and Sparse BLAS Routines"

<  http://netlib.org/linalg/html_templates/node90.html > is also pretty readable

theres a subtle detail about the invariants of contiguous vs inner inner contiguous
for CSR and CSC
when I do an inner contiguous / contiguous slice / projection,
what "address shifts" do i need to track to make sure the slices
are zero copy as much as possible

just slicing on the outer dimension doesn't need any row shifts,
but a generalized (a,b) ... (a+x,b+y) selection when a,b!=0 does need a inner
dim shift,

NOTE that translating the inner dimension table's addresses to the corresponding
value buffer's address can require a shift!
This will happen when doing a MajorAxis (outer dimension) slice
the picks out a Suffix of the CSR matrix's rows


note that there are 2 formulations of CSR (/ CSC) formats

a) 3 array: value, column index,  and  row start vectors

b) 4 array: value, column index, rowstart, and row end vectors

lets use choice a) for contiguous vectors, and choice b) for
inner contiguous vectors.

In both cases we need to enrich the type with a "buffer shift"
to handle correctly doing lookups on submatrices picked out
by either a major axis slice

-}


--deriving instance (Show (Shape (S (S Z)) Int), Show (BufferPure rep Int) )
    -- => Show (Format CompressedSparseRow Contiguous (S (S Z)) rep)

--deriving instance  (Eq (Shape (S (S Z)) Int), Eq (BufferPure rep Int) )
    -- => Eq (Format CompressedSparseRow Contiguous (S (S Z)) rep)

--deriving instance (Data (Shape (S (S Z)) Int), Data (BufferPure rep Int) )
  --- => Data (Format CompressedSparseRow Contiguous (S (S Z)) rep)

--deriving instance  (Typeable (Shape (S (S Z)) Int ), Typeable (BufferPure rep Int) )
 -- => Typeable (Format CompressedSparseRow Contiguous (S (S Z)) rep)
    --deriving (Eq,Data,Typeable)


{-
NOTE!!!!!
_logicalBaseIndexShiftDirectSparse (and friends)
are so that major axis slices can still use the same buffer,
(needed for both Contiguous and InnerContiguous cases).
So When looking up the Address for a value based upon its
Inner dimension, we need to *SUBTRACT* that shift
to get the correct offset index into the current SLICE.

NB: THIS IS A TERRRIBLE EXPLANATION, FIXMEEEEE

Phrased differently, This address shift is the *Discrepancy/Difference*
between the size of the elided prefix of the Vector and the starting
position of the manifest entries.

(Q: does this ever ever matter, or can i punt that to vector, and only
need this )


This is kinda a good argument for not punting the Slicing on the raw buffers to
Vector, because it generally makes this a bit more subtle to think about
and someone IS going to implement something wrong this way!


Another subtle and potentially confusing point is distinguishing between
Affine shifts in the Index Space vs the Address space.

Only the outer dimension lookup table shift is needed in the Contiguous
2dim case, but the 2dim InnerContiguous case is a bit more confusing
because of the potential for a slice along the inner dimension

Rank 1 sparse  (like Direct sparse) is only Contiguous,
and either a) doesn't need a shift, or b) only needs an index shift
commensurate matching the leading implicit index of a Major Axis Slice


theres a BIG corner case in most standard CSR / CSC formats which is
underspecified in most docs about CSC and CSR formats.
Consider Without loss of generality, CSR format
  1) how are empty rows modeled/signaled?
  2) if the last row is empty, how is that signaled?

2) The last row is signaled to be be empty by having
  the last entry of _outerDim2InnerDim buffer be set to >=
  length of _innerDimIndex buffer (ie >= 1 + largest index of _innerDimIndex)
1)

note that the outer index table has 1+#rows length, with the last one being the
length of the array

-}

data ContiguousCompressedSparseMatrix rep =
    FormatContiguousCompressedSparseInternal {
     -- does this need the index space shift for outer range slices???
      _outerDimContiguousSparseFormat ::  {-# UNPACK #-} !Int
      ,_innerDimContiguousSparseFormat ::  {-# UNPACK #-} !Int
      ,_innerDimIndexContiguousSparseFormat :: !(BufferPure rep Int)
      ,_outerDim2InnerDimContiguousSparseFormat:: ! (BufferPure rep Int )
  }
  deriving (Typeable)

deriving instance (Show (BufferPure rep Int))=> Show (ContiguousCompressedSparseMatrix rep)

{-
  outerDim innerDim  innerTable  outer2InnerStart
-}



{-
for Row major Compressed Sparse (CSR)
the X dim (columns) are the inner dimension, and Y dim (rows) are the outer dim
-}



data  InnerContiguousCompressedSparseMatrix rep =
   FormatInnerContiguousCompressedSparseInternal {
      _outerDimInnerContiguousSparseFormat ::    {-# UNPACK #-} !Int
      ,_innerDimInnerContiguousSparseFormat ::  {-# UNPACK #-} !Int
      ,_innerDimIndexShiftInnerContiguousSparseFormat:: {-# UNPACK #-} !Int

      ,_innerDimIndexInnerContiguousSparseFormat :: !(BufferPure rep Int)
      ,_outerDim2InnerDimStartInnerContiguousSparseFormat:: ! (BufferPure rep Int )
      ,_outerDim2InnerDimEndInnerContiguousSparseFormat:: ! (BufferPure rep Int )
         }
     deriving Typeable

deriving instance (Show (BufferPure rep Int))=> Show (InnerContiguousCompressedSparseMatrix rep)


newtype instance Format CompressedSparseRow 'Contiguous ('S ('S 'Z)) rep =
    FormatContiguousCompressedSparseRow {
      _getFormatContiguousCSR :: (ContiguousCompressedSparseMatrix rep) }

deriving instance Show (ContiguousCompressedSparseMatrix rep)
    => Show (Format CompressedSparseRow 'Contiguous ('S ('S 'Z)) rep)

newtype instance Format CompressedSparseColumn 'Contiguous ('S ('S 'Z)) rep =
    FormatContiguousCompressedSparseColumn {
      _getFormatContiguousCSC :: (ContiguousCompressedSparseMatrix rep) }

deriving instance Show (ContiguousCompressedSparseMatrix rep)
    => Show (Format CompressedSparseColumn 'Contiguous ('S ('S 'Z)) rep)

newtype instance Format CompressedSparseRow 'InnerContiguous ('S ('S 'Z)) rep =
    FormatInnerContiguousCompressedSparseRow {
      _getFormatInnerContiguousCSR :: (InnerContiguousCompressedSparseMatrix rep )
  }
deriving instance  Show (InnerContiguousCompressedSparseMatrix rep )
    =>  Show (Format CompressedSparseRow 'InnerContiguous ('S ('S 'Z)) rep)

newtype instance Format CompressedSparseColumn 'InnerContiguous ('S ('S 'Z)) rep =
    FormatInnerContiguousCompressedSparseColumn {
      _getFormatInnerContiguousCSC :: (InnerContiguousCompressedSparseMatrix rep )
  }

deriving instance Show (InnerContiguousCompressedSparseMatrix rep )
  =>  Show (Format CompressedSparseColumn 'InnerContiguous ('S ('S 'Z)) rep)

      --deriving (Show,Eq,Data)

{-
  FormatInnerContiguous rowsize columnsize

-}
--newtype instance Format CompressedSparseColumn Contiguous (S (S Z)) rep =
--    FormatContiguousCompressedSparseColumn {
--      _getFormatContiguousCSC ::  (ContiguousCompressedSparseMatrix rep)
--  }
    --deriving (Show,Eq,Data)

--newtype  instance Format CompressedSparseColumn InnerContiguous (S (S Z)) rep =
--    FormatInnerContiguousCompressedSparseColumn {
--     _getFormatInnerContiguousCSC :: (InnerContiguousCompressedSparseMatrix rep)
--  }
--    --deriving (Show,Eq,Data)

--CSR and CSC go here, and their version of lookups and next address and next index






--  Offset binary search --- cribbed with permission from
-- edward kmett's structured lib

{-
todo: theres some neat micro optimizations that are
possible If I know how indexed structures are paged aligned and what not
eg, when binary search, check both the first and last slot of a page I land on.
Also on >= Nehalem, pages are "paired" so if you land on the lower page, the
upper page is always loaded, etc etc. Not doing these for now.


also should compare against search strategies defined in
the vector-algorithms package, namely the
galloping ones
-}




{-
-- Assuming @l <= h@. Returns @h@ if the predicate is never @True@ over @[l..h)@
-- requires p be a "monotonic" predicate  (FFFFFTTTTT)
-}
bsearchUp :: (Int -> Bool) -> Int -> Int -> Int
bsearchUp p = go where
  go l h
    | l == h    = l
    | p m       = go l m
    | otherwise = go (m+1) h
    where hml = h - l
          m = l + unsafeShiftR hml 1 + unsafeShiftR hml 6
{-# INLINE bsearchUp #-}
{-
 Assuming @l <= h@. Returns @l@ if the predicate is never @True@ over @(l..h]@
  assumes predicate p is monotonic decreasing TTTTTFFFFF
  -}
bsearchDown :: (Int -> Bool) -> Int -> Int -> Int
bsearchDown p = go where
  go l h
    | l == h    = l
    | p (m+1)       = go (m+1) h
    | otherwise = go l m
    where hml = h - l
          m = l + unsafeShiftR hml 1 + unsafeShiftR hml 6
{-# INLINE bsearchDown #-}

{-
-- Assuming @l <= h@. Returns @h@ if the predicate is never @True@ over @[l..h)@
-- requires p be a "monotonic" predicate  (FFFFFTTTTT)
-- does a linear scan on the first constant number of slots
(for now 97 because i had to pick a number thats ~ log MaxInt)
and then falls back to binary search.
Meant to have O(1) average case, O(log n) worst case
-}
basicHybridSearchUp :: (Int -> Bool ) -> Int -> Int -> Int
basicHybridSearchUp  p = goCaseMe where
  goCaseMe l h  | (h-l <= magicConstant) || p magicConstant
                  {- either the range is short, OR
                    we know match happens in the first magicConstant size subrange
                  -}
                    = linearSearchUp p l (min h magicConstant)
                | otherwise = bsearchUp p magicConstant h
{-# INLINE  basicHybridSearchUp #-}


basicHybridSearchDown :: (Int -> Bool)-> Int -> Int -> Int
basicHybridSearchDown  p = goCaseMe where
  goCaseMe l h  | (h-l <= magicConstant)  || p (h- magicConstant)
                {-  either the range is short, OR
                    we know match happens in the first magicConstant size subrange
                 -}
                    = linearSearchDown p  (max l (h - magicConstant)) h
                | otherwise = bsearchDown p l (h - magicConstant)
{-# INLINE basicHybridSearchDown #-}

{-
i chose 97 because it seemed like a number thats ~ log MaxInt always (within 4x)
And is a range that should stay in L1 cache sizes for most purposes
-}
magicConstant :: Int
magicConstant = 97


-- Assuming @l <= h@. Returns @h@ if the predicate is never @True@ over @[l..h)@
linearSearchUp :: (Int -> Bool)-> Int -> Int -> Int
linearSearchUp p = go where
  go l h
    | l ==h = l
    | p l = l
    | otherwise = go (l+1) h
{-#INLINE linearSearchUp #-}

-- Assuming @l <= h@. Returns @l@ if the predicate is never @True@ over @(l..h]@
linearSearchDown :: (Int -> Bool)-> Int -> Int -> Int
linearSearchDown p = go where
  go l h
    | l ==h = l
    | p h = h
    | otherwise = go l (h-1)
{-#INLINE linearSearchDown #-}




--
-- now assumed each key is unique and ordered
--
-- Assuming @l <= h@. Returns @h@ if the predicate is never @True@ over @[l..h)@

-- should at some point try out a ternary search scheme to have even better
-- cache behavior (and benchmark of course)

searchOrd :: (Int -> Ordering) -> Int -> Int -> Int
searchOrd  p = go where
  go l h
    | l == h    = l
    | otherwise = case p m of
                  LT -> go (m+1) h
                  ---  entry is less than target, go up!
                  EQ -> m
                  -- we're there! Finish early
                  GT -> go l m
                  -- entry is greater than target, go down!
    where hml = h - l
          m = l + unsafeShiftR hml 1 + unsafeShiftR hml 6
{-# INLINE searchOrd #-}

lookupExact :: (Ord k, V.Vector vec k) => vec k -> k -> Maybe Int
lookupExact ks key
  | j <- searchOrd (\i -> compare (ks V.! i)  key) 0 (V.length ks - 1)
  , ks V.! j == key = Just $! j
  | otherwise = Nothing
{-# INLINE lookupExact #-}

lookupExactRange :: (Ord k, V.Vector vec k) => vec k -> k -> Int -> Int -> Maybe Int
lookupExactRange  ks key lo hi
  | j <- searchOrd (\i -> compare (ks V.! i)  key) lo hi
  , ks V.! j == key = Just $! j
  | otherwise = Nothing
{-# INLINE lookupExactRange  #-}

--lookupLUB ::  (Ord k, V.Vector vec k) => vec k -> k -> Maybe Int
--lookupLUB  ks key
--  | j <- search  (\i -> compare (ks V.! i)  key) 0 (V.length ks - 1)
--  , ks V.! j <= key = Just $! j
--  | otherwise = Nothing
--{-# INLINE lookupLUB  #-}

type instance  Transposed (Format DirectSparse 'Contiguous ('S 'Z) rep )=
   (Format DirectSparse 'Contiguous ('S 'Z) rep )




type instance LayoutAddress (Format DirectSparse 'Contiguous ('S 'Z) rep) =  Address


instance V.Vector (BufferPure rep) Int
  => Layout  (Format DirectSparse 'Contiguous ('S 'Z) rep ) ('S 'Z) where

  transposedLayout  = id
  -- {-# INLINE transposedLayout #-}

  basicLogicalShape = \ form -> _logicalShapeDirectSparse form  :* Nil
  -- {-# INLINE basicLogicalShape #-}

  basicCompareIndex = \ _ (a:* Nil) (b :* Nil) -> compare a b
  -- {-# INLINE basicCompareIndex #-}

  basicAddressRange = \form ->
    case (minAddress form , maxAddress form ) of
      (Just least, Just greatest) -> Just (Range least greatest )
      _ -> Nothing

    where
        minAddress =
          \ (FormatDirectSparseContiguous _ _   lookupTable) ->
              if  V.length lookupTable >0 then  Just $! Address 0 else Nothing

        maxAddress =
          \ (FormatDirectSparseContiguous _ _   lookupTable) ->
            if (V.length lookupTable >0 )
               then Just $! Address (V.length lookupTable - 1 )
               else Nothing

-- TODO, double check that im doing shift correctly
  {-# INLINE basicToAddress #-}
  basicToAddress =
      \ (FormatDirectSparseContiguous shape  indexshift lookupTable) (ix:*_) ->
         if  not (ix < shape && ix > 0 ) then  Nothing
          else  fmap Address  $! lookupExact lookupTable (ix + indexshift)

  {-# INLINE basicToIndex #-}
  basicToIndex =
    \ (FormatDirectSparseContiguous _ shift lut) (Address addr) ->
        ((lut V.! addr ) - shift) :* Nil
  {-# INLINE basicAddressAsInt #-}
  basicAddressAsInt = \ _ (Address a) -> a

  {-# INLINE basicNextAddress #-}
  basicNextAddress =
    \ (FormatDirectSparseContiguous _ _ lut) (Address addr) ->
      if  addr >= (V.length lut) then Nothing else Just  (Address (addr+1))

  -- {-# INLINE basicAddressPopCount #-}
  basicAddressPopCount = \ form (Range loadr@(Address lo) hiadr@(Address hi)) ->
    if not ( lo <= hi ) then
      error $! "basicAddressPopCount was passed a bad Address Range " ++ show loadr ++" " ++ show hiadr
      else
        case  basicAddressRange form of
          Nothing -> 0
          Just (Range (Address loBound) (Address  hiBound)) ->
            if not $ (loBound<= lo ) && (hi <= hiBound)
              then error $!
               "basicAddressPopCount was passed a bad Address Range: "
                ++show lo++" "++ show hi++"\nwith format Address range"
                ++ show loBound ++ " " ++ show hiBound
              else hi - lo


{-
    i've said it before, i'll say it again, scanning forward in the index space
    for sparse structures is really weird, :)

    NOTE: also need to remember to do those index space shifts for
    1dim direct sparse, and test them thoroughly
-}
  -- {-# INLINE basicNextIndex #-}
  basicNextIndex =
    \form@(FormatDirectSparseContiguous size shift lut) (ix:*Nil) mebeAddress ->
      if  ix >= size || ix >= (lut V.! (V.length lut -1) - shift ) then Nothing
            -- if ix is out of bounds or the last element, we're done!
      else
        let
            resAddr = Address $! bsearchUp  (\lix-> ix < ((lut V.! lix)-shift) )
                        0 (V.length lut )
        in case mebeAddress of
          Nothing ->  resAddr `seq` (Just (basicToIndex form resAddr ,  resAddr))
                -- Q: do i want the Index part of the tuple to be strict or not?
                -- leaving it lazy for now
                -- TODO / FIX / AUDIT ME / NOT SURE
              -- this is the fall back binary search based lookup

          Just (Address adr)->
          -- make sure the address hint is in bounds and
          -- is <= the current position
              if adr >0 && adr < (V.length lut -1) && ix >=((lut V.! adr )-shift)
              then
                -- by construction we know theres at least one applicable index
                -- thats
                let !nextAddr = Address $!
                                basicHybridSearchUp
                                  (\lix-> ix <  ((lut V.! lix)-shift ) )
                                  adr (V.length lut -1)
                  in  Just (basicToIndex form nextAddr ,  nextAddr)
              else
                resAddr `seq` (Just (basicToIndex form resAddr ,  resAddr))


------------
------------

type instance Transposed (Format CompressedSparseRow 'Contiguous ('S ('S 'Z)) rep )=
    (Format CompressedSparseColumn 'Contiguous ('S ('S 'Z)) rep )


type instance LayoutAddress (Format CompressedSparseRow 'Contiguous ('S ('S 'Z)) rep ) = SparseAddress

instance  (V.Vector (BufferPure rep) Int )
  => Layout  (Format CompressedSparseRow 'Contiguous ('S ('S 'Z)) rep ) ('S ('S 'Z)) where

  transposedLayout  = \(FormatContiguousCompressedSparseRow repFormat) ->
                          (FormatContiguousCompressedSparseColumn  repFormat)
  {-# INLINE transposedLayout #-}


  basicLogicalShape = \ form -> (_innerDimContiguousSparseFormat $ _getFormatContiguousCSR  form ) :*
         ( _outerDimContiguousSparseFormat $ _getFormatContiguousCSR form ):* Nil
          --   x_ix :* y_ix
  {-# INLINE basicLogicalShape #-}


  basicCompareIndex = \ _ as  bs -> shapeCompareRightToLeft as bs
  {-# INLINE basicCompareIndex #-}


  {-# INLINE basicAddressPopCount #-}
  basicAddressPopCount = \ form (Range (SparseAddress _ lo) (SparseAddress _ hi)) ->
    if not ( lo <= hi ) then
      error $! "basicAddressPopCount was passed a bad Address Range " ++ show lo ++" " ++ show hi
      else
        case  basicAddressRange form of
          Nothing -> 0
          Just (Range (SparseAddress _ loBound) (SparseAddress _ hiBound)) ->
            if not $ (loBound<= lo ) && (hi <= hiBound)
              then error $!
               "basicAddressPopCount was passed a bad SparseAddress Range: "
                ++show lo++" "++ show hi++"\nwith format SparseAddress range"
                ++ show loBound ++ " " ++ show hiBound
              else hi - lo

   -- {-# INLINE rangedFormatAddress #-}
  basicAddressRange = \ form ->
    case (minAddress form,maxAddress form) of
      (Just least, Just greatest)-> Just (Range least greatest)
      _ -> Nothing

    where
      {-
      probably should deduplicate min/maxAddress
      -}
      minAddress =
            \(FormatContiguousCompressedSparseRow
                (FormatContiguousCompressedSparseInternal  y_row_range x_col_range
                                                            columnIndex rowStartIndex)) ->
                    if  y_row_range < 1  || x_col_range < 1|| (V.length columnIndex  < 1)
                      then Nothing
                      else
                      -- the value buffer has the invariant the the end points
                      -- of the buffer MUST be valid  in bounds values if length buffer > 0
                    --SparseAddress $! 0 $! 0

                    -- hoisted where into if branch as let so lets could be strict
                        let
                          !addrShift = columnIndex V.! 0

                          -- for now assuming candidateRow is ALWAYS valid
                          --- haven't proven this, FIXME
                          !candidateRow= {-linearSearchUp-}
                               basicHybridSearchUp nonZeroRow 0 (y_row_range-1 )


                          {- FIXME, to get the right complexity
                          to linear search on first log #rows + 1 slots, then fall
                          back to binary search
                          punting for now because this probably wont matter than often

                          the solution will be to replace linearSearchUp
                          with a hybridSearchUp
                           -}
                          nonZeroRow =
                              \ !row_ix ->
                                   -- the first row to satisfy this property
                                  (rowStartIndex V.! (row_ix+1) >  rowStartIndex V.! row_ix)
                                  -- if the start index is >0, already past the min address row!
                                    ||  (rowStartIndex V.! row_ix) - addrShift > 0

                                  --else  maxIxP1 >  rowStartIndex V.! row_ix
                        in Just $! SparseAddress  candidateRow $! 0

      maxAddress  =
        \(FormatContiguousCompressedSparseRow
            (FormatContiguousCompressedSparseInternal   y_row_range x_col_range
                                                        columnIndex rowStartIndex)) ->
                if  y_row_range < 1  || x_col_range < 1|| (V.length columnIndex  < 1)
                  then Nothing
                  else
                  -- the value buffer has the invariant the the end points
                  -- of the buffer MUST be valid  in bounds values if length buffer > 0
                --SparseAddress $! 0 $! 0

                -- hoisted where into if branch as let so lets could be strict
                    let
                      !addrShift = columnIndex V.! 0
                      !maxIxP1 = V.length columnIndex

                      -- for now assuming candidateRow is ALWAYS valid
                      --- haven't proven this, FIXME
                      !candidateRow= {-linearSearchDown-}
                          basicHybridSearchDown nonZeroRow 0 (y_row_range-1 )


                      {- FIXME, to get the right complexity
                      to linear search on last log #rows + 1 slots, then fall
                      back to binary search
                      punting for now because this probably wont matter than often

                      the solution will be to replace linearSearchDown
                      with a hybridSearchDown
                       -}
                      nonZeroRow =
                          \ !row_ix ->
                       -- the first row to satisfy this property (going down from last row)
                              (rowStartIndex V.! (row_ix+1) >  rowStartIndex V.! row_ix)
                      -- if the start index is >= maxIxP1, havent gone down to max addres yet
                      -- if < maxIxp1, we're at or below the max address
                                ||  (rowStartIndex V.! row_ix) - addrShift < maxIxP1

                              --else  maxIxP1 >  rowStartIndex V.! row_ix
                    in
                        Just $!
                         SparseAddress  candidateRow $! (V.length columnIndex - 1 )

       -- \ (FormatContiguousCompressedSparseRow
       -- (FormatContiguousCompressedSparseInternal _ y_range
       --          columnIndex _)) ->
       --       SparseAddress (y_range - 1) (V.length columnIndex - 1 )

  {-#  INLINE basicAddressAsInt #-}
  basicAddressAsInt = \ _ (SparseAddress _ addr)-> addr

  {-# INLINE basicToIndex #-}
  basicToIndex =
        \ (FormatContiguousCompressedSparseRow
            (FormatContiguousCompressedSparseInternal  _ _ columnIndex _))
            (SparseAddress outer inner) ->
              (columnIndex V.! inner ) :* outer :*  Nil
          -- outer is the row (y index) and inner is the lookup position for the x index


{-
theres 3 cases for contiguous next address:
in the middle of a run on a fixed outer dimension,
need to bump the outer dimension, or we're at the end of the entire array

we make the VERY strong assumption that no illegal addresses are ever made!

note that for very very small sparse matrices, the branching will have some
overhead, but in general branch prediction should work out ok.
-}
  {-# INLINE basicNextAddress #-}
  basicNextAddress =
         \ (FormatContiguousCompressedSparseRow
            (FormatContiguousCompressedSparseInternal  _ _
              columnIndex rowStartIndex))
            (SparseAddress outer inner) ->
              if  inner < (V.length columnIndex -1)
               -- can advance further
                 -- && ( outer == (y_row_range-1)
                  --- either last row
                  || ((inner +1) < (rowStartIndex V.! (outer + 1)  - (rowStartIndex V.! 0 )))
                     -- or our address is before the next row starts
                     -- 3 vector CSR has a +1 slot at the end of the rowStartIndex

                then
                  Just (SparseAddress outer (inner+1))
                else
                  if inner == (V.length columnIndex -1)
                    then Nothing
                    else Just (SparseAddress (outer + 1) (inner + 1 ) )


  -- {-# INLINE basicToAddress #-}
  basicToAddress =
        \ (FormatContiguousCompressedSparseRow
            (FormatContiguousCompressedSparseInternal  y_row_range x_col_range
              columnIndex rowStartIndex))
          (ix_x:*ix_y :* _ ) ->
            if  not (ix_x >= x_col_range ||  ix_y >=y_row_range )
              then
              -- slightly different logic when ix_y < range_y-1 vs == range_y-1
              -- because contiguous, don't need the index space shift though!
                let
                  shift = (rowStartIndex V.! 0)
                  checkIndex i =
                      if  (columnIndex V.!i) == ix_x
                        then Just i
                        else Nothing
                in
                 (SparseAddress ix_y  <$>) $!
                    checkIndex =<<
                 --- FIXME  : need to check
                      lookupExactRange columnIndex ix_x
                          ((rowStartIndex V.! ix_y) - shift)
                          ((rowStartIndex V.! (ix_y+1) ) - shift)

              else   (Nothing :: Maybe SparseAddress )


  -- {-# INLINE basicNextIndex #-}
  {-  because nextIndex acts like a range query
      it doesn't make sense for inner loops
  -}
  basicNextIndex =
     \_form@(FormatContiguousCompressedSparseRow
              (FormatContiguousCompressedSparseInternal
                y_row_range x_col_range _columnIndex _rowStartIndex))
      _ix@(innerX :* outerY :*Nil) mebeSparseAddress ->
        if  not $ (innerX >=0 && innerX  < x_col_range ) && (outerY >= 0 && outerY < y_row_range)
          -- checking if index is inbounds for logical shape
          -- return Nothing if its out of bounds
          -- QUESTION: should it throw an error instead of returning nothing?
        then Nothing
        else
          case mebeSparseAddress of
            Nothing -> error "finish me "
              where
              {- Okay here we check if the proposed current index is manifest, or not
                Is it the right Row To search for the next index,
                Or if We need to search further along. This is the way that
                enables Usage of operations That give a complexity that is O(1)
                in the average/best case and O(log N )in the worst case

                The logical we do is roughly first check If there is an element
                strictly Greater than ix in next we are doing the successor
                That is within that Row And if so we can directly
                  do a binary search therein
                -}
                _resRow = error "finish me "

            (Just (SparseAddress _innerix _outerix) )
                -> error "really finish me"


        --case mebeAddress of
        --  Nothing ->
        --    let
        --    resAddr = Address $! bsearchUp  (\lix-> ix < ((lut V.! lix)-shift) )
        --                0 (V.length lut )
        --  in
        --   resAddr `seq` (Just (basicToIndex form resAddr ,  resAddr))
        --        -- Q: do i want the Index part of the tuple to be strict or not?
        --        -- leaving it lazy for now
        --        -- TODO / FIX / AUDIT ME / NOT SURE
        --      -- this is the fall back binary search based lookup
        --  Just (Address adr)->
        --  -- make sure the address hint is in bounds and
        --  -- is <= the current position
        --      if adr >0 && adr < (V.length lut -1) && ix >=((lut V.! adr )-shift)
        --      then
        --        -- by construction we know theres at least one applicable index
        --        -- thats
        --        let !nextAddr = Address $!
        --                        basicHybridSearchUp
        --                          (\lix-> ix <  ((lut V.! lix)-shift ) )
        --                          adr (V.length lut -1)
        --          in  Just (basicToIndex form nextAddr ,  nextAddr)
        --      else
        --        resAddr `seq` (Just (basicToIndex form resAddr ,  resAddr))




--type instance Transposed (Format CompressedSparseRow InnerContiguous (S (S Z)) rep )=
--    (Format CompressedSparseColumn InnerContiguous (S (S Z)) rep )

--type instance Transposed (Format CompressedSparseColumn InnerContiguous (S (S Z)) rep )=
--    (Format CompressedSparseRow InnerContiguous (S (S Z)) rep )


--instance Layout (Format CompressedSparseRow InnerContiguous (S (S Z)) rep ) (S (S Z)) where
--  transposedLayout  = \(FormatInnerContiguousCompressedSparseRow a b c d e f) ->
--    (FormatInnerContiguousCompressedSparseColumn a b c d e f)
--  {-# INLINE transposedLayout #-}
--  basicFormShape = \ form -> logicalRowShapeInnerContiguousCSR form  :*
--         logicalColumnShapeInnerContiguousCSR form :* Nil
--  {-# INLINE basicFormShape #-}
--  basicCompareIndex = \ _ as  bs ->shapeCompareRightToLeft as bs
--  {-# INLINE basicCompareIndex#-}



--instance  (V.Vector (BufferPure rep) Int )
--  => SparseLayout (Format CompressedSparseRow InnerContiguous (S (S Z)) rep ) (S (S Z)) where

--      type LayoutAddress (Format CompressedSparseRow
--          InnerContiguous (S (S Z)) rep ) = SparseAddress

--      {-# INLINE minSparseAddress #-}
--      minSparseAddress = \_ -> SparseAddress 0 0

--      {-# INLINE maxSparseAddress#-}
--      maxSparseAddress  =
--       \ (FormatInnerContiguousCompressedSparseInternal _ outer_dim_range _
--          innerDimIndex _) ->
--              SparseAddress (outer_dim_range - 1) (V.length innerDimIndex - 1 )


--      {-#INLINE basicToIndex #-}
--      basicToIndex =
--       \ (FormatInnerContiguousCompressedSparseInternal _ _  _ innerDimIndex _)
--          (SparseAddress outer inner) -> (innerDimIndex V.! inner ) :* outer :*  Nil
--          -- outer is the row (y index) and inner is the lookup position for the x index



--theres 3 cases for contiguous next address:
--in the middle of a run on a fixed outer dimension,
--need to bump the outer dimension, or we're at the end of the entire array

--we make the VERY strong assumption that no illegal addresses are ever made!

--note that for very very small sparse matrices, the branching will have some
--overhead, but in general branch prediction should work out ok.

--      {-# INLINE basicNextAddress #-}
--      basicNextAddress =
--         \ (FormatInnerContiguousCompressedSparseRow
--                (FormatInnerContiguousCompressedSparseInternal _ _ _
--                                                         columnIndex rowstartIndex))
--            (SparseAddress outer inner) ->
--              if not  (inner == (V.length columnIndex -1)
--                                          {- && outer == (y_range-1) -}
--                     || (inner +1) == (rowstartIndex V.! (outer + 1)))
--                then
--                  Just (SparseAddress outer (inner+1))
--                else
--                  if inner == (V.length columnIndex -1)
--                    then Nothing
--                    else Just (SparseAddress (outer + 1) (inner + 1 ) )

--        --  error "finish me damn it"
--      {-# INLINE basicToSparseAddress #-}
--      basicToSparseAddress =
--        \ (FormatInnerContiguousCompressedSparseRow
--            (FormatInnerContiguousCompressedSparseInternal x_range y_range addrShift
--                      columnIndex rowstartIndex))
--          (ix_x:*ix_y :* _ ) ->
--            if  not (ix_x >= x_range ||  ix_y >=y_range )
--              then
--              -- slightly different logic when ix_y < range_y-1 vs == range_y-1
--              -- because contiguous, don't need the index space shift though!
--                       SparseAddress ix_y   <$>
--                          lookupExactRange columnIndex ix_x
--                              -- ((rowstartIndex V.! ix_y) - addrShift)
--                            (if  ix_y < (y_range-1)
--                              -- addr shift is for correcting from a major axis slice
--                              then  (rowstartIndex V.! (ix_y+1) ) - addrShift
--                              else V.length columnIndex  - 1 )
--              else   (Nothing :: Maybe SparseAddress )