{-# LANGUAGE TypeSynonymInstances #-}

-----------------------------------------------------------------------------
-- |
-- Module      : Berp.Base.Identity
-- Copyright   : (c) 2010 Bernie Pope
-- License     : BSD-style
-- Maintainer  : florbitous@gmail.com
-- Stability   : experimental
-- Portability : ghc
--
-- A unique identity. All Python objects have a unique identity. In CPython
-- the identity is the machine address of the object. That relies on the 
-- object being kept at a fixed virtual address in memory. We can't do the
-- same thing in Haskell because the garbage collector can move data around.
--
-- TODO: consider lazy identities, which are only constructed on demand.
--
-----------------------------------------------------------------------------

module Berp.Base.Identity (Identity, newIdentity) where

import Berp.Base.Unique
import Berp.Base.Hash (Hash (..))
import Berp.Base.LiftedIO (MonadIO, liftIO)

type Identity = Unique

newIdentity :: MonadIO m => m Unique
newIdentity = liftIO newUnique

instance Hash Identity where
   hash = hashUnique

{- 

Some comments on this important module.

All Python objects have an identity which is unique to the object.
The identity of an object can be obtained by the primitive function id().

id is a "built in type or function". 

It returns an integer (class 'int'). According to the __doc__ for id
in Python 3.0 it says:

   id(...)
       id(object) -> integer
    
       Return the identity of an object.  This is guaranteed to be unique among
       simultaneously existing objects.  (Hint: it's the object's memory address.)

There's probably Python code out there which depends on the result being
an actual integer. But it would be nicer if it returned an abstract type.

There's also a builtin called hash():

   hash(...)
       hash(object) -> integer
    
       Return a hash value for the object.  Two objects with the same value have
       the same hash value.  The reverse is not necessarily true, but likely.
 
In some cases the hash function uses the identity of the object to obtain the hash
value.

The hash is quite useful, particularly because it is used to allow an object to be
a key in a dictionary. 

CPython's garbage collector does not move objects allocated on the heap.
This means it can use the address of the object as its
identity. Obviously this is problematic if we want to use GHC's collector which
does move objects.

Thus we must generate a unique identity for all objects when they are constructed.

A couple of important considerations:
   a) The scheme must scale. We should not have any limit on the number of
      identities that we can generate. 
   b) As computation time goes on we'd like to keep
      a handle on the size of individual identities. A constant size would be
      ideal, but we might allow for growth in the size of the identity value
      if it has reasonable asymptotic behaviour.
   c) It should work well with threads. Global counters must be atomically
      updated. 
   d) It is better if the scheme is portable (does not rely on deep GHC magic). 
   e) Should be fast.

A couple of options for implementation:

   1) A global mutable Integer counter protected by an MVar.
         - Satisfies a).
         - Size of counter grows logaithmically, but very slowly, so may be
           practical for the vast majority of applications. So probably 
           satisfies b).
         - Will work with threads, but at what cost? Each time an object
           is constructed the running thread must take the lock on the MVar,
           increment the counter, and release the lock. Incrementing an 
           Integer is not trivial, so there may be lock contention. 
           Probably satisfies c), but the significance of the time costs
           are unknown. 
         - MVars are not too magical, so probably satisfies d).

   2) A Stable Name.
         - Satisfies a). The number of stable names is only limited to the
           number of objects in memory (I think).
         - Satisfies b). The size of a stable name is constant. (good).
         - Satisfies c). I don't think there is any issue with thread 
           contention.
         - I think stable names are part of the FFI, so should be portable.
           Better check this. However, I don't think they work with parallel
           Haskell at present. Is this important? Hard to say.

   Regarding the speed of each method: it is hard to say without measuring
   them on real programs. My intution is that Stable Names have some advantage
   in multi-threaded programs because they don't go via MVars (or maybe they
   do, if the stable name table is locked in the runtime - better check this).
   I'm also a bit concerned that Stable Names were not designed to support a very
   large number of objects, and so may perform badly on Python programs which
   allocate many objects. 
-}