|Version 2 (modified by dterei, 2 years ago)|
Base Package Safety
This page presents a module breakdown of the safety of the Base package.
- Green: Made safe with no modifications
- Blue: Made trustworthy with no modifications
- Yellow: Split out some unsafe functions to Module.Unsafe, made Module trustworthy
- Red: Left unsafe
Most blue squares are blue because they import GHC.Base, which is currently unsafe. Others also import unsafePerformIO operations.
For splitting modules that contain both Safe and Unsafe symbols, I've moved the entire definition to a new module called, say, GHC.Arr.Imp, then added two new modules, GHC.Arr.Safe and GHC.Arr.Unsafe. I then changed GHC.Arr to import the Safe and Unsafe modules and either export just the Safe API or export both Safe and Unsafe, depending on a CPP flag. This allows us to choose at compile time whether we want the base package to be safe by default or not. I could have used a simpler approach, like having the entire module defined in GHC.Arr.Unsafe with no Imp module, but I preferred the Safe and Unsafe modules having disjoint APIs rather than Safe being a subset.
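The facade module described above might look roughly like the following. This is only a layout sketch: the macro name BASE_SAFE_BY_DEFAULT is a hypothetical placeholder (the page does not name the CPP flag), and the file only compiles alongside the GHC.Arr.Safe and GHC.Arr.Unsafe modules from the split.

```haskell
{-# LANGUAGE CPP #-}
-- GHC/Arr.hs: facade over the split modules.
-- BASE_SAFE_BY_DEFAULT is a hypothetical macro name used for illustration.
module GHC.Arr
  ( module GHC.Arr.Safe
#ifndef BASE_SAFE_BY_DEFAULT
  , module GHC.Arr.Unsafe
#endif
  ) where

-- The Safe and Unsafe modules each re-export a disjoint part of GHC.Arr.Imp.
import GHC.Arr.Safe
#ifndef BASE_SAFE_BY_DEFAULT
import GHC.Arr.Unsafe
#endif
```

Because the Safe and Unsafe export lists are disjoint, re-exporting both never produces ambiguous names, which is one practical benefit of not making Safe a subset.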
Below is the breakdown for just the GHC modules in base:
*I tried to split Weak into Unsafe and Safe modules and have GHC.Weak just expose the Safe API (i.e. this would make it a yellow box like the others). However, I wasn't able to figure out how to move the definition of Weak. Many of the GHC modules are wired in and require changes to compiler/prelude/PrelNames. For all other modules I was able to update their builtin location fine, but for Weak I continually got link errors when trying to build libRts.a if I tried to move the definition of GHC.Weak around.
These are notes on specific modules and why they are the colour they are, etc.
GHC.Base and GHC.Prim: Leaving unsafe. Had a go at making safe versions but gets pretty ugly and complex quickly. See Base Module for a more detailed discussion.
GHC.Conc: Is it safe to expose ThreadId's constructors?
For the moment I've hidden both
GHC.Conc.IO and GHC.Conc.IO.Windows: Made safe version that doesn't contain the asyncReadBA, asyncWriteBA functions. Perhaps these can be left in and GHC.Conc.IO just made trustworthy since their result is in the IO monad but they take a 'MutableByteArray# RealWorld' as a second parameter.
GHC.Event: Made trustworthy... Not sure of this though
GHC.Exts: Left unsafe and didn't make a safe/unsafe split. Mostly seems fine; the only worry is access to the Ptr constructor. Also re-exports GHC.Prim.
GHC.Ptr: Made a safe/unsafe split. Exposes the Ptr constructor. Cast operations from FunPtr to Ptr seem dangerous as well, so they are removed from the safe version.
GHC.ForeignPtr: Made the ForeignPtr type abstract. Has an 'unsafeForeignPtrToPtr' function, also excluded. The whole module seems a little dangerous (e.g. castForeignPtr). As long as pointers can only be dereferenced in the IO monad we should be OK though.
(Foreign.ForeignPtr - as above) (Foreign.Ptr - as above)
GHC.IO.Encoding.CodePage.Table: Exports raw Addr# arrays. Also pretty specific code, so it doesn't seem that useful outside of the base package.
GHC.IOBase: Keeping unsafe with no safe version, as it is a deprecated module.
GHC.IORef: Made safe version due to access to IORef constructor
GHC.Pack: Keeping unsafe with no safe version. unpackCString#, among others, seems quite unsafe.
GHC.Weak: *Made a Safe version but I had to leave GHC.Weak alone. When I tried to move GHC.Weak to GHC.Weak.Imp I would constantly get link errors when linking the libRts library. I changed the values in compiler/prelude/PrelNames.hs for GHC.Weak but this didn't seem to work. So there is GHC.Weak.Safe and GHC.Weak.Unsafe but no GHC.Weak.Imp and GHC.Weak has to be unsafe.
GHC.Word: Left unmodified and made trustworthy. 'uncheckedShiftRL64' is a little scary sounding but seems fine.
Data.Data, Data.Dynamic and Data.Typeable: Left unsafe due to the whole Typeable issue.
Debug.Trace: Was left unsafe. It can leak information to the console without detection.
The root of the base package, and so of Haskell, is GHC.Base and GHC.Prim. These both contain a lot of code and a lot of it is unsafe, some of it obviously so, other parts less so. For example:
- Addr# and Array# types are basically C-style pointers, so there are no bounds checks. You can access arbitrary memory with them, cause buffer overflows, etc.
- divInt :: Int -> Int -> Int seems perfectly safe but division by zero throws an uncatchable exception that crashes the program. (Is this intentional or a bug?)
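For contrast with the divInt case above, here is a minimal sketch showing that division by zero through the ordinary Prelude `div` on Int raises an ArithException that can be caught in IO. It does not exercise GHC.Base's divInt itself, so it says nothing about whether the behaviour described above is intentional; `safeDiv` is a hypothetical helper name.

```haskell
import Control.Exception (ArithException (..), evaluate, try)

-- Hypothetical helper: force the division inside IO so that any
-- arithmetic exception surfaces where 'try' can observe it.
safeDiv :: Int -> Int -> IO (Either ArithException Int)
safeDiv x y = try (evaluate (x `div` y))

main :: IO ()
main = do
  r <- safeDiv 1 0
  case r of
    Left DivideByZero -> putStrLn "caught DivideByZero"
    Left e            -> putStrLn ("other arithmetic exception: " ++ show e)
    Right v           -> print v
```

The difference matters for safety: a checked operation fails in a way the caller can contain, while an unchecked primitive can take down the whole program.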
It is also quite difficult to split this up since 1) GHC.Prim is defined inside of GHC not in any module text file, 2) GHC.Base is defined in a text file but extended by GHC (so GHC.Base exports Bool but Bool isn't defined in the actual GHC.Base text file).
This is potentially another argument for symbol-level safety: it would make handling Base and Prim easier.
This does mean a lot of modules are only trustworthy rather than safe, since they import Base. I'd be happy to deal with the complexity of making Safe versions, but it seemed like the ongoing maintenance work wouldn't be worth the benefits.
The best solution might be to leave Base and Prim alone and make Base.Safe and Prim.Safe, both extended on demand (e.g. we just add safe symbols to them as needed to get modules that use Base and Prim in a safe way to work in -XSafe). A fine-grained total split of Base and Prim is doable but seems like it might be a maintenance problem.
I feel we could enable all of this except make Typeable abstract so that instances can't be defined. (Could also still allow deriving of these instances). My understanding is that all of this dynamic stuff works fine as long as the typeOf method basically doesn't lie and pretend two types are the same. The original SYB paper on Typeable from memory basically said this and said that allowing programmers to define their own instances of typeOf was really an implementation artifact and that it should be left up to the compiler.
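To illustrate why a hand-written instance is the weak point, here is a small self-contained simulation. It does not use the real Data.Typeable: `MyTypeable`, `myTypeOf`, `myCast` and `Liar` are hypothetical stand-ins, with a String playing the role of the type representation. The cast is only as sound as the instances it consults.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Data.Maybe (isJust)
import Data.Proxy (Proxy (..))
import Unsafe.Coerce (unsafeCoerce)

-- Hypothetical stand-in for Typeable: the "type representation" is a String.
class MyTypeable a where
  myTypeOf :: Proxy a -> String

instance MyTypeable Int where
  myTypeOf _ = "Int"

instance MyTypeable Bool where
  myTypeOf _ = "Bool"

newtype Liar = Liar Int

-- A dishonest user-written instance: Liar claims to be Bool.
instance MyTypeable Liar where
  myTypeOf _ = "Bool"

-- A dynamic cast that trusts the representations, in the style of a
-- Typeable-based cast: coerce only when the two representations agree.
myCast :: forall a b. (MyTypeable a, MyTypeable b) => a -> Maybe b
myCast x
  | myTypeOf (Proxy :: Proxy a) == myTypeOf (Proxy :: Proxy b) = Just (unsafeCoerce x)
  | otherwise = Nothing

main :: IO ()
main = do
  -- Honest instances: the cast from Int to Bool is refused.
  print (isJust (myCast (1 :: Int) :: Maybe Bool))
  -- Lying instance: the cast succeeds, so a Liar is now typed as Bool.
  -- The payload is deliberately never forced here.
  print (isJust (myCast (Liar 1) :: Maybe Bool))
```

If only the compiler can produce instances (by deriving), the dishonest instance cannot be written and the cast stays sound, which is the point of making the class abstract.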