Ticket #1338 (new task)

Opened 2 years ago

Last modified 6 months ago

base package breakup

Reported by: simonmar
Assigned to:
Priority: normal
Milestone: 6.12 branch
Component: libraries/base
Version: 6.6.1
Severity: normal
Keywords:
Cc: Bulat.Ziganshin@gmail.com, id@isaac.cedarswampstudios.org, jpm@cs.uu.nl
Difficulty: Unknown
Test Case:
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple

Description

This ticket replaces #710, some of which we've now done.

Latest proposal for splitting the base package: http://www.haskell.org/pipermail/libraries/2007-April/007342.html

Attachments

packagegraph.png (14.2 kB) - added by igloo on 08/06/08 12:45:27.

Change History

05/19/07 06:17:04 changed by igloo

  • owner set to igloo.

05/28/07 06:25:30 changed by igloo

Partially done. The packages that are already split out could do with some refactoring (e.g. changing them to use filepath where appropriate). Still remaining:

Ready to go:

System.Posix.Signals
--> unix (System.Cmd depends on it, but moves to new package process)

Control.Concurrent.*, System.Timeout
--> new package concurrent

Data.Unique
--> new package unique (dep on concurrent)

System.Console.GetOpt
--> new package getopt

Not ready:

Not clear what to do with these:
 Control.Applicative
 Data.Foldable, Data.Traversable
 Data.Map, Data.IntMap, Data.Set, Data.IntSet
 Data.Sequence, Data.Tree
 Data.HashTable
 Data.Graph
 --> new package collections? containers? or split further?
      (dep. on array, generics, concurrent)
 Data.Array.*
 --> new package array (maybe; I'm slightly dubious here)
      (dep. on concurrent for Data.Array.Diff)

Needs the above to happen first:
 Data.Generics.*
 --> generics (maybe; Data class is defined for everything and is derivable)

Will happen around the second half of June:
 Data.ByteString.*
 --> bytestring (dep. on base, generics, array)

Other modules we might move:
Text.Printf, Data.Monoid, System.CPUTime

Ross suggests System.Posix.Signals might belong in process too.

05/28/07 06:35:08 changed by igloo

[14:31] < ndm> Igloo: re the "generics" package
[14:31] < ndm> perhaps it should be the syb package
[14:31] < ndm> since it's an implementation of syb, not an implementation of generics
[14:31] < ndm> i'm going to be releasing Data.Generics.Uniplate shortly, and that's going to want to be a separate package outside of generics
[14:32] < ndm> (it should probably always have been Data.Generics.SYB, but it's too late to change that)

07/03/07 02:48:06 changed by guest

  • cc set to Bulat.Ziganshin@gmail.com.

This is a very important task from my POV, especially splitting out ByteStrings, which are still improving quickly.

It would also be great to split out I/O and data structures. If there are problems that prevent this (such as Exception, which is defined via Handle, and the Prelude, which includes I/O functions), we can set up a ticket to change that too (probably for 6.10, though).

07/03/07 10:14:55 changed by Isaac Dupree

  • cc changed from Bulat.Ziganshin@gmail.com to Bulat.Ziganshin@gmail.com, id@isaac.cedarswampstudios.org.

As I think I mentioned somewhere else, we could put Prelude proper in a package other than base. Probably base needs to export Prelude, so it could re-export from some other package that base depends on, and move several things "pre-base", which will require some support from GHC. Eventually normal programs might not need to depend on "base" because they specify all their dependencies otherwise. (Or we could move the big ugly Prelude to package haskell98 and force people to depend on that if they want to import Prelude. Which wouldn't work very well until we have a way to avoid importing Prelude that works in all important Haskell compilers.)

It's important but will take a while. I'm patient. It would be nice if we could avoid each major release of GHC being incompatible in how to compile things with it, but we shouldn't let that hinder us from making progress towards a state where packages are more upgradeable. (We also need more from Cabal, which isn't exactly part of GHC proper.)

07/03/07 11:12:48 changed by igloo

Duncan is working on getting the new bytestring package into shape, and should be done comfortably before the GHC RC. At that point we'll remove Data.ByteString.* from base and use that instead.

07/06/07 11:04:04 changed by igloo

Ready to go:

System.Console.GetOpt
--> new package getopt

Not ready:

Causes a unix<->process dependency loop:
System.Posix.Signals
--> unix (System.Cmd depends on it, but moves to new package process)

Not clear what to do with these:
 Control.Applicative
 Data.Foldable, Data.Traversable
 Data.Map, Data.IntMap, Data.Set, Data.IntSet
 Data.Sequence, Data.Tree
 Data.HashTable
 Data.Graph
 --> new package collections? containers? or split further?
      (dep. on array, generics, concurrent)
 Data.Array.*
 --> new package array (maybe; I'm slightly dubious here)
      (dep. on concurrent for Data.Array.Diff)

Needs the above to happen first:
 Data.Generics.*
 --> generics (maybe; Data class is defined for everything and is derivable)

Needs Data.Array.Diff to move out of base first:
Control.Concurrent.*, System.Timeout
--> new package concurrent

Needs concurrent to be done first:
Data.Unique
--> new package unique (dep on concurrent)

Will happen soon:
 Data.ByteString.*
 --> bytestring (dep. on base, generics, array)

Other modules we might move:
Text.Printf, Data.Monoid, System.CPUTime

Ross suggests System.Posix.Signals might belong in process too.

07/06/07 12:24:09 changed by simonmar

Regarding System.Posix.Signals, it looks like the unix->process dependency is bogus. The two internal bits that unix depends on:

import System.Process.Internals ( pPrPr_disableITimers, c_execvpe )

should be moved to the unix package, and then process can depend on unix. With any luck that will enable some bits of process to be cleaned up, too.
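To make the loop concrete, here is a minimal sketch that models each package as a name plus the names it depends on and asks Data.Graph.stronglyConnComp (from containers) for the groups of packages that depend on each other. The edge lists are simplified assumptions taken from the discussion above (only the unix and process edges, no versions), and dependencyCycles, before and after are hypothetical names used only for illustration.

```haskell
import Data.Graph (SCC (AcyclicSCC), flattenSCC, stronglyConnComp)
import Data.List (sort)

-- A package name paired with the names of the packages it depends on.
-- Simplified model: real package databases also carry versions.
type PackageDeps = (String, [String])

-- Every set of packages that (transitively) depend on each other.
-- Dependencies on packages not in the list (e.g. "base") are ignored
-- by stronglyConnComp, which drops edges to unknown keys.
dependencyCycles :: [PackageDeps] -> [[String]]
dependencyCycles pkgs =
  [ sort (flattenSCC scc) | scc <- stronglyConnComp nodes, isCyclic scc ]
  where
    nodes = [ (name, name, deps) | (name, deps) <- pkgs ]
    -- Anything that is not AcyclicSCC is part of a cycle (including self-loops).
    isCyclic (AcyclicSCC _) = False
    isCyclic _              = True

-- Before: unix imports System.Process.Internals, while process needs
-- System.Posix.Signals once that module moves to unix.
before :: [PackageDeps]
before = [("unix", ["base", "process"]), ("process", ["base", "unix"])]

-- After moving pPrPr_disableITimers and c_execvpe into unix.
after :: [PackageDeps]
after = [("unix", ["base"]), ("process", ["base", "unix"])]

main :: IO ()
main = do
  print (dependencyCycles before)
  print (dependencyCycles after)
```

Once the two internal functions live in unix, the only remaining edge is process -> unix, so no cycle is reported.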

07/07/07 08:27:13 changed by duncan

Please, please can we keep the class interfaces in the same package as Monad, Functor etc.? That means Control.Applicative, Data.Foldable and Data.Traversable. Otherwise people will be strongly dissuaded from making their data types instances of Applicative etc. Just imagine if Functor were not in the base package and people had to depend on another package specifically: no one would ever make their data types instances of Functor, since people prefer to keep dependencies to a minimum. So common interfaces should stay relatively close to the root of the package dependency tree; implementations can be further down.

So moving the concrete implementations (Map, Set, etc.) to a data/collections package is fine, of course.
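As an illustration of why interfaces near the root matter, here is a minimal sketch assuming a current GHC, where Control.Applicative, Data.Foldable and Data.Traversable are all modules of base. The Tree type is hypothetical; the point is only that a library author can provide all three instances with base as the sole dependency.

```haskell
import Data.Foldable (toList)

-- A hypothetical user-defined container: a binary tree with values at the leaves.
data Tree a = Leaf a | Node (Tree a) (Tree a)
  deriving (Eq, Show)

-- Functor, Foldable and Traversable all come from base, so these
-- instances add no package dependency beyond base itself.
instance Functor Tree where
  fmap f (Leaf x)   = Leaf (f x)
  fmap f (Node l r) = Node (fmap f l) (fmap f r)

instance Foldable Tree where
  -- Visit the left subtree before the right one.
  foldr f z (Leaf x)   = f x z
  foldr f z (Node l r) = foldr f (foldr f z r) l

instance Traversable Tree where
  traverse f (Leaf x)   = Leaf <$> f x
  traverse f (Node l r) = Node <$> traverse f l <*> traverse f r

main :: IO ()
main = do
  let t = Node (Leaf 1) (Node (Leaf 2) (Leaf 3)) :: Tree Int
  print (toList (fmap (* 10) t))
  print (sum t)
  print (traverse (\x -> if x > 0 then Just x else Nothing) t)
```

If these classes lived in a separate package, every such instance would force an extra dependency on the library and its users.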

08/14/07 15:17:44 changed by igloo

  • milestone changed from 6.8 to 6.10.

We're about as far as we're going to get for 6.8 now, so moving to milestone 6.10.

06/18/08 16:43:21 changed by igloo

This is currently more-or-less blocked on "extensible exceptions", as that should get rid of most of the circular import problems.

07/09/08 14:50:05 changed by igloo

  • component changed from Compiler to libraries/base.

07/14/08 07:15:22 changed by simonmar

  • priority changed from high to normal.

Not essential for 6.10.1.

08/06/08 12:43:55 changed by igloo

Updated proposal. I'll attach packagegraph.png showing the package deps.

This block is mostly as before, except timeout has to be in its own package so that unique can sit in the middle:

timeout:        System.Timeout

unique:         Data.Unique

concurrent:     Control.Concurrent
                Control.Concurrent.Chan
                Control.Concurrent.MVar
                Control.Concurrent.QSem
                Control.Concurrent.QSemN
                Control.Concurrent.SampleVar

st can be pulled out:

st:             Control.Monad.ST
                Control.Monad.ST.Lazy
                Control.Monad.ST.Strict
                Data.STRef
                Data.STRef.Lazy
                Data.STRef.Strict

control should probably actually be merged with containers, but making it its own package made my experimenting simpler:

control:        Control.Applicative
                Data.Foldable
                Data.Monoid
                Data.Traversable

ghc-exts:       GHC.Exts
                GHC.PArr

The System.Mem modules don't really seem to fit here, but I didn't have anywhere better to put them, and they are under System after all.

system:         System.CPUTime
                System.Environment
                System.Exit
                System.Info
                System.Mem
                System.Mem.StableName
                System.Mem.Weak

numeric:        Data.Complex
                Data.Fixed
                Data.Ratio

generics:       Data.Generics
                Data.Generics.Aliases
                Data.Generics.Basics
                Data.Generics.Instances
                Data.Generics.Schemes
                Data.Generics.Text
                Data.Generics.Twins

version:        Data.Version

Little misc packages; we might want to fold some of these back in later, but for now I just wanted to get them out of the way:

getopt:         System.Console.GetOpt

debug:          Debug.Trace

printf:         Text.Printf

Again, these I was just getting out of the way. They're internal to GHC, so where they end up shouldn't much matter:

ghc-bits:       GHC.ConsoleHandler
                GHC.Desugar
                GHC.Environment
                GHC.TopHandler

The rest of base I currently have cut in two, with a foreign package stuck in the middle. If things don't improve here then I expect we'll stick them all back together for 6.10:

base-top:       Control.Exception
                Control.OldException
                Control.Category
                Control.Arrow
                Control.Monad.Fix
                Control.Monad.Instances
                Text.Show
                Text.Show.Functions
                System.IO.Error
                System.IO
                System.Posix.Types
                System.Posix.Internals
                Data.Ix
                Data.Function
                Prelude

foreign:        Foreign
                Foreign.C
                Foreign.C.Error
                Foreign.C.String
                Foreign.Concurrent (GHC-only)
                Foreign.ForeignPtr
                Foreign.Marshal
                Foreign.Marshal.Alloc
                Foreign.Marshal.Array
                Foreign.Marshal.Error
                Foreign.Marshal.Pool
                Foreign.Marshal.Utils
                Foreign.Ptr
                Foreign.StablePtr

base:           Control.Monad
                Data.Bits
                Data.Bool
                Data.Char
                Data.Dynamic
                Data.Either
                Data.Eq
                Data.HashTable
                Data.IORef
                Data.Int
                Data.List
                Data.Maybe
                Data.Ord
                Data.String
                Data.Tuple
                Data.Typeable
                Data.Word
                Foreign.C.Types
                Foreign.Storable
                Numeric
                System.IO.Unsafe
                Text.ParserCombinators.ReadP
                Text.ParserCombinators.ReadPrec
                Text.Read
                Text.Read.Lex
                Unsafe.Coerce
                (plus a load of GHC-only internal modules)

08/06/08 12:45:27 changed by igloo

  • attachment packagegraph.png added.

08/12/08 06:31:09 changed by simonmar

Yes to pulling out concurrent, st, generics, getopt, and moving the Control.Applicative stuff into containers. The rest don't seem to buy us a great deal, and I'm concerned that we're ending up with a plethora of tiny packages.

I'll commit the base3-compat stuff as soon as I can get it to validate on Windows, and then it'll need to be updated to reflect these changes.

08/15/08 16:52:57 changed by igloo

08/19/08 05:13:02 changed by simonpj

I'm also a bit concerned about creating lots of tiny packages. Maybe we can do this a step at a time?

If, indeed, we need to do anything at all. What is the Main Goal here? Who is pushing for further decomposition of 'base', and what gains does it bring? Are these gains the most important thing to spend our limited effort budget on? There are plenty of other pressing issues! (Untying the mutual recursion is good regardless of further break-up.)

Simon

08/19/08 05:30:24 changed by igloo

One advantage of making base small is that if you are, for example, debugging GHC.Handle then you don't have to recompile >100 other modules every time you make a change in it.

Being able to separately upgrade the different parts is another advantage. Also, it means that we can have a separate maintainer for, e.g., SYB (well, this doesn't technically need it to be a separate package, but it's conceptually simpler if it is).

Breaking base up into packages also makes it much easier to see what the hierarchy is, and makes it easier to restructure the hierarchy. Plus it means that people can't re-tangle the logically separate components, which is all too easy to do when you just have one huge package.

It also means that packages are clearer about what they depend on. One possibility, which I think would be really cool, is to separate all the IO modules from the non-IO modules; between that and looking at the extensions used (e.g. TH and FFI) it would then be clear whether or not a library could do any IO. Of course, the Prelude is a hurdle for this goal.

08/19/08 05:33:03 changed by spl

I can't speak for most packages, but I would like to see that the Data.Generics modules are broken out into a separate package for easier maintainability and upgradeability.

Also, as was discussed in the thread linked by igloo, I would like to call it "syb" instead of "generics."

08/19/08 05:35:14 changed by simonpj

Igloo: these are all good goals. The question is really how high up our priority list they are.

spl: yes, I agree there's a specific reason for the generics stuff.

08/19/08 06:05:28 changed by simonmar

Replying to igloo:

> One advantage of making base small is that if you are, for example, debugging GHC.Handle then you don't have to recompile >100 other modules every time you make a change in it.

All the other advantages are good, but this one is false, I think. If you modify GHC.Handle you do have to recompile all the modules above it in the hierarchy, regardless of whether they're in another package or not. GHC may be able to avoid actual recompilation, but you at least need to invoke GHC on every module. (Currently the build system doesn't do this except within a package, which is bad and something we hope to fix.)

08/19/08 06:12:57 changed by igloo

Replying to simonmar:

> Replying to igloo:
>
>> One advantage of making base small is that if you are, for example, debugging GHC.Handle then you don't have to recompile >100 other modules every time you make a change in it.
>
> All the other advantages are good, but this one is false I think. If you modify GHC.Handle you do have to recompile all the modules above it in the hierarchy, regardless of whether they're in another package or not.

Let me try to clarify: If you're debugging GHC.Handle then you don't need to recompile, for example, GetOpt after adding a debugging print or when you want to test a fix.

Once you've actually fixed the bug you'll need to recompile everything so that the other libraries all work again, agreed.

08/20/08 04:10:38 changed by simonmar

Replying to igloo:

> Let me try to clarify: If you're debugging GHC.Handle then you don't need to recompile, for example, GetOpt after adding a debugging print or when you want to test a fix. Once you've actually fixed the bug you'll need to recompile everything so that the other libraries all work again, agreed.

Ok, I see what you meant. Thanks!

08/20/08 05:09:22 changed by dreixel

  • cc changed from Bulat.Ziganshin@gmail.com, id@isaac.cedarswampstudios.org to Bulat.Ziganshin@gmail.com, id@isaac.cedarswampstudios.org, jpm@cs.uu.nl.

08/25/08 16:19:40 changed by igloo

  • owner deleted.
  • milestone changed from 6.10 branch to 6.12 branch.

I've done the parts of this that nobody objected to, namely:

concurrent, unique, timeout
st
syb (was: generics)
getopt

09/30/08 08:37:19 changed by simonmar

  • architecture changed from Unknown to Unknown/Multiple.

09/30/08 08:51:18 changed by simonmar

  • os changed from Unknown to Unknown/Multiple.

01/15/09 06:11:04 changed by igloo

Although it looks, from the source, as if the IO part of Data.Typeable could be split off from the Typeable classes etc., this is sadly not the case.

Right down at the bottom of the module hierarchy we have

error s = throw (ErrorCall s)

which needs ErrorCall to have a Typeable instance. Although in the source this is just deriving Typeable, the generated code calls mkTyCon, which calls mkTyConKey, which does IO (hidden by unsafePerformIO).
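The dependency chain can be sketched in a few lines, assuming a current GHC in which Exception has Typeable as a superclass and Typeable instances are supplied automatically for every type. MyErrorCall and myError are hypothetical stand-ins for ErrorCall and error; the mkTyCon/unsafePerformIO machinery described above belongs to the GHC of the time and is not reproduced here.

```haskell
import Control.Exception (Exception, evaluate, throw, try)
import Data.Typeable (typeOf)

-- Hypothetical stand-in for base's ErrorCall.
newtype MyErrorCall = MyErrorCall String
  deriving Show

-- Exception has Typeable as a superclass. Modern GHC provides the
-- Typeable instance automatically, so this empty instance suffices,
-- but the compiler still generates the type representation behind it.
instance Exception MyErrorCall

-- Analogue of `error s = throw (ErrorCall s)`: throwing needs the
-- Exception instance, and therefore Typeable.
myError :: String -> a
myError s = throw (MyErrorCall s)

main :: IO ()
main = do
  print (typeOf (MyErrorCall "boom"))
  -- try selects exceptions by type, which relies on the Typeable instance.
  result <- try (evaluate (myError "boom" :: Int))
  case result of
    Left (MyErrorCall msg) -> putStrLn ("caught: " ++ msg)
    Right n                -> print n
```

Because error sits at the very bottom of the module hierarchy, whatever Typeable needs has to be available there too, which is why that part cannot simply be split off.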