While running my test and benchmarking program for the Etage-Graph package, I sometimes (in around 1% of runs) get the following error (with a different closure type number each time):
test: internal error: evacuate: strange closure type 4869608
    (GHC version 7.1.20101124 for x86_64_unknown_linux)
    Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug
Aborted
As it does not happen very often, it is hard to debug. I am running the program as:
./test -s 400 +RTS -N4
Where 400 is the number of nodes in the graph. Maybe it also happens with a smaller number of nodes.
Ok, I left the -s 400 command running for 1.5 hours or so, and it didn't complete or crash (I had -debug turned on which might slow things down). I've also been trying -s 100 repeatedly; each one takes 20s or so, but no crashes so far.
You have to let the test finish. This is one run. It takes some time, yes. ;-) And in around 1 of 10 (program/test) runs at 400 nodes I get this error.
I'm having the same issue with git-annex and ghc 7.0.2, and can reproduce the bug reliably using git-annex.
Several git-bisect tests showed that fairly trivial (and seemingly unrelated) changes introduce the issue, hence the maintainer advised to report the bug here.
Detailed info is available "at the git-annex tracker".
I'm unsure if it's the same issue though, since the ghc version seems to be different here and it's i686, not x86_64. Should I open a separate ticket?
I'm not really familiar with the Haskell language, but if there's any more helpful info or test data I can provide on the issue, I'd be happy to do so.
I'm having the same issue with git-annex... Should I open a separate ticket?
Yes, please make a separate ticket. I looked at the link you gave, but couldn't immediately see how to reproduce the problem. Can you give me enough information to be able to reproduce the problem here? I'll need the exact version of git-annex, how to build it, the input data (repo?), and the commands that provoke the error.
Tried to reproduce this on a separate, clean x86_64 machine with the same Exherbo Linux and on an i386 Debian Linux VM, without any luck.
Since then I've updated ghc (to 7.0.3), git-annex and the configuration of the repository in question, and the issue seems to be gone. Reverting git-annex doesn't bring it back either. I guess I'll try to roll back the ghc update, but failing that I probably won't be able to get it again, alas.
I managed to get a segfault with this example and an up-to-date GHC built yesterday, using the suggested options (-s 400 +RTS -N4). I've rebuilt the binary with -debug and I'm trying to provoke a segfault again, but two runs so far have been successful:
Generating a random graph of size 400.
Graph contains 400 nodes and 59999 edges.
Dijkstra search time for shortest paths: 993.855562s
Etage search time for shortest paths: 0.172937s (5.0s timeout)
Etage graph (external structure) growing time: 7.674317s
Found 0.75 % shortest paths.
etage-graph-test: DissolvingException "()"
[1] 31310 exit 1
at least, I assume that's a successful run.
I'm not hopeful about finding this bug, because the program takes so long to run and ties up 4 cores. I'll keep trying though.
I should have mentioned: if you know of a way to trigger the crash more often or more quickly, that would help a lot. Do certain heap settings make it more likely to fail?
Hm, this does not look like a successful run. It seems your computer is slower than mine (or debugging makes it slower), so the timeout is too low and not all paths have been found. ;-) It should find 100 % of shortest paths. Maybe a little explanation of the program:
it generates a random graph of some size
it runs Dijkstra among all graph nodes
it generates a data-flow structure for searching for shortest paths among all graph nodes
this structure is a structure of spark-based IO computations and connections between them
it runs the search for all shortest paths; this is a message-passing algorithm among all nodes, and a lot of sparks and inter-spark communication is happening (this is where a segfault occurs, because it uses Haskell sparks really extensively)
stopping condition is that for some time (5 s timeout by default) no path has been improved, assuming all shortest paths have been found
it compares found shortest paths with known shortest paths (found with Dijkstra)
So the fact that only 0.75 % of paths have been found means that the problematic part of the code has not run for long enough. This is probably why it has succeeded.
So when running in debug mode, the timeout should be increased. Please increase minCollectTimeout and initialCollectTimeout in src/Test.hs.
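To make the steps listed above easier to follow, here is a minimal sequential sketch of the same pipeline: build a graph, compute reference distances with Dijkstra, run a message-passing search, and report the percentage of matching shortest paths. It is not the Etage-Graph code. The names `exampleGraph`, `messagePassing`, `allPairs` and `percentFound` are hypothetical, the graph is a tiny fixed one rather than a random one, and a plain work queue stands in for the concurrent message exchange, so there is no timeout involved.

```haskell
module Main where

import qualified Data.Map.Strict as M
import qualified Data.Set as S

type Node  = Int
type Graph = M.Map Node [(Node, Int)]   -- adjacency list with positive edge weights
type Dist  = M.Map (Node, Node) Int     -- best known distance per (source, target)

-- Hypothetical example graph; every edge target is also a key.
exampleGraph :: Graph
exampleGraph = M.fromList
  [ (1, [(2, 1), (3, 4)])
  , (2, [(3, 2), (4, 7)])
  , (3, [(4, 1)])
  , (4, [(1, 3)])
  ]

-- True if distance d is better than what the table holds for key k.
improves :: Ord k => k -> Int -> M.Map k Int -> Bool
improves k d m = maybe True (d <) (M.lookup k m)

-- Reference distances from one source (Dijkstra with a set as priority queue).
dijkstra :: Graph -> Node -> M.Map Node Int
dijkstra g src = go (S.singleton (0, src)) (M.singleton src 0)
  where
    go frontier dist = case S.minView frontier of
      Nothing -> dist
      Just ((d, u), rest)
        | M.lookup u dist /= Just d -> go rest dist   -- stale queue entry
        | otherwise ->
            uncurry go (foldl (relax d) (rest, dist) (M.findWithDefault [] u g))
    relax d (f, m) (v, w)
      | improves v (d + w) m = (S.insert (d + w, v) f, M.insert v (d + w) m)
      | otherwise            = (f, m)

allPairs :: Graph -> Dist
allPairs g = M.fromList
  [ ((s, t), d) | s <- M.keys g, (t, d) <- M.toList (dijkstra g s) ]

-- Message passing: a message (to, src, d) says "src can reach 'to' at cost d".
-- A node that learns a better distance forwards the news to its neighbours.
-- The search ends when the queue is empty, i.e. no message is in flight.
messagePassing :: Graph -> Dist
messagePassing g = go [ (n, n, 0) | n <- M.keys g ] M.empty
  where
    go [] best = best
    go ((to, src, d) : queue) best
      | improves (src, to) d best =
          let forwarded = [ (v, src, d + w) | (v, w) <- M.findWithDefault [] to g ]
          in go (queue ++ forwarded) (M.insert (src, to) d best)
      | otherwise = go queue best

-- Percentage of reference distances that the search reproduced exactly.
percentFound :: Dist -> Dist -> Double
percentFound reference found =
  100 * fromIntegral (M.size matching) / fromIntegral (M.size reference)
  where matching = M.filterWithKey (\k d -> M.lookup k found == Just d) reference

main :: IO ()
main = do
  let reference = allPairs exampleGraph
      found     = messagePassing exampleGraph
  putStrLn ("Found " ++ show (percentFound reference found) ++ " % shortest paths.")
```

Because this version stops only when the queue is really empty, it cannot finish early. The real program has to guess that point with a timeout, which is why a too-short timeout shows up as a low percentage.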
I am sorry but I do not know how to generate a crash quicker. It is really a huge and extensive use of Haskell sparks and it seems it is a rare problem so it takes time to get it. I have not tested different heap settings.
That's interesting. I tried to reproduce this on a virtual machine running Linux and I tried 7.0.4, 7.2.2 and 7.4.1 and I cannot reproduce it anymore. It is true that I kept the same Haskell platform (2012.2) for all tests. That is probably a good thing.
But what is even more interesting is that my algorithm does not work correctly on 7.2.2 and 7.4.1 anymore! It is a message-passing shortest-path searching algorithm which incrementally updates the state of each node as it discovers better and better paths. When a node finds a better path, it informs all its neighbors about it, which might also improve their lists of best paths. And this is repeated. Every node is a Haskell spark, and so is every edge. So I really create a lot of sparks. At least around 60000 of them for -s 400. :-)
And on 7.0.4 the algorithm works. When messages stop being passed around, all shortest paths are found. But on 7.2.2 and 7.4.1 this is not so. Again and again only around 91% of paths are found, then messages stop and the program finishes because of this, but not all paths are found.
I am not working on this project anymore, so I also don't have the time or motivation to really debug it. I just wanted to publish my findings. So something is different between 7.0 and 7.2+ versions. Maybe there is a bug in my code which was not visible before. Maybe some API semantics changed just slightly, so that GHC still compiles the code, but the behavior is changed. I don't know. And it is too complex and stochastic to debug easily. But of course, this is also why it is a good example of a complex program which really pushes GHC and its runtime to their limits.
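For readers unfamiliar with this style of program, the following is a rough sketch of a concurrent message-passing search that stops once no improvement has been seen for a given quiet period. It is written against `base` and `containers` only and is not taken from Etage-Graph: it uses one `forkIO` thread per node with a `Chan` inbox as a stand-in for the package's own machinery, and `runSearch`, `exampleGraph` and the 200 ms quiet period are all assumptions made for illustration. It only shows why a "no improvement for a while" rule is a heuristic. Whether anything like this explains the behaviour on 7.2.2 and 7.4.1 is not established.

```haskell
module Main where

import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (newChan, readChan, writeChan)
import Control.Concurrent.MVar
  (modifyMVar, newEmptyMVar, newMVar, readMVar, takeMVar, tryPutMVar)
import Control.Monad (forM_, forever, when)
import qualified Data.Map.Strict as M
import System.Timeout (timeout)

type Node  = Int
type Graph = M.Map Node [(Node, Int)]   -- every edge target must also be a key

-- Hypothetical example graph.
exampleGraph :: Graph
exampleGraph = M.fromList
  [ (1, [(2, 1), (3, 4)])
  , (2, [(3, 2), (4, 7)])
  , (3, [(4, 1)])
  , (4, [(1, 3)])
  ]

-- Run the search until no node has improved a path for quietMicros microseconds.
runSearch :: Int -> Graph -> IO (M.Map (Node, Node) Int)
runSearch quietMicros g = do
  inboxes  <- M.fromList <$> mapM (\n -> (,) n <$> newChan) (M.keys g)
  table    <- newMVar M.empty      -- best distance per (source, target)
  improved <- newEmptyMVar         -- set whenever some node improves a path
  let node n = forever $ do
        (src, d) <- readChan (inboxes M.! n)
        better <- modifyMVar table $ \t ->
          case M.lookup (src, n) t of
            Just old | old <= d -> return (t, False)
            _                   -> return (M.insert (src, n) d t, True)
        when better $ do
          _ <- tryPutMVar improved ()
          -- Tell every neighbour about the better path through this node.
          forM_ (M.findWithDefault [] n g) $ \(v, w) ->
            writeChan (inboxes M.! v) (src, d + w)
  forM_ (M.keys g) (forkIO . node)
  -- Seed: every node knows a path of length 0 to itself.
  forM_ (M.keys g) $ \n -> writeChan (inboxes M.! n) (n, 0)
  let waitQuiet = do
        r <- timeout quietMicros (takeMVar improved)
        case r of
          Nothing -> return ()     -- quiet period elapsed: assume we are done
          Just () -> waitQuiet     -- something improved: keep waiting
  waitQuiet
  readMVar table

main :: IO ()
main = do
  found <- runSearch 200000 exampleGraph
  putStrLn ("Distances recorded: " ++ show (M.size found))
```

Note that the monitor never checks whether inboxes are empty. If the node threads are starved or slowed down for longer than the quiet period, the search returns a partial table, which would look like "messages stopped but not all paths were found".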