Ticket #1887 (closed bug: worksforme)

Opened 6 years ago

Last modified 4 years ago

internal error while running parallel program

Reported by: mrd
Owned by: simonmar
Priority: normal
Milestone: 6.10 branch
Component: Runtime System
Version: 6.9
Keywords: sanity error threads
Cc:
Operating System: Linux
Architecture: x86_64 (amd64)
Type of failure:
Difficulty: Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets:

Description

Problems occur when I use parFlatMap instead of concatMap.

The failure appears to be non-deterministic, possibly some kind of subtle threading-related heap corruption.

$ ghc --make -debug -threaded mat_mult_ndp
$ gdb ./mat_mult_ndp
(gdb) run -p 4 test5.mat out.mat +RTS -N4 -DS

mat_mult_ndp: internal error: ASSERTION FAILED: file Sanity.c, line 86

    (GHC version 6.9.20071105 for x86_64_unknown_linux)
    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug
[New Thread 1124096336 (LWP 933)]

Program received signal SIGABRT, Aborted.
[Switching to Thread 1107310928 (LWP 931)]
0x0000003e076305b5 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x0000003e076305b5 in raise () from /lib64/libc.so.6
#1  0x0000003e07632060 in abort () from /lib64/libc.so.6
#2  0x000000000047f888 in rtsFatalInternalErrorFn (s=0x529948 "ASSERTION FAILED: file %s, line %u\n", 
    ap=0x42002d80) at RtsMessages.c:164
#3  0x000000000047f44c in barf (s=0x529948 "ASSERTION FAILED: file %s, line %u\n") at RtsMessages.c:40
#4  0x000000000047f4a6 in _assertFail (filename=0x5390b8 "Sanity.c", linenum=86) at RtsMessages.c:55
#5  0x00000000004b301e in checkClosureShallow (p=0x2aaaaafb0050) at Sanity.c:86
#6  0x00000000004b2efe in checkSmallBitmap (payload=0x2aaaaafb1378, bitmap=2, size=4) at Sanity.c:51
#7  0x00000000004b324c in checkStackFrame (c=0x2aaaaafb1370) at Sanity.c:144
#8  0x00000000004b3418 in checkStackChunk (sp=0x2aaaaafb1360, stack_end=0x2aaaaafb1400)
    at Sanity.c:201
#9  0x00000000004b4a02 in checkTSO (tso=0x2aaaaafb1000) at Sanity.c:715
#10 0x00000000004833fb in threadStackOverflow (cap=0x7872b0, tso=0x2aaaaafb1000) at Schedule.c:2799
#11 0x0000000000481d90 in scheduleHandleStackOverflow (cap=0x7872b0, task=0x7b2bc0, t=0x2aaaaafb1000)
    at Schedule.c:1658
#12 0x00000000004810e2 in schedule (initialCapability=0x7872b0, task=0x7b2bc0) at Schedule.c:694
#13 0x0000000000482f3b in workerStart (task=0x7b2bc0) at Schedule.c:2528
#14 0x0000003e096061c5 in start_thread () from /lib64/libpthread.so.0
#15 0x0000003e076d062d in clone () from /lib64/libc.so.6

I tested it on an older install of GHC 6.7.20070831 and it had the same problem. test5.mat is the attached sample input describing two matrices of size n×n (n = 64 in this case). Smaller inputs didn't seem to tickle the bug, or at least not often enough to be noticed.
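For readers who don't have the attachment to hand, here is a minimal sketch of the kind of substitution the description refers to: replacing concatMap with a parallel flat map. It is not the code from mat_mult_ndp.hs, and parFlatMapSketch and forceList are hypothetical names defined here using only par and pseq from base. The parFlatMap used in the report came from the strategies library of the time and took an extra Strategy argument.

```haskell
import GHC.Conc (par, pseq)

-- Hypothetical helper: force the spine of a list and each element to WHNF,
-- then return the list itself.
forceList :: [b] -> [b]
forceList ys = foldr seq () ys `seq` ys

-- Hypothetical stand-in for parFlatMap (an assumption, not the library
-- function): spark the evaluation of each sub-list, then concatenate.
parFlatMapSketch :: (a -> [b]) -> [a] -> [b]
parFlatMapSketch f = concat . sparkAll . map (forceList . f)
  where
    -- Spark every element of the list and return the list unchanged.
    sparkAll []       = []
    sparkAll (y : ys) = y `par` (rest `pseq` (y : rest))
      where rest = sparkAll ys

main :: IO ()
main = do
  let xs = [1 .. 10] :: [Int]
      f x = [x, x * 2]
  -- The parallel version should agree with the sequential concatMap.
  print (parFlatMapSketch f xs == concatMap f xs)
```

The sparks only run on multiple cores when the program is built with -threaded and started with +RTS -N, as in the commands above. The result should be the same as with concatMap either way.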

Attachments

mat_mult_ndp.hs (1.9 KB) - added by mrd 6 years ago.
source code
test5.mat (39.3 KB) - added by mrd 6 years ago.
sample data

Change History

Changed 6 years ago by mrd

The latest GHC HEAD seems to have cut down on the occurrence of the error. It still happens, just less often. I don't see anything in the changelog that would explain the change.

Changed 6 years ago by simonmar

  • owner set to simonmar
  • difficulty set to Unknown
  • milestone set to 6.8.3

Changed 5 years ago by simonmar

Tried to repro this without success so far. I couldn't build ndp with 6.8.1 due to this:

ccTyCon: base:GHC.Base.Bool{(w) tc 3c}
ccTyCon: base:GHC.Base.Bool{(w) tc 3c}
ccTyCon: base:GHC.Base.Bool{(w) tc 3c}
ccTyCon: base:GHC.Base.Bool{(w) tc 3c}
ccTyCon: base:GHC.Base.Bool{(w) tc 3c}
ghc-6.8.1: panic! (the 'impossible' happened)
  (GHC version 6.8.1 for x86_64-unknown-linux):
        vectorise/Vectorise.hs:(303,0)-(327,78): Non-exhaustive patterns in function vectAlgCase

which I presume is not a bug. With the HEAD I built ndp and ran the test program on 2 processors many times (20+) successfully. I also tried with -N4, although this machine only has two cores.

I'll try next on an 8-proc Windows box, but if you have any more hints as to how to reproduce this I'd be grateful.

Changed 5 years ago by simonmar

  • milestone changed from 6.8.3 to 6.10 branch

There is no point looking at this on the branch, since vectorization only works on the HEAD.

Changed 4 years ago by simonmar

  • status changed from new to closed
  • resolution set to worksforme

Manuel Chakravarty says:

The program isn't really an NDP program. It uses unboxed arrays from the old ndp package, but I don't see why it wouldn't just use unboxed arrays or the uvector/vector package. In fact, the combination of distributed arrays from the old NDP package together with strategies makes no sense whatsoever.

mrd, could you update the program and try again? As I reported above, I wasn't able to reproduce the problem here.

(closing as worksforme for now, re-open if you can reproduce the problem using up-to-date libraries and GHC).
