Ticket #2615 (closed bug: fixed)

Opened 5 years ago

Last modified 16 months ago

ghci doesn't play nice with linker scripts

Reported by: AlecBerryman
Owned by: hgolden
Priority: high
Milestone: 6.12.3
Component: GHCi
Version: 7.0.3
Keywords: dlopen, dynamic linking
Cc: maeder, fasta, slyfox, ghc@…
Operating System: Linux
Architecture: Unknown/Multiple
Type of failure: Incorrect result at runtime
Difficulty: Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets:

Description

I'm trying to use HsHyperEstraier with ghci. I can compile and run the included examples, but when I run them in ghci, I see:

$ ghci
GHCi, version 6.8.3: http://www.haskell.org/ghc/  :? for help
Loading package base ... linking ... done.
Prelude> :l HelloWorld.hs
[1 of 1] Compiling Main             ( HelloWorld.hs, interpreted )
Ok, modules loaded: Main.
*Main> main
[...]
Loading package HsHyperEstraier-0.2.1 ... can't load .so/.DLL for: c
(/usr/lib/libc.so: invalid ELF header)

I see a similar error message if I specify '-package HsHyperEstraier' on the command line.

I did some looking and came up with these messages:

 http://www.haskell.org/pipermail/glasgow-haskell-users/2004-May/006632.html
 http://www.nabble.com/RE:-idea-to-allow-ghci-to-use-a-different-libs-list-p1830432.html

Debian's /usr/lib/libc.so is indeed a GNU linker script, not an actual shared library. If I remove all the libraries that are linker scripts (pthread and c) from HsHyperEstraier's entry in ~/.ghc/.../package.conf, it loads up fine.

Could ghci either recognize or ignore linker scripts?
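One way to recognize such a file, offered only as a minimal sketch and not as what ghci does: a real shared library starts with the four ELF magic bytes (0x7f, 'E', 'L', 'F'), while a linker script is plain text. The function name looks_like_elf is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the file starts with the ELF magic
   bytes, 0 if it does not (for example a text linker script), and -1 if
   the file cannot be opened. */
int looks_like_elf(const char *path)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };
    unsigned char head[4];
    FILE *f;
    size_t n;

    assert(path != NULL);
    f = fopen(path, "rb");
    if (f == NULL) return -1;
    n = fread(head, 1, sizeof head, f);   /* may read fewer than 4 bytes */
    fclose(f);
    return n == sizeof head && memcmp(head, magic, sizeof magic) == 0;
}
```

A caller could skip, or treat specially, any candidate library for which this returns 0 before handing it to dlopen.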

Attachments

T2615a.dsend (53.1 KB) - added by hgolden 3 years ago.
FIX #2615 - ghc repository
T2615b.dsend (42.9 KB) - added by hgolden 3 years ago.
FIX #2615 - testsuite repository
libncursesw.so (32 bytes) - added by greenrd 2 years ago.

Change History

  Changed 5 years ago by igloo

  • difficulty set to Unknown
  • milestone set to 6.10.2

This is a long-standing bug, but I can't find a ticket for it. Anyway, we should fix it.

  Changed 5 years ago by maeder

  • cc maeder added
  • architecture changed from x86_64 (amd64) to Unknown/Multiple

Here is another example, using ghc-6.8.3:

Loading package cairo-0.9.13 ... can't load .so/.DLL for: pthread (/usr/lib/libpthread.so: invalid ELF header)

/usr/lib/libpthread.so contains:

/* GNU ld script
   Use the shared library, but some functions are only in
   the static library, so try that secondarily.  */
OUTPUT_FORMAT(elf32-i386)
GROUP ( /lib/libpthread.so.0 /usr/lib/libpthread_nonshared.a )

  Changed 4 years ago by maeder

  • version changed from 6.8.3 to 6.10.1

This bug stops me from using Template Haskell (which uses ghci) and gtk in the sources of a cabal package.

  Changed 4 years ago by simonmar

  • priority changed from normal to high

Seems important to do something about this, but I'm not sure exactly what.

  Changed 4 years ago by maeder

Removing "pthread" (and "m") from the extraLibraries of the gtk package in my package.conf solved the problem. (I've left in "-pthread" under ldOptions.)

  Changed 4 years ago by igloo

  • owner set to igloo

  Changed 4 years ago by simonmar

As Duncan says, this won't be a problem when we're using shared libraries:

It means that ghci will not need to link to system shared libs except when someone uses -lblah on the ghci command line. That's because when we link a Haskell package as a shared lib the system linker interprets any linker scripts and embeds the list of dependencies on other shared libs (other Haskell packages and system libs). Then ghci just dlopens the shared libs for the directly used Haskell packages, and that automatically resolves all their deps on other Haskell and system shared libs.

  Changed 4 years ago by igloo

  • priority changed from high to normal

The problem is illustrated by this C program:

#include <stdio.h>
#include <dlfcn.h>

int main(void) {
    void *p;

    p = dlopen("/usr/lib/libgmp.so", RTLD_LAZY | RTLD_GLOBAL);
    if (p) printf("OK\n");
    else   printf("%s\n", dlerror());
    p = dlopen("/usr/lib/libpthread.so", RTLD_LAZY | RTLD_GLOBAL);
    if (p) printf("OK\n");
    else   printf("%s\n", dlerror());

    return 0;
}

which fails to dlopen /usr/lib/libpthread.so because it's a linker script:

$ gcc -ldl c.c -o c
$ ./c
OK
/usr/lib/libpthread.so: invalid ELF header
$ cat /usr/lib/libpthread.so
/* GNU ld script
   Use the shared library, but some functions are only in
   the static library, so try that secondarily.  */
OUTPUT_FORMAT(elf64-x86-64)
GROUP ( /lib/libpthread.so.0 /usr/lib/libpthread_nonshared.a )

This most commonly crops up with -lpthread and -lc, and in both these cases you can work around it by just not passing the flag.

I've done some digging, but haven't been able to find a replacement for dlopen that can handle linker scripts. There are two things we could do:

  • Special case -lpthread and -lc. This wouldn't solve the problem in general, but would fix the most common instances of it.
  • Make an empty library linked with -lpthread (or whatever other -l flags we're given) and dlopen that library. Then the system linker takes care of it for us. This is ugly, but if we have code to generate .so libraries anyway (for making dynamic Haskell libraries) then at least it's not too much work to implement.

  Changed 4 years ago by igloo

  • owner igloo deleted

  Changed 4 years ago by igloo

  • milestone changed from 6.10.2 to 6.12.1

  Changed 4 years ago by fasta

  • cc fasta added

  Changed 4 years ago by hgolden

At least on Gentoo, I think this can be dealt with as follows:

  1. In Linker.c if dlopen fails, search the file with a regular expression that would recognize "GROUP ( ... )" where ... is the important part. In Gentoo, when a .so file contains a linker script, the actual file is specified by the GROUP ( ... ).
  2. If this is found, try the dlopen again using the filename.
  3. If this fails, report an error.

I'm not familiar with debian or debian-based distros. Do they use a similar approach? If so, a regular expression search for their filename in the script could be added as well.
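The regular-expression step above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the patch: the function name group_target and the exact pattern are hypothetical, and it works on the script text already read into memory. The caller would then retry dlopen on the name it returns.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: copies the first file name found inside
   "GROUP ( ... )" into out. Returns 0 on success, -1 if no GROUP command
   is found or out is too small. */
int group_target(const char *script, char *out, size_t outlen)
{
    regex_t re;
    regmatch_t m[2];   /* m[0] is the whole match, m[1] the file name */
    size_t len;
    int rc = -1;

    assert(script != NULL && out != NULL);
    /* Assumed pattern: "GROUP", optional spaces, "(", optional spaces,
       then a run of characters up to the next space or ")". */
    if (regcomp(&re, "GROUP *\\( *([^ )]+)", REG_EXTENDED) != 0) return -1;
    if (regexec(&re, script, 2, m, 0) == 0) {
        len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len < outlen) {
            memcpy(out, script + m[1].rm_so, len);
            out[len] = '\0';
            rc = 0;
        }
    }
    regfree(&re);
    return rc;
}
```

Only the first entry of the GROUP list is returned, which matches the libpthread.so script quoted earlier, where the shared object comes first and the static archive second.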

  Changed 4 years ago by slyfox

  • cc slyfox added
  • failure set to None/Unknown

  Changed 3 years ago by igloo

  • milestone changed from 6.12.1 to 6.14.1

  Changed 3 years ago by hgolden

  • keywords dlopen, dynamic linking added
  • failure changed from None/Unknown to Incorrect result at runtime
  • owner set to hgolden

I have been testing a patch which has been reviewed by Simon M. and Duncan C. I am now incorporating the changes they requested and preparing a test case. I expect to have this completed by December 14, 2009.

follow-ups: ↓ 17 ↓ 19   Changed 3 years ago by guest

  • cc ghc@… added

I want to use the llvm package in GHCi. To this end I converted all of the libLLVM*.a files to local libLLVM*.so. When I run the main function of a Haskell program that uses LLVM functions, I get the familiar error:

  Loading package llvm-0.6.7.0 ... can't load .so/.DLL for: pthread (/usr/lib/libpthread.so: invalid ELF header)

My libpthread.so is also a script like that shown by Christian Maeder. However, pthread is not mentioned in llvm wrapper source files. It only appears in the files generated by configuration.

  $ grep -r pthread .
  Match in binary file ./dist/setup/setup.
  ./config.status:S["llvm_ldflags"]="-L/usr/lib/llvm  -lpthread -ldl -lm "
  ./config.status:S["LDFLAGS"]="-L/usr/lib/llvm  -lpthread -ldl -lm  "
  ./config.log:configure:3698: gcc -o conftest -g -O2 -I/usr/include  -D_DEBUG  -D_GNU_SOURCE -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS  -L/usr/lib/llvm  -lpthread -ldl -lm   conftest.c  >&5
  ./config.log:configure:4010: g++ -o conftest -g -O2 -I/usr/include  -D_DEBUG  -D_GNU_SOURCE -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS  -L/usr/lib/llvm  -lpthread -ldl -lm   conftest.c -lLLVMCore  -lLLVMSupport -lLLVMSystem  >&5
  ./config.log:LDFLAGS='-L/usr/lib/llvm  -lpthread -ldl -lm  '
  ./config.log:llvm_ldflags='-L/usr/lib/llvm  -lpthread -ldl -lm '
  ./llvm.buildinfo:ld-options: -L/usr/lib/llvm  -lpthread -ldl -lm  /usr/lib/llvm/LLVMX86AsmPrinter.o /usr/lib/llvm/LLVMX86CodeGen.o -lLLVMSelectionDAG -lLLVMAsmPrinter /usr/lib/llvm/LLVMExecutionEngine.o /usr/lib/llvm/LLVMJIT.o -lLLVMCodeGen -lLLVMScalarOpts -lLLVMTransformUtils -lLLVMipa -lLLVMAnalysis -lLLVMTarget -lLLVMCore -lLLVMSupport -lLLVMSystem -lstdc++

Where do I have to remove pthread?

in reply to: ↑ 16   Changed 3 years ago by hgolden

Replying to guest:

Where do I have to remove pthread?

I think you can compile your LLVM modules using the -normal way instead of the -threaded way.

My patch fixes this problem. I'm still working on a test, but the patch works. I could send it to you immediately if you are willing to rebuild your ghc. (Try the above suggestion first!)

Changed 3 years ago by hgolden

FIX #2615 - ghc repository

Changed 3 years ago by hgolden

FIX #2615 - testsuite repository

  Changed 3 years ago by hgolden

  • owner changed from hgolden to igloo

My patches above pass validation. Please let me know if I need to do anything else.

in reply to: ↑ 16   Changed 3 years ago by guest

Replying to guest:

Where do I have to remove pthread?

In the special case of LLVM I could just remove occurrences of pthread from ~/.ghc/i386-linux-6.10.4/package.conf for the llvm package in order to solve that problem.

  Changed 3 years ago by simonmar

Looks good to me. It needs validating on OS X though: I think the #ifdefs at the top may need to be tweaked, as I don't think the #include <regex.h> is enabled under OBJFORMAT_MACHO.

Ian, could you validate & push?

  Changed 3 years ago by igloo

  • status changed from new to closed
  • resolution set to fixed

Done.

  Changed 3 years ago by igloo

  • status changed from closed to reopened
  • type changed from bug to merge
  • resolution fixed deleted
  • milestone changed from 6.14.1 to 6.12.3

  Changed 3 years ago by igloo

  • priority changed from normal to high

  Changed 3 years ago by igloo

  • status changed from new to merge

  Changed 3 years ago by igloo

  • type changed from merge to bug

  Changed 3 years ago by igloo

  • status changed from merge to closed
  • resolution set to fixed

Didn't get merged

  Changed 2 years ago by greenrd

  • owner igloo deleted
  • status changed from closed to new
  • resolution fixed deleted

This still doesn't work for me with ghc 7.0.2 on Fedora 15.

The file /usr/lib/libncursesw.so contained the text INPUT(libncursesw.so.5 -ltinfo)

ghc gave the error "file too short". I know this bug has recently been fixed, so I tried to make the linker produce the other error message, "invalid ELF header", by adding lots of newlines onto the end of the file. It does change the error message:

Loading package terminfo-0.3.1.3 ... <command line>: can't load .so/.DLL for: ncursesw (/usr/lib/libncursesw.so: invalid ELF header)

but ghc doesn't pick up on the error message and do the right thing.

I think this error arises from an attempt to use Template Haskell, rather than ghci - could this be relevant?

Or did my newlines mess it up somehow?
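Since two different messages appear here, a check on the dlopen error text would presumably have to accept both. A minimal sketch, assuming a plain substring test is enough; the function name may_be_linker_script is hypothetical and this is not necessarily how ghc matches the message.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if a dlopen error message contains one
   of the two phrases quoted in this ticket, 0 otherwise. */
int may_be_linker_script(const char *errmsg)
{
    assert(errmsg != NULL);
    return strstr(errmsg, "invalid ELF header") != NULL
        || strstr(errmsg, "file too short") != NULL;
}
```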

  Changed 2 years ago by hgolden

  • status changed from new to infoneeded

I'll take a look at this. Please attach the complete /usr/lib/libncursesw.so file to this ticket. At first glance, Fedora 15 may be using a different linker script pattern from other systems (e.g., Gentoo).

Changed 2 years ago by greenrd

  Changed 2 years ago by greenrd

  • status changed from infoneeded to new

  Changed 2 years ago by hgolden

  • owner set to hgolden
  • version changed from 6.10.1 to 7.0.3

My original patch was too simplistic. It only handled the GROUP( ... ) command, not the INPUT( ... ) command. Apparently, the Fedora 15 scripts use INPUT( ... ) for redirection. I will add this to the code.
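A pattern that accepts both commands might look like the sketch below. The function name script_target and the pattern are assumptions for illustration, not a copy of the code in Linker.c.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: copies the first name found inside "GROUP ( ... )"
   or "INPUT ( ... )" into out. Returns 0 on success, -1 if neither
   command is found or out is too small. */
int script_target(const char *script, char *out, size_t outlen)
{
    regex_t re;
    regmatch_t m[3];   /* m[0] whole match, m[1] keyword, m[2] file name */
    size_t len;
    int rc = -1;

    assert(script != NULL && out != NULL);
    /* Assumed pattern: either keyword, then "(", then the first name. */
    if (regcomp(&re, "(GROUP|INPUT) *\\( *([^ )]+)", REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, script, 3, m, 0) == 0) {
        len = (size_t)(m[2].rm_eo - m[2].rm_so);
        if (len < outlen) {
            memcpy(out, script + m[2].rm_so, len);
            out[len] = '\0';
            rc = 0;
        }
    }
    regfree(&re);
    return rc;
}
```

For the Fedora script quoted above, INPUT(libncursesw.so.5 -ltinfo), this yields libncursesw.so.5, a bare name without a directory, so the retry would rely on the dynamic loader's normal search path.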

follow-up: ↓ 32   Changed 2 years ago by greenrd

It looks like such a change has already been made in git head - so I guess this is fixed in head.

in reply to: ↑ 31   Changed 2 years ago by hgolden

Replying to greenrd:

It looks like such a change has already been made in git head - so I guess this is fixed in head.

I didn't see this when I looked. Could you send me a link?

in reply to: ↑ 33   Changed 2 years ago by hgolden

  • status changed from new to closed
  • resolution set to fixed

Looks good to me. I'm closing this based on igloo's patch linked above.

  Changed 18 months ago by SimonHengel

Fixed in 7.2.

follow-up: ↓ 37   Changed 16 months ago by alexp

I'm having the same problem as greenrd with /usr/lib/libncursesw.so.

So I thought to use the current ghc 7.2.2. But the ghc homepage advises not to build ghc manually and instead to use haskell-platform. But, if I were to do that then I would be waiting eleven months as I'm on Fedora 16. And the next release of Fedora in May 2012 will ship the current haskell-platform 2011.4.0.0 with ghc-7.0.4 also. So all hopes are on the next+1 version of Fedora that ships in November 2012, which might hopefully include ghc>7.2. Eleven months.

I'm thinking my best option is to patch the existing 7.0.4 ghc. I can also send this to the Fedora maintainer so it might go in as an update.

I see a whole lot of patches above, but I can't follow the code to put together a single patch for 7.0.4.

in reply to: ↑ 36   Changed 16 months ago by hgolden

Replying to alexp:

I see a whole lot of patches above, but I can't follow the code to put together a single patch for 7.0.4.

There's very little change between Linker.c in 7.0.4 and 7.2.2. I think the only thing you need to change is the regular expression. I suggest you do a diff between the 7.0.4 version and the 7.2.2 version and change the regular expression in 7.0.4 to match.

follow-up: ↓ 39   Changed 16 months ago by SimonHengel

Any chance to get this into the next 7.0 minor release (if any)?

in reply to: ↑ 38   Changed 16 months ago by simonmar

Replying to SimonHengel:

Any chance to get this into the next 7.0 minor release (if any)?

There won't be another 7.0 release, I'm afraid.
