Ticket #6128 (closed bug: invalid)

Opened 13 months ago

Last modified 11 months ago

ghc 7.4.2 does not work with LDAP-0.6.6

Reported by: magicloud
Owned by:
Priority: normal
Milestone:
Component: Compiler
Version: 7.4.2
Keywords: c binding poll
Cc:
Operating System: Linux
Architecture: x86_64 (amd64)
Type of failure: Incorrect result at runtime
Difficulty: Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets:

Description

Sample code is below. With ghc 7.2.2, everything works, whether I run it through ghci or runhaskell or compile it with ghc. With ghc 7.4.1, only ghci and runhaskell work. The binary compiled by ghc (without any extra parameters) always fails at the second LDAP operation with the message: LDAP error: ldapSimpleBind: LDAPException LdapServerDown(-1): Can't contact LDAP server. I am sure there is nothing wrong with the network, the client library, or the server/service. When I tried to capture the network packets, it seemed that the program did not open a network connection at all.

Code:

ldap <- ldapInit domain ldapPort
ldapSimpleBind ldap bindDN bindPW

Change History

  Changed 13 months ago by simonmar

  • status changed from new to infoneeded
  • difficulty set to Unknown

We'll probably need more information to debug this one. Is there a way to reproduce it easily?

  Changed 13 months ago by magicloud

Hi, it is hard to say "easily" since the conditions are confusing. I tested Debian unstable, Debian stable, and CentOS 6.1, all with a self-compiled 7.4.1 (no extra configuration arguments; gcc, binutils, and so on as shipped with the OS). All failed. Chris Dornan from haskell-cafe, on the other hand, got the code to work with 7.4.1. I compared his .hi/.o files with mine and found nothing interesting. I hope this helps.

  Changed 12 months ago by simonmar

What I meant was, can you give specific instructions to reproduce the problem? Is it possible to reproduce it without an LDAP server? (I don't have one here, AFAIK).

  Changed 12 months ago by magicloud

Ah, I see. I do not think it is possible to reproduce this without an LDAP server, because a non-existent server produces the same "Can't contact LDAP server" error. But if you know of one (MS AD works), you can simply test with a fake bindDN and bindPW, since a failed bind gives a different error message.

  Changed 12 months ago by simonmar

Please assume I know nothing at all about LDAP. Can you give me the complete source code for a program that demonstrates the failure, and tell me which packages it depends on? I can probably find out the address of a local AD server to test it against, if you tell me where exactly I need to put the address, and in what form (IP address or DNS name).

  Changed 12 months ago by magicloud

Sorry about that. Here is the code. Please enter the LDAP server's IP address as domain and any random strings as bindDN/bindPW. The only dependency is LDAP-0.6.6, which requires libldap2.

import LDAP

main :: IO ()
main =
    do putStrLn "domain>"
       domain <- getLine
       putStrLn "bindDN>"
       bindDN <- getLine
       putStrLn "bindPW>"
       bindPW <- getLine
       putStrLn "conecting..."
       ldap <- ldapInit domain ldapPort
       ldapSimpleBind ldap bindDN bindPW
       putStrLn "done"

  Changed 12 months ago by magicloud

With all defaults, 7.4.2 fails too.

  Changed 11 months ago by magicloud

  • keywords c binding added; LDAP removed
  • version changed from 7.4.1 to 7.4.2

  Changed 11 months ago by magicloud

I captured packets and dug into the libldap2 source a little. It seems that ghc disturbs some runtime condition, so that libldap2 falls into an error path.

follow-up: ↓ 11   Changed 11 months ago by magicloud

The failure occurs in openldap-2.4.31/libraries/libldap/os-ip.c, line 1123: rc = poll( sip->si_fds, sip->si_maxfd, to ); This poll call returns -1 and sets errno to 4. I have not found the reason yet.

The full debug output follows:

ldap_open(company.com, 389)
ldap_pvt_gethostbyname_a: host=test.company.com, r=0
ldap_url_parse_ext(ldap://localhost/)
ldap_init: trying /usr/local/etc/openldap/ldap.conf
ldap_init: using /usr/local/etc/openldap/ldap.conf
ldap_init: HOME env is /home/magicloud
ldap_init: trying /home/magicloud/ldaprc
ldap_init: trying /home/magicloud/.ldaprc
ldap_init: trying ldaprc
ldap_init: LDAPCONF env is NULL
ldap_init: LDAPRC env is NULL
ldap_create
ldap_new_connection 1 1 0
ldap_int_open_connection
ldap_connect_to_host: TCP company.com:389
ldap_new_socket: 3
ldap_prepare_socket: 3
ldap_connect_to_host: Trying 10.254.1.101:389
ldap_pvt_connect: fd: 3 tm: -1 async: 0
ldap_open: succeeded
ldap_simple_bind_s
ldap_sasl_bind_s
ldap_sasl_bind
ldap_send_initial_request
ldap_send_server_request
ldap_result ld 0x755fb0 msgid 1
wait4msg ld 0x755fb0 msgid 1 (infinite timeout)
wait4msg continue ld 0x755fb0 msgid 1 all 1
** ld 0x755fb0 Connections:
* host: company.com  port: 389  (default)
  refcnt: 2  status: Connected
  last used: Mon Jul 16 15:13:19 2012
** ld 0x755fb0 Outstanding Requests:
 * msgid 1,  origid 1, status InProgress
   outstanding referrals 0, parent count 0
  ld 0x755fb0 request count 1 (abandoned 0)
** ld 0x755fb0 Response Queue:
   Empty
  ld 0x755fb0 response count 0
ldap_chkResponseList ld 0x755fb0 msgid 1 all 1
ldap_chkResponseList returns ld 0x755fb0 NULL
ldap_int_select
ldap_int_select returned -1: errno 4
ldap_err2string

in reply to: ↑ 10   Changed 11 months ago by magicloud

  • keywords poll added
  • summary changed from ghc 7.4.1 does not work with LDAP-0.6.6 to ghc 7.4.2 does not work with LDAP-0.6.6

Sorry, I did not format the log as a code block. Here it is again:

ldap_open(vancloa.cn, 389)
ldap_pvt_gethostbyname_a: host=ctu1-tes-02.vancloa.cn, r=0
ldap_url_parse_ext(ldap://localhost/)
ldap_init: trying /usr/local/etc/openldap/ldap.conf
ldap_init: using /usr/local/etc/openldap/ldap.conf
ldap_init: HOME env is /home/magicloud
ldap_init: trying /home/magicloud/ldaprc
ldap_init: trying /home/magicloud/.ldaprc
ldap_init: trying ldaprc
ldap_init: LDAPCONF env is NULL
ldap_init: LDAPRC env is NULL
ldap_create
ldap_new_connection 1 1 0
ldap_int_open_connection
ldap_connect_to_host: TCP vancloa.cn:389
ldap_new_socket: 3
ldap_prepare_socket: 3
ldap_connect_to_host: Trying 10.253.3.51:389
ldap_pvt_connect: fd: 3 tm: -1 async: 0
ldap_open: succeeded
ldap_simple_bind_s
ldap_sasl_bind_s
ldap_sasl_bind
ldap_send_initial_request
ldap_send_server_request
ldap_result ld 0x755fb0 msgid 1
wait4msg ld 0x755fb0 msgid 1 (infinite timeout)
wait4msg continue ld 0x755fb0 msgid 1 all 1
** ld 0x755fb0 Connections:
* host: vancloa.cn  port: 389  (default)
  refcnt: 2  status: Connected
  last used: Mon Jul 16 15:13:19 2012


** ld 0x755fb0 Outstanding Requests:
 * msgid 1,  origid 1, status InProgress
   outstanding referrals 0, parent count 0
  ld 0x755fb0 request count 1 (abandoned 0)
** ld 0x755fb0 Response Queue:
   Empty
  ld 0x755fb0 response count 0
ldap_chkResponseList ld 0x755fb0 msgid 1 all 1
ldap_chkResponseList returns ld 0x755fb0 NULL
ldap_int_select
ldap_int_select4
ldap_int_select5
ldap_int_select6
ldap_int_select7
ldap_int_select8
ldap_int_select9
ldap_int_select returned -1: errno 4
ldap_err2string

  Changed 11 months ago by cdornan

I am not sure whether this is related to the problem discussed on the mailing list in May (see the thread containing http://www.haskell.org/pipermail/haskell-cafe/2012-May/101491.html), where I failed to reproduce the problem on a variety of GHC compilers, including GHC-7.4.1 running on CentOS 6. This should have been an exact match for one of the configurations in which the problematic behaviour was being observed. I don't remember seeing any explanation for these differences in behaviour. (I use the LDAP package quite heavily on CentOS 5 and CentOS 6 and have never seen any of the problems described.)

This is the test program I was using. It gets all the details of the LDAP server from the user so the exact same code can be used to test different installations.

import LDAP

main :: IO ()
main =
    do putStrLn "domain>"
       domain <- getLine
       putStrLn "bindDN>"
       bindDN <- getLine
       putStrLn "bindPW>"
       bindPW <- getLine
       putStrLn "conecting..."
       ldap <- ldapInit domain ldapPort
       ldapSimpleBind ldap bindDN bindPW
       putStrLn "done"

Of course it uses the LDAP package.

  Changed 11 months ago by magicloud

I found the interrupt: virtualTimerExpired. In the aborted poll call, the timeout (the variable to, in milliseconds) is 4294967295. I hope someone familiar with the ghc runtime can help here.
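The two numbers in this report can be decoded with a short check. The sketch below assumes the timeout ends up being read as a 32-bit signed int (as the third argument of poll is on Linux), in which case 4294967295 is -1, meaning an infinite timeout. That would match the "(infinite timeout)" line in the debug log. It also prints the platform's numeric value of EINTR, which is 4 on Linux and so matches the reported errno.

```haskell
module Main (main) where

import Data.Int (Int32)
import Data.Word (Word32)
import Foreign.C.Error (Errno (..), eINTR)

main :: IO ()
main = do
  -- 4294967295 is 0xFFFFFFFF; reinterpreted as a signed 32-bit value it is -1.
  -- Assumption: the timeout is consumed as a 32-bit signed int.
  print (fromIntegral (4294967295 :: Word32) :: Int32)
  -- Numeric value of EINTR on the platform this is compiled for.
  let Errno n = eINTR
  print n
```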

Hi Cdornan, I think this might be a problem related to "the environment", by which I mean the network, the LDAP server, and so on. At my site, the same code fails against certain LDAP servers but not others, yet it works against all of them under ghci or runhaskell. As described above, I have found the specific place that leads to the problem. The same library works fine when used by ldap-utils or by other language bindings.

  Changed 11 months ago by simonmar

  • status changed from infoneeded to closed
  • resolution set to invalid

The RTS uses SIGVTALRM for its own purposes (scheduling and profiling). This can cause certain system calls to return EINTR, but C library code is supposed to handle EINTR properly and restart the system call. I suspect that LDAP is not doing this, which would be a bug in LDAP.
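The restart pattern described above can be sketched through Haskell's FFI, where Foreign.C.Error provides throwErrnoIfMinus1Retry for this case. This is only an illustration of the pattern: the failing poll sits inside libldap's C code, where a Haskell wrapper cannot retry it. The names c_poll and pollRetry are hypothetical, and typing nfds_t as CULong is an assumption that holds on Linux with glibc.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Foreign.C.Error (throwErrnoIfMinus1Retry)
import Foreign.C.Types (CInt (..), CULong (..))
import Foreign.Ptr (Ptr, nullPtr)

-- int poll(struct pollfd *fds, nfds_t nfds, int timeout);
-- Assumption: nfds_t is unsigned long (Linux/glibc).
foreign import ccall safe "poll"
  c_poll :: Ptr () -> CULong -> CInt -> IO CInt

-- Hypothetical helper: call poll with no descriptors, so it only waits for
-- the timeout. If poll returns -1 with errno == EINTR the call is repeated;
-- any other -1 result is raised as an IOError.
pollRetry :: CInt -> IO CInt
pollRetry timeoutMs =
  throwErrnoIfMinus1Retry "poll" (c_poll nullPtr 0 timeoutMs)

main :: IO ()
main = do
  n <- pollRetry 100
  putStrLn ("poll returned " ++ show n)
```

The retry restarts poll with the original timeout. That is harmless for the infinite timeout seen in this ticket; with a finite timeout, a careful implementation would subtract the time that has already elapsed before retrying.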

You can work around the problem by passing +RTS -V0 to GHC, although note that this may have a negative impact on performance, because the scheduler will context switch too often.
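One way to check that the workaround took effect is to ask the RTS for its tick interval. The sketch below assumes a GHC new enough to ship GHC.RTS.Flags (base 4.8 or later, so not 7.4.2 itself), and the file name CheckTimer.hs is hypothetical.

```haskell
module Main (main) where

import GHC.RTS.Flags (MiscFlags (..), getMiscFlags)

main :: IO ()
main = do
  misc <- getMiscFlags
  -- A tick interval of 0 means the RTS interval timer is disabled (-V0).
  putStrLn ("RTS tick interval: " ++ show (tickInterval misc))
```

Built with ghc -rtsopts CheckTimer.hs and run as ./CheckTimer +RTS -V0, this should report an interval of 0. The -rtsopts link flag is needed on GHC versions whose default accepts only a minimal set of RTS options on the command line.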

I'm closing the bug as invalid on the assumption that it is an LDAP bug. If you think this is wrong, please re-open the ticket.

  Changed 11 months ago by magicloud

OK. I can confirm that this works. Thanks. I filed this against ghc because the same library worked with ghc 7.2.2 and earlier.
