Ticket #3059 (closed bug: invalid)

Opened 4 years ago

Last modified 3 years ago

3 different behaviours depending on profiling settings and on a used-only-once form being top-level

Reported by: jkff Owned by:
Priority: low Milestone: 7.0.1
Component: Compiler Version: 6.10.1
Keywords: Cc:
Operating System: Windows Architecture: x86
Type of failure: None/Unknown Difficulty: Unknown
Test Case: Blocked By:
Blocking: Related Tickets:

Description

Below are two versions of a program using regex-tdfa-1.0. The versions differ in whether a form that is used exactly once in 'main' is top-level or declared inside 'where' in main. Both versions are compiled with/without profiling and the profiled version is called with/without +RTS -p. I obtain three drastically different performance results and, in the case where the form is top-level and profiling is turned on, a compilation bug is clearly seen.

module Main where

import Text.Regex.TDFA
import qualified Data.ByteString.Lazy.Char8 as L
import qualified Data.ByteString.Char8 as S


main = putStrLn . show . length . filter (pat . S.concat . L.toChunks) . replicate 100000 $ L.pack "Hello world foo=123 whereas bar=456 Goodbye"

pat = (=~ ".*foo=([0-9]+).*bar=([0-9]+).*")


ghc -O2 --make regex-test.hs
./regex-test +RTS -s
0.00s

ghc -O2 --make -prof -auto-all regex-test.hs
./regex-test +RTS -s
265.31s

./regex-test +RTS -p
276.84s
'compile' is called 100000 times. 



module Main where

import Text.Regex.TDFA
import qualified Data.ByteString.Lazy.Char8 as L
import qualified Data.ByteString.Char8 as S


main = putStrLn . show . length . filter (pat . S.concat . L.toChunks) . replicate 100000 $ L.pack "Hello world foo=123 whereas bar=456 Goodbye"
    where pat = (=~ ".*foo=([0-9]+).*bar=([0-9]+).*")


ghc -O2 --make regex-test.hs
./regex-test +RTS -s
8.86s

(I don't know how this result appeared: it is too small to result from 1mln regex compilations, but too big to be 'normal')

ghc -O2 --make -prof -auto-all regex-test.hs
./regex-test +RTS -s
0.00s

./regex-test +RTS -p
0.00s
'compile' is called 2 times.

Core for the first program with/without profiling differs in the aspect that with profiling, the partial application (=~) does not get memoized.

Change History

Changed 4 years ago by ChrisKuklewicz

module Text.Regex.TDFA.Wrap...

(=~) :: (RegexMaker Regex CompOption ExecOption source,RegexContext Regex source1 target)
     => source1 -> source -> target
(=~) x r = let make :: RegexMaker Regex CompOption ExecOption a => a -> Regex
               make = makeRegex
           in match (make r) x

The =~ API is very high level — if you need to cache the compiled regular expression then you should not depend on the compiler, but rather use "makeRegex" explicitly.

Changed 4 years ago by jkff

Thanks; however, the question of why addition of profiling changes the behaviour so much remains.

Changed 4 years ago by ChrisKuklewicz

I have investigated further. The problem is that the regex-tdfa-1.0.0 code is just too generic. For example: the test above will take the Lazy ByteString? and convert it to a String for searching. The text just needs to be tested, and yet the engine will try and find the best Posix match including captures.

An optimized Strict Bytestring version (will be in regex-tdfa after 1.0.0), using the test like  http://haskell.org/haskellwiki/?title=Regular_expressions#Sample_benchmark gives:

Old:

pamac-cek10:compare-tdfa chrisk$ time ./wiki-1.0.1-sbs wiki 1000000 Test for Data.ByteString?.Internal.ByteString? Right 1000000

real 0m22.692s user 0m5.352s sys 0m0.195s

New:

pamac-cek10:compare-tdfa chrisk$ time ./wiki-1.0.1-sbs-v2 wiki 1000000 Test for Data.ByteString?.Internal.ByteString? Right 1000000

real 0m0.714s user 0m0.159s sys 0m0.017s

So that is at least a thirty-fold improvement.

Changed 4 years ago by ChrisKuklewicz

In specializing regex-tdfa (post 1.0.0) for "matchTest" with strict bytestrings, I wrote code that worked great as I posted above.

I am always compiling with -O2 on PPC G4.

But only with -prof. Without -prof it took 4.0 seconds instead of 0.3 seconds (and half of that is overhead).

I used "ghc-core" to look at the core and saw nothing wrong.

I commented out everything and made tiny modifications. I have over 12 test versions before I narrowed it down to using mplus. Note: mplus is not in the core output by name.

case (IntMap.lookup ...) `mplus` maybeDefault of
  Nothing -> False
  Just -> ....iterate the loop...

And replacing mplus with my own two-stage case or the new "mm" combinator fixed it.

{-# INLINE mm #-}
mm :: Maybe Transition -> Maybe Transition -> (DT -> Bool) -> Bool
mm (Just (Transition {trans_many=DFA {d_dt=dfa'}})) _ f = f dfa'
mm Nothing (Just (Transition {trans_many=DFA {d_dt=dfa'}})) f = f dfa'
mm Nothing Nothing _ = False

So GHC really did not do the right thing with my code.

Changed 4 years ago by simonmar

  • difficulty set to Unknown

I'm not completely following all the various versions of things here. Could one of you attach a small example that illustrates the bad behaviour?

Changed 4 years ago by igloo

  • milestone set to 6.12 branch

Changed 3 years ago by igloo

  • milestone changed from 6.12 branch to 6.12.3

Changed 3 years ago by igloo

  • priority changed from normal to low
  • milestone changed from 6.12.3 to 6.14.1

Changed 3 years ago by igloo

  • status changed from new to closed
  • failure set to None/Unknown
  • resolution set to invalid

No response from submitter, and it's not clear to me that there's necessarily a bug here anyway.

Note: See TracTickets for help on using tickets.