Ticket #1984 (closed merge: fixed)

Opened 4 years ago

Last modified 4 years ago

weird performance drop with -O2 on x86

Reported by: guest
Owned by: igloo
Priority: normal
Milestone: 6.8.3
Component: Runtime System
Version: 6.8.2
Keywords:
Cc:
Operating System: Linux
Architecture: x86
Type of failure:
Difficulty: Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets:

Description

Here's my program:

import Control.Concurrent
import Data.IORef

maker :: IORef Int -> IO ()
maker v = loop
    where
    loop = do
        x <- readIORef v
        writeIORef v $! x + 1
        forkIO (return ())
        loop

main :: IO ()
main = do
    v <- newIORef 0
    t <- forkIO (maker v)
    threadDelay 1000000
    killThread t
    x <- readIORef v
    print x

It's supposed to print the number of threads created in one second. With ghc -O2, I get around 61104; similarly for -O1. However, with no optimization I get results around 612274, i.e. approximately ten times more threads in the same time. What's going on here?

More data points:

6.6.1 behaves similarly but the numbers are a bit higher (~10% more iterations).

<dons> be sure to mention that results appear normal on amd64.

Change History

Changed 4 years ago by dons

I can reproduce this with 6.8.1 on x86/linux, but on amd64/openbsd things appear as normal.

Changed 4 years ago by daniel.is.fischer

I can reproduce it with 6.8.1 and 6.6.1 on x86/linux, but not with 6.4.2, where there's no significant difference in the numbers.

Changed 4 years ago by igloo

  • milestone set to 6.8 branch

Changed 4 years ago by simonmar

  • status changed from new to closed
  • resolution set to duplicate

This is a result of #1589. When you turn on -O, the threads are created faster than they can complete and be GC'd, and the GC really slows down when there are lots of threads in the system. I'll be fixing this in 6.10.
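The effect described here can be observed directly by counting forked threads separately from threads that actually got to run. The following is a minimal sketch, not code from the ticket; `makerCounting` and `measure` are hypothetical names introduced for illustration.

```haskell
import Control.Concurrent
import Data.IORef

-- Hypothetical instrumented version of the ticket's `maker` loop. Besides
-- counting fork calls, every child thread records that it was actually
-- scheduled. A large gap between the two counters means forked threads
-- are accumulating faster than they run and exit.
makerCounting :: IORef Int -> IORef Int -> IO ()
makerCounting forked ran = loop
  where
    loop = do
      -- Count the fork before issuing it, so that ran <= forked always holds.
      modifyIORef' forked (+ 1)
      -- Many children update `ran`, so use an atomic update to avoid lost increments.
      _ <- forkIO (atomicModifyIORef' ran (\n -> (n + 1, ())))
      loop

-- Run the loop for the given number of microseconds and return
-- (threads forked, threads that actually ran).
measure :: Int -> IO (Int, Int)
measure micros = do
  forked <- newIORef 0
  ran <- newIORef 0
  t <- forkIO (makerCounting forked ran)
  threadDelay micros
  killThread t
  -- Read `ran` first: `forked` never decreases, so this order keeps ran <= forked.
  r <- readIORef ran
  f <- readIORef forked
  return (f, r)
```

For example, `measure 1000000 >>= print` prints the pair after one second. The extra counter updates change the loop's allocation pattern relative to the original program, so the figures are not directly comparable to those in the description.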

Changed 4 years ago by simonmar

  • status changed from closed to reopened
  • resolution duplicate deleted
  • component changed from Compiler to Runtime System

Actually I lied: there was more to this than I thought. With -O turned on, the main thread was allocating only in forkIO, and it turned out that the fork primitive (and primitives in general) don't check the context switch flag. So we ended up creating lots of threads but not running any of them; they never got a chance to die, and the heap filled up with runnable threads, which made GC take a long time.
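On runtimes without the fix, one plausible user-side workaround (an assumption, not something proposed in this ticket) is to hand control back to the scheduler explicitly with `yield` from `Control.Concurrent` after each fork, so the new threads get to run and exit instead of piling up. `makerYielding` is a hypothetical name for this sketch.

```haskell
import Control.Concurrent
import Data.IORef

-- Hypothetical variant of the ticket's `maker` loop. Where the fork
-- primitive does not check the context-switch flag, an explicit `yield`
-- after each fork lets the scheduler run the freshly forked threads,
-- so they can finish instead of accumulating as runnable threads.
makerYielding :: IORef Int -> IO ()
makerYielding v = loop
  where
    loop = do
      x <- readIORef v
      writeIORef v $! x + 1
      _ <- forkIO (return ())
      yield -- give the new thread (and any others) a chance to run
      loop
```

To try it, replace `maker` with `makerYielding` in the original `main`. Yielding on every iteration adds scheduler overhead per fork in exchange for keeping the run queue short; whether it recovers the unoptimized figures on an affected 6.8.x build has not been checked here.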

I have a fix for this in my tree, will commit in due course.

Changed 4 years ago by simonmar

  • owner set to igloo
  • status changed from reopened to new
  • type changed from bug to merge
  • milestone changed from 6.8 branch to 6.8.3

Fixed:

Tue Feb 19 10:22:12 GMT 2008  Simon Marlow <simonmar@microsoft.com>
  * Fix #1984: missing context switches

Changed 4 years ago by igloo

  • status changed from new to closed
  • resolution set to fixed

Merged
