Ticket #5559 (closed bug: fixed)
heap profile character encoding confusion
| Reported by: | guest | Owned by: | simonmar |
|---|---|---|---|
| Priority: | high | Milestone: | 7.4.1 |
| Component: | Profiling | Version: | 7.0.3 |
| Keywords: | heap profile, character encoding | Cc: | claudiusmaximus@… |
| Operating System: | Unknown/Multiple | Architecture: | Unknown/Multiple |
| Type of failure: | None/Unknown | Difficulty: | |
| Test Case: | profiling/T5559 | Blocked By: | |
| Blocking: | | Related Tickets: | |
Description
Heap profiling this UTF-8 source file (where ø is encoded as C3 B8) with ghc-7.0.3 on GNU/Linux with LANG=en_GB.utf8 seems to give an output .hp file in ISO-8859 encoding (where ø is encoded as F8).
    føb :: Integer -> Integer
    føb n
      | n == 0 = 0
      | n == 1 = 1
      | n >= 2 = føb (n - 1) + føb (n - 2)

    main :: IO ()
    main = print (føb 100)
hexdump extract from .hp file:
    00000000 28 32 39 33 29 66 f8 62 2f 43 41 46 3a 6c 76 6c |(293)f.b/CAF:lvl|
    00000010 31 5f 72 50 70 09 34 30 0a |1_rPp.40.|
    00000019
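To make the two byte sequences concrete, here is a minimal Haskell sketch that encodes ø (U+00F8) both ways. The names `utf8Encode` and `truncate8` are hypothetical helpers written for this illustration. That the profiler keeps only the low 8 bits of each character is an assumption: it is consistent with the `f8` byte in the dump, but it is not a confirmed description of the runtime.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)
import Numeric (showHex)

-- | Encode a single character as UTF-8 bytes (hypothetical helper).
utf8Encode :: Char -> [Word8]
utf8Encode c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    hi, cont :: Int -> Word8
    -- Leading bits of the code point, placed in the first byte.
    hi s = fromIntegral (n `shiftR` s)
    -- A continuation byte: 10xxxxxx carrying six bits of the code point.
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- | Keep only the low 8 bits of the code point. For code points up to
-- U+00FF this coincides with ISO-8859-1. Assumed, not confirmed, to be
-- what produces the f8 byte in the .hp file.
truncate8 :: Char -> Word8
truncate8 = fromIntegral . ord

hex :: Word8 -> String
hex b = showHex b ""

main :: IO ()
main = do
  let c = '\xF8' -- ø
  putStrLn ("UTF-8:     " ++ unwords (map hex (utf8Encode c))) -- c3 b8
  putStrLn ("truncated: " ++ hex (truncate8 c))                -- f8
```

A consumer that expects UTF-8 sees a lone `f8` byte as invalid input, which matches the hp2pretty/rsvg failure described below.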
This causes some problems for heap profile visualization programs:
- hp2ps: viewing the .ps in evince shows a wrong character (slashed-l instead of ø)
- hp2pretty: viewing the .svg with rsvg aborts with an invalid utf8 error
hp2any-core seemed to handle the character encoding correctly in this test (displayed as "\248") with correct appearance in hp2any-graph's OpenGL window.
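As a possible stopgap for tools that expect UTF-8, the .hp file can be re-encoded before it is visualized. The sketch below assumes the file really is ISO-8859-1, as the hexdump suggests. `latin1ToUtf8` is a hypothetical helper, not part of GHC or of any of the tools above; it only uses `hSetEncoding`, `latin1` and `utf8` from `System.IO`.

```haskell
import System.Environment (getArgs)
import System.IO

-- | Read a file as ISO-8859-1 and write it back out as UTF-8.
-- Hypothetical helper; assumes the input really is Latin-1.
latin1ToUtf8 :: FilePath -> FilePath -> IO ()
latin1ToUtf8 src dst =
  withFile src ReadMode $ \hIn ->
    withFile dst WriteMode $ \hOut -> do
      hSetEncoding hIn latin1
      hSetEncoding hOut utf8
      -- The input is consumed in full before either handle is closed.
      hGetContents hIn >>= hPutStr hOut

main :: IO ()
main = do
  args <- getArgs
  case args of
    [src, dst] -> latin1ToUtf8 src dst
    _          -> hPutStrLn stderr "usage: latin1ToUtf8 INPUT.hp OUTPUT.hp"
```

Running this on a file that is already UTF-8 would double-encode every non-ASCII character, so it is only suitable for profiles known to contain the single-byte form.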
I'd like to know whether ISO-8859 will always be used for .hp files, whether the ISO-8859 output is a misfeature and UTF-8 will be used in future, or whether the output will eventually follow the current locale settings.
I didn't find any documentation on character encoding here: http://www.haskell.org/ghc/docs/latest/html/users_guide/prof-heap.html
