Ticket #3797 (closed bug: invalid)

Opened 3 years ago

Last modified 3 years ago

ByteString.Char8 damages Unicode

Reported by: Voker57
Owned by:
Priority: normal
Milestone:
Component: libraries (other)
Version:
Keywords: bytestring
Cc:
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: Other
Difficulty:
Test Case:
Blocked By:
Blocking:
Related Tickets:

Description

import Data.ByteString.Char8
unpack (pack "тест") == "тест"
-- False, should be True
Data.ByteString.Char8.length $ pack "тест"
-- 4, should be 8 (UTF-8). The library truncates characters wider than 8 bits

I'm not sure whether this library should assume UTF-8 for encoding and decoding, but IMHO something has to be done about it.

Change History

Changed 3 years ago by dons

  • status changed from new to closed
  • resolution set to invalid

This is the expected and documented behaviour. When packing, Data.ByteString.Char8 truncates each character of the input to 8 bits. If you require a specific encoding, pre-process the string first, e.g. with utf8-string, or else use the Unicode counterpart to bytestring, the text package (Data.Text).
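
For illustration, here is a minimal sketch contrasting the truncating pack with an explicit UTF-8 encoding via Data.Text. It assumes the bytestring and text packages are installed and that the source file is saved as UTF-8; the helper names (truncated, encodeUtf8String, decodeUtf8String) are made up for this example and are not part of either library.

```haskell
module Main where

import qualified Data.ByteString       as B
import qualified Data.ByteString.Char8 as C
import qualified Data.Text             as T
import qualified Data.Text.Encoding    as TE

-- Hypothetical helper: Char8.pack keeps only the low 8 bits of each
-- character's code point, so anything above U+00FF is damaged.
truncated :: String -> B.ByteString
truncated = C.pack

-- Hypothetical helper: encode a String as UTF-8 by going through Text.
encodeUtf8String :: String -> B.ByteString
encodeUtf8String = TE.encodeUtf8 . T.pack

-- Hypothetical helper: decode UTF-8 bytes back to a String.
-- Note: decodeUtf8 throws an exception on invalid UTF-8 input.
decodeUtf8String :: B.ByteString -> String
decodeUtf8String = T.unpack . TE.decodeUtf8

main :: IO ()
main = do
  let s = "тест"
  print (B.length (truncated s))                      -- 4 bytes, one per Char
  print (C.unpack (truncated s) == s)                 -- False: high bits were dropped
  print (B.length (encodeUtf8String s))               -- 8 bytes, two per Cyrillic letter
  print (decodeUtf8String (encodeUtf8String s) == s)  -- True: the round trip is lossless
```

With the explicit encoding step the length is the 8 bytes the reporter expected, and decoding returns the original string. For untrusted input, Data.Text.Encoding.decodeUtf8' returns an Either instead of throwing and may be the safer choice.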
