melf
A Haskell library to parse/serialize Executable and Linkable Format (ELF) files.
Parsing the header and table entries
The module Data.Elf.Headers implements parsing and serialization of the ELF file header and of the entries of the section and segment tables.
ELF files come in two flavors: 64-bit and 32-bit. To differentiate between them, the type ElfClass is defined:
```haskell
data ElfClass
    = ELFCLASS32 -- ^ 32-bit ELF format
    | ELFCLASS64 -- ^ 64-bit ELF format
    deriving (Eq, Show)
```
Some fields of the header and table entries have different bit widths in 64-bit and 32-bit files, so the type WordXX a was borrowed from the data-elf package:
```haskell
-- | @IsElfClass a@ is defined for each constructor of `ElfClass`.
-- It defines @WordXX a@, which is `Word32` for `ELFCLASS32`
-- and `Word64` for `ELFCLASS64`.
class ( SingI c
      , Typeable c
      , Typeable (WordXX c)
      , Data (WordXX c)
      , Show (WordXX c)
      , Read (WordXX c)
      , Eq (WordXX c)
      , Ord (WordXX c)
      , Bounded (WordXX c)
      , Enum (WordXX c)
      , Num (WordXX c)
      , Integral (WordXX c)
      , Real (WordXX c)
      , Bits (WordXX c)
      , FiniteBits (WordXX c)
      , Binary (Be (WordXX c))
      , Binary (Le (WordXX c))
      ) => IsElfClass c where
    type WordXX c = r | r -> c

instance IsElfClass 'ELFCLASS32 where
    type WordXX 'ELFCLASS32 = Word32

instance IsElfClass 'ELFCLASS64 where
    type WordXX 'ELFCLASS64 = Word64
```
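The idea behind WordXX can be seen in isolation with a minimal, self-contained sketch. It uses a closed type family instead of melf's injective associated type, and the names EntryPoint, entry32, entry64, bits32 and bits64 are hypothetical, chosen only for illustration:

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeFamilies #-}

import Data.Bits (finiteBitSize)
import Data.Word (Word32, Word64)

data ElfClass = ELFCLASS32 | ELFCLASS64 deriving (Eq, Show)

-- Closed type family standing in for melf's associated type WordXX:
-- the class index selects the width of address/offset fields.
type family WordXX (c :: ElfClass) where
  WordXX 'ELFCLASS32 = Word32
  WordXX 'ELFCLASS64 = Word64

-- Hypothetical record with a single width-dependent field
newtype EntryPoint (c :: ElfClass) = EntryPoint (WordXX c)

entry32 :: EntryPoint 'ELFCLASS32
entry32 = EntryPoint 0x8048000   -- stored as a Word32

entry64 :: EntryPoint 'ELFCLASS64
entry64 = EntryPoint 0x400000    -- stored as a Word64

-- The bit width is recovered from the concrete field type
bits32, bits64 :: Int
bits32 = case entry32 of EntryPoint w -> finiteBitSize w
bits64 = case entry64 of EntryPoint w -> finiteBitSize w
```

One difference worth noting: the injectivity annotation r -> c in melf's declaration also lets GHC infer c from a WordXX c type, which this closed family does not provide.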
The header of the ELF file is represented with the type HeaderXX a:
```haskell
-- | Parsed ELF header
data HeaderXX c =
    HeaderXX
        { hData       :: ElfData         -- ^ Data encoding (big- or little-endian)
        , hOSABI      :: ElfOSABI        -- ^ OS/ABI identification
        , hABIVersion :: Word8           -- ^ ABI version
        , hType       :: ElfType         -- ^ Object file type
        , hMachine    :: ElfMachine      -- ^ Machine type
        , hEntry      :: WordXX c        -- ^ Entry point address
        , hPhOff      :: WordXX c        -- ^ Program header offset
        , hShOff      :: WordXX c        -- ^ Section header offset
        , hFlags      :: Word32          -- ^ Processor-specific flags
        , hPhEntSize  :: Word16          -- ^ Size of program header entry
        , hPhNum      :: Word16          -- ^ Number of program header entries
        , hShEntSize  :: Word16          -- ^ Size of section header entry
        , hShNum      :: Word16          -- ^ Number of section header entries
        , hShStrNdx   :: ElfSectionIndex -- ^ Section name string table index
        }
```
So we have two types, HeaderXX 'ELFCLASS64 and HeaderXX 'ELFCLASS32. To work with headers uniformly, the type Header was introduced:
```haskell
-- | Sigma type where `ElfClass` defines the type of `HeaderXX`
type Header = Sigma ElfClass (TyCon1 HeaderXX)
```
Header is a pair. The first element is a singleton of the type ElfClass (a Sing c) that defines the width of the word. The second element is HeaderXX parametrized with the first element (i.e. a Σ-type from languages with dependent types). To simulate Σ-types, the library singletons (Hackage, "Introduction to singletons") was used.
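To show what the Σ-type buys without depending on singletons, here is a hand-rolled sketch: a GADT SElfClass plays the role of Sing, and an existential constructor :&: pairs it with a header indexed by the same class. MiniHeaderXX, MiniHeader, entryAsInteger and headerClass are hypothetical stand-ins, not melf's types:

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeFamilies #-}

import Data.Word (Word32, Word64)

data ElfClass = ELFCLASS32 | ELFCLASS64 deriving (Eq, Show)

type family WordXX (c :: ElfClass) where
  WordXX 'ELFCLASS32 = Word32
  WordXX 'ELFCLASS64 = Word64

-- Hand-written singleton: each constructor fixes the type index
data SElfClass (c :: ElfClass) where
  SELFCLASS32 :: SElfClass 'ELFCLASS32
  SELFCLASS64 :: SElfClass 'ELFCLASS64

-- Tiny stand-in for HeaderXX: only the entry point
newtype MiniHeaderXX (c :: ElfClass) = MiniHeaderXX { mhEntry :: WordXX c }

-- Dependent pair: the first component determines the type of the second
data MiniHeader where
  (:&:) :: SElfClass c -> MiniHeaderXX c -> MiniHeader

fromSElfClass :: SElfClass c -> ElfClass
fromSElfClass SELFCLASS32 = ELFCLASS32
fromSElfClass SELFCLASS64 = ELFCLASS64

-- Matching on the singleton refines c, so GHC knows the concrete word type
entryAsInteger :: MiniHeader -> Integer
entryAsInteger (SELFCLASS32 :&: h) = toInteger (mhEntry h)
entryAsInteger (SELFCLASS64 :&: h) = toInteger (mhEntry h)

headerClass :: MiniHeader -> ElfClass
headerClass (s :&: _) = fromSElfClass s
```

Matching on SELFCLASS32 or SELFCLASS64 is what tells GHC which concrete word type is hidden inside the pair; singletons can generate such singleton types automatically with Template Haskell.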
Header is an instance of the Binary class. So, given a lazy bytestring containing a large enough initial part of an ELF file, one can get the header of that file with a function like this:
```haskell
withHeader :: BSL.ByteString ->
              (forall a . IsElfClass a => HeaderXX a -> b) -> Either String b
withHeader bs f =
    case decodeOrFail bs of
        Left (_, _, err) -> Left err
        Right (_, _, (classS :&: hxx) :: Header) ->
            Right $ withElfClass classS f hxx
```
The function decodeOrFail is defined in the package binary.
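Before a Header can be decoded, the parser has to learn the word width from the identification bytes at the very start of the file: the magic number 0x7f 'E' 'L' 'F' followed by the EI_CLASS byte (1 for 32-bit, 2 for 64-bit). The sketch below reads just those bytes with binary's Get monad and runGetOrFail, a sibling of decodeOrFail. getElfClass and peekElfClass are hypothetical helpers; melf's own decoder is not necessarily structured this way:

```haskell
import Control.Monad (when)
import Data.Binary.Get (Get, getByteString, getWord8, runGetOrFail)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BSL

data ElfClass = ELFCLASS32 | ELFCLASS64 deriving (Eq, Show)

-- Reads e_ident[EI_MAG0..EI_MAG3] and e_ident[EI_CLASS]
getElfClass :: Get ElfClass
getElfClass = do
  magic <- getByteString 4
  when (magic /= BS.pack [0x7f, 0x45, 0x4c, 0x46]) $ -- 0x7f 'E' 'L' 'F'
    fail "bad ELF magic"
  cls <- getWord8
  case cls of
    1 -> return ELFCLASS32
    2 -> return ELFCLASS64
    _ -> fail ("unknown EI_CLASS value " ++ show cls)

-- Like withHeader: run the decoder, keep only the message on failure
peekElfClass :: BSL.ByteString -> Either String ElfClass
peekElfClass bs = case runGetOrFail getElfClass bs of
  Left (_, _, err) -> Left err
  Right (_, _, c) -> Right c
```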
The function withElfClass creates a context with an implicit word width available and looks like withSingI:
```haskell
-- | Convenience function for creating a
-- context with an implicit ElfClass available.
withElfClass :: Sing c -> (IsElfClass c => a) -> a
withElfClass SELFCLASS64 x = x
withElfClass SELFCLASS32 x = x
```
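The trick in withElfClass is that each equation matches a different singleton constructor, so in each equation GHC knows the concrete class index and can satisfy the constraint of the argument from the corresponding instance. A minimal, self-contained sketch of the same pattern, with a hypothetical one-method class HasWordBits in place of IsElfClass and a hand-written SElfClass in place of Sing:

```haskell
{-# LANGUAGE AllowAmbiguousTypes #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}

data ElfClass = ELFCLASS32 | ELFCLASS64

-- Hand-written singleton: the constructor determines the type index
data SElfClass (c :: ElfClass) where
  SELFCLASS32 :: SElfClass 'ELFCLASS32
  SELFCLASS64 :: SElfClass 'ELFCLASS64

-- Hypothetical one-method stand-in for IsElfClass
class HasWordBits (c :: ElfClass) where
  wordBits :: Int

instance HasWordBits 'ELFCLASS32 where
  wordBits = 32

instance HasWordBits 'ELFCLASS64 where
  wordBits = 64

-- Same shape as withElfClass: matching on the singleton fixes c,
-- so the HasWordBits c constraint of the argument is solved by an instance
withClass :: SElfClass c -> (HasWordBits c => r) -> r
withClass SELFCLASS32 x = x
withClass SELFCLASS64 x = x

-- Uses the constraint made available by withClass
bitsOf :: forall c. SElfClass c -> Int
bitsOf s = withClass s (wordBits @c)
```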
The module Data.Elf.Headers also defines the types SectionXX, SegmentXX and SymbolXX for the entries of the section, segment and symbol tables.
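The header fields hShOff, hShEntSize and hShNum (and their hPh counterparts) are what locate these table entries in the file. A sketch of that arithmetic, assuming the entries are stored back to back and ignoring special cases such as extended section numbering; TableLoc and entryOffsets are hypothetical names, and 64-bit offsets are used for simplicity:

```haskell
import Data.Word (Word16, Word64)

-- Hypothetical subset of the header fields that locate one table
data TableLoc = TableLoc
  { tlOff     :: Word64 -- like hShOff / hPhOff
  , tlEntSize :: Word16 -- like hShEntSize / hPhEntSize
  , tlNum     :: Word16 -- like hShNum / hPhNum
  }

-- File offsets of all table entries, in the order a parser would visit them
entryOffsets :: TableLoc -> [Word64]
entryOffsets (TableLoc off sz n) =
  take (fromIntegral n) (iterate (+ fromIntegral sz) off)
```

Using take over an infinite iterate avoids the pitfall of writing [0 .. n - 1] with an unsigned n, which wraps around when the table is empty.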
Parsing the whole ELF file
The module Data.Elf implements parsing and serialization of whole ELF files. To parse an ELF file, it reads the ELF header, the section table and the segment table, and uses that data to create a list of elements of the type ElfXX representing the recursive structure of the ELF file. It also restores section names from the string table indexes. The result is an object of the type Elf:
```haskell
-- | `Elf` is a forest of trees of type `ElfXX`.
-- Trees are composed of `ElfXX` nodes, `ElfSegment` can contain subtrees
newtype ElfList c = ElfList [ElfXX c]

-- | Elf is a sigma type where `ElfClass` defines the type of `ElfList`
type Elf = Sigma ElfClass (TyCon1 ElfList)

-- | Section data may contain a string table.
-- If a section contains a string table with section names, the data
-- for such a section is generated and `esData` should contain `ElfSectionDataStringTable`
data ElfSectionData
    = ElfSectionData BSL.ByteString -- ^ Regular section data
    | ElfSectionDataStringTable     -- ^ Section data will be generated from section names

-- | The type of node that defines Elf structure.
data ElfXX (c :: ElfClass)
    = ElfHeader
        { ehData       :: ElfData    -- ^ Data encoding (big- or little-endian)
        , ehOSABI      :: ElfOSABI   -- ^ OS/ABI identification
        , ehABIVersion :: Word8      -- ^ ABI version
        , ehType       :: ElfType    -- ^ Object file type
        , ehMachine    :: ElfMachine -- ^ Machine type
        , ehEntry      :: WordXX c   -- ^ Entry point address
        , ehFlags      :: Word32     -- ^ Processor-specific flags
        }
    | ElfSectionTable
    | ElfSegmentTable
    | ElfSection
        { esName      :: String          -- ^ Section name (NB: string, not offset in the string table)
        , esType      :: ElfSectionType  -- ^ Section type
        , esFlags     :: ElfSectionFlag  -- ^ Section attributes
        , esAddr      :: WordXX c        -- ^ Virtual address in memory
        , esAddrAlign :: WordXX c        -- ^ Address alignment boundary
        , esEntSize   :: WordXX c        -- ^ Size of entries, if section has table
        , esN         :: ElfSectionIndex -- ^ Section number
        , esInfo      :: Word32          -- ^ Miscellaneous information
        , esLink      :: Word32          -- ^ Link to other section
        , esData      :: ElfSectionData  -- ^ The content of the section
        }
    | ElfSegment
        { epType       :: ElfSegmentType -- ^ Type of segment
        , epFlags      :: ElfSegmentFlag -- ^ Segment attributes
        , epVirtAddr   :: WordXX c       -- ^ Virtual address in memory
        , epPhysAddr   :: WordXX c       -- ^ Physical address
        , epAddMemSize :: WordXX c       -- ^ Add this amount of memory after the segment when the segment is loaded to memory by the execution system.
                                         -- In other words, this is how much `pMemSize` is bigger than `pFileSize`
        , epAlign      :: WordXX c       -- ^ Alignment of segment
        , epData       :: [ElfXX c]      -- ^ Content of the segment
        }
    | ElfRawData -- ^ Some ELF files (some executables) don't bother to define
                 -- sections for linking and have just raw data in segments.
        { edData :: BSL.ByteString -- ^ Raw data in ELF file
        }
    | ElfRawAlign -- ^ Align the next data in the ELF file.
                  -- The offset of the next data in the ELF file
                  -- will be the minimal @x@ such that
                  -- @x mod eaAlign == eaOffset mod eaAlign@
        { eaOffset :: WordXX c -- ^ Align value
        , eaAlign  :: WordXX c -- ^ Align modulus
        }
```
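The ElfRawAlign rule is plain modular arithmetic. A sketch, assuming that x must also be no smaller than the current write position pos (the comment leaves this implicit) and that the alignment is positive; alignedOffset is a hypothetical helper, not part of melf:

```haskell
-- Smallest x >= pos such that x `mod` align == off `mod` align.
-- Assumes align > 0. Hypothetical helper, not part of melf.
alignedOffset :: Integral a => a -> a -> a -> a
alignedOffset align off pos = pos + (target + align - pos `mod` align) `mod` align
  where
    -- residue the next offset must have
    target = off `mod` align
```

For example, with eaAlign = 8 and eaOffset = 0, data that would otherwise start at offset 10 is moved to offset 16. Adding align before subtracting keeps the intermediate value non-negative, which matters for unsigned types such as Word64.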
Not every object of that type can be serialized:
- The ElfSection constructor still carries a section number. It is required because the symbol table and some other structures refer to sections by their indexes. So the section indexes should be consecutive integers starting from 1. The section with index 0 is always empty and is created by the library.
- There should be a single ElfHeader. It should be the first nonempty node of the tree.
- If there exists at least one ElfSection node, then there should exist exactly one ElfSectionTable node and exactly one section that has ElfSectionDataStringTable as the value of its esData field (the string table for the names of sections).
- If there exists at least one ElfSegment node, then there should exist exactly one ElfSegmentTable node.
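The rules above can be checked mechanically before calling the serializer. The following is a sketch on a simplified, hypothetical mirror of ElfXX (Node, leaves, segmentCount and checkTree are illustrative names, not melf API). It treats segments as transparent containers and reads "first nonempty node" as the first non-segment node in depth-first order, which is an assumption:

```haskell
import Data.List (sort)

-- Hypothetical, simplified mirror of ElfXX (not melf's type)
data Node
  = Header
  | SectionTable
  | SegmentTable
  | Section Int Bool  -- section index; True if data is ElfSectionDataStringTable
  | Segment [Node]
  | RawData
  deriving (Eq, Show)

-- Non-segment nodes in depth-first order; segments are only containers
leaves :: [Node] -> [Node]
leaves = concatMap go
  where
    go (Segment cs) = leaves cs
    go n = [n]

-- Number of Segment nodes, including nested ones
segmentCount :: [Node] -> Int
segmentCount = sum . map go
  where
    go (Segment cs) = 1 + segmentCount cs
    go _ = 0

-- Violated rules; an empty list means the tree looks serializable
checkTree :: [Node] -> [String]
checkTree ns = concat
  [ [ "exactly one ElfHeader is required" | count (== Header) /= 1 ]
  , [ "ElfHeader must be the first node" | take 1 ls /= [Header] ]
  , [ "section indexes must be 1..n" | sort idxs /= [1 .. length idxs] ]
  , [ "exactly one ElfSectionTable is required"
    | not (null idxs), count (== SectionTable) /= 1 ]
  , [ "exactly one section name string table is required"
    | not (null idxs), length [ () | Section _ True <- ls ] /= 1 ]
  , [ "exactly one ElfSegmentTable is required"
    | segmentCount ns > 0, count (== SegmentTable) /= 1 ]
  ]
  where
    ls = leaves ns
    idxs = [ i | Section i _ <- ls ]
    count p = length (filter p ls)
```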
A correctly composed ELF object can be serialized with the function serializeElf and parsed with the function parseElf:
```haskell
serializeElf :: MonadThrow m => Elf -> m ByteString
parseElf :: MonadCatch m => ByteString -> m Elf
```
Elf is not an instance of the class Binary because PutM is not an instance of the class MonadFail.
Generation of object files
To create the machine code used in the examples, a pair of modules was created. The module AsmAArch64 provides a DSL embedded in Haskell. This DSL is a kind of assembly language for the AArch64 platform. It exports some primitives to generate machine instructions and organize machine code. It also exports the function assemble, which consumes the monad composed of those primitives and produces an object of the type Elf:
```haskell
assemble :: MonadCatch m => StateT CodeState m () -> m Elf
```
The idea was inspired by the article "Monads to Machine Code" by Stephen Diehl. A detailed description of this module is available in Russian: README_ru.md.
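The DSL idea can be illustrated in a few lines: each primitive appends an encoded instruction to the state, and a runner extracts the result. This is a toy sketch, not AsmAArch64's actual API: CodeSketch, emit, movz, svc0, exitZero and assembleSketch are hypothetical, it uses State from transformers instead of StateT over a MonadCatch monad, and it only knows two AArch64 encodings (MOVZ with a 16-bit immediate and SVC #0):

```haskell
import Control.Monad.Trans.State.Strict (State, execState, modify)
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Word (Word32)

-- Hypothetical code state: encoded instructions, most recent first
newtype CodeSketch = CodeSketch [Word32]

-- Appends one encoded 32-bit instruction
emit :: Word32 -> State CodeSketch ()
emit w = modify (\(CodeSketch ws) -> CodeSketch (w : ws))

-- MOVZ Xd, #imm16 (64-bit register, no shift)
movz :: Word32 -> Word32 -> State CodeSketch ()
movz rd imm = emit (0xd2800000 .|. ((imm .&. 0xffff) `shiftL` 5) .|. (rd .&. 0x1f))

-- SVC #0 (system call)
svc0 :: State CodeSketch ()
svc0 = emit 0xd4000001

-- exit(0) on Linux/AArch64: x0 = exit status, x8 = syscall number 93
exitZero :: State CodeSketch ()
exitZero = do
  movz 0 0
  movz 8 93
  svc0

-- Runs a program and returns its instructions in program order
assembleSketch :: State CodeSketch () -> [Word32]
assembleSketch prog = case execState prog (CodeSketch []) of
  CodeSketch ws -> reverse ws
```

A real assembler also needs labels, forward references and symbol bookkeeping, which is where most of the complexity of a module like AsmAArch64 would lie.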
The module HelloWorld uses primitives from AsmAArch64 to compose relocatable executable code that uses system calls to print a "Hello World!" message to standard output and exit:
```haskell
helloWorld :: MonadCatch m => StateT CodeState m ()
```
The function assemble uses the melf library to generate an object file:
```haskell
return $ SELFCLASS64 :&: ElfList
    [ ElfHeader
        { ehData       = ELFDATA2LSB
        , ehOSABI      = ELFOSABI_SYSV
        , ehABIVersion = 0
        , ehType       = ET_REL
        , ehMachine    = EM_AARCH64
        , ehEntry      = 0
        , ehFlags      = 0
        }
    , ElfSection
        { esName      = ".text"
        , esType      = SHT_PROGBITS
        , esFlags     = SHF_EXECINSTR .|. SHF_ALLOC
        , esAddr      = 0
        , esAddrAlign = 8
        , esEntSize   = 0
        , esN         = textSecN
        , esLink      = 0
        , esInfo      = 0
        , esData      = ElfSectionData txt
        }
    , ElfSection
        { esName      = ".shstrtab"
        , esType      = SHT_STRTAB
        , esFlags     = 0
        , esAddr      = 0
        , esAddrAlign = 1
        , esEntSize   = 0
        , esN         = shstrtabSecN
        , esLink      = 0
        , esInfo      = 0
        , esData      = ElfSectionDataStringTable
        }
    , ElfSection
        { esName      = ".symtab"
        , esType      = SHT_SYMTAB
        , esFlags     = 0
        , esAddr      = 0
        , esAddrAlign = 8
        , esEntSize   = symbolTableEntrySize ELFCLASS64
        , esN         = symtabSecN
        , esLink      = fromIntegral strtabSecN
        , esInfo      = 1
        , esData      = ElfSectionData symbolTableData
        }
    , ElfSection
        { esName      = ".strtab"
        , esType      = SHT_STRTAB
        , esFlags     = 0
        , esAddr      = 0
        , esAddrAlign = 1
        , esEntSize   = 0
        , esN         = strtabSecN
        , esLink      = 0
        , esInfo      = 0
        , esData      = ElfSectionData stringTableData
        }
    , ElfSectionTable
    ]
```
It runs the StateT computation that was passed as its argument. As a result, the final CodeState contains all the data necessary to produce the ELF file; in particular, txt is the content of the .text section, symbolTableData is the content of the symbol table section, and stringTableData is the content of the string table section linked to the symbol table. Names with the SecN suffix (textSecN, shstrtabSecN, symtabSecN, strtabSecN) are predefined section numbers that conform to the conditions stated above.
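Both string tables (.shstrtab, which melf generates from the section names, and .strtab) use the standard ELF layout: a leading NUL byte, followed by NUL-terminated names, each name being referenced by its byte offset. The sketch below builds such a table and resolves an offset back to a name, which is also what restoring section names during parsing amounts to. mkStringTable and nameAt are hypothetical helpers for ASCII names, and the table melf generates may differ from this one in detail:

```haskell
import qualified Data.ByteString.Lazy as BSL
import qualified Data.ByteString.Lazy.Char8 as BSLC
import Data.Int (Int64)

-- Builds a string table: a leading NUL byte, then each name NUL-terminated.
-- Also returns the offset of every name within the table.
mkStringTable :: [String] -> (BSL.ByteString, [Int64])
mkStringTable names = (table, offsets)
  where
    table = BSLC.pack (concatMap (++ "\0") ("" : names))
    -- the first name starts at 1; each name occupies its length plus the NUL
    offsets = init (scanl (\o n -> o + fromIntegral (length n) + 1) 1 names)

-- Resolves an offset back to the NUL-terminated name stored there
nameAt :: BSL.ByteString -> Int64 -> String
nameAt tbl off = BSLC.unpack (BSL.takeWhile (/= 0) (BSL.drop off tbl))
```

With the names ".text" and ".shstrtab", the table is 17 bytes long and the names start at offsets 1 and 7.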
For the sake of simplicity, external symbol resolution and data section allocation were not implemented, as they require relocation tables. On the other hand, the resulting code is position-independent.
Use this module to produce an object file and try to link it:
```
[nix-shell:examples]$ ghci
GHCi, version 8.10.7: https://www.haskell.org/ghc/  :? for help
Prelude> :l AsmAArch64.hs HelloWorld.hs
[1 of 2] Compiling AsmAArch64       ( AsmAArch64.hs, interpreted )
[2 of 2] Compiling HelloWorld       ( HelloWorld.hs, interpreted )
Ok, two modules loaded.
*AsmAArch64> import HelloWorld
*AsmAArch64 HelloWorld> elf <- assemble helloWorld
*AsmAArch64 HelloWorld> bs <- serializeElf elf
*AsmAArch64 HelloWorld> BSL.writeFile "helloWorld.o" bs
*AsmAArch64 HelloWorld>
Leaving GHCi.
[nix-shell:examples]$ aarch64-unknown-linux-gnu-gcc -nostdlib helloWorld.o -o helloWorld
[nix-shell:examples]$
```
The linker accepted the object file. Try to run the result:
```
[nix-shell:examples]$ qemu-aarch64 helloWorld
Hello World!
[nix-shell:examples]$
```
It works.
Generation of executable files
The module DummyLd uses the .text section of an object file to create an executable file. Code relocation and symbol resolution are not implemented, so this procedure works only for position-independent code that does not refer to external translation units; for example, it works with the code described above.
The function dummyLd consumes an object of the type Elf and finds the .text section (using elfFindSectionByName) and the header (using elfFindHeader) in it. Then the header type is changed to ET_EXEC, the address of the first executable instruction is specified, and a loadable segment containing the header and the content of .text is formed:
```haskell
data MachineConfig (a :: ElfClass)
    = MachineConfig
        { mcAddress :: WordXX a -- ^ Virtual address of the executable segment
        , mcAlign   :: WordXX a -- ^ Required alignment of the executable segment
                                -- in physical memory (depends on max page size)
        }

getMachineConfig :: (IsElfClass a, MonadThrow m) => ElfMachine -> m (MachineConfig a)
getMachineConfig EM_AARCH64 = return $ MachineConfig 0x400000 0x10000
getMachineConfig EM_X86_64  = return $ MachineConfig 0x400000 0x1000
getMachineConfig _          = $chainedError "could not find machine config for this arch"

dummyLd' :: forall a m . (MonadThrow m, IsElfClass a) => ElfList a -> m (ElfList a)
dummyLd' (ElfList es) = do
    txtSection <- elfFindSectionByName es ".text"
    txtSectionData <- case txtSection of
        ElfSection { esData = ElfSectionData textData } -> return textData
        _ -> $chainedError "could not find correct \".text\" section"
    header <- elfFindHeader es
    case header of
        ElfHeader { .. } -> do
            MachineConfig { .. } <- getMachineConfig ehMachine
            return $ ElfList
                [ ElfSegment
                    { epType       = PT_LOAD
                    , epFlags      = PF_X .|. PF_R
                    , epVirtAddr   = mcAddress
                    , epPhysAddr   = mcAddress
                    , epAddMemSize = 0
                    , epAlign      = mcAlign
                    , epData       =
                        [ ElfHeader
                            { ehType  = ET_EXEC
                            , ehEntry = mcAddress + headerSize (fromSing $ sing @a)
                            , ..
                            }
                        , ElfRawData
                            { edData = txtSectionData
                            }
                        ]
                    }
                , ElfSegmentTable
                ]
        _ -> $chainedError "could not find ELF header"

-- | @dummyLd@ places the content of ".text" section of the input ELF
-- into the loadable segment of the resulting ELF.
-- This could work if there are no relocations or references to external symbols.
dummyLd :: MonadThrow m => Elf -> m Elf
dummyLd (c :&: l) = (c :&:) <$> withElfClass c dummyLd' l
```
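The entry point computed in dummyLd' follows from the layout of the segment: it starts at file offset 0 with the ELF header, so .text begins right after the header. A worked sketch of that arithmetic, assuming the standard ELF header sizes (52 bytes for ELFCLASS32, 64 for ELFCLASS64) and that headerSize returns exactly these values; elfHeaderSize, entryPoint and congruent are hypothetical helpers:

```haskell
import Data.Word (Word64)

data ElfClass = ELFCLASS32 | ELFCLASS64 deriving (Eq, Show)

-- Size of the ELF file header (e_ehsize) for each class
elfHeaderSize :: ElfClass -> Word64
elfHeaderSize ELFCLASS32 = 52
elfHeaderSize ELFCLASS64 = 64

-- Entry point when .text immediately follows the file header and
-- file offset 0 is mapped at the segment base address
entryPoint :: ElfClass -> Word64 -> Word64
entryPoint c base = base + elfHeaderSize c

-- PT_LOAD requirement: p_vaddr and p_offset agree modulo p_align
congruent :: Word64 -> Word64 -> Word64 -> Bool
congruent vaddr offset align = vaddr `mod` align == offset `mod` align
```

For AArch64 this gives 0x400000 + 64 = 0x400040 as the entry point, and the segment satisfies the congruence requirement because 0x400000 is a multiple of the 0x10000 alignment and the segment starts at file offset 0.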
Try to use this code to produce an executable file without the GNU linker:
```
[nix-shell:examples]$ ghci
GHCi, version 8.10.7: https://www.haskell.org/ghc/  :? for help
Prelude> :l DummyLd.hs
[1 of 1] Compiling DummyLd          ( DummyLd.hs, interpreted )
Ok, one module loaded.
*DummyLd> import Data.ByteString.Lazy as BSL
*DummyLd BSL> i <- BSL.readFile "helloWorld.o"
*DummyLd BSL> elf <- parseElf i
*DummyLd BSL> elf' <- dummyLd elf
*DummyLd BSL> o <- serializeElf elf'
*DummyLd BSL> BSL.writeFile "helloWorld2" o
*DummyLd BSL>
Leaving GHCi.
[nix-shell:examples]$ chmod +x helloWorld2
[nix-shell:examples]$ qemu-aarch64 helloWorld2
Hello World!
[nix-shell:examples]$
```
It works.
These just parse/serialize the ELF header and table entries, but not whole ELF files.
History
For the early history, look at the branch "amakarov" of my copy of the elf repo.
Tests
Test data is committed with git-lfs. Only the testdata/orig/* tests are included in the Hackage distribution to keep the tarball small.
License
BSD 3-Clause License (c) Aleksey Makarov