== Overview ==

The "main" function accepts a list of XML files on the command line
and processes them one at a time. A minimal parse attempt is made to
determine the DTD of each file, and then one big case statement
decides what to do with it based on that DTD name. Each DTD name has
an associated module in src/TSN/XML which can do a few things:

  1. List the DTDs for which it's responsible

  2. Parse a top-level element into an XmlTree

  3. In rare cases (Weather, News) detect specific malformed
     documents

Most of the XML modules are similar. The big idea is that every
object (for example, a team) has both a database type and an XML
type. When those two types differ, we need to be able to convert
between them. So, for example, if the XML representation of a team
differs from the database representation, we might define,

  > data Team = ...
  > data TeamXml = ...

But if you're lucky, the database and XML representations will be the
same, and you'll only need to define "Team"!

The most common situation where the representations differ is when
there exists a parent/child relationship. In the XML representation,
you will have e.g. the Team contained within a Game:

  > data GameXml = GameXml { xml_game_id :: Int, xml_team :: TeamXml }

But in the database representation--which looks a lot like a schema
specification--there's no mention of the team at all.

  > data Game = Game { game_id :: Int }

That's because the database representation of the Team will have a
foreign key to a Game instead:

  > data Team = Team { games_id :: DefaultKey Game, ... }

Most of the XML modules are devoted to converting back and forth
between these two types. The XML modules are also responsible for
"unpickling" the XML document, which essentially parses it into a
bunch of Haskell data types (the FooXml representations).

Furthermore, each top-level message element in the XML modules knows
how to insert itself into the database.
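
The conversion between the XML and database representations described
above can be sketched roughly as follows. This is only an
illustration under simplifying assumptions, not the code in
src/TSN/XML: the names GameKey, fromGameXml, fromTeamXml,
xml_team_name and team_name are hypothetical, and a plain newtype
stands in for the DefaultKey foreign key so that the example compiles
without a database library.

```haskell
module Main where

-- Hypothetical stand-in for a foreign key to a Game (an assumption;
-- the snippets above use "DefaultKey Game" for this).
newtype GameKey = GameKey Int
  deriving (Eq, Show)

-- XML representation: the team is nested inside the game.
data TeamXml = TeamXml { xml_team_name :: String }
  deriving (Eq, Show)

data GameXml = GameXml { xml_game_id :: Int, xml_team :: TeamXml }
  deriving (Eq, Show)

-- Database representation: the game knows nothing about the team,
-- and the team points back at its game.
data Game = Game { game_id :: Int }
  deriving (Eq, Show)

data Team = Team { games_id :: GameKey, team_name :: String }
  deriving (Eq, Show)

-- Converting the parent only needs the XML value; the child is dropped.
fromGameXml :: GameXml -> Game
fromGameXml g = Game { game_id = xml_game_id g }

-- Converting the child also needs the key of its parent.
fromTeamXml :: GameKey -> TeamXml -> Team
fromTeamXml k t = Team { games_id = k, team_name = xml_team_name t }

main :: IO ()
main = do
  let gx   = GameXml { xml_game_id = 7, xml_team = TeamXml "Example" }
      game = fromGameXml gx
      -- In a real import this key would come back from inserting the
      -- game into the database; here we simply make one up.
      key  = GameKey (game_id game)
      team = fromTeamXml key (xml_team gx)
  print game
  print team
```

The point of the sketch is the shape of the two conversions: the
parent can be converted on its own, while the child cannot be built
until the parent's key is known, which is why the parent has to be
inserted first.
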
The "Message" type is always a member of the "DbImport" class, and
that class defines two methods: dbmigrate, to run the migrations, and
dbimport, which actually says how to import the thing.

Each XML file is handed off to the appropriate XML module, which then
runs its migrations and tries to import the XML into the database.
The results are reported and collected into a list so that the
processed files may be removed later.

== Pickle Failures ==

Our schemas are "best guesses" based on what we've seen on the wire.
From time to time they'll be wrong, and thus the (un)pickler
implementation will fail to unpickle some XML document. The easiest
way to test a fix for this is interactively: it's quick, and error
messages are written to the console. Here's an example of such a
session (wrapped for readability):

  $ ghci
  htsn-import> runX $ xunpickleDocument
                        TSN.XML.AutoRacingResults.pickle_message
                        parse_opts
                        "schemagen/AutoRacingResultsXML/21241892.xml"

  [Message {xml_xml_file_id = 21241892... stamp = 2014-06-08 04:05:00 UTC}]

If there's an error, you'll see something like the following:

  $ ghci
  htsn-import> runX $ xunpickleDocument
                        TSN.XML.AutoRacingResults.pickle_message
                        parse_opts
                        "schemagen/AutoRacingResultsXML/21241892-bad.xml"

  fatal error: document unpickling failed
  xpElem: got element name "RaceDate", but expected "RaceID"
  context:    element "message"
  contents:   IRL - Firestone 600 - Final Results Texas Motor Sp...

  []