xml-to-json: Simple command line tool for converting XML files to json

[ library, mit, program, web, xml ]

This simple tool converts XML to JSON, gaining readability while losing information such as comments and attribute ordering. The main purpose is to convert legacy XML-based data into a format that can be imported into JSON databases such as CouchDB and MongoDB.

See https://github.com/sinelaw/xml-to-json#readme for details and usage.


Versions [RSS] 0.1.0.0, 0.1.1.0, 0.1.2.0, 1.0.0, 1.0.1, 2.0.0, 2.0.1
Dependencies aeson, base (>=4.5 && <4.6), bytestring, hashable, hxt, hxt-curl, hxt-expat, hxt-tagsoup, text, unordered-containers, vector [details]
License GPL-3.0-only
Copyright Copyright Noam Lewis, 2012
Author Noam Lewis
Maintainer jones.noamle@gmail.com
Category Web, XML
Home page https://github.com/sinelaw/xml-to-json
Bug tracker https://github.com/sinelaw/xml-to-json/issues
Source repo head: git clone https://github.com/sinelaw/xml-to-json
Uploaded by NoamLewis at 2012-10-24T00:22:43Z
Distributions
Reverse Dependencies 1 direct, 0 indirect [details]
Executables xml-to-json
Downloads 6383 total (25 in the last 30 days)
Rating (no votes yet) [estimated by Bayesian average]
Status Docs not available [build log]
All reported builds failed as of 2016-12-23 [all 7 reports]

Readme for xml-to-json-0.1.0.0

xml-to-json

Fast & easy command line tool for converting XML files to JSON.

The output is designed to be easy to store and process using JSON-based databases, such as MongoDB and CouchDB. In fact, the original motivation for xml-to-json was to store and query a large (~10GB) XML-based dataset using an off-the-shelf scalable JSON database.

Currently the tool processes XML according to lossy rules designed to produce sensibly minimal output. If you need a conversion that loses no information at all, consider something like the XSLT offered by the jsonml project. Unlike jsonml, xml-to-json produces JSON output similar (but not identical) to that of the xml2json-xslt project.

Implementation Notes

xml-to-json is implemented in Haskell. Currently the implementation is minimal; for example, the core translation functionality is not exported as a library. If you want to use it as a library, open an issue on this project (or better yet, implement it and submit a pull request).

As of this writing, xml-to-json uses hxt with the expat-based hxt-expat parser. The pure Haskell parsers for hxt all seem to have memory issues that hxt-expat does not have.

Usage

Basic usage

Run the tool with the filename as its single argument, and redirect stdout to a file or a pipe:

xml-to-json myfile.xml > myfile.js

Advanced

Use the --help option to see the full command line options.

Here's a (possibly outdated) snapshot of the --help output:

Usage: xml-to-json [OPTION...] files...
  -h      --help          Show this help
  -t TAG  --tag-name=TAG  Start conversion with nodes named TAG (ignoring all parent nodes)
  -s      --skip-roots    Ignore the selected nodes, and start converting from their children
                          (can be combined with the 'start-tag' option to process only children of the matching nodes)
  -m      --multiline     Output each of the top-level converted json objects on a seperate line
  -n      --ignore-nulls  Ignore nulls (do not output them) in the top-level output objects
  -a      --as-array      Output the resulting objects in a top-level JSON array
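
According to the help text above, --multiline places each top-level converted object on its own line, which lets a consumer parse the output one record at a time instead of loading the whole file. The following Python sketch shows one way to read such output. It assumes every non-empty line is a complete JSON value; the function name and the sample records are hypothetical and stand in for real tool output.

```python
import io
import json


def iter_json_lines(stream):
    """Yield one parsed JSON value per non-empty line of `stream`."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical sample standing in for the output of
# `xml-to-json --multiline ...`; these records are invented for illustration.
sample = io.StringIO(
    '{"Name": "The First Test"}\n'
    '{"Name": "Second"}\n'
)
records = list(iter_json_lines(sample))
print(len(records))
```

In practice the stream would be an open file (or a pipe from the tool) rather than an in-memory string, and each record could be passed straight to a database import call.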

Example output

Input file:

<?xml version="1.0"?>
<!DOCTYPE Test>
<Tests>
  <Test Name="The First Test">
    <Description Format="FooFormat">
Just a dummy
<!-- comment -->
Xml file.
    </Description>
  </Test>
  <Test Name="Second"/>
</Tests>

JSON output (formatted for readability; the actual output is a single line):

{
  "Tests" : {
    "Test" : [
      {
        "Name" : "The First Test",
        "Description" : {
          "Format" : "FooFormat",
          "value" : "Just a dummy\n\nXml file."
        }
      },
      { "Name" : "Second" }
    ]
  }
}
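
The mapping in this example can be approximated with a short Python sketch. This is not the tool's Haskell implementation; the rules are inferred from the example alone: attributes become plain keys, repeated child tags collapse into an array, comments are dropped, and remaining text is stored under "value". How text-only and empty elements are handled is an assumption that the example does not confirm, and the function names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

EXAMPLE_XML = """<?xml version="1.0"?>
<!DOCTYPE Test>
<Tests>
  <Test Name="The First Test">
    <Description Format="FooFormat">
Just a dummy
<!-- comment -->
Xml file.
    </Description>
  </Test>
  <Test Name="Second"/>
</Tests>
"""


def convert_element(elem):
    """Convert one element using rules inferred from the example above."""
    obj = dict(elem.attrib)  # attributes become plain keys
    text_parts = [elem.text or ""]
    for child in elem:
        value = convert_element(child)
        if child.tag not in obj:
            obj[child.tag] = value
        elif isinstance(obj[child.tag], list):
            obj[child.tag].append(value)
        else:
            obj[child.tag] = [obj[child.tag], value]  # repeated tag -> array
        text_parts.append(child.tail or "")
    text = "".join(text_parts).strip()
    if not obj:
        # Assumption: text-only elements become strings, empty ones null.
        return text or None
    if text:
        obj["value"] = text
    return obj


def xml_to_obj(xml_text):
    """Parse XML text; ElementTree's default parser discards comments."""
    root = ET.fromstring(xml_text)
    return {root.tag: convert_element(root)}


result = xml_to_obj(EXAMPLE_XML)
print(json.dumps(result))
```

The sketch loads the whole document into memory, so it only illustrates the shape of the output; it is not a substitute for the tool on large inputs.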

Performance

On a Core i5 machine, the tool processes about 1.8 MB of XML per second; a 100 MB XML file results in about 56 MB of JSON output.
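
For rough planning, these figures can be turned into a back-of-the-envelope estimate. The Python sketch below assumes that both run time and output size scale linearly with input size, which the measurements above do not establish; the constant and function names are hypothetical.

```python
THROUGHPUT_MB_PER_S = 1.8   # throughput figure quoted above
OUTPUT_RATIO = 56 / 100     # 56 MB of JSON from 100 MB of XML


def estimate(xml_size_mb):
    """Return (seconds, output_mb), assuming linear scaling with input size."""
    return xml_size_mb / THROUGHPUT_MB_PER_S, xml_size_mb * OUTPUT_RATIO


seconds, output_mb = estimate(100)
print(f"100 MB: about {seconds:.0f} s, about {output_mb:.0f} MB of JSON")
```

Under the same assumption, the 100 MB file takes a little under a minute, and larger inputs scale proportionally.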

A few simple tests have shown this to be at least twice as fast as jsonml's XSLT-based converter (however, the outputs of the two tools differ, as stated above).