xmlgen

Commits

List of commits on branch main.
Verified
65d86d717c107f511e5950ac0a51ec9a441015d5

Update README

eliben committed 3 years ago
Unverified
bb052026abfb0ea11f906222842da6c6362b1247

Update README

eliben committed 6 years ago
Unverified
5cfb11027bb12286de7ae2f48dbe6457a49a6d5e

Add paper + README explanation

eliben committed 6 years ago
Unverified
d652c5b171b8af512bb446469bc1a4b1860b2384

Move files

eliben committed 6 years ago
Unverified
aa0545d6f1981398bb99ed09ee0e9ec8f32cd46a

Import

eliben committed 6 years ago

README

The README file for this repository.

I originally downloaded this code from http://www.xml-benchmark.org a few years ago. That site is no longer active, so I've posted the code here, as is. It's copyrighted (C) by Florian Waas. See the original contents of the README below the build instructions.

The tool is described in a paper from 2002, a copy of which is also in this repo.

Update from March 2022: a kind reader informed me that the source for xmlgen can also be obtained from https://projects.cwi.nl/xmark/generator.html


To build, run:

./build.sh

After that, run './xmlgen'. Approximate output sizes:

./xmlgen -f 1 produces ~116 MiB
./xmlgen -f 0.5 produces 58 MiB
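The two sizes quoted above are consistent with output size growing linearly with the scaling factor. As a rough planning aid, here is a minimal Python sketch built on that assumption. The constant is taken from the '-f 1' figure; the linearity itself and the helper names estimate_size_mib and factor_for_size are assumptions made for illustration and are not part of xmlgen.

```python
# Rough size estimate for xmlgen output, ASSUMING size scales linearly with -f.
# The 116 MiB-per-unit figure comes from the "-f 1" example in the README;
# the linear model and both helper names are hypothetical.
MIB_PER_UNIT_FACTOR = 116.0


def estimate_size_mib(factor: float) -> float:
    """Approximate output size in MiB for a given scaling factor."""
    if factor < 0:
        raise ValueError("scaling factor must be non-negative")
    return MIB_PER_UNIT_FACTOR * factor


def factor_for_size(target_mib: float) -> float:
    """Scaling factor expected to yield roughly target_mib of output."""
    if target_mib < 0:
        raise ValueError("target size must be non-negative")
    return target_mib / MIB_PER_UNIT_FACTOR


if __name__ == "__main__":
    for f in (0.1, 0.5, 1, 10):
        print(f"-f {f}: about {estimate_size_mib(f):.0f} MiB")
    print(f"for roughly 1 GiB, try -f {factor_for_size(1024):.2f}")
```

The estimate is least reliable for very small factors, since '-f 0' still produces a minimal document rather than an empty file.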


xmlgen, version 0.92 by Florian Waas (flw@mx4.org) Copyright (C) Florian Waas

  1. What is xmlgen?

xmlgen is an XML data generator that produces scaled documents according to the DTD specified in The XML Benchmark Project. xmlgen is part of the benchmark suite and can be found at http://www.xml-benchmark.org. Maximum portability has been one of the major design goals, and to date xmlgen has been used on a number of platforms including Windows, Solaris, various Linux distributions, and IRIX. xmlgen was designed to produce large and very large XML documents efficiently, with low, constant main-memory requirements.

  2. How to use xmlgen?

xmlgen comes with a number of options to influence the output behavior:

-f scaling factor of the document, float value; 0 produces the "minimal document"

-o direct output to file

-h show usage info

-d use doctype preamble

-i renders the document somewhat more readable

-v shows current version, intended for bug reporting

-t display elapsed time, meant for profiling

-s split the doc into smaller chunks of only a given number of elements each; useful for systems which cannot cope with large input documents

-e dumps the DTD the doc is complying with (version 0.92 and later)
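To show how these options combine in practice, here is a small Python sketch that assembles an xmlgen command line and optionally runs it. The helper names build_xmlgen_command and run_xmlgen are hypothetical, and the sketch assumes that '-o' takes the file name as the next argument, in the same way the '-f 1' example passes its value; check './xmlgen -h' before relying on it.

```python
import subprocess
from typing import List, Optional


def build_xmlgen_command(
    factor: float,
    output: Optional[str] = None,
    indent: bool = False,
    doctype: bool = False,
    timing: bool = False,
    binary: str = "./xmlgen",
) -> List[str]:
    """Assemble an argument list from the options listed in the README."""
    if factor < 0:
        raise ValueError("scaling factor must be non-negative")
    cmd = [binary, "-f", str(factor)]
    if doctype:
        cmd.append("-d")  # use doctype preamble
    if indent:
        cmd.append("-i")  # more readable output
    if timing:
        cmd.append("-t")  # display elapsed time
    if output is not None:
        # ASSUMPTION: -o expects the file name as a separate argument.
        cmd += ["-o", output]
    return cmd


def run_xmlgen(**options) -> None:
    """Run xmlgen; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_xmlgen_command(**options), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this works without the binary.
    print(" ".join(build_xmlgen_command(0.5, output="out.xml", indent=True)))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with unusual output file names.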

  3. Why is there no noise in the text?

Well, it's Shakespeare. In fact, the text has very little noise, and many text indexing programs seem to be a little baffled by that. We plan to change this in a future release, together with a more contemporary vocabulary. Also, Shakespeare is not quite politically correct.

  4. Can I control the level of recursion?

No, we purposely reduced the number of tuning parameters to a single one: the scaling factor. Otherwise, the space of possible combinations grows too quickly.