©Conrad Weisert, Information Disciplines, Inc.,
14 March 2006
"Despite its document-centric roots, XML is evolving to the data format of
choice for the transmission and storage of information on the Internet."
- Harold & Means: XML in a Nutshell, O'Reilly, ISBN 0-596-00058-8, p. 225.
The so-called "extensible markup language" (XML) occupies a central position in new computer applications, especially web-based systems. Applicants for developer roles are expected to command XML and its related tools and auxiliary languages, and the use of XML is often taken for granted for many kinds of data.
Early evidence raises concern that:
It's too soon to judge whether those shortcomings are minor annoyances or serious obstacles. I'd like to share some of my concerns and solicit your feedback. Here are the things that so far make me uneasy about XML:
Even if the record-description notations were compact, repeating them in every record would be extremely wasteful for a file of identically structured records. Documents and databases are not at all the same thing.
<location>Central Park</location>
<time>08:10</time>
<temperature>31.0</temperature>
<humidity>44%</humidity>
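To put rough numbers on that repetition, the Python sketch below serializes the reading above as one record and compares three sizes for a hypothetical file of 1,000 such readings: the XML, the values alone, and a layout that states the field names once in a header line. The wrapping `<reading>` element, the record count, and the comma-separated alternative are assumptions for illustration. The values are still counted as text, so a binary layout would shrink them further.

```python
# Sketch: how much of a file of identically structured XML records is markup?
# Assumptions (not from the article): the <reading> wrapper element, the
# record count, and the header-once comma-separated layout used for contrast.

FIELDS = [
    ("location", "Central Park"),
    ("time", "08:10"),
    ("temperature", "31.0"),
    ("humidity", "44%"),
]
N_RECORDS = 1000  # hypothetical file size


def xml_record(fields):
    """One XML record: every field name is written twice, in every record."""
    body = "".join(f"<{name}>{value}</{name}>" for name, value in fields)
    return f"<reading>{body}</reading>"


def values_only(fields):
    """Bytes taken by the values themselves, still encoded as text."""
    return sum(len(value.encode("utf-8")) for _, value in fields)


def header_once_file(fields, n):
    """Field names appear once in a header line; each row holds only values."""
    header = ",".join(name for name, _ in fields) + "\n"
    row = ",".join(value for _, value in fields) + "\n"
    return header + row * n


xml_size = len(xml_record(FIELDS).encode("utf-8")) * N_RECORDS
value_size = values_only(FIELDS) * N_RECORDS
header_size = len(header_once_file(FIELDS, N_RECORDS).encode("utf-8"))

print(f"XML records:  {xml_size} bytes")
print(f"values alone: {value_size} bytes")
print(f"header once:  {header_size} bytes")
```

Under these assumptions the XML version comes to 125,000 bytes, against 24,000 bytes of values and 28,035 bytes when the field names are stated only once.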
This is legal:
<a> <b> <c>
data
</c></b></a>
but this isn't:
<a> <b> <c>
data
</a></b></c>
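Python's standard-library parser enforces this nesting rule. A minimal check, using `xml.etree.ElementTree.fromstring`, accepts the first fragment and rejects the second with a `ParseError`; `is_well_formed` is a hypothetical helper name chosen for this sketch.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the standard-library XML parser accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:  # raised for mismatched or unclosed tags
        return False
    return True


LEGAL = "<a> <b> <c>\ndata\n</c></b></a>"
ILLEGAL = "<a> <b> <c>\ndata\n</a></b></c>"

print(is_well_formed(LEGAL))    # True
print(is_well_formed(ILLEGAL))  # False
```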
A closing tag could simply be </> instead of
</ProductName>.
The name in the closing tag serves no purpose at all. (In fact, for further economy,
a named closing tag could be used to
denote multiple closure like this:
<a><b><c> data </a>.)
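The shorthand proposed above is not part of XML. The sketch below, built around a hypothetical `expand_shorthand` function, rewrites it into standard XML to illustrate why the names add no information: a parser keeps a stack of open elements, so an anonymous `</>` always closes the element on top, and a named closing tag can close several levels at once. It handles only bare start tags and character data (no attributes, comments, or self-closing tags), and the `Widget` value is made up for the example.

```python
import re

# A tag is "<name>", "</name>", or the proposed anonymous "</>".
# Attributes, comments, and self-closing tags are deliberately not handled.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)?>")


def expand_shorthand(text: str) -> str:
    """Rewrite the hypothetical shorthand into standard XML.

    "</>"     closes the innermost open element.
    "</name>" closes every open element down to and including "name".
    """
    out = []
    open_elements = []  # stack of element names, innermost last
    pos = 0
    for match in TAG.finditer(text):
        out.append(text[pos:match.start()])  # copy character data unchanged
        pos = match.end()
        is_close, name = match.group(1), match.group(2)
        if not is_close:
            if name is None:
                raise ValueError("empty start tag <>")
            open_elements.append(name)
            out.append(match.group(0))
        elif name is None:
            # Anonymous close: the stack already knows which element it is.
            if not open_elements:
                raise ValueError("</> with no open element")
            out.append(f"</{open_elements.pop()}>")
        else:
            # Named close: close the inner elements first, then the named one.
            if name not in open_elements:
                raise ValueError(f"</{name}> matches no open element")
            while True:
                inner = open_elements.pop()
                out.append(f"</{inner}>")
                if inner == name:
                    break
    out.append(text[pos:])
    if open_elements:
        raise ValueError(f"unclosed elements: {open_elements}")
    return "".join(out)


print(expand_shorthand("<ProductName>Widget</>"))  # <ProductName>Widget</ProductName>
print(expand_shorthand("<a><b><c> data </a>"))     # <a><b><c> data </c></b></a>
```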
The combination of repeated record descriptions, clumsy character representation, and
redundant closing tags can easily yield an order of magnitude more
bytes than the data require. For example, we'd expect the value of a temperature reading
to fit in 4 bytes, while
"<temperature>31.0</temperature>"
consumes 31 bytes![1]
That's sure to inflict serious performance penalties,
especially when the data are transmitted over the Internet or other telecommunication facilities.
(And that doesn't even count the overhead of scanning and decoding a numeric data item
before a program can do arithmetic on it.) Storage and transmission capacity have
become orders of magnitude less expensive than they used to be, but they're still far
from free or infinite. That suggests that XML is unsuitable for high-volume
applications.
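Both costs can be seen side by side in a small Python sketch: the encoded size of the temperature item as XML text versus a 4-byte IEEE 754 single-precision value packed with `struct`, and the work needed to turn each back into a number a program can do arithmetic on. The binary layout and the helper names are assumptions for illustration; timings depend on the machine, so none are quoted here.

```python
import struct
import timeit
import xml.etree.ElementTree as ET
from functools import partial

# The data item from the text, as XML and as a hypothetical binary field
# (IEEE 754 single precision, little-endian).
XML_ITEM = b"<temperature>31.0</temperature>"
BINARY_ITEM = struct.pack("<f", 31.0)


def decode_xml(raw: bytes) -> float:
    # Scan the markup, build an element, then convert its text to a number.
    return float(ET.fromstring(raw).text)


def decode_binary(raw: bytes) -> float:
    # Reinterpret the four bytes directly as a float.
    return struct.unpack("<f", raw)[0]


print("XML bytes:   ", len(XML_ITEM))     # 31
print("binary bytes:", len(BINARY_ITEM))  # 4

for label, decode, raw in [
    ("XML", decode_xml, XML_ITEM),
    ("binary", decode_binary, BINARY_ITEM),
]:
    seconds = timeit.timeit(partial(decode, raw), number=10_000)
    print(f"{label:>6} decode: {seconds:.4f} s for 10,000 items")
```

Both paths yield the same value, 31.0, which a 32-bit float represents exactly; the difference lies in how many bytes must be moved and how much scanning precedes the arithmetic.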
A 40-year-old principle of formal language design[2] asserts that:
A language that disregards this long-accepted principle is unnecessarily crude and user-unfriendly (and hardly extensible).
XML is an evolving standard, and I may have overlooked features that are newer than
my reference books. Let me know
(cweisert@acm.org):
2 -- I first heard this from Alan Perlis.
Last modified March 16, 2006