What is the advantage of XML?

cianuro · August 8, 2005, 3:08pm

For me, that was the best basic explanation ever. It’s quite difficult to get your head around what XML is initially.

Galo · August 8, 2005, 3:22pm

Isn’t this why they invented Encoding ?

Standard XML works with Unicode, which is a character table that contains at most 65536 characters; for each character there are two bytes reserved. This means that all characters from western, asian, greek and Russian languages can be displayed, it’s just the Encoding format you choose that will decide what characters would be displayed and what not.

Check on http://www.unicode.org/ for more information.

The point is, when I use encoding=“UTF-8”, signs like “Å” and “æ” would indeed cause errors; however, if you use encoding=“ISO-8859-1” you could use them without errors, so it is possible…

edit: “æ” is an ISO-8859-15 character, sorry
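For anyone who wants to try the encoding declaration out, here is a minimal Python sketch (not from the thread) using the standard library's `xml.etree.ElementTree`. The `<name>` element and the `parse_name` helper are made up for illustration. As far as I can tell, the parse errors people run into usually come from a mismatch between the declared encoding and the bytes actually in the file, not from UTF-8 as such:

```python
import xml.etree.ElementTree as ET


def parse_name(doc: bytes):
    """Return the root element's text, or None if the bytes are not well-formed XML."""
    try:
        return ET.fromstring(doc).text
    except ET.ParseError:
        return None


# The same character data, written out in two different byte encodings,
# each with a declaration that matches the bytes.
latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><name>Åse</name>'.encode("iso-8859-1")
utf8_doc = '<?xml version="1.0" encoding="UTF-8"?><name>Åse</name>'.encode("utf-8")

# Here the declaration says UTF-8, but the bytes are really ISO-8859-1.
mismatched_doc = '<?xml version="1.0" encoding="UTF-8"?><name>Åse</name>'.encode("iso-8859-1")

for label, doc in [("latin1", latin1_doc), ("utf8", utf8_doc), ("mismatch", mismatched_doc)]:
    print(label, parse_name(doc))
```

The first two documents parse to the same text; only the mismatched one is rejected.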

Galo · August 8, 2005, 3:24pm

Don’t forget the structure that is holding the data. It’s not about the actual concrete data; data often comes from directives that refer to a piece of data which eventually comes from a database. The structure XML presents is what is relevant; the data can come from anywhere, even a simple text file.

I tend to see XML as an IL (intermediate language), so different systems can talk the same small talk.

timvw · August 8, 2005, 3:38pm

Actually, this is 1 - n. Each a-node can have multiple b-nodes.

So basically, you’re moving away from a database to some immature storage system… In that case I would advise using a DBMS to handle the data, and simply writing a tool that can import/export XML from it.

R_U_Serious · August 8, 2005, 4:00pm

No, the Unicode standard defines slightly less than 100.000 characters at the moment. And even the older Unicode 3.1 (?) which IIRC is a requirement for XML covers more than 90k already.

for each character there are two bytes reserved

This is UTF-16 you are talking about, which is one possible way of encoding Unicode characters (and even UTF-16 may use more than 2 bytes depending on which characters it is encoding). XML processors must also understand UTF-8, which is a variable-length encoding (usually 1-4 bytes).
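To make the byte counts concrete, here is a small Python sketch (not from the thread). The four sample characters are arbitrary picks: one ASCII letter, one Latin-1 letter, the euro sign, and one character outside the Basic Multilingual Plane. The `encoded_lengths` helper is just an illustrative name:

```python
def encoded_lengths(ch: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for a single character."""
    # "utf-16-be" is used so that no byte-order mark is added to the count.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


# A, e-acute, euro sign, musical G clef (outside the BMP)
for ch in ["A", "\u00e9", "\u20ac", "\U0001d11e"]:
    utf8_len, utf16_len = encoded_lengths(ch)
    print(f"U+{ord(ch):04X}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```

So UTF-8 ranges from 1 to 4 bytes for these samples, while UTF-16 uses 2 bytes for most characters and 4 for the one that needs a surrogate pair.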

this means that all characters from western, asian, greek and Russian languages can be displayed, it’s just the Encoding format you choose that will decide what characters would be displayed and what not

That’s the way it was before unicode came along.

But with unicode you don’t have to choose, one of the advantages of unicode is that you can display all languages on the “same page”.

But to get back on topic: Yes, it’s a Good Thing that XML supports unicode. And the option to add this metadata inside the file has probably saved some people a headache already.

Nick · August 8, 2005, 4:05pm

So I’ve read this thread, and still don’t exactly understand: what are the advantages of storing data in an XML file rather than a database? Is it because an XML file is more easily accessible to other programming languages?

timvw · August 8, 2005, 4:08pm

My understanding is that XML is a simple text format.

It appears XML is usable as a text container (human-readable data) because tools for generating and parsing XML are widely available…

As soon as you want to use it as a binary data container you will meet its limits. You would have to encode the data (e.g. base64), which makes it not very efficient.
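A rough Python sketch of that overhead, using only the standard library. The `<attachment>` element name and its `encoding` attribute are invented for the example, and the payload is just arbitrary bytes:

```python
import base64
import xml.etree.ElementTree as ET

# 768 bytes of arbitrary binary data (hypothetical payload).
payload = bytes(range(256)) * 3

# Binary data cannot go into XML as-is, so it is base64-encoded into text first.
attachment = ET.Element("attachment", {"encoding": "base64"})
attachment.text = base64.b64encode(payload).decode("ascii")
xml_bytes = ET.tostring(attachment)

# Round trip: parse the XML again and decode the text back into bytes.
restored = base64.b64decode(ET.fromstring(xml_bytes).text)

print("raw bytes:   ", len(payload))
print("base64 chars:", len(attachment.text))
print("round trip ok:", restored == payload)
```

Base64 turns every 3 bytes into 4 characters, so the encoded form is about a third larger than the raw data, before counting the surrounding markup.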

Imho, this limitation makes it hard to use XML as a database. Another important issue is that it will take a (serious) while before XML can compete with the current DBMS products out there (efficiency/speed/query optimization)

XML can be very useful as a container for textual data coming from/going to a database (or any other text processing program). XSL comes in handy if you want to transform the data into something else. XQuery allows you to perform queries on XML. This may have its uses, but usually you would already select only the data you need (e.g. with an SQL WHERE clause) before you generate XML from the result set.
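A minimal Python sketch of that last point, filtering in SQL first and only then generating XML. The `customer` table, its columns, and the sample rows are all made up for illustration; it uses an in-memory SQLite database so it runs on its own:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema and data, kept in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customer (name, country) VALUES (?, ?)",
    [("Anna", "BE"), ("Bob", "US"), ("Chris", "BE")],
)


def customers_as_xml(country: str) -> ET.Element:
    """Let the database do the filtering, then wrap the result set in XML."""
    root = ET.Element("customers")
    rows = conn.execute(
        "SELECT id, name FROM customer WHERE country = ? ORDER BY id", (country,)
    )
    for row_id, name in rows:
        node = ET.SubElement(root, "customer", {"id": str(row_id)})
        node.text = name
    return root


print(ET.tostring(customers_as_xml("BE"), encoding="unicode"))
```

The XML only ever contains the rows the WHERE clause selected, so there is nothing left to query on the XML side.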

ronanmagee · August 8, 2005, 4:43pm

Although this is not a thread for deciding what xml is/is not, in my mind a simple text format is more akin to a csv file. With xml you can have complicated structures for data but more importantly you have data that describes data, something that xml was first introduced for.

As for encoding, DBs require this as well, especially when storing images. Although they currently do this in a much more efficient manner than XML, they still suffer from similar drawbacks.

What advantages do I see in XML:
It can be deployed on a much simpler architecture and doesn’t need any updating or patching, unlike an RDBMS

Ability to transfer data between systems, mainly web based - RPC, eBooks with ability to publish to pdf etc.

Validation - DTD/XSD

Disadvantages:
Lacks the more advanced features a DB has, i.e. the SQL language and its ease of use, backup options, referential integrity

Not suitable for really large projects - it mightn’t be the best idea to dump 100,000 customer records into an XML file and run XPath.

REMIYA · August 8, 2005, 4:56pm

No arguments on that

Galo · August 8, 2005, 4:56pm

R_U_Serious:

No, the Unicode standard defines slightly less than 100.000 characters at the moment. And even the older Unicode 3.1 (?) which IIRC is a requirement for XML covers more than 90k already.

This is UTF-16 you are talking about, which is one possible way of encoding Unicode characters (and even UTF-16 may use more than 2 bytes depending on which characters it is encoding). XML processors must also understand UTF-8, which is a variable-length encoding (usually 1-4 bytes).

That’s the way it was before unicode came along.

But with unicode you don’t have to choose, one of the advantages of unicode is that you can display all languages on the “same page”.

But to get back on topic: Yes, it’s a Good Thing that XML supports unicode. And the option to add this metadata inside the file has probably saved some people a headache already.

100.000 or 65.536… alright, you win
Anyway, my conclusion is that you DO have to use an encoding to display specific characters, and yes it CAN use more than 2 bytes but never less.
But that’s not the topic indeed :D, I say go for XML, XML is great

timvw · August 8, 2005, 5:16pm

http://www.w3.org/XML/. From the Introduction:

Extensible Markup Language (XML) is a simple, very flexible text format derived from SGML (ISO 8879). Originally designed to meet the challenges of large-scale electronic publishing, XML is also playing an increasingly important role in the exchange of a wide variety of data on the Web and elsewhere.

You will have to update/patch your tools to generate/parse XML too. An advantage you might experience: when you change your tools, you can easily convert your internal (== XML) format to a different format (using XSL). For a regular DBMS this might be harder…

It can be a pita to agree on a good DTD/XSD for your XML data. Just look at the formats for syndicating a blog, and the pile of RSS/RDF/Atom variants.

momos · August 8, 2005, 6:49pm

Actually, this is 1 - n. Each a-node can have multiple b-nodes.

<a>
<name>fruit</name>
<b>apple</b>
<b>tomato</b>
</a>
<a>
<name>vegetables</name>
<b>tomato</b>
</a>

fruit has multiple items, but tomato is a fruit and a vegetable (True, there is replication, but it is many-to-many)

timvw · August 8, 2005, 8:48pm

But there still aren’t b-nodes that hold multiple a-nodes…

Anyway, as I’ve been thinking about it: assuming there is an n-m relationship between a-nodes and b-nodes, you can represent it by introducing a set of ab-nodes. But an XML schema (DTD/XSD) alone doesn’t allow you to specify the constraints.
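A minimal Python sketch of the ab-node idea, assuming a made-up layout: a `<catalog>` wrapper, `id` attributes on the a- and b-nodes, and `<ab>` link elements with `a` and `b` attributes. None of these names come from the thread. The integrity check at the end is done in application code here, which is one way to read the point about the schema not carrying the constraints for you:

```python
import xml.etree.ElementTree as ET

# Hypothetical document: each b-node is stored once, links live in ab-nodes.
doc = """
<catalog>
  <a id="fruit"/>
  <a id="vegetables"/>
  <b id="apple"/>
  <b id="tomato"/>
  <ab a="fruit" b="apple"/>
  <ab a="fruit" b="tomato"/>
  <ab a="vegetables" b="tomato"/>
</catalog>
"""
root = ET.fromstring(doc)


def b_nodes_of(a_id: str) -> list[str]:
    """All b-node ids linked to the given a-node."""
    return sorted(link.get("b") for link in root.findall("ab") if link.get("a") == a_id)


def a_nodes_of(b_id: str) -> list[str]:
    """All a-node ids linked to the given b-node."""
    return sorted(link.get("a") for link in root.findall("ab") if link.get("b") == b_id)


def dangling_links() -> list[tuple[str, str]]:
    """Links that point at an a-node or b-node that does not exist."""
    a_ids = {a.get("id") for a in root.findall("a")}
    b_ids = {b.get("id") for b in root.findall("b")}
    return [
        (link.get("a"), link.get("b"))
        for link in root.findall("ab")
        if link.get("a") not in a_ids or link.get("b") not in b_ids
    ]


print(b_nodes_of("fruit"))
print(a_nodes_of("tomato"))
print(dangling_links())
```

This is essentially a join table written as XML: tomato appears once as a b-node and twice as a link.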

hgilbert · August 8, 2005, 10:10pm

I am not an advocate for XML.
But if all else fails, I try XML/XSLT.

Often I find it can help reduce complexity.
You separate data (XML) from presentation (XSLT).
The output is then XHTML.

I should have an example very soon.

I was struggling with classical ASP because the business logic was way too complex.
.NET Webforms are not very accessible.
So I found it best to generate XML and transform it via XSLT.
Then Response.Write the resulting xhtml.

It’s just another tool not a buzzword

wwb_99 · August 8, 2005, 11:26pm

As for the handling and storage of data within your own application, there are a few (but very few) reasons in my experience to employ XML to retrieve data and move it around. That is the job of your database. Someone mentioned (can’t remember if it was this thread or another)… the benefit of simply moving XML files should your app move to another platform. Well, if you have your database access layer properly abstracted from your business logic… this is pretty much a non-issue.

I would argue the opposite. At least with languages outside of PHP (eg things with application servers and memories longer than one request).

First, you have to store your database configuration somewhere outside of the database. XML is as good an option as any, especially given the plethora of easy options to serialize and deserialize it from and into objects. All your DBAL has to do is go look for ./config/db.config and it is a very real possibility to have a self-configuring database. That is but the tip of the iceberg. It is usually much easier, especially insofar as deployment goes, to edit text files to get something flying rather than have to get the application to allow you to log in to the content manager before configuring things.
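A minimal Python sketch of deserializing such a config into an object. The element names (`host`, `port`, `name`, `user`) and the `DbConfig` class are assumptions for illustration, since the thread does not say what ./config/db.config contains. The XML is inlined as a string here so the example runs without a file:

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

# Hypothetical contents of a file such as ./config/db.config.
CONFIG_XML = """
<database>
  <host>localhost</host>
  <port>5432</port>
  <name>cms</name>
  <user>app</user>
</database>
"""


@dataclass
class DbConfig:
    host: str
    port: int
    name: str
    user: str


def load_db_config(xml_text: str) -> DbConfig:
    """Deserialize the XML into a typed config object."""
    root = ET.fromstring(xml_text)
    return DbConfig(
        host=root.findtext("host"),
        port=int(root.findtext("port")),
        name=root.findtext("name"),
        user=root.findtext("user"),
    )


config = load_db_config(CONFIG_XML)
print(config)
```

In a real application the text would be read from the config file at startup, and the database access layer would be handed the resulting object.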

That said, I do agree that using XML as the persistence medium for most projects is a waste. If it makes sense anywhere, it is in the single-user desktop application space, where it eases deployment significantly, as one need not install a database engine with the application. Nor need one write their own, as most DOM-based parsing libraries are more capable than any DBMS you can cook up in a reasonable amount of time. The main issues are concurrency issues, which single-user desktop applications do not suffer from nearly as badly as web applications.

The place where Xml shines in the CMS space is communications between disparate systems and creating generic inter-process messaging. One project I am working on involves re-architecting a half-dozen web properties, none with remotely similar back-ends, to share much of the same data. Rather than ripping out all of them and completely replacing the back-ends, we are setting up some XML/Webservice based messaging systems to make the applications talk in real time.

momos · August 9, 2005, 6:45am

This feels like bad will…

<a id="fruit">
<b>apple</b>
<b>tomato</b>
</a>
<a id="vegetables">
<b>tomato</b>
</a>

The b-node tomato has an a-node with id="fruit" and one with id="vegetables".

timvw · August 9, 2005, 10:52am

No bad will intended, I just didn’t find it that easy for a human to interpret.

I admit an a-node can hold multiple b-nodes and b-nodes can have multiple a-nodes in your example. But do you see how you have copied the b-node into each a-node? Right now it may not seem that important, but imagine each b-node also had multiple c-nodes. You would have to copy those into each a-node that has the b-node too… It is because of that sort of redundancy that we throw out the hierarchical model and choose a relational one.

momos · August 9, 2005, 12:02pm

But a data model with a lot of many-to-many relationships is not really a simple model. On the other hand, I don’t know if you ever wrote some .NET code: their datasets are all XML, and can contain many-to-many relationships.

bonefry · August 9, 2005, 12:09pm

Haven’t you heard of references ?


<a id="vegetables">
<b>tomato</b>
</a>
<a id="fruit" inherits="vegetables">
<b>apple</b>
</a>

and with the following DTD:


<!ELEMENT a (b+)>
<!ATTLIST a id ID #REQUIRED
                  inherits IDREF #IMPLIED>
<!ELEMENT b (#PCDATA)>
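A DTD-validating parser would check that every IDREF points at an existing ID, but what `inherits` means is still up to the application. Here is a minimal Python sketch of resolving it, under two assumptions: the `<root>` wrapper is added so the two a-nodes form one well-formed document, and the `items` helper is a made-up name. Note that `xml.etree.ElementTree` does not validate against the DTD, so the lookup is done by hand:

```python
import xml.etree.ElementTree as ET

# The two a-nodes from above, wrapped in a hypothetical <root> element.
doc = """
<root>
  <a id="vegetables">
    <b>tomato</b>
  </a>
  <a id="fruit" inherits="vegetables">
    <b>apple</b>
  </a>
</root>
"""
root = ET.fromstring(doc)

# Index the a-nodes by their id attribute so references can be followed.
by_id = {a.get("id"): a for a in root.findall("a")}


def items(a_id: str, seen=None) -> list[str]:
    """Own b-nodes first, then those of the a-node referenced via 'inherits'."""
    seen = set() if seen is None else seen
    if a_id in seen:  # guard against circular references
        return []
    seen.add(a_id)
    node = by_id[a_id]
    own = [b.text for b in node.findall("b")]
    parent = node.get("inherits")
    return own + (items(parent, seen) if parent else [])


print(items("fruit"))
print(items("vegetables"))
```

Tomato is stored once, and the fruit node picks it up through the reference instead of a copy.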

Nate · August 9, 2005, 12:11pm

This is a concept I’ve been working with for the past couple months.

I use PHP to generate all my content (even content out of a DB!), but instead of printing out XHTML with my dynamic content littered throughout it, I generate an XML document of all my dynamic content, then apply an XSL template that incorporates that same static XHTML. It’s quite powerful and remarkably elegant, if I may say so myself.

I’ve created a class that will take an array (created with basically the same hierarchy that I want my XML to have) and, at its most simple form, turn it into <key>value</key>. Obviously it is significantly more capable than that, but that’s the basic idea.
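The class itself is not shown in the thread, so the following is only a rough Python sketch of the basic `<key>value</key>` idea, not the actual implementation. The `dict_to_xml` function, the `item` tag used for list entries, and the sample `page` data are all assumptions for illustration:

```python
import xml.etree.ElementTree as ET


def dict_to_xml(tag: str, value) -> ET.Element:
    """Recursively turn nested dicts/lists/scalars into <key>value</key> elements."""
    node = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            node.append(dict_to_xml(key, child))
    elif isinstance(value, list):
        # Lists have no keys, so each entry gets a generic (hypothetical) tag.
        for entry in value:
            node.append(dict_to_xml("item", entry))
    else:
        node.text = str(value)
    return node


# Hypothetical page data with the same hierarchy the XML should have.
page = {"title": "Hello", "author": {"name": "Nate"}, "tags": ["xml", "xsl"]}
xml_text = ET.tostring(dict_to_xml("page", page), encoding="unicode")
print(xml_text)
```

One caveat with this sketch: the keys are used as tag names without any checking, so they have to be valid XML names already.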

I’d be happy to give a better demonstration/explanation of how I’ve implemented this, even release my class, if there is sufficient interest. I’d like to be able to show it somewhere off the forums, but I don’t have a suitable host… so if anyone can help there (either by hosting or knowing of somewhere good/free) please send me a PM