Mike's demonstration of XML to HTML conversion and what XML is to the ordinary user.

Only really applicable to IE5+ users, that's cos it won't work in Netscape!!

OTHELLO in XML

With IE5 came the capability of viewing XML documents across the web.

What is an XML document?

Well, it's much like an HTML document except you are not restricted to the use of particular elements.

Explain

Ok, you know in HTML you are limited to the elements described in the HTML DTD (depending on which one you use). Well, using XML you are not: you can use whatever elements you like. Simple? Well yes, but there are rules.

Let's take a Shakespeare play, Othello for instance. Most folk have seen a play written down, and it would be relatively easy to write it in HTML. Yep, and here it is:

OTHELLO in HTML

Looks real nice.
OK, but look at it, view the source document. Pretty meaningless, huh? It's full of <DIV>s, <P>s and suchlike, and it doesn't look much like a play to me.
A play has Acts and Scenes and stuff. Wouldn't it be nice to be able to structure a document so that it MEANS something? Well, this is what XML does. It will allow you to tag an Act as <ACT>The act</ACT>, like this:

OTHELLO in raw XML

Brilliant, now it means something but there must be some sort of order to it all. Oh yes, you can't just go willy nilly writing documents with tags all over the place. Well you can but it'll be a bit useless.

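Ok, so what can we do? Well, you write a 'structured document' (the XML term is 'well-formed'). This means that for every <ACT> opening tag there must be a corresponding </ACT> closing tag, and that elements are closed in the reverse of the order they were opened. Isn't that the same as in HTML? Nearly, but HTML lets you leave out some closing tags (</P>, for example) and XML never does.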
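
A quick sketch of the matching-tags rule, using made-up content. It only shows how tags open and close; it makes no attempt to follow the play rules that come later.

```xml
<?xml version="1.0"?>
<!-- Every element that is opened is closed again, innermost first -->
<ACT>
  <SCENE>
    <LINE>A line of the play</LINE>
  </SCENE>
</ACT>

<!-- An XML parser would reject the overlapping version below, because
     ACT is closed while SCENE is still open:
       <ACT><SCENE>A line of the play</ACT></SCENE>
-->
```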

The biggest bonus is that you make the rules for your documents. These rules can say, for instance, that a <PLAY> element MUST contain one or more <ACT> elements and each <ACT> element MUST contain one or more <SCENE> elements. Great, this means I can make documents that have structure.

Yes, and not only that: when you make these rules you can force the author of a document to play by them too, because a validating parser will reject any document that breaks them. This means that he can't have an <ACT> that has no <SCENE>s in it.

Ok, so how do we do that then? Well, we use a DTD. That's a Document Type Definition, which sets out the rules for a particular type of document. We have a PLAY document in mind, so let's have some rules:
____________________________________________________________________________

<!-- DTD for Shakespeare    J. Bosak    1994.03.01, 1997.01.02 -->
<!-- Revised for case sensitivity 1997.09.10 -->
<!-- Revised for XML 1.0 conformity 1998.01.27 (thanks to Eve Maler) -->
<!-- modified slightly by Mike Howles 04.03.2001 -->

<!ELEMENT PLAY     (TITLE, AUTHOR, FM, PERSONAE, SCNDESCR, PLAYSUBT, 
				             PROLOGUE?, ACT+, EPILOGUE?)>
<!ELEMENT TITLE    (#PCDATA)>
<!ELEMENT AUTHOR    (#PCDATA)>
<!ELEMENT FM       (P+)>
<!ELEMENT P        (#PCDATA)>
<!ELEMENT PERSONAE (TITLE, (PERSONA | PGROUP)+)>
<!ELEMENT PGROUP   (PERSONA+, GRPDESCR)>
<!ELEMENT PERSONA  (#PCDATA)>
<!ELEMENT GRPDESCR (#PCDATA)>
<!ELEMENT SCNDESCR (#PCDATA)>
<!ELEMENT PLAYSUBT (#PCDATA)>
<!ELEMENT ACT      (TITLE, SUBTITLE*, PROLOGUE?, SCENE+, EPILOGUE?)>
<!ATTLIST ACT
		NAME ID #REQUIRED>
<!ELEMENT SCENE    (TITLE, SUBTITLE*, (SPEECH | STAGEDIR | SUBHEAD)+)>
<!ATTLIST SCENE
		NAME ID #REQUIRED>
<!ELEMENT PROLOGUE (TITLE, SUBTITLE*, (STAGEDIR | SPEECH)+)>
<!ELEMENT EPILOGUE (TITLE, SUBTITLE*, (STAGEDIR | SPEECH)+)>
<!ELEMENT SPEECH   (SPEAKER+, (LINE | STAGEDIR | SUBHEAD)+)>
<!ELEMENT SPEAKER  (#PCDATA)>
<!ELEMENT LINE     (#PCDATA | STAGEDIR)*>
<!ELEMENT STAGEDIR (#PCDATA)>
<!ELEMENT SUBTITLE (#PCDATA)>
<!ELEMENT SUBHEAD  (#PCDATA)>
____________________________________________________________________________
The Shakespeare play DTD

What's all that mean then? Well, let's take the first 'rule':
<!ELEMENT PLAY (TITLE, AUTHOR, FM, PERSONAE, SCNDESCR, PLAYSUBT, PROLOGUE?, ACT+, EPILOGUE?)>

This says that the element <PLAY> MUST contain, in this order, 1 (and 1 only) <TITLE> element followed by 1 (and 1 only) <AUTHOR> element followed by 1 (and 1 only) <FM> element followed by 1 (and 1 only) <PERSONAE> element followed by 1 (and 1 only) <SCNDESCR> element followed by 1 (and 1 only) <PLAYSUBT> element followed by an OPTIONAL <PROLOGUE> element (none or 1) followed by AT LEAST 1 and possibly more <ACT> elements followed by an OPTIONAL <EPILOGUE> element (none or 1).
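
The other rules read the same way. As a rough guide, here are two more of them copied from the DTD, with comments added by way of explanation (the comments are not part of the original DTD):

```xml
<!-- Occurrence indicators used in the DTD:
       (nothing)  exactly one
       ?          none or one (optional)
       +          one or more
       *          none or more
     Separators:
       ,          the items must appear in this order
       |          pick one of the alternatives -->

<!-- An ACT holds one TITLE, any number of SUBTITLEs, an optional PROLOGUE,
     at least one SCENE and an optional EPILOGUE, in that order. -->
<!ELEMENT ACT      (TITLE, SUBTITLE*, PROLOGUE?, SCENE+, EPILOGUE?)>

<!-- After its TITLE, PERSONAE holds one or more entries, and each entry
     is either a PERSONA or a PGROUP, so the two kinds can be mixed. -->
<!ELEMENT PERSONAE (TITLE, (PERSONA | PGROUP)+)>
```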

So a PLAY document could look like this:

<PLAY>
	<TITLE>Title of the play</TITLE>
	<AUTHOR>The author of the play</AUTHOR>
	<FM>Some non play related information</FM>
	<PERSONAE>Dramatis Personae</PERSONAE>
	<SCNDESCR>The scene description</SCNDESCR>
	<PLAYSUBT>The play subtitle</PLAYSUBT>
	<PROLOGUE>The OPTIONAL prologue</PROLOGUE>
	<ACT>Act 1</ACT>
	<ACT>Act 2</ACT>
	<ACT>Act 3</ACT>
	<ACT>Act 4</ACT>
	<EPILOGUE></EPILOGUE>
</PLAY>

But that is not quite right. Take the <PERSONAE> element: the rules for that element say that it MUST contain a <TITLE> element followed by 1 or more entries, each of which is either a <PERSONA> or a <PGROUP> element. The same goes for the others: <FM> must contain at least 1 <P>, and every <ACT> needs a NAME attribute, a <TITLE> and at least 1 <SCENE>. But that's the beauty of the DTD, it makes these rules so that a document is written in a strict way.

Have a look at this again and you'll see:
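
To make that concrete, here is a sketch of a cut-down document that should satisfy every rule in the DTD above. The file name play.dtd, the NAME values and all of the text are placeholders assumed for illustration; they are not taken from the real Othello files.

```xml
<?xml version="1.0"?>
<!-- "play.dtd" is an assumed file name for the DTD listed above -->
<!DOCTYPE PLAY SYSTEM "play.dtd">
<PLAY>
  <TITLE>Title of the play</TITLE>
  <AUTHOR>The author of the play</AUTHOR>
  <FM>
    <P>Some non play related information</P>
  </FM>
  <PERSONAE>
    <TITLE>Dramatis Personae</TITLE>
    <PERSONA>A character on their own</PERSONA>
    <PGROUP>
      <PERSONA>First character in a group</PERSONA>
      <PERSONA>Second character in a group</PERSONA>
      <GRPDESCR>What the group has in common</GRPDESCR>
    </PGROUP>
  </PERSONAE>
  <SCNDESCR>The scene description</SCNDESCR>
  <PLAYSUBT>The play subtitle</PLAYSUBT>
  <!-- PROLOGUE and EPILOGUE are optional, so they are left out here -->
  <!-- NAME is declared as an ID: it must start with a letter or
       underscore and be unique within the document -->
  <ACT NAME="act1">
    <TITLE>Act 1</TITLE>
    <SCENE NAME="act1scene1">
      <TITLE>Scene 1</TITLE>
      <STAGEDIR>A stage direction</STAGEDIR>
      <SPEECH>
        <SPEAKER>Name of the speaker</SPEAKER>
        <LINE>A line of the speech</LINE>
      </SPEECH>
    </SCENE>
  </ACT>
</PLAY>
```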

OTHELLO in raw XML

Well zowie, but why??

Well, it means that your original XML document, which now MEANS something, can be used for loads of things. It makes printing and viewing easier because you can treat different elements in different ways without having to give them identities like you do in HTML. It also means that you can be very specific when you are searching a document. In this simple play, for example, you might want to search for the word 'castle' in a scene title only, which you can do (with an appropriate application). You can't do that reliably with HTML, because nothing in the markup says which bit of text is a scene title.

Styling is easy too, as I don't need to style for a <P> element that has a CLASS of 'subpara', I can just style for the <SUBPARA> element.
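
As a sketch of what such a search could look like, here is a small XSLT 1.0 stylesheet that picks out scenes by their title. It is only an illustration, not part of the Othello download. The match is case sensitive, so 'castle' will not find 'Castle'.

```xml
<?xml version="1.0"?>
<!-- Illustrative only: lists the title of every scene whose TITLE
     contains the word 'castle' -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Plain text output, one title per line -->
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- //SCENE finds scenes anywhere in the play; the part in square
         brackets keeps only those whose TITLE text contains 'castle' -->
    <xsl:for-each select="//SCENE[contains(TITLE, 'castle')]">
      <xsl:value-of select="TITLE"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The work is all done by the one XPath expression in the select attribute. It can ask for 'a TITLE inside a SCENE' only because the document says which text is which.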

So how do I view an XML document?

Well, there are two ways:
1. In IE5+ you can view an XML document directly as long as it has a stylesheet attached to it like this:
OTHELLO in XML
Go on, view the source document and you'll see the raw XML.
It's a slight cheat because right now there are no browsers available to view raw XML with a straight stylesheet. What is happening is this: the XML document AND the stylesheet are sent to your browser. The stylesheet is an XSLT (Extensible Stylesheet Language Transformations) stylesheet, and IE5's internal XSLT transformation engine uses it to convert the XML tags to HTML tags that the browser can understand. It does this invisibly and all you see is styled XML.

2. You can convert your XML to HTML and send that to the browser. The conversion is done on the server and the resulting HTML is placed there for you to see, like this:
OTHELLO in HTML
Looks the same, does the same BUT if you view the source you can see that it is an HTML document.
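
To give a feel for what such a stylesheet contains, here is a much reduced sketch written for the play DTD. It is NOT the real play.xsl, just an assumed minimal version in XSLT 1.0, and the choice of HTML tags is arbitrary. Each template says 'when you meet this XML element, write out this HTML'.

```xml
<?xml version="1.0"?>
<!-- Hypothetical minimal stylesheet; the real play.xsl will differ -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- The PLAY element becomes the HTML page itself -->
  <xsl:template match="PLAY">
    <html>
      <head><title><xsl:value-of select="TITLE"/></title></head>
      <body>
        <h1><xsl:value-of select="TITLE"/></h1>
        <xsl:apply-templates select="ACT"/>
      </body>
    </html>
  </xsl:template>

  <!-- Act and scene titles become headings -->
  <xsl:template match="ACT">
    <h2><xsl:value-of select="TITLE"/></h2>
    <xsl:apply-templates select="SCENE"/>
  </xsl:template>

  <xsl:template match="SCENE">
    <h3><xsl:value-of select="TITLE"/></h3>
    <xsl:apply-templates select="SPEECH | STAGEDIR"/>
  </xsl:template>

  <!-- A speech is a block: speaker in bold, then one line per row -->
  <xsl:template match="SPEECH">
    <div class="speech">
      <xsl:apply-templates select="SPEAKER | LINE | STAGEDIR"/>
    </div>
  </xsl:template>

  <xsl:template match="SPEAKER">
    <b><xsl:value-of select="."/></b><br/>
  </xsl:template>

  <!-- A LINE may hold text and stage directions mixed together -->
  <xsl:template match="LINE">
    <xsl:apply-templates/><br/>
  </xsl:template>

  <xsl:template match="STAGEDIR">
    <i><xsl:value-of select="."/></i>
  </xsl:template>

</xsl:stylesheet>
```

For way 1, the stylesheet is attached by a processing instruction near the top of the XML document, along the lines of <?xml-stylesheet type="text/xsl" href="play.xsl"?>. For way 2, the same stylesheet is handed to a transformation engine on the server instead. Note that this sketch uses the XSLT 1.0 namespace, and some older IE5 installations may expect an earlier draft dialect of XSL.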

Ok, so how do I convert XML to HTML?

You use a transformation engine. I use either the MSXML transformation engine or Michael Kay's SAXON. SAXON is by far the best as it is the most up to date and has its own extensions.

How do I use it?

With the XML document, its associated stylesheet and SAXON.EXE in the same directory, you issue the command:

saxon -o othello.html othello.xml play.xsl

And this means:

saxon  =run the saxon application.
-o othello.html  =put the output in this file.
othello.xml  =read this xml document.
play.xsl  =and use this stylesheet with it.

Want to try it??

You can download the xml files here:

The XML files

You will need to get a copy of saxon.exe from the internet somewhere.

You need all three bits. Download them to a directory like C:\XML\TEST, for instance. When you've done that, just unzip the files (from XML.zip and saxon.zip) into the same directory.

First thing you can do is double-click on othello.xml, and if you have got IE5 or above you should be able to see the Othello play. If you right click on it and select 'view source' you will see the XML.

Next thing to do is to create an HTML file from the XML. To do this just double-click on html.bat. A little DOS process will run and you will see that othello.html has now appeared in the directory. If you view that in your browser, surprise surprise, it looks exactly the same as the XML file, but right click on it and 'view source'. Ah ha, it's that horrible HTML stuff.

Well there you go, that's a quicky on XML and XSLT. Want to know more about SAXON? Then go here: ABOUT SAXON