Archived at Pineapplesoft
 ananas.org 
  The Pineapplesoft Link newsletter covered a wide range of technical topics; see the archived issues.
The newsletter was first emailed in 1998. In 2001 Benoît discontinued it in favour of professional writing for magazines.
The “XHTML, March 2000” page was archived in 2003 to preserve the original content of March 2000.
 


 

Welcome to the 27th issue of Pineapplesoft Link. Last month, the W3C adopted XHTML. XHTML is an important step in the XML-ization of the web so I devote this month's issue to it and discuss what it changes for you.

XHTML

As you see Pineapplesoft Link is still Pineapplesoft Link. Last month, I asked you to vote on a new name (Pineapplesoft Link vs PSLink). An overwhelming majority (75%) voted to keep Pineapplesoft Link. Thank you to all the voters, Pineapplesoft Link it will remain!

What is XHTML?

The name says it all: XHTML combines XML with HTML. More formally, XHTML is an XML rewriting of HTML. What does that mean in practice?

XML and HTML have a lot in common. The main difference, and it is an important one, is that XML is a generic markup language whereas HTML is a specific language for hypertext documents.

Understanding the difference between XML and HTML is essential to understanding XHTML, so let me give an example. HTML is specific because it defines specific elements, e.g. there is an element for paragraphs (<P>), an element for images (<IMG>), an element for bold text (<B>).

XML, on the other hand, defines no elements. That's why it's generic. It is up to the author to define the elements he needs in his document. For example DocBook, which is an XML vocabulary for technical documentation, defines a paragraph element (<para>) but MathML, an XML vocabulary for mathematics, does not define an element for paragraphs. There is no need for paragraphs in mathematical equations so there is no paragraph element in MathML! Instead MathML defines elements for sums (<sum>), exponentiation (<exp>) and other mathematical concepts.

Both DocBook and MathML, which are specific languages, are built on top of XML's generic facilities. In fact, many other languages have been created on top of XML. There are XML vocabularies for multimedia, graphics, real-estate, electronic commerce and more.
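
To make the "generic" point concrete, here is a minimal Python sketch in which one XML parser reads two unrelated vocabularies without knowing anything about either. The two fragments are simplified illustrations (no namespaces, no DTDs) and the function name element_names is invented for this example.

```python
import xml.etree.ElementTree as ET

# Simplified, illustrative fragments (assumption: real documents would
# carry namespaces, a DOCTYPE, and much more structure).
docbook = "<article><title>Notes</title><para>XML is generic.</para></article>"
mathml = "<apply><plus/><ci>x</ci><cn>1</cn></apply>"

def element_names(document: str) -> list:
    """Parse any well-formed XML and list its element names in document order."""
    root = ET.fromstring(document)
    return [element.tag for element in root.iter()]

print(element_names(docbook))  # ['article', 'title', 'para']
print(element_names(mathml))   # ['apply', 'plus', 'ci', 'cn']
```

The parser applies the same syntax rules to both documents; what the elements mean is left to the vocabulary.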

This raises an interesting question: if XML is a generic language that is used to create specific languages and if HTML is a specific language then why not build HTML on top of XML? It has been done and it's called XHTML.

If you read the XHTML 1.0 recommendation, you will recognize the familiar HTML 4.0 elements (paragraphs, bold, images, etc.). No new element has been added. However, XHTML follows the XML syntax, so every element must have both a start-tag and an end-tag (or use the XML empty-element form, as in <br />). HTML, by contrast, lets you omit the end-tag for many elements.
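
The difference in tag rules is easy to observe with any XML parser. The following Python sketch feeds two small fragments to the standard library parser: one written the relaxed HTML way, one written the XHTML way. The fragments and the helper name is_well_formed are made up for illustration.

```python
import xml.etree.ElementTree as ET

# HTML-style markup: the <p> end-tags are omitted and <br> is never closed.
html_style = "<body><p>First paragraph<p>Second paragraph<br></body>"

# XHTML-style markup: every element is closed; the empty element is "<br />".
xhtml_style = "<body><p>First paragraph</p><p>Second paragraph<br /></p></body>"

def is_well_formed(markup: str) -> bool:
    """Return True if a generic XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

print(is_well_formed(html_style))   # False
print(is_well_formed(xhtml_style))  # True
```

This only checks well-formedness; it does not validate the fragment against the XHTML DTD.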

Why Bother?

I guess that, so far, XHTML looks like a great idea to... waste your time. If it doesn't add anything to HTML, was it worth the effort?

There is no doubt that it was worth the effort but XHTML 1.0 is the first station on a longer road. XHTML 1.0 serves primarily two purposes: firstly it increases coherence within the W3C, secondly it will enable the modularization of HTML.

Coherence

In terms of coherence, the W3C has made it clear that all future markup language developments will be based on XML. It stands to reason that HTML, W3C's flagship markup language, must also evolve towards XML.

One of the reasons for adopting an XML-based language is the breadth of XML tools that are available. These tools can really make a difference.

Last year, I participated in a project to extract search information from portals like Alta-Vista. Our software would request an HTML page from Alta-Vista and parse it to extract the search information. Most of the development time was spent parsing the HTML page. Had Alta-Vista used XHTML, we could have used one of the freely available XML parsers and saved a lot of development effort.
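
As a rough idea of how little code such an extraction takes once the page is XHTML, here is a Python sketch using the standard library XML parser. The page below is a hypothetical results page, not Alta-Vista's actual markup, and extract_links is a name chosen for this example; the only real-world detail is the XHTML namespace URI.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical results page (assumption: a real portal's markup would differ).
page = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Search results</title></head>
  <body>
    <ul>
      <li><a href="http://www.w3.org/TR/xhtml1/">XHTML 1.0</a></li>
      <li><a href="http://www.w3.org/XML/">XML</a></li>
    </ul>
  </body>
</html>"""

def extract_links(xhtml: str) -> list:
    """Return (href, link text) pairs for every <a> element in the page."""
    root = ET.fromstring(xhtml)
    links = []
    # Elements live in the XHTML namespace, so the tag name is qualified.
    for anchor in root.iter("{%s}a" % XHTML_NS):
        links.append((anchor.get("href"), "".join(anchor.itertext())))
    return links

for href, text in extract_links(page):
    print(text, "->", href)
```

No hand-written tag scanning is needed: the parser does the tokenizing and the application only walks the resulting tree.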

Modularization

The second benefit is to make HTML more modular. Currently HTML is one big markup language: you cannot add to it and you cannot take away from it! It's the "one size fits all" markup language but, increasingly, it does not fit.

A number of groups need a simplified version of HTML. One such group is the Open eBook Authoring Group (http://www.openebook.org), which developed a subset of HTML that, coincidentally, is XML-based. This new subset is the basis for electronic books.

Why do they need a subset? Why not go for the real thing? Because eBook readers are built around cheap processors so they are not powerful enough for the full complexity of HTML.

Another group is the WAP Forum (http://www.wapforum.org), which develops standards for mobile phones. Mobile phones are also built around small, cheap components that could not process a complex language like HTML. So the WAP Forum had to develop its own simplified markup language.

Yet while some groups simplify HTML, other groups want to add more options to it! Some areas of HTML are very primitive. Take forms, for example. HTML forms are very limited when compared to Java forms. Web-based applications would benefit from more modern forms that support advanced widgets. For a good discussion on what HTML forms lack, visit http://www.mozquito.org and follow the links to XHTML-FML.

What it amounts to is that HTML is torn apart by two groups: one group wants to simplify HTML and the other wants to enhance it. The W3C answer is to modularize HTML: to break it into smaller pieces.

Ultimately there will be a text module (paragraph, bold), an image module, a form module, an object module (applet, plugin, ActiveX), a scripting module (JavaScript, ECMAScript), a frame module, etc.

Browsers will pick and choose according to their audience. For example, eBooks need only the text and image modules. Intranet applications need objects, scripts and forms. A palmtop browser needs text and scripts.
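
One way to picture this "pick and choose" idea is as a lookup from device profiles to modules, and from modules to elements. The Python sketch below is purely illustrative: the element lists per module are assumptions made up for the example, not the W3C's module definitions, and the profile names simply mirror the three examples above.

```python
# Hypothetical module contents (assumption: not the official W3C definitions).
MODULES = {
    "text": {"p", "b"},
    "image": {"img"},
    "form": {"form", "input"},
    "object": {"object", "applet"},
    "scripting": {"script"},
    "frame": {"frameset", "frame"},
}

# Profiles mirroring the examples in the text.
PROFILES = {
    "ebook": ["text", "image"],
    "intranet": ["object", "scripting", "form"],
    "palmtop": ["text", "scripting"],
}

def supported_elements(profile: str) -> set:
    """Union of the elements of every module the profile includes."""
    elements = set()
    for module in PROFILES[profile]:
        elements |= MODULES[module]
    return elements

def unsupported(profile: str, used_elements) -> list:
    """Elements used by a page that the given profile cannot handle."""
    return sorted(set(used_elements) - supported_elements(profile))

print(unsupported("ebook", ["p", "img", "script"]))  # ['script']
```

A page that sticks to the elements of a profile's modules can be rendered by any browser supporting that profile.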

Where does that leave you?

XHTML version 1.0 is not yet modular. However XHTML 1.0 adapts HTML to the generic XML syntax. XML will serve as the foundation for the modularization.

So where does that leave you? There is no revolution under way but rather an evolution. You don't need to rush and redo your web pages in XHTML. The major browsers (Netscape and IE) are HTML browsers.

I think that, in order to prepare for the future, you need to choose your tools wisely. If you are still writing HTML with a text editor, you might wish to consider switching to a graphical tool.

In another issue of Pineapplesoft Link (http://www.psol.be/old/1/newsletter/19990801_xml.html), I described how I used XML to edit this newsletter. Thanks to this solution, it would not be difficult to adapt the newsletter to the Open eBook format, to WAP or to XHTML. I already enjoy the benefits today and I expect to gradually shift the whole web site to XML editing.

I think many web sites will undergo a similar evolution. When a company decides to support email, HTML, Open eBook, WAP, XHTML and goodness knows what else, it makes sense to introduce XML.
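
The appeal of a single XML source can be sketched in a few lines of Python. This is not the setup used for this newsletter; the <issue>, <title> and <para> elements and both function names are invented for the example, and a production system would more likely use a stylesheet language than hand-written functions.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical source vocabulary (assumption: element names are invented).
source = (
    "<issue><title>XHTML</title>"
    "<para>XHTML combines XML with HTML.</para>"
    "<para>It follows the XML syntax.</para></issue>"
)

def to_text(issue: ET.Element) -> str:
    """Plain-text rendering, e.g. for an email edition."""
    paragraphs = [para.text for para in issue.findall("para")]
    return "\n\n".join([issue.findtext("title")] + paragraphs)

def to_xhtml(issue: ET.Element) -> str:
    """XHTML-style rendering of the same content, every element closed."""
    parts = ["<h1>%s</h1>" % escape(issue.findtext("title"))]
    parts += ["<p>%s</p>" % escape(para.text) for para in issue.findall("para")]
    return "<body>" + "".join(parts) + "</body>"

issue = ET.fromstring(source)
print(to_text(issue))
print(to_xhtml(issue))
```

Supporting one more output format means adding one more function; the source document does not change.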

An alternative, particularly for small web sites, is to rely on your editing tool. I expect that tools like HoTMetaL, NetObject, Dreamweaver and even FrontPage will offer options to publish in XHTML and other XML variants of HTML. Therefore, if you are still writing your web site in Notepad, switching to a graphical editor may be a smart move.

Self-Promotion Department

"XML by Example" has gone back to the printer, only two months after its release! The comments I hear from readers have been mostly positive so I'm very pleased with this book. Thanks to all who took the time to discuss it with me.

I have many other projects for books and other publications that I will announce in Pineapplesoft Link, so stay tuned!

About Pineapplesoft Link

Pineapplesoft Link is a free email magazine. Each month, it discusses technologies, trends and facts of interest to web developers.

The information and design of this issue of Pineapplesoft Link are owned by Benoit Marchal and Pineapplesoft. Permission to copy or forward it is hereby granted provided it is prefaced with the words: "As appeared in Pineapplesoft Link - http://www.pineapplesoft.com."

Editor: Benoit Marchal
Publisher: Pineapplesoft www.psol.be

Acknowledgments: thanks to Sean McLoughlin MBA for helping me with this issue.

Back issues are available at http://www.psol.be/old/1/newsletter/.

Although the editor and the publisher have used reasonable endeavors to ensure accuracy of the contents, they assume no responsibility for any error or omission that may appear in the document.

Last update: March 2000.
© 2000, Benoît Marchal. All rights reserved.
Design, XSL coding & photo: PineappleSoft OnLine.