polyglot xhtml

  • Written by Walter Doekes

  • Published on: 13/01/2016

Polyglot XHTML: Serving pages that are valid HTML and valid XML at the same time.

A number of documents have been written on the subject, which I shall not repeat here.

My summary:

  • HTML5 is not going away.
  • Pages served as XHTML get checked in the browser: its XML parser refuses to render anything that is not well-formed.
  • If you can get better validation during the development of your website, then you’ll save yourself time and headaches.
  • Thus, for your development environment, you’ll set the equivalent of this:
    DEFAULT_CONTENT_TYPE = 'application/xhtml+xml' (and a workaround for Django pages if they’re out of your control)
  • But for your production environment, you’ll still use text/html.
  • Even though your page is served as HTML, you can still use XML parsers to process it.
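The last three points can be sketched in a few lines of Python. This is only an illustration, not code from a real project: the page, the content_type_for helper and the extract_links helper are made-up names, and the standard library's xml.etree.ElementTree stands in for whatever XML parser you prefer.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical page, written to follow the polyglot rules.
PAGE = """<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<meta charset="UTF-8"/>
<title>Polyglot demo</title>
</head>
<body>
<p>See <a href="/about">about</a> and <a href="/contact">contact</a>.</p>
</body>
</html>"""


def content_type_for(debug):
    """Hypothetical helper: strict XML parsing in development only."""
    return "application/xhtml+xml" if debug else "text/html"


def extract_links(markup):
    """Parse the page as XML and return the href of every XHTML <a> element."""
    root = ET.fromstring(markup)  # raises ParseError if not well-formed
    return [a.get("href") for a in root.iter("{%s}a" % XHTML_NS)]
```

Calling extract_links(PAGE) parses the very same markup that production would serve as text/html. Note that a plain XML parser knows nothing about HTML named entities such as &nbsp;, so a page processed this way has to stick to numeric character references.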

Apparently, the W3C resource about polyglot XHTML has been taken out of maintenance, without an explanation. My guess is that they decided it was not worth the effort, as the WHATWG wiki states: “You have to be really careful for this to work, and it’s almost certainly not worth it. You’d be better off just using an HTML-to-XML parser.”

I think that’s an exaggeration. Jesper Tverskov wrote an excellent article called Benefits of polyglot XHTML5 where he summarized how little work it is.

For convenience, I’ve copied the “polyglot in a nutshell” here:

If you are new to polyglot XHTML, it might seem quite a challenge. But if your document is already valid HTML5, well-formed, and uses lower-case for element and attribute names, you are pretty close. The following 10 points almost cover all of polyglot XHTML:

  1. Declare namespaces explicitly.
  2. Use both the lang and the xml:lang attribute.
  3. Use <meta charset="UTF-8"/>.
  4. Use tbody or thead or tfoot in tables.
  5. When col element is used in tables, also use colgroup.
  6. Don’t use noscript element.
  7. Don’t start pre and textarea elements with newline.
  8. Use innerHTML property instead of document.write().
  9. In script element, wrap JavaScript in out-commented CDATA section.
  10. Many names in SVG and one in MathML use lowerCamelCase.

The following additional rules will be picked up by HTML5 validation if violated:

  • Don’t use XML Declaration or processing instructions. Don’t use xml:space and xml:base except in SVG and MathML.
  • Elements that can have content must not be minimized to a single tag element. That is <br/> is ok but <p></p> must be used instead of <p/>.
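A few of these rules are easy to check mechanically. What follows is a rough sketch under my own assumptions, not a complete polyglot checker: lint_polyglot is a made-up helper, it covers only points 1, 2, 4 and 6 plus the minimized-tag rule, and its regular expression ignores SVG and MathML, where single-tag elements are fine.

```python
import re
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# HTML5 void elements: the only ones that may be written as a single tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


def lint_polyglot(markup):
    """Return a list of warnings for a handful of polyglot rules."""
    warnings = []
    root = ET.fromstring(markup)  # raises ParseError if not well-formed

    # Point 1: the root element must be in the XHTML namespace.
    if root.tag != XHTML + "html":
        warnings.append("root element is not in the XHTML namespace")

    # Point 2: lang and xml:lang must both be present, with the same value.
    if root.get("lang") is None or root.get("lang") != root.get(XML_LANG):
        warnings.append("lang and xml:lang must both be set, with the same value")

    # Point 4: rows must sit inside thead, tbody or tfoot.
    for table in root.iter(XHTML + "table"):
        if table.find(XHTML + "tr") is not None:
            warnings.append("table has tr as a direct child")

    # Point 6: no noscript element.
    if root.find(".//" + XHTML + "noscript") is not None:
        warnings.append("noscript element used")

    # Minimized tags: checked on the raw text, because the parsed tree
    # no longer distinguishes <p/> from <p></p>.
    for name in re.findall(r"<([a-z][a-z0-9]*)\b[^<>]*/>", markup):
        if name not in VOID:
            warnings.append("<%s/> must be written as <%s></%s>" % (name, name, name))

    return warnings
```

An empty list means none of these particular checks fired. It says nothing about the remaining rules, so an HTML5 validator is still needed for the full picture.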