xmllint fails to validate XHTML 1.0 Transitional file


Steps to reproduce on Debian Jessie GNU/Linux.

Check xmllint version:

$ xmllint --version
xmllint: using libxml version 20901
   compiled with: Threads Tree Output Push Reader Patterns Writer SAXv1 FTP HTTP DTDValid HTML Legacy C14N Catalog XPath XPointer XInclude Iconv ISO8859X Unicode Regexps Automata Expr Schemas Schematron Modules Debug Zlib Lzma 

Make an XHTML 1.0 Transitional file by saving this as example.xhtml:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<title>A title</title>
</head>

<body>
Some content
</body>

</html>
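To script the reproduction, the same file can be written with a shell here-document. The quoted 'EOF' delimiter stops the shell from expanding anything inside the document:

```shell
# Write the test document exactly as shown above to example.xhtml.
cat > example.xhtml <<'EOF'
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<title>A title</title>
</head>

<body>
Some content
</body>

</html>
EOF
```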

N.B. Pasting the contents of example.xhtml into the W3C Validator yields "This document was successfully checked as XHTML 1.0 Transitional!", so it should also validate when using xmllint.

xmllint online validation

This fails, even though the computer has internet access:

$ xmllint --noout --valid example.xhtml
example.xhtml:1: warning: failed to load external entity "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
                                                                               ^
example.xhtml:2: validity error : Validation failed: no DTD found !
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
                                                                  ^
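The first warning shows that the problem is fetching the DTD, not the document itself. A quick way to see what the server actually returns for that URL is to request only the response headers. The sketch below assumes curl is installed (it is not part of the original steps). A redirect or an error status would at least hint at why libxml2's own HTTP fetch gives up, although it does not prove the cause:

```shell
# URL copied verbatim from the DOCTYPE of example.xhtml
DTD_URL="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"

# -s: no progress meter; -I: send a HEAD request and print only the response headers;
# --max-time: give up after 10 seconds instead of hanging when there is no network
curl -sI --max-time 10 "$DTD_URL" | head -n 10
```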

xmllint offline validation

Download and extract the XHTML 1.0 DTDs and entity files:

$ wget -qO- https://www.w3.org/TR/xhtml1/xhtml1.tgz | tar xvz
xhtml1-20020801/
xhtml1-20020801/W3C-REC.css
xhtml1-20020801/xhtml.css
xhtml1-20020801/logo-REC.png
xhtml1-20020801/w3c_home.png
xhtml1-20020801/wcag1AAA.png
xhtml1-20020801/acks.html
xhtml1-20020801/Cover.html
xhtml1-20020801/definitions.html
xhtml1-20020801/diffs.html
xhtml1-20020801/dtds.html
xhtml1-20020801/guidelines.html
xhtml1-20020801/introduction.html
xhtml1-20020801/issues.html
xhtml1-20020801/normative.html
xhtml1-20020801/Overview.html
xhtml1-20020801/prohibitions.html
xhtml1-20020801/references.html
xhtml1-20020801/xhtml1-diff.html
xhtml1-20020801/DTD/
xhtml1-20020801/DTD/xhtml-lat1.ent
xhtml1-20020801/DTD/xhtml-special.ent
xhtml1-20020801/DTD/xhtml-symbol.ent
xhtml1-20020801/DTD/xhtml.soc
xhtml1-20020801/DTD/xhtml1-frameset.dtd
xhtml1-20020801/DTD/xhtml1-strict.dtd
xhtml1-20020801/DTD/xhtml1-transitional.dtd
xhtml1-20020801/DTD/xhtml1.dcl
xhtml1-20020801/xhtml1.ps
xhtml1-20020801/xhtml1.pdf

Still fails:

$ xmllint --noout --dtdvalid xhtml1-20020801/DTD/xhtml1-transitional.dtd example.xhtml 
example.xhtml:1: warning: failed to load external entity "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
                                                                               ^
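Unlike the --valid run, this output contains only a warning and no "validity error" line. A warning does not necessarily mean xmllint rejected the document, so the exit status is a more reliable indicator of whether this run actually failed. A minimal check:

```shell
# Repeat the --dtdvalid run, then record and print xmllint's exit status.
# 0 means xmllint reported no errors; a non-zero value means something went wrong
# (a parse or validation error, or xmllint itself could not be run).
xmllint --noout --dtdvalid xhtml1-20020801/DTD/xhtml1-transitional.dtd example.xhtml
status=$?
echo "xmllint exit status: $status"
```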

Likewise if using the --nonet option:

$ xmllint --noout --nonet --dtdvalid xhtml1-20020801/DTD/xhtml1-transitional.dtd example.xhtml 
I/O error : Attempt to load network entity http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
example.xhtml:1: warning: failed to load external entity "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
                                                                               ^

Questions

I have two questions:

  1. Why did none of these validation attempts succeed?
  2. The second attempt seems to fail because, despite the --dtdvalid option, xmllint still tries to fetch http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd, since it is referenced in example.xhtml. Is there some way to tell xmllint to ignore that reference and use a local DTD instead (e.g. the one already stored at xhtml1-20020801/DTD/xhtml1-transitional.dtd)?
Answer by AudioBubble

It seems the simplest workaround is:

$ sudo apt-get install w3c-dtd-xhtml

This installs the relevant DTDs locally. Thereafter, validation succeeds:

$ xmllint --noout --valid example.xhtml
$
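A likely reason this works is that the Debian package registers its DTDs in the system-wide XML catalog, /etc/xml/catalog, which libxml2 consults by default when XML_CATALOG_FILES is not set. Assuming that is the case, xmlcatalog (shipped alongside xmllint in Debian's libxml2-utils package) can show how the public identifier from example.xhtml now resolves:

```shell
# Public identifier copied from the DOCTYPE of example.xhtml
PUBLIC_ID="-//W3C//DTD XHTML 1.0 Transitional//EN"

# Look the identifier up in the default system catalog. A local file path or
# file:// URI in the output should mean the DTD is now loaded locally instead of fetched.
xmlcatalog /etc/xml/catalog "$PUBLIC_ID"
```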

However, although this lets me validate the XHTML file, it does not actually answer the questions above. I will therefore not mark this question as answered, in the hope that someone provides an answer that does.
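Regarding the second question, one approach that does not depend on the Debian package is a local OASIS XML catalog. libxml2, and therefore xmllint, uses catalogs to map public or system identifiers to local files before anything is fetched. The sketch below assumes the tarball was extracted into the current directory as above. The name catalog.xml is an arbitrary choice. Relative uri values in a catalog are resolved against the catalog file's own location. The XML_CATALOG_FILES environment variable makes libxml2 use this catalog instead of the default /etc/xml/catalog. The entity files the DTD refers to (xhtml-lat1.ent and so on) use relative system identifiers, so they should then be found next to the local DTD:

```shell
# Map both the public and the system identifier from the DOCTYPE to the local DTD.
cat > catalog.xml <<'EOF'
<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//W3C//DTD XHTML 1.0 Transitional//EN"
          uri="xhtml1-20020801/DTD/xhtml1-transitional.dtd"/>
  <system systemId="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
          uri="xhtml1-20020801/DTD/xhtml1-transitional.dtd"/>
</catalog>
EOF

# Validate against the DOCTYPE as resolved through the catalog; --nonet makes any
# remaining attempt to reach the network fail loudly instead of fetching silently.
XML_CATALOG_FILES=catalog.xml xmllint --noout --nonet --valid example.xhtml
```

If validation succeeds, xmllint with --noout prints nothing.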