What I know is (please correct me if I am wrong):
- XML is a subset of SGML.
- XHTML is an application and a subset of XML.
- HTML is an application of SGML.
Does this imply that XHTML is a subset of HTML or does this statement hold for any other reason?
Does this imply that XHTML is a subset of HTML
No, it doesn't.
Is XHTML a subset of HTML?
No. And neither is HTML a subset of XHTML.
For example:
<input name="surname">
is valid HTML but not valid XHTML.
<div />
is valid XHTML but not valid HTML.
Traditionally, HTML and XHTML were specified separately. HTML5 defines both syntaxes in a single specification, but the two syntaxes are mutually exclusive. However, it is possible to write your markup in a way that complies with both syntaxes, e.g. by using
<input name="surname" />
and
<div></div>
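The XML side of these examples can be checked mechanically. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` parser; the helper name `is_well_formed_xml` is made up for illustration. It only tests XML well-formedness, which is necessary but not sufficient for valid XHTML, and it says nothing about whether a snippet is valid HTML.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(snippet: str) -> bool:
    """Return True if the snippet parses as a well-formed XML document.

    Hypothetical helper: this checks well-formedness only, not validity
    against any XHTML DTD or schema.
    """
    try:
        ET.fromstring(snippet)
        return True
    except ET.ParseError:
        return False


snippets = [
    '<input name="surname">',    # HTML-style void element, never closed
    '<input name="surname" />',  # self-closing form accepted by both syntaxes
    '<div />',                   # self-closing non-void element
    '<div></div>',               # explicit end tag
]

# Map each snippet to the result of the well-formedness check.
results = {snippet: is_well_formed_xml(snippet) for snippet in snippets}

for snippet, ok in results.items():
    print(f"{snippet!r}: well-formed XML = {ok}")
```

Only the unclosed `<input name="surname">` is rejected by the XML parser, which is why it cannot be valid XHTML. Checking the HTML side would need an HTML validator, since HTML parsers are deliberately lenient.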
You asked two questions:
(a) Is XHTML a subset of HTML? Answer: no. This is easily proved by showing that XHTML allows things (such as an XML declaration) which HTML does not allow.
(b) if H is a subset of S and X is a subset of S, does this imply X is a subset of H? Most definitely not: that would be a gross logical error.
XML is not a subset of SGML but comes close. XML has its own rules that partly differ from SGML, though mostly it consists of SGML with most of its features omitted.
XHTML (which is a collective name for several markup languages) can be called an application and a subset of XML, but in addition to that, XHTML tags have meanings assigned to them, whereas XML as such does not say anything about meaning.
HTML is nominally an application of SGML in the HTML 2.0, HTML 3.2, HTML 4.0, and HTML 4.01 specifications, but this was always just theory. Only validators treat HTML as SGML. There was never any browser that implemented HTML as defined in those specifications; several SGML features, which are in principle part of them, lack all support.
Even if the subset relations were all true, the conclusion “XHTML is a subset of HTML” would not follow from the premises. A ⊂ B and C ⊂ B and D ⊂ B do not imply C ⊂ D.
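A concrete counterexample makes the logical point easy to verify. The sketch below uses Python sets whose members are arbitrary placeholder labels, not real language features. The set names simply mirror the premises from the question, taken as true for the sake of argument.

```python
# Toy sets with placeholder members; they stand in for "things each
# language allows" and are NOT real feature lists.
sgml = {"a", "b", "c", "d"}
xml = {"a", "b", "c"}
xhtml = {"b", "c"}
html = {"c", "d"}

# The three premises from the question, as proper-subset tests.
premises_hold = (xml < sgml) and (xhtml < xml) and (html < sgml)

# The conclusion being tested, and its converse.
xhtml_subset_of_html = xhtml <= html
html_subset_of_xhtml = html <= xhtml

print("premises hold:", premises_hold)
print("XHTML subset of HTML:", xhtml_subset_of_html)
print("HTML subset of XHTML:", html_subset_of_xhtml)
```

All three premises are satisfied here, yet neither toy set is a subset of the other, so the premises alone cannot establish the conclusion.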
XHTML is not a subset of HTML. There are XHTML specifications that contain elements not included in HTML specifications. And both XHTML and HTML are collective nouns. XHTML 1.0 has been characterized as being HTML 4.01 in XML syntax, so they might be called alternative syntactic forms. But even this isn’t strictly true; there are several poorly documented discrepancies between XHTML 1.0 and HTML 4.01, in addition to the obvious syntactic differences. HTML5 is being defined with two syntaxes, two serializations, “HTML” and “XHTML”, but even here, there are inevitable differences in addition to different serialization.