I have been generating an XML sitemap using Access and VBA. I asked our developers to implement a server-side solution so that it can be run every night without me having to remember to do it.
I generate the file by writing text to a file. Very simple. My file is around 1800KB.
The developer's solution also writes text to a file, using the XmlTextWriter class from VB.NET. His file is around 900KB.
When he first showed me this, I assumed he was missing a lot of data from the sitemap. But when I compared the line counts, there was only a 38-line difference (out of around 22,500 lines of text).
How can this be?
I'm not sure this is the correct Stack Exchange site for this question, but I don't know of a more appropriate one.
Edit
Here is an excerpt from the file:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.bodykind.com/index.aspx</loc>
</url>
<url>
<loc>http://www.bodykind.com/category/3-Supplements.aspx</loc>
</url>
<url>
<loc>http://www.bodykind.com/category/4-Wellbeing.aspx</loc>
</url>
...
Both files are almost identical, but the <url> elements are in a different order and one has about 36 more lines than the other.
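Because the <url> elements are in a different order, a line-by-line diff is not very helpful. One option is to compare the sets of <loc> values instead. A minimal Python sketch, assuming both files are well-formed sitemaps in the standard sitemaps.org namespace; loc_set and diff_sitemaps are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Default namespace declared on <urlset> in the sample above
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def loc_set(xml_bytes):
    """Return the set of <loc> URLs in a sitemap, ignoring element order."""
    # Parse raw bytes and let the XML parser deal with the encoding declaration
    root = ET.fromstring(xml_bytes)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)}

def diff_sitemaps(a_bytes, b_bytes):
    """Return (only_in_a, only_in_b) as sorted lists of URLs."""
    a, b = loc_set(a_bytes), loc_set(b_bytes)
    return sorted(a - b), sorted(b - a)
```

For example, read both files in binary mode and pass their contents to diff_sitemaps; the two lists show which URLs account for the line difference.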
Edit 2
I have just checked the document properties. The code set of the 900KB file is UTF-8, but the code set of the 1800KB file is Unicode. I assume this is why there is such a big difference?
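The encoding can also be checked without relying on an editor's document properties by looking at the first bytes of each file. UTF-16 files written by Windows tools typically start with the byte-order mark FF FE, while a UTF-8 file starts with EF BB BF or, when written without a BOM, directly with <?xml. A minimal Python sketch; sniff_encoding is a hypothetical helper, and the BOM-less check is only a heuristic.

```python
import codecs

def sniff_encoding(head):
    """Guess an XML file's encoding from its first few bytes (a heuristic)."""
    if head.startswith(codecs.BOM_UTF8):        # EF BB BF
        return "UTF-8 (with BOM)"
    if head.startswith(codecs.BOM_UTF16_LE):    # FF FE, typically labelled "Unicode" on Windows
        return "UTF-16LE"
    if head.startswith(codecs.BOM_UTF16_BE):    # FE FF
        return "UTF-16BE"
    # No BOM: '<' followed by a NUL byte suggests BOM-less UTF-16LE
    if head[:2] == b"<\x00":
        return "UTF-16LE (no BOM)"
    return "no BOM (UTF-8 or another 8-bit encoding)"
```

Usage: open each file in binary mode and pass the first four bytes, e.g. sniff_encoding(f.read(4)).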
Edit 3
Since the question is on the verge of being closed, here is the code for both:
My VBA
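Private Sub Class_Initialize()
    pIndent = True
    ' Late-bound ADO text stream that collects the sitemap in memory
    Set objADO = CreateObject("ADODB.Stream")
    objADO.Type = 2              ' adTypeText
    objADO.Charset = "utf-8"     ' text is stored as UTF-8, preceded by a BOM
    objADO.LineSeparator = 10    ' adLF: lines end with LF only
    objADO.Open
    objADO.WriteText "<?xml version=""1.0"" encoding=""UTF-8""?>", 1   ' 1 = adWriteLine
End Sub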
... some code which writes the text to the file
Public Sub SaveToFile(ByVal PATH As String)
    Dim BinaryStream As Object

    ' Skip the BOM (the first 3 bytes of the UTF-8 text stream)
    objADO.Position = 3

    Set BinaryStream = CreateObject("ADODB.Stream")
    BinaryStream.Type = 1          ' adTypeBinary
    BinaryStream.Mode = 3          ' adModeReadWrite; literal value also works without an ADO reference
    BinaryStream.Open

    ' Copy everything after the BOM into the binary stream
    objADO.CopyTo BinaryStream
    objADO.Flush
    objADO.Close

    BinaryStream.SaveToFile PATH, 2   ' 2 = adSaveCreateOverWrite
    BinaryStream.Flush
    BinaryStream.Close
    Set BinaryStream = Nothing
    Set objADO = Nothing
End Sub
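SaveToFile skips a fixed 3 bytes, which matches the UTF-8 BOM (EF BB BF). A UTF-16 BOM is only 2 bytes (FF FE or FE FF), so a fixed offset of 3 would cut into the first character of a UTF-16 stream. A minimal Python sketch of a BOM-stripping step that checks which mark is actually present instead of assuming its length; strip_bom is a hypothetical helper, not part of ADO.

```python
import codecs

# Check UTF-32 before UTF-16, because FF FE 00 00 also starts with FF FE
_BOMS = (codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE, codecs.BOM_UTF8,
         codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

def strip_bom(data):
    """Return (bom, payload): the BOM found at the start of data (or b'') and the rest."""
    for bom in _BOMS:
        if data.startswith(bom):
            return bom, data[len(bom):]
    return b"", data
```

The design choice is simply to derive the skip length from the data rather than hard-coding it, so the same step works whichever encoding the stream ends up in.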
The developer's solution:
Using writer As New XmlTextWriter(Server.MapPath(filename), Encoding.UTF8)
    writer.WriteStartDocument()
    writer.WriteStartElement("urlset")
    writer.WriteAttributeString("xmlns", "http://www.sitemaps.org/schemas/sitemap/0.9")
    writer.Formatting = Formatting.Indented
    writer.WriteStartElement("url")
    writer.WriteElementString("loc", domain + "/index.aspx")
    writer.WriteEndElement()
    writer.WriteStartElement("url")
    writer.WriteElementString("loc", domain + "/aboutus.aspx")
    writer.WriteEndElement()
    ' ... and so on ...
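For comparison, here is a minimal sketch of the same kind of UTF-8 sitemap output using Python's standard library rather than .NET (this is not the developer's code). It builds the tree with xml.etree.ElementTree and serialises it as UTF-8 bytes without a BOM. build_sitemap and the paths argument are hypothetical names, and ET.indent requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Map the sitemap namespace to the default prefix so output uses xmlns="..." not ns0:
ET.register_namespace("", SITEMAP_NS)

def build_sitemap(domain, paths):
    """Return a sitemap as UTF-8 bytes (no BOM), one <url><loc> per path."""
    root = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for path in paths:
        url = ET.SubElement(root, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = domain + path
    ET.indent(root)  # indent nested elements, similar in spirit to Formatting.Indented
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```

Usage: build_sitemap("http://www.bodykind.com", ["/index.aspx", "/aboutus.aspx"]) returns bytes that can be written directly to a file opened in binary mode. Note that ElementTree writes the XML declaration with single quotes, which is equally valid XML.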
If one file is twice the size of the other, then one is UTF-8 (the smaller one) and the other is UTF-16 (the bigger one). In UTF-16, every ASCII character takes two bytes instead of the single byte it takes in UTF-8.
(And on Windows, "Unicode" means UTF-16.)
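The arithmetic can be checked directly by encoding the same sitemap text both ways. A minimal Python sketch using an excerpt from the sample above; encoded_sizes is a hypothetical helper, and UTF-16 is measured little-endian without a BOM so only the character data is counted.

```python
def encoded_sizes(text):
    """Byte size of the same text in UTF-8 and in UTF-16 (little-endian, no BOM)."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

sample = ("<url>\n"
          "<loc>http://www.bodykind.com/index.aspx</loc>\n"
          "</url>\n")
utf8_size, utf16_size = encoded_sizes(sample)
# The sample is pure ASCII, so utf16_size is exactly twice utf8_size
```

The exact 2:1 ratio only holds for ASCII text: a character such as "é" takes two bytes in both encodings. Since sitemap URLs are normally plain ASCII, a 900KB UTF-8 file becoming roughly 1800KB in UTF-16 is what you would expect, with the same number of lines.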