Reading XML with records with different columns in same records array

62 views Asked by At

I need to parse a XML response in C# and load in SQL. Just to brief,i know how to use XMLSerializer to parse the xml, so that is not i am looking for. My concern is my XML structure which i received from from web request. Below is the subset of xml, i received from xml

<apiXML>
<recordList>
<record id="31" >
    <administration_name>admin1</administration_name>
    <creator>Leekha, Mohit</creator>
    <object_category>painting</object_category>
    <object_number>1243</object_number>
    <id>31</id>
    <reproduction.reference>2458.jpg</reproduction.reference>
    <title lang="nl-NL" invariant="false">The Title1</title>
    <title lang="nl-NL" invariant="false">The Title</title>
    <title lang="nl-NL" invariant="false">Different Title</title>
</record>
<record id="32" >
    <administration_name>admin1</administration_name>
    <creator>Leekha, Mohit</creator>
    <object_category>painting</object_category>
    <object_number>AL1111</object_number>
    <id>32</id>
    <reproduction.reference>AL1111.jpg</reproduction.reference>
    <title lang="nl-NL" invariant="false">Health</title>
</record>
<record id="34" >
    <administration_name>admin2</administration_name>
    <creator>Leekha,Mohit</creator>
    <creator>System</creator>
    <object_category>earthenware</object_category>
    <object_category>ABC</object_category>
    <object_category>Remote</object_category>
    <object_number>Z.567 & X-124</object_number>
    <id>34</id>
    <reproduction.reference>Z.567 & X-124(1).jpg</reproduction.reference>
    <reproduction.reference>Z.567 & X-124(2).jpg</reproduction.reference>
    <reproduction.reference>Z.567 & X-124(3).jpg</reproduction.reference>
</record>
</recordList>
</apiXML>

My Concerns:

  1. Some records have multiple data members with same name. Like record id 31 has 3 titles
  2. Number of columns are different for each record.

So all i am asking is for suggestions how could i deal with scenario. Any suggestions are welcome

1

There are 1 answers

0
rene On BEST ANSWER

You'll need a couple of supporting classes to get that XML deserialized as is, as you didn't specify any other requirements.

Your database wold have tables for your record elements and all the collections within.

Serialization classes

Those classes will hold an in memory representation of your XML. At the root will be the Api class.

[XmlRoot("apiXML")]
public class Api
{
     [XmlArray("recordList")]
     [XmlArrayItem("record", typeof(Record))]
     public List<Record> RecordList {get;set;}
}

[Serializable]
public class Record
{
    [XmlAttribute("id")]
    public int RecordId {get;set;}

    [XmlElement("id")]
    public int Id {get;set;}

    [XmlElement("administration_name")]
    public string AdministrationName {get;set;}

    [XmlElement("object_number")]
    public string ObjectNumber {get;set;}

    [XmlElement("creator")]
    public List<Creator> Creators {get;set;}

    [XmlElement("object_category")]
    public List<ObjectCategory> ObjectCategories {get;set;}

    [XmlElement("reproduction.reference")]
    public List<ReproductionReference> ReproductionReferences {get;set;}

    [XmlElement("title")]
    public List<Title> Titles {get;set;}
}

[Serializable]
public class Title:Child
{
    [XmlAttribute("invariant")]
    public bool Invariant {get;set;}

    [XmlAttribute("lang")]
    public string Culture {get;set;}

    [XmlText]
    public string Text {get;set;}
}

public class Child
{
    [XmlIgnore]
    public int ParentId {get;set;}
}

[Serializable]
public class Creator:Child
{
    [XmlText]
    public string Text {get;set;}
}

[Serializable]
public class ObjectCategory:Child
{
    [XmlText]
    public string Text {get;set;}
}

[Serializable]
public class ReproductionReference:Child
{
    [XmlText]
    public string Text {get;set;}
}

Deserialization

With the classes correctly annotated deserializing the XML only needs a couple of lines:

var ser = new XmlSerializer(typeof(Api));
var sr = new StringReader(xml);
var api = (Api) ser.Deserialize(sr);

Processing and building up tables

In the variable api we now have the in-memory object graph which you can project on a relational database schema. For normalized model you'll need the following tables:

  • Record(id, [fields in class])
  • Creator(id, ..)
  • Title(id, ...)
  • ObjectCategory(id, ...)
  • ObjectNumber (id, ...)
  • ReproductionReference(id, ...)

Between these tables you'll need link tables that follow all the same convention like the one between Record and Creator:

  • RecordCreator(RecordId, CreatorId)

I'll assume you know how to create those tables and create a connection to your database.

// use an SqlAdapter.Fill to get the below  dataset call
// sqlAdapter.Fill(ds);
var ds = new DataSet();
// this is here so you can test without a database
// test mocking code
var recTable = ds.Tables.Add("Record");
recTable.Columns.Add("Id");
recTable.Columns.Add("AdministrationName");
recTable.Columns.Add("ObjectNumber");

var creTable = ds.Tables.Add("Creator");
creTable.Columns.Add("Id", typeof(int)).AutoIncrement = true;
creTable.Columns.Add("Text");

var reccreTable = ds.Tables.Add("RecordCreator");
reccreTable.Columns.Add("RecordId");
reccreTable.Columns.Add("CreatorId");
// end mocking code

// copy object graph and build link tables
foreach(var record in api.RecordList)
{
    // each main record is created
    var rtRow = recTable.NewRow();
    rtRow["Id"] = record.Id;
    rtRow["AdministrationName"] = record.AdministrationName;
    rtRow["ObjectNumber"] = record.ObjectNumber;
    recTable.Rows.Add(rtRow);
    // handle each collection
    foreach(var creator in record.Creators)
    {
        DataRow creRow; // will hold our Creator row
        // first try to find if the Text is already there
        var foundRows = creTable.Select(String.Format("Text='{0}'", creator.Text));
        if (foundRows.Length < 1) 
        {
            // if not, add it to the Creator table
            creRow =  creTable.NewRow(); // Id is autoincrement!
            creRow["Text"] = creator.Text;
            creTable.Rows.Add(creRow);
        }
        else 
        {
            // otherwise, we found an existing one
            creRow = foundRows[0];
        }
        // link record and creator
        var reccreRow = reccreTable.NewRow();
        reccreRow["RecordId"] = record.Id;
        reccreRow["CreatorId"] = creRow["Id"];
        reccreTable.Rows.Add(reccreRow);
   } 

   // the other collections follow a similar pattern but is left for the reader
} 
// now call Update to write the changes to the db.
// SqlDataAdapter.Update(ds); 

That concludes the code and structure you'll need to store that SQL in an RDBMS database without losing information.