How to split unstable-numbered CSV in SQL

75 views Asked by At

Currently, I am working on a project which is an application for Google Scholar analysis.

The basic specifications are ; parsing the title, author/s, publication venue... of the paper.

So, I can parse the HTML via C#, then store the variables in a SQL Server database.

Here is the problem, when I parse the author/s part it returns me something like that:

R Feynman, L Krauss.

or

R Feynman

or

R Feynman, L Krauss, C Sagan

I have read the answers about the Comma Separated Values on SQL Server. However, in my case, value for author/s column is not stable. What I want is; store author name/s in another table and resulting query should be like that:

R Feynman, L Krauss, NULL, NULL

R Feynman, NULL, NULL, NULL

R Feynman, L Krauss, C Sagan, NULL

If you have any suggestion please let me know.

Thank you!

Added after Edit:

I am taking the HTML partially from Google Scholar and show related field to user in some ListBox;

WebClient web = new WebClient();

string URL = URLTextBox.Text;

string page = web.DownloadString(URL);

string publicationName = "class=\"gsc_a_at\">(.*?)</a><div class=\"gs_gray\">(.*?)</div><div class=\"gs_gray\">(.*?)<span class=\"gs_oph\">, (.*?)</span></div></td><td class=\"gsc_a_c\"><a href=\"(.*?)\" class=\"gsc_a_ac\">(.*?)</a></td><td class=\"gsc_a_y\"><span class=\"gsc_a_h\">(.*?) </span></td></tr><tr class=\"gsc_a_tr\"><td class=\"gsc_a_t\"><a href=\"(.*?)\"";

foreach(Match match in Regex.Matches(page, publicationName))

{

listBox1.Items.Add(match.Groups[1].Value);

listBox2.Items.Add(match.Groups[2].Value);

listBox3.Items.Add(match.Groups[3].Value);

listBox4.Items.Add(match.Groups[4].Value);

listBox5.Items.Add(match.Groups[5].Value);

listBox6.Items.Add(match.Groups[6].Value);

}

Then I took the values from listbox to database.

using (SqlCommand command = new SqlCommand("insert into tbl_Titl values(@val) insert into tbl_Auth values(@val2) insert into tbl_Publ values(@val3) insert into tbl_Year values(@val4) insert into tbl_Link values(@val5) insert into tbl_Cita values(@val6)", connection))

{

                    for (int i = 0; i < listBox1.Items.Count; i++)

                    { 

                        command.Parameters.Clear();

                        command.Parameters.AddWithValue("@val", listBox1.Items[i]);

                        command.Parameters.AddWithValue("@val2", listBox2.Items[i]);

                        command.Parameters.AddWithValue("@val3", listBox3.Items[i]);

                        command.Parameters.AddWithValue("@val4", listBox4.Items[i]);

                        command.Parameters.AddWithValue("@val6", listBox6.Items[i]);

                        command.ExecuteNonQuery();

                   }

               }          

After these steps, joining the tables in database and in a query having all the result.

It is edited at the request of DJ KRAZE

0

There are 0 answers