Currently, I am working on a project which is an application for Google Scholar analysis.
The basic specifications are ; parsing the title, author/s, publication venue... of the paper.
So, I can parse the HTML via C#, then store the variables in a SQL Server database.
Here is the problem, when I parse the author/s part it returns me something like that:
R Feynman, L Krauss.
or
R Feynman
or
R Feynman, L Krauss, C Sagan
I have read the answers about the Comma Separated Values on SQL Server. However, in my case, value for author/s column is not stable. What I want is; store author name/s in another table and resulting query should be like that:
R Feynman, L Krauss, NULL, NULL
R Feynman, NULL, NULL, NULL
R Feynman, L Krauss, C Sagan, NULL
If you have any suggestion please let me know.
Thank you!
Added after Edit:
I am taking the HTML partially from Google Scholar and show related field to user in some ListBox;
WebClient web = new WebClient();
string URL = URLTextBox.Text;
string page = web.DownloadString(URL);
string publicationName = "class=\"gsc_a_at\">(.*?)</a><div class=\"gs_gray\">(.*?)</div><div class=\"gs_gray\">(.*?)<span class=\"gs_oph\">, (.*?)</span></div></td><td class=\"gsc_a_c\"><a href=\"(.*?)\" class=\"gsc_a_ac\">(.*?)</a></td><td class=\"gsc_a_y\"><span class=\"gsc_a_h\">(.*?) </span></td></tr><tr class=\"gsc_a_tr\"><td class=\"gsc_a_t\"><a href=\"(.*?)\"";
foreach(Match match in Regex.Matches(page, publicationName))
{
listBox1.Items.Add(match.Groups[1].Value);
listBox2.Items.Add(match.Groups[2].Value);
listBox3.Items.Add(match.Groups[3].Value);
listBox4.Items.Add(match.Groups[4].Value);
listBox5.Items.Add(match.Groups[5].Value);
listBox6.Items.Add(match.Groups[6].Value);
}
Then I took the values from listbox to database.
using (SqlCommand command = new SqlCommand("insert into tbl_Titl values(@val) insert into tbl_Auth
values(@val2) insert into tbl_Publ values(@val3) insert into tbl_Year values(@val4) insert into
tbl_Link values(@val5) insert into tbl_Cita values(@val6)", connection))
{
for (int i = 0; i < listBox1.Items.Count; i++)
{
command.Parameters.Clear();
command.Parameters.AddWithValue("@val", listBox1.Items[i]);
command.Parameters.AddWithValue("@val2", listBox2.Items[i]);
command.Parameters.AddWithValue("@val3", listBox3.Items[i]);
command.Parameters.AddWithValue("@val4", listBox4.Items[i]);
command.Parameters.AddWithValue("@val6", listBox6.Items[i]);
command.ExecuteNonQuery();
}
}
After these steps, joining the tables in database and in a query having all the result.
It is edited at the request of DJ KRAZE