AngleSharp Parsing title="text"

Here is a fragment of the page's HTML:

<table cellspacing="0" id="work_outline">
<tr>
<td>
    <div class="work_genre">
                <a href="https://www.example.com/pro/fsr/=/work_category%5B/pc/category/3/from/icon.work"><span class="icon_ADL" title="14+">14+</span></a>
            </div>
  </td>
</tr>
<tr>
  <th>Product format</th>
  <td>
    <div class="work_genre" id="category_type">
      <a href="https://www.example.com/pro/works/type/=/work_type/ADV/from/icon.work"><span class="icon_ADV" title="Adventure">Adventure</span></a>&nbsp;/&nbsp;ADV      </div>
  </td>
</tr>
<tr>
    <th>Supported languages</th>
    <td>
      <div class="work_genre">
        <a href="https://www.example.com/pro/fsr/=/work_category%5B/pc/options/JPN/from/icon.work"><span class="icon_JPN" title="Japanese">Japanese</span></a>
      </div>
    </td>
  </tr>
</table>

I need to get the value of the title attribute of the span inside the work_genre div. In this case, that value is "Japanese". Since there are three elements with the work_genre class here, I need to use QuerySelectorAll. Full code:

private async void textBox1_TextChanged(object sender, EventArgs e)
{
    await DownloadFromURL(textBox1.Text, (html_obj) => { ReadHtmlData_WorkEdition(html_obj); });
}

private async Task DownloadFromURL(string url, Action<AngleSharp.Dom.IDocument> document_action)
{
    var config = Configuration.Default.WithDefaultLoader();
    var context = BrowsingContext.New(config);
    var document = await context.OpenAsync(url);
    document_action(document);          
}

private void ReadHtmlData_WorkEdition(AngleSharp.Dom.IDocument html_obj)
{
    var text = html_obj.QuerySelectorAll(".work_genre a").ToList();
    var list = text.Select(elem => elem.TextContent).ToList();
    textBox1.Text = string.Join(", ", list);
}

This code produces "14+, Adventure, Japanese", which is not what I need.

How do I get just the text from title="Japanese"?

There is 1 answer

Daniyal

If you want just the text from title="Japanese", add a separate line below var text = html_obj.QuerySelectorAll(".work_genre a").ToList(); with a selector that targets that specific element, something like this: var text2 = html_obj.QuerySelector(".icon_JPN").TextContent;
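Since the question asks for the title attribute rather than the element text, here is a minimal sketch that reads it with GetAttribute("title"). It parses a trimmed copy of the markup from the question from a string instead of a live URL. The class name TitleAttributeSketch, the helper GetTitlesAsync, and the shortened href values are assumptions made for illustration; they are not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using AngleSharp;
using AngleSharp.Dom;

public static class TitleAttributeSketch
{
    // Trimmed copy of the question's markup (href values shortened for brevity).
    public const string Html = @"
<table cellspacing=""0"" id=""work_outline"">
  <tr><td><div class=""work_genre"">
    <a href=""#""><span class=""icon_ADL"" title=""14+"">14+</span></a>
  </div></td></tr>
  <tr><th>Product format</th><td><div class=""work_genre"" id=""category_type"">
    <a href=""#""><span class=""icon_ADV"" title=""Adventure"">Adventure</span></a>
  </div></td></tr>
  <tr><th>Supported languages</th><td><div class=""work_genre"">
    <a href=""#""><span class=""icon_JPN"" title=""Japanese"">Japanese</span></a>
  </div></td></tr>
</table>";

    // Hypothetical helper: returns the title attribute of every element matching the selector.
    public static async Task<List<string>> GetTitlesAsync(string html, string selector)
    {
        var context = BrowsingContext.New(Configuration.Default);
        // Parse the HTML string directly; no network request is made.
        var document = await context.OpenAsync(req => req.Content(html));

        return document.QuerySelectorAll(selector)
            .Select(element => element.GetAttribute("title") ?? string.Empty)
            .ToList();
    }

    public static async Task Main()
    {
        // Only the span carrying the icon_JPN class.
        var japanese = await GetTitlesAsync(Html, ".work_genre span.icon_JPN");
        Console.WriteLine(string.Join(", ", japanese)); // Japanese

        // Every span with a title attribute inside a work_genre block.
        var all = await GetTitlesAsync(Html, ".work_genre span[title]");
        Console.WriteLine(string.Join(", ", all)); // 14+, Adventure, Japanese
    }
}
```

Note that the icon_JPN class is specific to Japanese, so this selector would presumably need adjusting for pages that list other languages.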

I hope this helps; I was just trying to point you in the right direction.