C# AngleSharp Attribute Parsing

How do I read attributes from multiple elements when QuerySelectorAll().GetAttribute() doesn't work?

Example markup from the website:

<div class="product-slider">
  <!-- Sample image data -->
  <div ref="product_slider_data" class="product-slider-data">
    <div data-src="//img.site/modpub/images2/img_main.jpg" data-width="560" data-height="420" data-thumb="//img.site/resize/images2/img_main_240x240.jpg"></div>
    <div data-src="//img.site/modpub/images2/img_smp1.jpg" data-width="800" data-height="600" data-thumb="//img.site/resize/images2/img_smp1_100x100.jpg"></div>
    <div data-src="//img.site/modpub/images2/img_smp2.jpg" data-width="800" data-height="600" data-thumb="//img.site/resize/images2/img_smp2_100x100.jpg"></div>
    <div data-src="//img.site/modpub/images2/img_smp3.jpg" data-width="815" data-height="623" data-thumb="//img.site/resize/images2/img_smp3_100x100.jpg"></div>
    <div data-src="//img.site/modpub/images2/img_smp4.jpg" data-width="815" data-height="623" data-thumb="//img.site/resize/images2/img_smp4_100x100.jpg"></div>
  </div>

I need to get all of the links stored in the "data-src" attributes.

I parse ordinary text content like this:

        // Requires: using System; using System.Linq; using System.Threading.Tasks; using AngleSharp;
        private async Task DownloadFromURL(string url, Action<AngleSharp.Dom.IDocument> document_action)
        {
            // Default configuration plus a loader, so OpenAsync can fetch remote pages
            var config = Configuration.Default.WithDefaultLoader();
            var context = BrowsingContext.New(config);
            var document = await context.OpenAsync(url);
            document_action(document);
        }

        private void ReadHtmlData_EN(AngleSharp.Dom.IDocument html_obj)
        {
            // Every div inside the slider data container
            var images = html_obj.QuerySelectorAll(".product-slider-data div");

            // Text content of each element, joined into a single string
            string[] images_texts_arr = images.Select(elem => elem.TextContent).ToArray();

            string images_str = string.Join(", ", images_texts_arr);
            postTextBox1.Text = images_str;
        }

But reading an attribute the same way does not work:

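        // Does not compile: QuerySelectorAll returns a collection, and GetAttribute is defined on a single IElement
        var images = html_obj.QuerySelectorAll(".product-slider-data div").GetAttribute("data-src");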

What should I do instead?

Answer by K14M

First select all matching elements with QuerySelectorAll, then loop over the results and call GetAttribute on each element:

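        var images = html_obj.QuerySelectorAll(".product-slider-data div");
        // Use int, not byte: a byte index wraps around after 255, so the loop would never end on a page with more than 255 matches
        for (int i = 0; i < images.Length; i++)
        {
            var attr = images[i].GetAttribute("data-src");
            if (attr != null) // GetAttribute returns null when the element has no such attribute
                postTextBox1.Text += attr + "\r\n";
        }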
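A more compact variant of the same idea, sketched under the assumption of the AngleSharp NuGet package (0.10 or later) and C# 9 top-level statements: instead of indexing in a loop, project each element with LINQ's Select, and use the [data-src] attribute selector so only elements that actually carry the attribute are matched. The markup is an abridged copy of the question's sample, loaded from a string with OpenAsync(req => req.Content(...)) so no network request is needed; the same queries should work on the document passed in by DownloadFromURL. Because the title asks about multiple attributes, the second query reads data-src, data-width and data-height from each div into a tuple.

```csharp
// Sketch: assumes the AngleSharp NuGet package (0.10+) and C# 9 top-level statements.
using System;
using System.Linq;
using AngleSharp;

// Abridged copy of the question's sample markup.
const string html = @"
<div class=""product-slider-data"">
  <div data-src=""//img.site/modpub/images2/img_main.jpg"" data-width=""560"" data-height=""420""></div>
  <div data-src=""//img.site/modpub/images2/img_smp1.jpg"" data-width=""800"" data-height=""600""></div>
</div>";

// Load the markup from a string instead of a URL, so no network access is needed.
var context = BrowsingContext.New(Configuration.Default);
var document = await context.OpenAsync(req => req.Content(html));

// [data-src] matches only divs that carry the attribute,
// so GetAttribute does not return null for any of them.
var divs = document.QuerySelectorAll(".product-slider-data div[data-src]");

// One attribute per element: project each element with Select.
var links = divs.Select(div => div.GetAttribute("data-src")).ToArray();
Console.WriteLine(string.Join(", ", links));

// Several attributes per element: project each element into a named tuple.
var images = divs
    .Select(div => (
        Src: div.GetAttribute("data-src"),
        Width: int.Parse(div.GetAttribute("data-width") ?? "0"),
        Height: int.Parse(div.GetAttribute("data-height") ?? "0")))
    .ToList();

foreach (var img in images)
    Console.WriteLine($"{img.Src} ({img.Width}x{img.Height})");
```

int.Parse throws on non-numeric input; if the site's markup is not guaranteed to be clean, int.TryParse is the safer choice for the width and height values.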