Exctract data from a web site using inspect element

87 views Asked by At

I need to exctract a list of Accession number. I am a PhD student in biology working with GEOdatabase on NCBI website which give me datasets of genes. Each dataset possesses an Accession number, generally starting by "GSE" followed by numbers. I would like to exctract the list of Accession number present in the page after my research.

Here is a screen shot of what I would like to exctract (highlighted in yellow), from the page : https://www.ncbi.nlm.nih.gov/gds/?term=brain.

enter image description here

Is it possible to exctract it, writting a script via the console when I use inspect element ? Or any other idea ?

Sorry if I don't use correctly any of the previous term , I am not a dev.

Thank you for your help !

2

There are 2 answers

0
obscure On BEST ANSWER

That's rather easy. If we look up the HTML of that page, we can see that Series Accession: ... ID: ... results are wrapped in <div> elements with a css class resc.

To obtain those:

Array.from(document.getElementsByClassName("resc"))

Looking further, the actual results are wrapped in a pair of <dd> elements, where the first element holds the Accesion number.

So it's just going over all the <div class="resc"> elements and log the first child <dd> element's text - which can be retrieved using the .innerText property.

Executing the following line will output all the numbers to the console:

Array.from(document.getElementsByClassName("resc")).forEach(result => {console.log(result.getElementsByTagName("dd")[0].innerText)})
0
edou maria On

There is a scraping library named Beautifulsoup for this case https://www.crummy.com/software/BeautifulSoup/bs4/doc/

You can right clik and inspect the yellow marks and get the name of the class and build your python script.