I'm a complete newbie at Python and Selenium (I just started a week ago), so please excuse the mess of a code. I'm trying to extract the 'structure_id' and 'd' attributes from every element with tag name `path` on this website and store each of them in a separate SVG file. Here is a snippet of the code I'm having problems with:
for number in range(1, 106):
    try:
        element = WebDriverWait(driver, 10).until(
            EC.presence_of_all_elements_located((By.ID, 'master_group'))
        )
        selected = driver.find_element_by_class_name('simstripImgSel')
        driver.get(driver.current_url)
        paths = driver.find_elements_by_tag_name('path')
        for path in paths:
            while True:
                try:
                    structure = path.get_attribute('structure_id')
                    d = path.get_attribute('d')
                    break
                except Exception as e:
                    print(e)
                    paths = driver.find_elements_by_tag_name('path')
                    continue
            if structure != None:
                print('Attributes copied.')
                for word, initial in data.items():
                    structure = structure.replace(word, initial)
                filepath = Path('C:\\Users\\justs\\Downloads\\Ordered_svgs\\slice'+str(number)+'\\'+str(structure)+'.svg')
                if filepath.is_file():
                    text = open('C:\\Users\\justs\\Downloads\\Ordered_svgs\\slice'+str(number)+'\\'+str(structure)+'.svg', 'r+')
                    rep = text.read()
                    rep = rep.replace('</svg>', '<path id="')
                    text.close()
                    os.remove('C:\\Users\\justs\\Downloads\\Ordered_svgs\\slice'+str(number)+'\\'+str(structure)+'.svg')
                    time.sleep(0.2)
                    text = open('C:\\Users\\justs\\Downloads\\Ordered_svgs\\slice'+str(number)+'\\'+str(structure)+'.svg', 'w+')
                    text.write(rep+str(structure)+'" d="'+str(d)+'"></path></svg>')
                    text.close()
                    print('File '+str(structure)+' modified in slice '+str(number)+'!')
                else:
                    svg = open('C:\\Users\\justs\\Downloads\\Ordered_svgs\\slice'+str(number)+'\\'+str(structure)+'.svg', 'w+')
                    svg.write('<svg id="the_svg_wrapper" width="100%" height="100%" xmlns="http://www.w3.org/2000/svg"><path id="'+str(structure)+'" d="'+str(d)+'"></path></svg>')
                    svg.close()
                    print('File '+str(structure)+' made in slice '+str(number)+'!')
        selected.send_keys('F')
        paths = 0
        print()
    except Exception as e:
        print('Error.')
        print(e)
        break
print('Done!')
driver.quit()
This works fine for the first page, but I need to extract paths from all 106 pages, and after pressing 'F' once (which moves on to the next page) I get a stale element reference at the line structure = path.get_attribute('structure_id'). Initially I thought the paths took some time to load, hence the while loop, but by the second page it gets stuck in never-ending stale element references.
Explicit waits and refreshing the page didn't work either. I suspect the WebElement returned by driver.find_element_by_class_name isn't updating at all: when I refreshed the page after moving on to the next page, the files I extracted turned out to be the same as on the first page, and I still got a stale element reference by page 5. How do I solve this? Any help is appreciated!
You looped the URL: driver.get(driver.current_url) reloads the page, so every iteration took you back to page 1. Drop that line, and re-find your elements after each page change instead of holding on to the old references; any WebElement located before the DOM is replaced becomes stale afterwards.