Web page already open (in source format); just need to read that text, using Selenium


Let's say I have a tab already open in the browser. Its URL is:

view-source:http://www.google.com/webhp?source=search_app

Since it's already open and displayed, I just want to read the text shown in the client window. That is, get a handle on the existing page or obtain its object (as opposed to creating a new browser object), then just read the page.

Is there any way to do that in Selenium or Splinter? Thanks for any help.


There are 3 answers

0
Robbie Wareham On BEST ANSWER

If you are asking if you can attach to an already open browser, then I believe the answer is "No".

0
lefloh On

You can get the source of the page directly with Selenium: WebDriver.getPageSource().

But if you use view-source:url, the browser presents an HTML page that contains the formatted source. Firefox, for example, wraps each line in a <span id="lineX"></span>. Instead of parsing this, just use getPageSource without view-source.

Please read the documentation of getPageSource carefully:

Get the source of the last loaded page. If the page has been modified after loading (for example, by Javascript) there is no guarantee that the returned text is that of the modified page. Please consult the documentation of the particular driver being used to determine whether the returned text reflects the current state of the page or the text last sent by the web server. The page source returned is a representation of the underlying DOM: do not expect it to be formatted or escaped in the same way as the response sent from the web server. Think of it as an artist's impression.
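As a rough illustration of the advice above, here is a minimal Python sketch (the question mentions Splinter, so Python seemed a reasonable choice). It assumes Selenium's Python bindings and Firefox with its driver are installed; in those bindings the counterpart of getPageSource() is the page_source property. The helper names strip_view_source and fetch_page_source are made up for this example. Note that it opens a new browser session and does not attach to the tab that is already open.

```python
VIEW_SOURCE_PREFIX = "view-source:"


def strip_view_source(url: str) -> str:
    """Return the URL without a leading 'view-source:' prefix (hypothetical helper)."""
    if url.startswith(VIEW_SOURCE_PREFIX):
        return url[len(VIEW_SOURCE_PREFIX):]
    return url


def fetch_page_source(url: str) -> str:
    """Load the plain URL in a new Firefox session and return the page source.

    Assumption: Selenium and a Firefox driver are available on this machine.
    """
    # Imported here so strip_view_source can be used without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        driver.get(strip_view_source(url))
        # page_source is the Python counterpart of WebDriver.getPageSource().
        return driver.page_source
    finally:
        driver.quit()
```

Calling fetch_page_source("view-source:http://www.google.com/webhp?source=search_app") would load the plain URL and return a representation of the DOM, subject to the caveats in the documentation quoted above.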

0
coding_idiot On

This is what I used to do:

  1. Ask Selenium to open a browser
  2. Show a popup/message window to pause execution
  3. Open the URL in the browser and perform all the related operations manually
  4. When I'm done (i.e. on the target page), I click OK on the popup and the code resumes, performing the extraction or other tasks on the page currently open in the browser.
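The steps above could be sketched in Python roughly as follows. This is not the answerer's actual code: it swaps the popup for a console prompt via input(), and the function name read_after_manual_navigation and its pause parameter are invented for illustration. It assumes a driver object that exposes page_source, as Selenium's Python WebDriver does.

```python
def read_after_manual_navigation(driver, pause=input):
    """Block until the user confirms, then read the page currently shown.

    driver: an object with a page_source attribute (e.g. a Selenium WebDriver).
    pause:  a callable that blocks until the user is done; defaults to input().
    """
    # Step 2: pause execution while the user works in the browser by hand.
    pause("Navigate to the target page in the browser, then press Enter... ")
    # Step 4: execution resumes here; read whatever page is now loaded.
    return driver.page_source


# Usage sketch (assumes Selenium and a Firefox driver are installed):
#
#     from selenium import webdriver
#     driver = webdriver.Firefox()      # step 1: Selenium opens the browser
#     html = read_after_manual_navigation(driver)
#     driver.quit()
```

Because the browser was started by Selenium in step 1, the script keeps control of the session while the navigation is done by hand.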