I'm scraping a page that is reached through a redirect: I visit page1 first, and it redirects to page2 via http-equiv="refresh". I'm scraping page2, whose content depends on cookies that page1 sets. I can see that page1 returns two cookies, but when I request page2 (sending the same CookieContainer), one cookie is missing. What's wrong with my code?
Thank you.
First, I create a CookieContainer and an HttpWebRequest and request page1:
CookieContainer cookiesContainer = new CookieContainer();
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(eQuery);
req.AllowAutoRedirect = true; // follows HTTP 3xx redirects only, not the meta refresh
req.CookieContainer = cookiesContainer;
This is the response I get from visiting page1:
HTTP/1.1 200 OK
Date: Tue, 12 Apr 2011 19:14:06 GMT
Server: (...)
Set-Cookie: NAME1=VALUE1; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Set-Cookie: NAME2=VALUE2; expires=Wed, 13-Apr-2011 19:14:06 GMT
Content-Length: 174
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html
(...)
Everything is fine so far: the response sets two cookies, and I get two cookie objects in the container.
Then I parse the "content" value of the meta http-equiv tag to find the next URL, and request it using similar code and the same container. But only one cookie is sent. Here is the HTTP request that is sent:
GET DETECTED_URL_IN_HTTP_EQUIV_REFRESH HTTP/1.1
User-Agent: (...)
Host: example.com
Cookie: NAME1=VALUE1
As you can see, cookie NAME2 is missing. Why is that happening? Is it related to the differences between the two cookies (one has a path and the other has an expiration date)? Any idea how I can send both cookies?
PS: I don't have access to page1, so I can't set the path or expiration of the cookies. I'm only scraping those pages.
Thank you.
If a cookie does not specify a path, it defaults to the path of the request that set it. So if you received a cookie with no path declaration from a page under /subfolder/, the client would only send that cookie back for further requests in the /subfolder/ directory. To have it sent back for all paths, the cookie needs to include path=/ when it is set.
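Since the asker cannot change how page1 sets its cookies, one possible client-side workaround is to re-add the stored cookies with an explicit path of / before requesting page2. The following is only a minimal sketch, assuming the missing cookie is being withheld because of its default path; the class name CookiePathFix and the method WidenCookiePaths are hypothetical helpers, not part of .NET.

```csharp
using System;
using System.Net;

static class CookiePathFix
{
    // Hypothetical helper: copies every cookie the container would send to
    // `source` and stores the copy again with Path = "/", so the container
    // also sends it for other paths on the same host.
    public static void WidenCookiePaths(CookieContainer container, Uri source)
    {
        // GetCookies returns a separate collection, so adding to the
        // container while iterating over it is safe.
        foreach (Cookie c in container.GetCookies(source))
        {
            var widened = new Cookie(c.Name, c.Value, "/")
            {
                Expires = c.Expires,
                Secure = c.Secure,
                HttpOnly = c.HttpOnly
            };
            // The Uri overload fills in the domain from `source`.
            container.Add(source, widened);
        }
    }
}
```

It would be called after the response for page1 has been read and before the request for page2 is created, for example CookiePathFix.WidenCookiePaths(cookiesContainer, req.RequestUri). The original narrowly scoped cookie stays in the container, so a later request to page1 itself may carry the cookie twice.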