Retrieve Google Sites' Domain Index feed using OAuth 2.0 with a Service Account


I need to crawl this URL with a Python script: https://sites.google.com/a/domain.com/sites/system/app/pages/meta/domainIndex

How can I authorize requests to this Google Sites URL using OAuth 2.0 with a service account?

With the older approach (ClientLogin, which I have been calling OAuth 1.0), we sent a request to https://www.google.com/accounts/ClientLogin, extracted the token from the response, and used it to authorize the request to the URL.

OAuth 1.0 (ClientLogin) authentication

import urllib  # Python 2; on Python 3 use urllib.parse.urlencode instead

url = 'https://www.google.com/accounts/ClientLogin'
request = urllib.urlencode({
    'accountType': 'HOSTED',
    'Email': '[email protected]',
    'Passwd': 'userPassword',
    'service': 'jotspot'})

#.. Fetch the url: https://www.google.com/accounts/ClientLogin and extract the token

headers = {
    'Authorization': 'GoogleLogin auth=' + token,
    'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.9.2.15) Gecko/20110303 Firefox/3.6.15 ( .NET CLR 3.5.30729)',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Host': 'sites.google.com',
    'Connection': 'keep-alive'
}

# .. Fetch the Google Sites URL with the headers above
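
The token-extraction step that the snippet above leaves out can be sketched roughly as follows. ClientLogin answered with a plain-text body of key=value lines (SID, LSID and Auth), and the Auth value is what goes into the GoogleLogin header. The helper names and the sample body here are hypothetical, and since Google has shut ClientLogin down this only illustrates the old flow.

```python
def parse_clientlogin_response(body):
    """Parse the key=value lines of a ClientLogin response body into a dict."""
    fields = {}
    for line in body.splitlines():
        # Split on the first '=' only, since token values may contain '='.
        key, sep, value = line.partition('=')
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def build_auth_headers(token):
    """Build the Authorization header used for the Sites request."""
    return {'Authorization': 'GoogleLogin auth=' + token}


# Illustrative response body (assumption); real tokens are much longer.
sample_body = "SID=aaa\nLSID=bbb\nAuth=ccc\n"
token = parse_clientlogin_response(sample_body)['Auth']
headers = build_auth_headers(token)
```

The remaining headers from the question (User-Agent, Accept and so on) would be merged into this dict before fetching the page.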

There is 1 answer

Answered by Andrzej Pronobis

Some time ago I wrote a class for myself to handle OAuth2 authentication with Google APIs. It might serve as an example for you. And, yes, you need to register an "application", but that is just to get the client id/secret.

Some notes:

  • The class uses the 'offline' access type to obtain credentials that can be stored and re-used in the future.

  • The settings variable holds an instance of a class that stores all my settings, in this case the previously obtained Google API credentials.

  • GAuthDialog is a dialog that presents the user with the username/password login and reads the authorization code Google produces.

  • The execute method wraps any method that requires access to the API and authentication.
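
To make the 'offline' access type and the dialog step more concrete, here is a minimal sketch of the kind of authorization URL such a dialog would open. It is not what oauth2client generates internally; the endpoint constant, the function name and the client id are assumptions for illustration, and only the query parameters matter here.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Google's OAuth 2.0 authorization endpoint.
AUTH_ENDPOINT = 'https://accounts.google.com/o/oauth2/v2/auth'


def build_authorize_url(client_id, scope,
                        redirect_uri='urn:ietf:wg:oauth:2.0:oob'):
    """Return the URL the user must visit to grant access (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'redirect_uri': redirect_uri,
        'scope': scope,
        'response_type': 'code',      # ask for an authorization code
        'access_type': 'offline',     # also ask for a refresh token
    }
    return AUTH_ENDPOINT + '?' + urlencode(params)


# 'my-client-id' is a placeholder, not a real client id.
url = build_authorize_url('my-client-id',
                          'https://www.googleapis.com/auth/drive.readonly')
query = parse_qs(urlparse(url).query)
```

With access_type set to offline, the code exchanged afterwards yields a refresh token, which is what makes the stored credentials reusable later.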

The class can be used as follows, e.g. for Google Drive:

self._gapi = GApi(settings, 'https://www.googleapis.com/auth/drive.readonly', 'drive', 'v2')
self._gapi.execute(self.get_file_list)

and then we have:

def get_file_list(self):
    query = "query"
    children = self._gapi.service.files().list(q=query).execute()
    # Return the result so that execute() can pass it back to the caller
    return children
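
The call pattern above, where execute runs a method and retries it once after re-authenticating, can be sketched independently of the Google libraries. Everything in this example (AuthExpired, RetryingClient, list_files, the token strings) is a hypothetical stand-in and not part of the class shown in this answer.

```python
class AuthExpired(Exception):
    """Hypothetical stand-in for an expired-token error."""


class RetryingClient:
    def __init__(self):
        self.auth_count = 0
        self.token = None

    def _authenticate(self):
        # Stand-in for a real login: just mint a new fake token.
        self.auth_count += 1
        self.token = 'token-%d' % self.auth_count

    def execute(self, method, *args, **kwargs):
        if self.token is None:
            self._authenticate()
        try:
            return method(*args, **kwargs)
        except AuthExpired:
            # Token rejected: obtain a fresh one and retry exactly once,
            # forwarding the same positional and keyword arguments.
            self._authenticate()
            return method(*args, **kwargs)


client = RetryingClient()
calls = []


def list_files(folder):
    calls.append(client.token)
    if client.token == 'token-1':
        raise AuthExpired()  # simulate a stale first token
    return ['%s/a.txt' % folder]


result = client.execute(list_files, 'docs')
```

The caller never sees the first failure; it only gets the result of the second attempt, which is the point of routing every API call through a single wrapper.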

And here comes the code of the class:

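from oauth2client.client import OAuth2WebServerFlow, Credentials, AccessTokenRefreshError
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
import httplib2
# Assumption: QtWidgets comes from PyQt5. GAuthDialog is my own dialog class
# (not shown here) and must be imported from wherever it is defined.
from PyQt5 import QtWidgets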

class GApi():

    class CredentialsError(Exception):
        pass

    # App credentials from developers console
    __client_id = ''
    __client_secret = ''

    # Redirect URI for installed (non-web) apps
    __redirect_uri = 'urn:ietf:wg:oauth:2.0:oob'

    def __init__(self, settings, scopes, service_name, version):
        self.__settings = settings
        self.__scopes = scopes
        self.__service_name = service_name
        self.__version = version
        self.__service = None
        self.__credentials = None
        # Try restoring credentials from settings
        if self.__settings.get_gapi_credentials(self.__service_name):
            self.__credentials = Credentials.new_from_json(
                self.__settings.get_gapi_credentials(self.__service_name))

    @property
    def service(self):
        return self.__service

    def execute(self, method, *args, **kwargs):
        self.__setup()
        try:
            return method(*args, **kwargs)
        except AccessTokenRefreshError:
            pass  # Will re-authenticate below
        except HttpError as err:
            # Rethrow since HttpError has a bug in str()
            raise Exception("Response: %s, Content: %s" %
                            (str(err.resp), str(err.content)))
        # Try re-authenticating
        self.__reauthenticate()
        try:
            # Forward the positional arguments on the retry as well
            return method(*args, **kwargs)
        except HttpError as err:
            # Rethrow since HttpError has a bug in str()
            raise Exception("Response: %s, Content: %s" %
                            (str(err.resp), str(err.content)))

    def __obtain_credentials(self):
        # Initialize the flow
        flow = OAuth2WebServerFlow(self.__client_id, self.__client_secret,
                                   self.__scopes, redirect_uri=self.__redirect_uri)
        flow.params['access_type'] = 'offline'
        # Run through the OAuth flow and retrieve credentials
        uri = flow.step1_get_authorize_url()
        # Get code from dialog
        dialog = GAuthDialog(uri)
        if dialog.exec() == QtWidgets.QDialog.Accepted and dialog.auth_code:
            # Get the new credentials
            self.__credentials = flow.step2_exchange(dialog.auth_code)
            # Set them in settings
            self.__settings.set_gapi_credentials(
                self.__service_name, self.__credentials.to_json())
        else:
            self.__credentials = None
            self.__settings.set_gapi_credentials(self.__service_name, None)

    def __reauthenticate(self):
        self.__credentials = None
        self.__service = None
        self.__setup()

    def __setup(self):
        # Do we have credentials?
        if not self.__credentials:
            self.__obtain_credentials()
        # Check if we got credentials
        if self.__credentials:
            # Do we have service?
            if not self.__service:
                # Create an httplib2.Http object and authorize it with our credentials
                http = httplib2.Http()
                http = self.__credentials.authorize(http)
                self.__service = build(self.__service_name,
                                       self.__version, http=http)
        else:
            raise GApi.CredentialsError
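
The settings object is not shown in this answer; the class only relies on its get_gapi_credentials and set_gapi_credentials methods. A minimal sketch of a compatible store, assuming a plain JSON file is acceptable, could look like the following. JsonSettings, the 'gapi_credentials' key and the file layout are hypothetical.

```python
import json
import os
import tempfile


class JsonSettings:
    """Hypothetical settings store backed by a JSON file."""

    def __init__(self, path):
        self._path = path

    def _load(self):
        if not os.path.exists(self._path):
            return {}
        with open(self._path) as f:
            return json.load(f)

    def get_gapi_credentials(self, service_name):
        # Returns the stored credentials JSON string, or None if absent.
        return self._load().get('gapi_credentials', {}).get(service_name)

    def set_gapi_credentials(self, service_name, credentials_json):
        data = self._load()
        creds = data.setdefault('gapi_credentials', {})
        if credentials_json is None:
            creds.pop(service_name, None)  # None clears the stored entry
        else:
            creds[service_name] = credentials_json
        with open(self._path, 'w') as f:
            json.dump(data, f)


# Round trip: store, reload from disk, then clear.
path = os.path.join(tempfile.mkdtemp(), 'settings.json')
settings = JsonSettings(path)
settings.set_gapi_credentials('drive', '{"refresh_token": "example"}')
restored = JsonSettings(path).get_gapi_credentials('drive')
settings.set_gapi_credentials('drive', None)
cleared = JsonSettings(path).get_gapi_credentials('drive')
```

Since the stored JSON contains a refresh token, a real application would probably want to restrict the file's permissions or use the platform's keyring rather than a plain file.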