Docker container not using service account credentials provided in json


I am trying to dockerize an app that makes an API call to BigQuery for data. I have provided the credentials .json file (trying to authenticate via an OAuth service account), but when I run the container the app starts and then asks for an auth code. When I run the same script from Jupyter on my laptop or from Cloud Functions (GCP), it uses the .json file, authenticates, and returns the data.

I intend to deploy this container to Cloud Run. What am I doing wrong here? Any help would be great!

Here is a sample method I use to make the API call to BigQuery.

PS: this is not the algorithm code, just the method I want to get working, i.e. an API call to BigQuery. I'm facing the same issue in this code too.

def pfy_algorithm_1_1():


    import pandas as pd
    import numpy as np
    import datetime
    import requests
    import json
    from pandas import json_normalize
    from google.cloud import bigquery
    from google.oauth2 import service_account
    credentials = service_account.Credentials.from_service_account_file('mylo_bigquery.json')
    project_id = 'xyz'
    client = bigquery.Client(credentials= credentials,project=project_id)

    user_data=query_big_query('''select * from dataset_id.table_id limit 5''')
   
    destination_table1 = 'dataset-id.table-id'
    if_exists='replace'
    private_key='mylo_bigquery.json'
    authcode = 'xyz1xyz23'
    
    user_data.to_gbq(destination_table = destination_table1, 
      project_id = project_id, 
      chunksize=None,  
      reauth=False, 
      if_exists=if_exists, 
      auth_local_webserver=False, 
      table_schema=None)

Dockerfile:

#setting base image
FROM python:3
#setting the working directory in the container
WORKDIR /usr/src/app

#copy the application source and requirements file into the working directory
COPY . .

#installing dependencies
RUN pip install -r requirements.txt


#port the container listens on
EXPOSE 8080

#command to run on container start
ENTRYPOINT ["python3","main.py"]

1 Answer

guillaume blaquiere

Firstly, it's not safe to package a secret inside a container image. A container is not a secure boundary, and it's very easy to open the image and extract the secret. Don't do that.

Secondly, with Cloud Run you don't need a service account key file. The metadata server provides the credentials of the service account attached to your Cloud Run service. If you haven't set one, the Compute Engine default service account is used; otherwise the one you provided is used.

The metadata server generates tokens for your API calls, and the Google Cloud client libraries know how to use it, so don't worry about that.
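
To make that concrete, here is a rough sketch of what the auth libraries do for you behind the scenes on Cloud Run. You never need to call this endpoint yourself; it is shown purely for illustration and only works from inside GCP:

import requests

# Illustration only: the Google auth libraries call the metadata server
# themselves and cache the short-lived access token.
METADATA_URL = ("http://metadata.google.internal/computeMetadata/v1/"
                "instance/service-accounts/default/token")
response = requests.get(METADATA_URL, headers={"Metadata-Flavor": "Google"})
access_token = response.json()["access_token"]  # sent as the Bearer token on API calls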

The solution in your code is to create your BigQuery client without explicit credentials and let the library get them from the runtime context:

client = bigquery.Client(project=project_id)

If you want to test your container locally, I wrote an article on that.


EDIT 1

I will try to explain:

  1. No key file? Yes, that's the principle! The best way to keep a secret secret is not to have a secret at all. The metadata server provides all the required credential information, so you don't need to worry about it on Cloud Run (or any other runtime on Google Cloud).

  2. On your workstation (or in any non-GCP environment) there is no metadata server, so you can't get credentials from it.

  • If you have read my article, you will see how to load a credential at runtime into your Docker container: mount a volume with your user credentials and set an environment variable that references the mounted file (see the example command at the end of this edit).

  • If you run your container in another (non-GCP) environment, the principle is the same, but you can't use your user credentials (it's not your workstation, so it's not under your responsibility). In that case you need a service account key file that you put on the runtime environment, and you run your container the same way: mount a volume with the service account key file and set an environment variable that references the mounted file.

The principle is still the same: don't put your secret inside the container image; it's not secure and it's bad practice.

  3. I'm not sure I follow you. The metadata server has everything you need for the service account credentials (and more).
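
For reference, a local test run along those lines might look like the command below. The image name is an assumption, and the mounted path assumes you have created user credentials with gcloud auth application-default login:

docker run --rm -p 8080:8080 \
  -v "$HOME/.config/gcloud/application_default_credentials.json":/tmp/keys/adc.json:ro \
  -e GOOGLE_APPLICATION_CREDENTIALS=/tmp/keys/adc.json \
  my-bigquery-app  # assumed image name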

EDIT 2

I tested on my side with Flask and it worked. Your issue comes from the Cloud Run container contract, which your code doesn't honour: the container must start an HTTP server listening on the port provided in the PORT environment variable.

Here is a minimal piece of code that works:

from flask import Flask
import os


app = Flask(__name__)

@app.route('/', methods=['GET'])
def pfy_algorithm_1_1():


    import pandas as pd
    import numpy as np
    import datetime
    import requests
    import json
    from pandas import json_normalize


    user_data=query_big_query('''select * from dlp_test.name limit 5''')

    destination_table1 = 'dlp_test.name3'
    if_exists='replace'

    user_data.to_gbq(destination_table = destination_table1,
                     project_id = "project_id",
                     chunksize=None,
                     reauth=False,
                     if_exists=if_exists,
                     auth_local_webserver=False,
                     table_schema=None)
    return "Message End of processing",200


def query_big_query(query):
    """ Query bigquery return result in the form of dataframe :param query: the query to be queried """

    from google.cloud import bigquery
    # No explicit credentials passed: the client picks them up from the
    # runtime (the metadata server on Cloud Run)
    client = bigquery.Client(project="project_id")
    update_query = client.query(query)
    update_iter = update_query.result()
    update_table = update_iter.to_dataframe()
    return update_table

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=int(os.environ.get('PORT', 8080)))

Simply reach the Cloud Run URL after the deployment to run the script.
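
For completeness, a deployment could look roughly like the commands below; the project id, service name, region and service account email are placeholders, not values taken from the question:

# build the image and push it to the container registry
gcloud builds submit --tag gcr.io/your-project-id/pfy-app

# deploy to Cloud Run; --service-account attaches the identity whose
# credentials the metadata server will hand to your code
gcloud run deploy pfy-app \
  --image gcr.io/your-project-id/pfy-app \
  --region us-central1 \
  --allow-unauthenticated \
  --service-account bigquery-runner@your-project-id.iam.gserviceaccount.com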

Don't forget to add Flask to your dependencies.
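
For example, a requirements.txt covering the imports used above might look like this (package names inferred from the imports; unpinned here for brevity, pin versions in a real build):

flask
pandas
pandas-gbq
google-cloud-bigquery
numpy
requests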