I have a list of tables in a column of a csv that I have read in as a dataframe into pandas like so:

df = pd.read_csv('~/tablenames')

tablenames csv looks like this:


The connection to the db and iterating through the dataframe is created like so:

import pandas as pd
from sqlalchemy import create_engine

df = pd.read_csv('~/tablenames')

engine = create_engine('connection_string')

 for index,row in df.iterrows():
        df['column_count'] = pd.read_sql_query("select count(column_name) from information_schema.columns where table_name = '%s'"%row['table_name'],con=engine)
        df['row_count'] = pd.read_sql_query("select count(*) from %s"%row['table_name'],con=engine)

The above loop is only getting the first tables data and is only executing the query for that first table in the dataframe. What am I doing wrong where the query is not executing for the rest of the table names in the dataframe?

I am not getting any error messages. If it helps I am connecting to a postgres db.


0     45
0     89
0     36
0     17
0     12

2 Answers

Community On

Try pandas concat

import glob
import pandas as pd

# glob.glob('table*.csv') - returns List[str]
# for f in glob.glob() - returns a List[DataFrames]

df = pd.concat([pd.read_csv(f) for f in glob.glob('table*.csv')], ignore_index = True)

Wen-Ben On

Change your for loop to

 #also here add .iloc[0]
 for index,row in df.iterrows():
        df.loc[index,'column_count'] = pd.read_sql_query("select count(column_name) from information_schema.columns where table_name = '%s'"%row['table_name'],con=engine).iloc[0]
        df.loc[index,'row_count'] = pd.read_sql_query("select count(*) from %s"%row['table_name'],con=engine).iloc[0]


for index,row in df.iterrows():
     updatedf=pd.read_sql_query("select * from information_schema.columns where table_name = '%s'"%row['table_name'],con=engine)   
     df.loc[index,'column_count'] = updatedf.shape[1]
     df.loc[index,'row_count'] = updatedf.shape[0]