I've opened a Google Sheet in Colab using gspread
document = gc.open_by_url('https://docs.google.com/myspreadsheet')
sheet = elem.worksheet('Sheet1')
data = sheet.get_all_values()
df = pd.DataFrame(data)
The document contains element data and a print of head() looks like this:
0 1 ... 26 27
0 AtomicNumber Element ... NumberofShells NumberofValence
1 1 Hydrogen ... 1 1
2 2 Helium ... 1
3 3 Lithium ... 2 1
4 4 Beryllium ... 2 2
The problem I have is that when I try to reference by title, for example:
df.plot(x = 'AtomicNumber', y= 'AtomicMass', kind = 'scatter')
I get an error. I have also tried:
df.plot(x = df.AtomicNumber, y= df.AtomicMass, kind = 'scatter')
and
df.plot(x = df['AtomicNumber'], y= df['AtomicMass'], kind = 'scatter')
but I have no joy either. Unless I am using the column references like so:
df.plot(x = 0, y= 17, kind = 'scatter')
I get nothing. It will get tiring pretty fast if I have to keep referencing the .csv file to figure out which column reference I need!!
Finally, when I print:
df.columns.values
I get:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27])
I can't seem to not get this - even if I try to create a new dataframe that contains every row of df bar row index 0
I'm pretty new with this so I'm sure it's pretty simple, but I've hit an impasse... Help!
I've just figured one solution out, which I'm happy with so I'll mark this question as resolved.
The problem seems to be from the way I was creating my dataframe:
If I instead use the 'Get_all_records()' function then the dataframe is make without a seemingly non-removable column reference numbers as titles (see below)
when I print the head() of this dataframe I get:
and when I then call df.columns.values, I get:
I'm going to do a little dive into the documentation of gspread now and try to figure out what the distinction is between get_all_values and get_all_records, but I'm so happy to have figured it out! :-)