I'm trying to answer this Udacity question: https://www.udacity.com/course/viewer#!/c-st101/l-48696651/e-48532778/m-48635592
I like Python & Pandas so I'm using Pandas (version 0.14)
I have this DataFrame:
df = pd.DataFrame(dict(size=(1400, 2400, 1800, 1900, 1300, 1100),
                       cost=(112000, 192000, 144000, 152000, 104000, 88000)))
I added a value of 2,100 square feet to my DataFrame (notice there is no cost; that is the question: what would you expect to pay for a house of 2,100 sq ft?):
df = df.append(pd.DataFrame({'size': (2100,)}), ignore_index=True)
The question wants you to answer what cost/price you expect to pay, using linear interpolation.
Can Pandas interpolate? And how?
I tried this:
df.interpolate(method='linear')
But it gave me a cost of 88,000, which is just the last cost value repeated.
I tried this:
df.sort('size').interpolate(method='linear')
But it gave me a cost of 172,000, which is just halfway between the costs of 152,000 and 192,000. Closer, but not what I want. The correct answer is 168,000 (because there is a "slope" of $80/sqft).
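As a sanity check on those two numbers, here is a small sketch using NumPy's np.interp rather than pandas. The variable names are only illustrative. It confirms that every segment of the data has the same $80/sqft slope, that interpolating over size gives 168,000, and that 172,000 is simply the midpoint of the two neighbouring costs.

```python
import numpy as np

# The known houses from the DataFrame above, sorted by size.
sizes = np.array([1100, 1300, 1400, 1800, 1900, 2400])
costs = np.array([88000, 104000, 112000, 144000, 152000, 192000])

# Cost per square foot between consecutive houses.
slopes = np.diff(costs) / np.diff(sizes)
print(slopes)

# Interpolating cost as a function of size (what the question asks for).
expected = np.interp(2100, sizes, costs)
print(expected)

# Ignoring size and taking the midpoint of the two neighbouring costs,
# which matches the 172,000 result seen after sorting.
midpoint = (152000 + 192000) / 2
print(midpoint)
```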
EDIT:
I checked these SO questions:
- Interpolation on DataFrame in pandas
  - Demonstrates "1D" linear interpolation; that gives me the wrong answer
- Pandas interpolate data with units
  - Demonstrates what I needed, "2D" linear interpolation, but this question is focused on the Python quantities library.
Pandas' method='linear' interpolation will do what I call "1D" interpolation: it treats the values as equally spaced and ignores the index.
If you want to interpolate a "dependent" variable over an "independent" variable, make the "independent" variable the Index of a Series, and use method='index' (or method='values'; they're the same).
In other words:
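A minimal sketch of that idea, assuming the data from the question with the 2,100 sq ft house already included and its cost left as NaN. The names sizes, costs, s, and interpolated are my own choices for illustration.

```python
import numpy as np
import pandas as pd

# Sizes are the "independent" variable, costs the "dependent" one.
# The last house (2,100 sq ft) has no known cost yet.
sizes = [1400, 2400, 1800, 1900, 1300, 1100, 2100]
costs = [112000, 192000, 144000, 152000, 104000, 88000, np.nan]

# Put size in the index and sort by it so the neighbours are adjacent.
s = pd.Series(costs, index=sizes).sort_index()

# method='index' interpolates using the numeric index values as x-coordinates.
interpolated = s.interpolate(method='index')
print(interpolated.loc[2100])
```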
This returns the correct answer 168,000
This is not clear to me from the example in the Pandas documentation, where the Series' data and index are the same list of values.