Data science Python error: ValueError: x and y must have same first dimension


I am doing some statistical analysis in Python; however, I am new to the field and have been stuck on an error.

For background, I am computing a set of sample_means for each sample size, 200 times. I am then calculating the mean and standard deviation for each sample size, which are then stored in arrays. This is my code:

in[] = 
sample_sizes = np.arange(1,1001,1)
number_of_samples = 200
mean_of_sample_means = []
std_dev_of_sample_means = []
for x in range (number_of_samples):
    mean_of_sample_means.append(np.mean(sample_sizes))
    std_dev_of_sample_means.append(np.std(sample_sizes))

in[] = # mean and std of 200 means from 200 replications, each of size 10
trials[0], mean_of_sample_means[0], std_dev_of_sample_means[0] 

out[] = (10, 500.5, 288.67499025720952)

I am now trying to plot the data with the following input:

plt.plot(sample_sizes, mean_of_sample_means);
plt.ylim([0.480,0.520]);
plt.xlabel("sample sizes")
plt.ylabel("mean probability of heads")
plt.title("Mean of sample means over 200 replications");

However when I do, I get thrown the following error:

242         if x.shape[0] != y.shape[0]:
243             raise ValueError("x and y must have same first dimension, but "
--> 244                              "have shapes {} and {}".format(x.shape, y.shape))
245         if x.ndim > 2 or y.ndim > 2:
246             raise ValueError("x and y can be no greater than 2-D, but have "

ValueError: x and y must have same first dimension, but have shapes (1000,) and (200,)

Any thoughts on where I am going wrong? I feel like it's probably something obvious that I'm not seeing, as I am new to this. Any help would be appreciated!


There is 1 answer

sascha On

This line:

plt.plot(sample_sizes, mean_of_sample_means)

needs both arguments to have the same shape, because every x value needs a matching y value to be drawn on a Cartesian coordinate system. To be more precise, they need the same size along the first dimension, as the check in the traceback shows: if x.shape[0] != y.shape[0].

But:

sample_sizes = np.arange(1,1001,1)  # 1000 elements

and:

number_of_samples = 200
mean_of_sample_means = []
for x in range (number_of_samples):
    mean_of_sample_means.append(np.mean(sample_sizes))  # np.mean averages over the flattened array by default,
                                                        # so one scalar is appended per iteration
# 200 elements in total

And as expected, the error gives exactly this info: ValueError: x and y must have same first dimension, but have shapes (1000,) and (200,)
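
One way to make the shapes match is to loop over sample_sizes instead of over range(number_of_samples), so that exactly one value is appended per sample size. The sketch below is a guess at what you intended, not your actual experiment: the fair coin flip is an assumption taken from your "mean probability of heads" axis label, and rng and its seed are names I introduced for illustration.

```python
import numpy as np

# Assumption: the experiment is n fair coin flips (heads = 1), repeated
# number_of_samples times for every sample size n. The seed is arbitrary.
rng = np.random.default_rng(0)

sample_sizes = np.arange(1, 1001, 1)   # 1000 sample sizes
number_of_samples = 200                # replications per sample size

mean_of_sample_means = []
std_dev_of_sample_means = []

for n in sample_sizes:                 # loop over sizes, not over replications
    # Number of heads in n flips is Binomial(n, 0.5); dividing by n gives
    # the sample mean of each of the 200 replications.
    sample_means = rng.binomial(n, 0.5, size=number_of_samples) / n
    mean_of_sample_means.append(sample_means.mean())
    std_dev_of_sample_means.append(sample_means.std())

mean_of_sample_means = np.array(mean_of_sample_means)
std_dev_of_sample_means = np.array(std_dev_of_sample_means)

# x and y now agree in their first dimension: both shapes are (1000,)
print(sample_sizes.shape, mean_of_sample_means.shape)
```

With arrays built this way, plt.plot(sample_sizes, mean_of_sample_means) gets 1000 x values and 1000 y values, so the shape check no longer fails. If your data comes from somewhere else, only the line that builds sample_means needs to change.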