I'm writing code to categorize my data and compute the average and standard deviation. Here is an example of my data.
3917 1 -0.662261 25.148 22.9354 68.8076
3918 1 12.7649 18.7451 7.68473 69.0063
3919 1 -9.56836 -23.3265 -61.953 68.8357
3920 1 11.6292 31.6525 -29.3697 69.1372
3921 2 26.4837 -66.7897 12.0257 69.2282
3922 1 -9.81652 14.3788 9.38343 69.1217
3923 2 39.931 -88.1879 109.498 69.1604
3924 1 4.5502 3.53887 -6.59604 69.486
3925 2 13.6801 -24.6628 -5.7568 69.9398
3926 1 -10.5635 7.05517 -8.82785 70.2263
As you can see, there are 6 columns. I'm thinking of a 3-step calculation here.
1. Categorize the rows based on the 6th column. The 6th column consists of floats ranging from 0 to n. I want to generate n sections (or sub-matrices, or whatever) of width 1: 0~1, 1~2, 2~3, ..., n-1~n. Here n is the last value rounded up, so that every row falls into a section. For example, if the last value is 121.2513, the last section should be 121~122 to contain it.
2. Assign the values of columns 1~5 to their corresponding sections based on the 6th column. If a section contains no rows, just print 0 for it. There will be n sections, and the number of rows in each section will vary.
3. Compute the average and standard deviation of the 3rd, 4th, and 5th columns for each section, and write to the output file: the number of rows in the section, the starting value of the section, and the average and standard deviation of the 3rd, 4th, and 5th columns.
I tried doing this with multiple for loops, but it became too complex and produces errors. Is there an easier way in Python to categorize the data, work with each section, and print the results? My for loops are not working at all. Could anyone suggest a simple example using this data?
This task lends itself to the pandas library (http://pandas.pydata.org/). From what I understood from your post, you want to compute the column-wise means and standard deviations. To compute row-wise statistics instead, add the parameter
axis=1
to the mean and std functions. Assume the example has been saved to "tmp.txt". The first step is to load it into a DataFrame; after that it is simple to calculate statistics over the DataFrame. For more information about pandas, take a look at the quick introduction: http://pandas.pydata.org/pandas-docs/stable/10min.html
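As a minimal sketch of this approach (not necessarily the only or best way to do it), the steps might look like the following. The column names id, kind, x, y, z, and t are hypothetical labels, since the file has no header, and the inline SAMPLE string stands in for "tmp.txt". The sketch assumes the 6th column is non-negative and treats each section as the half-open interval [k, k+1).

```python
import io

import numpy as np
import pandas as pd

# Sample rows from the question; in practice pass the path "tmp.txt" to load().
SAMPLE = """\
3917 1 -0.662261 25.148 22.9354 68.8076
3918 1 12.7649 18.7451 7.68473 69.0063
3919 1 -9.56836 -23.3265 -61.953 68.8357
3920 1 11.6292 31.6525 -29.3697 69.1372
3921 2 26.4837 -66.7897 12.0257 69.2282
3922 1 -9.81652 14.3788 9.38343 69.1217
3923 2 39.931 -88.1879 109.498 69.1604
3924 1 4.5502 3.53887 -6.59604 69.486
3925 2 13.6801 -24.6628 -5.7568 69.9398
3926 1 -10.5635 7.05517 -8.82785 70.2263
"""

# Hypothetical column labels (assumption): the data file has no header line.
COLUMNS = ["id", "kind", "x", "y", "z", "t"]


def load(source):
    """Read whitespace-separated rows from a path or file-like object."""
    return pd.read_csv(source, sep=r"\s+", header=None, names=COLUMNS)


def binned_stats(df, value_cols=("x", "y", "z"), key="t"):
    """Row count, mean and std of value_cols for each unit-wide section of key."""
    # Sections are [0, 1), [1, 2), ...; the last one contains the largest key.
    n_bins = int(np.floor(df[key].max())) + 1
    # The integer part of the key is the start of its section (68.8 -> 68).
    start = np.floor(df[key]).astype(int)
    grouped = df.groupby(start)[list(value_cols)]
    stats = grouped.agg(["mean", "std"])
    # Flatten the (column, statistic) pairs into names such as "x_mean".
    stats.columns = [f"{col}_{name}" for col, name in stats.columns]
    stats.insert(0, "count", grouped.size())
    # Add the empty sections as rows of zeros, as the question asks.
    stats = stats.reindex(range(n_bins), fill_value=0)
    stats.index.name = "section_start"
    # std is NaN for a section with a single row; report it as 0 here (assumption).
    return stats.fillna(0)


df = load(io.StringIO(SAMPLE))

# Column-wise statistics over the whole file, as described in the answer.
overall = df[["x", "y", "z"]].agg(["mean", "std"])

# Per-section statistics, as requested in the question.
stats = binned_stats(df)
print(stats.loc[68:70])
```

To write the result to a file, a call such as stats.to_csv("output.txt", sep=" ") should be enough. Note that pandas computes the sample standard deviation (ddof=1) by default, so the numbers will differ slightly from numpy.std unless ddof is set to match.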