Percentage calculation of positive/negative values inside a list in Python


I have a list that was converted from a pandas DataFrame:

[['2020.06.25 11:20:12', 'US500', 'sell', 1.0, 3047.3, '2020.06.25 11:21:32', 3051.4, 0.0, **-3.89**], ['2020.06.25 11:20:59', 'US500', 'sell', 1.0, 3049.8, '2020.06.25 11:21:33', 3051.6, 0.0, **-1.71**], ['2020.06.25 11:23:49', 'US500', 'sell', 1.0, 3051.6, '2020.06.25 11:25:32', 3049.7, 0.0, **1.8**]]

I want to calculate the percentage of rows in which the number in bold (the last element of each row) is negative or positive for 'US500'. The ticker in the list can also be another string, such as 'FB'.

So the output should look like:

US500: 60% positive, 40% negative
FB: 70% positive, 30% negative
etc.

I tried this:

ticker_list = df.values.tolist()
pos = [sum(y>=0 for y in x)  for x in zip(ticker_list)]

but I got an error

TypeError: '>=' not supported between instances of 'list' and 'int'

and it wouldn't give what I want anyway.

Update:

With the new code it is possible to get the positive and negative percentages, but when I try to save them, it doesn't write a value for each iteration of the loop; it just writes one value:

   stocks = set([i[1] for i in ticker_list])
   worksheet.write_column(3,0,stocks)


   for s in stocks:
        result = [i[-1] for i in ticker_list if s in i]
        pos = (len([x for x in result if x > 0])/len(result))*100
        neg = [100 - pos]
        worksheet.write_column(3,1,pos)

worksheet.write_column(3,1,pos) saves only one value as the output.



There are 2 answers

10
alec_djinn On

I am not sure where FB is; I assume you will have it in your list. I also don't see why you would use zip() at all, or why you would build a list instead of working on the DataFrame directly. Anyway, given your input (the initial list), the following code is enough.

data = [
    ['2020.06.25 11:20:12', 'US500', 'sell', 1.0, 3047.3, '2020.06.25 11:21:32', 3051.4, 0.0, -3.89],
    ['2020.06.25 11:20:59', 'US500', 'sell', 1.0, 3049.8, '2020.06.25 11:21:33', 3051.6, 0.0, -1.71],
    ['2020.06.25 11:23:49', 'US500', 'sell', 1.0, 3051.6, '2020.06.25 11:25:32', 3049.7, 0.0, 1.8]
]

us500 = [i[-1] for i in data if 'US500' in i]
pos = (len([x for x in us500 if x >= 0]) / len(us500)) * 100
neg = 100 - pos
print(pos, neg)
33.33333333333333 66.66666666666667
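
On the TypeError from the question: calling zip() with a single list does not transpose it. It wraps each row in a one-element tuple, so y ends up being the whole row (a list), and y >= 0 fails. A minimal sketch of this, using a shortened two-row stand-in for the data (the name rows and its values are illustrative only):

```python
# Shortened stand-in for ticker_list: [ticker, profit] per row (illustrative values).
rows = [['US500', -3.89], ['US500', 1.8]]

# zip() with ONE argument yields 1-tuples, each holding an entire row.
wrapped = list(zip(rows))
print(wrapped[0])        # (['US500', -3.89],)

# Comparing that row (a list) with an int is what raises the TypeError.
try:
    wrapped[0][0] >= 0
except TypeError as exc:
    print(exc)

# zip(*rows) transposes instead, so the last tuple is the profit column.
profits = list(zip(*rows))[-1]
print(profits)                       # (-3.89, 1.8)
print(sum(p >= 0 for p in profits))  # 1
```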

A more general way would be:

data = [
    ['2020.06.25 11:20:12', 'US500', 'sell', 1.0, 3047.3, '2020.06.25 11:21:32', 3051.4, 0.0, -3.89],
    ['2020.06.25 11:20:59', 'US500', 'sell', 1.0, 3049.8, '2020.06.25 11:21:33', 3051.6, 0.0, -1.71],
    ['2020.06.25 11:23:49', 'US500', 'sell', 1.0, 3051.6, '2020.06.25 11:25:32', 3049.7, 0.0, 1.8],
    ['2020.06.25 11:20:12', 'FB', 'sell', 1.0, 3047.3, '2020.06.25 11:21:32', 3051.4, 0.0, -3.89],
    ['2020.06.25 11:20:59', 'FB', 'sell', 1.0, 3049.8, '2020.06.25 11:21:33', 3051.6, 0.0, 1.71],
    ['2020.06.25 11:23:49', 'FB', 'sell', 1.0, 3051.6, '2020.06.25 11:25:32', 3049.7, 0.0, 1.8]
]

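# Unique tickers (column 1); note that set iteration order is arbitrary.
stocks = {i[1] for i in data}

for s in stocks:
    # Last column of every row whose ticker equals s.
    result = [i[-1] for i in data if i[1] == s]
    pos = len([x for x in result if x > 0]) / len(result) * 100
    neg = 100 - pos
    print(s, pos, neg)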


US500 33.33333333333333 66.66666666666667
FB 66.66666666666666 33.33333333333334
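
The loop above rescans data once per ticker. A possible single-pass variant is sketched below, assuming the ticker is always at index 1 and the profit is always the last element. The helper name sign_percentages is made up for illustration, and the rows are trimmed to three fields to keep the example short:

```python
from collections import defaultdict

def sign_percentages(rows):
    """Return {ticker: (pos_pct, neg_pct)}; a profit of 0 counts as not positive."""
    counts = defaultdict(lambda: [0, 0])   # ticker -> [positive count, total count]
    for row in rows:
        ticker, profit = row[1], row[-1]
        counts[ticker][0] += profit > 0    # True adds 1, False adds 0
        counts[ticker][1] += 1
    return {t: (100 * p / n, 100 - 100 * p / n) for t, (p, n) in counts.items()}

# Trimmed illustrative rows: [open time, ticker, profit].
data = [
    ['2020.06.25 11:20:12', 'US500', -3.89],
    ['2020.06.25 11:20:59', 'US500', -1.71],
    ['2020.06.25 11:23:49', 'US500', 1.8],
    ['2020.06.25 11:20:12', 'FB', -3.89],
    ['2020.06.25 11:20:59', 'FB', 1.71],
    ['2020.06.25 11:23:49', 'FB', 1.8],
]

result = sign_percentages(data)
for ticker in sorted(result):              # sorted for a stable print order
    pos, neg = result[ticker]
    print(f"{ticker}: {pos:.0f}% positive, {neg:.0f}% negative")
```

For the spreadsheet in the update, one option would be to collect the tickers and percentages into lists from result and call worksheet.write_column once per column after the loop, since writing to the same start cell on every iteration overwrites the previous value.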
1
Shubham Sharma On

Recreating your dataframe from the given list:

import numpy as np
import pandas as pd

df = pd.DataFrame(lst)  # lst is the list from the question
print(df)
                     0      1     2    3       4                    5       6    7     8
0  2020.06.25 11:20:12  US500  sell  1.0  3047.3  2020.06.25 11:21:32  3051.4  0.0 -3.89
1  2020.06.25 11:20:59  US500  sell  1.0  3049.8  2020.06.25 11:21:33  3051.6  0.0 -1.71
2  2020.06.25 11:23:49  US500  sell  1.0  3051.6  2020.06.25 11:25:32  3049.7  0.0  1.80

Use np.sign, which returns an element-wise indication of the sign of each number, then use Series.map to label 1 as Positive and -1 as Negative. Finally, group s by the ticker column with Series.groupby and divide the value_counts of each group by its count to get the percentages:

s = np.sign(df[8]).map({1: 'Positive', -1: 'Negative'})
pct = s.groupby(df[1]).value_counts().div(s.groupby(df[1]).count()).mul(100)

Details:

print(s) 
0    Negative
1    Negative
2    Positive
Name: 8, dtype: object

print(pct)
1      8       
US500  Negative    66.666667
       Positive    33.333333
Name: 8, dtype: float64
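
Note that np.sign returns 0 for a profit of exactly zero, which the mapping above does not cover, so such rows become NaN and drop out of both the counts and the totals. If zeros should stay in the denominator, a shorter sketch based on boolean masks could look like the following. It assumes the same column positions (1 for the ticker, 8 for the profit), uses a reduced frame with only those two columns, and adds made-up FB rows for illustration:

```python
import pandas as pd

# Reduced illustrative frame: column 1 is the ticker, column 8 the profit.
df = pd.DataFrame({
    1: ['US500', 'US500', 'US500', 'FB', 'FB', 'FB'],
    8: [-3.89, -1.71, 1.8, -3.89, 1.71, 1.8],
})

# The mean of a boolean mask is the fraction of True values in each group.
pos_pct = df[8].gt(0).groupby(df[1]).mean().mul(100)
neg_pct = df[8].lt(0).groupby(df[1]).mean().mul(100)

# One row per ticker, one column per sign.
summary = pd.DataFrame({'Positive': pos_pct, 'Negative': neg_pct}).round(2)
print(summary)
```

With this variant a zero profit counts toward neither column, so the two percentages for a ticker can sum to less than 100.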