Calculating based on rows conditions (in pandas)

Question


85 views Asked by Avan Maric At 29 March 2024 at 10:03

I encountered the following problem: I have a pandas dataframe that looks like this.

id_tranc	sum	bid
1	4000	2.3%
1	20000	3.5%
2	100000	if >=100 000 - 1.6%, if < 100 000 - 100$
3	30000	if >=100 000 - 1.6%, if < 100 000 - 100$
1	60000	500$

Code to create the dataset:

import pandas as pd

dataframe = pd.DataFrame({
    'id_tranc': [1, 1, 2, 3, 1],
    'sum': [4000, 20000, 100000, 30000, 60000],
    'bid': ['2.3%', '3.5%', 'if >=100 000 - 1.6%, if < 100 000 - 100$', 
            'if >=100 000 - 1.6%, if < 100 000 - 100$', '500$']})

I need to calculate a 'commission' column from the 'sum' and 'bid' columns. The final dataframe should look like this:

id_tranc	sum	bid	comission
1	4000	2.3%	92
1	20000	3.5%	700
2	100000	if >=100 000 - 1.6%, if < 100 000 - 100$	1600
3	30000	if >=100 000 - 1.6%, if < 100 000 - 100$	100
1	60000	500$	500

If I calculate it with df['commission'] = df['sum'] * df['bid'], I only get a result for the first 2 records. Please tell me how to do this correctly.

There are 5 answers

Manuel On 29 March 2024 at 10:17

In order to calculate the value of a third column, the other two must contain numbers (integers or floats).

This value in your third data row is not a number, so you cannot multiply anything with it:

if >=100 000 - 1.6%, if < 100 000 - 100$

Your other "bid" values ("2.3%" etc.) are also not numbers. They should be stored as 0.023 to be used in a multiplication. I'm unsure how you even got your first two rows right.
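As a minimal sketch of that conversion, assuming only the plain percentage bids (the conditional and fixed "$" bids need separate handling), the "%" strings can be turned into fractions before multiplying. The names bids, sums and rates are illustrative and not from the question.

```python
import pandas as pd

# Only the plain percentage rows from the question, for illustration
bids = pd.Series(['2.3%', '3.5%'])
sums = pd.Series([4000, 20000])

# Strip the trailing '%' and convert to a fraction (2.3% -> 0.023)
rates = bids.str.rstrip('%').astype(float) / 100

# Element-wise multiplication now works because both sides are numeric
commission = (sums * rates).round(2)
print(commission.tolist())
```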

Federicofkt On 29 March 2024 at 10:22

This function should do the trick:

import pandas as pd
import re

data = {
    'id_tranc': [1, 1, 2, 3, 1],
    'sum': [4000, 20000, 100000, 30000, 60000],
    'bid': ['2.3%', '3.5%', 'if >=100 000 - 1.6%, if < 100 000 - 100$', 
            'if >=100 000 - 1.6%, if < 100 000 - 100$', '500$']
}
df = pd.DataFrame(data)

def calculate_commission(row):
    if 'if' in row['bid']:
        parts = row['bid'].split('-')
        threshold = re.findall(r'\d+', parts[0])
        inter_value= ''.join(threshold)
        threshold = float(inter_value)
        if row['sum'] >= threshold:
            commission = float(re.findall(r'\d+\.*\d*', parts[1])[0]) * row['sum'] / 100
        else:
            commission = float(re.findall(r'\d+\.*\d*', parts[2])[0])
    elif '$' in row['bid']:
        commission = float(re.findall(r'\d+\.*\d*', row['bid'])[0])
    else:
        commission = float(row['bid'].replace('%', '')) * row['sum'] / 100
    return round(commission)

df['commission'] = df.apply(calculate_commission, axis=1)

This works well with the example provided. If the condition strings come in other formats, adjust the regexes to match what you have.

Loginus On 29 March 2024 at 11:01

If you control the input data, a clean solution is to store a function or lambda in the "bid" column and then simply call it through apply:

dataframe = pd.DataFrame({
    'id_tranc': [1, 1, 2, 3, 1],
    'sum': [4000, 20000, 100000, 30000, 60000],
    'bid': [lambda x: x * 0.023,
            lambda x: x * 0.035,
            lambda x: 100 if x < 100000 else x * 0.016,
            lambda x: 100 if x < 100000 else x * 0.016,
            lambda x: 500]})

dataframe['comission'] = dataframe.apply(lambda row: row['bid'](row['sum']), axis=1)
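A possible variation on the same idea, sketched here under the assumption that the rules fall into three shapes (percentage, fixed fee, threshold-based), is to build the callables with small factory functions so each rule is written once. The helpers percent, fixed and tiered are hypothetical names introduced for this example.

```python
import pandas as pd

def percent(rate):
    """Commission as a percentage of the amount (rate given in %)."""
    return lambda amount: amount * rate / 100

def fixed(fee):
    """Flat commission, independent of the amount."""
    return lambda amount: fee

def tiered(threshold, rate, fee):
    """Percentage at or above the threshold, flat fee below it."""
    return lambda amount: amount * rate / 100 if amount >= threshold else fee

dataframe = pd.DataFrame({
    'id_tranc': [1, 1, 2, 3, 1],
    'sum': [4000, 20000, 100000, 30000, 60000],
    'bid': [percent(2.3), percent(3.5),
            tiered(100000, 1.6, 100), tiered(100000, 1.6, 100),
            fixed(500)]})

# Call each stored rule on the amount in the same row
dataframe['comission'] = [rule(amount) for rule, amount
                          in zip(dataframe['bid'], dataframe['sum'])]
```

The column name 'comission' is kept as spelled in the answer above.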

julianf19 On 29 March 2024 at 11:08

I would structure the data that comes in:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id_tranc': [1, 1, 2, 3, 1],
    'sum': [4000, 20000, 100000, 30000, 60000],
    'bid(%)': [2.3, 3.5, 1.6, 1.6, np.nan],
    'threshold': [0,0,100000,100000,np.inf],
    'min':[np.nan,np.nan,100,100,500]
    })

And then use this code to run your calculation:

df.loc[df["sum"]>=df["threshold"],"comission"]=df["sum"]*df["bid(%)"]/100
df.loc[df["sum"]<df["threshold"],"comission"]=df["min"]

	id_tranc	sum	bid(%)	threshold	min	comission
0	1	4000	2.3	0	nan	92
1	1	20000	3.5	0	nan	700
2	2	100000	1.6	100000	100	1600
3	3	30000	1.6	100000	100	100
4	1	60000	nan	inf	500	500
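With the data structured this way, the two assignments could also be collapsed into a single numpy.where call. This is only a sketch of an equivalent formulation using the same columns, not a change to the approach above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id_tranc': [1, 1, 2, 3, 1],
    'sum': [4000, 20000, 100000, 30000, 60000],
    'bid(%)': [2.3, 3.5, 1.6, 1.6, np.nan],
    'threshold': [0, 0, 100000, 100000, np.inf],
    'min': [np.nan, np.nan, 100, 100, 500],
})

# Percentage where the sum reaches the threshold, otherwise the fixed amount
df['comission'] = np.where(
    df['sum'] >= df['threshold'],
    df['sum'] * df['bid(%)'] / 100,
    df['min'],
)
```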

mozway (accepted answer) On 29 March 2024 at 10:44

I would write a small parser based on a regex and the operator module:

from operator import ge, lt, gt, le
import re

def logic(value, bid):
    # define operators, add other ones if needed
    ops = {'>=': ge, '>': gt, '<': lt, '<=': le}
    # remove spaces, split conditions on comma
    conditions = bid.replace(' ', '').split(',')
    # then loop over them, the first match will be used
    for cond in conditions:
        # extract operator, threshold, commission, unit
        m = re.search(r'(?:if(\W+)(\d+)-)?(\d+\.?\d*)([%$])', cond)
        if not m:     # if no match, ignore
            continue
        op, thresh, com, unit = m.groups()
        # if no condition or condition is valid
        if (not op) or (op and ops[op](value, float(thresh))):
            if unit == '%':                     # handle % case
                return value * float(com)/100
            elif unit == '$':                   # handle fixed com
                return float(com)

df['comission'] = [logic(val, bid) for val, bid in zip(df['sum'], df['bid'])]

# or with apply, which is less efficient
# df['comission'] = df.apply(lambda row: logic(row['sum'], row['bid']), axis=1)

Output:

   id_tranc     sum                                       bid  comission
0         1    4000                                      2.3%       92.0
1         1   20000                                      3.5%      700.0
2         2  100000  if >=100 000 - 1.6%, if < 100 000 - 100$     1600.0
3         3   30000  if >=100 000 - 1.6%, if < 100 000 - 100$      100.0
4         1   60000                                      500$      500.0

Regex:

(?:if(\W+)(\d+)-)?  # optionally match a condition (operator and threshold)
(\d+\.?\d*)         # match the value of the commission
([%$])              # match type of commission (% or $)
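To see what the pattern captures, here is a small sketch that applies it to the bid strings from the question after the same space removal and comma split used in logic. The helper name parse is hypothetical and only wraps re.search for the demonstration.

```python
import re

# Same pattern as in the answer, written as a raw string
pattern = re.compile(r'(?:if(\W+)(\d+)-)?(\d+\.?\d*)([%$])')

def parse(cond):
    """Return (operator, threshold, commission, unit) for one condition."""
    m = pattern.search(cond)
    return m.groups() if m else None

bid = 'if >=100 000 - 1.6%, if < 100 000 - 100$'
for cond in bid.replace(' ', '').split(','):
    print(cond, parse(cond))

# Unconditional bids leave the first two groups as None
print(parse('2.3%'))
print(parse('500$'))
```

For an unconditional bid the operator and threshold groups are None, which is why logic checks `not op` before evaluating a comparison.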

Reproducible input:

df = pd.DataFrame({'id_tranc': [1, 1, 2, 3, 1], 'sum': [4000, 20000, 100000, 30000, 60000],
                   'bid': ['2.3%', '3.5%', 'if >=100 000 - 1.6%, if < 100 000 - 100$',
                           'if >=100 000 - 1.6%, if < 100 000 - 100$', '500$']})
