Why doesn't a dict comprehension work when a for-loop does?


When I call the sample function func in this module, why does it raise an exception when I use a dict comprehension (toggled with the comprehension parameter)? Can someone explain what the exception means? cycle seems to be overwritten, and I cannot wrap my head around it.

Example function implementing the same logic with a for-loop and with a comprehension

import pandas as pd

def func(
    x_dict,
    keys_list,
    start_cycle,
    end_cycle,
    comprehension=True
):
    x_test_dicts = {}
    for cycle in range(start_cycle, end_cycle + 1):
        print(f"cycle = {cycle}")
        if comprehension:
            # Fill the dict with comprehension.
            x_test_dict = {
                f"{key}_input":
                x_dict[key].query('cycle == @cycle').values
                for key in keys_list
            }
        else:
            # Fill the dict with normal for loop.
            x_test_dict = {}
            for key in keys_list:
                x_test_dict[f"{key}_input"] = \
                    x_dict[key].query('cycle == @cycle').values
        x_test_dicts[cycle] = x_test_dict
    return x_test_dicts

Creation of test data

import pandas as pd
import numpy as np

# Create an ID array from 1 to 1000
ids = np.arange(1, 1001)

# Calculate cycle as ID divided by 100
cycles = ids // 100

# Generate random integer values for the remaining columns
# Assuming a range for random integers (e.g., 0 to 100)
col1_int = np.random.randint(0, 101, 1000)
col2_int = np.random.randint(0, 101, 1000)
col3_int = np.random.randint(0, 101, 1000)

# Update the DataFrame with integer values
df = pd.DataFrame({
    "ID": ids,
    "cycle": cycles,
    "col1": col1_int,
    "col2": col2_int,
    "col3": col3_int
})

df.head()  # Display the first few rows of the updated DataFrame

Run test cases with functions

import pandas as pd

df = df.set_index(['ID', 'cycle'])  # Use multi-indexing

x_dict = {'Auxin': df}  # Create a simple dict with the DataFrame
keys_list = ['Auxin']  # Define a list of keys to work with

# Define ranges for the loop inside `func`
start_cycle = 6
end_cycle = 29

# RUNS SUCCESSFULLY WITHOUT DICT COMPREHENSION
comprehension = False
result = func(
    x_dict,
    keys_list,
    start_cycle,
    end_cycle,
    comprehension=comprehension
)
print("Worked without dict comprehension!")

# FAILS WITH DICT COMPREHENSION
comprehension = True
result = func(
    x_dict,
    keys_list,
    start_cycle,
    end_cycle,
    comprehension=comprehension
)
print("Breaks when dict comprehension is used!")

The error

UndefinedVariableError: local variable 'cycle' is not defined

There is 1 answer

Daraan On

In short, comprehensions work differently under the hood than one might expect. They run in their own scope for local variables (before Python 3.12 a comprehension is implemented as a hidden nested function), separate from the locals of the enclosing function. Some background reading: https://peps.python.org/pep-0572/#changing-the-scope-rules-for-comprehensions

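Here, pandas and the comprehension do not share the same scope. pandas resolves @cycle by inspecting the variables of the frame that called query, and inside the comprehension that frame is the comprehension's own scope. Because cycle appears only inside the query string and is never referenced as a variable in the comprehension, it is not visible there, so the lookup fails with the error you see.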
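The lookup described above can be imitated without pandas. The following is only a rough sketch: lookup is a hypothetical, heavily simplified stand-in for the way pandas resolves @name (the real implementation is more involved), and by_name_only and with_reference are made-up names. On Python 3.12 and later, where comprehensions are inlined, the first call may succeed instead of reporting an error.

```python
import sys


def lookup(name):
    """Hypothetical stand-in for resolving @name: search the caller's
    frame, locals first, then globals."""
    frame = sys._getframe(1)  # the frame that called lookup()
    if name in frame.f_locals:
        return frame.f_locals[name]
    if name in frame.f_globals:
        return frame.f_globals[name]
    raise NameError(f"local variable {name!r} is not defined")


def by_name_only(keys):
    cycle = 6
    try:
        # cycle appears only inside a string, so on Python < 3.12 the
        # comprehension's own frame has no such variable.
        return {key: lookup("cycle") for key in keys}
    except NameError as exc:
        return str(exc)


def with_reference(keys):
    cycle = 6
    # Mentioning cycle in the comprehension makes it a closure variable,
    # which then shows up in the frame that lookup() inspects.
    return {key: (cycle, lookup("cycle"))[1] for key in keys}


print(by_name_only(["Auxin"]))
print(with_reference(["Auxin"]))
```

The second function finds cycle on every Python 3 version, because the name is referenced as a real variable inside the comprehension rather than only inside a string.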


  1. One fix is to declare global cycle before your for-loop, which makes cycle accessible from inside the comprehension.

  2. The second way looks a bit strange, but it makes cycle visible in the comprehension scope. Maybe in your real example you can do it in a more elegant way:

import pandas as pd

def func(
    x_dict,
    keys_list,
    start_cycle,
    end_cycle,
    comprehension=True
):
    x_test_dicts = {}
    for cycle in range(start_cycle, end_cycle + 1):
        print(f"cycle = {cycle}")
        if comprehension:
            # Fill the dict with comprehension.
            x_test_dict = {
                f"{key}_input":
                # Referencing cycle here makes it visible in the
                # comprehension scope; [1] then picks the query result.
                (cycle := cycle,
                 x_dict[key].query('cycle == @cycle').values)[1]
                for key in keys_list
            }
        else:
            # Fill the dict with normal for loop.
            x_test_dict = {}
            for key in keys_list:
                x_test_dict[f"{key}_input"] = \
                    x_dict[key].query('cycle == @cycle').values
        x_test_dicts[cycle] = x_test_dict
    return x_test_dicts
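
A possibly more elegant variant is to avoid the @cycle lookup altogether and put the value into the query string with an f-string. This is only a minimal sketch, assuming the cycle values are plain integers; func_fstring and the small test frame below are made up for illustration and are not part of the question's code.

```python
import numpy as np
import pandas as pd


def func_fstring(x_dict, keys_list, start_cycle, end_cycle):
    """Variant of func that interpolates cycle into the query string."""
    return {
        cycle: {
            # The value of cycle is baked into the expression text, so
            # pandas never has to look the name up in a calling frame.
            f"{key}_input": x_dict[key].query(f"cycle == {cycle}").values
            for key in keys_list
        }
        for cycle in range(start_cycle, end_cycle + 1)
    }


# Small illustrative frame: IDs 1..300, cycle = ID // 100.
ids = np.arange(1, 301)
df = pd.DataFrame({"ID": ids, "cycle": ids // 100, "col1": ids * 2})
df = df.set_index(["ID", "cycle"])

result = func_fstring({"Auxin": df}, ["Auxin"], 1, 2)
print({cycle: d["Auxin_input"].shape for cycle, d in result.items()})
```

Interpolating values into the expression text is fine for simple numbers like these. For strings or other objects, keeping the @ syntax and making the variable visible, as in the fixes above, is probably the safer choice.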