I've got a routine to read in a CSV file and spit out selected columns that match specific criteria:
CSV Input File looks like this
| Name | Role | Login |
|---|---|---|
| Phil | Role A | Role B | 2024/01/01 |
| Bob | Role A | Role B | 2024/02/01 |
| Arthur | Role A | Role C | 2024/01/04 |
| Jane | Role B | Role C | 2024/01/31 |
| Mary | Role A | Role D | 2024/02/12 |
| Liz | Role B | Role F | 2024/02/21 |
| Phoebe | Role C | Role D | 2023/11/21 |
| Mike | Role E | 2024/02/15 |
| Rick | Role D | Role E | 2024/01/13 |
| Hilary | Role F | 2024/01/11 |
I have a block of code that matches based on a passed value:
# Define function to check if a value matches any of the filter values
def matches_filter(value):
value_lower = value.lower()
for filter_value in value_lower.split("|"):
filter_value_lower = filter_value.lower()
for fvals in fltr_values:
if fvals.lower() in filter_value_lower:
return fvals.lower()
return None
# Apply filter
# filtered_df = df[df[fltr_field].apply(matches_filter)]
df[fltr_field + "_matched"] = df[fltr_field].apply(matches_filter)
Based on passing the values "Role B" and "Role D" I'd like to replace whatever is in the Role column with the result of the filter. The net resulting table should therefore look something like this:
| Name | Role | Login |
|---|---|---|
| Phil | Role B | 2024/01/01 |
| Bob | Role B | 2024/01/01 |
| Jane | Role B | 2024/02/03 |
| Mary | Role D | 2024/02/02 |
| Liz | Role B | 2024/02/12 |
| Phoebe | Role D | 2024/02/21 |
| Rick | Role D | 2024/01/31 |
So far, the code will filter so I only get strings containing "Role B" or "Role D" but I'd like to replace the string found with the match criterion, rather than the list of roles. Can someone explain what I need to change here?
To further explain based on comments received so far:
- What is the content of
fltr_field?
fltr_field contains the name of the column to filter on (in this case, I'm filtering on the column called "Role".
- Can you clearly explain what you're trying to do here?
I wish to replace the contents of the Role column with the matched value.
- What is the nature of the
"Login"column?
The "Login" column contains the last login date.
With this input dataframe generator:
A regex pattern extraction looks like your best option:
I saw no need for the complexity of a dynamic column name (I mean your
df[fltr_field + "_matched"]) since the final column in your desired output is called"Role"anyways.Update to match various roles as whole words
With alternative input,
The "safe variant" with lookahead assertion will function as long as roles are entered with " | " in between. It makes sure the role name is followed by
Similarly, we could not only "look ahead" but also "look back"; you see: regex complexity will have to catch-up with the diversity of expected "roles", but there is no need to make it more complex than it needs to be.
References