I have a spreadsheet where a column (naics2digit) does not match the first two characters of the corresponding column, (naics5digit) which is accurate. I want to find a solution where if naics2digit does not match the first two characters of naics5digit, then to replace the naics2digit column with the first two characters of the naics5digit column.
See the bold values below:
| industry | naics_2digit | naics_5digit |
|---|---|---|
| Accommodation and Food Services | 53 | 53456 |
| Real Estate | 72 | 72543 |
| Other Services | 43 | 72239 |
| Retail Trade | 81 | 81111 |
| Construction | 45 | 45789 |
| Real Estate | 23 | 23435 |
| Real Estate | 53 | 23333 |
I have tried using np.where() or conditional for loops with .str.startswith() or logical expressions like naics2digit != naics5digit to return rows where you could update the original table values like the following:
| industry | naics_2digit | naics_5digit |
|---|---|---|
| Real Estate | 53 | 23333 |
TO
| industry | naics_2digit | naics_5digit |
|---|---|---|
| Real Estate | 23 | 23333 |
Any suggestions?