PyCharm crashes with a "multiprocessing" error in a single-threaded application, only when the debugger is attached


I have an extremely weird situation on my machine when debugging my application.

- macOS 14.4 (23E214), 16 GB RAM
- PyCharm 2023.3.5 (Community Edition)
- Python 3.10

I have reinstalled PyCharm twice, with no effect on the outcome.

I am debugging locally with databricks.connect, fetching data in batches of 1000 rows one after another, and then transforming the data into local objects (330 of them, each with about 20 fields), so nothing memory-intensive. Activity Monitor also shows nothing abnormal.

The interesting part is this method:

    def set_value_fast(self, twin_id: str, key: str, value: str):
        if self.row_item_dict is None:
            dummy_row = create_row_from_schema(self._schema)
            self.row_item_dict = dummy_row.asDict()
            self.row_item_dict[self._primary_key] = twin_id
            self.row_item_dict[key] = value  # self._assign_value_based_on_data_type(key, value)
        else:
            self.row_item_dict[key] = value  # self._assign_value_based_on_data_type(key, value)

This version does not crash and builds the dictionary as expected. The problem starts when I use the commented-out function instead (please don't comment on the date conversion itself; I have tried countless variations because I suspected it was somehow the cause):

    self.row_item_dict[key] = self._assign_value_based_on_data_type(key, value)

    from datetime import datetime, timezone

    def _assign_value_based_on_data_type(self, key, value):
        for time_stamp_column in self._time_stamp_columns:
            if key == time_stamp_column:
                try:
                    print(f"twinId: {self.twin_id}  key: {key} value: {value}")
                    if value is None or value == '':
                        return None
                    date_string = value  # used as-is when there is nothing to truncate
                    if len(value.split('.')) > 1 and len(value.split('.')[1]) > 3:
                        # If microseconds are present, truncate to milliseconds
                        date_string = '.'.join(value.split('.')[:2])[:23]  # Truncate to milliseconds

                    # Convert date string to datetime
                    datetime_obj = datetime.strptime(date_string, '%Y-%m-%dT%H:%M:%S.%f')

                    # Set timezone to UTC
                    datetime_obj = datetime_obj.replace(tzinfo=timezone.utc)

                    return datetime_obj
                except Exception as e:
                    print("An error occurred:", e)

        return value
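One way to take PyCharm and Databricks Connect out of the picture is to run the same conversion as a plain function under a normal interpreter. The sketch below is a hypothetical standalone variant (the name `parse_timestamp` is not from the original code). It assumes values look like the `createdAt` value in the output, i.e. ISO 8601 with a trailing `Z`, and it keeps at most millisecond precision, like the original truncation.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 UTC timestamp such as '2021-04-03T02:06:57.606Z'.

    Hypothetical helper: fractional seconds longer than three digits are
    truncated to milliseconds; missing fractional seconds become .0.
    """
    if value is None or value == "":
        return None
    text = value.rstrip("Z")  # drop the UTC designator; tzinfo is attached below
    base, _, fraction = text.partition(".")
    fraction = (fraction or "0")[:3]  # keep at most millisecond precision
    parsed = datetime.strptime(f"{base}.{fraction}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


if __name__ == "__main__":
    print(parse_timestamp("2021-04-03T02:06:57.606Z"))
```

If this function behaves the same when run normally and when run under the debugger with the same inputs, that would suggest the conversion logic itself is not what triggers the problem.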

When I start the debugger, this appears in the output immediately:

    twinId: XXXX#ZZZZ  key: createdAt value: 2021-04-03T02:06:57.606Z

    /usr/local/Cellar/python@3.10/3.10.13_2/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 2 leaked semaphore objects to clean up at shutdown
      warnings.warn('resource_tracker: There appear to be %d '

When I run the code without the debugger, all objects are converted without any issues.
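The warning is emitted by `multiprocessing.resource_tracker`, which suggests that something in the process created multiprocessing semaphores even though the application code itself is single-threaded (a library or the debugger are plausible candidates, but that is an assumption). A small diagnostic like the one below, called right before the conversion loop both with and without the debugger, lists the threads and child processes alive at that point; `describe_concurrency` is a hypothetical helper name.

```python
import multiprocessing
import threading


def describe_concurrency() -> dict:
    """Hypothetical helper: snapshot of live threads and child processes."""
    return {
        # Names of all threads currently alive in this interpreter.
        "threads": [t.name for t in threading.enumerate()],
        # PIDs of child processes started via multiprocessing that are still alive.
        "child_processes": [p.pid for p in multiprocessing.active_children()],
    }


if __name__ == "__main__":
    print(describe_concurrency())
```

Comparing the two snapshots would show whether extra threads or processes only exist when the debugger is attached.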

Can anyone give me a hint about what could cause this behavior? I have run out of ideas.

I hope I have provided all the necessary information; if you need more, please let me know.

Thanks, Andre
