Data Transformation Pipeline Issue on End-to-End Machine Learning Project

44 views Asked by At

I am following the process shown on Wine Quality Prediction End-to-End ML Project on Krish Naik's YouTube channel to do a Flight Fare Prediction Project.

I am facing an issue with Data Transformation Pipeline.

I run this cell of data transformation pipeline on 03_data_transformation.ipynb:

try:
    config = ConfigurationManager()
    data_transformation_config = config.get_data_transformation_config()
    data_transformation = DataTransformation(config=data_transformation_config)
    # data_transformation.train_test_spliting()
    # New Line
    data_transformation.initiate_data_transformation()
except Exception as e:
    raise e

I get this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa1 in position 14: invalid 
start byte

Here is the traceback:

[2023-11-21 11:13:08,772: INFO: common: yaml file: config\config.yaml loaded successfully]
[2023-11-21 11:13:08,776: INFO: common: yaml file: params.yaml loaded successfully]
[2023-11-21 11:13:08,783: INFO: common: yaml file: schema.yaml loaded successfully]
[2023-11-21 11:13:08,784: INFO: common: created directory at: artifacts]
[2023-11-21 11:13:08,788: INFO: common: created directory at: artifacts/data_transformation]
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
g:\Machine_Learning_Projects\iNeuron internship\Flight-Fare-Prediction-End-to-End-ML-Project\research\03_data_transformation.ipynb Cell 10 line 9
      <a href='vscode-notebook-cell:/g%3A/Machine_Learning_Projects/iNeuron%20internship/Flight-Fare-Prediction-End-to-End-ML-Project/research/03_data_transformation.ipynb#X12sZmlsZQ%3D%3D?line=6'>7</a>     data_transformation.initiate_data_transformation()
      <a href='vscode-notebook-cell:/g%3A/Machine_Learning_Projects/iNeuron%20internship/Flight-Fare-Prediction-End-to-End-ML-Project/research/03_data_transformation.ipynb#X12sZmlsZQ%3D%3D?line=7'>8</a> except Exception as e:
----> <a href='vscode-notebook-cell:/g%3A/Machine_Learning_Projects/iNeuron%20internship/Flight-Fare-Prediction-End-to-End-ML-Project/research/03_data_transformation.ipynb#X12sZmlsZQ%3D%3D?line=8'>9</a>     raise e

g:\Machine_Learning_Projects\iNeuron internship\Flight-Fare-Prediction-End-to-End-ML-Project\research\03_data_transformation.ipynb Cell 10 line 7
      <a href='vscode-notebook-cell:/g%3A/Machine_Learning_Projects/iNeuron%20internship/Flight-Fare-Prediction-End-to-End-ML-Project/research/03_data_transformation.ipynb#X12sZmlsZQ%3D%3D?line=3'>4</a>     data_transformation = DataTransformation(config=data_transformation_config)
      <a href='vscode-notebook-cell:/g%3A/Machine_Learning_Projects/iNeuron%20internship/Flight-Fare-Prediction-End-to-End-ML-Project/research/03_data_transformation.ipynb#X12sZmlsZQ%3D%3D?line=4'>5</a>     # data_transformation.train_test_spliting()
      <a href='vscode-notebook-cell:/g%3A/Machine_Learning_Projects/iNeuron%20internship/Flight-Fare-Prediction-End-to-End-ML-Project/research/03_data_transformation.ipynb#X12sZmlsZQ%3D%3D?line=5'>6</a>     # New Line
----> <a href='vscode-notebook-cell:/g%3A/Machine_Learning_Projects/iNeuron%20internship/Flight-Fare-Prediction-End-to-End-ML-Project/research/03_data_transformation.ipynb#X12sZmlsZQ%3D%3D?line=6'>7</a>     data_transformation.initiate_data_transformation()
      <a href='vscode-notebook-cell:/g%3A/Machine_Learning_Projects/iNeuron%20internship/Flight-Fare-Prediction-End-to-End-ML-Project/research/03_data_transformation.ipynb#X12sZmlsZQ%3D%3D?line=7'>8</a> except Exception as e:
      <a href='vscode-notebook-cell:/g%3A/Machine_Learning_Projects/iNeuron%20internship/Flight-Fare-Prediction-End-to-End-ML-Project/research/03_data_transformation.ipynb#X12sZmlsZQ%3D%3D?line=8'>9</a>     raise e

g:\Machine_Learning_Projects\iNeuron internship\Flight-Fare-Prediction-End-to-End-ML-Project\research\03_data_transformation.ipynb Cell 10 line 2
     <a href='vscode-notebook-cell:/g%3A/Machine_Learning_Projects/iNeuron%20internship/Flight-Fare-Prediction-End-to-End-ML-Project/research/03_data_transformation.ipynb#X12sZmlsZQ%3D%3D?line=26'>27</a> def initiate_data_transformation(self):
     <a href='vscode-notebook-cell:/g%3A/Machine_Learning_Projects/iNeuron%20internship/Flight-Fare-Prediction-End-to-End-ML-Project/research/03_data_transformation.ipynb#X12sZmlsZQ%3D%3D?line=27'>28</a>     ## reading the data
---> <a href='vscode-notebook-cell:/g%3A/Machine_Learning_Projects/iNeuron%20internship/Flight-Fare-Prediction-End-to-End-ML-Project/research/03_data_transformation.ipynb#X12sZmlsZQ%3D%3D?line=28'>29</a>     df = pd.read_csv(self.config.data_path)
     <a href='vscode-notebook-cell:/g%3A/Machine_Learning_Projects/iNeuron%20internship/Flight-Fare-Prediction-End-to-End-ML-Project/research/03_data_transformation.ipynb#X12sZmlsZQ%3D%3D?line=30'>31</a>     logger.info('Read data completed')
     <a href='vscode-notebook-cell:/g%3A/Machine_Learning_Projects/iNeuron%20internship/Flight-Fare-Prediction-End-to-End-ML-Project/research/03_data_transformation.ipynb#X12sZmlsZQ%3D%3D?line=31'>32</a>     logger.info(f'df dataframe head: \n{df.head().to_string()}')

File c:\Users\2021\.conda\envs\flightfareprediction\lib\site-packages\pandas\io\parsers\readers.py:912, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, date_format, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options, dtype_backend)
    899 kwds_defaults = _refine_defaults_read(
    900     dialect,
    901     delimiter,
   (...)
    908     dtype_backend=dtype_backend,
    909 )
    910 kwds.update(kwds_defaults)
--> 912 return _read(filepath_or_buffer, kwds)

File c:\Users\2021\.conda\envs\flightfareprediction\lib\site-packages\pandas\io\parsers\readers.py:577, in _read(filepath_or_buffer, kwds)
    574 _validate_names(kwds.get("names", None))
    576 # Create the parser.
--> 577 parser = TextFileReader(filepath_or_buffer, **kwds)
    579 if chunksize or iterator:
    580     return parser

File c:\Users\2021\.conda\envs\flightfareprediction\lib\site-packages\pandas\io\parsers\readers.py:1407, in TextFileReader.__init__(self, f, engine, **kwds)
   1404     self.options["has_index_names"] = kwds["has_index_names"]
   1406 self.handles: IOHandles | None = None
-> 1407 self._engine = self._make_engine(f, self.engine)

File c:\Users\2021\.conda\envs\flightfareprediction\lib\site-packages\pandas\io\parsers\readers.py:1679, in TextFileReader._make_engine(self, f, engine)
   1676     raise ValueError(msg)
   1678 try:
-> 1679     return mapping[engine](f, **self.options)
   1680 except Exception:
   1681     if self.handles is not None:

File c:\Users\2021\.conda\envs\flightfareprediction\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py:93, in CParserWrapper.__init__(self, src, **kwds)
     90 if kwds["dtype_backend"] == "pyarrow":
     91     # Fail here loudly instead of in cython after reading
     92     import_optional_dependency("pyarrow")
---> 93 self._reader = parsers.TextReader(src, **kwds)
     95 self.unnamed_cols = self._reader.unnamed_cols
     97 # error: Cannot determine type of 'names'

File c:\Users\2021\.conda\envs\flightfareprediction\lib\site-packages\pandas\_libs\parsers.pyx:550, in pandas._libs.parsers.TextReader.__cinit__()

File c:\Users\2021\.conda\envs\flightfareprediction\lib\site-packages\pandas\_libs\parsers.pyx:639, in pandas._libs.parsers.TextReader._get_header()

File c:\Users\2021\.conda\envs\flightfareprediction\lib\site-packages\pandas\_libs\parsers.pyx:850, in pandas._libs.parsers.TextReader._tokenize_rows()

File c:\Users\2021\.conda\envs\flightfareprediction\lib\site-packages\pandas\_libs\parsers.pyx:861, in pandas._libs.parsers.TextReader._check_tokenize_status()

File c:\Users\2021\.conda\envs\flightfareprediction\lib\site-packages\pandas\_libs\parsers.pyx:2021, in pandas._libs.parsers.raise_parser_error()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa1 in position 14: invalid start byte

I have been stuck here for days. But can't fix the issue.

In the data transformation process, I am not getting any error.

I am getting error in only the code of data transformation pipeline.

Here is my file in GitHub.

My file encoding is UTF-8

Would you please help me to fix this issue?

0

There are 0 answers