I am trying to create an AWS Glue Spark job that trains a model on one of my datasets using the SageMaker XGBoost algorithm, framework version 1.3-1. When I try to run the estimator, I get an error.
Infrastructure: AWS Glue 4.0 Spark shell
All file locations are S3 paths.
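Code snippet:

```python
from sagemaker.xgboost.estimator import XGBoost

# hyperparameters, role, instance_type and job_name are defined earlier in the Glue script
xgb_script_mode_estimator = XGBoost(
    entry_point="training.py",
    hyperparameters=hyperparameters,
    role=role,
    instance_count=1,
    instance_type=instance_type,
    framework_version="1.3-1",
    # 'output' without surrounding slashes avoids empty "//" segments in the S3 key
    output_path="s3://{}/{}/{}/output".format(hyperparameters['bucket_nm'], 'output', job_name),
)
```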
Error:

```
FileNotFoundError: [Errno 2] No such file or directory: 'training.py'
```
I placed the Glue script and training.py in the same folder of the job bucket, together with an `__init__.py` file.
The XGBoost estimator does not find training.py in that folder, even though the file name matches exactly (including case).
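The error suggests that the SDK looks for `training.py` on the local filesystem of the Glue driver, relative to the process's current working directory, and not in the S3 folder that holds the Glue script. A quick way to check this from inside the job is to print the working directory and see where the path actually resolves. `resolve_entry_point` is a hypothetical helper, and the lookup rule it encodes (working directory without `source_dir`, the local `source_dir` otherwise) is an assumption about the SDK's behavior, not its actual code.

```python
import os


def resolve_entry_point(entry_point, source_dir=None):
    """Return (absolute_path, exists) for where a local entry_point would be looked up.

    Hypothetical helper (assumed lookup rule): without source_dir the path is
    relative to the current working directory; with a local source_dir it is
    relative to that directory. S3 locations are never consulted here.
    """
    base = source_dir if source_dir else os.getcwd()
    path = os.path.abspath(os.path.join(base, entry_point))
    return path, os.path.isfile(path)


if __name__ == "__main__":
    # Run inside the Glue job to see what the driver process can actually see
    print("Working directory:", os.getcwd())
    print("Contents:", sorted(os.listdir(".")))
    print(resolve_entry_point("training.py"))
```

If `exists` comes back `False` while the file is visible in the S3 console, the estimator is looking locally and never sees the S3 copy.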
This issue was resolved by adding the `source_dir` parameter to the XGBoost estimator, pointing it to the location of training.py, and keeping `entry_point` as the file name relative to that directory.
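A minimal sketch of this fix when the training code only lives in S3, assuming the layout described above: bundle training.py (plus `__init__.py` and any helpers) into a `.tar.gz`, upload it, and pass its S3 URI as `source_dir`. The SageMaker Python SDK expects an S3 `source_dir` to point to a `.tar.gz` file, with `entry_point` resolved relative to the archive root. `package_source_dir` and `build_estimator` are hypothetical helper names, and the bucket/prefix in the comment are placeholders.

```python
import os
import tarfile


def package_source_dir(src_dir, out_path):
    """Bundle every entry in src_dir into a tar.gz, with files at the archive root.

    With an S3 source_dir, entry_point="training.py" is resolved relative
    to the root of this archive.
    """
    with tarfile.open(out_path, "w:gz") as tar:
        for name in sorted(os.listdir(src_dir)):
            tar.add(os.path.join(src_dir, name), arcname=name)
    return out_path


def build_estimator(source_dir_uri, hyperparameters, role, instance_type, job_name):
    # Imported here so this file can be loaded without the SageMaker SDK installed
    from sagemaker.xgboost.estimator import XGBoost

    return XGBoost(
        entry_point="training.py",   # relative to the root of source_dir
        source_dir=source_dir_uri,   # placeholder, e.g. "s3://<bucket>/<prefix>/sourcedir.tar.gz"
        hyperparameters=hyperparameters,
        role=role,
        instance_count=1,
        instance_type=instance_type,
        framework_version="1.3-1",
        output_path="s3://{}/output/{}/output".format(hyperparameters["bucket_nm"], job_name),
    )
```

The archive can be uploaded with boto3, for example `boto3.client("s3").upload_file(local_tar_path, bucket, key)`. Alternatively, if training.py is first copied to a local directory on the Glue driver, that directory can be passed as `source_dir` directly, and the SDK packages and uploads it itself.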