After executing the workshop code from the notebook below:
Deploy Falcon 7B instruct on Amazon SageMaker https://github.com/aws/amazon-sagemaker-examples/blob/main/inference/generativeai/llm-workshop/deploy-falcon-40b-and-7b/falcon-7b-instruct-mpi.ipynb
I got the error below when I tried to query the created endpoint: {"code":424,"message":"Batch inference failed","properties":{},"content":{"keys":[],"values":[]}}
Some of the logs I found in CloudWatch are shown below. Any ideas about what happened?
Edited 2023-10-09 (the image upload didn't work well). See the CloudWatch log below:
| 1696286212592 | [WARN ] RollingBatch - Batch inference failed: prediction failure |
| 1696286212592 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: self._prefill_and_decode(preprocessed_new_requests) |
| 1696286212592 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/rolling_batch/scheduler_rolling_batch.py", line 131, in _prefill_and_decode |
| 1696286212592 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: self.scheduler.add_request( |
| 1696286212592 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/scheduler/seq_batch_scheduler.py", line 114, in add_request |
| 1696286212592 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: self._add_request(input_ids[index_not_use_prompt], |
| 1696286212592 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/scheduler/seq_batch_scheduler.py", line 157, in _add_request |
| 1696286212592 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: new_seq_batcher, output_ids = seq_batcher_cls.init_forward( |
| 1696286212592 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/usr/local/lib/python3.9/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context |
| 1696286212592 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: return func(*args, **kwargs) |
| 1696286212592 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/scheduler/seq_batcher_impl.py", line 72, in init_forward |
| 1696286212592 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: lm_output = lm_block.forward(*model_input, past_key_values=kv_cache) |
| 1696286212592 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/scheduler/lm_block.py", line 81, in forward |
| 1696286212592 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: output = self.model.forward(input_ids=input_ids, |
| 1696286212592 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/usr/local/lib/python3.9/dist-packages/accelerate/hooks.py", line 165, in new_forward |
| 1696286212592 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: output = old_forward(*args, **kwargs) |
| 1696286212592 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>:TypeError: forward() got an unexpected keyword argument 'position_ids' |
| 1696286212592 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>:Failed invoke service.invoke_handler() |
| 1696286212592 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>:Traceback (most recent call last): |
| 1696286212592 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python_engine.py", line 116, in run_server |
| 1696286212592 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: outputs = self.service.invoke_handler(function_name, inputs) |
| 1696286212592 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/service_loader.py", line 29, in invoke_handler |
| 1696286212592 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: return getattr(self.module, function_name)(inputs) |
| 1696286212592 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/huggingface.py", line 515, in handle |
| 1696286212592 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: return _service.inference(inputs) |
| 1696286212592 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/huggingface.py", line 275, in inference |
| 1696286212592 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: outputs.add(result[idx], key="data", batch_index=i) |
| 1696286217531 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>:IndexError: list index out of range |
| 1696286230123 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>:Rolling batch inference error |
| 1696286230123 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>:Traceback (most recent call last): |
| 1696286230123 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/rolling_batch/rolling_batch.py", line 111, in try_catch_handling |
| 1696286230123 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: return func(self, input_data, parameters) |
| 1696286230123 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/rolling_batch/scheduler_rolling_batch.py", line 55, in inference |
| 1696286230123 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: self._prefill_and_decode(preprocessed_new_requests) |
| 1696286230123 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/rolling_batch/scheduler_rolling_batch.py", line 131, in _prefill_and_decode |
| 1696286230123 | [WARN ] RollingBatch - Batch inference failed: prediction failure |
| 1696286230123 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: self.scheduler.add_request( |
| 1696286230123 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/scheduler/seq_batch_scheduler.py", line 114, in add_request |
| 1696286230123 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: self._add_request(input_ids[index_not_use_prompt], |
| 1696286230123 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/scheduler/seq_batch_scheduler.py", line 157, in _add_request |
| 1696286230123 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: new_seq_batcher, output_ids = seq_batcher_cls.init_forward( |
| 1696286230123 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/usr/local/lib/python3.9/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context |
| 1696286230123 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: return func(*args, **kwargs) |
| 1696286230123 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/scheduler/seq_batcher_impl.py", line 72, in init_forward |
| 1696286230123 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: lm_output = lm_block.forward(*model_input, past_key_values=kv_cache) |
| 1696286230123 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/scheduler/lm_block.py", line 81, in forward |
| 1696286230123 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: output = self.model.forward(input_ids=input_ids, |
| 1696286230123 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/usr/local/lib/python3.9/dist-packages/accelerate/hooks.py", line 165, in new_forward |
| 1696286230123 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: output = old_forward(*args, **kwargs) |
| 1696286230123 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>:TypeError: forward() got an unexpected keyword argument 'position_ids' |
| 1696286230123 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>:Failed invoke service.invoke_handler() |
| 1696286230123 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>:Traceback (most recent call last): |
| 1696286230123 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python_engine.py", line 116, in run_server |
| 1696286230123 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: outputs = self.service.invoke_handler(function_name, inputs) |
| 1696286230123 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/service_loader.py", line 29, in invoke_handler |
| 1696286230123 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: return getattr(self.module, function_name)(inputs) |
| 1696286230123 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/huggingface.py", line 515, in handle |
| 1696286230123 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: return _service.inference(inputs) |
| 1696286230123 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/huggingface.py", line 275, in inference |
| 1696286230123 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: outputs.add(result[idx], key="data", batch_index=i) |
| 1696286234530 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>:IndexError: list index out of range |
| 1696286261435 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>:Rolling batch inference error |
| 1696286261435 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>:Traceback (most recent call last): |
| 1696286261435 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/rolling_batch/rolling_batch.py", line 111, in try_catch_handling |
| 1696286261435 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: return func(self, input_data, parameters) |
| 1696286261435 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/rolling_batch/scheduler_rolling_batch.py", line 55, in inference |
| 1696286261435 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: self._prefill_and_decode(preprocessed_new_requests) |
| 1696286261435 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/rolling_batch/scheduler_rolling_batch.py", line 131, in _prefill_and_decode |
| 1696286261435 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: self.scheduler.add_request( |
| 1696286261435 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/scheduler/seq_batch_scheduler.py", line 114, in add_request |
| 1696286261435 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: self._add_request(input_ids[index_not_use_prompt], |
| 1696286261435 | [WARN ] RollingBatch - Batch inference failed: prediction failure |
| 1696286261435 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/scheduler/seq_batch_scheduler.py", line 157, in _add_request |
| 1696286261435 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: new_seq_batcher, output_ids = seq_batcher_cls.init_forward( |
| 1696286261435 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/usr/local/lib/python3.9/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context |
| 1696286261435 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: return func(*args, **kwargs) |
| 1696286261435 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/scheduler/seq_batcher_impl.py", line 72, in init_forward |
| 1696286261435 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: lm_output = lm_block.forward(*model_input, past_key_values=kv_cache) |
| 1696286261435 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/scheduler/lm_block.py", line 81, in forward |
| 1696286261435 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: output = self.model.forward(input_ids=input_ids, |
| 1696286261435 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/usr/local/lib/python3.9/dist-packages/accelerate/hooks.py", line 165, in new_forward |
| 1696286261435 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: output = old_forward(*args, **kwargs) |
| 1696286261435 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>:TypeError: forward() got an unexpected keyword argument 'position_ids' |
| 1696286261435 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>:Failed invoke service.invoke_handler() |
| 1696286261435 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>:Traceback (most recent call last): |
| 1696286261435 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python_engine.py", line 116, in run_server |
| 1696286261435 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: outputs = self.service.invoke_handler(function_name, inputs) |
| 1696286261435 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/service_loader.py", line 29, in invoke_handler |
| 1696286261435 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: return getattr(self.module, function_name)(inputs) |
| 1696286261435 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/huggingface.py", line 515, in handle |
| 1696286261435 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: return _service.inference(inputs) |
| 1696286261435 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/huggingface.py", line 275, in inference |
| 1696286261435 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: outputs.add(result[idx], key="data", batch_index=i) |
| 1696286265530 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>:IndexError: list index out of range |
| 1696287848633 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>:Rolling batch inference error |
| 1696287848633 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>:Traceback (most recent call last): |
| 1696287848633 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/rolling_batch/rolling_batch.py", line 111, in try_catch_handling |
| 1696287848633 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: return func(self, input_data, parameters) |
| 1696287848633 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/rolling_batch/scheduler_rolling_batch.py", line 55, in inference |
| 1696287848633 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: self._prefill_and_decode(preprocessed_new_requests) |
| 1696287848633 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/rolling_batch/scheduler_rolling_batch.py", line 131, in _prefill_and_decode |
| 1696287848633 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: self.scheduler.add_request( |
| 1696287848633 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/scheduler/seq_batch_scheduler.py", line 114, in add_request |
| 1696287848633 | [WARN ] RollingBatch - Batch inference failed: prediction failure |
| 1696287848633 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: self._add_request(input_ids[index_not_use_prompt], |
| 1696287848633 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/scheduler/seq_batch_scheduler.py", line 157, in _add_request |
| 1696287848633 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: new_seq_batcher, output_ids = seq_batcher_cls.init_forward( |
| 1696287848633 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/usr/local/lib/python3.9/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context |
| 1696287848633 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: return func(*args, **kwargs) |
| 1696287848633 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/scheduler/seq_batcher_impl.py", line 72, in init_forward |
| 1696287848633 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: lm_output = lm_block.forward(*model_input, past_key_values=kv_cache) |
| 1696287848633 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/scheduler/lm_block.py", line 81, in forward |
| 1696287848633 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: output = self.model.forward(input_ids=input_ids, |
| 1696287848633 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/usr/local/lib/python3.9/dist-packages/accelerate/hooks.py", line 165, in new_forward |
| 1696287848633 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: output = old_forward(*args, **kwargs) |
| 1696287848633 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>:TypeError: forward() got an unexpected keyword argument 'position_ids' |
| 1696287848633 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>:Failed invoke service.invoke_handler() |
| 1696287848633 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>:Traceback (most recent call last): |
| 1696287848633 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python_engine.py", line 116, in run_server |
| 1696287848633 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: outputs = self.service.invoke_handler(function_name, inputs) |
| 1696287848633 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/service_loader.py", line 29, in invoke_handler |
| 1696287848633 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: return getattr(self.module, function_name)(inputs) |
| 1696287848633 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/huggingface.py", line 515, in handle |
| 1696287848633 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: return _service.inference(inputs) |
| 1696287848633 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: File "/tmp/.djl.ai/python/0.23.0/djl_python/huggingface.py", line 275, in inference |
| 1696287848633 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: outputs.add(result[idx], key="data", batch_index=i) |
| 1696287853531 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>:IndexError: list index out of range |
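Since the same traceback repeats several times, it may help to reduce the log to just the exception lines. Below is a minimal sketch of how that could be done, assuming the log is available as plain text in the `| timestamp | message |` layout shown above. The function name `extract_exceptions` and the two regular expressions are my own illustration, not part of DJL or SageMaker.

```python
import re

# Matches a CloudWatch row of the form "| <epoch ms> | <message> |".
ROW = re.compile(r"^\|\s*(\d+)\s*\|\s*(.*?)\s*\|\s*$")
# Matches a Python exception line such as "<stdout>:TypeError: message".
EXC = re.compile(r"<stdout>:\s*(\w+(?:Error|Exception)): (.*)$")


def extract_exceptions(log_text):
    """Return (timestamp_ms, exception_type, message) for each exception row.

    Hypothetical helper: it only recognises the row layout pasted in this post.
    """
    found = []
    for line in log_text.splitlines():
        row = ROW.match(line)
        if not row:
            continue  # not a log row (for example a separator line)
        exc = EXC.search(row.group(2))
        if exc:
            found.append((int(row.group(1)), exc.group(1), exc.group(2)))
    return found


# Three rows copied from the log above.
sample = """\
| 1696286212592 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>: output = old_forward(*args, **kwargs) |
| 1696286212592 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>:TypeError: forward() got an unexpected keyword argument 'position_ids' |
| 1696286217531 | [INFO ] PyProcess - W-88-falcon_src-stdout: [1,0]<stdout>:IndexError: list index out of range |
"""

for timestamp_ms, kind, message in extract_exceptions(sample):
    print(timestamp_ms, kind, message)
```

Run over the full log, this leaves only the `TypeError` and `IndexError` rows. In every repetition the `TypeError` about `position_ids` is logged before the `IndexError`, which suggests (but does not prove) that the `IndexError` is a follow-on failure rather than the original problem.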
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[amazon-sagemaker] [WARN ] RollingBatch - Batch inference failed: prediction failure ("code":424,"message":"Batch inference failed")
I was expecting the Falcon model's inference result for the submitted text (the question).