TPU v4 Runtime Error: SliceBuilder detects hardware error


I've been running some JAX code successfully on a TPU v4-64 slice. However, my slice was preempted, and after I recreated a slice of the same size I started running into the following error:

"RuntimeError: Unable to initialize backend 'tpu': INTERNAL: SliceBuilder detects hardware error and is stopping TPU slice. (set JAX_PLATFORMS='' to automatically choose an available backend)".

The JAX code has not changed between the old and new slices. I recreated a v4-64 slice a second time but encountered the same error. The error always occurs on worker 0.

Any help would be greatly appreciated!

I've tried:

  • recreating the slice
  • launching the command on each worker separately instead of using --all-workers
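
To check whether only worker 0 fails to bring up the backend, a small probe like the one below could be run on each worker. This is a minimal sketch, not an official diagnostic: `probe_backend` is a hypothetical helper name, and it only uses `jax.devices` and `jax.local_device_count`. It reports whether initialization succeeds on a host and does not fix the underlying hardware error.

```python
import socket


def probe_backend(platform=None):
    """Try to initialize a JAX backend on this host and return a report dict.

    `probe_backend` is a hypothetical helper for illustration only.
    """
    report = {
        "host": socket.gethostname(),
        "platform": platform,
        "ok": False,
        "error": None,
    }
    try:
        # Imported here so a missing or broken install is reported, not raised.
        import jax

        # jax.devices() triggers backend initialization; this is the call
        # that raises the RuntimeError when the 'tpu' backend cannot start.
        devices = jax.devices(platform) if platform else jax.devices()
        report["device_count"] = len(devices)
        report["local_device_count"] = jax.local_device_count(platform)
        report["ok"] = True
    except Exception as exc:  # keep the full message for comparison across workers
        report["error"] = f"{type(exc).__name__}: {exc}"
    return report


if __name__ == "__main__":
    print(probe_backend("tpu"))
```

Comparing the printed report across workers shows whether the failure is really limited to worker 0 or whether the other workers simply wait on it.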