Logging Huggingface Trainer in WandB

I am using the following training arguments and trainer to fine-tune a Hugging Face model:

from datetime import datetime

from transformers import Trainer, TrainingArguments

# model, tokenizer, model_ckpt, your_name, seq2seq_data_collator and dataset_pt
# are defined earlier in my script.
trainer_args = TrainingArguments(
    output_dir=model_ckpt.split('/')[0],
    num_train_epochs=5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    weight_decay=0.01,
    logging_steps=5,               # log training metrics every 5 steps
    evaluation_strategy='steps',
    eval_steps=100,                # evaluate every 100 steps
    eval_accumulation_steps=1,
    save_steps=800,
    report_to="wandb",  # enable logging to W&B
    run_name=f"{your_name}_{model_ckpt.split('/')[0]}_{datetime.now().strftime('%Y-%m-%d_%H-%M-%S')}",
    overwrite_output_dir=True,
    load_best_model_at_end=True,
    metric_for_best_model='eval_loss',
)
trainer = Trainer(
    model=model,
    args=trainer_args,
    tokenizer=tokenizer,
    data_collator=seq2seq_data_collator,
    train_dataset=dataset_pt["train"],
    eval_dataset=dataset_pt["validation"],
)

I have two questions about the logs in WandB:

  1. What does this plot mean? What is train/epoch?
  2. Why can't I see any logs like epoch/batch/train-val loss?

Basically I want to check which epoch my trainer is currently running.

I tried looking at the logging parameters in TrainingArguments, but couldn't work out what to change.

Edit 1: The y-axis is 'steps' for the given graph
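
For context, as far as I understand it, the Trainer attaches a fractional epoch value to each log entry, and the W&B integration shows it as train/epoch plotted against the global step. A minimal sketch of that relation follows. It is only an approximation of the bookkeeping, not the Trainer's actual code; the function name fractional_epoch and the dataset size of 1000 examples are hypothetical and chosen for illustration.

```python
import math


def fractional_epoch(global_step, num_examples, per_device_batch_size,
                     num_devices=1, grad_accum_steps=1):
    """Approximate the epoch value logged at a given optimizer step."""
    # Examples consumed by one optimizer (global) step.
    examples_per_step = per_device_batch_size * num_devices * grad_accum_steps
    # Optimizer steps needed to see the whole training set once.
    steps_per_epoch = max(1, math.ceil(num_examples / examples_per_step))
    return global_step / steps_per_epoch


# Hypothetical numbers: 1000 training examples, batch size 1 as in the question.
epoch_value = fractional_epoch(global_step=250, num_examples=1000,
                               per_device_batch_size=1)
current_epoch = int(epoch_value) + 1  # 1-based epoch currently in progress
print(f"train/epoch ~ {epoch_value:.2f}, running epoch {current_epoch} of 5")
```

Under these assumptions, a train/epoch value between 0 and 1 means the first epoch is still running, a value between 1 and 2 means the second, and so on up to num_train_epochs.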

There are 0 answers