I am fine-tuning a HuggingFace transformer model (PyTorch version) using the HF Seq2SeqTrainingArguments and Seq2SeqTrainer, and I want to display the training and validation losses in TensorBoard (on the same chart).
As I understand it, in order to plot the two losses together I need to use the SummaryWriter. The HF Callbacks documentation describes a TensorBoardCallback class that can receive a tb_writer argument:
However, I cannot figure out the correct way to use it, if it is even supposed to be used with the Trainer API.
My code looks something like this:
args = Seq2SeqTrainingArguments(
    output_dir=output_dir,
    evaluation_strategy='epoch',
    learning_rate=1e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=num_train_epochs,
    predict_with_generate=True,
    logging_steps=logging_steps,
    report_to='tensorboard',
    push_to_hub=False,
)
trainer = Seq2SeqTrainer(
    model,
    args,
    train_dataset=tokenized_train_data,
    eval_dataset=tokenized_val_data,
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
I assume I should include a callback to TensorBoard in the trainer, e.g.
callbacks = [TensorBoardCallback(tb_writer=tb_writer)]
but I cannot find a comprehensive example of how to use it, or what to import to use it.
I also found this feature request on GitHub,
https://github.com/huggingface/transformers/pull/4020
but no usage example, so I am confused...
Any insight would be appreciated.
Best answer
As far as I know, the only way to plot two values on the same TensorBoard chart is to use two separate SummaryWriters that share the same root directory. For example, the logging directories might be log_dir/train and log_dir/eval.
This approach is used in this answer, but for TensorFlow rather than PyTorch.
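The two-writer idea can be sketched stand-alone in plain PyTorch (directory and tag names here are only illustrative): scalars logged under the same tag from writers that share a root directory show up as two runs overlaid on one chart.

```python
import os
from torch.utils.tensorboard import SummaryWriter

log_dir = "runs/demo"  # illustrative root directory
train_writer = SummaryWriter(log_dir=os.path.join(log_dir, "train"))
eval_writer = SummaryWriter(log_dir=os.path.join(log_dir, "eval"))

for step in range(3):
    # Same tag from both writers -> both curves land on one chart
    train_writer.add_scalar("combined/loss", 1.0 / (step + 1), step)
    eval_writer.add_scalar("combined/loss", 1.2 / (step + 1), step)

train_writer.close()
eval_writer.close()
```

Running tensorboard --logdir runs/demo then shows two runs, train and eval, on the same combined/loss chart.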
To do this with the 🤗 Trainer API, you need a custom callback that holds two SummaryWriters. Below is the code for my custom callback CombinedTensorBoardCallback, which I made by modifying the code of TensorBoardCallback:
import logging
import os

from transformers.integrations import TrainerCallback, is_tensorboard_available

logger = logging.getLogger(__name__)


def custom_rewrite_logs(d, mode):
    new_d = {}
    eval_prefix = "eval_"
    eval_prefix_len = len(eval_prefix)
    test_prefix = "test_"
    test_prefix_len = len(test_prefix)
    for k, v in d.items():
        if mode == 'eval' and k.startswith(eval_prefix):
            if k[eval_prefix_len:] == 'loss':
                new_d["combined/" + k[eval_prefix_len:]] = v
        elif mode == 'test' and k.startswith(test_prefix):
            if k[test_prefix_len:] == 'loss':
                new_d["combined/" + k[test_prefix_len:]] = v
        elif mode == 'train':
            if k == 'loss':
                new_d["combined/" + k] = v
    return new_d


class CombinedTensorBoardCallback(TrainerCallback):
    """
    A [`TrainerCallback`] that sends the logs to [TensorBoard](https://www.tensorflow.org/tensorboard).

    Args:
        tb_writers (`dict` of `SummaryWriter`, *optional*):
            The writers to use. Will instantiate them if not set.
    """

    def __init__(self, tb_writers=None):
        has_tensorboard = is_tensorboard_available()
        if not has_tensorboard:
            raise RuntimeError(
                "TensorBoardCallback requires tensorboard to be installed. Either update your PyTorch version or"
                " install tensorboardX."
            )
        if has_tensorboard:
            try:
                from torch.utils.tensorboard import SummaryWriter  # noqa: F401

                self._SummaryWriter = SummaryWriter
            except ImportError:
                try:
                    from tensorboardX import SummaryWriter

                    self._SummaryWriter = SummaryWriter
                except ImportError:
                    self._SummaryWriter = None
        else:
            self._SummaryWriter = None
        self.tb_writers = tb_writers

    def _init_summary_writer(self, args, log_dir=None):
        log_dir = log_dir or args.logging_dir
        if self._SummaryWriter is not None:
            self.tb_writers = dict(
                train=self._SummaryWriter(log_dir=os.path.join(log_dir, 'train')),
                eval=self._SummaryWriter(log_dir=os.path.join(log_dir, 'eval')),
            )

    def on_train_begin(self, args, state, control, **kwargs):
        if not state.is_world_process_zero:
            return
        log_dir = None
        if state.is_hyper_param_search:
            trial_name = state.trial_name
            if trial_name is not None:
                log_dir = os.path.join(args.logging_dir, trial_name)
        if self.tb_writers is None:
            self._init_summary_writer(args, log_dir)
        for k, tbw in self.tb_writers.items():
            tbw.add_text("args", args.to_json_string())
            if "model" in kwargs:
                model = kwargs["model"]
                if hasattr(model, "config") and model.config is not None:
                    model_config_json = model.config.to_json_string()
                    tbw.add_text("model_config", model_config_json)
            # Version of TensorBoard coming from tensorboardX does not have this method.
            if hasattr(tbw, "add_hparams"):
                tbw.add_hparams(args.to_sanitized_dict(), metric_dict={})

    def on_log(self, args, state, control, logs=None, **kwargs):
        if not state.is_world_process_zero:
            return
        if self.tb_writers is None:
            self._init_summary_writer(args)
        for tbk, tbw in self.tb_writers.items():
            logs_new = custom_rewrite_logs(logs, mode=tbk)
            for k, v in logs_new.items():
                if isinstance(v, (int, float)):
                    tbw.add_scalar(k, v, state.global_step)
                else:
                    logger.warning(
                        "Trainer is attempting to log a value of "
                        f'"{v}" of type {type(v)} for key "{k}" as a scalar. '
                        "This invocation of Tensorboard's writer.add_scalar() "
                        "is incorrect so we dropped this attribute."
                    )
            tbw.flush()

    def on_train_end(self, args, state, control, **kwargs):
        for tbw in self.tb_writers.values():
            tbw.close()
        self.tb_writers = None
If you want to combine train and eval for metrics other than the loss, you should modify custom_rewrite_logs accordingly.
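For example, here is a hedged sketch of such a modification (rewrite_all_logs is my own name, not part of the original callback) that forwards every shared metric instead of only the loss:

```python
# Sketch: a variant of custom_rewrite_logs that combines *all* eval_/test_
# metrics with their training counterparts, not just the loss. Each metric
# keeps the "combined/" prefix, so each gets its own chart with both curves.
def rewrite_all_logs(d, mode):
    new_d = {}
    for k, v in d.items():
        if mode == "eval" and k.startswith("eval_"):
            new_d["combined/" + k[len("eval_"):]] = v
        elif mode == "test" and k.startswith("test_"):
            new_d["combined/" + k[len("test_"):]] = v
        elif mode == "train" and not k.startswith(("eval_", "test_")):
            new_d["combined/" + k] = v
    return new_d


print(rewrite_all_logs({"loss": 0.5, "learning_rate": 1e-5}, "train"))
# {'combined/loss': 0.5, 'combined/learning_rate': 1e-05}
print(rewrite_all_logs({"eval_loss": 0.7, "eval_bleu": 31.2}, "eval"))
# {'combined/loss': 0.7, 'combined/bleu': 31.2}
```

Dropping this in place of custom_rewrite_logs in the callback above would overlay every metric that appears in both the train and eval logs.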
As usual, the callback goes in the Trainer constructor. In my test example it was:
trainer = Trainer(
    model=rnn,
    args=train_args,
    train_dataset=train_dataset,
    eval_dataset=validation_dataset,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
    callbacks=[CombinedTensorBoardCallback],
)
In addition, you may want to remove the default TensorBoardCallback; otherwise, besides the combined loss chart, the training and validation losses will also be displayed separately, as they are by default.
trainer.remove_callback(TensorBoardCallback)
Here is the resulting TensorBoard view:
[screenshot of the combined train/eval loss chart]
Regarding "python - Is there a way to plot training and validation losses on the same chart with the HuggingFace Trainer API?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/73281901/