nlp - Huggingface Longformer memory issues

Tags: nlp classification huggingface-transformers

I am building a classifier based on Huggingface Longformer. Below is my main code:

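import torch
from torch import nn
from transformers import (LongformerForSequenceClassification, LongformerTokenizerFast,
                          Trainer, TrainingArguments)

model = LongformerForSequenceClassification.from_pretrained('/mnt/longformer_official/',
                                                           gradient_checkpointing=False,
                                                           attention_window=512)
tokenizer = LongformerTokenizerFast.from_pretrained('/mnt/longformer_official/', max_length=4000)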


train_df_tuning_dataset_tokenized = train_df_tuning_dataset.map(tokenization, batched = True, batch_size = len(train_df_tuning_dataset))


training_args = TrainingArguments(
    output_dir="xyz",
    num_train_epochs=5,  # changed this from 5
    per_device_train_batch_size=4,  # 4, 8; added on 18 March from the Huggingface example notebook
    gradient_accumulation_steps=16,  # 16, 8; added back on 18 March (missing in the Huggingface example notebook) because of memory issues otherwise
    per_device_eval_batch_size=16,  # 16
    evaluation_strategy="epoch",
    save_strategy="epoch",  # added on 18 March from the Huggingface example notebook
    learning_rate=2e-5,  # added on 18 March from the Huggingface example notebook
    load_best_model_at_end=True,
    greater_is_better=False,
    disable_tqdm=False,
    weight_decay=0.01,
    optim="adamw_torch",  # removing on 18 March from the Huggingface example notebook
    run_name="longformer-classification-16March2022",
)

# class weights
class CustomTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.get("labels")
        # forward pass
        outputs = model(**inputs)
        logits = outputs.get("logits")
        # weighted cross-entropy for 2 labels; keep the weights on the same device as the logits
        class_weights = torch.tensor([1.0, 0.5243], device=logits.device)
        loss_fct = nn.CrossEntropyLoss(weight=class_weights)
        loss = loss_fct(logits.view(-1, self.model.config.num_labels), labels.view(-1))
        return (loss, outputs) if return_outputs else loss

trainer = CustomTrainer(
    model=model,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=train_df_tuning_dataset_tokenized,
    eval_dataset=val_dataset_tokenized
)
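
The settings in the code above that most directly affect GPU memory are the batch sizes, gradient checkpointing (switched off in from_pretrained) and numeric precision. As a minimal sketch, not a confirmed fix, the dictionary below collects lower-memory values that could be passed to TrainingArguments. gradient_checkpointing and fp16 are existing TrainingArguments options, but the specific values and the name low_memory_kwargs are assumptions made for illustration.

```python
# Hypothetical lower-memory overrides for the TrainingArguments shown above.
# The values are illustrative assumptions, not measured settings.
low_memory_kwargs = dict(
    per_device_train_batch_size=1,   # one long sequence per forward/backward pass
    gradient_accumulation_steps=16,  # gradients are accumulated over 16 passes before each update
    per_device_eval_batch_size=1,
    gradient_checkpointing=True,     # recompute activations in backward instead of storing them
    fp16=True,                       # mixed precision; needs a GPU that supports it
)

# Samples contributing to each optimizer update on a single GPU.
effective_batch_size = (
    low_memory_kwargs["per_device_train_batch_size"]
    * low_memory_kwargs["gradient_accumulation_steps"]
)
print(effective_batch_size)

# Usage (assumes transformers is installed and the other arguments stay as above):
# training_args = TrainingArguments(output_dir="xyz", **low_memory_kwargs)
```

Accumulating gradients keeps the number of samples per update unchanged while only one sequence is held in memory at a time; gradient checkpointing trades extra compute for lower activation memory.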

When I try max_length=1500 in the tokenizer, the code runs fine. With max_length=4000 it fails. I even tried setting per_device_train_batch_size = 1, gradient_accumulation_steps = 1, per_device_eval_batch_size = 1.
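
The jump from 1500 to 4000 tokens is larger than it may look. Longformer pads inputs up to a multiple of attention_window, and activation memory for sliding-window attention grows roughly linearly with the padded length. Below is a rough sketch of that arithmetic, assuming the inputs actually reach those lengths, assuming linear scaling, and ignoring fixed costs such as model weights and optimizer state. padded_length is a hypothetical helper, not a library function.

```python
import math

def padded_length(num_tokens: int, attention_window: int) -> int:
    """Round num_tokens up to the next multiple of attention_window."""
    return math.ceil(num_tokens / attention_window) * attention_window

window = 512  # attention_window from the question
short_len = padded_length(1500, window)
long_len = padded_length(4000, window)

# Approximate growth in per-sequence activation memory under linear scaling.
ratio = long_len / short_len
print(short_len, long_len, round(ratio, 2))
```

Under these assumptions each sequence needs roughly 2.7 times the activation memory, which may explain why the run fails even at batch size 1.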

My questions:

  1. Is it okay to set per_device_train_batch_size = 1, gradient_accumulation_steps = 1, per_device_eval_batch_size = 1?

  2. The error I get is shown below. Is there any workaround other than getting more memory?

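    RuntimeError: CUDA out of memory. Tried to allocate 720.00 MiB (GPU 0; 14.76 GiB total capacity; 12.77 GiB already allocated; 111.75 MiB free; 13.69 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF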
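
The figures in the error message can be read directly. The sketch below works through that arithmetic and then shows how the allocator option named in the message would be set. The value 128 is an arbitrary illustrative assumption, and the variable has to be set before CUDA is initialized in the process.

```python
import os

# Figures copied from the error message above (GiB).
allocated = 12.77
reserved = 13.69

# Memory PyTorch holds in its cache but is not currently using for tensors.
cached_but_unused = reserved - allocated
print(f"{cached_but_unused:.2f} GiB reserved but unallocated")

# Option mentioned in the message; 128 is an illustrative value, not a recommendation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```

Here the reserved and allocated figures differ by less than 1 GiB, so reserved memory is not much larger than allocated memory, and tuning fragmentation alone may not be enough.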

Best answer

Try setting

gradient_accumulation_steps = int(math.ceil(len(tr_inputs) / per_device_train_batch_size) / 1) * epochs

because gradient_accumulation_steps should be derived from the number of epochs and the batch size.
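
To see what the expression above evaluates to, here is a minimal sketch with made-up values. tr_inputs, the batch size, and the epoch count below are hypothetical stand-ins, since the answer does not give them.

```python
import math

tr_inputs = list(range(1000))    # hypothetical: 1000 training examples
per_device_train_batch_size = 4  # hypothetical
epochs = 5                       # hypothetical

# Expression from the answer, evaluated as written.
gradient_accumulation_steps = int(
    math.ceil(len(tr_inputs) / per_device_train_batch_size) / 1
) * epochs
print(gradient_accumulation_steps)

# Samples that would contribute to one optimizer update on a single GPU.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch_size)
```

With these numbers the expression equals the total number of batches across all epochs, so it is worth comparing the resulting effective batch size against the one you intend.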

For more on nlp - Huggingface Longformer memory issues, see the similar question on Stack Overflow: https://stackoverflow.com/questions/71668624/
