huggingface - Fine-tuning open LLMs

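I am a newbie trying to learn fine-tuning. I started with the Falcon 7B Instruct LLM as my base model and want to fine-tune it on the Open Assistant instruct dataset. I have a 2080 Ti with 11 GB of VRAM, so I am using 4-bit quantization and LoRA.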

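These are the experiments I have done so far:

1> I trained with the SFT trainer from Hugging Face for 25000 epochs, and the loss decreased from 1.8 to 0.7. Below is the entire code I am using for training.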

import torch, einops
from datasets import load_dataset
from peft import LoraConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from peft.tuners.lora import LoraLayer

from trl import SFTTrainer


def create_and_prepare_model():
    compute_dtype = getattr(torch, "float16")

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=compute_dtype,
        bnb_4bit_use_double_quant=True,
    )

    model = AutoModelForCausalLM.from_pretrained(
        "tiiuae/falcon-7b-instruct", quantization_config=bnb_config, device_map={"": 0}, trust_remote_code=True
    )

    peft_config = LoraConfig(
        lora_alpha=16,
        lora_dropout=0.1,
        r=64,
        bias="none",
        task_type="CAUSAL_LM",
        target_modules=[
            "query_key_value"
        ],
    )

    tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b-instruct", trust_remote_code=True)
    tokenizer.pad_token = tokenizer.eos_token

    return model, peft_config, tokenizer


training_arguments = TrainingArguments(
    output_dir="./results_falcon-7b-instruct-new",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=10,
    optim="paged_adamw_32bit",
    save_steps=5,
    logging_steps=10,
    learning_rate=2e-4,
    fp16=True,
    max_grad_norm=0.3,
    max_steps=20,
    warmup_ratio=0.03,
    # group_by_length=True,
    lr_scheduler_type="constant",
)

model, peft_config, tokenizer = create_and_prepare_model()
model.config.use_cache = False
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=512,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=True,
)

trainer.train()
trainer.save_model("falcon-instruct-7b-4bit-openassist-latest-new")
model.config.to_json_file("falcon-instruct-7b-4bit-openassist-latest-new/config.json")

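It took about 53 hours. But the model just spits out gibberish when asked even a simple question like "how are you?"

2> 300 epochs: the loss decreased from 1.8 to 1.5, but the model still outputs gibberish.

3> 40 epochs: the loss decreased from 1.8 to 1.7, but the model still outputs gibberish.

Any suggestions to give me a head start? Any open-source code that does something similar would be greatly appreciated. Thanks a lot.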

Best answer

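1) Match the prompt to the dataset

Is what you feed in as the generation prompt similar to what the model is being fine-tuned on?

Typically, an LLM generates the desired output when the prompt uses the same format as the fine-tuning dataset. The format helps to "steer" or "contextualize" the generated text.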

Alpaca datasets typically follow this format:

### Instruction:
(Instruction Text)

### Input:
(Auxiliary Input Text)

### Response:
(Desired Response Text)

Vicuna datasets typically follow this format:

A chat between a human and an assistant.

### Human:
(Question Text)
### Assistant:
(Response Text)

Another format, recently described formally in the Microsoft Orca paper:

<System>: (You are a helpful <role> ai assistant)
<Instruction>: (Instruction Text Goes Here)
<Input>: (Other input goes here)
<Response>: (The desired response goes here)

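Pay attention to the newlines in both the dataset and the prompt (and to end-of-text tokens, if the LLM saw any during pre-training). A Vicuna inference prompt, for example: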

### Human:
What shape is the Earth?
### Assistant:

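If you use transformers directly in Python for inference, you must append the "### Assistant:\n" line to the end of the prompt yourself, handling the newline "\n" the same way the dataset does. LLMs are glorified autocomplete, if not exactly stochastic parrots.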
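A minimal sketch of building such a prompt, assuming the fine-tuning data puts a newline after each "### Human:" and "### Assistant:" marker as in the example above. `build_vicuna_prompt` is a hypothetical helper name, not part of transformers; adjust the separator so it matches your dataset exactly.

```python
def build_vicuna_prompt(question: str, newline: str = "\n") -> str:
    """Format a single-turn prompt that ends where the model should start answering.

    Hypothetical helper: the markers and separator are assumed to mirror
    the format of the fine-tuning dataset.
    """
    return f"### Human:{newline}{question.strip()}{newline}### Assistant:{newline}"


prompt = build_vicuna_prompt("What shape is the Earth?")
print(repr(prompt))
```

If the dataset keeps each turn on a single line, passing `newline=" "` produces the one-line variant instead.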

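The Vicuna format works well for chatbot fine-tuning prompts. The Alpaca and Orca formats are useful for instruction following, where the model is trained to return information in a specific format. This topic is still evolving, and in practice most users never see or think about the nitty-gritty of prompt engineering. All in all, these formats are not magic. They are just one part of the solution for generating interpretable responses that match the intent, and they deserve rigorous attention.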

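2) Once you have accounted for all the prompt and dataset formatting, go back to hyperparameter optimization.

Create a training dataset of 1000 items randomly sampled from the base dataset, and evaluate on 300 random samples.
Use one of the hyperparameter optimization methods supported by transformers
- optuna, sigopt, raytune, wandb
- more from Huggingface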
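The sampling and search-space steps above can be sketched as follows. This is an illustration under stated assumptions, not the answerer's code: `sample_split` and `hp_space` are hypothetical names, the value ranges are arbitrary starting points, and `n_total=5000` is a placeholder for `len(dataset)`.

```python
import random


def sample_split(n_total: int, n_train: int = 1000, n_eval: int = 300, seed: int = 42):
    """Return two disjoint lists of random row indices: (train, eval)."""
    if n_train + n_eval > n_total:
        raise ValueError("dataset too small for the requested split")
    rng = random.Random(seed)
    picked = rng.sample(range(n_total), n_train + n_eval)  # sampled without replacement
    return picked[:n_train], picked[n_train:]


def hp_space(trial):
    """Search space for an Optuna trial; the ranges are illustrative assumptions."""
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 5e-4, log=True),
        "gradient_accumulation_steps": trial.suggest_categorical(
            "gradient_accumulation_steps", [4, 8, 16]
        ),
    }


# Placeholder size; in practice pass n_total=len(dataset).
train_idx, eval_idx = sample_split(n_total=5000)
# With a Hugging Face dataset, the subsets would then be
# dataset.select(train_idx) and dataset.select(eval_idx).
print(len(train_idx), len(eval_idx))
```

A function like `hp_space` is the kind of callable that `Trainer.hyperparameter_search(hp_space=..., backend="optuna", direction="minimize", n_trials=...)` expects. That method also requires the trainer to be built with a `model_init` function, so the model is re-created for each trial.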

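3) Text generation after 20 training steps

The example below works, but it throws warnings because the Falcon model is not well integrated into the Huggingface transformers library.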

import torch
from transformers import pipeline

# The prompt uses the same "### Human: ... ### Assistant:" markers as the fine-tuning data.
prompt = """### Human: Can you write a short introduction about the relevance of the term "monopsony" in economics? Please use examples related to potential monopsonies in the labour market and cite relevant research. ### Assistant:"""

# `model` and `tokenizer` are the objects created by the training script.
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

sequences = pipe(
    prompt,
    max_length=100,  # counts prompt tokens plus generated tokens
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,
)

for seq in sequences:
    print(seq['generated_text'])
        
>>>
### Human: 
Can you write a short introduction about the relevance of the term "monopsony" in economics? Please use examples related to potential monopsonies in the labour market and cite relevant research.

### Assistant:
In economics, a monopsony is a market position where a single entity has enough market power to exercise price-setting and product differentiation strategies. In particular, a labour market monopsony occurs when a single employer has the ability
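As the output shows, `generated_text` echoes the prompt before the model's answer. A minimal sketch of pulling out only the reply, assuming the "### Human:" / "### Assistant:" markers used above; `extract_response` is a hypothetical helper, not part of transformers.

```python
def extract_response(generated_text: str,
                     assistant_tag: str = "### Assistant:",
                     human_tag: str = "### Human:") -> str:
    """Return the text after the first assistant tag, cut at any later human tag."""
    _, sep, tail = generated_text.partition(assistant_tag)
    if not sep:
        # No assistant tag found: fall back to the whole text.
        tail = generated_text
    # Drop any further turn the model may have started on its own.
    reply = tail.partition(human_tag)[0]
    return reply.strip()


sample = "### Human: What shape is the Earth?\n### Assistant:\nRoughly a sphere.\n### Human: Thanks"
print(extract_response(sample))
```

Cutting at the next "### Human:" marker matters because a model fine-tuned on multi-turn chats may keep generating both sides of the conversation until it hits the length limit.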

Regarding huggingface - fine-tuning open LLMs, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/76537855/
