python - Running Databricks Dolly locally on my Mac M1

Tags: python open-source chatgpt-api alpaca

I am trying to deploy and run Databricks Dolly, a recently released open-source LLM, as an alternative to GPT.

Documentation - https://learn.microsoft.com/en-us/azure/architecture/aws-professional/services

I tried to run it using Hugging Face transformers.

Code -


import numpy as np
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    PreTrainedModel,
    PreTrainedTokenizer
)

# Load the tokenizer and model once, after the imports
tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v1-6b", padding_side="left")
model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v1-6b", device_map="auto", trust_remote_code=True, offload_folder='offload')

PROMPT_FORMAT = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
"""


def generate_response(instruction: str, *, model: PreTrainedModel, tokenizer: PreTrainedTokenizer,
                      do_sample: bool = True, max_new_tokens: int = 256, top_p: float = 0.92, top_k: int = 0,
                      **kwargs) -> str:
    input_ids = tokenizer(PROMPT_FORMAT.format(instruction=instruction), return_tensors="pt").input_ids.to("cuda")

    # each of these is encoded to a single token
    response_key_token_id = tokenizer.encode("### Response:")[0]
    end_key_token_id = tokenizer.encode("### End")[0]

    gen_tokens = model.generate(
        input_ids,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=end_key_token_id,
        do_sample=do_sample,
        max_new_tokens=max_new_tokens,
        top_p=top_p,
        top_k=top_k,
        **kwargs,
    )[0].cpu()

    # find where the response begins
    response_positions = np.where(gen_tokens == response_key_token_id)[0]

    if len(response_positions) > 0:
        response_pos = response_positions[0]

        # find where the response ends
        end_pos = None
        end_positions = np.where(gen_tokens == end_key_token_id)[0]
        if len(end_positions) > 0:
            end_pos = end_positions[0]

        return tokenizer.decode(gen_tokens[response_pos + 1: end_pos]).strip()

    return None


# Sample similar to: "Excited to announce the release of Dolly, a powerful new language model from Databricks! #AI #Databricks"
generate_response("Write a tweet announcing Dolly, a large language model from Databricks.", model=model,
                  tokenizer=tokenizer)

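I get the following error -

AssertionError: Torch not compiled with CUDA enabled

Searching online, I found - *PyTorch only supports CUDA on x86_64 architectures, so CUDA support is not available on Apple M1 Macs.*

What should I do?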

Best answer

The M1 does not support CUDA, so you will probably need to remove .to("cuda") from this line to get it working:

input_ids = tokenizer(PROMPT_FORMAT.format(instruction=instruction), return_tensors="pt").input_ids.to("cuda")
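
As a possible alternative to deleting the call outright, the device can be chosen at runtime so the same script works on machines with and without CUDA. The sketch below is only an illustration: pick_device and detect_device are hypothetical helper names, not part of transformers or the Dolly code, and it assumes a PyTorch build recent enough to expose torch.backends.mps (1.12 or later). Whether a 6B-parameter model actually fits and runs well on the MPS backend of a given M1 machine is not something this snippet can confirm.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return a torch device string, preferring CUDA, then Apple MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Ask PyTorch which backends are usable; fall back to CPU if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # torch.backends.mps is absent on older PyTorch versions, so look it up defensively
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = mps_backend is not None and mps_backend.is_available()
    return pick_device(torch.cuda.is_available(), mps_ok)


device = detect_device()
print(device)

# Inside generate_response, the hard-coded "cuda" would then become (assumption:
# the model itself has been placed on the same device):
# input_ids = tokenizer(PROMPT_FORMAT.format(instruction=instruction),
#                       return_tensors="pt").input_ids.to(device)
```

The inputs and the model need to end up on the same device; if the model was loaded with device_map="auto", sending the inputs to model.device instead of a fixed string may be the simpler option.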

Regarding python - Running Databricks Dolly locally on my Mac M1, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/75956610/
