I am trying to use llama_index to build an index over my personal documents so that I can ask questions about them through ChatGPT.
Here is the full code (with my actual API key, of course):
import os
os.environ["OPENAI_API_KEY"] = 'YOUR_OPENAI_API_KEY'
from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader('data').load_data()
index = GPTSimpleVectorIndex.from_documents(documents)
When I run the index-building step as described in the documentation, this step fails:
index = GPTSimpleVectorIndex.from_documents(documents)
with the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\COLMI\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_index\indices\base.py", line 92, in from_documents
service_context = service_context or ServiceContext.from_defaults()
File "C:\Users\COLMI\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_index\indices\service_context.py", line 71, in from_defaults
embed_model = embed_model or OpenAIEmbedding()
File "C:\Users\COLMI\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_index\embeddings\openai.py", line 209, in __init__
super().__init__(**kwargs)
File "C:\Users\COLMI\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_index\embeddings\base.py", line 55, in __init__
self._tokenizer: Callable = globals_helper.tokenizer
File "C:\Users\COLMI\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_index\utils.py", line 50, in tokenizer
enc = tiktoken.get_encoding("gpt2")
File "C:\Users\COLMI\AppData\Local\Programs\Python\Python310\lib\site-packages\tiktoken\registry.py", line 63, in get_encoding
enc = Encoding(**constructor())
File "C:\Users\COLMI\AppData\Local\Programs\Python\Python310\lib\site-packages\tiktoken_ext\openai_public.py", line 11, in gpt2
mergeable_ranks = data_gym_to_mergeable_bpe_ranks(
File "C:\Users\COLMI\AppData\Local\Programs\Python\Python310\lib\site-packages\tiktoken\load.py", line 83, in data_gym_to_mergeable_bpe_ranks
for first, second in bpe_merges:
ValueError: not enough values to unpack (expected 2, got 1)
I should mention that I tried this on DOCX files inside a specific folder that contains files as well as folders and subfolders.
Best answer
If your files are located in subfolders, you must set the recursive parameter to True:
documents = SimpleDirectoryReader('documents', recursive=True).load_data()
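To make the effect of recursive=True concrete without needing llama_index or an API key, here is a minimal standard-library sketch: a non-recursive listing only sees files directly inside the folder, while a recursive walk (what recursive=True enables in SimpleDirectoryReader) also picks up files in subfolders. The directory names here are illustrative only.

```python
import os
import tempfile

# Build a small tree to illustrate: a.docx at the top level,
# sub/b.docx one level down.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sub"))
for rel in ("a.docx", os.path.join("sub", "b.docx")):
    with open(os.path.join(root, rel), "w") as f:
        f.write("placeholder")

# Non-recursive view: only files directly inside the folder,
# analogous to SimpleDirectoryReader('data') without recursive=True.
top_level = [name for name in os.listdir(root)
             if os.path.isfile(os.path.join(root, name))]

# Recursive view: every file in the whole tree,
# analogous to SimpleDirectoryReader('data', recursive=True).
all_files = [os.path.join(dirpath, name)
             for dirpath, _, names in os.walk(root)
             for name in names]

print(len(top_level), len(all_files))  # 1 2
```

If b.docx is the only document containing the answer to your question, the non-recursive reader simply never loads it, which is why setting recursive=True matters here.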
A similar question ("I'm trying to run the llama index model, but it keeps failing at the index-building step — how can I fix this?") can be found on Stack Overflow: https://stackoverflow.com/questions/75968314/