amazon-web-services - Loading a gensim model from S3 does not work

Tags: amazon-web-services amazon-s3 aws-lambda gensim

I have the following code (only the relevant parts are shown):

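from gensim.models.doc2vec import Doc2Vec
from gensim.utils import simple_preprocess

def load_model(model_file):
    # model_file is passed straight through; here it is an s3:// URI
    return Doc2Vec.load(model_file)

# Infer a vector for the input text and return the most similar documents
def infer_docs(input_string, model_file, inferred_docs=5):
    model = load_model(model_file)
    processed_str = simple_preprocess(input_string, min_len=2, max_len=35)
    inferred_vector = model.infer_vector(processed_str)
    return model.docvecs.most_similar([inferred_vector], topn=inferred_docs)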

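The code runs on AWS as a Lambda function. With a small model it works fine (I think the size is what matters), but with a properly sized model (~200 MB) I get the following error: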

[INFO]  2018-01-21T20:44:59.613Z    f2689816-feeb-11e7-b397-b7ff2947dcec    testing keys in event dict
[INFO]  2018-01-21T20:44:59.614Z    f2689816-feeb-11e7-b397-b7ff2947dcec    loading model from s3://data-d2v/trained_models/model_law
[INFO]  2018-01-21T20:44:59.614Z    f2689816-feeb-11e7-b397-b7ff2947dcec    loading Doc2Vec object from s3://data-d2v/trained_models/model_law
[INFO]  2018-01-21T20:44:59.650Z    f2689816-feeb-11e7-b397-b7ff2947dcec    Found credentials in environment variables.
[INFO]  2018-01-21T20:44:59.707Z    f2689816-feeb-11e7-b397-b7ff2947dcec    Starting new HTTPS connection (1): s3.eu-west-1.amazonaws.com
[INFO]  2018-01-21T20:44:59.801Z    f2689816-feeb-11e7-b397-b7ff2947dcec    Starting new HTTPS connection (2): s3.eu-west-1.amazonaws.com
[INFO]  2018-01-21T20:45:35.830Z    f2689816-feeb-11e7-b397-b7ff2947dcec    loading wv recursively from s3://data-d2v/trained_models/model_law.wv.* with mmap=None
[INFO]  2018-01-21T20:45:35.830Z    f2689816-feeb-11e7-b397-b7ff2947dcec    loading syn0 from s3://data-d2v/trained_models/model_law.wv.syn0.npy with mmap=None
[Errno 2] No such file or directory: 's3://data-d2v/trained_models/model_law.wv.syn0.npy': FileNotFoundError
Traceback (most recent call last):
  File "/var/task/handler.py", line 20, in infer_handler
    event['input_text'], event['model_file'], inferred_docs=10)
  File "/var/task/infer_doc.py", line 26, in infer_docs
    model = load_model(model_file)
  File "/var/task/infer_doc.py", line 21, in load_model
    return Doc2Vec.load(model_file)
  File "/var/task/gensim/models/word2vec.py", line 1569, in load
    model = super(Word2Vec, cls).load(*args, **kwargs)
  File "/var/task/gensim/utils.py", line 282, in load
    obj._load_specials(fname, mmap, compress, subname)
  File "/var/task/gensim/models/word2vec.py", line 1593, in _load_specials
    super(Word2Vec, self)._load_specials(*args, **kwargs)
  File "/var/task/gensim/utils.py", line 301, in _load_specials
    getattr(self, attrib)._load_specials(cfname, mmap, compress, subname)
  File "/var/task/gensim/utils.py", line 312, in _load_specials
    val = np.load(subname(fname, attrib), mmap_mode=mmap)
  File "/var/task/numpy/lib/npyio.py", line 372, in load
    fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 's3://data-d2v/trained_models/model_law.wv.syn0.npy'

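First, the file s3://data-d2v/trained_models/model_law.wv.syn0.npy does exist. Second, as far as I can tell from the log, the main model file s3://data-d2v/trained_models/model_law was loaded.

To verify that the file exists and is accessible, I added: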

import smart_open
with smart_open.smart_open('s3://data-d2v/trained_models/model_law.wv.syn0.npy') as prut:
    for line in prut:
        print(line)

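It prints the contents without any problem.

Can you help?

Best answer

Loading a model from an S3 bucket is currently not possible when the model is split across multiple files. I have posted a feature request on GitHub: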

https://github.com/RaRe-Technologies/gensim/issues/1851
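
Until that is supported, one possible workaround is to copy the main model file and its companion files (such as model_law.wv.syn0.npy) to local storage and load from there. The traceback shows numpy calling the built-in open() on the s3:// string, which only understands local paths, so a local copy avoids the problem. The following is a minimal sketch, not something confirmed in the linked issue. The helper names split_s3_uri, download_model_files and load_model_from_s3 are hypothetical, and it assumes boto3 is available in the Lambda runtime, that the function's role may list and read the bucket, and that the files fit into /tmp.

```python
import os
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError("not an s3:// URI: %r" % (uri,))
    return parsed.netloc, parsed.path.lstrip("/")


def download_model_files(s3_client, model_uri, local_dir="/tmp"):
    """Download the main model file plus its '<key>.*' companions.

    Returns the local path of the main model file. s3_client is expected
    to behave like boto3.client("s3").
    """
    bucket, key = split_s3_uri(model_uri)
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=key):
        for obj in page.get("Contents", []):
            obj_key = obj["Key"]
            # The prefix listing can also match unrelated objects
            # (e.g. 'model_lawyers'), so keep only the model itself
            # and files named '<key>.<something>'.
            if obj_key != key and not obj_key.startswith(key + "."):
                continue
            target = os.path.join(local_dir, os.path.basename(obj_key))
            s3_client.download_file(bucket, obj_key, target)
    return os.path.join(local_dir, os.path.basename(key))


def load_model_from_s3(model_uri, local_dir="/tmp"):
    """Hypothetical replacement for load_model() in the question."""
    # Imported here so the helpers above can be used without these packages.
    import boto3
    from gensim.models.doc2vec import Doc2Vec

    local_path = download_model_files(boto3.client("s3"), model_uri, local_dir)
    # All companion files now sit next to the main file, so gensim can
    # find them with ordinary local file access.
    return Doc2Vec.load(local_path)
```

In the question's code, load_model(model_file) could then call load_model_from_s3(model_file) instead of Doc2Vec.load(model_file). Note that /tmp in Lambda is limited (512 MB by default), and that the download happens on every cold start unless the loaded model is cached in a module-level variable.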

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/48371824/
