python - Using Django's models in a Scrapy project (in the pipeline)

This has been asked before, but the answer that always comes up is to use DjangoItem. However, its GitHub page states that it is:

often not a good choice for a write intensive applications (such as a web crawler) ... may not scale well

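This is the crux of my problem. I would like to use and interact with my Django models the same way I can after running python manage.py shell and then executing from myapp.models import Model1, using queries like those seen here.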

I have tried relative imports, and I have tried moving my whole Scrapy project inside my Django application, both to no avail.

Where should I move my Scrapy project so that this works? How can I recreate, inside a Scrapy pipeline, all the methods that are available to me in the shell?

Thanks in advance.

Best answer

I created a sample project that uses Scrapy inside Django. One of its pipelines uses the Django models and ORM.

https://github.com/bipul21/scrapy_django

The directory structure starts from your Django project. In this case the project is named django_project. Inside that base project you create your own Scrapy project, here named scrapy_project.

Add the following lines to your Scrapy project's settings to initialize Django:

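    import os
    import sys

    import django

    # Make the Django project root importable from the Scrapy settings module.
    sys.path.append(os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), ".."))
    # Point Django at its settings, then load the app registry.
    os.environ['DJANGO_SETTINGS_MODULE'] = 'django_project.settings'
    django.setup()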
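To see what the sys.path line resolves to, here is a minimal sketch. It assumes the default Scrapy layout, where settings.py sits at django_project/scrapy_project/scrapy_project/settings.py; the helper name django_root_for and the /srv path are hypothetical and only serve the illustration.

```python
import os
import sys


def django_root_for(settings_file):
    """Return the directory that sits above the Scrapy project root.

    Hypothetical helper: it mirrors the expression used in the settings
    snippet, assuming settings_file lives in <root>/scrapy_project/scrapy_project/.
    """
    package_dir = os.path.dirname(os.path.abspath(settings_file))  # inner scrapy_project package
    scrapy_root = os.path.dirname(package_dir)                     # outer scrapy_project folder
    # Joining ".." steps up once more; normpath collapses the ".." segment.
    return os.path.normpath(os.path.join(scrapy_root, ".."))


# Assumed example location of the Scrapy settings file.
root = django_root_for("/srv/django_project/scrapy_project/scrapy_project/settings.py")

# Avoid adding the same entry twice if the settings module is imported again.
if root not in sys.path:
    sys.path.append(root)

print(root)
```

If the Scrapy project is nested differently, the number of os.path.dirname calls has to change accordingly, so printing the resolved path once is a cheap sanity check.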

In the pipeline I run a simple query against the Questions model:

    from questions.models import Questions


    class ScrapyProjectPipeline(object):
        def process_item(self, item, spider):
            # Skip items that are already stored.
            try:
                question = Questions.objects.get(identifier=item["identifier"])
                print("Question already exists")
                return item
            except Questions.DoesNotExist:
                pass

            # Otherwise create and save a new row through the ORM.
            question = Questions()
            question.identifier = item["identifier"]
            question.title = item["title"]
            question.url = item["url"]
            question.save()
            return item
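The lookup-then-save logic in the pipeline can also be written with the ORM's get_or_create, which returns an (object, created) tuple. This is a sketch and not the code from the linked repository: the class name GetOrCreatePipeline and its model attribute are hypothetical, and the field names identifier, title and url are taken from the pipeline above.

```python
class GetOrCreatePipeline(object):
    """Sketch: store each item once, keyed by its identifier."""

    # Hypothetical hook: set this to the Django model class in a subclass,
    # for example `model = Questions` after `from questions.models import Questions`.
    model = None

    def process_item(self, item, spider):
        # get_or_create looks the row up by identifier and only uses
        # `defaults` when it has to create a new row.
        obj, created = self.model.objects.get_or_create(
            identifier=item["identifier"],
            defaults={"title": item["title"], "url": item["url"]},
        )
        if not created:
            print("Question already exists")
        return item
```

Keeping the model as a class attribute means the module can be imported before django.setup() has run; the subclass that binds the real model would live in the Scrapy project's pipelines module.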

You can check the project for more details, such as the model schema.

This question, python - Using Django's models in a Scrapy project (in the pipeline), comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/41905464/
