python - Creating an RSS feed with Scrapy

I added a pipeline that I found in a Stack Overflow answer for a sample project. Here it is:

import csv
from craiglist_sample import settings


def write_to_csv(item):
    # Append one row per item; mode 'a' keeps the rows written by earlier calls.
    with open(settings.csv_file_path, 'a') as f:
        writer = csv.writer(f, lineterminator='\n')
        writer.writerow([item[key] for key in item.keys()])



class WriteToCsv(object):
    def process_item(self, item, spider):
        write_to_csv(item)
        return item

It writes to the CSV file correctly. Then I changed it to this:

import csv
import sys
from craiglist_sample import settings
import datetime
import PyRSS2Gen

def write_to_csv(item):

    rss = PyRSS2Gen.RSS2(
        title = "Andrew's PyRSS2Gen feed",
        link = "http://www.dalkescientific.com/Python/PyRSS2Gen.html",
        description = "The latest news about PyRSS2Gen, a "
                      "Python library for generating RSS2 feeds",

        lastBuildDate = datetime.datetime.now(),

        items = [
           PyRSS2Gen.RSSItem(
             title = str(item['title']),
             link = str(item['link']),
             description = "Dalke Scientific today announced PyRSS2Gen-0.0, "
                           "a library for generating RSS feeds for Python.  ",
             guid = PyRSS2Gen.Guid("http://www.dalkescientific.com/news/"
                              "030906-PyRSS2Gen.html"),
             pubDate = datetime.datetime(2003, 9, 6, 21, 31)),

        ])

    rss.write_xml(open("pyrss2gen.xml", "w"))

class WriteToCsv(object):
    def process_item(self, item, spider):
        write_to_csv(item)
        return item

The problem is that it only writes the last item to the XML file. How can I fix this? Do I need to add a new line for each item?

items.py is:

import scrapy


class CraiglistSampleItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    title = scrapy.Field()
    link = scrapy.Field()

Best answer

Open the file with a to append. Opening it with w overwrites the file on every call, so you only ever get the last item:

rss.write_xml(open("pyrss2gen.xml", "a"))

If you look at the original code, you will see that it also uses a rather than w.
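
A small sketch of the difference between the two modes, using only the standard library. The helper names (write_feed, is_well_formed) and the one-line stand-in feed are hypothetical and not from the question. It also illustrates a caveat worth checking before relying on a here: each write_xml call emits a complete RSS document, so appending should leave several documents in one file, which a strict XML parser rejects.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def write_feed(path, title, mode):
    # Stand-in for rss.write_xml(open(path, mode)): writes one complete document.
    with open(path, mode) as f:
        f.write("<rss><channel><title>%s</title></channel></rss>" % title)


def is_well_formed(path):
    # True if the whole file parses as a single XML document.
    try:
        ET.parse(path)
        return True
    except ET.ParseError:
        return False


tmp_dir = tempfile.mkdtemp()
w_path = os.path.join(tmp_dir, "w.xml")
a_path = os.path.join(tmp_dir, "a.xml")

for title in ["first", "second"]:
    write_feed(w_path, title, "w")  # truncates, so only "second" survives
    write_feed(a_path, title, "a")  # keeps both, as two concatenated documents

with open(w_path) as f:
    w_text = f.read()
with open(a_path) as f:
    a_text = f.read()

w_ok = is_well_formed(w_path)  # a single document
a_ok = is_well_formed(a_path)  # two root elements in one file
```

If the appended file has to be read by a feed reader, it is probably safer to collect all items first and write the feed once.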

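You may also want to open files with a with statement, or at least close them when you are done.

Regarding python - Creating an RSS feed with Scrapy, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/28127396/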
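
One possible way to combine both points is to collect the items in the pipeline and write the file a single time, inside a with block, when the spider closes. The sketch below is an assumption-laden illustration, not the answer's method: the class name WriteToRss, the default path feed.xml, and the channel placeholder values are hypothetical, and it builds the XML with the standard library's xml.etree.ElementTree in place of PyRSS2Gen so that it runs without extra dependencies. open_spider, process_item and close_spider are the standard Scrapy pipeline hooks.

```python
import xml.etree.ElementTree as ET


class WriteToRss(object):
    """Hypothetical pipeline: collect items, write the feed once at the end."""

    def __init__(self, path="feed.xml"):
        self.path = path  # assumed output path
        self.items = []

    def open_spider(self, spider):
        # Start each crawl with an empty list.
        self.items = []

    def process_item(self, item, spider):
        # Remember the fields; nothing is written to disk yet.
        self.items.append({"title": str(item["title"]),
                           "link": str(item["link"])})
        return item

    def close_spider(self, spider):
        rss = ET.Element("rss", version="2.0")
        channel = ET.SubElement(rss, "channel")
        # Placeholder channel metadata; replace with real values.
        ET.SubElement(channel, "title").text = "Scraped items"
        ET.SubElement(channel, "link").text = "http://example.com/"
        ET.SubElement(channel, "description").text = "Items collected by the spider"
        for entry in self.items:
            node = ET.SubElement(channel, "item")
            ET.SubElement(node, "title").text = entry["title"]
            ET.SubElement(node, "link").text = entry["link"]
        # Write mode is fine here: the file is written once, with every item.
        with open(self.path, "wb") as f:
            ET.ElementTree(rss).write(f, encoding="utf-8", xml_declaration=True)
```

With PyRSS2Gen the same structure should apply: close_spider would build one PyRSS2Gen.RSSItem per collected entry, pass the list as items to PyRSS2Gen.RSS2, and call write_xml once. The pipeline still has to be enabled through ITEM_PIPELINES in the project's settings.py; the module path to use there depends on the project layout.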
