python - Scrapy MySQL pipeline does not save data

Tags: python mysql python-3.x scrapy scrapy-pipeline

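I am using Scrapy to scrape a website's external links and store them in a MySQL database. I used a snippet in my code. When I run the spider, I can see the links being scraped, but it gives this error: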

2018-03-07 13:33:27 [scrapy.log] ERROR: not all arguments converted during string formatting

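Apparently the links are not being converted to strings because of the dots, slashes, commas, and dashes. How can I pass the links and store them without getting this error? TIA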

pipelines.py

from scrapy import log
from twisted.enterprise import adbapi
import MySQLdb.cursors


class MySQLStorePipeline(object):

    def __init__(self):
        self.dbpool = adbapi.ConnectionPool('MySQLdb', db='usalogic_testdb',
                user='root', passwd='1234', cursorclass=MySQLdb.cursors.DictCursor,
                charset='utf8', use_unicode=True)

    def process_item(self, item, spider):
        # run db query in thread pool
        query = self.dbpool.runInteraction(self._conditional_insert, item)
        query.addErrback(self.handle_error)

        return item

    def _conditional_insert(self, tx, item):
        # create the record if it doesn't exist.
        # this whole block runs on its own thread
        tx.execute("select * from test where link = %s", (item['link'], ))
        result = tx.fetchone()
        if result:
            log.msg("Item already stored in db: %s" % item, level=log.DEBUG)
        else:
            tx.execute(
                "insert into test (link) "
                "values (%s)",
                (item['link'])
            )
            log.msg("Item stored in db: %s" % item, level=log.DEBUG)

    def handle_error(self, e):
        log.err(e)

items.py, as used when the run command is given

import scrapy

class CollectUrlItem(scrapy.Item):
    link = scrapy.Field()

settings.py

ITEM_PIPELINES = {
    'rvca4.pipelines.MySQLStorePipeline': 800,
}

Best answer

I think it will work if you pass the parameter in a list instead of a tuple:

tx.execute(
    "insert into test (link) "
    "values (%s)",
    [item['link']]
)

Alternatively, add a trailing comma to the tuple:

tx.execute(
    "insert into test (link) "
    "values (%s)",
    (item['link'], )
)

This works because it is the trailing comma, not the parentheses, that actually makes it a tuple. Compare:

(1)  # the number 1 (the parentheses are wrapping the expression `1`)
(1,) # a 1-tuple holding a number 1
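
To see why a missing comma produces this particular error message, here is a minimal sketch that needs no database. The `interpolate` helper and the example URL are hypothetical. The helper is a simplified stand-in that assumes the driver iterates over the parameters and applies `%`-formatting; a real driver also escapes and quotes each value, so this is not MySQLdb's actual code.

```python
def interpolate(query, params):
    """Simplified stand-in for driver-side parameter substitution.

    Assumption: the driver iterates over `params` and applies
    %-formatting. Real drivers also escape and quote each value.
    """
    return query % tuple(params)


link = "http://example.com/some-page,v1"  # hypothetical URL

not_a_tuple = (link)   # parentheses only group; this is still a str
one_tuple = (link,)    # the trailing comma makes a 1-tuple

print(type(not_a_tuple).__name__)
print(type(one_tuple).__name__)

# A 1-tuple or a 1-element list supplies exactly one value for the one %s.
print(interpolate("insert into test (link) values (%s)", one_tuple))
print(interpolate("insert into test (link) values (%s)", [link]))

# A bare string is iterated character by character, so the single %s
# receives many arguments and %-formatting raises TypeError.
try:
    interpolate("insert into test (link) values (%s)", not_a_tuple)
except TypeError as exc:
    print("TypeError:", exc)
```

Under that assumption, the dots, slashes, commas, and dashes in the link are not the problem: any string longer than one character would fail the same way, because each character is treated as a separate parameter.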

Regarding "python - Scrapy MySQL pipeline does not save data", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49147353/
