python - UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2014'

Tags: python mysql unicode

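I am getting this error: UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2014'

I am trying to load a large number of news articles into MySQL using MySQLdb, but I am having trouble with non-standard characters: I get hundreds of these errors, for all sorts of characters. I could handle each one individually with .replace(), but I would like a more complete solution that handles them correctly.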

ubuntu@ip-10-0-0-21:~/scripts/work$ python test_db_load_error.py
Traceback (most recent call last):
  File "test_db_load_error.py", line 27, in <module>
    cursor.execute(sql_load)
  File "/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 157, in execute
    query = query.encode(charset)
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2014' in position 158: ordinal not in range(256)
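
The last frame of the traceback shows MySQLdb encoding the whole query string with the connection's character set. That step can be reproduced without a database. The snippet below is only a minimal sketch: the sample string is a made-up stand-in for the article text, not the real content.

```python
# Hypothetical sample text containing U+2014 (an em dash), like the article body.
text = u"Andrew Lesnie \u2014 cinematographer"

error = None
try:
    # Latin-1 only covers code points 0-255, so U+2014 cannot be encoded.
    text.encode("latin-1")
except UnicodeEncodeError as exc:
    error = exc
    print(exc.reason)  # ordinal not in range(256)
```

Any character outside the Latin-1 range fails the same way, which is why replacing characters one at a time does not scale.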

My script:

import MySQLdb as mdb
from goose import Goose
import string
import datetime

host = 'rds.amazonaws.com'
user = 'news'
password = 'xxxxxxx'
db_name = 'news_reader'
conn = mdb.connect(host, user, password, db_name)

url = 'http://www.dailymail.co.uk/wires/ap/article-3060183/Andrew-Lesnie-Lord-Rings-cinematographer-dies.html?ITO=1490&ns_mchannel=rss&ns_campaign=1490'
g = Goose()
article = g.extract(url=url)
body = article.cleaned_text
body = body.replace("'","`")
load_date = str(datetime.datetime.now())
summary = article.meta_description
title = article.title
image = article.top_image

sql_load = "insert into articles " \
        "    (title,summary,article,image,source,load_date) " \
        "     values ('%s','%s','%s','%s','%s','%s');" % \
        (title,summary,body,image,url,load_date)
cursor = conn.cursor()
cursor.execute(sql_load)
#conn.commit()

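Any help would be greatly appreciated.

Best answer

When you create the MySQLdb connection, pass charset='utf8' to it, so that queries are encoded as UTF-8 rather than Latin-1: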

conn = mdb.connect(host, user, password, db_name, charset='utf8')
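
As a rough illustration of why this works: UTF-8 can represent U+2014, so the encoding step that failed under Latin-1 succeeds. The sketch below also shows one possible way to pass the values as query parameters instead of interpolating them into the SQL string, which would make the quote-replacing workaround in the question unnecessary. The helper name build_insert and the sample values are hypothetical; the column names are taken from the question.

```python
EM_DASH = u"\u2014"

# UTF-8 can encode any Unicode code point, so this does not raise.
encoded = EM_DASH.encode("utf-8")


def build_insert(title, summary, body, image, source, load_date):
    """Hypothetical helper: return (sql, params) for cursor.execute(sql, params)."""
    # MySQLdb uses the "format" paramstyle, so placeholders are bare %s
    # (no quotes); the driver escapes and quotes each value itself.
    sql = ("INSERT INTO articles "
           "(title, summary, article, image, source, load_date) "
           "VALUES (%s, %s, %s, %s, %s, %s)")
    return sql, (title, summary, body, image, source, load_date)


# Made-up sample values standing in for the Goose output.
sql, params = build_insert(
    u"A title", u"A summary", u"It's a body " + EM_DASH + u" with a dash",
    u"http://example.com/image.jpg", u"http://example.com/article",
    u"2015-04-29 12:00:00",
)

# With a live connection opened as shown above, this would be:
# cursor = conn.cursor()
# cursor.execute(sql, params)
# conn.commit()
```

The target table and columns also need a character set that can store these characters; that part is not covered by the answer and is left as an assumption here.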

This question (python - UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2014') is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/29940759/
