Python character encoding for European accents

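I know this is not an uncommon problem and that there are already several answers about it (1, 2, 3), but even after following the advice there I still see this error (for the code below):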

uri_name = u"%s_%s" % (name[1].encode('utf-8').strip(), name[0].encode('utf-8').strip())
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)

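So I'm trying to build a URL from a list of artist names, many of which have accents and European characters, like the following (each name is also shown as printed via repr, with the special characters escaped):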

Auberjonois, René -> Auberjonois, Ren\xc3\xa9
Bäumer, Eduard -> B\xc3\xa4umer, Eduard
Baur-Nütten, Gisela -> Baur-N\xc3\xbctten, Gisela
Bösken, Lorenz -> B\xc3\xb6sken, Lorenz
Čapek, Josef -> \xc4\x8capek, Josef
Großmann, Rudolf -> Gro\xc3\x9fmann, Rudolf

The block I'm trying to run is:

def create_uri(artist_name):
  artist_name = artist_name
  name = artist_name.split(",")
  uri_name = u"%s_%s" % (name[1].encode('utf-8').strip(), name[0].encode('utf-8').strip())
  uri = 'http://example.com/' + uri_name
  print uri

create_uri('Name, Non_Accent')
create_uri('Auberjonois, René')

So the first one works and produces http://example.com/Non_Accent_Name, but the second fails with the error above.

I have added #coding=utf-8 to the top of the script and tried encoding the artist_name string at every point along the way, only to get the same error every time.

In case it matters, I'm using Atom as my text editor, and when I open the .csv file these names come from, the accents all display correctly.

What else can I do to make sure the script interprets UTF-8 as UTF-8 and not ascii?

Best answer

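Stop using UTF-8. Use unicode everywhere, and decode/encode only at the interfaces (if needed). In Python 2, calling .encode('utf-8') on a byte string (str) first decodes it implicitly with the ascii codec, and that implicit decode is what fails on byte 0xc3.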

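# -*- coding: utf-8 -*-
# Python 2: the declaration above lets the u'...' literals below contain "é".

def create_uri(artist_name):
  # artist_name is a unicode object, so no .encode()/.decode() is needed here
  name = artist_name.split(u",")
  uri_name = u"%s_%s" % (name[1].strip(), name[0].strip())
  uri = u'http://example.com/' + uri_name
  print uri

create_uri(u'Name, Non_Accent')
create_uri(u'Auberjonois, René')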
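
If the names arrive as UTF-8 byte strings (for example, read from the .csv file), the "decode/encode only at the interfaces" advice could look roughly like the sketch below. This is only an illustration under a few assumptions: the helpers to_text, uri_name and create_ascii_uri are hypothetical names that are not part of the original code, the input is assumed to be UTF-8 with exactly one "Last, First" comma, and percent-encoding the outgoing URL is one possible choice of output encoding, not something the question asked for. The code returns the URI instead of printing it, and is written to run on Python 3 as well as Python 2.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2


def to_text(value, encoding='utf-8'):
    """Input boundary: decode byte strings, pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def uri_name(artist_name):
    """Turn 'Last, First' into 'First_Last', working on text only."""
    # Assumes exactly one comma; a name without one raises ValueError.
    last, first = [part.strip() for part in to_text(artist_name).split(',', 1)]
    return first + '_' + last


def create_uri(artist_name):
    """Return the URI as text (may contain non-ASCII characters)."""
    return 'http://example.com/' + uri_name(artist_name)


def create_ascii_uri(artist_name):
    """Output boundary: encode to UTF-8 and percent-encode for an ASCII-only URL."""
    return 'http://example.com/' + quote(uri_name(artist_name).encode('utf-8'))


raw = b'Auberjonois, Ren\xc3\xa9'  # UTF-8 bytes, as they might come from the .csv
print(create_ascii_uri(raw))  # http://example.com/Ren%C3%A9_Auberjonois
```

Everything between to_text and the final quote call handles text only, so no implicit ascii decode can happen in the middle.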

Regarding Python character encoding for European accents, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/23326234/
