python - Unicode and lxml objectified data

Tags: python python-2.7 unicode lxml

I am getting the standard UnicodeEncodeError: 'ascii' codec can't encode character u'\xa9' in position 15: ordinal not in range(128), but the usual remedies are not working for me.

I tried changing line 72 to gName = getattr(root, u"name", "").encode('utf-8').strip(), which gave me AttributeError: no such child: encode. I also tried gName.encode('utf-8') on line 84, with the same result.

test_data = ("""
    <rsp stat="ok">
        <group id="34427465497@N01" iconserver="1" iconfarm="1" lang=""
                     ispoolmoderated="0" is_member="0" is_moderator="0" is_admin="0">
            <name>DanceFloor - [ © Plz Read Rules ]</name>
            <members>245</members>
            <pool_count>133</pool_count>
            <topic_count>106</topic_count>
            <restrictions photos_ok="1" videos_ok="1" images_ok="1" screens_ok="1"
                 art_ok="1" safe_ok="1" moderate_ok="0" restricted_ok="0" has_geo="0" />
        </group>
    </rsp>
""")

from lxml import html, etree, objectify
import re
import time
import flickrapi

g, u, gt = 0, 0, 0
fErr = ''

t = open(r'C:\Mirc\Python\Temp Files\text.xml', 'r')
td = t.read()

tst = 1   # # True for test data, False for live data
ext = 0   # # True for external test data, False for internal
if tst:
    if ext:
        t = open(r'C:\Mirc\Python\Temp Files\text.xml', 'r')
        td = t.read()
    else:
        td = test_data

    api_key = 'test'
    api_secret =  'test'
else:
        KeyFile = open(KF_path, 'r')
        for line in KeyFile:
            # line = line [:-2]
            if 'api_key' in line.lower():
                api_key = line.strip("api_key = \'")[:-2]
            if 'api_secret' in line.lower():
                api_secret = line.strip("api_secret = \'")[:-2]
        KeyFile.close()

flickr = flickrapi.FlickrAPI(api_key, api_secret, format='rest')
api_key = api_secret = ""

uNSIDfile = '\Mirc\! dl files\Fav Test\Grp.ttxt'
Output_File = 'C:\Mirc\! dl files\Fav Test\GrpOut.ttxt'

InFile = open(uNSIDfile, 'r')
OutFile = open(Output_File, 'w')

for gid in InFile:
    gid = gid[:-1]

    if tst:
        Grp = objectify.fromstring(td)
    else:
        Grp = objectify.fromstring(flickr.groups_getInfo(group_id=gid))

    fErr = ''
    mn   = Grp.xpath(u'//group')[0].attrib
    res  = Grp.xpath(u'//restrictions')[0].attrib
    root = Grp.group

    gNSID   = gid
    gAlias  = ""
##### gName is here
    gName   = getattr(root, u"name", "")
    Images  = getattr(root, 'pool_count', (-1))
    Mbr     = getattr(root, "members", (-1))

    Sft     = int(res["safe_ok"]) + (int(res["moderate_ok"]) * 2) + \
                        (int(res["restricted_ok"]) * 4)
    Is_Mem  = int(mn["is_member"]) + (int(mn["is_moderator"]) * 2) + \
                        (int(mn["is_admin"]) * 4)
    O18     = True if Sft > 3 else False
    Priv    = getattr(root, "privacy", (-1))

#####  Error comes here  ############
    ttup = '\"{}\"\t\"{}\"\t\"{}\"\t'.format(gNSID, gAlias, gName)
    tup = '{0}{6}{1}{6}{2}{6}{3}{6}{4}{6}{5}\n'.format(ttup, Images, Mbr, Sft, O18,
                         Priv, "\t")

    OutFile = open(Output_File, mode='ab')
    OutFile.write(tup)
    OutFile.close()

InFile.close()
if tst and ext:
    t.close()

Best answer

Why not try writing your file as UTF-8?

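import codecs

# codecs.open takes an encoding as its third argument; the built-in open() does not
OutFile = codecs.open(Output_File, 'ab', 'utf8')
OutFile.write(tup)
OutFile.close()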
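
As a minimal sketch of writing the output as UTF-8, the example below keeps every text field as a Unicode string and lets io.open do the encoding when the line is written. It assumes the group name has already been pulled out of the objectify element as plain text (for example via its .text attribute). The helper names format_row and append_row, the sample values, and the temporary output path are hypothetical and not part of the original script.

```python
import io
import os
import tempfile


def format_row(nsid, alias, name, images, members):
    """Build one tab-separated line from fields that are already Unicode."""
    # A u'' template keeps the result Unicode, so no implicit ASCII encode happens.
    return u'"{}"\t"{}"\t"{}"\t{}\t{}\n'.format(nsid, alias, name, images, members)


def append_row(path, row):
    """Append a Unicode line to `path`; io.open encodes it as UTF-8 on write."""
    with io.open(path, mode='a', encoding='utf-8') as out_file:
        out_file.write(row)


# Hypothetical values standing in for the parsed group data.
group_name = u'DanceFloor - [ \xa9 Plz Read Rules ]'
row = format_row(u'34427465497@N01', u'', group_name, 133, 245)

# Hypothetical output location, used here so the sketch does not touch real files.
out_path = os.path.join(tempfile.mkdtemp(), 'GrpOut.txt')
append_row(out_path, row)

# Read the file back, decoding it as UTF-8 again.
with io.open(out_path, encoding='utf-8') as in_file:
    written = in_file.read()
```

In Python 2, a byte-string template such as '{}'.format(name) implicitly encodes a Unicode value with the ascii codec, which is one way to hit the UnicodeEncodeError quoted in the question. A u'' template avoids that step, and the only encoding then happens at the file boundary.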

Regarding python - Unicode and lxml objectified data, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/20364156/
