python - Reading a byte string from a file in Python 3

The file content is shown below; the file is encoded in UTF-8:

cd232704-a46f-3d9d-97f6-67edb897d65f    b'this Friday, Gerda Scheuers will be excited \xe2\x80\x94 but she\xe2\x80\x99s most excited about the merchandise the movie will bring.'

Here is my code:

with open(file, 'r') as f_in:
    for line in f_in:
        tokens = line.split('\t')
        print(tokens[1])

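I want to get the correctly decoded text: "this Friday, Gerda Scheuers will be excited - but she’s most excited about the merchandise the movie will bring."

Decoding a bytes literal written directly in code works:

print(b'\xe2\x80\x94'.decode('utf-8'))  # decode UTF-8 bytes to a str; prints the dash character

But I can't read the bytes from the file. If I open a file that contains bytes, I need to decode each line in order to split it.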

Best answer

You can use ast.literal_eval to convert the text of a bytes literal into a bytes object:

>>> import ast
>>> ast.literal_eval(r"b'excited \xe2\x80\x94 but she\xe2\x80\x99s'")
b'excited \xe2\x80\x94 but she\xe2\x80\x99s'

Then decode it to get a string object:

>>> ast.literal_eval(r"b'excited \xe2\x80\x94 but she\xe2\x80\x99s'").decode('utf-8')
'excited — but she’s'
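The two steps above can be wrapped in a small helper. This is only a sketch: the name `decode_bytes_literal` is hypothetical, and it assumes the field really holds the text form of a bytes literal (what `repr()` of a bytes object produces), not raw bytes.

```python
import ast


def decode_bytes_literal(text, encoding="utf-8"):
    """Turn text such as "b'\\xe2\\x80\\x94'" into a decoded str.

    Hypothetical helper: it rejects anything that does not evaluate
    to a bytes object instead of silently returning another type.
    """
    value = ast.literal_eval(text.strip())
    if not isinstance(value, bytes):
        raise ValueError("expected a bytes literal, got %s" % type(value).__name__)
    return value.decode(encoding)


# The raw string keeps the backslashes as literal characters,
# which is what the text file contains.
field = r"b'excited \xe2\x80\x94 but she\xe2\x80\x99s'"
decoded = decode_bytes_literal(field)

# Round trip: repr() of the encoded bytes gives back the same text form.
round_trip = repr(decoded.encode("utf-8"))
```

ast.literal_eval only accepts Python literals, so unlike eval it will not execute arbitrary code if the file contains something unexpected.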

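import ast

with open(file, 'r', encoding='utf-8') as f_in:
    for line in f_in:
        tokens = line.split('\t')
        if len(tokens) < 2:
            continue  # skip lines that have no second tab-separated field
        # tokens[1] is text that looks like a bytes literal; evaluate it safely to get real bytes
        bytes_part = ast.literal_eval(tokens[1])
        s = bytes_part.decode('utf-8')  # decode the bytes to get a str
        print(s)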
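For a version that can be run without the original data, here is a self-contained sketch of the same loop. It assumes the columns are separated by a tab, as in the question's code. The function name `read_decoded_rows` and the shortened sample line are illustrative and not part of the original post.

```python
import ast
import os
import tempfile


def read_decoded_rows(path, encoding="utf-8"):
    """Yield (record_id, text) pairs from a tab-separated file whose
    second column holds the text of a bytes literal such as b'...'."""
    with open(path, "r", encoding=encoding) as f_in:
        for line in f_in:
            tokens = line.rstrip("\n").split("\t")
            if len(tokens) < 2:
                continue  # skip blank or malformed lines
            value = ast.literal_eval(tokens[1])
            if isinstance(value, bytes):
                yield tokens[0], value.decode(encoding)


# Build a small sample file. The raw string keeps the backslash
# escapes as literal characters, exactly as they appear in the file.
sample = (
    "cd232704-a46f-3d9d-97f6-67edb897d65f\t"
    + r"b'excited \xe2\x80\x94 but she\xe2\x80\x99s'"
    + "\n"
)
with tempfile.NamedTemporaryFile(
    "w", suffix=".tsv", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(sample)
    tmp.write("\n")  # a blank line, to show that it is skipped

try:
    rows = list(read_decoded_rows(tmp.name))
finally:
    os.remove(tmp.name)  # clean up the temporary file

for record_id, text in rows:
    print(record_id, text)
```

Opening the file in text mode is enough here because the file stores the escape sequences as ordinary ASCII characters. The real UTF-8 bytes only exist after ast.literal_eval has evaluated the literal.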

This Q&A on reading a byte string from a file in Python 3 is based on a similar question on Stack Overflow: https://stackoverflow.com/questions/43337544/
