python - Why can't I decode a string?

Tags: python string nltk decode tokenize

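I am working with a dataset and re-running my colleague's code. When tokenizing the text data, the code shown below fails on my MacBook, but it runs fine on my colleague's computer.

I don't know which Python version he has, but mine is Python 3.6. Could the difference in versions be the problem? Here is the code, followed by the error.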

s=title+' '+author+' '+text
tokens=word_tokenize(s.decode('ascii','ignore').lower())

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-8-e50403f82604> in <module>
     10         flushPrint(m/100)#208
     11     s=title+' '+author+' '+text
---> 12     tokens=word_tokenize(s.decode('ascii','ignore').lower())
     13     tokens = [z for z in tokens if not z in stopset and len(z)>1]
     14     k=[]

AttributeError: 'str' object has no attribute 'decode'

Best answer

The problem is most likely caused by the changes to string types between Python 2 and Python 3.

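In Python 2:

'' is of type str (a byte string), so it supports ''.decode()
u'' is of type unicode, so it supports u''.encode()

In Python 3 the roles are reversed:

'' and u'' are both of type str (Unicode text), so they support ''.encode() but have no decode() method
b'' is of type bytes, so it supports b''.decode()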
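
As a quick check of the Python 3 behavior described above, the following minimal sketch inspects the two types. The variable names and the sample string are made up for illustration.

```python
# Python 3: text literals are str, byte literals are bytes.
text = 'caf\u00e9'            # str (Unicode text); u'caf\u00e9' has the same type
raw = text.encode('utf-8')    # str -> bytes

print(type(text).__name__, type(raw).__name__)

# Only bytes has decode(); calling text.decode() raises the
# AttributeError shown in the question.
print(hasattr(text, 'decode'), hasattr(raw, 'decode'))

# Decoding the bytes gives back the original text.
print(raw.decode('utf-8') == text)
```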

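So in your case, depending on the types of your variables, you may need something like the line below. It applies only if title, author, and text are bytes objects, in which case s is also bytes and s.decode() works. If they are already str, the .decode() call is simply unnecessary and can be dropped.

s = title + b' ' + author + b' ' + text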

Or simply fall back to Python 2 :)
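
If staying on Python 3 is preferred, one possible approach is a small helper that accepts either type and mimics what decode('ascii', 'ignore') did in Python 2, namely dropping non-ASCII characters. This is only a sketch: the helper name to_ascii_lower and the sample values are hypothetical, not part of the original code.

```python
def to_ascii_lower(value):
    """Return value as lowercase ASCII text, dropping non-ASCII characters."""
    if isinstance(value, bytes):
        # bytes -> str, silently discarding bytes outside the ASCII range
        return value.decode('ascii', 'ignore').lower()
    # str -> bytes -> str round trip drops characters that ASCII cannot encode
    return value.encode('ascii', 'ignore').decode('ascii').lower()


# Placeholder values standing in for the fields from the question.
title, author, text = 'Café Society', 'Zoë', 'A naïve test'
s = title + ' ' + author + ' ' + text
cleaned = to_ascii_lower(s)
print(cleaned)
```

The cleaned string can then be passed to word_tokenize(cleaned) in place of the original s.decode(...).lower() expression.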

Regarding "python - Why can't I decode a string?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54256153/
