Iterating over a DataFrame column that holds unicode string data (dtype object) raises the following error:
in text_pre_processing(text)
2 # removing punctuation
3 #text = text1(r'\n',' ', regex=True)
----> 4 text1 = [char for char in text if char not in string.punctuation]
5 text1 = ''.join(text1)
**TypeError: 'float' object is not iterable**
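A likely cause, assuming the data was loaded with `pd.read_csv`: an empty cell in a text column is read as NaN, and NaN is a Python float. The column can therefore still report dtype object while one of its cells is a float, and iterating over that cell character by character fails. The sketch below reproduces the error; the inline CSV and the helper `strip_punctuation` are hypothetical and not taken from the question.

```python
import io
import string

import pandas as pd

# Hypothetical CSV shaped like the question's data: Column1 is the index and
# the row with index 1 has an empty Column2 cell.
csv_data = (
    "Column1,Column2\n"
    "0,this is a good movie...#\n"
    "1,\n"
    "2,this is a bad movie $....\n"
)
df = pd.read_csv(io.StringIO(csv_data), index_col=0)

# The empty cell was parsed as NaN, and NaN is a float
missing = df.loc[1, "Column2"]
print(type(missing), missing)


def strip_punctuation(text):
    # Same comprehension as the failing line: iterates over text char by char
    return "".join(char for char in text if char not in string.punctuation)


try:
    df["Column2"].apply(strip_punctuation)
except TypeError as exc:
    # Iterating over the NaN float raises the error from the question
    print("TypeError:", exc)
```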
The function used:
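import string
from nltk.corpus import stopwords  # needs the corpus: nltk.download('stopwords')

# Build the stop-word set once instead of calling stopwords.words() per word
STOP_WORDS = set(stopwords.words('english'))

def text_pre_processing(text):
    # removing punctuation
    #text1 = text1(r'\n',' ', regex=True)
    text1 = [char for char in str(text) if char not in string.punctuation]
    text1 = ''.join(text1)
    # removing all the stop words from corpus
    #return text.split()
    return [word for word in text1.split() if word not in STOP_WORDS]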
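I tried to check whether the column passed to the function contains any float values (i.e. "sentences" that consist only of a float), but could not, because pandas reports both alphanumeric and purely alphabetic values as dtype "object", and explicit type casting did not help.
Does anyone know what is going wrong?
I am using this function as part of the analyzer for a Naive Bayes model.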
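The column dtype cannot answer that question, because object only means "arbitrary Python objects". A minimal sketch that inspects the Python type of each cell instead and lists the rows that are not strings; the Series `s` and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical column: one missing cell (NaN) and one bare number
s = pd.Series(
    ["this is a good movie...#", np.nan, 3.14, "this is a bad movie $...."],
    name="Column2",
)

print(s.dtype)  # object: says nothing about the individual cells

# Count the Python type of every cell instead of relying on the dtype
print(s.map(type).value_counts())

# Rows that would break a character-by-character loop
non_strings = s[~s.map(lambda value: isinstance(value, str))]
print(non_strings)
```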
Data (column 1 is the index):
Column2
this is a good movie...#
this is a bad movie $....
this #movie was good ;) but some scenes were exaggerating
Expected output:
[this, good, movie]
[this, bad, movie]
[this, movie, good, some, scenes, were, exaggerating]
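A hypothetical NaN-safe variant of `text_pre_processing`, run on the sample rows. It swaps NLTK's list for a small illustrative stop-word set (`STOP_WORDS` here is an assumption); NLTK's English list also contains "this", "some" and "were", so with it those words would not appear in the output. Non-string cells return an empty token list instead of raising.

```python
import string

import numpy as np
import pandas as pd

# Illustrative stop words only; NOT NLTK's stopwords.words('english')
STOP_WORDS = {"is", "a", "was", "but"}


def text_pre_processing(text, stop_words=STOP_WORDS):
    # Missing cells (NaN) and other non-strings yield no tokens
    if not isinstance(text, str):
        return []
    # Remove punctuation characters
    no_punct = "".join(char for char in text if char not in string.punctuation)
    # Split on whitespace and drop stop words
    return [word for word in no_punct.split() if word not in stop_words]


column2 = pd.Series([
    "this is a good movie...#",
    "this is a bad movie $....",
    "this #movie was good ;) but some scenes were exaggerating",
    np.nan,  # hypothetical missing cell
])

for tokens in column2.apply(text_pre_processing):
    print(tokens)
# ['this', 'good', 'movie']
# ['this', 'bad', 'movie']
# ['this', 'movie', 'good', 'some', 'scenes', 'were', 'exaggerating']
# []
```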
Best answer
You need to convert the float to a string:
>>> str(3.14159)
'3.14159'
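Applied to NaN, `str()` yields the text "nan", which then survives preprocessing as an ordinary token. A small sketch comparing that with filling or dropping missing cells before preprocessing; the Series `column2` and the helper `strip_punctuation` are hypothetical.

```python
import string

import numpy as np
import pandas as pd


def strip_punctuation(text):
    # Character-by-character punctuation removal, as in the question
    return "".join(char for char in text if char not in string.punctuation)


column2 = pd.Series(["this is a good movie...#", np.nan])

# Option 1: str() on every cell, as the answer suggests; NaN becomes "nan"
print(column2.map(lambda value: strip_punctuation(str(value))).tolist())
# ['this is a good movie', 'nan']

# Option 2: replace missing cells with an empty string first
print(column2.fillna("").map(strip_punctuation).tolist())
# ['this is a good movie', '']

# Option 3: drop rows with missing text before preprocessing
print(column2.dropna().map(strip_punctuation).tolist())
# ['this is a good movie']
```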
Source: a similar question on Stack Overflow, "python - 'float' type error Python, pandas": https://stackoverflow.com/questions/47464939/