Python: stemming words in a file

I want to stem the words in a file. Stemming works fine when I try it in the terminal, but it does not work when I apply it to a text file. Terminal code:

print PorterStemmer().stem_word('complications')

Function code:

def stemming_text_1():
    with open('test.txt', 'r') as f:
        text = f.read()
        print text
        singles = []    

        stemmer = PorterStemmer() #problem from HERE
        for plural in text:
            singles.append(stemmer.stem(plural))
        print singles

Input (test.txt):

126211 crashes bookmarks runs error logged debug core bookmarks
126262 manual change crashes bookmarks propagated ion view bookmarks

Expected output:

126211 crash bookmark runs error logged debug core bookmark
126262 manual change crash bookmark propagated ion view bookmark

Any suggestions would be much appreciated, thanks :)

Best answer

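You need to split the text into words for the stemmer to work. Right now the variable text holds the entire file as one big string, so the loop for plural in text: assigns each single character of text to plural, and the stemmer is called on one character at a time.

Try using for plural in text.split(): instead.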
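The difference is easy to see without any stemmer involved. The following is a minimal sketch using only the standard library; the sample string is the start of the first line of test.txt, and the variable names chars and words are just for illustration:

```python
text = "126211 crashes bookmarks runs"

# Iterating over a string yields one character at a time.
chars = [item for item in text]
print(chars[:8])

# Iterating over text.split() yields whitespace-separated words.
words = [item for item in text.split()]
print(words)
```

The first list contains single characters (including the space), which is what the stemmer was receiving in the original loop; the second contains whole words.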

[Edit] To get the output in the desired format, you need to read the file line by line rather than reading it all at once:

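from nltk.stem import PorterStemmer

def stemming_text_1():
    stemmer = PorterStemmer()  # create the stemmer once, outside the loop
    with open('test.txt', 'r') as f:
        for line in f:
            print(line)  # echo the original line
            singles = []
            # split() breaks the line into whitespace-separated words
            for plural in line.split():
                singles.append(stemmer.stem(plural))
            # rejoin the stems so each input line gives one output line
            print(' '.join(singles))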
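The same line-by-line idea can also be written so that the stemming function is passed in as an argument, which makes it possible to try the logic without NLTK installed. This is only a sketch under stated assumptions: stem_lines and strip_plural_s are hypothetical names introduced here, and strip_plural_s is a crude stand-in that handles a couple of plural endings, not the Porter algorithm.

```python
def stem_lines(lines, stem):
    """Apply stem() to each whitespace-separated word, one output line per input line."""
    return [' '.join(stem(word) for word in line.split()) for line in lines]


def strip_plural_s(word):
    # Hypothetical stand-in for a real stemmer (NOT the Porter algorithm):
    # "crashes" -> "crash", "bookmarks" -> "bookmark".
    if word.endswith('shes'):
        return word[:-2]
    if word.endswith('s') and not word.endswith('ss'):
        return word[:-1]
    return word


sample = [
    "126211 crashes bookmarks runs error\n",
    "126262 manual change crashes bookmarks\n",
]
result = stem_lines(sample, strip_plural_s)
for line in result:
    print(line)
```

With NLTK available, passing the open file object and PorterStemmer().stem to stem_lines should apply real Porter stemming in the same way. Keep in mind that Porter stems are not always dictionary words, so the result may not match the expected output in the question word for word.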

A similar question about stemming words in a file with Python can be found on Stack Overflow: https://stackoverflow.com/questions/16835372/
