python - How to match keywords in a paragraph using Python (NLTK)

Tags: python machine-learning nltk

Keywords:

Keywords={u'secondary': [u'sales growth', u'next generation store', u'Steps Down', u' Profit warning', u'Store Of The Future', u'groceries']}

Paragraph:

paragraph="""HOUSTON -- Target has unveiled its first "next generation" store in the Houston area, part of a multibillion-dollar effort to reimagine more than 1,000 stores nationwide to compete with e-commerce giants.

The 124,000-square-foot store, which opened earlier this week at Aliana market center in Richmond, Texas, has two distinct entrances and aims to appeal to consumers on both ends of the shopping spectrum.

Busy families seeking convenience can enter the "ease" side of the store, which offers a supermarket-style experience. Customers can pick up online orders, both in store and curbside, and buy grab-and-go items like groceries, wine, last-minute gifts, cleaning supplies and prepared meals."""

Is there a way to match the keywords in the paragraph (without using regular expressions)?

Expected output:

Matched keywords: next generation store, groceries

Best answer

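You don't need NLTK for this. First, you have to either clean the text of the paragraph or change the values in the list under the 'secondary' key. '"next generation" store' and 'next generation store' are two different strings, so a plain substring check will not treat them as equal.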
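One possible way to do that cleaning without regular expressions is sketched below. The helper name `normalize` is hypothetical and not part of the original answer, and the choice to drop double quotes, collapse whitespace, and lowercase everything is an assumption about what "cleaning" should mean here.

```python
# Hypothetical helper, not from the original answer: one way to clean text
# before doing substring checks, using only built-in str methods.
def normalize(text):
    """Drop double quotes, collapse whitespace, and lowercase the text."""
    # Remove straight and curly double quotes (assumed to be noise here).
    for quote in ('"', '\u201c', '\u201d'):
        text = text.replace(quote, '')
    # split() with no arguments also trims leading and trailing whitespace.
    return ' '.join(text.split()).lower()


sentence = 'Target has unveiled its first "next generation" store in the Houston area'
print(normalize(sentence))
# target has unveiled its first next generation store in the houston area
print('next generation store' in normalize(sentence))
# True
```

Applying the same helper to each keyword would also take care of the stray leading space in ' Profit warning' and the mixed case in 'Steps Down'.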

After that, you can iterate over the values of 'secondary' and check whether each of those strings occurs in the text.

match = [i for i in Keywords['secondary'] if i in paragraph]

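Edit: As I said above, '"next generation" store' and 'next generation store' are two different strings, which is why you only get 1 match. If the paragraph contained next generation store without the quotation marks, you would get two matches, because there actually are two matches.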

Input:

paragraph="""HOUSTON -- Target has unveiled its first "next generation" store in the Houston area, part of a multibillion-dollar effort to reimagine more than 1,000 stores nationwide to compete with e-commerce giants.

The 124,000-square-foot store, which opened earlier this week at Aliana market center in Richmond, Texas, has two distinct entrances and aims to appeal to consumers on both ends of the shopping spectrum.

Busy families seeking convenience can enter the "ease" side of the store, which offers a supermarket-style experience. Customers can pick up online orders, both in store and curbside, and buy grab-and-go items like groceries, wine, last-minute gifts, cleaning supplies and prepared meals."""

Output:

['groceries']

Input:

paragraph="""HOUSTON -- Target has unveiled its first next generation store in the Houston area, part of a multibillion-dollar effort to reimagine more than 1,000 stores nationwide to compete with e-commerce giants.

The 124,000-square-foot store, which opened earlier this week at Aliana market center in Richmond, Texas, has two distinct entrances and aims to appeal to consumers on both ends of the shopping spectrum.

Busy families seeking convenience can enter the "ease" side of the store, which offers a supermarket-style experience. Customers can pick up online orders, both in store and curbside, and buy grab-and-go items like groceries, wine, last-minute gifts, cleaning supplies and prepared meals."""

Output:

['next generation store','groceries']
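Note that the `in` operator does substring matching, so a keyword can also match inside a longer word. If whole-word matches are wanted, a token-based comparison is one alternative that still avoids regular expressions. The following is only a sketch: `tokenize` and `match_keywords` are hypothetical names, the shortened `text` is an excerpt made up from the paragraph above, and stripping `string.punctuation` from each word is an assumption that handles straight quotes and commas but not curly quotes.

```python
import string


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(string.punctuation) for word in text.lower().split())
    return [t for t in tokens if t]


def match_keywords(keywords, text):
    """Return the keywords whose words occur as a consecutive run in text."""
    words = tokenize(text)
    found = []
    for keyword in keywords:
        target = tokenize(keyword)
        n = len(target)
        # Slide a window of n words over the text and compare word lists.
        if n and any(words[i:i + n] == target for i in range(len(words) - n + 1)):
            found.append(keyword)
    return found


keywords = ['sales growth', 'next generation store', 'Steps Down',
            ' Profit warning', 'Store Of The Future', 'groceries']
text = ('Target has unveiled its first "next generation" store in the Houston area. '
        'Customers can buy grab-and-go items like groceries, wine and prepared meals.')
print(match_keywords(keywords, text))
# ['next generation store', 'groceries']
```

Because the quotation marks are stripped per word, the quoted phrase in the original paragraph matches without editing the paragraph by hand, and a keyword such as 'sale' would not match the word 'sales'.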

Source: this question was originally asked on Stack Overflow: https://stackoverflow.com/questions/47992091/
