python - How to extract named entities such as PER, ORG, and GPE from the tree structure when binary=False?

Tags: python machine-learning nlp nltk stanford-nlp

I am new to nltk and am trying to extract PERSON, ORGANIZATION, and GPE entities using the following code:

import nltk

# tokcomp is defined elsewhere; presumably an iterable of sentence strings
for i in tokcomp:
    words = nltk.word_tokenize(i)
    tagged = nltk.pos_tag(words)
    namedEnt = nltk.ne_chunk(tagged, binary=False)
    print(namedEnt)

The output I get is:

(S
  Our/PRP$
  direct/JJ
  competitors/NNS
  include/VBP
  ,/,
  among/IN
  others/NNS
  ,/,
  (PERSON Accenture/NNP)
  ,/,
  (GPE Capgemini/NNP)
  ,/,
  (ORGANIZATION Computer/NNP Sciences/NNPS Corporation/NNP)
  ,/,
  (GPE Genpact/NNP)
  ,/,
  (ORGANIZATION HCL/NNP Technologies/NNPS)
  ,/,
  (ORGANIZATION HP/NNP Enterprise/NNP)
  ,/,
  (ORGANIZATION IBM/NNP Global/NNP Services/NNPS)
  ,/,
  (ORGANIZATION Infosys/NNP Technologies/NNPS)
  ,/,
  (PERSON Tata/NNP Consultancy/NNP Services/NNPS)
  and/CC
  (PERSON Wipro/NNP)
  ./.)
(S
  These/DT
  markets/NNS
  also/RB
  include/VBP
  numerous/JJ
  smaller/JJR
  local/JJ
  competitors/NNS
  in/IN
  the/DT
  various/JJ
  geographic/JJ
  markets/NNS
  in/IN
  which/WDT
  we/PRP
  operate/VBP
  which/WDT
  may/MD
  be/VB
  able/JJ
  to/TO
  provide/VB
  services/NNS
  and/CC
  solutions/NNS
  at/IN
  lower/JJR
  costs/NNS
  or/CC
  on/IN
  terms/NNS
  more/RBR
  attractive/JJ
  to/TO
  clients/NNS
  than/IN
  we/PRP
  can/MD
  ./.)
(S
  Our/PRP$
  direct/JJ
  competitors/NNS
  include/VBP
  ,/,
  among/IN
  others/NNS
  ,/,
  (PERSON Accenture/NNP)
  ,/,
  (GPE Capgemini/NNP)
  ,/,
  (ORGANIZATION Computer/NNP Sciences/NNPS Corporation/NNP)
  ,/,
  (GPE Genpact/NNP)
  ,/,
  (ORGANIZATION HCL/NNP Technologies/NNPS)
  ,/,
  (ORGANIZATION HP/NNP Enterprise/NNP)
  ,/,
  (ORGANIZATION IBM/NNP Global/NNP Services/NNPS)
  ,/,
  (ORGANIZATION Infosys/NNP Technologies/NNPS)
  ,/,
  (PERSON Tata/NNP Consultancy/NNP Services/NNPS)
  and/CC
  (PERSON Wipro/NNP)
  ./.)
(S
  The/DT
  rates/NNS
  we/PRP
  are/VBP
  able/JJ
  to/TO
  recover/VB
  for/IN
  our/PRP$
  services/NNS
  are/VBP
  affected/VBN
  by/IN
  a/DT
  number/NN
  of/IN
  factors/NNS
  ,/,
  including/VBG
  :/:
  •/VB
  our/PRP$
  clients’/JJ
  perceptions/NNS
  of/IN
  our/PRP$
  ability/NN
  to/TO
  add/VB
  value/NN
  through/IN
  our/PRP$
  services/NNS
  ;/:
  •/NNP
  introduction/NN
  of/IN
  new/JJ
  services/NNS
  or/CC
  products/NNS
  by/IN
  us/PRP
  or/CC
  our/PRP$
  competitors/NNS
  ;/:
  •/VB
  our/PRP$
  competitors’/NN
  pricing/NN
  policies/NNS
  ;/:
  •/VB
  our/PRP$
  ability/NN
  to/TO
  accurately/RB
  estimate/VB
  ,/,
  attain/NN
  and/CC
  sustain/NN
  contract/NN
  revenues/NNS
  ,/,
  margins/NNS
  and/CC
  cash/NN
  flows/NNS
  over/IN
  increasingly/RB
  longer/JJR
  contract/NN
  periods/NNS
  ;/:
  •/NNP
  bid/NN
  practices/NNS
  of/IN
  clients/NNS
  and/CC
  their/PRP$
  use/NN
  of/IN
  third-party/JJ
  advisors/NNS
  ;/:
  •/VB
  the/DT
  use/NN
  by/IN
  our/PRP$
  competitors/NNS
  and/CC
  our/PRP$
  clients/NNS
  of/IN
  offshore/JJ
  resources/NNS
  to/TO
  provide/VB
  lower-cost/JJ
  service/NN
  delivery/NN
  capabilities/NNS
  ;/:
  •/VB
  our/PRP$
  ability/NN
  to/TO
  charge/VB
  premium/NN
  prices/NNS
  when/WRB
  justified/VBN
  by/IN
  market/NN
  demand/NN
  or/CC
  the/DT
  type/NN
  of/IN
  service/NN
  ;/:
  and/CC
  •/VB
  general/JJ
  economic/JJ
  and/CC
  political/JJ
  conditions/NNS
  ./.)
(S
  For/IN
  our/PRP$
  internal/JJ
  management/NN
  reporting/NN
  and/CC
  budgeting/NN
  purposes/NNS
  ,/,
  we/PRP
  use/VBP
  non-GAAP/JJ
  financial/JJ
  information/NN
  that/WDT
  does/VBZ
  not/RB
  include/VB
  stock-based/JJ
  compensation/NN
  expense/NN
  ,/,
  acquisition-related/JJ
  charges/NNS
  and/CC
  net/JJ
  non-operating/JJ
  foreign/JJ
  currency/NN
  exchange/NN
  gains/NNS
  or/CC
  losses/NNS
  for/IN
  financial/JJ
  and/CC
  operational/JJ
  decision/NN
  making/NN
  ,/,
  to/TO
  evaluate/VB
  period-to-period/JJ
  comparisons/NNS
  and/CC
  for/IN
  making/VBG
  comparisons/NNS
  of/IN
  our/PRP$
  operating/NN
  results/NNS
  to/TO
  those/DT
  of/IN
  our/PRP$
  competitors/NNS
  ./.)

I went through many links but found no approach that fits my purpose: extracting the companies labeled PERSON, ORGANIZATION, and GPE.

I would greatly appreciate any links, other than the nltk website, for learning more about extracting named entities.

Best answer

I applied the code from this link and was able to get the named entities out of the results above. Use the nltk.ne_chunk_sents() function instead of nltk.ne_chunk.
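The linked code is not reproduced here, so the following is only a minimal sketch of one common way to pull labeled entities out of such a tree, assuming NLTK 3's Tree API (label() and leaves()). The names extract_entities, WANTED, and sample are hypothetical, and the sample tree is built by hand to mirror part of the output above so that no model download is needed.

```python
from collections import defaultdict

from nltk import Tree

# Labels of interest; these match the chunk labels printed when binary=False.
WANTED = {"PERSON", "ORGANIZATION", "GPE"}


def extract_entities(tree, wanted=WANTED):
    """Return {label: [entity text, ...]} for the labeled chunks directly under the root."""
    found = defaultdict(list)
    for node in tree:
        # Plain tokens are (word, tag) tuples; named-entity chunks are Tree objects.
        if isinstance(node, Tree) and node.label() in wanted:
            text = " ".join(word for word, _tag in node.leaves())
            found[node.label()].append(text)
    return dict(found)


# Hand-built stand-in for one tree returned by ne_chunk (an assumption for illustration).
sample = Tree("S", [
    ("Our", "PRP$"), ("direct", "JJ"), ("competitors", "NNS"), ("include", "VBP"),
    Tree("PERSON", [("Accenture", "NNP")]),
    (",", ","),
    Tree("GPE", [("Capgemini", "NNP")]),
    (",", ","),
    Tree("ORGANIZATION", [("Computer", "NNP"), ("Sciences", "NNPS"), ("Corporation", "NNP")]),
])

entities = extract_entities(sample)
print(entities)
```

With real data, the same function could be applied to each tree yielded by nltk.ne_chunk_sents(tagged_sentences, binary=False), where tagged_sentences is a list of POS-tagged sentences. Note that the function only reports the labels the chunker assigned, so a company tagged PERSON or GPE, as in the output above, will appear under that label.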

Source: the corresponding question on Stack Overflow: https://stackoverflow.com/questions/42012960/
