r - Stanford CoreNLP in R: Spanish language not working

Tags: r stanford-nlp

I have started using the Stanford CoreNLP package in R to do some text analysis in Spanish. So, I tried the following:

R

R version 3.2.2 (2015-08-14) -- "Fire Safety"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> install.packages("coreNLP")
Installing package into ‘/home/ach/R/x86_64-pc-linux-gnu-library/3.2’
(as ‘lib’ is unspecified)
--- Please select a CRAN mirror for use in this session ---
trying URL 'https://cran.rediris.es/src/contrib/coreNLP_0.4-1.tar.gz'
Content type 'application/x-gzip' length 17392 bytes (16 KB)
==================================================
downloaded 16 KB

* installing *source* package ‘coreNLP’ ...
** package ‘coreNLP’ successfully unpacked and MD5 sums checked
** R
** data
*** moving datasets to lazyload DB
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (coreNLP)

The downloaded source packages are in
    ‘/tmp/RtmpO3q77z/downloaded_packages’
> library(coreNLP)
> downloadCoreNLP(type="base")
trying URL 'http://nlp.stanford.edu/software//stanford-corenlp-full-2015-04-20.zip'
Content type 'application/zip' length 360824440 bytes (344.1 MB)
==================================================
downloaded 344.1 MB

[1] 0
> 
> downloadCoreNLP(type="spanish")
trying URL 'http://nlp.stanford.edu/software//stanford-spanish-corenlp-2015-01-08-models.jar'
Content type 'application/x-java-archive' length 25007256 bytes (23.8 MB)
==================================================
downloaded 23.8 MB

> initCoreNLP()
Searching for resource: config.properties
Adding annotator tokenize
TokenizerAnnotator: No tokenizer type provided. Defaulting to PTBTokenizer.
Adding annotator ssplit
Adding annotator pos
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.7 sec].
Adding annotator lemma
Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [3.5 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [1.2 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [2.3 sec].
Initializing JollyDayHoliday for SUTime from classpath: edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/defs.sutime.txt
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.sutime.txt
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.holidays.sutime.txt
Adding annotator parse
Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.4 sec].
Adding annotator dcoref
Adding annotator sentiment
> sInes <- "Hola padre. Acabo de llegar a casa. Tengo ganas de cenar"
> annotation <- annotateString(sInes)
> token <- getToken(annotation)
> token[token$sentence==2,c(1:4,7)]
  sentence id  token  lemma POS
4        2  1  Acabo  Acabo NNP
5        2  2     de     de NNP
6        2  3 llegar llegar NNP
7        2  4      a      a  DT
8        2  5   casa   casa  FW
9        2  6      .      .   .

Everything seems to run (as far as I can tell, there are no errors), but it does not work: the initialization log shows the English POS tagger and English NER models being loaded, and the Spanish tokens are mis-tagged. For example, "casa" is incorrectly tagged as a foreign word (FW).

So, does anyone have any ideas about this?

Thanks a lot

Agustín

Best Answer

Downloading the Spanish models is not enough: you also need to tell the pipeline to use the Spanish tokenizer (and the other Spanish models). In CoreNLP's Java properties this looks like:

props.setProperty("tokenize.language", "es");
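From the R coreNLP wrapper you normally do not set Java properties directly. A minimal sketch of the equivalent fix, assuming your version of the package supports the `type` argument of `initCoreNLP()` (it does in coreNLP 0.4-x, which is what the transcript installs):

```r
# Sketch: initialize the pipeline with the Spanish configuration instead
# of the English default. Restart R first, because initCoreNLP() starts a
# JVM and can only be called once per session.
library(coreNLP)

# Uses the models fetched earlier by downloadCoreNLP(type = "spanish");
# this selects the Spanish tokenizer, POS tagger, and NER models rather
# than the English defaults shown in the question's log.
initCoreNLP(type = "spanish")

sInes <- "Hola padre. Acabo de llegar a casa. Tengo ganas de cenar"
annotation <- annotateString(sInes)
token <- getToken(annotation)
token[token$sentence == 2, c(1:4, 7)]
```

Alternatively, `initCoreNLP()` accepts a `parameterFile` argument pointing to a custom `.properties` file, where you can set `tokenize.language = es` and the Spanish model paths by hand if you need finer control.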

This answer to "r - Stanford CoreNLP in R: Spanish language not working" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/37884613/
