r - K-Skip-N-Gram : generalization of for-loops in R

Tags: r for-loop switch-statement n-gram

I have an R function that generates k-skip-n-grams.
The complete function is available on GitHub.

My code does produce the expected k-skip-n-grams:

> kSkipNgram("Lorem ipsum dolor sit amet, consectetur adipiscing elit.", n=2, skip=1)
 [1] "Lorem dolor"            "Lorem ipsum"            "ipsum sit"             
 [4] "ipsum dolor"            "dolor amet"             "dolor sit"             
 [7] "sit consectetur"        "sit amet"               "amet adipiscing"       
[10] "amet consectetur"       "consectetur elit"       "consectetur adipiscing"
[13] "adipiscing elit"       

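However, I would like to generalize and simplify the following switch statement, which hard-codes one level of nested for loops per n-gram size:
# x - should be text, sentence
# n - n-gram
# skip - number of skips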
###################################
  switch(as.character(n),
         "0" = {ngram<-c(ngram, paste(x[i]))},
         "1" = {for(j in skip:1)
                  {
                    if (i+j <= length(x)) 
                      {ngram<-c(ngram, paste(x[i],x[i+j]))}
                  }
                },
         "2" = {for(j in skip:1)
                  {for (k in skip:1)
                    {
                      if (i+j <= length(x) && i+j+k <= length(x)) 
                        {ngram<-c(ngram, paste(x[i],x[i+j],x[i+j+k]))}
                    }
                  }
                },
         "3" = {for(j in skip:1)
                  {for (k in skip:1)
                    {for (l in skip:1)
                      {
                      if (i+j <= length(x) && i+j+k <= length(x) && i+j+k+l <= length(x)) 
                          {ngram<-c(ngram, paste(x[i],x[i+j],x[i+j+k],x[i+j+k+l]))}
                      }
                    }
                  }
                },
         "4" = {for(j in skip:1)
                  {for (k in skip:1)
                      {for (l in skip:1)
                        {for (m in skip:1)
                            {
                            if (i+j <= length(x) && i+j+k <= length(x) && i+j+k+l <= length(x) && i+j+k+l+m <= length(x)) 
                                  {ngram<-c(ngram, paste(x[i],x[i+j],x[i+j+k],x[i+j+k+l],x[i+j+k+l+m]))}
                            }
                        }
                      }
                    }
                  }
        )
  }
}
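The nested loops above differ only in how many gap variables (j, k, l, m) they iterate over, so they can be collapsed into a single loop over all gap combinations. Below is a minimal Python sketch of that idea (Python is used because the answer below is written in it). The function name `skip_ngrams_fixed_gaps` is hypothetical, and the gap range of 1 to skip+1 is an assumption inferred from the sample output shown earlier, where skip=1 pairs each word with the next two words.

```python
from itertools import product


def skip_ngrams_fixed_gaps(tokens, n, skip):
    """Build n-word grams in which every gap between consecutive chosen
    words is limited independently, mirroring the nested for loops.

    Assumption: a gap may be 1 .. skip + 1 positions, which reproduces
    the sample output of kSkipNgram(..., n=2, skip=1).
    """
    grams = []
    # Largest gap first, matching the `for(j in skip:1)` iteration order.
    gap_choices = range(skip + 1, 0, -1)
    for i in range(len(tokens)):
        # product(..., repeat=n-1) stands in for n-1 nested loops.
        for gaps in product(gap_choices, repeat=n - 1):
            positions = [i]
            for gap in gaps:
                positions.append(positions[-1] + gap)
            # Gaps are positive, so checking the last index is enough.
            if positions[-1] < len(tokens):
                grams.append(" ".join(tokens[p] for p in positions))
    return grams
```

Called with the eight words of the example sentence, n=2 and skip=1, this sketch returns the same 13 bigrams in the same order as the output above. Note that it limits each gap separately, as the nested loops do, rather than limiting the total number of skipped words.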

Best answer

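I used a recursive solution for general k-skip-n-grams. It is written in Python; I have no experience with R, but hopefully you can translate it. I used the definition from this paper: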
http://homepages.inf.ed.ac.uk/ballison/pdf/lrec_skipgrams.pdf

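If you plan to use this on long sentences, it should probably be optimized with some dynamic programming, because it currently does a lot of redundant work (the same sub-grams are recomputed many times). I also have not tested it thoroughly, so there may be edge cases.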

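def kskipngrams(sentence, k, n):
    """Assumes the sentence is already tokenized into a list."""
    if n == 0 or len(sentence) == 0:
        return None
    grams = []
    # Every position that leaves room for n words can start a gram.
    for i in range(len(sentence) - n + 1):
        grams.extend(initial_kskipngrams(sentence[i:], k, n))
    return grams

def initial_kskipngrams(sentence, k, n):
    """Return all k-skip-n-grams that begin with sentence[0]."""
    if n == 1:
        return [[sentence[0]]]
    grams = []
    # j is the number of words skipped right after sentence[0].
    for j in range(min(k + 1, len(sentence) - 1)):
        # The rest of the gram may use only the k - j remaining skips.
        kmjskipnm1grams = initial_kskipngrams(sentence[j + 1:], k - j, n - 1)
        if kmjskipnm1grams is not None:
            for gram in kmjskipnm1grams:
                grams.append([sentence[0]] + gram)
    return grams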
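One way to cut the redundant computation mentioned above is to memoize the recursion on token positions instead of slicing the list at every call. The following is a minimal sketch under that assumption, not a tuned implementation. The names `kskipngrams_memo` and `tails` are hypothetical, and the function returns an empty list rather than None for empty input.

```python
from functools import lru_cache


def kskipngrams_memo(sentence, k, n):
    """k-skip-n-grams of a tokenized sentence, with memoized recursion."""
    tokens = tuple(sentence)
    if n <= 0 or not tokens:
        return []

    @lru_cache(maxsize=None)
    def tails(start, skips_left, words_left):
        # All index tuples of length words_left that begin at `start`
        # and skip at most skips_left words in total.
        if words_left == 1:
            return ((start,),)
        found = []
        for j in range(skips_left + 1):
            nxt = start + j + 1  # skip j words after `start`
            if nxt >= len(tokens):
                break
            for rest in tails(nxt, skips_left - j, words_left - 1):
                found.append((start,) + rest)
        return tuple(found)

    grams = []
    for i in range(len(tokens) - n + 1):
        for positions in tails(i, k, n):
            grams.append([tokens[p] for p in positions])
    return grams
```

For the tokens `insurgents killed in ongoing fighting` with k=2 and n=2, this returns nine bigrams, from `insurgents killed` to `ongoing fighting`. Caching only avoids recomputing shared sub-results; the number of grams returned can still grow quickly with k and n.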

Regarding r - K-Skip-N-Gram : generalization of for-loops in R, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/18259128/
