python - Can someone explain the syntax of BigramAssocMeasures.chi_sq?

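I am using NLTK's BigramAssocMeasures.chi_sq to find out how much information words carry across different categories, but I can't figure out how to supply data to this function.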

NLTK's definition reads:

"""Scores bigrams using chi-square, i.e. phi-sq multiplied by the number of bigrams, as in Manning and Schutze 5.3.3."""
return n_xx * cls.phi_sq(n_ii, (n_ix, n_xi), n_xx)

What do n_ii, (n_ix, n_xi), and n_xx stand for?

Best answer

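I found the following sources that explain this:

The first source explains the topic and its application to sentiment analysis, with Python code. The second source provides more code examples. The third source contains the explanation you are looking for: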

The arguments constitute the marginals of a contingency table, counting the occurrences of particular events in a corpus. The letter i in the suffix refers to the appearance of the word w in question, while x indicates the appearance of any word. Thus, for example::
n_ii counts (w1, w2), i.e. the bigram being scored
n_ix counts (w1, *)
n_xi counts (*, w2)
n_xx counts (*, *), i.e. any bigram
This may be shown with respect to a contingency table::
        w1    ~w1
     ------ ------
 w2 | n_ii | n_oi | = n_xi
     ------ ------
~w2 | n_io | n_oo |
     ------ ------
     = n_ix        TOTAL = n_xx
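
To make the mapping concrete, here is a minimal sketch of how the four arguments could be filled in for the per-category word scoring described in the question. It is an illustration under assumptions, not NLTK's code: the function `chi_sq` below is a plain-Python stand-in that takes its arguments in the same order as `BigramAssocMeasures.chi_sq` and computes the score from the contingency table above, and the toy data and the helper `score` are made up. The assumption is that "w1" plays the role of the word and "w2" the role of the category.

```python
from collections import Counter


def chi_sq(n_ii, n_ix_xi, n_xx):
    """Chi-square from the marginals, in the same argument order as
    BigramAssocMeasures.chi_sq (plain-Python stand-in, not NLTK's code)."""
    n_ix, n_xi = n_ix_xi
    # Fill in the remaining cells of the 2x2 contingency table.
    n_io = n_ix - n_ii                  # w1 without w2
    n_oi = n_xi - n_ii                  # w2 without w1
    n_oo = n_xx - n_ii - n_io - n_oi    # neither
    denom = (n_ii + n_io) * (n_ii + n_oi) * (n_io + n_oo) * (n_oi + n_oo)
    phi_sq = (n_ii * n_oo - n_io * n_oi) ** 2 / denom
    return n_xx * phi_sq


# Hypothetical toy corpus: words grouped by category.
docs = {
    "pos": "good great good fine".split(),
    "neg": "bad awful bad good".split(),
}

label_word_fd = {label: Counter(words) for label, words in docs.items()}
word_fd = Counter()
for words in docs.values():
    word_fd.update(words)
total_word_count = sum(word_fd.values())


def score(word, label):
    """Score how strongly `word` is associated with category `label`."""
    n_ii = label_word_fd[label][word]           # word inside this category
    n_ix = word_fd[word]                        # word in any category
    n_xi = sum(label_word_fd[label].values())   # any word in this category
    n_xx = total_word_count                     # any word in any category
    return chi_sq(n_ii, (n_ix, n_xi), n_xx)


print(score("good", "pos"))
print(score("bad", "neg"))
```

With NLTK installed, the body of `score` would pass the same four counts to `BigramAssocMeasures.chi_sq(n_ii, (n_ix, n_xi), n_xx)`. In this toy data "bad" occurs only in the "neg" category, so it scores higher than "good", which appears in both.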

I hope this research helps.

Regarding "python - Can someone explain the syntax of BigramAssocMeasures.chi_sq?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/32549376/