python - Calculating Kendall's tau using scipy and groupby

Tags: python pandas dataframe scipy statistics

I have a csv file containing precipitation data per year and per weather station. It looks like this:

station_id    year       Sum
 210018      1916      65.024
 210018      1917      35.941
 210018      1918      28.448
 210018      1919      68.58
 210018      1920      31.115
 215400      1916      44.958
 215400      1917      31.496
 215400      1918      38.989
 215400      1919      74.93
 215400      1920      53.5432

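I want to return Kendall's tau correlation and the p-value for each unique station ID. So, for the data above, I want the correlation between Sum and year for station IDs 210018 and 215400.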

For station_id 210018 the correlation would be -0.20 with a p-value of 0.62, and for station_id 215400 the correlation would be 0.40 with a p-value of 0.33.

I am trying to use this:

grouped=df.groupby(['station_id'])
grouped.aggregate([tau, p_value=sp.stats.kendalltau(df.year, df.Sum)])

The error returned is a syntax error at the equals sign after p_value.

Any help would be appreciated.

Best answer

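One way to calculate this is to use apply on the groupby object, so that kendalltau is called once per station on that station's rows only: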

>>> import scipy.stats as st
>>> df.groupby(['station_id']).apply(lambda x: st.kendalltau(x['year'], x['Sum']))
station_id
210018        (-0.2, 0.62420612399)
215400        (0.4, 0.327186890661)
dtype: object
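
If separate tau and p_value columns are preferred over a Series of tuples, the same apply approach can be adapted as in the sketch below. This is only an illustration under a few assumptions: the DataFrame is rebuilt inline from the sample rows in the question, the helper name kendall_by_group is made up for this example, and method="asymptotic" is passed because newer SciPy releases may otherwise choose an exact p-value for small tie-free samples, which would give p-values different from the ones shown above.

```python
import pandas as pd
from scipy import stats

# Sample rows copied from the question (normally read with pd.read_csv).
df = pd.DataFrame({
    "station_id": [210018] * 5 + [215400] * 5,
    "year": [1916, 1917, 1918, 1919, 1920] * 2,
    "Sum": [65.024, 35.941, 28.448, 68.58, 31.115,
            44.958, 31.496, 38.989, 74.93, 53.5432],
})


def kendall_by_group(group):
    """Hypothetical helper: Kendall's tau and p-value for one station."""
    # method="asymptotic" requests the normal approximation (an assumption
    # made here so the p-values are comparable to the answer's output).
    tau, p_value = stats.kendalltau(group["year"], group["Sum"],
                                    method="asymptotic")
    return pd.Series({"tau": tau, "p_value": p_value})


# Select only the two needed columns so the grouping key is not passed
# into the helper; the result has one row per station_id.
result = df.groupby("station_id")[["year", "Sum"]].apply(kendall_by_group)
print(result)
```

Because the helper returns a Series, apply stacks the per-station results into a DataFrame indexed by station_id, which is easier to merge back or write out with to_csv than a column of tuples.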

Regarding python - calculating Kendall's tau using scipy and groupby, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/28974425/
