python - Finding the closest value between Pandas DataFrames

I am currently trying to build a quartile map for some key figures.

My quartiles are stored in a Pandas DataFrame that looks like this:

                 0,05    0,1   0,25   0,33  
IndicatorName
indicator 1      10653  10512  10096   9857
indicator 2      2,85   2,87   3,01   3,11
indicator 3      1,66   1,75   1,84    1,9
indicator 4      13,01  11,78   8,55   7,64

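This is a quartile mapping computed from a few hundred users. I then query my SQL database, fetch the values for a single user, and load them into a second DataFrame: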

                value
IndicatorName
indicator1      9917.00
indicator2      3.10
indicator3      1.86
indicator4      13.74

What I want to do now is add a new column to the second DataFrame that indicates which quartile each value falls into (the closest matching value):

                value     quartile
IndicatorName
indicator1      9917.00   0,33
indicator2      3.10      0,33
indicator3      1.86      0,25
indicator4      13.74     0,05

How would you compare DataFrames like these?

Best answer

Step zero is to replace , with . in df1 and convert the values to float:

df1 = df1.replace(',','.', regex=True).astype(float)

Or, if the data is read from a file, handle the decimal commas while parsing:

df1 = pd.read_csv(file, decimal=',')
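
As a rough illustration of both routes, the sketch below parses a small inline sample instead of a real file. The semicolon separator and the `raw` string are assumptions made for the example, since the original layout of the file is not shown.

```python
import io
import pandas as pd

# Hypothetical raw export: semicolon-separated, decimal commas (layout is assumed).
raw = (
    "IndicatorName;0,05;0,1;0,25;0,33\n"
    "indicator 1;10653;10512;10096;9857\n"
    "indicator 2;2,85;2,87;3,01;3,11\n"
)

# Route 1: the values were loaded as strings, so swap the commas and cast to float.
as_text = pd.read_csv(io.StringIO(raw), sep=";", index_col="IndicatorName", dtype=str)
df1_a = as_text.replace(",", ".", regex=True).astype(float)

# Route 2: let the parser interpret decimal commas while reading.
df1_b = pd.read_csv(io.StringIO(raw), sep=";", index_col="IndicatorName", decimal=",")

print(df1_a.dtypes)
print(df1_b)
```

Either way the column labels stay as the strings '0,05', '0,1' and so on; only the cell values are converted.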

The indexes of the two DataFrames also have to match, so if the only difference is whitespace, remove it:

df1.index = df1.index.str.replace(r'\s+', '', regex=True)
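
A small sketch of why this matters: `sub` aligns rows by index label, so 'indicator 1' and 'indicator1' would not be treated as the same row. The sample labels below are made up for illustration.

```python
import pandas as pd

# Hypothetical labels with inconsistent whitespace.
idx = pd.Index(["indicator 1", "indicator  2", " indicator 3"], name="IndicatorName")

# Remove every run of whitespace; regex=True is passed explicitly because
# newer pandas versions treat the pattern as a literal string by default.
clean = idx.str.replace(r"\s+", "", regex=True)

print(list(clean))
```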

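Then subtract the value column with sub, take the absolute values with abs, and use DataFrame.idxmin to find the column holding the smallest difference in each row: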

df2['quartile'] = df1.sub(df2['value'], axis=0).abs().idxmin(axis=1)
print(df2)
                 value quartile
IndicatorName                  
indicator1     9917.00     0,33
indicator2        3.10     0,33
indicator3        1.86     0,25
indicator4       13.74     0,05
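
For reference, here is a self-contained sketch that rebuilds the two DataFrames from the figures in the question and applies the same one-liner. The frames are constructed by hand here, with the decimal commas already converted and the index labels already cleaned; in practice they would come from the file and the SQL query.

```python
import pandas as pd

# Quartile table from the question, already converted to floats.
labels = pd.Index(
    ["indicator1", "indicator2", "indicator3", "indicator4"], name="IndicatorName"
)
df1 = pd.DataFrame(
    [
        [10653, 10512, 10096, 9857],
        [2.85, 2.87, 3.01, 3.11],
        [1.66, 1.75, 1.84, 1.90],
        [13.01, 11.78, 8.55, 7.64],
    ],
    index=labels,
    columns=["0,05", "0,1", "0,25", "0,33"],
)

# Values for a single user.
df2 = pd.DataFrame({"value": [9917.00, 3.10, 1.86, 13.74]}, index=labels.copy())

# Row-wise difference -> absolute distance -> label of the closest column.
df2["quartile"] = df1.sub(df2["value"], axis=0).abs().idxmin(axis=1)
print(df2)
```

If two columns are equally close, idxmin returns the first of them in column order.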

Details:

print(df1.sub(df2['value'], axis=0))
                 0,05     0,1    0,25   0,33
IndicatorName                               
indicator1     736.00  595.00  179.00 -60.00
indicator2      -0.25   -0.23   -0.09   0.01
indicator3      -0.20   -0.11   -0.02   0.04
indicator4      -0.73   -1.96   -5.19  -6.10

print(df1.sub(df2['value'], axis=0).abs())
                 0,05     0,1    0,25   0,33
IndicatorName                               
indicator1     736.00  595.00  179.00  60.00
indicator2       0.25    0.23    0.09   0.01
indicator3       0.20    0.11    0.02   0.04
indicator4       0.73    1.96    5.19   6.10
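
The same intermediate steps can also be written with plain NumPy broadcasting, which may help to see what sub, abs and idxmin are doing. This is only an equivalent sketch, not part of the original answer; the names `values`, `dist`, `pos` and `nearest` are introduced here for illustration, and the reindex call is what keeps the rows aligned, since NumPy does not align on labels.

```python
import numpy as np
import pandas as pd

labels = pd.Index(
    ["indicator1", "indicator2", "indicator3", "indicator4"], name="IndicatorName"
)
df1 = pd.DataFrame(
    [
        [10653, 10512, 10096, 9857],
        [2.85, 2.87, 3.01, 3.11],
        [1.66, 1.75, 1.84, 1.90],
        [13.01, 11.78, 8.55, 7.64],
    ],
    index=labels,
    columns=["0,05", "0,1", "0,25", "0,33"],
)
df2 = pd.DataFrame({"value": [9917.00, 3.10, 1.86, 13.74]}, index=labels.copy())

# Put the user's values in the same row order as df1.
values = df2["value"].reindex(df1.index).to_numpy()

# Broadcast each value across its row and take the absolute distance.
dist = np.abs(df1.to_numpy() - values[:, None])

# Position of the smallest distance per row, mapped back to the column label.
pos = dist.argmin(axis=1)
nearest = pd.Series(df1.columns[pos], index=df1.index)
print(nearest)
```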

This question and answer, "python - Finding the closest value between Pandas DataFrames", come from a similar question on Stack Overflow: https://stackoverflow.com/questions/47090233/
