r - Group by and find the closest number

The data is provided at the bottom of the post. I have two data frames, df1 and df2.

df1:
ticker   Price
<chr>    <dbl>
SPY      200.00
AAPL     100.00

df2:
ticker  expiration   strike
<chr>    <dbl>       <dbl>
SPY      0621         180
SPY      0621         205
SPY      0719         180
SPY      0719         205
AAPL     0621          75
AAPL     0621         105
AAPL     0719          75
AAPL     0719         105

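Both data frames hold stock data and share the "ticker" column. I want to group df2 by two columns (ticker and expiration) and then, within each group, find the strike closest to the Price column in df1.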

The output should look like this.

df3 = df2 %>% group_by(ticker, expiration)%>% #which[abs(df1$Price - df2$strike) is closest to 0]

output:
ticker   expiration  strike
<chr>     <dbl>       <dbl>
SPY       0621         205
SPY       0719         205
AAPL      0621         105
AAPL      0719         105

Here is df1:

structure(list(ticker = structure(2:1, .Label = c("AAPL", "SPY"
), class = "factor"), Price = c(200, 100)), class = "data.frame", row.names = c(NA, 
-2L))

Here is df2:

structure(list(ticker = structure(c(2L, 2L, 2L, 2L, 1L, 1L, 1L, 
1L), .Label = c("AAPL", "SPY"), class = "factor"), expiration = c(621, 
621, 719, 719, 621, 621, 719, 719), strike = c(180, 205, 180, 
205, 75, 100, 75, 100)), class = "data.frame", row.names = c(NA, 
-8L))

I am interested in @akrun's data.table answer, but it does not give me the full expected output: the 0719 row for SPY is missing.

library(data.table)
setDT(df2)[, Price := strike][df1, on = .(ticker, Price), roll = -Inf]
ticker expiration strike Price
1:    SPY        621    205   200
2:   AAPL        621    100   100
3:   AAPL        719    100   100

Best answer

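We can use a rolling join after first crossing df1 with the unique values of "expiration" from the second dataset, so that every ticker/expiration combination gets its own row to join on.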

library(data.table)
library(tidyr)
df1N <- crossing(df1, expiration = unique(df2$expiration))
setDT(df2)[, Price := strike][df1N, on = .(ticker, expiration, Price), roll = -Inf]
#    ticker expiration strike Price
#1:    SPY        621    205   200
#2:    SPY        719    205   200
#3:   AAPL        621    100   100
#4:   AAPL        719    100   100
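
One caveat with the rolling join above: roll = -Inf carries the next observation backward, so it returns the first strike at or above Price, which is not necessarily the nearest one in absolute terms. data.table also accepts roll = "nearest". The following is only a minimal sketch of that variant, assuming the question's data rebuilt with character tickers instead of factors; the names lookup and nearest are made up for illustration and are not part of the original answer.

```r
library(data.table)

# Data from the question, rebuilt as data.tables (character tickers, an assumption)
df1 <- data.table(ticker = c("SPY", "AAPL"), Price = c(200, 100))
df2 <- data.table(
  ticker     = rep(c("SPY", "AAPL"), each = 4),
  expiration = rep(c(621, 621, 719, 719), times = 2),
  strike     = c(180, 205, 180, 205, 75, 100, 75, 100)
)

# One query row per (ticker, expiration) pair, carrying the ticker's Price.
# `lookup` is a hypothetical helper name.
lookup <- unique(df2[, .(ticker, expiration)])[df1, on = "ticker"]

# Copy strike into Price so both tables share the column that is rolled on
df2[, Price := strike]

# The last column in `on` is the rolling one; "nearest" looks in both directions
nearest <- df2[lookup, on = .(ticker, expiration, Price), roll = "nearest"]
print(nearest)
```

On the question's data both settings pick the same strikes. They would differ for a price such as 185 on SPY, where the strike below (180) is closer than the strike above (205).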

Or do a full_join and then, after grouping by "ticker" and "expiration", slice the row with the minimum absolute difference between the "Price" and "strike" columns.

library(dplyr)
full_join(df1, df2, by = "ticker") %>%   # name the join key explicitly
    group_by(ticker, expiration) %>% 
    slice(which.min(abs(Price - strike)))
# A tibble: 4 x 4
# Groups:   ticker, expiration [4]
#  ticker Price expiration strike
#  <fct>  <dbl>      <dbl>  <dbl>
#1 AAPL     100        621    100
#2 AAPL     100        719    100
#3 SPY      200        621    205
#4 SPY      200        719    205
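
For comparison, the same group-wise logic can be sketched in base R without any packages. This is an illustrative alternative rather than part of the original answer; it assumes the question's data with character tickers, and the names merged, keep and closest are hypothetical. Unlike which.min, this version keeps every row that ties for the minimum within a group.

```r
# Data from the question as plain data frames (character tickers, an assumption)
df1 <- data.frame(ticker = c("SPY", "AAPL"), Price = c(200, 100))
df2 <- data.frame(
  ticker     = rep(c("SPY", "AAPL"), each = 4),
  expiration = rep(c(621, 621, 719, 719), times = 2),
  strike     = c(180, 205, 180, 205, 75, 100, 75, 100)
)

# Attach each ticker's Price to every row of df2
merged <- merge(df2, df1, by = "ticker")

# Absolute distance between the strike and the current price
merged$diff <- abs(merged$Price - merged$strike)

# ave() returns the group minimum aligned with each row,
# so the comparison flags the closest strike per (ticker, expiration)
keep <- merged$diff == ave(merged$diff, merged$ticker, merged$expiration, FUN = min)

closest <- merged[keep, c("ticker", "expiration", "strike")]
print(closest)
```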

Regarding "r - Group by and find the closest number", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56454754/
