python - Upsampling non-time-series data

Tags: python pandas

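Is there a better way to upsample a series of floats based on start and stop conditions contained in each row? Here is the example I am starting with: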

  borehole   top  bottom  lithology
0     AP-2  94.6    95.1   dolomite
1     AP-2  95.1    96.7  limestone
2     AP-2  96.7    97.0   dolomite
3     AP-2  97.0    97.5  limestone
4     AP-2  97.5    97.8  limestone
5     AP-3  87.4    87.7  limestone
6     AP-3  87.7    88.1  limestone
7     AP-3  88.1    88.5   dolomite
8     AP-3  88.5    89.1  limestone

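For each row, I want to add rows between the top and bottom values using a chosen increment, forward-filling the borehole and lithology values. The increment is constant across the DataFrame. Here is the filled DataFrame I want for an increment of 0.1 (this covers only the first two rows of the DataFrame above):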

   borehole   Top       Lith
0      AP-2  94.6   dolomite
1      AP-2  94.7   dolomite
2      AP-2  94.8   dolomite
3      AP-2  94.9   dolomite
4      AP-2  95.0   dolomite
5      AP-2  95.1  limestone
6      AP-2  95.2  limestone
7      AP-2  95.3  limestone
8      AP-2  95.4  limestone
9      AP-2  95.5  limestone
10     AP-2  95.6  limestone
11     AP-2  95.7  limestone
12     AP-2  95.8  limestone
13     AP-2  95.9  limestone
14     AP-2  96.0  limestone
15     AP-2  96.1  limestone
16     AP-2  96.2  limestone
17     AP-2  96.3  limestone
18     AP-2  96.4  limestone
19     AP-2  96.5  limestone
20     AP-2  96.6  limestone
21     AP-2  96.7  limestone

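Here is the code I have been using. It works, but since I am looping in pandas, I wonder whether I am missing something obvious. pd.DataFrame.resample() is tempting, but I can't see how to make it handle non-time data.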

import pandas as pd
import numpy as np

liths = pd.DataFrame(
{'borehole': {0: 'AP-2',
  1: 'AP-2',
  2: 'AP-2',
  3: 'AP-2',
  4: 'AP-2',
  5: 'AP-3',
  6: 'AP-3',
  7: 'AP-3',
  8: 'AP-3'},
 'bottom': {0: 95.099999999999994,
  1: 96.700000000000003,
  2: 97.0,
  3: 97.5,
  4: 97.799999999999997,
  5: 87.700000000000003,
  6: 88.099999999999994,
  7: 88.5,
  8: 89.099999999999994},
 'lithology': {0: 'dolomite',
  1: 'limestone',
  2: 'dolomite',
  3: 'limestone',
  4: 'limestone',
  5: 'limestone',
  6: 'limestone',
  7: 'dolomite',
  8: 'limestone'},
 'top': {0: 94.599999999999994,
  1: 95.099999999999994,
  2: 96.700000000000003,
  3: 97.0,
  4: 97.5,
  5: 87.400000000000006,
  6: 87.700000000000003,
  7: 88.099999999999994,
  8: 88.5}}
)

filled = []
increment = 0.1
for row in liths.itertuples():
    start = row.top
    end = row.bottom
    for i in np.arange(start, end, increment):
        filled.append([row.borehole, i, row.lithology])
filled = pd.DataFrame(filled, columns=['borehole', 'Top', 'Lith']); filled
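One caveat with the loop above: the NumPy documentation warns that np.arange with a non-integer step can return one element more or fewer than expected because of floating-point rounding. A minimal sketch of one way around that, assuming each interval is meant to be half-open (top included, bottom excluded) and that depths sit on a 0.1 grid. The helper name depth_steps and its decimals parameter are hypothetical, not part of the original code.

```python
import numpy as np

def depth_steps(top, bottom, increment=0.1, decimals=1):
    """Depths from top (inclusive) to bottom (exclusive) on a fixed grid.

    Hypothetical helper: the number of steps is computed as a rounded
    integer, so floating-point error cannot add or drop an endpoint.
    """
    n = int(round((bottom - top) / increment))
    # Build the steps from integer multiples, then round away float noise.
    return np.round(top + increment * np.arange(n), decimals)

steps = depth_steps(95.1, 96.7)
print(len(steps), steps[0], steps[-1])
```

Swapping this in for np.arange(start, end, increment) inside the loop keeps the rest of the code unchanged.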

Best answer

I would construct a new DataFrame from each row and concatenate them all:

def expand(r):
    a = np.arange(r.top, r.bottom, .1)
    n = len(a)
    return pd.DataFrame(dict(
            borehole=[r.borehole] * n,
            lithology=[r.lithology] * n,
            top=a
        ))

# The question's DataFrame is named `liths`, so iterate over that.
pd.concat([expand(r) for r in liths.itertuples()], ignore_index=True)

Output:
   borehole  lithology   top
0      AP-2   dolomite  94.6
1      AP-2   dolomite  94.7
2      AP-2   dolomite  94.8
3      AP-2   dolomite  94.9
4      AP-2   dolomite  95.0
5      AP-2  limestone  95.1
6      AP-2  limestone  95.2
7      AP-2  limestone  95.3
8      AP-2  limestone  95.4
9      AP-2  limestone  95.5
10     AP-2  limestone  95.6
11     AP-2  limestone  95.7
12     AP-2  limestone  95.8
13     AP-2  limestone  95.9
14     AP-2  limestone  96.0
15     AP-2  limestone  96.1
16     AP-2  limestone  96.2
17     AP-2  limestone  96.3
18     AP-2  limestone  96.4
19     AP-2  limestone  96.5
20     AP-2  limestone  96.6
21     AP-2  limestone  96.7
22     AP-2   dolomite  96.7
23     AP-2   dolomite  96.8
24     AP-2   dolomite  96.9
25     AP-2  limestone  97.0
26     AP-2  limestone  97.1
27     AP-2  limestone  97.2
28     AP-2  limestone  97.3
29     AP-2  limestone  97.4
30     AP-2  limestone  97.5
31     AP-2  limestone  97.6
32     AP-2  limestone  97.7
33     AP-3  limestone  87.4
34     AP-3  limestone  87.5
35     AP-3  limestone  87.6
36     AP-3  limestone  87.7
37     AP-3  limestone  87.8
38     AP-3  limestone  87.9
39     AP-3  limestone  88.0
40     AP-3   dolomite  88.1
41     AP-3   dolomite  88.2
42     AP-3   dolomite  88.3
43     AP-3   dolomite  88.4
44     AP-3   dolomite  88.5
45     AP-3  limestone  88.5
46     AP-3  limestone  88.6
47     AP-3  limestone  88.7
48     AP-3  limestone  88.8
49     AP-3  limestone  88.9
50     AP-3  limestone  89.0
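In the output above, 96.7 and 88.5 each appear twice (rows 21/22 and 44/45), which looks like floating-point rounding in np.arange occasionally letting the bottom value through. A loop-free sketch that avoids this by counting steps as integers is shown below. It assumes half-open intervals and the column names from the question. The function name upsample_intervals and its parameters are hypothetical, and it is an alternative to the per-row concat, not the answer's own method.

```python
import numpy as np
import pandas as pd

def upsample_intervals(df, increment=0.1, decimals=1):
    """Expand each [top, bottom) interval into rows spaced by `increment`.

    Hypothetical helper. Assumes columns named borehole, top, bottom
    and lithology, as in the question.
    """
    # Integer number of steps per row; rounding guards against float error.
    n = np.round((df["bottom"] - df["top"]) / increment).astype(int).to_numpy()
    # Position of the source row for every output row.
    pos = np.repeat(np.arange(len(df)), n)
    # Step number within each interval: 0, 1, ..., n_i - 1.
    step = np.arange(len(pos)) - np.repeat(np.cumsum(n) - n, n)
    out = df.iloc[pos][["borehole", "lithology"]].reset_index(drop=True)
    tops = df["top"].to_numpy()[pos] + step * increment
    return out.assign(top=np.round(tops, decimals))

liths = pd.DataFrame({
    "borehole": ["AP-2", "AP-2", "AP-3"],
    "top": [94.6, 95.1, 87.4],
    "bottom": [95.1, 96.7, 87.7],
    "lithology": ["dolomite", "limestone", "limestone"],
})
filled = upsample_intervals(liths)
print(filled)
```

Because the row positions and step numbers are built with np.repeat rather than a Python loop, this should scale better on long tables, though I have not benchmarked it against the concat version.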

Source: a similar question on Stack Overflow, "python - Upsampling non-time-series data": https://stackoverflow.com/questions/43049355/
