python-2.7 - Merging data from two DataFrame columns into one column

Tags: python-2.7 pandas

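I have time series data in two separate DataFrame columns. They refer to the same parameter but have different lengths.

On dates where data exists in only one column, I would like that value placed in my new column. On dates where both columns have an entry, I would like the mean. (I want to join on the index, which is a datetime value.)

Can anyone suggest a way to merge my two columns? Thanks.

Edit2: I wrote some code that should merge the data from the two columns, but I get a KeyError when I try to set the new values using the index generated from the rows where my first df has a value but my second df does not. The code is below: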

def merge_func(df):
    null_index = df[(df['DOC_mg/L'].isnull() == False) & (df['TOC_mg/L'].isnull() == True)].index
    df['TOC_mg/L'][null_index] = df[null_index]['DOC_mg/L']
    notnull_index = df[(df['DOC_mg/L'].isnull() == True) & (df['TOC_mg/L'].isnull() == False)].index
    df['DOC_mg/L'][notnull_index] = df[notnull_index]['TOC_mg/L']

    df.insert(len(df.columns), 'Mean_mg/L', 0.0)
    df['Mean_mg/L'] = (df['DOC_mg/L'] + df['TOC_mg/L']) / 2
    return df

merge_func(sve)

Here is the error:

KeyError: "['2004-01-14T01:00:00.000000000+0100' '2004-03-04T01:00:00.000000000+0100'\n '2004-03-30T02:00:00.000000000+0200' '2004-04-12T02:00:00.000000000+0200'\n '2004-04-15T02:00:00.000000000+0200' '2004-04-17T02:00:00.000000000+0200'\n '2004-04-19T02:00:00.000000000+0200' '2004-04-20T02:00:00.000000000+0200'\n '2004-04-22T02:00:00.000000000+0200' '2004-04-26T02:00:00.000000000+0200'\n '2004-04-28T02:00:00.000000000+0200' '2004-04-30T02:00:00.000000000+0200'\n '2004-05-05T02:00:00.000000000+0200' '2004-05-07T02:00:00.000000000+0200'\n '2004-05-10T02:00:00.000000000+0200' '2004-05-13T02:00:00.000000000+0200'\n '2004-05-17T02:00:00.000000000+0200' '2004-05-20T02:00:00.000000000+0200'\n '2004-05-24T02:00:00.000000000+0200' '2004-05-28T02:00:00.000000000+0200'\n '2004-06-04T02:00:00.000000000+0200' '2004-06-10T02:00:00.000000000+0200'\n '2004-08-27T02:00:00.000000000+0200' '2004-10-06T02:00:00.000000000+0200'\n '2004-11-02T01:00:00.000000000+0100' '2004-12-08T01:00:00.000000000+0100'\n '2011-02-21T01:00:00.000000000+0100' '2011-03-21T01:00:00.000000000+0100'\n '2011-04-04T02:00:00.000000000+0200' '2011-04-11T02:00:00.000000000+0200'\n '2011-04-14T02:00:00.000000000+0200' '2011-04-18T02:00:00.000000000+0200'\n '2011-04-21T02:00:00.000000000+0200' '2011-04-25T02:00:00.000000000+0200'\n '2011-05-02T02:00:00.000000000+0200' '2011-05-09T02:00:00.000000000+0200'\n '2011-05-23T02:00:00.000000000+0200' '2011-06-07T02:00:00.000000000+0200'\n '2011-06-21T02:00:00.000000000+0200' '2011-07-04T02:00:00.000000000+0200'\n '2011-07-18T02:00:00.000000000+0200' '2011-08-31T02:00:00.000000000+0200'\n '2011-09-13T02:00:00.000000000+0200' '2011-09-28T02:00:00.000000000+0200'\n '2011-10-10T02:00:00.000000000+0200' '2011-10-25T02:00:00.000000000+0200'\n '2011-11-08T01:00:00.000000000+0100' '2011-11-28T01:00:00.000000000+0100'\n '2011-12-20T01:00:00.000000000+0100' '2012-01-19T01:00:00.000000000+0100'\n '2012-02-14T01:00:00.000000000+0100' '2012-03-13T01:00:00.000000000+0100'\n 
'2012-03-27T02:00:00.000000000+0200' '2012-04-02T02:00:00.000000000+0200'\n '2012-04-10T02:00:00.000000000+0200' '2012-04-17T02:00:00.000000000+0200'\n '2012-04-26T02:00:00.000000000+0200' '2012-04-30T02:00:00.000000000+0200'\n '2012-05-03T02:00:00.000000000+0200' '2012-05-07T02:00:00.000000000+0200'\n '2012-05-10T02:00:00.000000000+0200' '2012-05-14T02:00:00.000000000+0200'\n '2012-05-22T02:00:00.000000000+0200' '2012-06-05T02:00:00.000000000+0200'\n '2012-06-19T02:00:00.000000000+0200' '2012-07-03T02:00:00.000000000+0200'\n '2012-07-17T02:00:00.000000000+0200' '2012-07-31T02:00:00.000000000+0200'\n '2012-08-14T02:00:00.000000000+0200' '2012-08-28T02:00:00.000000000+0200'\n '2012-09-11T02:00:00.000000000+0200' '2012-09-25T02:00:00.000000000+0200'\n '2012-10-10T02:00:00.000000000+0200' '2012-10-24T02:00:00.000000000+0200'\n '2012-11-21T01:00:00.000000000+0100' '2012-12-18T01:00:00.000000000+0100'] not in index"

Best answer

You are close, but you do not actually need to iterate over the rows when using the isnull() function. By default,

df[(df['DOC_mg/L'].isnull() == False) & (df['TOC_mg/L'].isnull() == True)].index

returns only the index of the rows where DOC_mg/L is not null and TOC_mg/L is null.
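
To make the mask concrete, here is a minimal sketch on a tiny frame. The dates and values are made up for illustration, and `notnull()` is used in place of `isnull() == False`, which should be equivalent:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; dates and values are invented for illustration.
idx = pd.to_datetime(["2004-01-14", "2004-03-04", "2004-03-30", "2004-04-12"])
df = pd.DataFrame(
    {
        "DOC_mg/L": [1.0, 2.0, np.nan, np.nan],
        "TOC_mg/L": [np.nan, 4.0, 5.0, np.nan],
    },
    index=idx,
)

# Boolean mask: True where DOC has a value and TOC is missing.
mask = df["DOC_mg/L"].notnull() & df["TOC_mg/L"].isnull()

# Index labels of the rows selected by the mask.
null_index = df[mask].index
```

With this sample data, only the first row has a DOC value without a TOC value, so `null_index` holds a single timestamp.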

Now you can do the following to set the values of TOC_mg/L:

null_index = df[(df['DOC_mg/L'].isnull() == False) & \
                (df['TOC_mg/L'].isnull() == True)].index
df['TOC_mg/L'][null_index] = df['DOC_mg/L'][null_index] # EDIT To switch the index position.

This takes the index of the rows where TOC_mg/L is null and DOC_mg/L is not null, and sets TOC_mg/L to the value found in DOC_mg/L on the same row.

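Note: this is not the accepted way of setting values by index, but it is how I have been doing it for a while. Just make sure that when you set values, the left-hand side of the assignment is df['col_name'][index]. If col_name and index are switched, you set the values on a copy, and they are never written back to the original.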
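
For comparison, here is a sketch of the same fill done with `.loc`, which is the form the pandas documentation recommends for setting values by label. The sample data is hypothetical. It also suggests why the question's code most likely failed: `df[null_index]` looks the timestamps up as column names, whereas `df.loc[rows, column]` treats them as row labels.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; dates and values are invented for illustration.
idx = pd.to_datetime(["2004-01-14", "2004-03-04", "2004-03-30", "2004-04-12"])
df = pd.DataFrame(
    {
        "DOC_mg/L": [1.0, 2.0, np.nan, np.nan],
        "TOC_mg/L": [np.nan, 4.0, 5.0, np.nan],
    },
    index=idx,
)

# Rows where exactly one of the two columns has a value.
doc_only = df["DOC_mg/L"].notnull() & df["TOC_mg/L"].isnull()
toc_only = df["DOC_mg/L"].isnull() & df["TOC_mg/L"].notnull()

# .loc[row_selector, column] writes into the original frame, not a copy.
df.loc[doc_only, "TOC_mg/L"] = df.loc[doc_only, "DOC_mg/L"]
df.loc[toc_only, "DOC_mg/L"] = df.loc[toc_only, "TOC_mg/L"]
```

Rows where both columns are null stay null in both columns.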

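Now, to set the mean, you can create a new column, which we will call Mean_mg/L, with the value set to 0.0. Then set this new column to the mean of the two columns: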

# Insert a new column named 'Mean_mg/L' at the end of the DataFrame,
#     with default value 0.0
df.insert(len(df.columns), 'Mean_mg/L', 0.0)
# Set this column's values to the average of DOC_mg/L and TOC_mg/L
df['Mean_mg/L'] = (df['DOC_mg/L'] + df['TOC_mg/L']) / 2

On rows where a null was filled with the value from the other column, the mean is the same as that value.
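
A shorter alternative, not part of the answer above, is to skip the fill step and take a row-wise mean, since `DataFrame.mean(axis=1)` ignores NaN by default. The sketch below assumes the two measurements start out as separate Series with datetime indexes of different lengths. The names `doc` and `toc` and all values are hypothetical.

```python
import pandas as pd

# Hypothetical inputs: two series for the same parameter, different lengths.
doc = pd.Series(
    [1.0, 2.0],
    index=pd.to_datetime(["2004-01-14", "2004-03-04"]),
    name="DOC_mg/L",
)
toc = pd.Series(
    [4.0, 5.0, 6.0],
    index=pd.to_datetime(["2004-03-04", "2004-03-30", "2004-04-12"]),
    name="TOC_mg/L",
)

# Align on the datetime index; dates missing from one series become NaN.
combined = pd.concat([doc, toc], axis=1)

# Row-wise mean skips NaN, so a lone value passes through unchanged
# and rows with both values get their average.
combined["Mean_mg/L"] = combined[["DOC_mg/L", "TOC_mg/L"]].mean(axis=1)
```

With this data, 2004-03-04 is the only date present in both series, so its mean is the average of 2.0 and 4.0, while every other date keeps its single value.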

This question about python-2.7 and merging data from two DataFrame columns into one column is from Stack Overflow: https://stackoverflow.com/questions/24143291/
