python - Replace a row with another row at a given index in a DataFrame and change cell values

Tags: python python-3.x pandas dataframe python-3.7

I have a sample csv like this:

                 keys                       key_regex    datatype detailed_datatype precedence  val_regex     val_regex_2  val_regex_3  max_words  alpha_char_check
0      billingAddress      original_billing_key_regex  alphabetic           address    primary        NaN             NaN          NaN        NaN               NaN
1     deliveryAddress     original_delivery_key_regex  alphabetic           address    primary        NaN             NaN          NaN        NaN               NaN
2         notifyParty     original_notify_party_regex  alphabetic        alphabetic    primary        NaN             NaN          NaN        NaN               NaN
3       originAddress   original_seller_address_regex  alphabetic           address    primary        NaN             NaN          NaN        NaN               NaN
4   billingAddressAlt   alternative_billing_key_regex  alphabetic           address   tertiary        NaN             NaN          NaN        NaN               NaN
5  deliveryAddressAlt  alternative_delivery_key_regex  alphabetic           address   tertiary        NaN             NaN          NaN        5.0               1.0
6    originAddressAlt    alternative_seller_key_regex  alphabetic           address   tertiary        NaN  sample_val_re1          NaN        NaN               0.0

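I am trying to replace each row whose keys value is a key in tertiary_row_replacement_dict with the row whose keys value is the corresponding dict value, and then rename the precedence column value from 'tertiary' to 'primary', all while keeping the index positions as they were.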

The expected output is:

              keys                       key_regex    datatype detailed_datatype precedence  val_regex     val_regex_2  val_regex_3  max_words  alpha_char_check
0   billingAddress   alternative_billing_key_regex  alphabetic           address    primary        NaN             NaN          NaN        NaN               NaN
1  deliveryAddress  alternative_delivery_key_regex  alphabetic           address    primary        NaN             NaN          NaN        5.0               1.0
2      notifyParty     original_notify_party_regex  alphabetic        alphabetic    primary        NaN             NaN          NaN        NaN               NaN
3    originAddress    alternative_seller_key_regex  alphabetic           address    primary        NaN  sample_val_re1          NaN        NaN               0.0

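There are 3 original csv files. Each is large and contains many similar cases, i.e. a key with primary precedence and an alternative key with tertiary precedence. My dict looks like this: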

tertiary_row_replacement_dict = {
    "originAddress": "originAddressAlt",
    "deliveryAddress": "deliveryAddressAlt",
    # "totalAmount": "totalAmountAlt",
    "billingAddress": "billingAddressAlt"
    ....
}

Assuming the keys and corresponding values of this dict are always present in the csv, I have the following code:

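for k, new_k in tertiary_row_replacement_dict.items():
    # index label of the alternative (tertiary) row
    t2 = df.loc[df['keys']==new_k].index[0]
    # overwrite the primary row with the alternative row, turning 'tertiary' into 'primary'
    df.loc[df.loc[df['keys']==k].index[0]] = [i if i!='tertiary' else 'primary' for i in df.loc[t2]]
    # rename new_k to k (and any 'tertiary' to 'primary') across the frame, then drop the alternative row
    df = df.replace([new_k, 'tertiary'], [k, 'primary']).drop([t2])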

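It does what I want. Running this on the test csv alone takes about 0.034 seconds, and it is probably not the best or most optimized way to replace rows and cell values. Is there a faster alternative, given that we know which rows are to be replaced by which? (The dict is not mandatory; a list of tuples or a list of lists would be fine as a speed trade-off.)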

Best answer

You can use replace to map the tertiary keys to their primary keys, and groupby().first() to fill in the information:

inverse_dict = {v:k for k,v in tertiary_row_replacement_dict.items()}
(df.groupby(df['keys'].replace(inverse_dict))
   .first()
   .reset_index(drop=True)
)

Output:

    keys             key_regex                      datatype    detailed_datatype    precedence      val_regex  val_regex_2       val_regex_3    max_words    alpha_char_check
--  ---------------  -----------------------------  ----------  -------------------  ------------  -----------  --------------  -------------  -----------  ------------------
 0  billingAddress   original_billing_key_regex     alphabetic  address              primary               nan  nan                       nan          nan                 nan
 1  deliveryAddress  original_delivery_key_regex    alphabetic  address              primary               nan  nan                       nan            5                   1
 2  notifyParty      original_notify_party_regex    alphabetic  alphabetic           primary               nan  nan                       nan          nan                 nan
 3  originAddress    original_seller_address_regex  alphabetic  address              primary               nan  sample_val_re1            nan          nan                   0
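
Note that the output above still carries the original_* values in key_regex: groupby().first() returns the first non-null entry of each column within a group, so the alternative row only fills cells that the primary row left as NaN. If the alternative row should replace the primary row outright, as in the question's expected output, one possible sketch follows. It assumes every value in keys is unique, it uses a reduced version of the sample frame (only a few of the columns), and the helper name replace_with_alt_rows is made up for illustration.

```python
import numpy as np
import pandas as pd

# Reduced version of the sample frame (only a few of the original columns).
df = pd.DataFrame({
    "keys": [
        "billingAddress", "deliveryAddress", "notifyParty", "originAddress",
        "billingAddressAlt", "deliveryAddressAlt", "originAddressAlt",
    ],
    "key_regex": [
        "original_billing_key_regex", "original_delivery_key_regex",
        "original_notify_party_regex", "original_seller_address_regex",
        "alternative_billing_key_regex", "alternative_delivery_key_regex",
        "alternative_seller_key_regex",
    ],
    "precedence": ["primary"] * 4 + ["tertiary"] * 3,
    "max_words": [np.nan] * 5 + [5.0, np.nan],
})

tertiary_row_replacement_dict = {
    "originAddress": "originAddressAlt",
    "deliveryAddress": "deliveryAddressAlt",
    "billingAddress": "billingAddressAlt",
}


def replace_with_alt_rows(df, mapping):
    """Hypothetical helper: move each alternative row into its primary row's slot."""
    inverse = {alt: primary for primary, alt in mapping.items()}
    is_alt = df["keys"].isin(list(inverse))        # rows that will be moved
    is_replaced = df["keys"].isin(list(mapping))   # rows that will be overwritten

    # Index label of every row, looked up by its key (keys assumed unique).
    label_by_key = pd.Series(df.index.to_numpy(), index=df["keys"].to_numpy())

    alt_rows = df[is_alt].copy()
    alt_rows["keys"] = alt_rows["keys"].map(inverse)
    alt_rows["precedence"] = alt_rows["precedence"].replace("tertiary", "primary")
    # Give each alternative row the index label of the primary row it replaces.
    alt_rows.index = alt_rows["keys"].map(label_by_key).to_numpy()

    untouched = df[~is_alt & ~is_replaced]
    return pd.concat([untouched, alt_rows]).sort_index()


out = replace_with_alt_rows(df, tertiary_row_replacement_dict)
print(out)
```

This avoids the Python-level loop by selecting, relabelling, and concatenating whole blocks of rows. Whether it is actually faster on the real files would need to be measured.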

Regarding "python - Replace a row with another row at a given index in a DataFrame and change cell values", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/62105742/
