python - Saving pandas DataFrames in a loop after manipulating them

Tags: python pandas dataframe

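I have a loop that takes a series of existing DataFrames and manipulates their format and values. I need to know how to create new DataFrames containing the modifications at the end of the loop.

Here is an example: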

import pandas as pd

# Create datasets
First = {'GDP':[200,175,150,100]}
Second = {'GDP':[550,200,235,50]}

# Create old_dataframes
old_df_1 = pd.DataFrame(First)
old_df_2 = pd.DataFrame(Second)

# Define references and dictionary
old_dfs = [old_df_1, old_df_2]
new_dfs = ['new_df_1','new_df_2']
dictionary = {}

# Begin Loop
for df, name in zip(old_dfs, new_dfs):

    # Multiply all GDP values by 1.5 in both dataframes
    df = df * 1.5    

    # ISSUE HERE: this should create new DataFrames 'new_df_1' and 'new_df_2'
    # holding the df * 1.5 values, but it only adds entries to the dictionary.
    dictionary[name] = df

# Check for the existence of new_df_1 and new_df_2 (they will not appear).
# %who_ls is an IPython/Jupyter magic command.
%who_ls DataFrame

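Question: I have marked the issue above. My code does not create the DataFrames "new_df_1" and "new_df_2"; it only adds them to the dictionary. I need to be able to create new_df_1 and new_df_2 as separate DataFrames.

Best answer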

from copy import deepcopy  # used to copy the old DataFrames

# First and Second are the dicts defined in the question.
# Create two lists: the first holds the old DataFrames, the second the modified ones
old_dfs_list, new_dfs_list = [pd.DataFrame(First), pd.DataFrame(Second)], []

# Iterate over old_dfs_list, copy and modify each DataFrame, and append the
# result to new_dfs_list at the same index, so old_dfs_list[1] maps to
# new_dfs_list[1]

for i in range(len(old_dfs_list)):
    # A deep copy prevents the old DataFrames from being changed by reference
    # (old_dfs_list[i].copy() would work as well)
    df_deep_copy = deepcopy(old_dfs_list[i])
    df_deep_copy['GDP'] *= 1.5
    new_dfs_list.append(df_deep_copy)

print(old_dfs_list[0])   # check that the old DataFrames are unchanged
print(new_dfs_list[0])

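If new_df_1 and new_df_2 really must exist as separate variables, and the number of DataFrames is fixed and known in advance, tuple unpacking is one option. The following is a minimal sketch under that assumption and is not part of the original answer; it rebuilds the question's data so that it runs on its own.

```python
import pandas as pd

# Same data as in the question
old_df_1 = pd.DataFrame({'GDP': [200, 175, 150, 100]})
old_df_2 = pd.DataFrame({'GDP': [550, 200, 235, 50]})

# df * 1.5 returns a new DataFrame, so the originals are left untouched.
# Unpacking binds each result to its own variable name.
new_df_1, new_df_2 = [df * 1.5 for df in (old_df_1, old_df_2)]

print(old_df_1)  # original values
print(new_df_1)  # values multiplied by 1.5
```

This only works when the names can be written out by hand. For a variable number of DataFrames, a list or dictionary is usually easier to manage than dynamically created variable names.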

You can also try using dictionaries instead of lists, so that you can use whatever names you like:

import pandas as pd
datadicts_dict = {
    'first':  {'GDP': [200, 175, 150, 100]},
    'second': {'GDP': [550, 200, 235, 50]},
    'third':  {'GDP': [600, 400, 520, 100, 800]},
}

# Initialize two dicts to hold the original and the modified DataFrames
old_dfs_dict, new_dfs_dict = {}, {}

# Iterate over datadicts_dict: convert each dataset to a DataFrame and save it
# in old_dfs_dict under the same key, then build the modified version and put
# it in new_dfs_dict under that key as well. The dataset under 'first' in
# datadicts_dict is therefore stored as old_dfs_dict['first'] and its
# modified version as new_dfs_dict['first'].

for dataset_name, data_dict in datadicts_dict.items():
    old_dfs_dict[dataset_name] = pd.DataFrame({'GDP':data_dict['GDP']})
    new_dfs_dict[dataset_name] = pd.DataFrame({'GDP':data_dict['GDP']}) * 1.5

print(old_dfs_dict['third'])   # check that the old DataFrames are unchanged
print(new_dfs_dict['third'])
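The dictionary approach can also be applied directly to the names from the question. The sketch below is an illustration and not part of the original answer; it assumes the old DataFrames are held in a list alongside a list of the desired names, and it uses DataFrame.assign, which returns a new DataFrame, so the originals are not modified.

```python
import pandas as pd

# Same data and names as in the question
old_dfs = [
    pd.DataFrame({'GDP': [200, 175, 150, 100]}),
    pd.DataFrame({'GDP': [550, 200, 235, 50]}),
]
new_names = ['new_df_1', 'new_df_2']

# Build a dict that maps each desired name to a modified copy.
# assign() returns a new DataFrame and leaves the original as it was.
new_dfs = {
    name: df.assign(GDP=df['GDP'] * 1.5)
    for name, df in zip(new_names, old_dfs)
}

print(new_dfs['new_df_1'])  # look up a modified DataFrame by name
print(old_dfs[0])           # the original is unchanged
```

Looking up new_dfs['new_df_1'] then plays the role that a separate variable new_df_1 would have played.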

Regarding "python - Saving pandas DataFrames in a loop after manipulating them", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59384898/