python - Is there a simple way to broadcast many new columns into a Pandas DataFrame?

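I have a set of .png images and a dataframe, df_train, with 2 columns: the image file name (id_code) and the image's diagnosis. (df_train has m rows and 2 columns.)

I want to create a new dataframe, df_train_new, that keeps the m rows but adds n new columns. Each of the n new columns holds one pixel value of that row's image. (df_train_new has m rows and (2+n) columns.)

I wrote simple code using PIL to get the green-channel pixel values and flatten them into a vector (n rows by 1 column), and then tried a for loop (over every image) to add these new columns to the dataframe.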

import numpy as np
import pandas as pd
from PIL import Image

df_train = pd.read_csv('../train.csv')

img_dims = 6869376 # number of pixels per image

for example in range(len(df_train)): # iterate over every image
    img = Image.open("../input/train_images/" + str(df_train.iloc[example,0]) + ".png") # open image with PIL
    img_green_data = np.asarray(list(img.getdata()))[:,1].reshape(img_dims,-1).T # create (1, 6869376) vector for every image
    df_train.loc[example,2:] = img_green_data # now try to add these columns to the data frame! *** doesn't work

I get this error:

ValueError: Must have equal len keys and value when setting with an ndarray

I know this can't be the right way to do it, but I have tried several approaches and feel there must be a simpler way to do this kind of thing!

Best answer

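Instead of updating the existing dataframe, consider building a NumPy matrix by vertically stacking a list of per-image pixel arrays. Then concatenate the original dataframe with that matrix after casting it to a DataFrame. Below is an untested adaptation.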

import numpy as np
import pandas as pd
from PIL import Image

df_train = pd.read_csv('../train.csv')

img_dims = 6869376 # number of pixels per image

mat_list = []
# iterate over every image
for id_code in df_train['id_code']:
    # open image with PIL
    img = Image.open("../input/train_images/" + str(id_code) + ".png")
    # create (1, 6869376) vector for every image
    img_green_data = (np.asarray(list(img.getdata()))[:,1]
                        .reshape(img_dims,-1)
                        .transpose()
                     )
    # APPEND TO LIST
    mat_list.append(img_green_data)

final_df = pd.concat([df_train,
                      pd.DataFrame(np.vstack(mat_list))],
                     axis = 'columns')
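
The stack-then-concatenate idea can be tried without any image files. The following is a minimal sketch, not the answerer's code: df_meta, green_channel_row, the px_ column names, and the tiny 2x2 random arrays are all hypothetical stand-ins for df_train and the real images. It assumes each image is available as an (H, W, 3) array, which is what np.asarray(img) returns for an RGB PIL image.

```python
import numpy as np
import pandas as pd


def green_channel_row(rgb):
    """Flatten the green channel of an (H, W, 3) array into a 1-D row (row-major order)."""
    return np.asarray(rgb)[:, :, 1].ravel()


# Hypothetical stand-in for df_train: m = 3 rows, 2 metadata columns.
df_meta = pd.DataFrame({"id_code": ["a", "b", "c"], "diagnosis": [0, 2, 1]})

# Synthetic 2x2 RGB "images" (n = 4 pixels each) in place of the real .png files.
rng = np.random.default_rng(0)
images = [
    rng.integers(0, 256, size=(2, 2, 3), dtype=np.uint8)
    for _ in range(len(df_meta))
]

# Collect one row per image, then stack once into an (m, n) matrix.
rows = [green_channel_row(im) for im in images]
pixel_matrix = np.vstack(rows)

# Reuse df_meta's index so that concat aligns the rows one to one.
pixels = pd.DataFrame(
    pixel_matrix,
    index=df_meta.index,
    columns=[f"px_{i}" for i in range(pixel_matrix.shape[1])],
)

# Result has m rows and (2 + n) columns.
df_new = pd.concat([df_meta, pixels], axis="columns")
```

Passing index=df_meta.index matters only when the metadata frame does not have the default 0..m-1 index (for example after filtering rows), because pd.concat along columns aligns on index labels. Keeping the pixels as uint8 rather than the default integer type also uses one byte per value, which may be worth considering with millions of columns per image.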

This question, "python - Is there a simple way to broadcast many new columns into a Pandas DataFrame?", is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/57428572/
