python - Merging column data from multiple CSV files into a single CSV file

Tags: python pandas dataframe csv merge

I'm very new to Python, and to data processing in particular. Here is what I'm trying to achieve:

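I run CIS tests on multiple servers and generate one CSV file per server (each file is named after its server). The output files from all servers are copied to a central server. The generated output looks like this (truncated):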

File1: dc1pp1v01.co.uk.csv
Description,Outcome,Result
1.1 Database Placement,/var/lib/mysql,PASSED
1.2 Use dedicated least privilaged account,mysql,PASSED
1.3 Diable MySQL history,Not Found,PASSED

File2: dc1pp2v01.co.uk.csv
Description,Outcome,Result
1.1 Database Placement,/var/lib/mysql,PASSED
1.2 Use dedicated least privilaged account,mysql,PASSED
1.3 Diable MySQL history,Not Found,PASSED

File..n: dc1pp1v02.co.uk.csv
Description,Outcome,Result
1.1 Database Placement,/var/lib/mysql,PASSED
1.2 Use dedicated least privilaged account,mysql,PASSED
1.3 Diable MySQL history,Found,FAILED

I want the output to look like this:

   Description                                 dc1pp1v01 dc1pp2v01 dc1pp1v02
0  1.1 Database Placement                      PASSED    PASSED    PASSED
1  1.2 Use dedicated least privilaged account  PASSED    PASSED    PASSED
2  1.3 Diable MySQL history                    PASSED    PASSED    FAILED

To merge the files, I created another file that contains only the Description values and two column headers, as shown below:

file: cis_report.csv
Description,Result
1.1 Database Placement,
1.2 Use dedicated least privilaged account,
1.3 Diable MySQL history,

I wrote the code below to do a column-based merge:

import glob
import os
import pandas as pd 

col_list = ["Description","Result"]
path = "/Users/Python/Data"
all_files = glob.glob(os.path.join(path, "dc*.csv"))

cis_df = pd.read_csv("/Users/Python/Data/cis_report.csv")

for fl in all_files:
   d = pd.read_csv(fl, usecols=col_list)
   f = cis_df.merge(d, on='Description')
   cis_df = f.copy()
   
print(cis_df.head())

The output I get is:

                                  Description Result_x Result_y Result_x Result_y
0                      1.1 Database Placement      NaN   PASSED   PASSED   PASSED
1  1.2 Use dedicated least privilaged account      NaN   PASSED   PASSED   PASSED
2                    1.3 Diable MySQL history      NaN   PASSED   PASSED   FAILED

In my code, I'm not sure how to get the file name as the header of each result column and how to get rid of the NaN column.

Also, is there a better way to get the output I'm looking for without using the dummy file (cis_report.csv)? Any help is much appreciated.

Best answer

You need the DataFrame.pivot() function. The code below is well commented and is a complete working example. Change it as needed.

import os
import pandas as pd

# Get all file names in a directory
# Use '.' for the current working directory, or replace it with
# e.g. r'C:\Users\Dames\Desktop\csv_files'
folder = '.'
file_names = os.listdir(folder)

# Filter out all non-.csv files
# You can skip this if you know that only .csv files will be in that folder
csv_file_names = [fn for fn in file_names if fn.endswith('.csv')]

# Load a CSV file into a DataFrame and set the Server column
# to the part of the file name before the first dot
def load_csv(file_name):
    # os.listdir returns bare names, so join them with the folder
    df = pd.read_csv(os.path.join(folder, file_name))
    df['Server'] = file_name.split('.')[0]
    return df

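# Concatenate all the CSV files after they have been processed by load_csv
# (DataFrame.append was removed in pandas 2.0; pd.concat replaces it)
df = pd.concat([load_csv(fn) for fn in csv_file_names], ignore_index=True)

# Turn the DataFrame into a pivot table
# (pivot takes keyword arguments only in pandas 2.0 and later)
df = df.pivot(index='Description', columns='Server', values='Result')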

# Save DataFrame into CSV File
# If this script runs multiple times make sure that the final.csv is saved elsewhere!
# Or it will be read by the code above as an input file
df.to_csv('final.csv')
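
To try the reshaping step without any files on disk, here is a minimal sketch with the question's sample rows inlined as strings. SAMPLE_FILES is a hypothetical stand-in for the CSV files in the folder and is not part of the original answer; the frames are stacked with pd.concat and then pivoted on the Description and Server columns.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for the per-server CSV files in the question.
SAMPLE_FILES = {
    "dc1pp1v01.co.uk.csv": (
        "Description,Outcome,Result\n"
        "1.1 Database Placement,/var/lib/mysql,PASSED\n"
        "1.2 Use dedicated least privilaged account,mysql,PASSED\n"
        "1.3 Diable MySQL history,Not Found,PASSED\n"
    ),
    "dc1pp2v01.co.uk.csv": (
        "Description,Outcome,Result\n"
        "1.1 Database Placement,/var/lib/mysql,PASSED\n"
        "1.2 Use dedicated least privilaged account,mysql,PASSED\n"
        "1.3 Diable MySQL history,Not Found,PASSED\n"
    ),
    "dc1pp1v02.co.uk.csv": (
        "Description,Outcome,Result\n"
        "1.1 Database Placement,/var/lib/mysql,PASSED\n"
        "1.2 Use dedicated least privilaged account,mysql,PASSED\n"
        "1.3 Diable MySQL history,Found,FAILED\n"
    ),
}


def load_csv(name, text):
    """Parse one CSV and tag every row with the server name from the file name."""
    df = pd.read_csv(io.StringIO(text), usecols=["Description", "Result"])
    df["Server"] = name.split(".")[0]
    return df


# Stack all files into one long table: one row per (Description, Server) pair.
long_df = pd.concat(
    [load_csv(name, text) for name, text in SAMPLE_FILES.items()],
    ignore_index=True,
)

# Reshape to wide form: one row per Description, one column per server.
report = long_df.pivot(index="Description", columns="Server", values="Result")
print(report)
```

Note that pivot raises an error if the same Description appears twice for one server, so this sketch assumes each check is listed once per file.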

The final DataFrame looks like this:

Server                                     dc1pp1v01 dc1pp1v02 dc1pp2v01
Description
1.1 Database Placement                        PASSED    PASSED    PASSED
1.2 Use dedicated least privilaged account    PASSED    PASSED    PASSED
1.3 Diable MySQL history                      PASSED    FAILED    PASSED

And the CSV file looks like this:

Description,dc1pp1v01,dc1pp1v02,dc1pp2v01
1.1 Database Placement,PASSED,PASSED,PASSED
1.2 Use dedicated least privilaged account,PASSED,PASSED,PASSED
1.3 Diable MySQL history,PASSED,FAILED,PASSED
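
If you would rather keep the merge-based loop from the question, one possible fix is to rename each file's Result column to the server name before merging. The Result_x and Result_y names come from the default suffixes that merge adds to overlapping column names, and the NaN column is the empty Result column of cis_report.csv, so renaming up front and dropping the dummy file avoids both. The sketch below is an illustration, not part of the original answer: SAMPLE_FILES is a hypothetical in-memory stand-in for the files on disk, and the outer join is an assumption chosen so that a check missing from one server is kept rather than dropped.

```python
import io
from functools import reduce

import pandas as pd

# Hypothetical in-memory stand-ins for the per-server CSV files in the question.
SAMPLE_FILES = {
    "dc1pp1v01.co.uk.csv": (
        "Description,Outcome,Result\n"
        "1.1 Database Placement,/var/lib/mysql,PASSED\n"
        "1.2 Use dedicated least privilaged account,mysql,PASSED\n"
        "1.3 Diable MySQL history,Not Found,PASSED\n"
    ),
    "dc1pp2v01.co.uk.csv": (
        "Description,Outcome,Result\n"
        "1.1 Database Placement,/var/lib/mysql,PASSED\n"
        "1.2 Use dedicated least privilaged account,mysql,PASSED\n"
        "1.3 Diable MySQL history,Not Found,PASSED\n"
    ),
    "dc1pp1v02.co.uk.csv": (
        "Description,Outcome,Result\n"
        "1.1 Database Placement,/var/lib/mysql,PASSED\n"
        "1.2 Use dedicated least privilaged account,mysql,PASSED\n"
        "1.3 Diable MySQL history,Found,FAILED\n"
    ),
}

frames = []
for name, text in SAMPLE_FILES.items():
    server = name.split(".")[0]
    d = pd.read_csv(io.StringIO(text), usecols=["Description", "Result"])
    # Give the Result column a unique name so merge has nothing to suffix.
    frames.append(d.rename(columns={"Result": server}))

# Fold the per-server frames together on Description; no dummy file needed.
merged = reduce(
    lambda left, right: left.merge(right, on="Description", how="outer"),
    frames,
)
print(merged)
```

With this approach the server columns appear in the order the files are processed, whereas pivot sorts them by name.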

Regarding python - merging column data from multiple CSV files into a single CSV file, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/63939462/
