Python:将不同 csv 文件中的列追加到 CSV

标签 python csv

我目前有一个脚本,我想用它来组合 csv 数据文件。 例如,我有一个名为 process.csv 和 file.csv 的文件,但是当我尝试在名为“all_files.csv”的新文件中将一个文件附加到另一个文件时 它会附加正确的列,但不是从文件顶部开始。

现在发生了什么:

                process/sec
08/03/16 11:19  0
08/03/16 11:34  0.1
08/03/16 11:49  0
08/03/16 12:03  0
08/03/16 12:13  0
08/03/16 12:23  0
                                file/sec
                                0
                                43.3
                                0
                                0
                                0
                                0
                                0

我想要什么:

                process/sec     file/sec
08/03/16 11:19  0               0
08/03/16 11:34  0.1             43.3
08/03/16 11:49  0               0
08/03/16 12:03  0               0
08/03/16 12:13  0               0
08/03/16 12:23  0               0

这是我的代码(请注意,我删除了与用于 per_second 值的算法相关的所有多余代码,并在此示例中使用静态值):

def all_data(data_name,input_file_name,idx):
    #Create file if first set of data
    if data_name == 'first_set_of_data':
            all_per_second_file = open("all_data.csv", 'wb')
    #Append to file for all other data
    else:
        all_per_second_file = open("all_data.csv", 'a')

        row_position=''
        #For loop with index number to position rows after one another 
        #So not to rewrite new data to the same columns in all_data.csv
        for number in range(0,idx):
            row_position=row_position+','

    with open(input_file_name, 'rb') as csvfile:

        # get number of columns
        for line in csvfile.readlines():
            array = line.split(',')
            first_item = array[0]

        num_columns = len(array)
        csvfile.seek(0)

        reader = csv.reader(csvfile, delimiter=',')
        #Columns to include Date and desired data
        included_cols = [0, 3]
        count =0

        #Test value for example purposes
        per_second=12

        for row in reader:          
            #Create header
            if count==1:
                all_per_second_file.write(row_position+','+event_name+"\n")

            #Intialise date column with first set of data
            #first entry rate must be 0
            if count ==2:
                if event_name == 'first_set_of_data':
                    all_per_second_file.write(row_position+row[0]+",0\n")
                else:
                    all_per_second_file.write(row_position+",0\n")

            #If data after the first row =0 value should reset so data/sec should be 0, not a minus number
            if count>2 and row[3]=='0':
                if event_name == 'first_set_of_data':
                    all_per_second_file.write(row_position+row[0]+",0\n")
                else:
                    all_per_second_file.write(row_position+",0\n")

            #Otherwise calculate rate
            elif count >=3:
                if event_name == 'first_set_of_data':
                    all_per_second_file.write(row_position+row[0]+","+str("%.1f" % per_second)+"\n")
                else:
                    all_per_second_file.write(row_position+","+str("%.1f" % per_second)+"\n")

            count = count+1

    all_per_second_file.close()

代码更新:

我已将脚本更改为以下内容,该脚本似乎可以正常工作:

def all_data(input_file_name):
    a = pd.read_csv(per_second_address+input_file_name[0])
    b = pd.read_csv(per_second_address+input_file_name[1])
    c = pd.read_csv(per_second_address+input_file_name[2])
    d = pd.read_csv(per_second_address+input_file_name[3])

    b = b.dropna(axis=1)
    c = c.dropna(axis=1)
    d = d.dropna(axis=1)

    merged = a.merge(b, on='Date')
    merged = merged.merge(c, on='Date')
    merged = merged.merge(d, on='Date')    

    merged.to_csv(per_second_address+"all_event_per_second.csv", index=False)

最佳答案

CSV 文件读/写操作是基于行的。

<小时/>

请检查以下代码以及Python可用的基本模块:

process.csv 包含:

time,process/sec
8/3/2016 11:19,0
8/3/2016 11:34,0
8/3/2016 11:49,1
8/3/2016 12:03,1
8/3/2016 12:13,0
8/3/2016 12:23,0

files.csv 包含:

time,files/sec
8/3/2016 11:19,0
8/3/2016 11:34,2
8/3/2016 11:49,3
8/3/2016 12:03,4
8/3/2016 12:13,1
8/3/2016 12:23,0

Python 代码将创建“combine.csv”:

import csv

#Read both files
with open('process.csv', 'rb') as a:
    reader = csv.reader(a,delimiter = ",")
    process_csv = list(reader)

with open('files.csv', 'rb') as b:
    reader = csv.reader(b,delimiter = ",")
    data_csv = list(reader)

#Write into combine.csv
if len(process_csv) == len(data_csv):
    with open('combine.csv', 'ab') as f:
        writer = csv.writer(f,delimiter = ",")
        for i in range(0,len(process_csv)):
            temp_list = []
            temp_list.extend(process_csv[i])
            temp_list.append(data_csv[i][1])
            writer.writerow(temp_list)

combine.csv 有:

time,process/sec,files/sec
8/3/2016 11:19,0,0
8/3/2016 11:34,0,2
8/3/2016 11:49,1,3
8/3/2016 12:03,1,4
8/3/2016 12:13,0,1
8/3/2016 12:23,0,0
<小时/>

使用 pandas 模块编写代码。

import pandas as pd

a = pd.read_csv("process.csv")
b = pd.read_csv("files.csv")
b = b.dropna(axis=1)
merged = a.merge(b, on='time')
merged.to_csv("combine2.csv", index=False)

有关 pandas 模块的更多信息,click here !!!

关于Python:将不同 csv 文件中的列追加到 CSV,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/37294600/

相关文章:

python - 在 csv 文件中查找每个月的最高温度?

python - 从 Dataframe Cell 获取单个值适用于 .loc,但不适用于 .at

python - 使用pandas包在python中合并多个excel文件的数据

python - 如何使用 url.open() 提取从网站获取的部分数据

python - JSON 使用被调用的值序列化 Django 查询集。我的方法有什么问题吗?

python - 使用 pandas 高效读取大型 CSV 文件而不会崩溃

python - 值错误 : I/O operation on closed file

python - 如何用 Python 替换 CSV 文件中的列?

javascript - 如何将 Javascript 客户端连接到 Python-SocketIO 服务器?

powershell - 如何在Powershell中导出具有arraylist的对象