Python: Loop over multiple files and write new files

Tags: python csv pandas

I have the following code, which takes the file "University2.csv" and writes the new CSV files "Hours.csv", "Hours - Stacked.csv", and "Days.csv".

Now I would like the code to loop over multiple files (University3.csv, University4.csv, etc.) and, for each file, produce "Hours3.csv", "Hours - Stacked3.csv", "Days3.csv", "Hours4.csv", and so on.

Here is the code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


#Importing the csv file into df
df = pd.read_csv('university2.csv', sep=";", skiprows=1)

#Changing datetime
df['YYYY-MO-DD HH-MI-SS_SSS'] = pd.to_datetime(df['YYYY-MO-DD HH-MI-SS_SSS'], 
                                               format='%Y-%m-%d %H:%M:%S:%f')

#Set index from column
df = df.set_index('YYYY-MO-DD HH-MI-SS_SSS')

#Add Magnetic Magnitude Column
df['magnetic_mag'] = np.sqrt(df['MAGNETIC FIELD X (μT)']**2 + df['MAGNETIC FIELD Y (μT)']**2 + df['MAGNETIC FIELD Z (μT)']**2)

#Copy interesting values
df2 = df[[ 'ATMOSPHERIC PRESSURE (hPa)',
          'TEMPERATURE (C)', 'magnetic_mag']].copy()

#Hourly Average and Standard Deviation for interesting values 
df3 = df2.resample('H').agg(['mean','std'])
df3.columns = [' '.join(col) for col in df3.columns]   

#Daily Average and Standard Deviation for interesting values 
df4 = df2.resample('D').agg(['mean','std'])
df4.columns = [' '.join(col) for col in df4.columns] 

#Write to new csv
df3.to_csv('Hours.csv', index=True)  
df4.to_csv('Days.csv', index=True)    

#New csv with stacked hour averages
df5 = pd.read_csv('Hours.csv')
df5['YYYY-MO-DD HH-MI-SS_SSS'] = pd.to_datetime(df5['YYYY-MO-DD HH-MI-SS_SSS'])  
hour = pd.to_timedelta(df5['YYYY-MO-DD HH-MI-SS_SSS'].dt.hour, unit='H')
df6 = df5.groupby(hour).mean()
df6.to_csv('Hours - stacked.csv', index=True)

Can anyone help?

Thanks!

Accepted answer

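I think you can use a loop over a list of files. I extract the number from each file name into i and then append it to the output file names.

Also, you can get df5 from df3 with reset_index, so there is no need to call read_csv again.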
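To illustrate the reset_index point in isolation, here is a minimal sketch on synthetic data. The column names `timestamp` and `temperature` and the generated values are made up for the example, and it groups on the integer hour from `.dt.hour` directly rather than wrapping it in `pd.to_timedelta` as the answer does, so treat it as a variation, not the answer's exact code.

```python
import numpy as np
import pandas as pd

# Synthetic two-day series at 30-minute steps (made-up data, not the asker's file)
idx = pd.date_range("2016-04-01", periods=96, freq="30min", name="timestamp")
df2 = pd.DataFrame({"temperature": np.arange(96, dtype=float)}, index=idx)

# Hourly mean and standard deviation, with flattened column names
df3 = df2.resample("h").agg(["mean", "std"])
df3.columns = [" ".join(col) for col in df3.columns]

# reset_index turns the DatetimeIndex back into an ordinary column,
# which is what re-reading 'Hours.csv' would have produced
df5 = df3.reset_index()

# Group by hour of day and average the numeric columns across days
hour_of_day = df5["timestamp"].dt.hour
stacked = df5.drop(columns="timestamp").groupby(hour_of_day).mean()

print(stacked.head())
```

The result has one row per hour of the day (24 rows), each averaging that hour across all days in the input. The full loop over the files follows.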

import numpy as np
import pandas as pd

files = ['university1.csv','university2.csv','university3.csv']

for f in files:
    # Single character just before '.csv' (works for university1 to university9)
    i = f[-5]
    print(i)

    #Importing the csv file into df
    df = pd.read_csv(f, sep=";", skiprows=1)

    #Changing datetime
    df['YYYY-MO-DD HH-MI-SS_SSS'] = pd.to_datetime(df['YYYY-MO-DD HH-MI-SS_SSS'], 
                                                   format='%Y-%m-%d %H:%M:%S:%f')

    #Set index from column
    df = df.set_index('YYYY-MO-DD HH-MI-SS_SSS')

    #Add Magnetic Magnitude Column
    df['magnetic_mag'] = np.sqrt(df['MAGNETIC FIELD X (μT)']**2 + df['MAGNETIC FIELD Y (μT)']**2 + df['MAGNETIC FIELD Z (μT)']**2)

    #Copy interesting values
    df2 = df[[ 'ATMOSPHERIC PRESSURE (hPa)',
              'TEMPERATURE (C)', 'magnetic_mag']].copy()

    #Hourly Average and Standard Deviation for interesting values 
    df3 = df2.resample('H').agg(['mean','std'])
    df3.columns = [' '.join(col) for col in df3.columns]   

    #Daily Average and Standard Deviation for interesting values 
    df4 = df2.resample('D').agg(['mean','std'])
    df4.columns = [' '.join(col) for col in df4.columns] 

    #Write to new csv
    df3.to_csv('Hours'+ i + '.csv')  
    df4.to_csv('Days' + i + '.csv')

    #New csv with stacked hour averages
    #df5 = pd.read_csv('Hours.csv')
    #df5['YYYY-MO-DD HH-MI-SS_SSS'] = pd.to_datetime(df5['YYYY-MO-DD HH-MI-SS_SSS'])  
    df5 = df3.reset_index()
    hour = pd.to_timedelta(df5['YYYY-MO-DD HH-MI-SS_SSS'].dt.hour, unit='H')
    df6 = df5.groupby(hour).mean()
    df6.to_csv('Hours - stacked'+ i + '.csv')
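Note that `f[-5]` picks up only the single character before `.csv`, so it would return `0` for `university10.csv`. A possible way around this, sketched with the standard `re` module, is to capture all trailing digits. The helper names `file_number` and `output_names` are hypothetical, and the output names simply follow the pattern asked for in the question.

```python
import re


def file_number(filename):
    """Return the digits just before '.csv', e.g. 'university12.csv' -> '12'."""
    match = re.search(r"(\d+)\.csv$", filename)
    if match is None:
        raise ValueError("no number found in " + repr(filename))
    return match.group(1)


def output_names(filename):
    """Build the three output file names for one input file."""
    i = file_number(filename)
    return ["Hours" + i + ".csv", "Hours - stacked" + i + ".csv", "Days" + i + ".csv"]


for f in ["university2.csv", "university10.csv"]:
    print(f, "->", output_names(f))
```

Inside the loop, `i = file_number(f)` could then replace `i = f[-5]` without changing anything else.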

Edit:

Yaron's solution is more general. I use it here and only change the list 2, 3, 4 to range():

import numpy as np
import pandas as pd

#files = ['university1.csv','university2.csv','university3.csv']
for i in range(1,4):
    print(i)
    print('university' + str(i) + '.csv')

    #Importing the csv file into df
    df = pd.read_csv('university'+ str(i) + '.csv', sep=";", skiprows=1)

    #Changing datetime
    df['YYYY-MO-DD HH-MI-SS_SSS'] = pd.to_datetime(df['YYYY-MO-DD HH-MI-SS_SSS'], 
                                                   format='%Y-%m-%d %H:%M:%S:%f')

    #Set index from column
    df = df.set_index('YYYY-MO-DD HH-MI-SS_SSS')

    #Add Magnetic Magnitude Column
    df['magnetic_mag'] = np.sqrt(df['MAGNETIC FIELD X (μT)']**2 + df['MAGNETIC FIELD Y (μT)']**2 + df['MAGNETIC FIELD Z (μT)']**2)

    #Copy interesting values
    df2 = df[[ 'ATMOSPHERIC PRESSURE (hPa)',
              'TEMPERATURE (C)', 'magnetic_mag']].copy()

    #Hourly Average and Standard Deviation for interesting values 
    df3 = df2.resample('H').agg(['mean','std'])
    df3.columns = [' '.join(col) for col in df3.columns]   

    #Daily Average and Standard Deviation for interesting values 
    df4 = df2.resample('D').agg(['mean','std'])
    df4.columns = [' '.join(col) for col in df4.columns] 

    #Write to new csv
    df3.to_csv('Hours'+ str(i) + '.csv')  
    df4.to_csv('Days' + str(i) + '.csv')

    #New df3 with stacked hour averages
    df5 = df3.reset_index()
    hour = pd.to_timedelta(df5['YYYY-MO-DD HH-MI-SS_SSS'].dt.hour, unit='H')
    df6 = df5.groupby(hour).mean()
    df6.to_csv('Hours - stacked'+ str(i) + '.csv')
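With `range(1, 4)` the file numbers are still hard-coded, and a missing file in the sequence would make `read_csv` fail. If that is a concern, one option is to discover the input files on disk instead. This is only a sketch using the standard `pathlib` and `re` modules. The helper name `numbered_inputs` is hypothetical, and it assumes the files sit in one folder and are named `university<N>.csv`.

```python
from pathlib import Path
import re


def numbered_inputs(folder="."):
    """Return (number, path) pairs for files named university<N>.csv, sorted by N."""
    pattern = re.compile(r"university(\d+)\.csv$", re.IGNORECASE)
    found = []
    for path in Path(folder).iterdir():
        match = pattern.match(path.name)
        if match and path.is_file():
            found.append((int(match.group(1)), path))
    # Sorting on the integer puts university10.csv after university2.csv
    return sorted(found)


for i, path in numbered_inputs():
    print(i, path.name)
```

The body of the loop above could then run once per `(i, path)` pair, reading from `path` and writing to names built from `str(i)`.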

Regarding "Python: Loop over multiple files and write new files", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/36527370/
