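pandas - Averaging txt files using pandas DataFrames

I have a bunch of txt files that all share exactly the same structure. Each txt file contains m rows and n columns of data. I want to average every entry across the files and report the result as a final df.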

txt1

Hour | X1 | X2 | X3 | X4
 0   | 15 | 13 | 25 | 37  
 1   | 26 | 52 | 21 | 45 
 2   | 18 | 45 | 45 | 25 
 3   | 65 | 38 | 98 | 14

txt2

Hour | X1 | X2 | X3 | X4
 0   | 10 | 13 | 45 | 37  
 1   | 20 | 53 | 31 | 45 
 2   | 13 | 43 | 45 | 25 
 3   | 65 | 32 | 38 | 14

txt3

Hour | X1 | X2 | X3 | X4
 0   | 11 | 13 | 25 | 37  
 1   | 21 | 52 | 21 | 45 
 2   | 18 | 41 | 45 | 25 
 3   | 65 | 31 | 98 | 14

Final DataFrame

Hour | X1 | X2 | X3 | X4
 0   | (15+10+11)/3 | .. | 37  
 1   | (26+20+21)/3 | .. | 45 
 2   | (18+13+18)/3 | .. | 25 
 3   | (65+65+65)/3 | .. | 14

What is an efficient way to do this?

Best answer

The code below iterates over a folder and appends all of the text files into a single DataFrame.

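import os
import glob
import pandas as pd

os.chdir('C:\\path_to_folder_for_text_files\\')
file_list = glob.glob('*.txt')

frames = []
for file in file_list:
    # skipinitialspace drops the padding that follows each '|'
    df = pd.read_csv(file, sep='|', skipinitialspace=True)
    # Strip padding from the header names so the column is 'Hour', not 'Hour '
    df.columns = df.columns.str.strip()
    # df = any other per-file operations, if required
    frames.append(df)

# pd.concat already returns a DataFrame, so no extra pd.DataFrame(...) call is needed
df = pd.concat(frames, ignore_index=True)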

Once you have the appended data, group by Hour and take the mean of the remaining columns:

df.groupby('Hour')[df.columns[1:]].mean().reset_index()

   Hour    X1    X2    X3    X4
0     0 12.00 13.00 31.67 37.00
1     1 22.33 52.33 24.33 45.00
2     2 16.33 43.00 45.00 25.00
3     3 65.00 33.67 78.00 14.00
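
To check the approach without touching the file system, here is a minimal sketch that feeds the three sample tables from the question through `io.StringIO` instead of reading them from disk. The names `SAMPLES` and `read_table` are made up for this illustration. It also shows an alternative that is not part of the answer above: because the files have identical structure, the frames can be summed element-wise after indexing on Hour and then divided by the number of files.

```python
import io

import pandas as pd

# The three sample tables from the question, held in memory (hypothetical stand-ins for the txt files).
SAMPLES = [
    "Hour | X1 | X2 | X3 | X4\n 0 | 15 | 13 | 25 | 37\n 1 | 26 | 52 | 21 | 45\n"
    " 2 | 18 | 45 | 45 | 25\n 3 | 65 | 38 | 98 | 14\n",
    "Hour | X1 | X2 | X3 | X4\n 0 | 10 | 13 | 45 | 37\n 1 | 20 | 53 | 31 | 45\n"
    " 2 | 13 | 43 | 45 | 25\n 3 | 65 | 32 | 38 | 14\n",
    "Hour | X1 | X2 | X3 | X4\n 0 | 11 | 13 | 25 | 37\n 1 | 21 | 52 | 21 | 45\n"
    " 2 | 18 | 41 | 45 | 25\n 3 | 65 | 31 | 98 | 14\n",
]


def read_table(text):
    """Parse one pipe-delimited table and strip padding from the header names."""
    df = pd.read_csv(io.StringIO(text), sep="|", skipinitialspace=True)
    df.columns = df.columns.str.strip()
    return df


frames = [read_table(text) for text in SAMPLES]

# Approach from the answer: stack all rows, then average per Hour.
by_group = pd.concat(frames, ignore_index=True).groupby("Hour").mean()

# Alternative: align on Hour and average the frames element-wise.
elementwise = sum(f.set_index("Hour") for f in frames) / len(frames)

print(by_group.round(2))
```

Both results are indexed by Hour. The groupby version tolerates files whose rows are in a different order or that are missing an hour, while the element-wise version assumes every file has the same set of Hour values.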

Regarding pandas - averaging txt files using pandas DataFrames, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/54384330/
