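I am trying to merge multiple files on a key ("r_id") and rename the columns in the output after the file names. Everything works except renaming the columns with the file names. I get the error below, probably because of an old version of Pandas. Does anyone know how to fix this without updating pandas to a newer version?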
Error
Traceback (most recent call last):
  File "multijoin_2.py", line 19, in <module>
    result = merge_files(files).reset_index()
  File "multijoin_2.py", line 11, in merge_files
    pd.read_csv(f, sep='\t', usecols=['r_id', 'exp'])
  File "/users/xxx/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 2007, in __getattr__
    (type(self).__name__, name))
AttributeError: 'DataFrame' object has no attribute 'assign'
Input
$ cat test1
r_id g_id exp
r1 g1 20
r2 g1 30
r3 g1 1
r4 g1 3
$ cat test2
r_id gid exp
r1 g2 20
r2 g2 30
r3 g2 1
r4 g2 3
$ cat test3
r_id g_id exp
r1 g3 30
r2 g3 40
r3 g3 11
r4 g3 32
Desired output
r_id test3 test2 test1
0 r1 30 20 20
1 r2 40 30 30
2 r3 11 1 1
3 r4 32 3 3
Working code (except for the column naming)
import os
import glob
import pandas as pd

files = glob.glob(r'/path/test*')

def merge_files(files, **kwargs):
    dfs = []
    for f in files:
        dfs.append(
            pd.read_csv(f, sep='\t', usecols=['r_id', 'exp'])
            #.assign(col=0)
            .rename(columns={'col_name':os.path.splitext(os.path.basename(f))[0]})
            .set_index(['repeat_id'])
        )
    return pd.concat(dfs, axis=1)

result = merge_files(files).reset_index()
print(result)
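Best answer
You need to use exp as the name of the column being renamed. The rename in the question targets 'col_name', which is not a column in the data, so nothing is renamed. The .assign call is not needed at all, so pandas does not have to be updated, and passing index_col to read_csv replaces the separate set_index call: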
def merge_files(files, **kwargs):
    dfs = []
    for f in files:
        dfs.append(
            pd.read_csv(f, sep='\t', usecols=['r_id', 'exp'], index_col=['r_id'])
            .rename(columns={'exp':os.path.splitext(os.path.basename(f))[0]})
        )
    return pd.concat(dfs, axis=1)

result = merge_files(files).reset_index()
print(result)
r_id test1 test2 test3
0 r1 20 20 30
1 r2 30 30 40
2 r3 1 1 11
3 r4 3 3 32
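
To try the approach without any files on disk, here is a minimal self-contained sketch. It assumes Python 3 and tab-separated input. The SAMPLE_FILES dictionary and the helper names read_exp_column and merge_sample_files are made up for illustration and are not part of the original answer; the sample data mirrors test1 to test3 from the question.

```python
import io
import os

import pandas as pd

# Hypothetical in-memory stand-ins for the tab-separated files from the question.
SAMPLE_FILES = {
    "/path/test1": "r_id\tg_id\texp\nr1\tg1\t20\nr2\tg1\t30\nr3\tg1\t1\nr4\tg1\t3\n",
    "/path/test2": "r_id\tgid\texp\nr1\tg2\t20\nr2\tg2\t30\nr3\tg2\t1\nr4\tg2\t3\n",
    "/path/test3": "r_id\tg_id\texp\nr1\tg3\t30\nr2\tg3\t40\nr3\tg3\t11\nr4\tg3\t32\n",
}


def read_exp_column(name, handle):
    """Read r_id and exp from one source and name the exp column after the file."""
    label = os.path.splitext(os.path.basename(name))[0]
    df = pd.read_csv(handle, sep="\t", usecols=["r_id", "exp"], index_col="r_id")
    # Plain rename: no .assign involved.
    return df.rename(columns={"exp": label})


def merge_sample_files(samples):
    """Align every per-file column on r_id and return r_id as a regular column."""
    dfs = [
        read_exp_column(name, io.StringIO(text))
        for name, text in sorted(samples.items())  # sorted for a stable column order
    ]
    return pd.concat(dfs, axis=1).reset_index()


result = merge_sample_files(SAMPLE_FILES)
print(result)
```

The resulting frame has the columns r_id, test1, test2, test3. If a constant column really were needed on a pandas version without assign, plain item assignment on the frame (df['col'] = 0) does the same job.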
This question about pandas, "How to fix AttributeError: 'DataFrame' object has no attribute 'assign' without updating Pandas?", comes from Stack Overflow: https://stackoverflow.com/questions/44305253/