I'm new to Stack Overflow and have a problem I can't solve. I have a CSV or Excel file (basically a table) and want to do the following in Python 3:
Column Header,"r269_d","r295_A","r295_R","r299_A","r325_D","r326_A"
id1,"0.0","2.29","0.0","1.3","0.0","188.4"
id2,"0.0","1.0","0.0","0.6","0.0","0.0"
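For this CSV file, I want to:
1. Go to the first row (id1).
2. Check column 1 (r269_d).
2.1 If the value of col1 = 0, write 0 to a new result_string.
2.2 If the value of col1 != 0, write 1 to the new result_string.
3. Check column 2 (r295_A).
3.1 If the value of col2 = 0, write 0 to the same result_string mentioned in 2.1.
3.2 If the value of col2 != 0, write 1 to the same result_string mentioned in 2.1.
4. Do this for all columns.
5. Go to the next row and do the same.
In the end I want something like this: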
Column Header,"r269_d","r295_A","r295_R","r299_A","r325_D","r326_A", "result_string"
id1,"0.0","2.29","0.0","1.3","0.0","188.4","010101"
id2,"0.0","1.0","0.0","0.6","0.0","0.0","010100"
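Read literally, the steps above amount to a nested loop over rows and cells. A minimal sketch using only the standard `csv` module follows; the helper name `add_result_string` and the inline sample text are illustrative assumptions, not part of the original question.

```python
import csv
import io

def add_result_string(rows):
    """Append a 0/1 flag string to every data row (the first row is the header)."""
    rows = iter(rows)
    header = next(rows)
    out = [header + ["result_string"]]
    for row in rows:
        # row[0] is the id; every other cell contributes "0" if it equals zero, else "1"
        flags = "".join("0" if float(cell) == 0 else "1" for cell in row[1:])
        out.append(row + [flags])
    return out

# Sample data copied from the question (hypothetical in-memory stand-in for the file)
sample = '''Column Header,"r269_d","r295_A","r295_R","r299_A","r325_D","r326_A"
id1,"0.0","2.29","0.0","1.3","0.0","188.4"
id2,"0.0","1.0","0.0","0.6","0.0","0.0"
'''

result = add_result_string(csv.reader(io.StringIO(sample)))
for row in result:
    print(",".join(row))
```

To work on a real file, the `io.StringIO(sample)` object could be swapped for a file opened with `open(path, newline="")`.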
Best answer
Pandas solution:
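import pandas as pd
import numpy as np

# Read the table; quoted numeric fields such as "0.0" are parsed as floats
df = pd.read_csv(r'/path/to/file.csv')

# Select the r<digits> columns, flag non-zero cells as 1, and join each row's flags
df['result_string'] = (df.filter(regex=r'r\d+')
                         .ne(0).astype(np.int8).astype(str)
                         .apply(''.join, axis=1))

# Write the table back out without the index column
df.to_csv(r'/path/to/result.csv', index=False)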
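The chained expression may be easier to follow when split into its individual steps. The sketch below does that on an in-memory copy of the sample data; the intermediate variable names are illustrative assumptions, and `astype(int)` stands in for `np.int8` only to avoid the extra import.

```python
import io
import pandas as pd

csv_text = """col,r269_d,r295_A,r295_R,r299_A,r325_D,r326_A
id1,0.0,2.29,0.0,1.3,0.0,188.4
id2,0.0,1.0,0.0,0.6,0.0,0.0
"""
df = pd.read_csv(io.StringIO(csv_text))

values = df.filter(regex=r'r\d+')        # keep columns whose name contains r<digits>
nonzero = values.ne(0)                   # boolean frame: True where the cell is not 0
bits = nonzero.astype(int)               # True/False -> 1/0
chars = bits.astype(str)                 # 1/0 -> '1'/'0'
result = chars.apply(''.join, axis=1)    # concatenate each row's characters

print(result.tolist())
```

Note that the id column `col` is excluded because its name does not match the pattern, so only the value columns contribute to the string.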
Source CSV file:
col,r269_d,r295_A,r295_R,r299_A,r325_D,r326_A
id1,0.0,2.29,0.0,1.3,0.0,188.4
id2,0.0,1.0,0.0,0.6,0.0,0.0
Parsed DataFrame:
In [169]: df
Out[169]:
   col  r269_d  r295_A  r295_R  r299_A  r325_D  r326_A
0  id1     0.0    2.29     0.0     1.3     0.0   188.4
1  id2     0.0    1.00     0.0     0.6     0.0     0.0
Result:
In [170]: df['result_string'] = (df.filter(regex=r'r\d+')
     ...:                          .ne(0).astype(np.int8).astype(str)
     ...:                          .apply(''.join, axis=1))
     ...:
In [171]: df
Out[171]:
   col  r269_d  r295_A  r295_R  r299_A  r325_D  r326_A result_string
0  id1     0.0    2.29     0.0     1.3     0.0   188.4        010101
1  id2     0.0    1.00     0.0     0.6     0.0     0.0        010100
In [172]: df.to_csv(r'c:/temp/result.csv', index=False)
Resulting CSV:
col,r269_d,r295_A,r295_R,r299_A,r325_D,r326_A,result_string
id1,0.0,2.29,0.0,1.3,0.0,188.4,010101
id2,0.0,1.0,0.0,0.6,0.0,0.0,010100
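One thing to watch when loading this result file again: the values in `result_string` are unquoted and consist only of digits, so `pd.read_csv` will infer an integer column and the leading zeros are lost. A small sketch of the effect and one possible workaround, using an in-memory copy of the resulting CSV (the variable names are illustrative assumptions):

```python
import io
import pandas as pd

csv_text = """col,r269_d,r295_A,r295_R,r299_A,r325_D,r326_A,result_string
id1,0.0,2.29,0.0,1.3,0.0,188.4,010101
id2,0.0,1.0,0.0,0.6,0.0,0.0,010100
"""

# Default type inference turns the flag strings into integers
as_numbers = pd.read_csv(io.StringIO(csv_text))

# Forcing the column to str keeps the flags exactly as written
as_text = pd.read_csv(io.StringIO(csv_text), dtype={'result_string': str})

print(as_numbers['result_string'].tolist())
print(as_text['result_string'].tolist())
```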
Regarding "python - iterating over rows and columns of a csv/xlsx file in Python 3", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/43778939/