Python pandas data cleaning

Tags: python pandas

I am new to Python pandas and am struggling to implement the data cleaning described below. Please help.

My actual data (CSV file link: https://s3.amazonaws.com/rajaampledata/data.csv ):

Date,Description,Description,Ref. No,Amount,Balance
30/08/2012,TFR-TFR:0000000101-,,,"1,952.50-","4,000.000"
"",Kumar - S/O To:,,,,
"",600010013441,,,,
30/08/2012,FDR-,,,10.50-,"5,114,897.40"
"",AU;541411;301218;RAJA,,,,
"",J;RTGS-AUTO-,,,,
"",TRANSAC,,,,
26/08/2012,DEP-IN162071/D61519,,,"1,000.83","6,100,098.32"
26/08/2012,WDL-IN B CM 20120826,,,180.32-,"789,126.31"
25/08/2012,103-,,,"1,000,000.00","3,225,700.00"
"",IN;112138;100318;BANK,,,,
"",ACC;,,,,

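I want to get the data in the following form, with each continuation row (empty Date) merged into the Description of the dated row above it: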

30/08/2012,TFR-TFR:0000000101-Kumar - S/O To:600010013441,,,"1,952.50-","4,000.000"
30/08/2012,FDR-AU;541411;301218;RAJAJ;RTGS-AUTO-TRANSAC,,,10.50-,"5,114,897.40"
26/08/2012,DEP-IN162071/D61519,,,"1,000.83","6,100,098.32"
26/08/2012,WDL-IN B CM 20120826,,,180.32-,"789,126.31"
25/08/2012,103-IN;112138;100318;BANKACC;,,,"1,000,000.00","3,225,700.00"

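Best answer

If the first field (Date) of the current row is empty, append that row's description to the previous row. Once all rows are merged, join the fields of each row into a single string with a comma separator.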

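import csv

# newline='' is what the csv module recommends when opening a file for reading
with open('data.csv', newline='') as f:
    reader = csv.reader(f)
    headers = next(reader)  # the first line holds the column names
    lines = []
    for r in reader:
        if r[0] == '':
            # empty Date: continuation row, so append its description
            # to the most recent dated row
            lines[-1][1] = lines[-1][1] + r[1]
        else:
            lines.append(r)

# join each merged row back into one comma-separated string
lines = [','.join(i) for i in lines]

print(lines)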
>>['30/08/2012,TFR-TFR:0000000101-Kumar - S/O To:6.0001E+11,,,1,952.50-,4,000.00',
 '30/08/2012,FDR-AU;541411;301218;RAJAJ;RTGS-AUTO-TRANSAC,,,10.50-,5,114,897.40',
 '26/08/2012,DEP-IN162071/D61519,,,1,000.83,6,100,098.32',
 '26/08/2012,WDL-IN B CM 20120826,,,180.32-,789,126.31',
 '25/08/2012,103-IN;112138;100318;BANKACC;,,,1,000,000.00,3,225,700.00']
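One thing to watch in the output above: joining with ',' drops the quotes around amounts that contain thousands separators, so a value such as 1,952.50- can no longer be told apart from two fields. A minimal sketch of one way around that, assuming you want CSV text back, is to let csv.writer do the quoting. The helper names merge_rows and to_csv_text and the SAMPLE string are made up for illustration and are not part of the original answer.

```python
import csv
import io

# A shortened copy of the question's data, used only for illustration.
SAMPLE = '''Date,Description,Description,Ref. No,Amount,Balance
30/08/2012,TFR-TFR:0000000101-,,,"1,952.50-","4,000.000"
"",Kumar - S/O To:,,,,
"",600010013441,,,,
26/08/2012,DEP-IN162071/D61519,,,"1,000.83","6,100,098.32"
'''


def merge_rows(rows):
    """Fold rows with an empty first field into the previous row's description."""
    merged = []
    for row in rows:
        if not row:
            continue  # skip completely blank lines
        if row[0] == "" and len(row) > 1 and merged:
            merged[-1][1] += row[1]
        else:
            merged.append(list(row))
    return merged


def to_csv_text(rows):
    """Serialize rows with csv.writer so fields containing commas stay quoted."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


reader = csv.reader(io.StringIO(SAMPLE))
headers = next(reader)
cleaned = to_csv_text(merge_rows(reader))
print(cleaned)
```

With the default quoting mode, csv.writer only quotes fields that need it, so the amounts keep their quotes while the other fields stay bare.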

If you want the header row, it is the first line of the CSV; the code above reads it into headers.
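Since the question asks about pandas, the same merge can also be sketched with a DataFrame. This is only an illustration under a few assumptions: the function name merge_continuation_rows and the SAMPLE string are invented here, every column is read as text, and any continuation rows that appear before the first dated row are dropped.

```python
import io

import pandas as pd

# A shortened copy of the question's data, used only for illustration.
SAMPLE = '''Date,Description,Description,Ref. No,Amount,Balance
30/08/2012,TFR-TFR:0000000101-,,,"1,952.50-","4,000.000"
"",Kumar - S/O To:,,,,
"",600010013441,,,,
26/08/2012,DEP-IN162071/D61519,,,"1,000.83","6,100,098.32"
'''


def merge_continuation_rows(source):
    # dtype=str keeps long digit strings (e.g. 600010013441) as text;
    # keep_default_na=False keeps empty fields as '' instead of NaN.
    # pandas renames the duplicated header to "Description.1".
    df = pd.read_csv(source, dtype=str, keep_default_na=False)

    # A non-empty Date starts a new record; the running count of such
    # rows gives every row the id of the record it belongs to.
    is_start = df["Date"] != ""
    record_id = is_start.cumsum()

    # Concatenate the description fragments of each record.
    joined = df.groupby(record_id)["Description"].agg("".join)

    # Keep only the dated rows and swap in the joined description.
    merged = df[is_start].copy()
    merged["Description"] = record_id[is_start].map(joined)
    return merged.reset_index(drop=True)


result = merge_continuation_rows(io.StringIO(SAMPLE))
print(result)
```

To run this on the real file, pass the path 'data.csv' instead of the StringIO object. The amounts are left as strings here; converting values with a trailing minus sign to numbers would be a separate step.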

A similar question about Python pandas data cleaning can be found on Stack Overflow: https://stackoverflow.com/questions/53014674/
