python - Column manipulation in a csv file [python]

Tags: python python-2.7 python-3.x pandas csv

I have a scenario where I need to extract row values from csv files.

(CSV) test1:

Host, Time Up, Time Down, Time Unreachable, Time Undetermined
server1.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
server2.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000

(CSV) test2:

Host,Service, Time OK, Time Warning, Time Unknown, Time Critical, Time Undetermined
server1.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,server_hit_rate,99.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,max_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_log_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_sessions_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
server2.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,server_hit_rate,99.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,max_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_log_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_sessions_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000

This is my code:

import pandas as pd

# skipfooter is only supported by the python parser engine
df = pd.read_csv('test1.csv', skipfooter=1, engine='python')
df2 = pd.read_csv('test2.csv', skipfooter=1, engine='python')
combined = pd.merge(df[['Host',' Time Up']],df2[['Host',' Time OK']], on='Host')
combined[' Time OK'] = combined[' Time OK'].apply(lambda x: x.split('(')[0])
combined[' Time Up'] = combined[' Time Up'].apply(lambda x: x.split('(')[0])

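Here I am trying to get the value of "server_hit_rate", which is 99% and sits on the third line of the data. With the code above, however, I only get the data from the first row of each host, i.e.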

                    Host    Time Up    Time OK
0  server1.test.com:1717  100.000%   100.000% 
1  server2.test.com:1717  100.000%   100.000%

The desired output is:

                    Host    Time Up    Time OK
0  server1.test.com:1717  100.000%    99.000% 
1  server2.test.com:1717  100.000%    99.000% 

Any suggestions on how to achieve this would be helpful.

Edit1:

import datetime
import pandas as pd

def t1():
    today = datetime.datetime.utcnow().strftime("%a %b %d %H:%M:%S %Z %Y")
    # this print form works on both Python 2.7 and Python 3
    print("date : %s" % today)
    df = pd.read_csv('t1.csv', skipfooter=1, engine='python')
    df2 = pd.read_csv('t2.csv', skipfooter=1, engine='python')
    temp = df2.ffill()[df2['Service']=='server_hit_rate']
    combined = pd.merge(df[['Host',' Time Up']],temp[['Host',' Time OK']], on='Host')
    combined[' Time OK'] = combined[' Time OK'].apply(lambda x: x.split('(')[0])
    combined[' Time Up'] = combined[' Time Up'].apply(lambda x: x.split('(')[0])
    combined.to_csv('test.csv',index=False)

t1()


Output:

Wed Nov 15 10:07:01  2017
Empty DataFrame
Columns: [Host, % Time Up, % Time OK]
Index: []

Best answer

This is fairly simple if you forward-fill the Host column, select the rows whose Service is server_hit_rate, and then merge the data, i.e.

temp = df2.ffill()[df2['Service']=='server_hit_rate']

#                 Host          Service             Time OK      ...
#1  server1.test.com:1717  server_hit_rate  99.000% (100.000%)   ...
#6  server2.test.com:1717  server_hit_rate  99.000% (100.000%)   ...

combined = pd.merge(df[['Host',' Time Up']],temp[['Host',' Time OK']], on='Host')
combined[' Time OK'] = combined[' Time OK'].apply(lambda x: x.split('(')[0])
combined[' Time Up'] = combined[' Time Up'].apply(lambda x: x.split('(')[0])

Output of the combined dataframe:

print(combined)

                  Host    Time Up   Time OK
0  server1.test.com:1717  100.000%   99.000% 
1  server2.test.com:1717  100.000%   99.000% 
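
For anyone who wants to try the steps above without the original files, here is a minimal self-contained sketch. It assumes the CSV text from the question is inlined through io.StringIO (test2 is shortened to two services per host), and the names TEST1, TEST2 and hit_rate are made up for illustration. It also strips the column names up front, so the columns are 'Time Up' and 'Time OK' rather than ' Time Up' and ' Time OK'.

```python
import io
import pandas as pd

# Inline copies of the sample data from the question (test2 shortened).
TEST1 = """Host, Time Up, Time Down, Time Unreachable, Time Undetermined
server1.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
server2.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000"""

TEST2 = """Host,Service, Time OK, Time Warning, Time Unknown, Time Critical, Time Undetermined
server1.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,server_hit_rate,99.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
server2.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,server_hit_rate,99.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000"""

# skipfooter drops the trailing "Average" line; it needs the python engine.
df = pd.read_csv(io.StringIO(TEST1), skipfooter=1, engine="python")
df2 = pd.read_csv(io.StringIO(TEST2), skipfooter=1, engine="python")

# Remove the leading spaces from the header names.
df.columns = df.columns.str.strip()
df2.columns = df2.columns.str.strip()

# Rows that start with a comma have no Host; copy it down from the row above.
df2["Host"] = df2["Host"].ffill()

# Keep only the server_hit_rate rows, then merge on Host.
hit_rate = df2[df2["Service"] == "server_hit_rate"]
combined = pd.merge(df[["Host", "Time Up"]], hit_rate[["Host", "Time OK"]], on="Host")

# Keep the part before the parenthesis, e.g. "99.000% (100.000%)" -> "99.000%".
for col in ["Time Up", "Time OK"]:
    combined[col] = combined[col].map(lambda s: s.split("(")[0].strip())

print(combined)
```

Forward-filling only the Host column, rather than the whole frame, leaves genuinely missing values in other columns untouched.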

Also, instead of writing the leading space in every column name, strip the spaces from the headers using

df.columns = df.columns.str.strip()
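
As a small sketch of the header cleanup, the snippet below compares stripping the names after loading with the skipinitialspace option of read_csv, which is an alternative not mentioned in the answer. The one-row sample in raw is abbreviated from the question's data and the variable names are illustrative only.

```python
import io
import pandas as pd

# Abbreviated sample: the header names carry a leading space after each comma.
raw = (
    "Host, Time Up, Time Down\n"
    "server1.test.com:1717,100.000% (100.000%),0.000% (0.000%)"
)

# Option 1: load as-is, then strip whitespace from the column names.
stripped = pd.read_csv(io.StringIO(raw))
stripped.columns = stripped.columns.str.strip()

# Option 2: let the parser skip spaces that directly follow a delimiter.
at_read = pd.read_csv(io.StringIO(raw), skipinitialspace=True)

print(list(stripped.columns))
print(list(at_read.columns))
```

Either way the columns can then be selected as 'Time Up' and 'Time OK' without the leading space.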

Regarding python - Column manipulation in a csv file [python], a similar question was found on Stack Overflow: https://stackoverflow.com/questions/47213771/
