I have a scenario where I need to extract row values from CSV files.
(CSV) test1:
Host, Time Up, Time Down, Time Unreachable, Time Undetermined
server1.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
server2.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
(CSV) test2:
Host,Service, Time OK, Time Warning, Time Unknown, Time Critical, Time Undetermined
server1.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,server_hit_rate,99.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,max_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_log_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_sessions_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
server2.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,server_hit_rate,99.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,max_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_log_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_sessions_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
This is my code:
df = pd.read_csv('test1.csv',skipfooter=1)
df2 = pd.read_csv('test2.csv',skipfooter=1)
combined = pd.merge(df[['Host',' Time Up']],df2[['Host',' Time OK']], on='Host')
combined[' Time OK'] = combined[' Time OK'].apply(lambda x: x.split('(')[0])
combined[' Time Up'] = combined[' Time Up'].apply(lambda x: x.split('(')[0])
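Here I am trying to get the value of "server_hit_rate", which is 99% and is on the third row of the file. But with the code above I only get the data from the first row for each host, i.e.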
Host Time Up Time OK
0 server1.test.com:1717 100.000% 100.000%
1 server2.test.com:1717 100.000% 100.000%
The desired output is:
Host Time Up Time OK
0 server1.test.com:1717 100.000% 99.000%
1 server2.test.com:1717 100.000% 99.000%
Any suggestions for achieving this would be helpful.
Edit1:
import os, shutil, glob
import sys
import datetime
import time
import pandas as pd

def t1():
    today = datetime.datetime.utcnow().strftime("%a %b %d %H:%M:%S %Z %Y")
    print("date :", today)
    df = pd.read_csv('t1.csv', skipfooter=1, engine='python')
    df2 = pd.read_csv('t2.csv', skipfooter=1, engine='python')
    # Forward-fill the blank Host cells, then keep only the server_hit_rate rows
    temp = df2.ffill()[df2['Service'] == 'server_hit_rate']
    combined = pd.merge(df[['Host', ' Time Up']], temp[['Host', ' Time OK']], on='Host')
    # Keep only the percentage that precedes the parenthesised value
    combined[' Time OK'] = combined[' Time OK'].apply(lambda x: x.split('(')[0])
    combined[' Time Up'] = combined[' Time Up'].apply(lambda x: x.split('(')[0])
    combined.to_csv('test.csv', index=False)

t1()
Output:
Wed Nov 15 10:07:01 2017
Empty DataFrame
Columns: [Host, % Time Up, % Time OK]
Index: []
Best answer
It is fairly simple if you forward-fill Host, select the rows whose Service is server_hit_rate, and then merge the data, i.e.
temp = df2.ffill()[df2['Service']=='server_hit_rate']
# Host Service Time OK ...
#1 server1.test.com:1717 server_hit_rate 99.000% (100.000%) ...
#6 server2.test.com:1717 server_hit_rate 99.000% (100.000%) ...
combined = pd.merge(df[['Host',' Time Up']],temp[['Host',' Time OK']], on='Host')
combined[' Time OK'] = combined[' Time OK'].apply(lambda x: x.split('(')[0])
combined[' Time Up'] = combined[' Time Up'].apply(lambda x: x.split('(')[0])
Output of the combined DataFrame:
print(combined)
Host Time Up Time OK
0 server1.test.com:1717 100.000% 99.000%
1 server2.test.com:1717 100.000% 99.000%
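The forward-fill, filter, and merge steps can be tried end to end with the sketch below. It is only an illustration under a few assumptions: the CSV text is an abbreviated copy of the sample files from the question, loaded from in-memory strings via `io.StringIO` instead of from disk, and the variable names (`host_up`, `services`, `hit_rate`) are made up for this example.

```python
import io
import pandas as pd

# Abbreviated copies of the sample files from the question (assumption:
# the real files have the same layout, including the trailing "Average" row).
TEST1 = "\n".join([
    "Host, Time Up, Time Down, Time Unreachable, Time Undetermined",
    "server1.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000",
    "server2.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000",
    "Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000",
])
TEST2 = "\n".join([
    "Host,Service, Time OK, Time Warning, Time Unknown, Time Critical, Time Undetermined",
    "server1.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000",
    ",server_hit_rate,99.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000",
    ",max_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000",
    "server2.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000",
    ",server_hit_rate,99.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000",
    ",max_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000",
    "Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000",
])

# skipfooter drops the "Average" row; it requires the python engine.
host_up = pd.read_csv(io.StringIO(TEST1), skipfooter=1, engine="python")
services = pd.read_csv(io.StringIO(TEST2), skipfooter=1, engine="python")

# Host is only written on the first service row of each server, so the
# remaining rows have a missing Host. Forward-fill copies it downwards.
services["Host"] = services["Host"].ffill()

# Keep one row per host: the server_hit_rate service.
hit_rate = services[services["Service"] == "server_hit_rate"]

combined = pd.merge(
    host_up[["Host", " Time Up"]],
    hit_rate[["Host", " Time OK"]],
    on="Host",
)

# Keep only the percentage that precedes the parenthesised value.
for col in (" Time Up", " Time OK"):
    combined[col] = combined[col].apply(lambda x: x.split("(")[0].strip())

print(combined)
```

Without the forward-fill, only the first service row of each server carries a Host value, which is why the original merge matched `application_availability_check` rather than `server_hit_rate`.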
Also, instead of typing the leading space in every column name, strip the spaces from the column names using
df.columns = df.columns.str.strip()
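As a small illustration of the effect, the sketch below reads a shortened header and one row taken from the question's test1 sample (loaded from an in-memory string, which is an assumption made for this example) and compares the column names before and after stripping.

```python
import io
import pandas as pd

# One header line and one data row, shortened from the test1 sample.
csv_text = (
    "Host, Time Up, Time Down\n"
    "server1.test.com:1717,100.000% (100.000%),0.000% (0.000%)"
)

df = pd.read_csv(io.StringIO(csv_text))
before = list(df.columns)   # names as read, leading spaces included

df.columns = df.columns.str.strip()
after = list(df.columns)    # names with surrounding whitespace removed

print(before)
print(after)

# The columns can now be selected without the leading space.
print(df["Time Up"])
```

After this step the merge can refer to `'Time Up'` and `'Time OK'` directly, which makes a mistyped column name less likely.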
Regarding python - column operations in CSV [python], a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47213771/