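Using collections.namedtuple, the following Python code processes records in a database by way of a csv file of identifiers (integers in a column named ContentItemId). An example record is https://api.aucklandmuseum.com/id/library/ephemera/21291 .
Its purpose is to check the HTTP status for each given id and write it to disk: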
import requests
from collections import namedtuple
import csv

with open('in.csv', mode='r') as f:
    reader = csv.reader(f)

    all_records = namedtuple('rec', next(reader))
    records = [all_records._make(row) for row in reader]

    #Create output file
    with open('out.csv', mode='w+') as o:
        w = csv.writer(o)
        w.writerow(["ContentItemId","code"])

        count = 1
        for r in records:
            id = r.ContentItemId
            url = "https://api.aucklandmuseum.com/id/library/ephemera/" + id
            req = requests.get(url, allow_redirects=False)
            code = req.status_code
            w.writerow([id, code])
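How can I print the code's progress to the console from within the latter loop (ideally at the 25%, 50% and 75% marks)? Also, if I add an unindented print("Complete") at the bottom, will that line be reached?
Thanks in advance.
Edit: thanks for all the help. My (working!) code now looks like this: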
import csv
import requests
import pandas
import time
from collections import namedtuple
from tqdm import tqdm

with open('active_true_pub_no.csv', mode='r') as f:
    reader = csv.reader(f)

    all_records = namedtuple('rec', next(reader))
    records = [all_records._make(row) for row in reader]

    with open('out.csv', mode='w+') as o:
        w = csv.writer(o)
        w.writerow(["ContentItemId","code"])

        num = len(records)
        print("Checking {} records...\n".format(num))

        with tqdm(total=num, bar_format="{percentage:3.0f}% {bar} [{n_fmt}/{total_fmt}] ", ncols=64) as pbar:
            for r in records:
                pbar.update(1)
                id = r.ContentItemId
                url = "https://api.aucklandmuseum.com/id/library/ephemera/" + id
                req = requests.get(url, allow_redirects=False)
                code = req.status_code
                w.writerow([id, code])
                # time.sleep(.25)

print ('\nSummary: ')
df = pandas.read_csv("out.csv")
print(df['code'].value_counts())
I used pandas' value_counts to summarize the results at the end.
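If pulling in pandas only for the summary feels heavy, the same tally can be sketched with the standard library. This is a swapped-in alternative, not what the code above does: it uses collections.Counter instead of value_counts, and the in-memory sample below is a hypothetical stand-in for out.csv (the status codes in it are made up for illustration). Note that the codes are counted as strings, because csv yields text.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for out.csv; the status codes are illustrative only.
sample = io.StringIO(
    "ContentItemId,code\n"
    "21200,200\n"
    "21201,404\n"
    "21202,200\n"
    "21203,301\n"
)

def count_codes(fileobj):
    """Tally the 'code' column of a csv file object (values stay strings)."""
    reader = csv.DictReader(fileobj)
    return Counter(row["code"] for row in reader)

counts = count_codes(sample)

# most_common() orders by descending frequency, much like value_counts.
for code, n in counts.most_common():
    print(code, n)
```

With a real file, the call would be wrapped in `with open('out.csv', newline='') as f:` and made only after the writer's file has been closed, so that every row is on disk.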
Best answer
To get a progress bar, use tqdm.
Data (from in.csv):
ContentItemId
21200
21201
21202
21203
21204
21205
21206
...
21296
21297
21298
21299
21300
Code:
from collections import namedtuple
import csv
import requests
from tqdm import tqdm

with open('in.csv', mode='r') as f:
    reader = csv.reader(f)

    all_records = namedtuple('rec', next(reader))
    records = [all_records._make(row) for row in reader]

    #Create output file
    with open('out.csv', mode='w+') as o:
        w = csv.writer(o)
        w.writerow(["ContentItemId","code"])

        count = 1
        with tqdm(total=len(records)) as pbar:
            for r in records:
                pbar.update(1)
                id = r.ContentItemId
                url = "https://api.aucklandmuseum.com/id/library/ephemera/" + id
                req = requests.get(url, allow_redirects=False)
                code = req.status_code
                w.writerow([id, code])

print('Complete!')
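The question also asked for plain messages at the 25%, 50% and 75% marks rather than a bar. A minimal sketch of that, without tqdm, might look like the following. The helper names milestone_counts and run_with_milestones are hypothetical, and the handle callback stands in for the HTTP check so the example runs offline. It also illustrates the second part of the question: an unindented final statement is reached once the loop ends, as long as nothing inside the loop raises an exception.

```python
import math

def milestone_counts(total, fractions=(0.25, 0.50, 0.75)):
    """Map a 1-based item count to the percentage reached at that count.

    ceil() makes each milestone fire on the first item at or past the
    fraction. For very small totals several fractions can land on the
    same count, in which case only the last one is kept.
    """
    return {math.ceil(total * f): int(f * 100) for f in fractions}

def run_with_milestones(records, handle, report=print):
    """Call handle(r) for every record, reporting at each milestone."""
    total = len(records)
    marks = milestone_counts(total)
    for done, r in enumerate(records, start=1):
        handle(r)
        if done in marks:
            report("{}% done ({}/{})".format(marks[done], done, total))
    # Reached after the loop finishes, provided handle() did not raise.
    report("Complete")

# Offline stand-in: 100 ids and a no-op in place of requests.get().
ids = [str(n) for n in range(21200, 21300)]
run_with_milestones(ids, handle=lambda r: None)
```

In the real script, handle would wrap the requests.get call and the writerow call. Since a network error would stop the loop before "Complete" is printed, wrapping the request in try/except is worth considering.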
Source: "python - Tracking progress through a namedtuple csv loop in Python", a similar question found on Stack Overflow: https://stackoverflow.com/questions/57948562/