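python - Tracking progress through a namedtuple CSV loop in Python

Tags: python loops count percentage

Using collections.namedtuple, the following Python code works through a CSV file of identifiers (integers in a column named ContentItemId) to process records in a database. An example record is https://api.aucklandmuseum.com/id/library/ephemera/21291 .

Its purpose is to check the HTTP status for a given id and write it to disk: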

import requests
from collections import namedtuple
import csv

with open('in.csv', mode='r') as f:
    reader = csv.reader(f)

    all_records = namedtuple('rec', next(reader))
    records = [all_records._make(row) for row in reader]

    #Create output file
    with open('out.csv', mode='w+') as o:
        w = csv.writer(o)
        w.writerow(["ContentItemId","code"])

        count = 1
        for r in records:
            id   = r.ContentItemId
            url  = "https://api.aucklandmuseum.com/id/library/ephemera/" + id
            req  = requests.get(url, allow_redirects=False)
            code = req.status_code
            w.writerow([id, code])
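
To illustrate how the header row becomes the namedtuple's field names, here is a minimal sketch that uses an in-memory string in place of in.csv. The sample ids and the name Rec are made up for illustration. This pattern assumes every header name is a valid Python identifier; namedtuple raises ValueError otherwise.

```python
import csv
import io
from collections import namedtuple

# Hypothetical in-memory stand-in for in.csv (sample ids are invented)
sample = io.StringIO("ContentItemId\n21291\n21292\n")

reader = csv.reader(sample)
Rec = namedtuple("rec", next(reader))          # header row -> field names
records = [Rec._make(row) for row in reader]   # one namedtuple per data row

# csv.reader yields strings, so ContentItemId is a str, not an int
print(records[0].ContentItemId)
print(len(records))
```

Because the values are strings, concatenating the id onto the URL works without a str() conversion.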

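How can I print the code's progress (ideally at the 25%, 50% and 75% points) to the console from within the latter loop? Also, if I add an unindented print("Complete") at the bottom, will that line be reached?

Thanks in advance.


Edit: Thanks for all the help. My (working!) code now looks like this: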

import csv
import requests
import pandas
import time
from collections import namedtuple
from tqdm import tqdm

with open('active_true_pub_no.csv', mode='r') as f:
    reader = csv.reader(f)

    all_records = namedtuple('rec', next(reader))
    records = [all_records._make(row) for row in reader]

    with open('out.csv', mode='w+') as o:
        w = csv.writer(o)
        w.writerow(["ContentItemId","code"])

        num = len(records)
        print("Checking {} records...\n".format(num))

        with tqdm(total=num, bar_format="{percentage:3.0f}% {bar} [{n_fmt}/{total_fmt}]  ", ncols=64) as pbar:
            for r in records:
                pbar.update(1)
                id   = r.ContentItemId
                url  = "https://api.aucklandmuseum.com/id/library/ephemera/" + id
                req  = requests.get(url, allow_redirects=False)
                code = req.status_code
                w.writerow([id, code])
                # time.sleep(.25)

print ('\nSummary: ')
df = pandas.read_csv("out.csv")
print(df['code'].value_counts())

I used pandas' value_counts to summarise the results at the end.
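
For readers without pandas, roughly the same tally can be sketched with the standard library's collections.Counter. This is an alternative technique, not what the code above does, and the sample rows standing in for out.csv are invented.

```python
import csv
import io
from collections import Counter

# Hypothetical contents of out.csv (rows are invented for illustration)
sample_out = io.StringIO(
    "ContentItemId,code\n21291,200\n21292,404\n21293,200\n"
)

# Count how often each status code appears in the "code" column
codes = Counter(row["code"] for row in csv.DictReader(sample_out))

# most_common() lists codes from most to least frequent
for code, n in codes.most_common():
    print(code, n)
```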

Best answer

To get a progress bar, use tqdm:

Data (from in.csv):

ContentItemId
21200
21201
21202
21203
21204
21205
21206
...
21296
21297
21298
21299
21300

Code:

from collections import namedtuple
import csv
import requests
from tqdm import tqdm


with open('in.csv', mode='r') as f:
    reader = csv.reader(f)

    all_records = namedtuple('rec', next(reader))
    records = [all_records._make(row) for row in reader]

    #Create output file
    with open('out.csv', mode='w+') as o:
        w = csv.writer(o)
        w.writerow(["ContentItemId","code"])

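        # total lets tqdm show a percentage and an n/total counter
        with tqdm(total=len(records)) as pbar:
            for r in records:
                item_id = r.ContentItemId  # avoid shadowing the built-in id()
                url = "https://api.aucklandmuseum.com/id/library/ephemera/" + item_id
                req = requests.get(url, allow_redirects=False)
                code = req.status_code
                w.writerow([item_id, code])
                pbar.update(1)  # advance the bar once the record is processed
    print('Complete!')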
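  • Note the with tqdm(total=len(records)) as pbar: line added before the for loop.
  • When run from the console, a progress bar appears showing the percentage complete.
  • The screenshots (not reproduced here) showed a counter such as 21/101 at the left of the image: records processed so far out of the length of the records list.
    • tqdm provides both a percentage progress bar and a completed/total count.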
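
If only the 25%, 50% and 75% milestones from the question are wanted, without a progress bar, a plain-Python sketch is possible. The helper name milestones_crossed and the stand-in records list are hypothetical; the real loop body would hold the requests.get and writerow calls.

```python
def milestones_crossed(done, total, marks=(25, 50, 75)):
    """Return the percentage marks first reached when item `done` of `total` finishes."""
    prev = (done - 1) * 100 // total   # whole percent before this item
    curr = done * 100 // total         # whole percent after this item
    return [m for m in marks if prev < m <= curr]


records = list(range(8))  # stand-in for the list of namedtuples
total = len(records)

for done, r in enumerate(records, start=1):
    # ... the HTTP request and csv write for record r would go here ...
    for m in milestones_crossed(done, total):
        print(f"{m}% done ({done}/{total})")

# Unindented, so it runs once the loop (and any enclosing with blocks) has finished
print("Complete")
```

As for the second part of the question: an unindented print("Complete") at the bottom is reached after the with blocks exit, provided no exception is raised inside them.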

Regarding "python - Tracking progress through a namedtuple CSV loop in Python", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57948562/
