python - Is there a page cache for HTTP requests when using crontab on CentOS 8?

Tags: python http caching python-requests centos

Problem and output

When running my Python script, I get the same response from an API call that returns JSON, and this persists for several hours at a time.

I'm using the Coindesk BPI API, which updates every minute, so the Bitcoin price should not stay exactly the same for 5 hours. See the sample output below:

    # results.txt
    {"timestamp": 16-Apr-2020 22:50, "price": 7078, "gCount": 28, "rCount": 48}
    {"timestamp": 16-Apr-2020 23:00, "price": 7085, "gCount": 29, "rCount": 50}
    {"timestamp": 16-Apr-2020 23:10, "price": 7011, "gCount": 33, "rCount": 52}
    {"timestamp": 16-Apr-2020 23:20, "price": 7002, "gCount": 31, "rCount": 55}
    {"timestamp": 16-Apr-2020 23:30, "price": 7020, "gCount": 30, "rCount": 52}
    {"timestamp": 16-Apr-2020 23:40, "price": 7027, "gCount": 33, "rCount": 54}
    {"timestamp": 16-Apr-2020 23:50, "price": 7047, "gCount": 35, "rCount": 58}
    {"timestamp": 17-Apr-2020 00:01, "price": 7060, "gCount": 36, "rCount": 57}
    {"timestamp": 17-Apr-2020 00:10, "price": 7051, "gCount": 34, "rCount": 45}
    {"timestamp": 17-Apr-2020 00:20, "price": 7052, "gCount": 41, "rCount": 48}
    {"timestamp": 17-Apr-2020 00:31, "price": 7054, "gCount": 47, "rCount": 48}
    # It worked! Now the price is stuck for 2 get requests.
    {"timestamp": 17-Apr-2020 00:40, "price": 7054, "gCount": 48, "rCount": 47}
    {"timestamp": 17-Apr-2020 00:50, "price": 7054, "gCount": 50, "rCount": 48}
    {"timestamp": 17-Apr-2020 01:01, "price": 7051, "gCount": 48, "rCount": 43}
    # Price stuck again for around 30 get requests.
    {"timestamp": 17-Apr-2020 01:10, "price": 7051, "gCount": 46, "rCount": 47}
    {"timestamp": 17-Apr-2020 01:20, "price": 7051, "gCount": 49, "rCount": 46}
    {"timestamp": 17-Apr-2020 01:30, "price": 7051, "gCount": 48, "rCount": 47}
    {"timestamp": 17-Apr-2020 01:40, "price": 7051, "gCount": 50, "rCount": 48}
    {"timestamp": 17-Apr-2020 01:50, "price": 7051, "gCount": 50, "rCount": 52}
    {"timestamp": 17-Apr-2020 02:00, "price": 7051, "gCount": 51, "rCount": 56}
    {"timestamp": 17-Apr-2020 02:10, "price": 7051, "gCount": 50, "rCount": 55}
    {"timestamp": 17-Apr-2020 02:20, "price": 7051, "gCount": 57, "rCount": 57}
    {"timestamp": 17-Apr-2020 02:30, "price": 7051, "gCount": 48, "rCount": 54}
    {"timestamp": 17-Apr-2020 02:40, "price": 7051, "gCount": 52, "rCount": 54}
    {"timestamp": 17-Apr-2020 02:51, "price": 7051, "gCount": 54, "rCount": 57}
    {"timestamp": 17-Apr-2020 03:00, "price": 7051, "gCount": 53, "rCount": 59}
    {"timestamp": 17-Apr-2020 03:11, "price": 7051, "gCount": 53, "rCount": 59}
    {"timestamp": 17-Apr-2020 03:21, "price": 7051, "gCount": 50, "rCount": 55}
    {"timestamp": 17-Apr-2020 03:31, "price": 7051, "gCount": 51, "rCount": 55}
    {"timestamp": 17-Apr-2020 03:41, "price": 7051, "gCount": 52, "rCount": 56}
    {"timestamp": 17-Apr-2020 03:51, "price": 7051, "gCount": 50, "rCount": 55}
    {"timestamp": 17-Apr-2020 04:01, "price": 7051, "gCount": 48, "rCount": 56}
    {"timestamp": 17-Apr-2020 04:10, "price": 7051, "gCount": 39, "rCount": 50}
    {"timestamp": 17-Apr-2020 04:20, "price": 7051, "gCount": 39, "rCount": 49}
    {"timestamp": 17-Apr-2020 04:31, "price": 7051, "gCount": 41, "rCount": 53}
    {"timestamp": 17-Apr-2020 04:40, "price": 7051, "gCount": 43, "rCount": 53}
    {"timestamp": 17-Apr-2020 04:50, "price": 7051, "gCount": 39, "rCount": 51}
    {"timestamp": 17-Apr-2020 05:00, "price": 7051, "gCount": 37, "rCount": 52}
    {"timestamp": 17-Apr-2020 05:11, "price": 7051, "gCount": 38, "rCount": 54}
    {"timestamp": 17-Apr-2020 05:20, "price": 7051, "gCount": 31, "rCount": 49}
    {"timestamp": 17-Apr-2020 05:30, "price": 7051, "gCount": 0, "rCount": 0}
    {"timestamp": 17-Apr-2020 05:41, "price": 7051, "gCount": 32, "rCount": 49}
    {"timestamp": 17-Apr-2020 05:50, "price": 7051, "gCount": 37, "rCount": 49}
    {"timestamp": 17-Apr-2020 06:01, "price": 7051, "gCount": 39, "rCount": 51}
    {"timestamp": 17-Apr-2020 06:11, "price": 7051, "gCount": 41, "rCount": 47}
    {"timestamp": 17-Apr-2020 06:21, "price": 7051, "gCount": 42, "rCount": 46}
    # Now it works again as intended.
    {"timestamp": 17-Apr-2020 06:31, "price": 7082, "gCount": 45, "rCount": 49}
    {"timestamp": 17-Apr-2020 06:40, "price": 7084, "gCount": 48, "rCount": 50}
    {"timestamp": 17-Apr-2020 06:51, "price": 7095, "gCount": 45, "rCount": 51}
    {"timestamp": 17-Apr-2020 07:01, "price": 7097, "gCount": 44, "rCount": 45}
    {"timestamp": 17-Apr-2020 07:11, "price": 7068, "gCount": 45, "rCount": 46}
    {"timestamp": 17-Apr-2020 07:21, "price": 7070, "gCount": 43, "rCount": 45}
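
To quantify how long the price was "stuck", a small helper can scan the logged prices for the longest run of identical consecutive values. This is a minimal Python 3 sketch for diagnosis only: `longest_stuck_run` is a hypothetical name, and the sample list was copied by hand from the 00:20 to 01:30 entries above. A short run can also be genuine, since prices are truncated to whole dollars, so only long runs like the one above really point at stale responses.

```python
from itertools import groupby


def longest_stuck_run(prices):
    """Return (price, run_length) for the longest run of identical
    consecutive prices. Ties go to the earliest run; [] gives (None, 0)."""
    best_price, best_len = None, 0
    for price, group in groupby(prices):
        run_len = sum(1 for _ in group)  # length of this run of equal values
        if run_len > best_len:
            best_price, best_len = price, run_len
    return best_price, best_len


# Prices from results.txt above, 00:20 through 01:30.
sample = [7052, 7054, 7054, 7054, 7051, 7051, 7051, 7051]
print(longest_stuck_run(sample))  # (7051, 4)
```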

The Python script and what I've tried

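I'm using Python 2.7 and requests. By default, requests does not cache responses, so my guess was that the connection was somehow being kept alive and reused by Python, returning the same JSON.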
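If the suspect is connection reuse or an HTTP cache somewhere on the path, one thing to try is asking for a fresh response explicitly. This is a swapped-in technique, not what the asker ran: a minimal Python 3 sketch using the standard library's `urllib.request` (so it is self-contained) that builds, without sending, a GET request carrying `Cache-Control: no-cache`, `Pragma: no-cache` and `Connection: close`. The helper name `build_uncached_request` is hypothetical, and whether the CoinDesk servers or any CDN in front of them honour these headers is not known. `urlopen` already closes the connection after each request; the `Connection` header matters more if the same dict is passed to `requests.get(url, headers=...)`.

```python
import urllib.request

API_URL = "https://api.coindesk.com/v1/bpi/currentprice/USD.json"


def build_uncached_request(url=API_URL):
    """Build (but do not send) a GET request that asks caches for a fresh
    copy and asks the server not to keep the connection alive."""
    headers = {
        "Cache-Control": "no-cache",  # ask HTTP caches to revalidate with the origin
        "Pragma": "no-cache",         # HTTP/1.0 equivalent for older proxies
        "Connection": "close",        # do not keep the TCP connection for reuse
    }
    return urllib.request.Request(url, headers=headers, method="GET")


req = build_uncached_request()
# Sending it requires network access:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     body = resp.read()
```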

I tried to close the requests session by setting keep-alive to false, by using a with block, and by calling requests.session().close(). The relevant Python code is below:

import requests, json, sys, time
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry


def request_json():
    print 'Begin request to get the json...'

    # Try get request once
    response = requests_retry_session().get('https://api.coindesk.com/v1/bpi/currentprice/USD.json')
    if (response.status_code == 200):
        # Close the connection 
        # requests.session().close() <-- tried, doesn't do the trick
        print 'Fetched price successfully.\n'
        return response.json()

    # If first request didn't succeed, retry 3 times using session 
    with requests.Session() as s:
        s.get('https://api.coindesk.com/v1/bpi/currentprice/USD.json')
        # Close the connection
        # s.config['keep_alive'] = False <-- tried, doesn't do the trick
        response = requests_retry_session(session=s).get(
            'https://api.coindesk.com/v1/bpi/currentprice/USD.json'
        )

    # When requests succeed using session
    if (response.status_code == 200):
        # Close the connection 
        # requests.session().close() <-- tried, doesn't do the trick
        print 'Fetched price successfully.\n'
        return response.json()

    print 'Couldn\'t fetch price json.'
    return 'error'


def requests_retry_session(
    retries=3,
    backoff_factor=0.3,
    status_forcelist=(500, 502, 504),
    session=None,
):

    session = session or requests.Session()
    retry = Retry(
        total=retries,
        read=retries,
        connect=retries,
        backoff_factor=backoff_factor,
        status_forcelist=status_forcelist,
    )

    adapter = HTTPAdapter(max_retries=retry)
    session.mount('https://', adapter)

    return session




def get_price_data(json):

    price = str(json['bpi']['USD']['rate'])
    # Strip the ',' from price, convert to float and to int
    price = int(float(price.replace(',', '')))

    return price




def main():
    # Send a request for the bitcoin price json
    priceJson = request_json()
    # Check if the request and retries failed
    if (priceJson == 'error'):
        print 'Terminating bitcoinPrice.py script.'
        sys.exit()

    # Get the data from the response json
    priceInt = get_price_data(priceJson)

    # Get timestamp as milliseconds
    milli_sec = int(round(time.time() * 1000))

    # Read the colordata from colors.txt
    # The format is: '63,61' where greenCount,redCount
    with open('colors.txt', 'r') as fh:
        colorData = fh.read().strip()
    gCount = colorData.split(',')[0]
    rCount = colorData.split(',')[1]

    # Create a string in json format with the price and color data
    dataString = "{\"timestamp\": \"%d\", \"price\": \"%d\", \"gCount\": \"%s\", \"rCount\": \"%s\"}" % (milli_sec, priceInt, gCount, rCount)
    print dataString

    # Append to results.txt
    with open('results/results.txt', 'a') as fh:
        fh.write(dataString + '\n')
    print '\nSuccessfully saved BTC price and color data to results.txt'




if __name__ == '__main__':
    main()
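
One way to tell a stale response apart from a market that simply did not move is to compare the update time the API reports inside the JSON with the wall clock when the script runs. The sketch below assumes the payload has a `time` → `updatedISO` field holding an ISO 8601 timestamp with a UTC offset (as the v1 BPI responses did at the time); `parse_price` mirrors `get_price_data()` above, while `is_stale` and the 5-minute threshold are hypothetical choices. It targets Python 3.7+ (`datetime.fromisoformat`), not the Python 2.7 used in the question, and the sample payload is made up.

```python
import json
from datetime import datetime, timedelta, timezone


def parse_price(payload):
    """Same logic as get_price_data(): '7,051.2345' -> 7051."""
    rate = str(payload["bpi"]["USD"]["rate"])
    return int(float(rate.replace(",", "")))


def is_stale(payload, now=None, max_age=timedelta(minutes=5)):
    """Return True if the response's own update time is older than max_age.

    ASSUMPTION: payload["time"]["updatedISO"] looks like
    '2020-04-17T06:31:00+00:00' (ISO 8601 with a UTC offset).
    """
    updated = datetime.fromisoformat(payload["time"]["updatedISO"])
    if now is None:
        now = datetime.now(timezone.utc)
    return now - updated > max_age


# Hypothetical payload: updated at 01:01 UTC but still served at 03:00 UTC.
sample = json.loads(
    '{"time": {"updatedISO": "2020-04-17T01:01:00+00:00"},'
    ' "bpi": {"USD": {"rate": "7,051.2345"}}}'
)
check_time = datetime(2020, 4, 17, 3, 0, tzinfo=timezone.utc)
print(parse_price(sample))               # 7051
print(is_stale(sample, now=check_time))  # True
```

Logging `updatedISO` next to each price in results.txt would show directly whether the API itself kept returning an old snapshot.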

I could not reproduce the error by running only this bitcoinPrice.py script every minute from a regular user's crontab.

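The error seems to occur when my root crontab runs every 10 minutes, with several other scripts running before this one. The actual crontab run by the root user, with the other scripts simplified, looks like this: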
*/10 * * * * node script1.js && python2 script2.py && python2 bitcoinPrice.py && /home/user/clearcache.sh

All the other scripts work as expected. The last script, clearcache.sh, drops the caches and buffers as follows, as described here:
#!/bin/sh
sync; echo 3 > /proc/sys/vm/drop_caches

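I'd like to understand what is going on here. If I can't find a solution, I'll switch to curl, dump the API's JSON response to a file and read it from there. Any ideas are appreciated!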
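For the fallback plan (curl writes the body to a file, Python reads it back), the reading side could look like this minimal Python 3 sketch. The function names are hypothetical, and the default path is the one used in the curl command in the answer. On its own this does not avoid stale data, since the file only holds whatever curl received, and the answer reports that plain curl showed the same problem.

```python
import json


def price_from_json_text(text):
    """Parse a CoinDesk-style JSON body and return the USD rate as an int."""
    payload = json.loads(text)
    rate = str(payload["bpi"]["USD"]["rate"])
    # Drop thousands separators, then truncate the way get_price_data() does.
    return int(float(rate.replace(",", "")))


def read_price_from_file(path="results/priceJson.txt"):
    """Read the response body that curl saved with -o and extract the price."""
    with open(path, "r") as fh:
        return price_from_json_text(fh.read())


print(price_from_json_text('{"bpi": {"USD": {"rate": "7,078.5"}}}'))  # 7078
```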

Best answer

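I managed to solve it. Calling curl periodically still showed the same problem, but I used the trick from this answer (https://stackoverflow.com/a/42263514/12965126) and added a unique query parameter to every request: the current epoch time in seconds, ?$(date +%s).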
curl https://api.coindesk.com/v1/bpi/currentprice/USD.json?$(date +%s) -o results/priceJson.txt
...and it works without any caching. The same trick now works with Python requests as well.
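The same cache-busting trick in Python could look like the sketch below (Python 3 standard library only). The helper name `cache_busted` and the parameter name `_` are hypothetical choices. Unlike `?$(date +%s)`, which appends a bare value in seconds, it adds a named parameter in epoch milliseconds and keeps any existing query string. It assumes the API ignores query parameters it does not recognise, which is what the curl experiment above suggests.

```python
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

API_URL = "https://api.coindesk.com/v1/bpi/currentprice/USD.json"


def cache_busted(url, now=None):
    """Return url with an extra '_' query parameter set to epoch milliseconds,
    so every request has a URL that no cache has seen before."""
    seconds = time.time() if now is None else now
    millis = int(round(seconds * 1000))
    parts = urlsplit(url)
    # Keep any existing parameters and append the unique one at the end.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("_", str(millis)))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(cache_busted(API_URL, now=1587083460.0))
# https://api.coindesk.com/v1/bpi/currentprice/USD.json?_=1587083460000
```

In the question's script this would be a one-line change such as `requests_retry_session().get(cache_busted(API_URL))`; alternatively, requests can append the parameter itself via `requests.get(API_URL, params={"_": ...})`.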

Regarding "python - Is there a page cache for HTTP requests when using crontab on CentOS 8?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/61286085/
