python - How do I use a cache filter?

Tags: python caching browser-cache

I'm having trouble with a cache filter.

The idea is to avoid caching responses that contain "incomplete_results":true.

Here is my filter function:

import requests
import requests_cache

def phrase_filter(response: requests.models.Response)->bool:
    if '"incomplete_results":true' in response.text:
        return False
    return True

But when I test it with this code:

requests_cache.install_cache('demo_cache',expire_after=600,filter_fn=phrase_filter)
requests_cache.clear()

url1 = 'https://raw.githubusercontent.com/KienTrann/requests-cache-testing/main/should_be_cached.txt'
url2 = 'https://raw.githubusercontent.com/KienTrann/requests-cache-testing/main/should_not_be_cached.txt'

with requests_cache.enabled():
    r = requests.get(url1) # First request
    r = requests.get(url1) # Second request
    print(f'Text from url1:\n{r.text}')
    assert r.from_cache==True
    #
    r1 = requests.get(url2) # First request
    r1 = requests.get(url2) # Second request
    print('---')
    print(f'Text from url2:\n{r1.text}')
    assert r1.from_cache==False

requests_cache.disabled()

The result is:

Text from url1:
abc
xyz
"incomplete_results":false

---
Text from url2:
abc
xyz
"incomplete_results":true

Traceback (most recent call last):
  File "C:\Users\ADMIN\source\repos\LearningPython\py_2\py_2.py", line 25, in <module>
    assert r1.from_cache==False
AssertionError

I don't understand why r1 is cached.

What is wrong, and how can I fix it?

Thank you for taking the time to answer.

Best answer

Patching

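Looks like you're almost there! requests_cache.enabled() and disabled() are context-manager alternatives to install_cache() and uninstall_cache(). Calling enabled() with no arguments installs a cache with default settings for the duration of the block, so the filter_fn passed earlier to install_cache() is not applied to the requests inside it. Just pass your settings to enabled() instead of install_cache():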

with requests_cache.enabled('demo_cache', expire_after=600, filter_fn=phrase_filter):
    # ... make requests

This is essentially the same as:

requests_cache.install_cache('demo_cache', expire_after=600, filter_fn=phrase_filter)
# ... make requests
requests_cache.uninstall_cache()

Sessions

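Personally, I recommend using requests_cache.CachedSession instead of the patching approach, because it makes it more explicit what is being cached, and if you want to make a non-cached request you can simply use the regular requests functions. There is more information in the documentation: https://requests-cache.readthedocs.io/en/stable/user_guide/general.html

Example: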

from requests import Response
from requests_cache import CachedSession

def phrase_filter(response: Response) -> bool:
    return '"incomplete_results":true' not in response.text

url1 = 'https://raw.githubusercontent.com/KienTrann/requests-cache-testing/main/should_be_cached.txt'
url2 = 'https://raw.githubusercontent.com/KienTrann/requests-cache-testing/main/should_not_be_cached.txt'
session = CachedSession('demo_cache', expire_after=600, filter_fn=phrase_filter)
session.cache.clear()

nonfiltered_response = session.get(url1)
nonfiltered_response = session.get(url1)
assert nonfiltered_response.from_cache is True

filtered_response = session.get(url2)
filtered_response = session.get(url2)
assert filtered_response.from_cache is False
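
Because the filter only reads response.text, it can also be checked without any network access. The following is a minimal sketch, not part of the original answer: the stand-in objects built with SimpleNamespace are hypothetical substitutes for real responses, and the whitespace-tolerant regular expression is an assumption about how the marker might be formatted in other responses (for example with a space after the colon).

```python
import re
from types import SimpleNamespace

# Assumption: the marker may appear with optional whitespace around the colon.
INCOMPLETE_MARKER = re.compile(r'"incomplete_results"\s*:\s*true')

def phrase_filter(response) -> bool:
    """Return True if the response should be cached, False otherwise."""
    return INCOMPLETE_MARKER.search(response.text) is None

# Hypothetical stand-ins for responses; the filter only needs a .text attribute.
complete = SimpleNamespace(text='abc\nxyz\n"incomplete_results":false')
incomplete = SimpleNamespace(text='abc\nxyz\n"incomplete_results": true')

print(phrase_filter(complete))    # True: eligible for caching
print(phrase_filter(incomplete))  # False: excluded from the cache
```

A function with this shape can be passed as filter_fn in the same way as the one above; the plain substring check in the original is enough when the marker's exact formatting is known.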

Debugging

If you run into a similar problem in the future and aren't sure why a response is or isn't being cached, you can enable debug logging:

import logging
logging.basicConfig(level='DEBUG')

You will get cache information for each response, like this:

DEBUG:requests_cache.session: Pre-cache checks for response from https://raw.githubusercontent.com/KienTrann/requests-cache-testing/main/should_not_be_cached.txt: 
{
    'disabled cache': False,
    'disabled method': False,
    'disabled status': False,
    'disabled by filter': True,
    'disabled by headers or expiration params': False,
}

More information in the documentation: https://requests-cache.readthedocs.io/en/stable/user_guide/troubleshooting.html
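
Setting the root logger to DEBUG also turns on debug output from every other library. A possible refinement, sketched here with the standard logging module only, is to raise the level for just the cache library. The logger name 'requests_cache' is an assumption inferred from the 'requests_cache.session' prefix in the log output above.

```python
import logging

# Keep the root logger (and therefore other libraries) at WARNING.
logging.basicConfig(level=logging.WARNING)

# Assumption: the library logs under the 'requests_cache' logger hierarchy,
# as suggested by the 'requests_cache.session' prefix in its debug output.
logging.getLogger('requests_cache').setLevel(logging.DEBUG)
```

Child loggers such as 'requests_cache.session' inherit the DEBUG level from their parent, so the pre-cache check messages should still appear while unrelated debug output stays hidden.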

Source: this question ("python - How do I use a cache filter?") on Stack Overflow: https://stackoverflow.com/questions/69535256/
