I've run into a problem with a cache filter.
The idea is to not cache responses that contain "incomplete_results":true.
Here is my filter function:
import requests
import requests_cache
def phrase_filter(response: requests.models.Response) -> bool:
    if '"incomplete_results":true' in response.text:
        return False
    return True
But when I test it with this code:
requests_cache.install_cache('demo_cache',expired_after=600,filter_fn=phrase_filter)
requests_cache.clear()
url1 = 'https://raw.githubusercontent.com/KienTrann/requests-cache-testing/main/should_be_cached.txt'
url2 = 'https://raw.githubusercontent.com/KienTrann/requests-cache-testing/main/should_not_be_cached.txt'
with requests_cache.enabled():
    r = requests.get(url1) # First request
    r = requests.get(url1) # Second request
    print(f'Text from url1:\n{r.text}')
    assert r.from_cache==True
    #
    r1 = requests.get(url2) # First request
    r1 = requests.get(url2) # Second request
    print('---')
    print(f'Text from url2:\n{r1.text}')
    assert r1.from_cache==False
requests_cache.disabled()
The result is:
Text from url1:
abc
xyz
"incomplete_results":false
---
Text from url2:
abc
xyz
"incomplete_results":true
Traceback (most recent call last):
  File "C:\Users\ADMIN\source\repos\LearningPython\py_2\py_2.py", line 25, in <module>
    assert r1.from_cache==False
AssertionError
I don't understand why r1 is being cached.
What is wrong, and how can I fix it?
Thank you for taking the time to answer.
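Best answer
Patching
It looks like you're almost there! requests_cache.enabled() and disabled() are context-manager alternatives to install_cache() and uninstall_cache(). Because enabled() installs its own cache with whatever arguments it receives, calling it with no arguments replaces the settings you passed to install_cache(), including filter_fn. Just pass your settings to enabled() instead of install_cache():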
with requests_cache.enabled('demo_cache', expire_after=600, filter_fn=phrase_filter):
    # ... make requests
This is basically the same as:
requests_cache.install_cache('demo_cache', expire_after=600, filter_fn=phrase_filter)
# ... make requests
requests_cache.uninstall_cache()
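Sessions
Personally, I recommend using requests_cache.CachedSession instead of the patching approach, because it makes it more explicit what is being cached, and if you want to make a non-cached request you can simply use the regular requests functions. There is more information in the docs here: https://requests-cache.readthedocs.io/en/stable/user_guide/general.html
Example: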
from requests import Response
from requests_cache import CachedSession
def phrase_filter(response: Response) -> bool:
    return '"incomplete_results":true' not in response.text
url1 = 'https://raw.githubusercontent.com/KienTrann/requests-cache-testing/main/should_be_cached.txt'
url2 = 'https://raw.githubusercontent.com/KienTrann/requests-cache-testing/main/should_not_be_cached.txt'
session = CachedSession('demo_cache', expire_after=600, filter_fn=phrase_filter)
session.cache.clear()
nonfiltered_response = session.get(url1)
nonfiltered_response = session.get(url1)
assert nonfiltered_response.from_cache is True
filtered_response = session.get(url2)
filtered_response = session.get(url2)
assert filtered_response.from_cache is False
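The filter function itself can also be checked without any network access or cache, which helps separate a filter bug from a configuration problem. The sketch below uses `types.SimpleNamespace` as a hypothetical stand-in for a response; it assumes the filter only reads the `.text` attribute, and the sample texts are copied from the output shown in the question.

```python
from types import SimpleNamespace


def phrase_filter(response) -> bool:
    # Cache the response only if the marker phrase is absent.
    return '"incomplete_results":true' not in response.text


# Stand-in objects exposing only the .text attribute the filter reads.
complete = SimpleNamespace(text='abc\nxyz\n"incomplete_results":false')
incomplete = SimpleNamespace(text='abc\nxyz\n"incomplete_results":true')

print(phrase_filter(complete))    # True  (would be cached)
print(phrase_filter(incomplete))  # False (would be skipped)
```

Since the filter gives the expected answer for both texts, the unexpected caching in the question comes from how the cache was configured rather than from the filter logic.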
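Debugging
If you run into a similar problem in the future and aren't sure why a response is or isn't being cached, you can enable debug logging: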
import logging
logging.basicConfig(level='DEBUG')
You will then get cache information for each response, like this:
DEBUG:requests_cache.session: Pre-cache checks for response from https://raw.githubusercontent.com/KienTrann/requests-cache-testing/main/should_not_be_cached.txt:
{
'disabled cache': False,
'disabled method': False,
'disabled status': False,
'disabled by filter': True,
'disabled by headers or expiration params': False,
}
More information is in the docs here: https://requests-cache.readthedocs.io/en/stable/user_guide/troubleshooting.html
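Enabling DEBUG on the root logger also turns on debug output from every other library. A possible refinement, sketched below with the standard `logging` module only, is to raise the level just for the `requests_cache` logger. The logger name is an assumption taken from the `requests_cache.session` prefix in the log output above.

```python
import logging

# Keep other libraries quiet at WARNING level.
logging.basicConfig(level=logging.WARNING)

# Turn on DEBUG only for requests_cache and its child loggers,
# such as 'requests_cache.session' seen in the output above.
cache_logger = logging.getLogger('requests_cache')
cache_logger.setLevel(logging.DEBUG)
```

Child loggers inherit the level of their nearest configured ancestor, so `requests_cache.session` picks up DEBUG without further setup.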
This question, "python - How to use a cache filter?", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/69535256/