from bs4 import BeautifulSoup
import urllib
import re
soup = urllib.urlopen("http://atlanta.craigslist.org/cto/")
soup = BeautifulSoup(soup)
souped = soup.p
print souped
m = re.search("\\$.",souped)
print m.group(0)
I can download and print the HTML just fine, but the script always breaks when I add the last two lines.
I get this error:
Traceback (most recent call last):
File "C:\Python27\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 323, in RunScript
debugger.run(codeObject, __main__.__dict__, start_stepping=0)
File "C:\Python27\Lib\site-packages\pythonwin\pywin\debugger\__init__.py", line 60, in run
_GetCurrentDebugger().run(cmd, globals,locals, start_stepping)
File "C:\Python27\Lib\site-packages\pythonwin\pywin\debugger\debugger.py", line 655, in run
exec cmd in globals, locals
File "C:\Users\Zack\Documents\Scripto.py", line 1, in <module>
from bs4 import BeautifulSoup
File "C:\Python27\lib\re.py", line 142, in search
return _compile(pattern, flags).search(string)
TypeError: expected string or buffer
Thanks a lot!
Best answer
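You probably need re.search("\\$.", str(souped)). soup.p is a BeautifulSoup Tag object, not a string, and re.search only accepts a string or buffer.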
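To make the fix concrete, here is a minimal sketch of the str() conversion. It is not the asker's script: FakeTag is a hypothetical stand-in for the bs4 Tag (so the example runs without bs4 or a network connection), first_price is an illustrative helper name, and the sample markup and the stricter price pattern are assumptions. It also guards against re.search returning None, which would otherwise make m.group(0) fail when nothing matches.

```python
import re

# Assumed pattern: a dollar sign followed by digits and optional commas.
# The question's "\\$." would only capture "$" plus one more character.
PRICE_RE = re.compile(r"\$\d[\d,]*")


class FakeTag(object):
    """Hypothetical stand-in for a bs4 Tag: not a string, but str() yields markup."""

    def __init__(self, markup):
        self.markup = markup

    def __str__(self):
        return self.markup


def first_price(node):
    """Return the first price-like substring of str(node), or None if there is none."""
    match = PRICE_RE.search(str(node))
    return match.group(0) if match else None


# Made-up sample markup, standing in for what soup.p might hold.
souped = FakeTag('<p class="row"><a href="/cto/1.html">2004 Honda Civic</a> $4,500</p>')

try:
    # Passing the object itself, as the question does, raises TypeError.
    PRICE_RE.search(souped)
except TypeError as exc:
    print("TypeError: %s" % exc)

# Converting to a string first lets the regex run.
print(first_price(souped))
```

With a real BeautifulSoup object the call would presumably be first_price(soup.p), since str() on a Tag returns its markup.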
A similar question on Stack Overflow, "python - Why can't Python regex handle a formatted HTML string?": https://stackoverflow.com/questions/9446260/