我无法过滤 table[3]
的结果以仅包含其中包含今天日期的行。我使用这个 url 作为我的数据源:
http://tides.mobilegeographics.com/locations/3881.html
我可以取回所有数据,但我的过滤不起作用。我得到了整个系列,5 天前。我只想要这样的东西:(当天)
Montauk Point, Long Island Sound, New York
41.0717° N, 71.8567° W
2014-03-13 12:37 PM EDT 0.13 feet Low Tide
2014-03-13 6:51 PM EDT Sunset
2014-03-13 7:13 PM EDT 2.30 feet High Tide
我怎样才能得到这个,然后计算潮汐是否在接下来的 40 分钟内进/退。
感谢您的帮助。
我的代码是:
import sre, urllib2, sys, BaseHTTPServer, datetime, re, time, pprint, smtplib
from bs4 import BeautifulSoup
from bs4.diagnose import diagnose
data = urllib2.urlopen('http://tides.mobilegeographics.com/locations/3881.html').read()
day = datetime.date.today().day
month = datetime.date.today().month
year = datetime.date.today().year
date = datetime.date.today()
soup = BeautifulSoup(data)
keyinfo = soup.find_all('h2')
str_date = datetime.date.today().strftime("%Y-%m-%d")
time_text = datetime.datetime.now() + datetime.timedelta(minutes = 20)
t_day = time_text.strftime("%Y-%m-%d")
tide_table = soup.find_all('table')[3]
pre = tide_table.findAll('pre')
dailytide = []
pattern = str_date
allmatches = re.findall(r'pattern', pre)
print allmatches
if allmatches:
print allmatches
else:
print "Match for " + str_date + " not found in data string \n" + datah
最佳答案
您不需要正则表达式,只需拆分一个pre
标记的内容并检查今天的日期是否在该行中:
import urllib2
import datetime
from bs4 import BeautifulSoup
URL = 'http://tides.mobilegeographics.com/locations/3881.html'
soup = BeautifulSoup(urllib2.urlopen(URL))
pre = soup.find_all('table')[3].find('pre').text
today = datetime.date.today().strftime("%Y-%m-%d")
for line in pre.split('\n'):
if today in line:
print line
打印:
2014-03-13 6:52 PM EDT Sunset
2014-03-13 7:13 PM EDT 2.30 feet High Tide
关于python - 从表中仅抓取具有今天日期的行,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/22384999/