python - Scraping only the rows with today's date from a table

I can't filter the results of table[3] down to only the rows that contain today's date. I'm using this URL as my data source:

http://tides.mobilegeographics.com/locations/3881.html

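I can get all the data back, but my filtering doesn't work. I get the whole series, five days' worth. I only want something like this (the current day):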

Montauk Point, Long Island Sound, New York
41.0717° N, 71.8567° W

2014-03-13 12:37 PM EDT   0.13 feet  Low Tide
2014-03-13  6:51 PM EDT   Sunset
2014-03-13  7:13 PM EDT   2.30 feet  High Tide

How can I get this, and then work out whether the tide will be coming in or going out within the next 40 minutes?

Thanks for your help.

My code is:

import sre, urllib2, sys, BaseHTTPServer, datetime, re, time, pprint, smtplib
from bs4 import BeautifulSoup
from bs4.diagnose import diagnose

data = urllib2.urlopen('http://tides.mobilegeographics.com/locations/3881.html').read()
day = datetime.date.today().day
month = datetime.date.today().month

year = datetime.date.today().year
date = datetime.date.today()
soup = BeautifulSoup(data)

keyinfo = soup.find_all('h2')
str_date = datetime.date.today().strftime("%Y-%m-%d")
time_text = datetime.datetime.now() + datetime.timedelta(minutes = 20)

t_day = time_text.strftime("%Y-%m-%d")
tide_table = soup.find_all('table')[3]
pre = tide_table.findAll('pre')

dailytide = []
pattern = str_date
allmatches = re.findall(r'pattern', pre)
print allmatches

if allmatches:
    print allmatches
else:
    print "Match for " + str_date + " not found in data string \n" + datah

Best answer

You don't need a regular expression. Just split the contents of the pre tag into lines and check whether today's date is in each line:

import urllib2
import datetime
from bs4 import BeautifulSoup


URL = 'http://tides.mobilegeographics.com/locations/3881.html'
soup = BeautifulSoup(urllib2.urlopen(URL))
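# The tide predictions are plain text inside a <pre> tag in the fourth table.
pre = soup.find_all('table')[3].find('pre').text

# Dates in the listing are formatted as YYYY-MM-DD, e.g. 2014-03-13.
today = datetime.date.today().strftime("%Y-%m-%d")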
for line in pre.split('\n'):
    if today in line:
        print line

Prints:

2014-03-13  6:52 PM EDT   Sunset
2014-03-13  7:13 PM EDT   2.30 feet  High Tide
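
The second part of the question, whether the tide is coming in or going out over the next 40 minutes, isn't covered by the snippet above. Here is a minimal sketch of one way to approach it, assuming each tide line looks like the sample in the question (date, 12-hour time, time zone, height in feet, then "High Tide" or "Low Tide") and that `now` is a naive datetime in the same time zone as the table. The names `parse_tide_line` and `tide_status` are hypothetical helpers, not part of BeautifulSoup or the tide site. The idea is that the tide is rising until the next High Tide and falling until the next Low Tide.

```python
import datetime
import re

# Matches e.g. "2014-03-13  7:13 PM EDT   2.30 feet  High Tide".
# Lines such as Sunset or Moonrise have no height and do not match.
TIDE_LINE = re.compile(
    r'^(\d{4}-\d{2}-\d{2})\s+(\d{1,2}:\d{2} [AP]M)\s+\S+\s+'
    r'(-?\d+(?:\.\d+)?) feet\s+(High|Low) Tide')


def parse_tide_line(line):
    """Return (when, height_in_feet, 'High' or 'Low'), or None for other lines."""
    match = TIDE_LINE.match(line.strip())
    if match is None:
        return None
    when = datetime.datetime.strptime(
        match.group(1) + ' ' + match.group(2), '%Y-%m-%d %I:%M %p')
    return when, float(match.group(3)), match.group(4)


def tide_status(lines, now, window_minutes=40):
    """Return (direction, turns_within_window), or None if no later tide is listed.

    direction is 'coming in' when the next event is a High Tide and
    'going out' when it is a Low Tide. turns_within_window is True when
    that next event falls inside the window, i.e. the tide is about to turn.
    """
    events = [parse_tide_line(line) for line in lines]
    upcoming = sorted(e for e in events if e is not None and e[0] > now)
    if not upcoming:
        return None
    when, _height, kind = upcoming[0]
    direction = 'coming in' if kind == 'High' else 'going out'
    turns_soon = (when - now) <= datetime.timedelta(minutes=window_minutes)
    return direction, turns_soon
```

With the three sample lines from the question and `now` set to 2014-03-13 18:45, `tide_status` returns `('coming in', True)`: the next tide event is the 7:13 PM High Tide, 28 minutes away. If you pass in only today's lines, the function returns None after the day's last tide, so it is probably better to feed it the whole multi-day listing, for example `pre.split('\n')`, rather than the filtered rows.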

This question about python - scraping only the rows with today's date from a table comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/22384999/
