html - Unable to get table data - HTML

Tags: html python-2.7 beautifulsoup

I am trying to get the "Earnings Announcements" table from: https://www.zacks.com/stock/research/amzn/earnings-announcements

I have tried different BeautifulSoup options, but none of them gets the table.

table = soup.find('table', attrs={'class': 'earnings_announcements_earnings_table'})

table = soup.find_all('table')

When I inspect the table in the browser, its elements are there.

I am pasting part of the code (JS, JSON?) that I get for the table.

document.obj_data = {
"earnings_announcements_earnings_table"   : 
         [  [ "10/26/2017", "9/2017", "$0.06", "--", "--", "--", "--" ] ,  [ "7/27/2017", "6/2017", "$1.40", "$0.40", "<div class=\"right neg negative neg_icon showinline down\">-1.00</div>", "<div class=\"right neg negative neg_icon showinline down\">-71.43%</div>", "After Close" ] ,  [ "4/27/2017", "3/2017", "$1.03", "$1.48", "<div class=\"right pos positive pos_icon showinline up\">+0.45</div>", "<div class=\"right pos positive pos_icon showinline up\">+43.69%</div>", "After Close" ] ,  [ "2/2/2017", "12/2016", "$1.40", "$1.54", "<div class=\"right pos positive pos_icon showinline up\">+0.14</div>", "<div class=\"right pos positive pos_icon showinline up\">+10.00%</div>", "After Close" ] ,  [ "10/27/2016", "9/2016", "$0.85", "$0.52", "<div class=\"right neg negative neg_icon showinline down\">-0.33</div>", "<div class=\"right neg negative neg_icon showinline down\">-38.82%</div>", "After Close" ] ,  [ "7/28/2016", "6/2016", "$1.14", "$1.78", "<div class=\"right pos positive pos_icon showinline up\">+0.64</div>", "<div class=\"right pos positive pos_icon showinline up\">+56.14%</div>", "After Close" ] ,  [ "4/28/2016", "3/2016", "$0.61", "$1.07", "<div class=\"right pos positive pos_icon showinline up\">+0.46</div>", "<div class=\"right pos positive pos_icon showinline up\">+75.41%</div>", "After Close" ] ,  [ "1/28/2016", "12/2015", "$1.61", "$1.00", "<div class=\"right neg negative neg_icon showinline down\">-0.61</div>", "<div class=\"right neg negative neg_icon showinline down\">-37.89%</div>", "After Close" ] ,  [ "10/22/2015", "9/2015", "-$0.1", "$0.17", "<div class=\"right pos positive pos_icon showinline up\">+0.27</div>", "<div class=\"right pos positive pos_icon showinline up\">+270.00%</div>", "After Close" ] ,  [ "7/23/2015", "6/2015", "-$0.15", "$0.19", "<div class=\"right pos positive pos_icon showinline up\">+0.34</div>", "<div class=\"right pos positive pos_icon showinline up\">+226.67%</div>", "After Close" ] ,  [ "4/23/2015", "3/2015", "-$0.13", 
"-$0.12", "<div class=\"right pos positive pos_icon showinline up\">+0.01</div>", "<div class=\"right pos positive pos_icon showinline up\">+7.69%</div>", "After Close" ] ,  [ "1/29/2015", "12/2014", "$0.24", "$0.45", "<div class=\"right pos positive pos_icon showinline up\">+0.21</div>", "<div class=\"right pos positive pos_icon showinline up\">+87.50%</div>", "After Close" ] ,  [ "10/23/2014", "9/2014", "-$0.73", "-$0.95", "<div class=\"right neg negative neg_icon showinline down\">-0.22</div>", "<div class=\"right neg negative neg_icon showinline down\">-30.14%</div>", "After Close" ] ,  [ "7/24/2014", "6/2014", "-$0.13", "-$0.27", "<div class=\"right neg negative neg_icon showinline down\">-0.14</div>", "<div class=\"right neg negative neg_icon showinline down\">-107.69%</div>", "After Close" ] ,  [ "4/24/2014", "3/2014", "$0.22", "$0.23", "<div class=\"right pos positive pos_icon showinline up\">+0.01</div>", "<div class=\"right pos positive pos_icon showinline up\">+4.55%</div>", "After Close" ] ,  [ "1/30/2014", "12/2013", "$0.68", "$0.51", "<div class=\"right neg negative neg_icon showinline down\">-0.17</div>", "<div class=\"right neg negative neg_icon showinline down\">-25.00%</div>", "After Close" ] ,  [ "10/24/2013", "9/2013", "-$0.09", "-$0.09", "<div class=\"right pos_na showinline\">0.00</div>", "<div class=\"right pos_na showinline\">0.00%</div>", "After Close" ] ,  [ "7/25/2013", "6/2013", "$0.04", "-$0.02", "<div class=\"right neg negative neg_icon showinline down\">-0.06</div>", "<div class=\"right neg negative neg_icon showinline down\">-150.00%</div>", "After Close" ] ,  [ "4/25/2013", "3/2013", "$0.10", "$0.18", "<div class=\"right pos positive pos_icon showinline up\">+0.08</div>", "<div class=\"right pos positive pos_icon showinline up\">+80.00%</div>", "After Close" ] ,  [ "1/29/2013", "12/2012", "$0.28", "$0.21", "<div class=\"right neg negative neg_icon showinline down\">-0.07</div>", "<div class=\"right neg negative neg_icon showinline 
down\">-25.00%</div>", "After Close" ] ,  [ "10/25/2012", "9/2012", "-$0.08", "-$0.23", "<div class=\"right neg negative neg_icon showinline down\">-0.15</div>", "<div class=\"right neg negative neg_icon showinline down\">-187.50%</div>", "After Close" ] ,  [ "7/26/2012", "6/2012", "--", "--", "--", "--", "After Close" ] ,  [ "4/26/2012", "3/2012", "--", "--", "--", "--", "After Close" ] ,  [ "1/31/2012", "12/2011", "--", "--", "--", "--", "After Close" ] ,  [ "10/25/2011", "9/2011", "--", "--", "--", "--", "After Close" ] ,  [ "7/26/2011", "6/2011", "--", "--", "--", "--", "After Close" ] ,  [ "4/26/2011", "3/2011", "--", "--", "--", "--", "--" ] ,  [ "1/27/2011", "12/2010", "--", "--", "--", "--", "After Close" ] ,  [ "10/21/2010", "9/2010", "--", "--", "--", "--", "After Close" ] ,  [ "7/22/2010", "6/2010", "--", "--", "--", "--", "After Close" ] ,  [ "4/22/2010", "3/2010", "--", "--", "--", "--", "After Close" ] ,  [ "1/28/2010", "12/2009", "--", "--", "--", "--", "After Close" ] ,  [ "10/22/2009", "9/2009", "--", "--", "--", "--", "After Close" ] ,  [ "7/23/2009", "6/2009", "--", "--", "--", "--", "After Close" ]  ]

How can I get this table? Thanks!

Best answer

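The solution is to parse the whole HTML document with Python's string and regular expression functions instead of BeautifulSoup, because the data is not inside HTML tags; it sits inside JS code embedded in the page.

The code below takes the JS array stored under "earnings_announcements_earnings_table". Since this JS array literal has the same structure as a Python list, I simply parse it with ast. The result is a list you can loop over, and it contains the data from all pages of the table.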

import urllib2
import re
import ast

# Send the request with a browser-like User-Agent header.
user_agent = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0'}
req = urllib2.Request('https://www.zacks.com/stock/research/amzn/earnings-announcements', None, user_agent)
source = urllib2.urlopen(req).read()

# Drop everything up to and including the table's key and the colon after it.
compiled = re.compile(r'"earnings_announcements_earnings_table"\s+:', flags=re.IGNORECASE | re.DOTALL)
match = compiled.search(source)
if match:
    source = source[match.end():]

# Drop everything from the next key onward, leaving only the array literal.
compiled = re.compile(r'"earnings_announcements_webcasts_table"', flags=re.IGNORECASE | re.DOTALL)
match = compiled.search(source)
if match:
    source = source[:match.start()]

# Strip the surrounding whitespace and trailing comma, then parse the array as a Python list.
result = ast.literal_eval(str(source).strip('\r\n\t, '))
print result
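
As the data pasted in the question shows, some cells in the resulting list still carry `<div>` markup. The following is a minimal sketch of one way to clean the rows once they are parsed. It is written to run on Python 3 and uses two rows copied from the question instead of a live request. The column names in `COLUMNS` and the helpers `clean_cell` and `rows_to_dicts` are hypothetical, chosen only for illustration; the page's real headers are not shown in the question.

```python
import re

# Hypothetical column names (an assumption, not taken from the page).
COLUMNS = ["date", "period_ending", "estimate", "reported",
           "surprise", "surprise_pct", "time"]

# Matches any HTML tag such as <div class="..."> or </div>.
TAG_RE = re.compile(r"<[^>]+>")


def clean_cell(cell):
    """Remove HTML tags from a cell and trim surrounding whitespace."""
    return TAG_RE.sub("", cell).strip()


def rows_to_dicts(rows):
    """Turn each 7-item row into a dict keyed by COLUMNS."""
    return [dict(zip(COLUMNS, (clean_cell(c) for c in row))) for row in rows]


# Two rows copied from the data in the question.
sample = [
    ["10/26/2017", "9/2017", "$0.06", "--", "--", "--", "--"],
    ["7/27/2017", "6/2017", "$1.40", "$0.40",
     '<div class="right neg negative neg_icon showinline down">-1.00</div>',
     '<div class="right neg negative neg_icon showinline down">-71.43%</div>',
     "After Close"],
]

records = rows_to_dicts(sample)
for record in records:
    print(record["date"], record["reported"], record["surprise"])
```

A regular expression is enough here because each cell holds at most one flat `<div>`; for nested or irregular markup an HTML parser would be the safer choice.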

Let me know if you need any clarification.
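
For readers on Python 3, here is a hedged variant of the same idea. It is not the answer's method: instead of slicing between two keys and calling `ast.literal_eval`, it uses `json.JSONDecoder.raw_decode`, which parses a single JSON value starting at a given index and ignores whatever follows, so no end marker is needed. This assumes the array is valid JSON, which holds for the snippet in the question. The function name `extract_table` and the shortened `sample_source` string are illustrative stand-ins for the real page source.

```python
import json
import re


def extract_table(page_source, key="earnings_announcements_earnings_table"):
    """Return the array assigned to `key` in the page's JavaScript, or None."""
    # Locate the quoted key and its colon; the array starts right after.
    match = re.search(r'"%s"\s*:\s*' % re.escape(key), page_source)
    if match is None:
        return None
    # Parse exactly one JSON value beginning at the array's opening bracket.
    rows, _end = json.JSONDecoder().raw_decode(page_source, match.end())
    return rows


# Shortened stand-in for the page source, modelled on the question's snippet.
sample_source = r'''
document.obj_data = {
"earnings_announcements_earnings_table"   :
   [  [ "10/26/2017", "9/2017", "$0.06", "--", "--", "--", "--" ] ,  [ "4/27/2017", "3/2017", "$1.03", "$1.48", "<div class=\"right pos positive pos_icon showinline up\">+0.45</div>", "<div class=\"right pos positive pos_icon showinline up\">+43.69%</div>", "After Close" ]  ]
, "earnings_announcements_webcasts_table" : [ ]
}
'''

rows = extract_table(sample_source)
print(len(rows), "rows; first row:", rows[0])
```

With a live page, `page_source` would be the decoded response body, for example the text returned by `urllib.request.urlopen(...).read().decode(...)`.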

Regarding "html - Unable to get table data - HTML", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/46024536/
