javascript - 尝试使用 Pandas 从 Selenium 的结果中抓取表格

标签 javascript python selenium

我正在尝试使用 Pandas 从 Javascript 网站上抓取表格。为此,我首先使用 Selenium 到达我想要的页面。我能够以文本格式打印表格(如注释脚本所示),但我也希望能够在 Pandas 中拥有表格。我附上我的脚本如下,我希望有人能帮我解决这个问题。

import time
from selenium import webdriver
import pandas as pd

chrome_path = r"Path to chrome driver"
driver = webdriver.Chrome(chrome_path)
url = 'http://www.bursamalaysia.com/market/securities/equities/prices/#/?
filter=BS02'

page = driver.get(url)
time.sleep(2)


driver.find_element_by_xpath('//*[@id="bursa_boards"]/option[2]').click()


driver.find_element_by_xpath('//*[@id="bursa_sectors"]/option[11]').click()
time.sleep(2)

driver.find_element_by_xpath('//*[@id="bm_equity_price_search"]').click()
time.sleep(5)

target = driver.find_elements_by_id('bm_equities_prices_table')
##for data in target:
##    print (data.text)

for data in target:
    dfs = pd.read_html(target,match = '+')
for df in dfs:
    print (df)  

运行上面的脚本,我得到以下错误:

Traceback (most recent call last):
  File "E:\Coding\Python\BS_Bursa Properties\Selenium_Pandas_Bursa Properties.py", line 29, in <module>
    dfs = pd.read_html(target,match = '+')
  File "C:\Users\lnv\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\io\html.py", line 906, in read_html
    keep_default_na=keep_default_na)
  File "C:\Users\lnv\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\io\html.py", line 728, in _parse
    compiled_match = re.compile(match)  # you can pass a compiled regex here
  File "C:\Users\lnv\AppData\Local\Programs\Python\Python36-32\lib\re.py", line 233, in compile
    return _compile(pattern, flags)
  File "C:\Users\lnv\AppData\Local\Programs\Python\Python36-32\lib\re.py", line 301, in _compile
    p = sre_compile.compile(pattern, flags)
  File "C:\Users\lnv\AppData\Local\Programs\Python\Python36-32\lib\sre_compile.py", line 562, in compile
    p = sre_parse.parse(p, flags)
  File "C:\Users\lnv\AppData\Local\Programs\Python\Python36-32\lib\sre_parse.py", line 855, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
  File "C:\Users\lnv\AppData\Local\Programs\Python\Python36-32\lib\sre_parse.py", line 416, in _parse_sub
    not nested and not items))
  File "C:\Users\lnv\AppData\Local\Programs\Python\Python36-32\lib\sre_parse.py", line 616, in _parse
    source.tell() - here + len(this))
sre_constants.error: nothing to repeat at position 0

我也试过在 url 上使用 pd.read_html,但它返回了“找不到表”的错误。网址是:http://www.bursamalaysia.com/market/securities/equities/prices/#/?filter=BS08&board=MAIN-MKT&sector=PROPERTIES&page=1 .

最佳答案

您可以使用以下代码获取表格

import time
from selenium import webdriver
import pandas as pd

chrome_path = r"Path to chrome driver"
driver = webdriver.Chrome(chrome_path)
url = 'http://www.bursamalaysia.com/market/securities/equities/prices/#/?filter=BS02'

page = driver.get(url)
time.sleep(2)

df = pd.read_html(driver.page_source)[0]
print(df.head())

这是输出

No  Code    Name    Rem Last Done   LACP    Chg % Chg   Vol ('00)   Buy Vol ('00)   Buy Sell    Sell Vol ('00)  High    Low
0   1   5284CB  LCTITAN-CB  s   0.025   0.020   0.005   +25.00  406550  19878   0.020   0.025   106630  0.025   0.015
1   2   1201    SUMATEC [S] s   0.050   0.050   -   -   389354  43815   0.050   0.055   187301  0.055   0.050
2   3   5284    LCTITAN [S] s   4.470   4.700   -0.230  -4.89   367335  430 4.470   4.480   34  4.780   4.140
3   4   0176    KRONO [S]   -   0.875   0.805   0.070   +8.70   300473  3770    0.870   0.875   797 0.900   0.775
4   5   5284CE  LCTITAN-CE  s   0.130   0.135   -0.005  -3.70   292379  7214    0.125   0.130   50  0.155   0.100

要从所有页面获取数据,您可以抓取剩余页面并使用 df.append

关于javascript - 尝试使用 Pandas 从 Selenium 的结果中抓取表格,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/45394374/

相关文章:

python - python 中的打印格式

python - 如何处理paramiko中的[Errno -2]?

html - selenium find_elements_by_tag_name ("img") 会捕获 CSS 图像吗?

javascript - 如何覆盖 Angular 的号码过滤器?

javascript - 从 Canvas 中删除元素

python - 合并 Json 响应 Django python

python - 集成到 for 循环 python 中时,actionchains send_keys 发送的重复文本

java - Selenium,确保选中所有复选框

javascript - JavaScript 中的 numpy.random.choice?

javascript - 将json数据绑定(bind)到kendo grid