python - How do I get the URL behind a download button and read the CSV file in Python?

Tags: python selenium csv selenium-webdriver web-scraping

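I'm using Python in Google Colab and trying to read a CSV file from this link: https://www.macrotrends.net/stocks/charts/AAPL/apple/stock-price-history

If you scroll down a little, you'll see a download button. I'd like to get the link with Selenium or BeautifulSoup and read the CSV file. This is what I'm trying: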

# install packages
!pip install selenium
!apt-get update # to update ubuntu to correctly run apt install
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin

# load packages
import pandas as pd
from selenium import webdriver
import sys

# run selenium and read the csv file
sys.path.insert(0,'/usr/lib/chromium-browser/chromedriver')
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome('chromedriver',chrome_options=chrome_options)
driver.get('https://www.macrotrends.net/stocks/charts/AAPL/apple/stock-price-history')  # put the address of your page here
btn = driver.find_element_by_tag_name('button')
btn.click()
df = pd.read_csv('##.csv')

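It seems to work up to the btn.click() step, but after that I get an error, because nothing tells me the download button's link or the name of the file. Could you help? Any help would be greatly appreciated.

Best answer

There's no need for Selenium. The data is embedded in a <script> tag.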

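import json

import pandas as pd
import requests
from bs4 import BeautifulSoup

t = 'AAPL'  # ticker symbol
url = 'https://www.macrotrends.net/assets/php/stock_price_history.php?t={}'.format(t)

response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# The daily price history is assigned to a JavaScript variable named dataDaily.
jsonData = None
scripts = soup.find_all('script', {'type': 'text/javascript'})
for script in scripts:
    if 'var dataDaily' in str(script):
        # Keep only the array literal between the first '[' and the first '];'.
        jsonStr = '[' + str(script).split('[', 1)[-1].split('];')[0] + ']'
        jsonData = json.loads(jsonStr)
        break

if jsonData is None:
    raise ValueError('var dataDaily not found; the page layout may have changed')

df = pd.DataFrame(jsonData)
# Expand the one-letter keys into readable column names.
df = df.rename(columns={'o':'open','h':'high','l':'low','c':'close','d':'date','v':'volume'})
df.to_csv('MacroTrends_Data_Download_{}.csv'.format(t), index=False)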

Output:

print(df)
             date      open      high  ...   volume     ma50    ma200
0      1980-12-12    0.1012    0.1016  ...  469.034      NaN      NaN
1      1980-12-15    0.0964    0.0964  ...  175.885      NaN      NaN
2      1980-12-16    0.0893    0.0893  ...  105.728      NaN      NaN
3      1980-12-17    0.0910    0.0915  ...   86.442      NaN      NaN
4      1980-12-18    0.0937    0.0941  ...   73.450      NaN      NaN
          ...       ...       ...  ...      ...      ...      ...
10135  2021-02-25  124.6800  126.4585  ...  148.200  131.845  112.241
10136  2021-02-26  122.5900  124.8500  ...  164.560  131.838  112.460
10137  2021-03-01  123.7500  127.9300  ...  116.308  131.840  112.716
10138  2021-03-02  128.4100  128.7200  ...  102.261  131.790  112.957
10139  2021-03-03  124.8100  125.7100  ...  111.514  131.661  113.184

[10140 rows x 8 columns]
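
The string splitting in the answer depends on where the first '[' and '];' happen to fall in the script. As a possible alternative, here is a minimal sketch that finds the `var dataDaily = ` assignment with a regular expression and lets `json.JSONDecoder.raw_decode` parse exactly one JSON value from that position. The helper name `extract_js_array` and the shortened `sample_html` string are hypothetical stand-ins for illustration, not part of the original answer or the real page, and the sketch assumes the array literal is valid JSON.

```python
import json
import re


def extract_js_array(page_text, var_name="dataDaily"):
    """Return the JSON array assigned to `var <var_name>` in page_text.

    Hypothetical helper: assumes the JavaScript array literal is valid JSON.
    """
    match = re.search(r"var\s+" + re.escape(var_name) + r"\s*=\s*", page_text)
    if match is None:
        raise ValueError("'var {}' not found in page text".format(var_name))
    # raw_decode parses a single JSON value starting at the given index
    # and ignores whatever follows it (here, the trailing ';' and other code).
    data, _end = json.JSONDecoder().raw_decode(page_text, match.end())
    return data


# Made-up, shortened stand-in for the real page source (values are illustrative).
sample_html = """
<script type="text/javascript">
var dataDaily = [{"d": "1980-12-12", "o": 0.1012, "h": 0.1016, "v": 469.034},
                 {"d": "1980-12-15", "o": 0.0964, "h": 0.0964, "v": 175.885}];
var other = [1, 2];
</script>
"""

rows = extract_js_array(sample_html)
print(len(rows), rows[0]["d"])  # prints: 2 1980-12-12
```

With the real page, `response.text` from the answer's request could be passed in place of `sample_html`, and the returned list handed to `pd.DataFrame` as before.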

This question, "python - How do I get the URL behind a download button and read the CSV file in Python?", comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/66472332/
