python - Retrieving all elements from Google Trends data using Selenium in Python

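I am trying to write a Python program to collect data from Google Trends (GT). Specifically, I want to open the URL automatically and read the specific values shown as titles. I have written the code and can scrape data successfully, but when I compare what the code returns with what the page shows, only part of it comes back. For example, in the image below the code returns the first title, "Manchester United F.C. • Tottenham Hotspur F.C.", while the actual site shows 4 results: "Manchester United F.C. • Tottenham Hotspur F.C., International Champions Cup, Manchester".

[Image: Google Trends page]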

[Image: screenshot of the code's output]

So far we have tried every locator we could find on the page, but we still cannot resolve this. We do not want to use Scrapy or Beautiful Soup for this.

    import pandas as pd
    import requests
    import re
    from bs4 import BeautifulSoup
    import time
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    links=["https://trends.google.com/trends/trendingsearches/realtime?geo=DE&category=s"] 

    for link in links:
        Title_temp=[]
        Title=''
        seleniumDriver = r"C:/Users/Downloads/chromedriver_win32/chromedriver.exe" 
        chrome_options = Options()
        brow = webdriver.Chrome(executable_path=seleniumDriver, chrome_options=chrome_options)
        try:
            brow.get(link) ## getting the url
            try:
                content = brow.find_elements_by_class_name("details-top")
                for element in content:
                    Title_temp.append(element.text)    
                Title=' '.join(Title_temp)
            except:
                Title=''       
            brow.quit()

        except Exception as error:
            print(error)
            break

    Final_df = pd.DataFrame(
        {'Title': Title_temp
        })

Best answer

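As far as I can tell, the data is retrieved from an API endpoint that you can call directly. I show how to call it and then extract only the titles (note that the API call returns more information than just the titles). You can explore the breadth of what is returned (article snippets, URLs, image links, and so on) in the JSON response.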

import requests
import json

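# Call the API endpoint directly
r = requests.get('https://trends.google.com/trends/api/realtimetrends?hl=en-GB&tz=-60&cat=s&fi=0&fs=0&geo=DE&ri=300&rs=20&sort=0')
# Drop the first 5 characters, which precede the JSON body, before parsing
data = json.loads(r.text[5:])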
titles = [story['title'] for story in data['storySummaries']['trendingStories']]
print(titles)
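
The parsing step above can also be sketched as a small helper that does not depend on the prefix being exactly 5 characters long. This is only a sketch under assumptions: `parse_trending_titles` is a hypothetical name, the sample payload is made up to mirror the `storySummaries` / `trendingStories` keys used above, and the real response may change shape at any time.

```python
import json


def parse_trending_titles(raw_text):
    """Return story titles from a realtimetrends-style response body.

    Hypothetical helper: assumes the JSON object has the same
    'storySummaries' -> 'trendingStories' -> 'title' layout used above.
    """
    # Skip whatever precedes the first '{' instead of hard-coding 5 characters.
    start = raw_text.find("{")
    if start == -1:
        raise ValueError("no JSON object found in response text")
    data = json.loads(raw_text[start:])
    stories = data.get("storySummaries", {}).get("trendingStories", [])
    # Ignore any story entry that has no title.
    return [story["title"] for story in stories if "title" in story]


# Made-up sample shaped like the response above; not real Trends data.
sample = ")]}'\n" + json.dumps({
    "storySummaries": {
        "trendingStories": [
            {"title": "Story A"},
            {"title": "Story B"},
        ]
    }
})

titles = parse_trending_titles(sample)
print(titles)
```

With a live response you would pass `r.text` instead of `sample`. The resulting list can go straight into `pd.DataFrame({'Title': titles})` if you want the same table the question builds.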

This question on retrieving all elements from Google Trends data using Selenium in Python is based on a similar question on Stack Overflow: https://stackoverflow.com/questions/57205181/
