javascript - Python/BeautifulSoup with JavaScript source

Tags: javascript python beautifulsoup

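First of all, I am new to Python and BeautifulSoup, so please forgive me if I use the wrong terminology.

I have a problem: when I inspect the element I can find it, but when I go to "View Source" it is not there. The data appears to be pulled in by JavaScript, so it is probably dynamic.

My question is: how do I include the data (source/elements/tags) that is loaded by JavaScript?

Below is the code I have so far. I am unable to get the URL for each "search".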

import urllib
import urllib.request
from bs4 import BeautifulSoup
import csv

rootURL="http://www.homestead.ca"

def HomeStead2(URL):
    thePage = urllib.request.urlopen(URL)
    soup = BeautifulSoup(thePage, "html.parser")
    return soup

soup = HomeStead2(rootURL)

for dropdownlist in soup.find("ul", {"class":"nav navbar-nav primary"}).find('ul').findAll('a'):
    # NOTHING IS WORKING FROM HERE ONWARDS WHEN I TRY TO GET THE HREF
    citySoup = HomeStead2(rootURL + dropdownlist.get('href'))
    for btnPreview in citySoup.find("div", {"class":"search extended-search"}).findAll('li'):
        try:
            for ApartmentLink in btnPreview.findAll("div", {"class":"property-container"}):
                print(ApartmentLink)
        except:
            print('skip')


Best answer

You can do all of this without Selenium. Once you visit each apartment URL, the data is retrieved by an AJAX call to an API, so all we need is the city ID:

import requests
from bs4 import BeautifulSoup

root = "http://www.homestead.ca"

data = {'keyword': 'false', 'max_bed': '100', 'geocode': '',
        'min_rate': '0', 'offset': '0', 'max_rate': '4000',
        'show_custom_fields': 'true', 'limit': '50',
        'pet_friendly': '', 'city_id': '', 'amenities': '',
        'client_id': '6', 'max_bath': '10',
        'auth_token': 'sswpREkUtyeYjeoahA2i',
        'count': 'false', 'min_bath': '0',
        'order': 'max_rate ASC, min_rate ASC, min_bed ASC, max_bath ASC',
        'city_ids': '', 'region': '',
        'property_types': 'low-rise-apartment,mid-rise-apartment,high-rise-apartment,luxury-apartment,townhouse,house,multi-unit-house,single-family-home,duplex,tripex,semi',
        'min_bed': '-1',
        'show_promotions': 'true'}

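# Search endpoint that the AJAX call goes to.
get = "http://api.theliftsystem.com/v2/search"
with requests.Session() as s:
    r = s.get(root)
    soup = BeautifulSoup(r.content, "lxml")
    # Each city entry in the dropdown menu carries its ID in a data-city-id attribute.
    lis = soup.select("ul.child-pages.dropdown-menu li")
    for li in lis:
        city_id = li["data-city-id"]
        data["city_id"] = city_id
        # Query the API for this city and print the decoded JSON.
        p = s.get(get, params=data)
        print(p.json())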

You can modify data to match whatever query you want.
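As a minimal sketch of that idea, the helper below copies a base parameter dict and overrides selected keys instead of mutating the original. The names `BASE_PARAMS` and `build_params` are hypothetical, the dict is a trimmed subset of the one above, and the city ID 332 is simply taken from the sample output. How the API interprets each field is assumed from the parameter names, not from any documentation.

```python
# Trimmed, illustrative subset of the query parameters used above.
BASE_PARAMS = {
    "client_id": "6",
    "city_id": "",
    "min_rate": "0",
    "max_rate": "4000",
    "min_bed": "-1",
    "max_bed": "100",
    "limit": "50",
    "offset": "0",
}


def build_params(base, **overrides):
    """Return a copy of `base` with `overrides` applied as strings.

    Unknown keys raise KeyError so that a typo in a parameter name
    is caught early rather than silently sent along with the request.
    """
    unknown = set(overrides) - set(base)
    if unknown:
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    params = dict(base)  # shallow copy; the base dict stays untouched
    params.update({key: str(value) for key, value in overrides.items()})
    return params


# Hypothetical query: one city, rent capped at 1200, at least one bedroom.
query = build_params(BASE_PARAMS, city_id=332, max_rate=1200, min_bed=1)
print(query["city_id"], query["max_rate"], query["min_bed"])
```

The resulting dict can be passed as `params=query` to `s.get(...)` in the same way as `data` above.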

The output is JSON, for example:

[{'building_header': '', 'office_hours': '', 'name': 'North Park Tower', 'matched_suite_names': ['Bachelor', 'One Bedroom', 'Two Bedroom'], 'matched_beds': ['0', '1', '2'], 'id': 309, 'statistics': {'suites': {'rates': {'average': 950.0, 'max': 1275.0, 'min': 625.0}, 'square_feet': {'average': 0.0, 'max': '0.0', 'min': '0.0'}, 'bedrooms': {'average': '1.0', 'max': 2, 'min': 0}, 'bathrooms': {'average': 1.0, 'max': 1.0, 'min': 1.0}}}, 'geocode': {'longitude': '-80.2605725', 'latitude': '43.1703624', 'distance': None}, 'photo': '1443018148_2.jpg', 'min_availability_date': '', 'address': {'intersection': '', 'country_code': 'CAN', 'province_code': 'ON', 'address': '325 North Park Street', 'postal_code': 'N3R 2X4', 'province': 'Ontario', 'country': 'Canada', 'neighbourhood': '', 'city_id': 332, 'city': 'Brantford'}, 'permalink': 'http://www.homestead.ca/apartments/325-north-park-street-brantford', 'pet_friendly': True, 'thumbnail_path': 'http://s3.amazonaws.com/lws_lift/homestead/images/gallery/256/1443018148_2.jpg', 'details': {'location': '', 'suite': '', 'features': '', 'overview': "Located on North Park Street and Memorial Avenue,this quiet building is within walking distance of the following: - Zehrs Plaza, North Park Plaza, Shoppers Drug Mart, Zehrs Grocery Store, Zellers, Pet Store, Party Supply Store, furniture store, variety store, Black's Photography, paint shop and veterinary clinic\xa0  - Restaurants and coffee shops\xa0  - Wayne Gretzky Recreational Arena\xa0  - Medical Clinic,Shoppers Home Health Care Clinic and Pharmacy\xa0  - Catholic Elementary School\xa0  - On bus route "}, 'availability_status_label': 'Available Now', 'availability_status': 1, 'contact': {'email': 'rentals@homestead.ca', 'fax': '(519) 752-6855', 'alt_phone': '', 'name': '', 'phone': '519-752-3596', 'alt_extension': '', 'extension': ''}, 'parking': {'indoor': '', 'additional': '', 'outdoor': ''}, 'property_type': 'High-rise-apartment', 'website': {'url': '', 'title': '', 
'description': ''}, 'availability_count': 6, 'client': {'email': 'bcadieux@homestead.ca', 'phone': '613-546-3146', 'id': 6, 'website': 'www.homestead.ca', 'name': 'Homestead Land Holdings'}, 'promotion': {'featured': 0}, 'photo_path': 'http://s3.amazonaws.com/lws_lift/homestead/images/gallery/full/1443018148_2.jpg'}, {'building_header': '', 'office_hours': '', 'name': 'Westgate Apartments', 'matched_suite_names': ['Bachelor', 'One Bedroom', 'Two Bedroom'], 'matched_beds': ['0', '1', '2'], 'id': 310, 'statistics': {'suites': {'rates': {'average': 975.0, 'max': 1300.0, 'min': 650.0}, 'square_feet': {'average': 0.0, 'max': '0.0', 'min': '0.0'}, 'bedrooms': {'average': '1.0', 'max': 2, 'min': 0}, 'bathrooms': {'average': 1.0, 'max': 1.0, 'min': 1.0}}}, 'geocode': {'longitude': '-80.2482991', 'latitude': '43.1733242', 'distance': None}, 'photo': '1443017488_1.jpg', 'min_availability_date': '', 'address': {'intersection': '', 'country_code': 'CAN', 'province_code': 'ON', 'address': '661 West Street', 'postal_code': 'N3R 6W9', 'province': 'Ontario', 'country': 'Canada', 'neighbourhood': '', 'city_id': 332, 'city': 'Brantford'}, 'permalink': 'http://www.homestead.ca/apartments/661-west-street-brantford', 'pet_friendly': True, 'thumbnail_path': 'http://s3.amazonaws.com/lws_lift/homestead/images/gallery/256/1443017488_1.jpg', 'details': {'location': '', 'suite': '', 'features': '', 'overview': 'Located in the North end of Brantford, Westgate Tower is in an area that resembles a city within a city. 
There are a variety of banks, grocery stores, drug stores, malls, a wide selection of fast food, fine dining restaurants and an after hours medical centre, within waking distance.'}, 'availability_status_label': 'Available Now', 'availability_status': 1, 'contact': {'email': 'rentals@homestead.ca', 'fax': '(519) 751-0379', 'alt_phone': '', 'name': '', 'phone': '519-751-3867', 'alt_extension': '', 'extension': ''}, 'parking': {'indoor': '', 'additional': '', 'outdoor': ''}, 'property_type': 'High-rise-apartment', 'website': {'url': '', 'title': '', 'description': ''}, 'availability_count': 6, 'client': {'email': 'bcadieux@homestead.ca', 'phone': '613-546-3146', 'id': 6, 'website': 'www.homestead.ca', 'name': 'Homestead Land Holdings'}, 'promotion': {'featured': 0}, 'photo_path': 'http://s3.amazonaws.com/lws_lift/homestead/images/gallery/full/1443017488_1.jpg'}, {'building_header': '', 'office_hours': '', 'name': 'Dornia Manor', 'matched_suite_names': ['One Bedroom', 'Two Bedroom', 'Three Bedroom'], 'matched_beds': ['1', '2', '3'], 'id': 308, 'statistics': {'suites': {'rates': {'average': 1124.5, 'max': 1350.0, 'min': 899.0}, 'square_feet': {'average': 0.0, 'max': '0.0', 'min': '0.0'}, 'bedrooms': {'average': '2.25', 'max': 3, 'min': 1}, 'bathrooms': {'average': 1.375, 'max': 2.0, 'min': 1.0}}}, 'geocode': {'longitude': '-80.2584034', 'latitude': '43.1706331', 'distance': None}, 'photo': '1443017947_1.jpg', 'min_availability_date': '', 'address': {'intersection': '', 'country_code': 'CAN', 'province_code': 'ON', 'address': '321 Fairview Drive', 'postal_code': 'N3R 2X6', 'province': 'Ontario', 'country': 'Canada', 'neighbourhood': '', 'city_id': 332, 'city': 'Brantford'}, 'permalink': 'http://www.homestead.ca/apartments/321-fairview-drive-brantford', 'pet_friendly': True, 'thumbnail_path': 'http://s3.amazonaws.com/lws_lift/homestead/images/gallery/256/1443017947_1.jpg', 'details': {'location': '', 'suite': '', 'features': '', 'overview': 'Dornia Manor is a quiet, 
ninety-two unit apartment building located in the North end of Brantford. We offer one, two and three bedroom units and one penthouse suite. The building is located in close proximity to many major services such as banking, shopping, health services, recreational facilities, beauty shops, dry cleaners, schools and churches. There is a bus stop at the front door and highway 403 is within minutes.'}, 'availability_status_label': 'Available Now', 'availability_status': 1, 'contact': {'email': 'rentals@homestead.ca', 'fax': '(519) 752-6855', 'alt_phone': '', 'name': '', 'phone': '519-752-3596', 'alt_extension': '', 'extension': ''}, 'parking': {'indoor': '', 'additional': '', 'outdoor': ''}, 'property_type': 'High-rise-apartment', 'website': {'url': '', 'title': '', 'description': ''}, 'availability_count': 8, 'client': {'email': 'bcadieux@homestead.ca', 'phone': '613-546-3146', 'id': 6, 'website': 'www.homestead.ca', 'name': 'Homestead Land Holdings'}, 'promotion': {'featured': 0}, 'photo_path': 'http://s3.amazonaws.com/lws_lift/homestead/images/gallery/full/1443017947_1.jpg'}]
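Since the result is a list of nested dicts, pulling out just the fields you care about is plain dictionary access. The sketch below is one possible way to do it; `summarize` is a hypothetical helper, and `sample` is a hand-trimmed copy of the first record shown above rather than a live response. It assumes every record follows the same layout as the sample output and falls back to `None` where a key is missing.

```python
# Hand-trimmed copy of the first record from the sample output above.
sample = [
    {
        "name": "North Park Tower",
        "permalink": "http://www.homestead.ca/apartments/325-north-park-street-brantford",
        "address": {"address": "325 North Park Street", "city": "Brantford"},
        "statistics": {"suites": {"rates": {"min": 625.0, "max": 1275.0}}},
        "availability_count": 6,
    },
]


def summarize(buildings):
    """Flatten each building record into a small dict of selected fields."""
    rows = []
    for building in buildings:
        address = building.get("address", {})
        rates = building.get("statistics", {}).get("suites", {}).get("rates", {})
        rows.append({
            "name": building.get("name"),
            "url": building.get("permalink"),
            "street": address.get("address"),
            "city": address.get("city"),
            "min_rate": rates.get("min"),
            "max_rate": rates.get("max"),
            "available": building.get("availability_count"),
        })
    return rows


rows = summarize(sample)
for row in rows:
    print(row["name"], row["city"], row["min_rate"], row["url"])
```

In the loop from the answer, the same function could be called as `summarize(p.json())`, assuming the response is a list like the one shown.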

Regarding "javascript - Python/BeautifulSoup with JavaScript source", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/38334715/
