python - How to get house data using Selenium

Tags: python selenium web web-scraping screen-scraping

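I am trying to get data from the page http://www.har.com/4311-Childress-St/sale_40763013. It shows the house address, price, and other information. I tried to get all of the data, but I only managed to retrieve the address, city, and zip code. My code is below. How can I get the other information, such as the county, the number of stories, and so on?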

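from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


def getHarData(driver):
    driver.get("http://www.har.com/4311-Childress-St/sale_40763013")
    try:
        address = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "heading_22")))
        cityzip = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "sub_heading")))
        # Disabled: By.CLASS_NAME accepts a single class name, so the compound value "heading_22 pb15" does not work
        # price = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "heading_22 pb15")))
        print(address.text + ", " + cityzip.text)
    except TimeoutException:
        print("data not found")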

Best answer

If you only need a few specific fields, I would write a small reusable function that gets a field's value by its name/label:

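from selenium.webdriver.common.by import By


def get_field_value(driver, field):
    # Labels on the page look like "County:", so normalize the requested name to that form
    field = field.capitalize() + ":"
    # find_element(By.XPATH, ...) replaces find_element_by_xpath, which was removed in Selenium 4
    return driver.find_element(By.XPATH, "//div[@class = 'dc_label' and . = '%s']/following-sibling::div[@class = 'dc_value']" % field).text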

Usage:

county = get_field_value(driver, "county")
print(county)  # prints "Harris County"

Complete working example:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


def get_field_value(driver, field):
    field = field.capitalize() + ":"
    return driver.find_element(By.XPATH, "//div[@class = 'dc_label' and . = '%s']/following-sibling::div[@class = 'dc_value']" % field).text

driver = webdriver.Firefox()
driver.get("http://www.har.com/4311-Childress-St/sale_40763013")

# wait for the page to load
wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_element_located((By.CLASS_NAME, "dc_title")))

county = get_field_value(driver, "county")
print(county)
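
The example above reads one field and raises if the label is missing. If several fields are needed, the same XPath idea could be wrapped as in the sketch below. The names `field_xpath` and `get_fields` are hypothetical helpers, not part of Selenium, and the label "Stories" is only an assumption taken from the question. The locator strategy is passed as the plain string "xpath", which is the value of `By.XPATH`, so the helpers need no Selenium imports of their own.

```python
def field_xpath(label):
    """Build the XPath for the dc_value cell that follows a given dc_label cell.

    `label` is the text shown on the page without the trailing colon,
    e.g. "County" (hypothetical helper, not part of Selenium).
    """
    return (
        "//div[@class = 'dc_label' and . = '%s:']"
        "/following-sibling::div[@class = 'dc_value']" % label
    )


def get_fields(driver, labels):
    """Return a dict mapping each label to its text, or None if not found."""
    result = {}
    for label in labels:
        # find_elements returns an empty list when nothing matches,
        # so a missing field does not raise NoSuchElementException.
        matches = driver.find_elements("xpath", field_xpath(label))
        result[label] = matches[0].text.strip() if matches else None
    return result
```

With the `driver` from the complete example, once the wait has finished, a call such as `get_fields(driver, ["County", "Stories"])` would return one entry per label, with `None` for any label the page does not show.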

Source: a similar question on Stack Overflow about python - How to get house data using Selenium: https://stackoverflow.com/questions/35423723/
