python - How to scrape specific words from a table row?

Tags: python selenium xpath beautifulsoup css-selectors

I want to scrape only the codes from the table below using Python.

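From the table, I only want to scrape CPT, CTC, PTC, STC, SPT, HTC, P5TC, P1A, P2A, P3A, P1E, P2E, P3E. These codes may change from time to time, for example P4E might be added or P1E removed.

The HTML for the table above is: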

<table class="list">
   <tbody>
      <tr>
         <td>
            <p>PRODUCT<br>DESCRIPTION</p>
         </td>
         <td>
            <p><strong>Time Charter:</strong> CPT, CTC, PTC, STC, SPT, HTC, P5TC<br><strong>Time Charter Trip:</strong> P1A, P2A, P3A,<br>P1E, P2E, P3E</p>
         </td>
         <td><strong>Voyage: </strong>C3E, C4E, C5E, C7E</td>
      </tr>
      <tr>
         <td>
            <p>CONTRACT SIZE</p>
            <p></p>
         </td>
         <td>
            <p>1 day</p>
         </td>
         <td>
            <p>1,000 metric tons</p>
         </td>
      </tr>
      <tr>
         <td>
            <p>MINIMUM TICK</p>
            <p></p>
         </td>
         <td>
            <p>US$ 25</p>
         </td>
         <td>
            <p>US$ 0.01</p>
         </td>
      </tr>
      <tr>
         <td>
            <p>FINAL SETTLEMENT PRICE</p>
            <p></p>
         </td>
         <td colspan="2" rowspan="1">
            <p>The floating price will be the end-of-day price as supplied by the Baltic Exchange.</p>
            <p><br><strong>All products:</strong> Final settlement price will be the mean of the daily Baltic Exchange spot price assessments for every trading day in the expiry month.</p>
            <p><br><strong>Exception for P1A, P2A, P3A:</strong> Final settlement price will be the mean of the last 7 Baltic Exchange spot price assessments in the expiry month.</p>
         </td>
      </tr>
      <tr>
         <td>
            <p>CONTRACT SERIES</p>
         </td>
         <td colspan="2" rowspan="1">
            <p><strong><strong>CTC, CPT, PTC, STC, SPT, HTC, P5TC</strong>:</strong> Months, quarters and calendar years out to a maximum of 72 months</p>
            <p><strong>C3E, C4E, C5E, C7E, P1A, P2A, P3A, P1E, P2E, P3E:</strong> Months, quarters and calendar years out to a maximum of 36 months</p>
         </td>
      </tr>
      <tr>
         <td>
            <p>SETTLEMENT</p>
         </td>
         <td colspan="2" rowspan="1">
            <p>At 13:00 hours (UK time) on the last business day of each month within the contract series</p>
         </td>
      </tr>
   </tbody>
</table>

You can view the codes on the following page:

https://www.eex.com/en/products/global-commodities/freight

Best answer

If your use case is to scrape all of the text in that cell, you have to induce WebDriverWait for the desired visibility_of_element_located(), and you can use either of the following locator strategies:

  • Using CSS_SELECTOR:

    driver.get('https://www.eex.com/en/products/global-commodities/freight')
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "article div:last-child table>tbody>tr td:nth-child(2)>p"))).text)
    
  • Using XPATH:

    driver.get('https://www.eex.com/en/products/global-commodities/freight')
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h3[text()='Contract Specifications']//following::table[1]/tbody/tr//following::td[1]/p"))).text)
    
  • Console output:

    Time Charter: CPT, CTC, PTC, STC, SPT, HTC, P5TC
    Time Charter Trip: P1A, P2A, P3A,
    P1E, P2E, P3E
    
  • Note: you have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
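
Because the set of codes may change over time, it can help to parse the retrieved text by label rather than by position. The following is a minimal sketch, not part of the original answer: `extract_codes` is a hypothetical helper, the sample string is copied from the console output above (in a live run it would be the `.text` of the located element), and it assumes each code consists only of uppercase letters and digits.

```python
import re

# Sample text copied from the console output above; in a live run this
# would be the .text of the located <p> element.
cell_text = (
    "Time Charter: CPT, CTC, PTC, STC, SPT, HTC, P5TC\n"
    "Time Charter Trip: P1A, P2A, P3A,\n"
    "P1E, P2E, P3E"
)

def extract_codes(text):
    """Return {label: [codes]} for 'Label: code, code, ...' text (hypothetical helper)."""
    groups = {}
    # Split only before a line that starts a new "Label:", so a wrapped
    # line of codes stays with the label that precedes it.
    for chunk in re.split(r"\n(?=[^\n:]+:)", text):
        label, _, body = chunk.partition(":")
        # Assumption: a code is a run of uppercase letters and digits.
        groups[label.strip()] = re.findall(r"[A-Z0-9]+", body)
    return groups

codes = extract_codes(cell_text)
print(codes["Time Charter"])
print(codes["Time Charter Trip"])
```

With this approach a newly added code such as P4E would be picked up without changing the parsing logic, as long as the "Label: codes" layout of the cell stays the same.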
    

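Update 1

If you want to extract the three groups (CPT, CTC, PTC, STC, SPT, HTC, P5TC; then P1A, P2A, P3A; then P1E, P2E, P3E) separately, you can use the following solutions:

  • To print CPT, CTC, PTC, STC, SPT, HTC, P5TC: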

    #element = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "article div:last-child table>tbody>tr td:nth-child(2)>p")))
    element = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//h3[text()='Contract Specifications']//following::table[1]/tbody/tr//following::td[1]/p")))
    print(driver.execute_script('return arguments[0].childNodes[1].textContent;', element).strip())
    
  • To print P1A, P2A, P3A:

    #element = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "article div:last-child table>tbody>tr td:nth-child(2)>p")))
    element = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//h3[text()='Contract Specifications']//following::table[1]/tbody/tr//following::td[1]/p")))
    print(driver.execute_script('return arguments[0].childNodes[4].textContent;', element).strip())
    
  • To print P1E, P2E, P3E:

    #element = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "article div:last-child table>tbody>tr td:nth-child(2)>p")))
    element = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//h3[text()='Contract Specifications']//following::table[1]/tbody/tr//following::td[1]/p")))
    print(driver.execute_script('return arguments[0].lastChild.textContent;', element).strip())
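
The indices `childNodes[1]`, `childNodes[4]` and `lastChild` depend on the exact child-node layout of the `<p>` element. As an offline illustration (a sketch, not part of the original answer), the snippet below lists the direct children of the `<p>` from the question's HTML using only the standard library. `ChildNodeLister` is a hypothetical helper, and the result only mirrors the browser DOM if the live page's markup matches the snippet in the question.

```python
from html.parser import HTMLParser

# The <p> from the question's HTML, copied verbatim.
P_HTML = (
    "<p><strong>Time Charter:</strong> CPT, CTC, PTC, STC, SPT, HTC, P5TC<br>"
    "<strong>Time Charter Trip:</strong> P1A, P2A, P3A,<br>P1E, P2E, P3E</p>"
)

class ChildNodeLister(HTMLParser):
    """Collects the direct children of the <p> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.inside_p = False
        self.depth = 0      # nesting depth relative to the <p>
        self.children = []  # ("element", tag) or ("text", content)

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.inside_p = True
        elif self.inside_p:
            if self.depth == 0:
                self.children.append(("element", tag))
            if tag != "br":  # <br> is a void element with no end tag
                self.depth += 1

    def handle_endtag(self, tag):
        if tag == "p":
            self.inside_p = False
        elif self.inside_p and tag != "br":
            self.depth -= 1

    def handle_data(self, data):
        # Only text sitting directly under <p> counts as a child text node.
        if self.inside_p and self.depth == 0:
            self.children.append(("text", data))

lister = ChildNodeLister()
lister.feed(P_HTML)
lister.close()
for index, (kind, value) in enumerate(lister.children):
    print(index, kind, repr(value))
```

The text nodes land at indices 1, 4 and 6 (the last child), which is why the answer reads those positions. If the page ever adds another `<strong>` label or `<br>`, these indices shift and the hard-coded positions would need updating.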
    

Update 2

To print all the items together:

  • Code block:

    driver.get('https://www.eex.com/en/products/global-commodities/freight')
    element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h3[text()='Contract Specifications']//following::table[1]/tbody/tr//following::td[1]/p")))
    first = driver.execute_script('return arguments[0].childNodes[1].textContent;', element).strip()
    second = driver.execute_script('return arguments[0].childNodes[4].textContent;', element).strip()
    third = driver.execute_script('return arguments[0].lastChild.textContent;', element).strip()
    # Avoid shadowing the built-in name `list`.
    for item in (first, second, third):
        print(item)
    
  • Console output:

    CPT, CTC, PTC, STC, SPT, HTC, P5TC
    P1A, P2A, P3A,
    P1E, P2E, P3E
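
If individual codes are needed rather than three comma-separated strings, the values can be flattened into one list. This is a small sketch under stated assumptions: `split_codes` is a hypothetical helper, and the sample strings are copied from the console output above in place of the values returned by `execute_script`.

```python
# Sample values copied from the console output above; in a live run they
# would be the first, second and third strings returned by execute_script.
first = "CPT, CTC, PTC, STC, SPT, HTC, P5TC"
second = "P1A, P2A, P3A,"
third = "P1E, P2E, P3E"

def split_codes(*parts):
    """Flatten comma-separated strings into a list of codes (hypothetical helper)."""
    codes = []
    for part in parts:
        for token in part.split(","):
            token = token.strip()
            if token:  # skip the empty piece left by a trailing comma
                codes.append(token)
    return codes

all_codes = split_codes(first, second, third)
print(all_codes)
```

The empty-token check matters here because the second string ends with a trailing comma.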
    

Regarding "python - How to scrape specific words from a table row?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/62338176/
