python - Extract links and text if certain strings are found - BeautifulSoup

Tags: python web-scraping beautifulsoup

I am trying to use BeautifulSoup to extract links and text from a website (I have permission to do so).

I run the following code to get the links and text:

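import requests
from bs4 import BeautifulSoup

url = "http://implementconsultinggroup.com/career/#/6257"
r = requests.get(url)

# Name the parser explicitly so results do not depend on what is installed
soup = BeautifulSoup(r.content, "html.parser")

links = soup.find_all("a")

for link in links:
    # Default to "" so anchors without an href do not raise a TypeError
    if "career" in link.get("href", ""):
        print("<a href='%s'>%s</a>" % (link.get("href"), link.text))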

This gives me the following output:

View Position

</a>
<a href='/career/business-analyst-within-human-capital-management/'>
Business analyst within human capital management
COPENHAGEN • We are looking for an ambitious student with an interest in HR 
who is passionate about working in the cross-field of people management, 
business and technology




View Position

</a>
<a href='/career/management-consultants-within-strategic-workforce-planning/'>
Management consultants within strategic workforce planning
COPENHAGEN • We are looking for consultants with profound experience from 
other consultancies




View Position

</a>
<a href='/career/management-consultants-within-supply-chain-strategy-
production-and-process-management/'>
Management consultants within supply chain strategy, production and process 
management
MALMÖ • We are looking for talented graduates who want a career in management 
consulting

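This is almost right, but I only want the positions whose text contains the name COPENHAGEN to be returned (i.e. the MALMÖ position above should not be returned).

The site's HTML looks like this: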

<div class="small-12 medium-9 columns top-lined">
    <a href="/career/management-consultants-within-supply-chain-management/" class="box-link">
        <h2 class="article__title--tiny" data-searchable-text="">Management consultants within supply chain management</h2>
        <p class="article__longDescription" data-searchable-text="">COPENHAGEN • We are looking for bright graduates with a passion for supply chain management and supply chain planning for our planning and execution excellence team.</p>
        <div class="styled-link styled-icon">
            <span class="icon icon-icon">
                <i class="fa fa-chevron-right"></i>
            </span>
            <span class="icon-text">View Position</span>
        </div>
    </a>
</div>

Best answer

It looks like you can add another condition:

(...)
for link in links:
    # Fall back to "" so anchors without an href are skipped rather than raising
    href = link.get("href", "")
    if "career" in href and "COPENHAGEN" in link.text:
        print("<a href='%s'>%s</a>" % (href, link.text))
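A self-contained sketch of the same idea, for anyone who wants to try it without hitting the site. It parses an inline sample modeled on the HTML in the question and checks the city against the `article__longDescription` paragraph instead of the whole link text. The `find_positions` helper, its parameters, and the second (MALMÖ) entry in the sample are assumptions made for illustration, not part of the original page or answer.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the snippet in the question. The MALMÖ entry and
# the bare anchor are illustrative additions so the filter has something to skip.
HTML = """
<div class="small-12 medium-9 columns top-lined">
  <a href="/career/management-consultants-within-supply-chain-management/" class="box-link">
    <h2 class="article__title--tiny">Management consultants within supply chain management</h2>
    <p class="article__longDescription">COPENHAGEN • We are looking for bright graduates.</p>
    <span class="icon-text">View Position</span>
  </a>
</div>
<div class="small-12 medium-9 columns top-lined">
  <a href="/career/hypothetical-malmo-position/" class="box-link">
    <h2 class="article__title--tiny">Hypothetical position</h2>
    <p class="article__longDescription">MALMÖ • We are looking for talented graduates.</p>
    <span class="icon-text">View Position</span>
  </a>
</div>
<a>Anchor without an href</a>
"""


def find_positions(html, city="COPENHAGEN", path_part="career"):
    """Return (href, title) pairs for links whose description starts with city."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # href=True skips anchors that have no href attribute at all.
    for link in soup.find_all("a", href=True):
        if path_part not in link["href"]:
            continue
        description = link.find("p", class_="article__longDescription")
        title = link.find("h2")
        if description is None or title is None:
            continue
        if description.get_text(strip=True).startswith(city):
            results.append((link["href"], title.get_text(strip=True)))
    return results


positions = find_positions(HTML)
for href, title in positions:
    print(f"<a href='{href}'>{title}</a>")
```

Matching on the description paragraph rather than `link.text` avoids a false positive if the city name happens to appear in a job title, and it returns the title on its own without the trailing "View Position" text.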

Regarding python - Extract links and text if certain strings are found - BeautifulSoup, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45900619/
