python - Get a substring from each element of a list

Tags: python html python-3.x web-scraping beautifulsoup

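I am trying to build a list of filter facets. I have loaded all the <span> elements into a list using bs4, and now need to grab a specific substring from the larger string of each <span>. I want to load the name of each filter facet into a list, ending up with something like: [size, width, colour, etc].

The list generated with bs4: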

[<span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Size" data-v-05f803b1="">Size</span>, <span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Width" data-v-05f803b1="">Width</span>, <span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Colour" data-v-05f803b1="">Colour</span>, <span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Heel Height" data-v-05f803b1="">Heel Height</span>, <span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Product Type" data-v-05f803b1="">Product Type</span>, <span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Function" data-v-05f803b1="">Function</span>, <span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Age" data-v-05f803b1="">Age</span>, <span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Technology" data-v-05f803b1="">Technology</span>, <span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Material" data-v-05f803b1="">Material</span>, <span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Price" data-v-05f803b1="">Price</span>]

What I have tried, which does not seem to help:

facetcode = [str(i) for i in spans]

facets = []

for i in facetcode:
    facetcode1 = i.split(' ')
    for y in facetcode1:
        if 'data-facet-name' == True:
            print(y)

When I print(y) it gives me a blank list, but I was expecting: data-facet-name="Size"

The result I want:

[size, width, colour, etc]

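Am I overcomplicating this? The idea is to loop over each list element and load only the text I want into a new list.

Best answer

You want to extract the data-facet-name attribute from the spans that have that attribute. The code below collects the values in a set; if you really want a list, you can convert the set to a list afterwards.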

from bs4 import BeautifulSoup as bs

html = '''
<html>
 <head></head>
 <body>
  <span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Size" data-v-05f803b1="">Size</span>, 
  <span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Width" data-v-05f803b1="">Width</span>, 
  <span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Colour" data-v-05f803b1="">Colour</span>, 
  <span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Heel Height" data-v-05f803b1="">Heel Height</span>, 
  <span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Product Type" data-v-05f803b1="">Product Type</span>, 
  <span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Function" data-v-05f803b1="">Function</span>, 
  <span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Age" data-v-05f803b1="">Age</span>, 
  <span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Technology" data-v-05f803b1="">Technology</span>, 
  <span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Material" data-v-05f803b1="">Material</span>, 
  <span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Price" data-v-05f803b1="">Price</span>
 </body>
</html>
  '''
soup = bs(html, 'lxml') #or 'html.parser'
print({i['data-facet-name'] for i in soup.select('span[data-facet-name]')})
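
If the order of the facets matters, or a list is needed directly, a list comprehension can be used in place of the set. This is a minimal sketch rather than part of the original answer: it assumes markup like the one above (shortened here to a few hypothetical spans with a simplified class name), uses the built-in 'html.parser', and lowercases the values only to match the [size, width, colour, etc] shape asked for in the question.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical stand-in for the page markup.
html = '''
<span class="facet" data-facet-name="Size">Size</span>
<span class="facet" data-facet-name="Heel Height">Heel Height</span>
<span class="facet">no attribute here</span>
<span class="facet" data-facet-name="Price">Price</span>
'''

soup = BeautifulSoup(html, 'html.parser')

# select() returns matches in document order; the [data-facet-name]
# attribute selector skips spans that lack the attribute.
spans = soup.select('span[data-facet-name]')

# A list comprehension keeps that order (a set does not guarantee it).
facets = [span['data-facet-name'].lower() for span in spans]
print(facets)  # ['size', 'heel height', 'price']
```

Reading the attribute from the Tag object avoids converting each span to a string and splitting it by hand.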

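
As for why the loop in the question prints nothing: the condition 'data-facet-name' == True compares a string literal with True, which is always False, so print(y) is never reached. The sketch below, which uses a hypothetical shortened span string in place of str(i), shows that a membership test with in does find the token, but also that splitting on spaces cuts off multi-word values such as "Heel Height". That is one more reason to read the attribute from the parsed element instead.

```python
# Stand-in for str(i) of one element from the question's list
# (class attribute shortened for readability).
span_html = (
    '<span class="col-sm-8" data-facet-name="Heel Height" '
    'data-v-05f803b1="">Heel Height</span>'
)

# A string compared to True with == is always False,
# so the original `if` branch never runs.
print('data-facet-name' == True)  # False

# Testing membership with `in` does find the token...
tokens = [t for t in span_html.split(' ') if 'data-facet-name' in t]

# ...but the value is cut at the first space inside the quotes.
print(tokens)  # ['data-facet-name="Heel']
```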

Regarding "python - Get a substring from each element of a list", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/57826749/
