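I'm trying to build a list of filter facets. I've loaded all the <span> elements into a list using bs4, and now I need to grab a specific substring from the larger string of each <span>. I want to load the name of each filter facet into a list, ending up with something like this: [size, width, colour, etc].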
The list generated with bs4:
[<span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Size" data-v-05f803b1="">Size</span>, <span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Width" data-v-05f803b1="">Width</span>, <span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Colour" data-v-05f803b1="">Colour</span>, <span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Heel Height" data-v-05f803b1="">Heel Height</span>, <span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Product Type" data-v-05f803b1="">Product Type</span>, <span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Function" data-v-05f803b1="">Function</span>, <span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Age" data-v-05f803b1="">Age</span>, <span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Technology" data-v-05f803b1="">Technology</span>, <span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Material" data-v-05f803b1="">Material</span>, <span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Price" data-v-05f803b1="">Price</span>]
What I've tried, which doesn't seem to help:
facetcode = [str(i) for i in spans]
facets = []
for i in facetcode:
    facetcode1 = i.split(' ')
    for y in facetcode1:
        if 'data-facet-name' == True:
            print(y)
When I print(y) it gives me a blank list, but I was expecting: data-facet-name="Size"
The result I want:
[size, width, colour, etc]
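Am I overcomplicating this? The idea is to loop over each list element and load only the text I want into a new list.
Best answer
You want to extract the data-facet-name attribute from the spans that have that attribute. The code below collects the values in a set; if you really want a list, you can convert the set to a list afterwards.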
from bs4 import BeautifulSoup as bs
html = '''
<html>
<head></head>
<body>
<span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Size" data-v-05f803b1="">Size</span>,
<span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Width" data-v-05f803b1="">Width</span>,
<span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Colour" data-v-05f803b1="">Colour</span>,
<span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Heel Height" data-v-05f803b1="">Heel Height</span>,
<span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Product Type" data-v-05f803b1="">Product Type</span>,
<span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Function" data-v-05f803b1="">Function</span>,
<span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Age" data-v-05f803b1="">Age</span>,
<span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Technology" data-v-05f803b1="">Technology</span>,
<span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Material" data-v-05f803b1="">Material</span>,
<span class="col-sm-8 col-xs-9 facet-menu-facet__filter-name-spacing" data-facet-name="Price" data-v-05f803b1="">Price</span>
</body>
</html>
'''
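soup = bs(html, 'lxml')  # 'html.parser' also works if lxml is not installed
# Select only spans that carry the attribute, then read its value from each one.
# A set comprehension removes duplicates but does not guarantee order.
print({i['data-facet-name'] for i in soup.select('span[data-facet-name]')})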
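If the order of the facets matters, a list comprehension may be a better fit than a set. The following is a minimal sketch along those lines, not part of the original answer: the shortened HTML, the placeholder class `x`, and the variable names `facets` and `lowered` are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the page markup (assumed for illustration).
html = '''
<span class="x" data-facet-name="Size">Size</span>
<span class="x" data-facet-name="Width">Width</span>
<span class="x" data-facet-name="Heel Height">Heel Height</span>
<span class="x">No facet attribute here</span>
'''

soup = BeautifulSoup(html, 'html.parser')

# The CSS selector matches only spans that have the attribute,
# and the list comprehension keeps them in document order.
facets = [span['data-facet-name'] for span in soup.select('span[data-facet-name]')]
print(facets)

# Lowercase the names if the target really is [size, width, ...].
lowered = [name.lower() for name in facets]
print(lowered)
```

As for the attempt in the question: `'data-facet-name' == True` compares a string literal with `True`, which is always false, so nothing is ever printed. A membership test such as `'data-facet-name' in y` would match, but splitting the tag on spaces would still break values like "Heel Height" into two pieces, which is why reading the attribute directly is more reliable.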
Regarding python - getting a substring from each element of a list, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57826749/