html - Scrapy CSS last-child selector fails to select text

I am trying to select elements in HTML using CSS selectors in the Scrapy framework. However, I am stuck on one field that I want to extract with the last-child selector.

Here is the HTML:

<td class="Table-Standard-AwardName Table-Scholarship-AwardName">

<a id="ctl00_ContentPlaceHolder1_ScholarshipDataControl_grvScholarshipSearch_ctl02_hylScholarshipName" class="bold" href="/Scholarships/14123/Family-Bursary,-The">Family Bursary, The</a>   

<br>

<span>Field of Study:</span> 

EcologyEnvironmental Science

</td>

I need to match the text "EcologyEnvironmental Science".

When I use the last-child selector, the output is "Field of Study:" instead:

In [3]: response.css('td.Table-Standard-AwardName.Table-Scholarship-AwardName > *:last-child::text').extract_first()
Out[3]: 'Field of Study:'

I have looked at other questions and tried several approaches, such as nth-last-child() and sibling combinators, but nothing worked. Help!

Best answer

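As noted, the text EcologyEnvironmental Science belongs to the td element itself, not to any child element. CSS selectors match only elements, so *:last-child picks the span (the last element child) and never reaches the bare text node that follows it. That is why you only need to extract the td's own text. Try something like this: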

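from operator import methodcaller
# ::text on the td returns its direct text nodes, most of them whitespace only
values = response.css('.Table-Standard-AwardName.Table-Scholarship-AwardName::text').extract()
# strip every node and keep the first one that is not empty
out = next(filter(None, map(methodcaller('strip'), values)))
# you can assign 'EcologyEnvironmental Science' to your item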
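
The strip-and-filter step in the answer can be tried without a live response. The sketch below is only an illustration: `first_non_empty` is a hypothetical helper name, and `sample_values` is an assumed approximation of the whitespace-heavy list that `.extract()` would return for the td shown in the question.

```python
from operator import methodcaller


def first_non_empty(values):
    """Return the first string that is non-empty after stripping, or None."""
    stripped = map(methodcaller("strip"), values)
    # filter(None, ...) drops empty strings; the default avoids StopIteration
    return next(filter(None, stripped), None)


# Assumed stand-in for the td's direct text nodes (not captured from a real run)
sample_values = ["\n\n", "   \n\n", "\n\n", " \n\nEcologyEnvironmental Science\n\n"]

field_of_study = first_non_empty(sample_values)
print(field_of_study)
```

Passing a default to `next` is a small design choice: if the cell holds only whitespace, the helper returns None rather than raising StopIteration inside the spider callback.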

Regarding "html - Scrapy CSS last-child selector fails to select text", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47445285/
