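I'm trying to use code for art (if that's possible?), and I'm using findall to retrieve the third group of a regular expression. I read the official documentation for findall, and I find it a bit limiting that it returns a tuple. I'd like to pass a flag so that it returns only the third group, instead of a 3-tuple whose first two entries are just placeholders. What is the most efficient way to chain something so that only the name (the third group) comes back, rather than iterating over the results afterwards?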
```python
import re
import requests

rgx = r"([<][TDtd][>])|(target[=]new[>])(?P<the_deceased>[A-Z].*?)[,]"
urls = {2013: "http://www.killedbypolice.net/kbp2013.html",
        2014: "http://www.killedbypolice.net/kbp2014.html",
        2015: "http://www.killedbypolice.net/"}
names_of_the_dead = []
for url in urls.values():
    response = requests.get(url)
    # .text is the decoded str; in Python 3 a str pattern cannot search the bytes in .content
    content = response.text
    people_killed_by_police_that_year_alone = re.findall(rgx, content)
    for dead_person in people_killed_by_police_that_year_alone:
        names_of_the_dead.append(dead_person)
#dead_americans_as_string = ",".join(names_of_the_dead)
#print("RIP, {} since 2013:\n".format(len(names_of_the_dead)))
#print(dead_americans_as_string)
```
```
In [67]: names_of_the_dead
Out[67]:
[('', 'target=new>', 'May 1st - Dec 31st'),
('', 'target=new>', 'Ricky Junior Toney'),
('', 'target=new>', 'William Jackson'),
('', 'target=new>', 'Bethany Lytle'),
('', 'target=new>', 'Christopher George'),
```
Best answer
Just turn the first and second capturing groups into non-capturing groups.
```python
rgx = r"(?:[<][TDtd][>])|(?:target[=]new[>])(?P<the_deceased>[A-Z].*?)[,]"
```
Source: the Stack Overflow question "python - How to use a flag to return only one group with findall - Python": https://stackoverflow.com/questions/34959302/