python - Trimming links in a list using Python 3.7

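I have a small script in Python 3.7 (see the related question here) that scrapes links from a website ( http://digesto.asamblea.gob.ni/consultas/coleccion/ ) and saves them in a list. Unfortunately, the links are only partial, and I have to trim them before I can use them as links.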

Here is the relevant part of the script:

list_of_links = []    # will hold the scraped links
tld = 'http://digesto.asamblea.gob.ni'
current_url = driver.current_url   # for any links not starting with /
table_id = driver.find_element(By.ID, 'tableDocCollection')
rows = table_id.find_elements_by_css_selector("tbody tr") # get all table rows
for row in rows:
    row.find_element_by_css_selector('button').click()
    link = row.find_element_by_css_selector('li a[onclick*=pdf]').get_attribute("onclick") # href
    print(list_of_links)# trim
    if link.startswith('/'):
        list_of_links.append(tld + link)
    else:
        list_of_links.append(current_url + link)
    row.find_element_by_css_selector('button').click()

print(list_of_links)

How can I manipulate this list (shown here with only three entries as an example)

["http://digesto.asamblea.gob.ni/consultas/coleccion/window.open('/consultas/util/pdf.php?type=rdd&rdd=p2%2FHzlqau8A%3D');return false;", "http://digesto.asamblea.gob.ni/consultas/coleccion/window.open('/consultas/util/pdf.php?type=rdd&rdd=Z%2FgLeZxynkg%3D');return false;", "http://digesto.asamblea.gob.ni/consultas/coleccion/window.open('/consultas/util/pdf.php?type=rdd&rdd=9rka%2BmYwvYM%3D');return false;"]

so that it looks like this

["http://digesto.asamblea.gob.ni/consultas/util/pdf.php?type=rdd&rdd=p2%2FHzlqau8A%3D", "http://digesto.asamblea.gob.ni/consultas/util/pdf.php?type=rdd&rdd=Z%2FgLeZxynkg%3D", "http://digesto.asamblea.gob.ni/consultas/util/pdf.php?type=rdd&rdd=9rka%2BmYwvYM%3D"]

To break it down: taking the first link as an example, this is essentially what I get from the website

http://digesto.asamblea.gob.ni/consultas/coleccion/window.open('/consultas/util/pdf.php?type=rdd&rdd=p2%2FHzlqau8A%3D');return false;

and I need to trim it to

http://digesto.asamblea.gob.ni/consultas/util/pdf.php?type=rdd&rdd=p2%2FHzlqau8A%3D

How can I do this in Python for the whole list?

Best answer

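One approach is to split each entry on the string /consultas/coleccion/window.open(' , strip the unwanted ending from the second piece, and then concatenate the two processed pieces to get your result.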

This should do it:

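new_links = []

for link in list_of_links:
    # Split into the site prefix and everything after window.open('
    current_strings = link.split("/consultas/coleccion/window.open('")
    # Drop the trailing ');return false; from the second piece
    current_strings[1] = current_strings[1].split("');return")[0]
    # Join the prefix and the path to form the final URL
    new_link = current_strings[0] + current_strings[1]
    new_links.append(new_link)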
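
The split-based loop assumes every entry contains the exact text /consultas/coleccion/window.open(' and would raise an IndexError on an entry that does not. As a possible alternative (not part of the original answer), here is a minimal sketch that pulls the path out of the window.open('...') call with a regular expression and joins it to the site root with urllib.parse.urljoin. The names BASE_URL, WINDOW_OPEN_RE, and extract_pdf_url are made up for this illustration, and the second sample entry is a hypothetical raw onclick value with no prefix.

```python
import re
from urllib.parse import urljoin

# Assumption: the site root, taken from the `tld` variable in the question.
BASE_URL = "http://digesto.asamblea.gob.ni"

# Captures whatever sits between window.open(' and the next single quote.
WINDOW_OPEN_RE = re.compile(r"window\.open\('([^']+)'")


def extract_pdf_url(text, base=BASE_URL):
    """Return the absolute URL from a window.open('...') call, or None if absent."""
    match = WINDOW_OPEN_RE.search(text)
    if match is None:
        return None
    # The captured path starts with '/', so urljoin resolves it against the site root.
    return urljoin(base, match.group(1))


list_of_links = [
    # An entry as produced by the script in the question.
    "http://digesto.asamblea.gob.ni/consultas/coleccion/window.open('/consultas/util/pdf.php?type=rdd&rdd=p2%2FHzlqau8A%3D');return false;",
    # Hypothetical raw onclick value without the page URL in front.
    "window.open('/consultas/util/pdf.php?type=rdd&rdd=Z%2FgLeZxynkg%3D');return false;",
]

# Entries without a window.open('...') call are skipped instead of raising.
new_links = [url for url in map(extract_pdf_url, list_of_links) if url is not None]
print(new_links)
```

Because the pattern only looks at the window.open('...') part, the same helper could presumably be applied to the raw onclick attribute inside the scraping loop, which would avoid building the malformed links in the first place.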

This post on trimming links in a list using Python 3.7 is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/54163085/
