python - Can someone explain in detail how this code works (Using Python to Access Web Data)

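Use urllib to read the HTML from the data file below, extract the href= values from the anchor tags, scan for the tag at a particular position relative to the first name in the list, follow that link, repeat the process a number of times, and report the last name you find.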

This is the HTML link for the data: http://py4e-data.dr-chuck.net/known_by_Caragh.html

So I have to find the link at position 18 (the first name is 1), follow that link, and repeat this process 7 times. The answer is the last name retrieved.

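Can anyone explain in detail, line by line, how these 2 loops ("while" and "for") work?
So when I enter position 18, does it extract the 18th href tag, then the 18th one on the next page, and so on 7 times? Because even if I enter a different number, I still get the same answer. Thank you very much.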

Code:

import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl
n = 0
count = 0
url = input("Enter URL:")
numbers  = input("Enter count:")
position = input("Enter position:")

while n < 7:
    html = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')
    tags = soup('a')
    for tag in tags:
      count = count + 1
      if count == 18:
         url  = tag.get('href', None)
         print("Retrieving:" , url)
         count = 0
         break
n = n + 1

Best answer

"Because even if I enter a different number, I still get the same answer."

You get the same answer because you have hard-coded:

while n < 7

and

if count == 18

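I think you meant to use your inputs for these two values. For that to work, you also need to convert the inputs to int; at the moment they are stored as str, and an int count never compares equal to a str position. Also note that I didn't want to type the URL every time, so I hard-coded it, but you can uncomment the input line there and comment out url = 'http://py4e-data.dr-chuck.net/known_by_Caragh.html'.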

import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl

n = 0
count = 0

url = 'http://py4e-data.dr-chuck.net/known_by_Caragh.html'
#url = input("Enter URL:")

numbers  = int(input("Enter count:"))
position = int(input("Enter position:"))

while n < numbers:    #<----- there's your variable of how many times to try
    html = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')
    tags = soup('a')
    for tag in tags:
      count = count + 1
      if count == position:  #<------- and the variable to get the position
         url  = tag.get('href', None)
         print("Retrieving:" , url)
         count = 0
         break
    n = n + 1    #<---- I fixed your indentation. Outside the while loop, n never increments, so the loop never ends.
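
Since you asked how the two loops work, here is a rough sketch of the same idea with the loops spelled out differently. The outer loop runs once per page to visit; the inner for loop in the code above only counts anchor tags until it reaches the requested position, which is the same as indexing the list of links at position - 1. To keep the sketch runnable without a network connection, I swapped BeautifulSoup for the standard library's html.parser and used made-up pages. The names AnchorCollector, follow_links, fetch and PAGES are my own and not part of the assignment.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; href may be absent.
            self.hrefs.append(dict(attrs).get("href"))


def follow_links(start_url, repeats, position, fetch):
    """Follow the link at `position` (1-based) `repeats` times.

    `fetch` is a hypothetical callable that maps a URL to its HTML text.
    """
    url = start_url
    for _ in range(repeats):        # plays the role of `while n < numbers`
        collector = AnchorCollector()
        collector.feed(fetch(url))
        # Position 1 is index 0, so the 18th link is hrefs[17]. This replaces
        # the inner `for` loop that counts tags until count == position.
        url = collector.hrefs[position - 1]
        print("Retrieving:", url)
    return url


# Made-up pages standing in for the real site, so no network is needed.
PAGES = {
    "a.html": '<a href="b.html">Bob</a><a href="c.html">Cy</a>',
    "b.html": '<a href="a.html">Al</a><a href="c.html">Cy</a>',
    "c.html": '<a href="a.html">Al</a><a href="b.html">Bob</a>',
}

last_url = follow_links("a.html", repeats=3, position=2, fetch=PAGES.get)
```

Each pass fetches the current page, picks the link at the requested position, and makes that link the URL for the next pass, so the value left at the end is the last page retrieved. For the real data you could pass something like fetch=lambda u: urllib.request.urlopen(u).read().decode(), assuming the hrefs on those pages are absolute URLs.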

A similar question to "python - Can someone explain in detail how this code works (Using Python to Access Web Data)" can be found on Stack Overflow: https://stackoverflow.com/questions/54900251/
