Python - Read a BeautifulSoup snippet line by line? (Or other ways of scraping the data I want)

I have a web page that I am reading with Python and BeautifulSoup, e.g. soup = BeautifulSoup(urllib2.urlopen(site)).

I am trying to grab a snippet of the site and parse it, so I use pTag = soup.find("p", {"class": "secondary"}), which yields the following:

<p class="secondary">
              Some address and street
              <br />
              City, State, ZIP
              (some) phone-number
             </p>

I basically want the variables address1, address2, and phone, such that:

address1= "Some address and street"
address2= "City, State, ZIP"
phone= "(some) phone-number"

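I'm not sure how to read the soup by line so that I can selectively pick lines 1, 3, and 4 (counting from line 0), but I'm also open to other ways of getting the data I want.

Thanks in advance! :)

Best answer

Assume address holds your original address element (pTag in the question):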

<p class="secondary">
              Some address and street
              <br />
              City, State, ZIP
              (some) phone-number
             </p>

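You can then replace the <br /> tag with a comma and finally split on commas. This is not ideal, but in scenarios like this, where the elements have no explicit delimiters (spans, ids, etc.), it all comes down to positional checks.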

address.find("br").replaceWith(",")
addressComponents = address.text.split(",")

This gives you the following four components in the addressComponents list (the fourth spans two lines because it contains a newline):

Some address and street
City
 State
 ZIP
              (some) phone-number
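
The replace-and-split step above can be tried as a self-contained sketch. It assumes Python 3 with the bs4 package (BeautifulSoup 4), inlines the HTML from the question instead of fetching it, and uses replace_with, the BeautifulSoup 4 spelling of replaceWith. Printing each component with repr makes the surrounding whitespace and the embedded newline visible.

```python
from bs4 import BeautifulSoup

# HTML fragment copied from the question (inlined here instead of fetched).
html = """<p class="secondary">
              Some address and street
              <br />
              City, State, ZIP
              (some) phone-number
             </p>"""

soup = BeautifulSoup(html, "html.parser")
address = soup.find("p", {"class": "secondary"})

# Swap the <br /> tag for a comma so that every field boundary
# except the last one (ZIP / phone) is marked by a comma.
address.find("br").replace_with(",")
addressComponents = address.get_text().split(",")

# repr() shows the leading/trailing whitespace kept in each component.
for component in addressComponents:
    print(repr(component))
```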

Because there is no <br /> between the ZIP and the phone number, they are separated only by a newline character and land in the same component. To split that final component:

addressSplit = addressComponents[3].split("\n")
print(addressSplit[0].strip())  # ZIP code
print(addressSplit[1].strip())  # Phone number
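
As an alternative to the comma replacement (not part of the answer above), the fragment can also be read line by line, which is closer to what the question asks for. The sketch below assumes Python 3 with BeautifulSoup 4; parse_address is a hypothetical helper name, and it assumes the paragraph always holds exactly three non-blank text lines in the order shown in the question.

```python
from bs4 import BeautifulSoup


def parse_address(p_tag):
    """Return (address1, address2, phone) from the paragraph tag.

    Hypothetical helper: assumes exactly three non-blank text lines,
    in the order street, city/state/ZIP, phone.
    """
    # get_text() drops the <br /> tag but keeps the newlines
    # that were present in the source markup.
    lines = [line.strip() for line in p_tag.get_text().splitlines()]
    lines = [line for line in lines if line]  # discard blank lines
    if len(lines) != 3:
        raise ValueError("expected 3 non-blank lines, got %d" % len(lines))
    return tuple(lines)


# HTML fragment copied from the question (inlined here instead of fetched).
html = """<p class="secondary">
              Some address and street
              <br />
              City, State, ZIP
              (some) phone-number
             </p>"""

soup = BeautifulSoup(html, "html.parser")
address1, address2, phone = parse_address(soup.find("p", {"class": "secondary"}))
print(address1)
print(address2)
print(phone)
```

This relies on the line breaks in the page source, so it would likely stop working if the site served the same markup on a single line; the positional caveat from the answer applies here as well.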

Regarding "Python - Read a BeautifulSoup snippet line by line? (Or other ways of scraping the data I want)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/9419848/
