python - Looping through multiple divs in Python BeautifulSoup, output to a df and then a CSV

Tags: python html css python-3.x beautifulsoup

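I'm trying to build a scraper/parser for my school's class catalog. The first step is to scrape the Coursicle database into a CSV, but so far I can only get it to output the first row.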

Here is a snippet of the HTML I'm trying to parse:

<div class="card back" style="display: block;">
    <div class="addClass Back"> 
        <i class="fa clicky fa-star Back"></i> 
        <i class="fa clicky fa-star-o Back"></i>  
        <i class="clicky icon-info-sign"></i>
    </div>
    <div class="courseNumberBack">
        <span class="subject">ANTH</span> <span class="number">54</span>-<span class="section">001</span>
        <div class="smallCourseInfo">

            <span class="abbrevTitle">First-Year Seminar: The Indians' New Worlds: Southeastern Histories from 1200 to 1800</span> 

        </div>
    </div>
    <hr class="faddedLine">

    <div class="courseNameBack"><div class="days">TuTh</div><br>

    <div class="smallCourseInfo"> <div class="instructor">Clara Scarry</div></div>

    <div class="time">3:30pm-4:45pm</div><br>
    <div class="smallCourseInfo"> <div class="building">Alumni 203 </div></div>


    <div class="genEds">HS US WB </div>


</div>

Here is my code:

import pandas as pd
import os
import csv
import itertools
from bs4 import BeautifulSoup

soup = BeautifulSoup(open("/Users/as9934/Desktop/schedule/wb.htm"), "lxml")

cardback = (soup.find('div', class_='card back'))
for courseNumberBack in cardback.find_all('div', class_='courseNumberBack'):
    for subject in courseNumberBack.find_all('span', class_='subject'):
        for subjects in subject: 
            print (subjects.string,",", end=' ')

    for number in courseNumberBack.find_all('span', class_='number'):
        for numbers in number:
            print (numbers.string,",", end=' ')

    for section in courseNumberBack.find_all('span', class_='section'):
        for sections in section:
            print(sections.string,",", end=' ')

    for abbrevTitle in courseNumberBack.find_all('span', class_='abbrevTitle'):
        for abbrevTitles in abbrevTitle:
            print(abbrevTitles.string,",", end=' ')


for courseNameBack in cardback.find_all('div', class_='courseNameBack'):
    for day in courseNameBack.find_all('div', class_='days'):
        for days in day: 
            print(days.string,",", end=' ')

    for instructor in courseNameBack.find_all('div', class_='instructor'):
        for instructors in instructor:
            print(instructors.string,",", end=' ')

    for time in courseNameBack.find_all('div', class_='time'):
        for times in time:
            print(times.string,",", end=' ')

    for building in courseNameBack.find_all('div', class_='building'):
        for buildings in building:
            print(buildings.string,",", end=' ')

    for genEd in courseNameBack.find_all('div', class_='genEds'):
        for genEds in genEd:
            print(genEds.string, end=' ')

I also tried this:

cardback = (soup.find('div', class_='card back'))
result = dict(
    zip(
    [cardback.text for cardback in soup.select('span.subject')] , 
    [cardback.text for cardback in soup.select('span.number')] ,
    [cardback.text for cardback in soup.select('span.section')] , 
    [cardback.text for cardback in soup.select('span.abbrevTitle')] , 
    [cardback.text for carback in soup.select('div.days')] , 
    [cardback.text for carback in soup.select('div.instructor')] , 
    [cardback.text for carback in soup.select('div.time')] , 
    [cardback.text for carback in soup.select('div.building')] , 
    [cardback.text for carback in soup.select('div.genEds')] 
    )
    )
print(result) 

But it returns this error:

ValueError: dictionary update sequence element #0 has length 9; 2 is required

Does anyone have any ideas?
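
A note on the ValueError in the question: `dict()` expects an iterable of 2-item (key, value) pairs, whereas `zip` over nine lists yields 9-item tuples, which matches the "has length 9; 2 is required" message. The sketch below reproduces the same failure with three short lists and then shows one possible alternative, building one dict per course keyed by column name. All values and the `columns` list are made up for illustration.

```python
# Made-up values standing in for the text scraped from each tag.
subjects = ["ANTH", "BIOL"]
numbers = ["54", "101"]
sections = ["001", "002"]

# zip over three lists yields one 3-item tuple per course.
triples = list(zip(subjects, numbers, sections))

# dict() needs (key, value) pairs, so 3-item tuples are rejected.
try:
    dict(triples)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# One dict per course, keyed by a (hypothetical) column name,
# keeps every field instead of forcing the data into pairs.
columns = ["subject", "number", "section"]
rows = [dict(zip(columns, values)) for values in triples]

print(error_message)
print(rows)
```

A list of dicts like `rows` is also a shape that `pandas.DataFrame` accepts directly, which may suit the df-then-CSV goal.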

Best answer

When you use print in Python, you have two special keyword arguments: end and sep. The end argument is exactly what you need here. It looks like this:

print(something.text, ',', end=' ')
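
The `end` and `sep` arguments only control how printed values are joined and terminated. They do not change how many cards are visited. Since `soup.find` returns only the first matching element, one possible way to get a row per course is to loop over every card and collect the fields before printing or saving. The following is a minimal sketch under stated assumptions, not a definitive implementation: the second card in `SAMPLE_HTML`, the `FIELDS` list, and the `card_to_row` helper are made up for illustration, and `html.parser` is used in place of `lxml` so the example needs no extra parser.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Two-card sample shaped like the snippet in the question.
# The second card is entirely hypothetical.
SAMPLE_HTML = """
<div class="card back">
  <div class="courseNumberBack">
    <span class="subject">ANTH</span> <span class="number">54</span>-<span class="section">001</span>
    <div class="smallCourseInfo"><span class="abbrevTitle">First-Year Seminar</span></div>
  </div>
  <div class="courseNameBack">
    <div class="days">TuTh</div>
    <div class="instructor">Clara Scarry</div>
    <div class="time">3:30pm-4:45pm</div>
    <div class="building">Alumni 203 </div>
    <div class="genEds">HS US WB </div>
  </div>
</div>
<div class="card back">
  <div class="courseNumberBack">
    <span class="subject">ANTH</span> <span class="number">101</span>-<span class="section">002</span>
    <div class="smallCourseInfo"><span class="abbrevTitle">Hypothetical Course</span></div>
  </div>
  <div class="courseNameBack">
    <div class="days">MWF</div>
    <div class="instructor">Example Instructor</div>
    <div class="time">9:00am-9:50am</div>
    <div class="building">Example Hall 1 </div>
    <div class="genEds">HS </div>
  </div>
</div>
"""

# Class names taken from the question's HTML, used as column names.
FIELDS = ["subject", "number", "section", "abbrevTitle",
          "days", "instructor", "time", "building", "genEds"]


def card_to_row(card):
    """Return one dict per card; fields missing from the card become None."""
    row = {}
    for field in FIELDS:
        tag = card.find(class_=field)
        row[field] = tag.get_text(strip=True) if tag else None
    return row


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# select() returns every matching card, not just the first one.
cards = soup.select("div.card.back")
rows = [card_to_row(card) for card in cards]

# sep joins the values of one course; the default end starts a new line.
for row in rows:
    print(*row.values(), sep=", ")

# Build the DataFrame, then serialize it as CSV text.
df = pd.DataFrame(rows, columns=FIELDS)
csv_text = df.to_csv(index=False)
print(csv_text)
```

Passing a file path to `df.to_csv` instead of calling it without one would write the CSV to disk. In the real script, `SAMPLE_HTML` would be replaced by the contents of the saved page.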

This question (python - Looping through multiple divs in Python BeautifulSoup, output to a df and then a CSV) is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/59499332/
