python - How to parse a string to find specific words/numbers and display them if found

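I'm sure I've written some pretty questionable code, but it seems to do the job. The problem is that it writes the data to a spreadsheet, and in the column where I expect the vehicle's year, if the first word in the advert isn't the year, it shows that first word instead, which may be the manufacturer.

Essentially, I want to set up my if statements so that if the vehicle's year isn't the first word but is elsewhere in the string, the code still finds it and writes it to my .csv.

Also, I've been struggling for a while to parse multiple pages and hope someone here can help with that too. The URL contains page=2 and so on, but I can't get it to go through all the URLs and collect the data from every page. Everything I've tried so far only handles the first page. As you may have guessed, I'm fairly new to Python.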

import csv
import requests

from bs4 import BeautifulSoup

outfile = open('carandclassic-new.csv','w', newline='', encoding='utf-8')
writer = csv.writer(outfile)
writer.writerow(["Link", "Title", "Year", "Make", "Model", "Variant", "Image"])

url = 'https://www.carandclassic.co.uk/cat/3/?page=2'

get_url = requests.get(url)

get_text = get_url.text

soup = BeautifulSoup(get_text, 'html.parser')


# The second positional argument of find_all() is the CSS class to match
car_link = soup.find_all('div', class_='titleAndText')


for div in car_link:
    links = div.find_all('a')
    for a in links:
        link = ("https://www.carandclassic.co.uk" + a['href'])
        title = (a.text.strip())
        year = (title.split(' ', 1)[0])
        make = (title.split(' ', 2)[1])
        model = (title.split(' ', 3)[2])
        # Attempted year detection below; this is the part that does not work yet
        date = "\d"
        for line in title:
        yom = title.split()
        if yom[0] == "\d":
            yom[0] = (title.split(' ', 1)[0])
        else:
            yom = title.date

        writer.writerow([link, title, year, make, model])
        print(link, title, year, make, model)



outfile.close()

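Could someone please help me with this? I realise the if statement at the bottom is probably way off.

The code successfully takes the first word from the string; unfortunately, because of the way the data is structured, that word isn't always the vehicle's year of manufacture (yom).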

Best answer

Comment: "1978 Full restored Datsun 280Z" becomes '1978' '1978' '280Z' rather than '1978' 'Datsun' '280Z'.

To improve the year validation, switch to the re module:

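import re

# Only search the title if the first word is not already a 4-digit number
if not (len(year) == 4 and year.isdigit()):
    # Raw string, so that \d is passed to the regex engine unchanged
    match = re.findall(r'\d{4}', title)
    if match:
        for item in match:
            if int(item) in range(1900, 2010):
                # Assume year
                year = item
                break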

The output becomes:

'1978 Full restored Datsun 280Z', '1978', 'Full', '280Z'
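
As a minimal sketch, the same check can be wrapped in a small helper so it can be called once per title. The name find_year and its parameters first and stop are hypothetical, and the \b word boundaries are an addition that is not in the snippet above; they keep the pattern from matching four digits inside a longer number.

```python
import re


def find_year(title, first=1900, stop=2010):
    """Return the first plausible 4-digit year in title, or None."""
    words = title.split()
    # Same rule as above: a first word made of exactly 4 digits is taken as the year.
    if words and len(words[0]) == 4 and words[0].isdigit():
        return words[0]
    # Otherwise look for standalone 4-digit numbers anywhere in the title.
    for candidate in re.findall(r'\b\d{4}\b', title):
        if int(candidate) in range(first, stop):
            # Assume year
            return candidate
    return None


print(find_year('1978 Full restored Datsun 280Z'))
print(find_year('Mercedes Benz 280SL 1980 Cabriolet in beautiful condition'))
```

Returning None when nothing matches leaves it to the caller to decide what to write into the Year column.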

For the wrong result make='Full', you have two options.

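Stop-word list
Build a stop-word list with terms such as ['full', 'restored', etc.] and loop over title_items to find the first item that is not in the stop-word list.
Maker list
Build a list of makers such as ['Mercedes', 'Datsun', etc.] and loop over title_items to find the first item that matches.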
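
Both options could look roughly like the sketch below. The two word sets are tiny hypothetical samples and the function names are made up for illustration. Skipping the year in the stop-word variant is an extra assumption; without it, the first item of a title that starts with the year would be returned as the make.

```python
# Hypothetical sample lists; real ones would need many more entries.
STOP_WORDS = {'full', 'fully', 'restored'}
MAKERS = {'mercedes', 'datsun'}


def make_by_stop_words(title_items, year, stop_words=STOP_WORDS):
    """Return the first item that is neither the year nor a stop word."""
    for item in title_items:
        if item != year and item.lower() not in stop_words:
            return item
    return None


def make_by_makers(title_items, makers=MAKERS):
    """Return the first item that appears in the list of known makers."""
    for item in title_items:
        if item.lower() in makers:
            return item
    return None


title_items = '1978 Full restored Datsun 280Z'.split()
print(make_by_stop_words(title_items, '1978'))
print(make_by_makers(title_items))
```

For this title both calls return 'Datsun'. The maker list is stricter, since it never returns an unknown word, while the stop-word list needs less upkeep but lets any unexpected word through as the make.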

Question: find the vehicle's year if the first word in the advert isn't the year

Built-ins and modules used:

Example titles used:

# Simulating html Element
class Element():
    def __init__(self, text): self.text = text

for a in [Element('Mercedes Benz 280SL 1980 Cabriolet in beautiful condition'),
          Element('1964 Mercedes Benz 220SEb Saloon Manual RHD')]:

Get the title from the <a> element and split it on blanks.

    title = a.text.strip()
    title_items = title.split()

The defaults are the title_items at indices 0, 1 and 2.

    # Default
    year = title_items[0]
    make = title_items[1]
    model = title_items[2]

Verify that year meets the condition of being 4 digits.

    # Verify 'year'
    if not (len(year) == 4 and year.isdigit()):

Loop over all items in title_items and break as soon as the condition is met.

        # Test all items
        for item in title_items:
            if len(item) == 4 and item.isdigit():
                # Assume year
                year = item
                break

Switch to the assumption that the title_items at indices 0 and 1 are make and model.

        make = title_items[0]
        model = title_items[1]

Check whether model starts with a digit.

Note: This will fail if a model does not meet this criterion!

    # Condition: the model has to start with a digit
    if not model[0].isdigit():
        for item in title_items:
            if item[0].isdigit() and item != year:
                model = item

    print('{}'.format([title, year, make, model]))

Output:

['Mercedes Benz 280SL 1980 Cabriolet in beautiful condition', '1980', 'Mercedes', '280SL']
['1964 Mercedes Benz 220SEb Saloon Manual RHD', '1964', 'Mercedes', '220SEb']
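
For reference, the steps above can be collected into one function. This is only a sketch: the name parse_title is hypothetical, and it assumes every title has at least three words, as the walkthrough does.

```python
def parse_title(title):
    """Return (year, make, model) guessed from an advert title."""
    title_items = title.strip().split()

    # Default: year, make and model are the first three words.
    year, make, model = title_items[0], title_items[1], title_items[2]

    # Verify 'year'; if it fails, take the first 4-digit item instead.
    if not (len(year) == 4 and year.isdigit()):
        for item in title_items:
            if len(item) == 4 and item.isdigit():
                # Assume year
                year = item
                break
        # The title did not start with the year, so make and model shift left.
        make, model = title_items[0], title_items[1]

    # Condition: the model has to start with a digit.
    if not model[0].isdigit():
        for item in title_items:
            if item[0].isdigit() and item != year:
                model = item

    return year, make, model


for text in ['Mercedes Benz 280SL 1980 Cabriolet in beautiful condition',
             '1964 Mercedes Benz 220SEb Saloon Manual RHD']:
    print(parse_title(text))
```

In the scraping loop this would replace the three split() calls, for example year, make, model = parse_title(a.text).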

Tested with Python: 3.4.2
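
The question also asks about reading more than the first page. One possible approach, sketched here on the assumption that every page is reachable by changing the page= query parameter mentioned in the question, is to build one URL per page number and run the same parsing on each. The names page_urls and scrape_pages and the last_page argument are hypothetical; how many pages exist has to be found out separately. This sketch was not run against the site.

```python
from urllib.parse import urlencode

BASE_URL = 'https://www.carandclassic.co.uk/cat/3/'


def page_urls(first_page, last_page, base_url=BASE_URL):
    """Yield one listing URL per page number, both ends included."""
    for page in range(first_page, last_page + 1):
        yield '{}?{}'.format(base_url, urlencode({'page': page}))


def scrape_pages(first_page, last_page):
    """Fetch every page and yield the title <div> elements found on it."""
    # Third-party imports are local so that page_urls() works without them.
    import requests
    from bs4 import BeautifulSoup

    for url in page_urls(first_page, last_page):
        response = requests.get(url)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'html.parser')
        for div in soup.find_all('div', class_='titleAndText'):
            yield div


print(list(page_urls(1, 3)))
```

The existing loop over car_link can then iterate over scrape_pages(1, last_page) instead, with the csv writer opened once before the loop so that rows from all pages end up in the same file.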

Regarding python - How to parse a string to find specific words/numbers and display them if found, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54560756/
