python - Regular expression to get multiple dates from strings of a given format

Tags: python regex

I have strings like these in my Python program. As requested, I've added the result I want under each one:

"Sat 1 Dec - 11h + 14h / Sun 2 Dec - 12h30"
("Sat 1 Dec 11h", "Sat 1 Dec 14h", "Sun 2 Dec 12h30")
"Tue 27 + Wed 28 Nov - 20h30"
("Tue 27 Nov 20h30", "Wed 28 Nov 20h30")
"Fri 4 + Sat 5 Jan - 20h30"
("Fri 4 Jan 20h30", "Sat 5 Jan 20h30")
"Wed 23 Jan - 20h"
("Wed 23 Jan 20h")
"Sat 26 Jan - 11h + 14h / Sun 27 Jan - 11h"
("Sat 26 Jan 11h", "Sat 26 Jan 14h", "Sun 27 Jan 11h")
"Fri 8 and Sat 9 Feb - 20h30 + thu 1 feb - 15h"
("Fri 8 Feb 20h30", "Sat 9 Feb 20h30", "Thu 1 feb 15h")
"Sat 2 Mar - 11h + 14h / Sun 3 Mar - 11h"
("Sat 2 Mar 11h", "Sat 2 Mar 14h", "Sun 3 Mar 11h")
"Wed 12, Thu 13, Fri 14 and Sat 15 Jun - 19h + Sun 16 Jun - 12h30"
("Wed 12 Jun 19h", "Thu 13 Jun 19h", "Fri 14 Jun 19h", "Sat 15 Jun 19h", "Sun 16 Jun 12h30") 

With these two regular expressions I can find the 3 dates in the first string:

(Mon|Tue|Wed|Thu|Fri|Sat|Sun)\s([0-9]{1,2}\s(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec))(?:.*?)([0-9]{1,2}[uh\:](?:[0-9]{2})?)

(Mon|Tue|Wed|Thu|Fri|Sat|Sun)\s([0-9]{1,2}\s(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec))(?:.*?\+\s)([0-9]{1,2}[uh\:](?:[0-9]{2})?)
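For reference, this is one way the two patterns above might be applied together: a minimal sketch that runs both with re.finditer and orders the matches by start offset (the constant names are made up for this example). Note that the second pattern only picks up the first time after a " + ", and neither pattern matches a day that has no month of its own, such as "Tue 27" in "Tue 27 + Wed 28 Nov - 20h30".

```python
import re

# The two patterns from the question, unchanged, written as raw strings.
DAY_AND_FIRST_TIME = re.compile(
    r"(Mon|Tue|Wed|Thu|Fri|Sat|Sun)\s([0-9]{1,2}\s(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec))"
    r"(?:.*?)([0-9]{1,2}[uh\:](?:[0-9]{2})?)"
)
DAY_AND_TIME_AFTER_PLUS = re.compile(
    r"(Mon|Tue|Wed|Thu|Fri|Sat|Sun)\s([0-9]{1,2}\s(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec))"
    r"(?:.*?\+\s)([0-9]{1,2}[uh\:](?:[0-9]{2})?)"
)

text = "Sat 1 Dec - 11h + 14h / Sun 2 Dec - 12h30"

# Collect matches from both patterns; sorted() is stable, so for matches that
# start at the same day, the first pattern's result stays ahead of the second's.
matches = sorted(
    (m for p in (DAY_AND_FIRST_TIME, DAY_AND_TIME_AFTER_PLUS) for m in p.finditer(text)),
    key=lambda m: m.start(),
)
# Each match has three groups: day name, "day-number month", time.
dates = [" ".join(m.groups()) for m in matches]
print(dates)  # ['Sat 1 Dec 11h', 'Sat 1 Dec 14h', 'Sun 2 Dec 12h30']
```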

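Is it possible to get all the dates out of these strings (i.e. match every date) with one or two regex patterns? As I see it, it would need to: find the first month that follows when a date doesn't give its own, take the corresponding time, and, when several hours follow, create one datetime per hour for each date. The output format isn't that important.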
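A regex on its own cannot copy a month backwards onto earlier days or repeat one time across several days, so the sketch below pairs a single tokenizing pattern with a short loop. This is one assumed approach, not a pure-regex answer. It follows the rules described above: a day without a month borrows the next month named in its group, every time applies to all days collected since the previous time, and a day that appears after a time starts a new group. The names TOKEN and parse_dates are hypothetical, and day and month names are normalized to title case.

```python
import re

DAYS = "mon|tue|wed|thu|fri|sat|sun"
MONTHS = "jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec"

# One token is either "Day NN [Mon]" or a time such as 11h / 12h30.
TOKEN = re.compile(
    rf"\b(?P<dow>{DAYS})\s+(?P<dom>\d{{1,2}})(?:\s+(?P<mon>{MONTHS})\b)?"
    r"|\b(?P<time>\d{1,2}[uh:](?:\d{2})?)",
    re.IGNORECASE,
)


def parse_dates(text):
    results = []
    group = []         # days waiting for a time: [day_name, day_number, month_or_None]
    seen_time = False  # True once the current group has received a time
    for m in TOKEN.finditer(text):
        if m.group("dow"):
            if seen_time:  # a day that follows a time starts a new group
                group, seen_time = [], False
            mon = m.group("mon")
            group.append([m.group("dow").title(), m.group("dom"), mon.title() if mon else None])
        else:
            # Back-fill: walk the group right to left so a missing month
            # is taken from the next day in the group that names one.
            month = None
            for day in reversed(group):
                if day[2] is None:
                    day[2] = month
                else:
                    month = day[2]
            # Every day in the group gets this time.
            for dow, dom, mon in group:
                parts = (dow, dom, mon, m.group("time"))
                results.append(" ".join(p for p in parts if p))
            seen_time = True
    return tuple(results)


for line in ("Tue 27 + Wed 28 Nov - 20h30",
             "Wed 12, Thu 13, Fri 14 and Sat 15 Jun - 19h + Sun 16 Jun - 12h30"):
    print(parse_dates(line))
```

Because the month is resolved only when a time arrives, ", ", " and " and " + " between days need no special handling; they are simply skipped by finditer.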

Best answer

Here's something to get you started, based on my interpretation of your problem. I'll leave the implementation of parse_complex to you.

class DateParser(object):
    """parse dates according to the custom rules here:

    >>> DateParser("Sat 1 Dec - 11h + 14h / Sun 2 Dec - 12h30").parse()
    ("Sat 1 Dec 11h", "Sat 1 Dec 14h", "Sun 2 Dec 12h30")
    >>> DateParser("Tue 27 + Wed 28 Nov - 20h30").parse()
    ("Tue 27 Nov 20h30", "Wed 28 Nov 20h30")
    >>> DateParser("Fri 4 + Sat 5 Jan - 20h30").parse()
    ("Fri 4 Jan 20h30", "Sat 5 Jan 20h30")
    >>> DateParser("Wed 23 Jan - 20h").parse()
    ("Wed 23 Jan 20h")
    >>> DateParser("Sat 26 Jan - 11h + 14h / Sun 27 Jan - 11h").parse()
    ("Sat 26 Jan 11h", "Sat 26 Jan 14h", "Sun 27 Jan 11h")
    >>> DateParser("Fri 8 and Sat 9 Feb - 20h30 + thu 1 feb - 15h").parse()
    ("Fri 8 Feb 20h30", "Sat 9 Feb 20h30", "Thu 1 feb 15h")
    >>> DateParse("Sat 2 Mar - 11h + 14h / Sun 3 Mar - 11h").parse()
    ("Sat 2 Mar 11h", "Sat 2 Mar 14h", "Sun 3 Mar 11h")
    >>> DateParser("Wed 12, Thu 13, Fri 14 and Sat 15 Jun - 19h + Sun 16 Jun - 12h30").parse()
    ("Wed 12 Jun 19h", "Thu 13 Jun 19h", "Fri 14 Jun 19h", "Sat 15 Jun 19h", "Sun 16 Jun 12h30")
    """

    def __init__(self, line):
        self.date  = line
        self.dates = self.split_dates(line)
        self.final = []

        self.days = ['mon', 'tue', 'wed', 'thu', 'fri', 'sat', 'sun']
        self.mons = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec']

    def parse(self):
        if self.is_complex():
            self.parse_complex()
        else:
            self.parse_simple()

        return tuple(self.final)

    def parse_simple(self):
        """typical formats: 
        Day 00 + Day 01 Mon - 00h00
        Day 00 Mon - 00h00 + 01h00
        Day 00 Mon - 00h00 / Day 02 Mon - 00h00
        """

        for date in self.dates:
            mods = self.split_modifiers(date)

            date_mods = []
            for mod in mods:
                if self.is_complete(mod):
                    #only *one* base_date
                    base_date, time = self.split_time(mod)
                    date_mods.append(time)
                else:
                    date_mods.append(mod)

            for mod in date_mods:
                if self.is_hour(mod):
                    #Sat 1 Dec - 11h + 14h
                    self.final.append(' '.join([base_date, mod]))
                else:
                    #Fri 4 + Sat 5 Jan - 20h30
                    self.final.append(' '.join([mod, self.extract_month(base_date), time]))

    def parse_complex(self):
        """typical format:
        Day 00, Day 01 and Day 02 Mon - 00h00 + Day 03 Mon 01h00
        """
        raise NotImplementedError()


    def is_complex(self):
        """presence of the complex date attribute requires special parsing"""
        return self.date.find(' and ') > -1

    def is_complete(self, section):
        """section has format `Day 00 Mon - 00h00`
        must have no modifiers to determine completeness
        """
        sections = [part.lower() for part in section.split()]

        try:
            dow, dom, moy, dash, time = sections
        except ValueError:
            return False

        return all([dow in self.days, moy in self.mons])


    def is_hour(self, section):
        return section[0].isdigit()

    def is_day(self, section):
        return section[:3].lower() in self.days


    def extract_month(self, section):
        """return the month present in the string, if any"""
        for mon in self.mons:
            if section.lower().find(mon) > -1:
                found = section.lower().index(mon)
                return section[found : found + 3]
        return None


    def split_dates(self, section):
        """split individual dates from a list of dates"""
        return section.split(' / ')

    def split_time(self, section):
        """split individual times from a complete date"""
        return section.split(' - ')

    def split_modifiers(self, section):
        """extend a date by implying that they share a date or a time
        modifiers change their meaning when parsing a complex date
        """
        return section.split(' + ')

>>> DateParser("Fri 4 + Sat 5 Jan - 20h30 / Sat 1 Dec - 11h + 14h + 16h / Sun 2 Dec - 12h30").parse()
('Fri 4 Jan 20h30', 'Sat 5 Jan 20h30', 'Sat 1 Dec 11h', 'Sat 1 Dec 14h', 'Sat 1 Dec 16h', 'Sun 2 Dec 12h30')
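One possible way to fill in parse_complex, assuming complex strings only use ", " and " and " to list days, and " + " to introduce a new day after a time: rewrite them into the simple format and reuse the existing parsing. normalize_complex is a hypothetical helper, not part of the answer above.

```python
import re

DAYS = "mon|tue|wed|thu|fri|sat|sun"


def normalize_complex(line):
    """Rewrite the ', ' / ' and ' format into the ' + ' / ' / ' format parse_simple expects."""
    # "Wed 12, Thu 13, Fri 14 and Sat 15" -> "Wed 12 + Thu 13 + Fri 14 + Sat 15"
    line = re.sub(r",\s*|\s+and\s+", " + ", line)
    # A " + " between a time and a day name starts a new date, so turn it into " / ".
    return re.sub(
        rf"(\d{{1,2}}h(?:\d{{2}})?) \+ (?=(?:{DAYS})\b)",
        r"\1 / ",
        line,
        flags=re.IGNORECASE,
    )


print(normalize_complex("Wed 12, Thu 13, Fri 14 and Sat 15 Jun - 19h + Sun 16 Jun - 12h30"))
# Wed 12 + Thu 13 + Fri 14 + Sat 15 Jun - 19h / Sun 16 Jun - 12h30
```

Inside the class, parse_complex could then set self.dates = self.split_dates(normalize_complex(self.date)) and call self.parse_simple(). For the Feb example this would give 'thu 1 feb 15h' in lower case, because parse_simple keeps the text as written.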

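If you have questions about the way I've documented this class, feel free to get back to me and I can help you further. This problem turned out to be a bit more complex than I first thought, and I need to finish some other things first.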

Source: "python - Regular expression to get multiple dates from strings of a given format", a similar question on Stack Overflow: https://stackoverflow.com/questions/13418084/
