Parsing and cleaning up a text block of store hours in Python

I am scraping a website to extract store hours, which come in the following format:

"""Hours
Monday 9:30 AM - 9:00 PM
Tuesday 9:30 AM - 9:00 PM
Wednesday 9:30 AM - 9:00 PM
Thursday 9:30 AM - 9:00 PM
Friday 9:30 AM - 11:00 PM
Saturday 9:30 AM - 11:00 PM
Sunday 11:00 AM - 6:00 PM
Holiday Hours
Thanksgiving Day 11:00 AM - 6:00 PM"""

I would like to process it so that it ends up like this:

"""Mon-Thu 9:30AM-9:00PM
Fri-Sat 9:30AM-11:00PM
Sun & Hol 11:00AM-6:00PM"""

I would be happy with a pseudocode solution, since I want to learn and build it myself. I just can't figure out where to start.

Best answer: I think this is a good use case for itertools.groupby(). We can use it to group consecutive days that share the same time range. Something along these lines:

from itertools import groupby
from operator import itemgetter
from pprint import pprint


data = """Hours
Monday 9:30 AM - 9:00 PM
Tuesday 9:30 AM - 9:00 PM
Wednesday 9:30 AM - 9:00 PM
Thursday 9:30 AM - 9:00 PM
Friday 9:30 AM - 11:00 PM
Saturday 9:30 AM - 11:00 PM
Sunday 11:00 AM - 6:00 PM
Holiday Hours
Thanksgiving Day 11:00 AM - 6:00 PM"""

# keep only the weekday rows: drop the "Hours" header and the two trailing holiday lines
rows = [row.split(" ", 1) for row in data.splitlines()[1:-2]]

# group consecutive days by a time range
result = []
for time_range, group in groupby(rows, key=itemgetter(1)):
    days_in_group = [item[0] for item in group]

    first_day, last_day = days_in_group[0][:3], days_in_group[-1][:3]
    range_end = "-" + str(last_day) if first_day != last_day else ""

    result.append("{begin}{end} {time_range}".format(begin=first_day,
                                                     end=range_end,
                                                     time_range=time_range))

pprint(result)

Prints:

['Mon-Thu 9:30 AM - 9:00 PM',
 'Fri-Sat 9:30 AM - 11:00 PM',
 'Sun 11:00 AM - 6:00 PM']

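Note that this still works if every day has a different time range: each day then forms its own group and is printed on its own line.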
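The output above still differs from the format asked for in the question: the times keep their spaces, and the holiday line is dropped. The following is only a sketch of one way to close that gap, not part of the original answer. The names summarize_hours, WEEKDAYS and LINE_RE are hypothetical, and it assumes that every non-weekday row with a time range is a holiday and that holiday hours should be attached to any weekday group with identical hours.

```python
import re
from itertools import groupby

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")

# A label followed by a time range such as "9:30 AM - 9:00 PM".
LINE_RE = re.compile(
    r"^(.+?)\s+(\d{1,2}:\d{2}\s*[AP]M\s*-\s*\d{1,2}:\d{2}\s*[AP]M)$")


def summarize_hours(text):
    weekday_rows = []      # (three-letter day, compact hours)
    holiday_hours = set()  # compact hours seen on non-weekday rows
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # headers such as "Hours" or "Holiday Hours"
        name = match.group(1)
        hours = re.sub(r"\s+", "", match.group(2))  # "9:30AM-9:00PM"
        if name in WEEKDAYS:
            weekday_rows.append((name[:3], hours))
        else:
            # Assumption: any other row with a time range is a holiday.
            holiday_hours.add(hours)

    summary = []
    matched = set()
    # groupby only merges adjacent rows, so days must already be in order.
    for hours, group in groupby(weekday_rows, key=lambda row: row[1]):
        days = [day for day, _ in group]
        label = days[0] if len(days) == 1 else days[0] + "-" + days[-1]
        if hours in holiday_hours:
            label += " & Hol"
            matched.add(hours)
        summary.append(label + " " + hours)

    # Holiday hours that match no weekday group get their own line.
    for hours in sorted(holiday_hours - matched):
        summary.append("Hol " + hours)
    return summary


sample = """Hours
Monday 9:30 AM - 9:00 PM
Tuesday 9:30 AM - 9:00 PM
Wednesday 9:30 AM - 9:00 PM
Thursday 9:30 AM - 9:00 PM
Friday 9:30 AM - 11:00 PM
Saturday 9:30 AM - 11:00 PM
Sunday 11:00 AM - 6:00 PM
Holiday Hours
Thanksgiving Day 11:00 AM - 6:00 PM"""

print("\n".join(summarize_hours(sample)))
```

Matching rows with a regular expression instead of slicing with [1:-2] means the code does not depend on the exact number of header and holiday lines, which may matter if other store pages list more than one holiday.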
