I'm scraping a website to extract store hours, which come in the following format:
"""Hours
Monday 9:30 AM - 9:00 PM
Tuesday 9:30 AM - 9:00 PM
Wednesday 9:30 AM - 9:00 PM
Thursday 9:30 AM - 9:00 PM
Friday 9:30 AM - 11:00 PM
Saturday 9:30 AM - 11:00 PM
Sunday 11:00 AM - 6:00 PM
Holiday Hours
Thanksgiving Day 11:00 AM - 6:00 PM"""
I'd like to process it so that it ends up like this:
"""Mon-Thu 9:30AM-9:00PM
Fri-Sat 9:30AM-11:00PM
Sun & Hol 11:00AM-6:00PM"""
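I'd be happy to take a suggested pseudocode solution, for the sake of learning and building it myself. I just can't seem to get anywhere with this on my own.
Best answer:
I think this is a good use case for itertools.groupby(): we can use it to group consecutive days that share the same time range. Something along these lines: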
from itertools import groupby
from operator import itemgetter
from pprint import pprint
data = """Hours
Monday 9:30 AM - 9:00 PM
Tuesday 9:30 AM - 9:00 PM
Wednesday 9:30 AM - 9:00 PM
Thursday 9:30 AM - 9:00 PM
Friday 9:30 AM - 11:00 PM
Saturday 9:30 AM - 11:00 PM
Sunday 11:00 AM - 6:00 PM
Holiday Hours
Thanksgiving Day 11:00 AM - 6:00 PM"""
# keep only the weekday rows: drop the "Hours" header and the two
# trailing holiday lines, then split each row into [day, time_range]
rows = [row.split(" ", 1) for row in data.splitlines()[1:-2]]
# group consecutive days that share the same time range
result = []
for time_range, group in groupby(rows, key=itemgetter(1)):
    days_in_group = [item[0] for item in group]
    # abbreviate the first and last day of the run to three letters
    first_day, last_day = days_in_group[0][:3], days_in_group[-1][:3]
    # a single-day group gets no "-Xxx" suffix
    range_end = "-" + last_day if first_day != last_day else ""
    result.append("{begin}{end} {time_range}".format(begin=first_day,
                                                     end=range_end,
                                                     time_range=time_range))
pprint(result)
This prints:
['Mon-Thu 9:30 AM - 9:00 PM',
'Fri-Sat 9:30 AM - 11:00 PM',
'Sun 11:00 AM - 6:00 PM']
Note that this works even if every day has a different time range: each day then simply ends up in its own single-day group.
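To get all the way to the format asked for in the question (compact times, and the holiday folded into the row with matching hours), a minimal sketch could look like the one below. It is only one possible approach, not a definitive implementation. The names `summarize_hours`, `ROW` and `WEEKDAYS` are made up for this example, and it assumes that every row whose name is not a weekday is a holiday that can be labeled `Hol`. It uses a regular expression instead of slicing so that the number of holiday lines does not matter.

```python
import re
from itertools import groupby

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")

# A data row looks like "<name> <start> - <end>",
# e.g. "Monday 9:30 AM - 9:00 PM" or "Thanksgiving Day 11:00 AM - 6:00 PM".
ROW = re.compile(
    r"^(.+?)\s+(\d{1,2}:\d{2}\s*[AP]M\s*-\s*\d{1,2}:\d{2}\s*[AP]M)\s*$"
)


def summarize_hours(text):
    """Collapse scraped opening hours into compact lines (hypothetical helper)."""
    weekday_rows = []       # (abbreviated day, compact time range), in input order
    holiday_ranges = set()  # compact time ranges of all non-weekday rows
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # headers such as "Hours" carry no time range
        name = match.group(1)
        time_range = re.sub(r"\s+", "", match.group(2))  # "9:30 AM - 9:00 PM" -> "9:30AM-9:00PM"
        if name in WEEKDAYS:
            weekday_rows.append((name[:3], time_range))
        else:
            # assumption: anything that is not a weekday is a holiday
            holiday_ranges.add(time_range)

    summary = []
    # group consecutive weekdays that share the same time range
    for time_range, group in groupby(weekday_rows, key=lambda row: row[1]):
        days = [day for day, _ in group]
        label = days[0] if len(days) == 1 else "{}-{}".format(days[0], days[-1])
        if time_range in holiday_ranges:
            # attach the holiday to the first weekday group with the same hours
            label += " & Hol"
            holiday_ranges.discard(time_range)
        summary.append("{} {}".format(label, time_range))
    # holiday hours that match no weekday group get their own line
    summary.extend("Hol {}".format(r) for r in sorted(holiday_ranges))
    return summary


data = """Hours
Monday 9:30 AM - 9:00 PM
Tuesday 9:30 AM - 9:00 PM
Wednesday 9:30 AM - 9:00 PM
Thursday 9:30 AM - 9:00 PM
Friday 9:30 AM - 11:00 PM
Saturday 9:30 AM - 11:00 PM
Sunday 11:00 AM - 6:00 PM
Holiday Hours
Thanksgiving Day 11:00 AM - 6:00 PM"""

print("\n".join(summarize_hours(data)))
```

For the sample data this prints the three lines from the question: Mon-Thu 9:30AM-9:00PM, Fri-Sat 9:30AM-11:00PM and Sun & Hol 11:00AM-6:00PM. If the holiday hours did not match any weekday group, the sketch would emit a separate Hol line instead.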