Trying to improve the following regular expressions:
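import re

urlpath = columns[4].strip()  # columns and dlatency come from the surrounding script
urlpath = re.sub(r"(\?.*|/[0-9a-f]{24})", "", urlpath)  # drop the query string and 24-hex-digit IDs
urlpath = re.sub(r"/[0-9/]*", "/", urlpath)  # drop digits that directly follow a "/" (numeric segments)
urlpath = re.sub(r";.*", "", urlpath)  # drop ";..." path parameters
urlpath = re.sub(r"/", ".", urlpath)  # "/" -> "."
urlpath = re.sub(r"^\.api", "api", urlpath)  # drop the leading "."; "^" keeps ".apiCallTwo" intact
if urlpath in dlatency:
    ...  # rest of the code not shown in the question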
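The same chain can be sketched as a function whose patterns are compiled once at module level instead of being looked up on every `re.sub` call. This is a minimal sketch assuming the rules stay exactly as in the question; `normalize_urlpath` and the constant names are hypothetical. The `/` -> `.` step uses `str.replace`, since a fixed one-character swap needs no regex, and the last pattern is anchored with `^` so only the leading `.api` loses its dot.

```python
import re

# Each rule from the question, compiled once at import time.
# Raw strings avoid invalid-escape warnings for "\?" and "\.".
QUERY_OR_HEX_ID = re.compile(r"\?.*|/[0-9a-f]{24}")
NUMERIC_SEGMENTS = re.compile(r"/[0-9/]*")
PATH_PARAMS = re.compile(r";.*")
LEADING_DOT_API = re.compile(r"^\.api")


def normalize_urlpath(urlpath: str) -> str:
    """Apply the question's rewrite steps in the same order (hypothetical helper)."""
    urlpath = QUERY_OR_HEX_ID.sub("", urlpath)   # strip "?..." and 24-hex-digit IDs
    urlpath = NUMERIC_SEGMENTS.sub("/", urlpath)  # strip digits right after a "/"
    urlpath = PATH_PARAMS.sub("", urlpath)        # strip ";..." parameters
    urlpath = urlpath.replace("/", ".")           # plain string swap, no regex needed
    return LEADING_DOT_API.sub("api", urlpath)    # only the dot at position 0 goes


print(normalize_urlpath("/api/v4/path/apiCallTwo?host=wApp&trackId=1347158"))
# api.v4.path.apiCallTwo
```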
This turns a URL such as:
/api/v4/path/apiCallTwo?host=wApp&trackId=1347158
into
api.v4.path.apiCallTwo
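I'd like to improve the performance of these regexes: every 5 minutes the script processes about 50,000 files, and a full run takes about 40 seconds.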
Thanks
Best answer
Try this:
import re

s = '/api/v4/path/apiCallTwo?host=wApp&trackId=1347158'
# drop "?" and everything after it, turn "/" into ".", then drop the leading "."
re.sub(r'\?.*', '', s).replace('/', '.')[1:]
> 'api.v4.path.apiCallTwo'
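Compared with the question's code, this expression only removes the query string; the other rules there (24-hex-digit IDs, numeric segments, `;` parameters) are not applied. A quick check, wrapping the same idea in a hypothetical `answer_one_liner` function and using a made-up URL that contains a MongoDB-style ObjectId (an assumption about what the 24-hex pattern is meant to catch):

```python
import re


def answer_one_liner(s: str) -> str:
    # Same approach as the answer: drop "?" onward, "/" -> ".", drop leading "."
    return re.sub(r'\?.*', '', s).replace('/', '.')[1:]


print(answer_one_liner('/api/v4/path/apiCallTwo?host=wApp&trackId=1347158'))
# api.v4.path.apiCallTwo
print(answer_one_liner('/api/v4/users/507f1f77bcf86cd799439011/profile'))
# api.v4.users.507f1f77bcf86cd799439011.profile  (the ID is kept)
```

If those extra rules matter for grouping keys in `dlatency`, they still need their own patterns.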
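For better performance, compile the regular expression once and reuse it, like this (Python's re module also caches recently used patterns internally, so the saving mainly comes from skipping that cache lookup on every call):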
import re

regexp = re.compile(r'\?.*')
s = '/api/v4/path/apiCallTwo?host=wApp&trackId=1347158'
# `s` changes, but you can reuse `regexp` as many times as needed
regexp.sub('', s).replace('/', '.')[1:]
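Applied to many log lines, the compile-once pattern might look like the sketch below. `to_dotted` and `QUERY` are hypothetical names; the generator simply runs the answer's expression over each path, binding `QUERY.sub` to a local name once so the loop skips a repeated attribute lookup.

```python
import re

QUERY = re.compile(r'\?.*')  # compiled once, reused for every path


def to_dotted(paths):
    """Yield the dotted form of each URL path (hypothetical helper)."""
    sub = QUERY.sub  # local binding: one attribute lookup instead of one per path
    for s in paths:
        yield sub('', s).replace('/', '.')[1:]


paths = ['/api/v4/path/apiCallTwo?host=wApp&trackId=1347158', '/api/v4/other?x=1']
print(list(to_dotted(paths)))
# ['api.v4.path.apiCallTwo', 'api.v4.other']
```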
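A simpler approach that avoids regular expressions entirely:
s.partition('?')[0][1:].replace('/', '.')  # unlike s[1:s.index('?')], also works when there is no "?"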
> 'api.v4.path.apiCallTwo'
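Which variant is faster on a given machine is easy to check with `timeit`. A minimal sketch that times the compiled-regex expression against the `str.partition` one on the example URL; `with_regex` and `without_regex` are hypothetical names, and no timings are shown because they depend on the machine and Python version.

```python
import re
import timeit

s = '/api/v4/path/apiCallTwo?host=wApp&trackId=1347158'
regexp = re.compile(r'\?.*')


def with_regex():
    return regexp.sub('', s).replace('/', '.')[1:]


def without_regex():
    return s.partition('?')[0][1:].replace('/', '.')


# Both variants must agree before their speed is worth comparing.
assert with_regex() == without_regex() == 'api.v4.path.apiCallTwo'

for fn in (with_regex, without_regex):
    # Total seconds for 100,000 calls of each variant.
    print(fn.__name__, timeit.timeit(fn, number=100_000))
```

Timing the rewrite in isolation this way also shows how much of the 40-second run it actually accounts for, compared with reading and splitting the 50,000 files.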