

import csv, re # required imports

infile = open('Book1.csv', 'rt')  # open the csv file
reader = csv.reader(infile)  # read the csv file

strings = [] # initialize a list to read the rows into

for row in reader: # loop over all the rows in the csv file 
    strings += row  # put them into the list

link_list = []  # initialize list that all the links will be put in
for i in strings:  #  loop over the list to access each string for regex (can't regex on lists)

    links = re.search(r'((https?|ftp)://|www\.)[^\s/$.?#].[^\s]*', i) # regex to find the links
    if links != None: # if it finds a link..
        link_list.append(links) # put it into the list!

for link in link_list: # iterate the links over a loop so we can have them in a nice column format
    print(link)

<_sre.SRE_Match object; span=(49, 80), match='"'>
<_sre.SRE_Match object; span=(29, 115), match='>
<_sre.SRE_Match object; span=(34, 117), match='>
<_sre.SRE_Match object; span=(32, 115), match='>
<_sre.SRE_Match object; span=(76, 166), match='>
<_sre.SRE_Match object; span=(9, 34), match='"'>


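Best answer: The problem here is that re.search returns a match object rather than the matched string. You need to call group() on the match to get the string: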



for i in strings:  #  loop over the list to access each string for regex (can't regex on lists)

    links = re.search(r'((https?|ftp)://|www\.)[^\s/$.?#].[^\s]*', i) # regex to find the links
    if links != None: # if it finds a link..
        link_list.append(links.group()) # put the matched string (not the match object) into the list

group([group1, …])

Returns one or more subgroups of the match. If there is a single argument, the result is a single string; if there are multiple arguments, the result is a tuple with one item per argument. Without arguments, group1 defaults to zero (the whole match is returned). If a groupN argument is zero, the corresponding return value is the entire matching string; if it is in the inclusive range [1..99], it is the string matching the corresponding parenthesized group. If a group number is negative or larger than the number of groups defined in the pattern, an IndexError exception is raised. If a group is contained in a part of the pattern that did not match, the corresponding result is None. If a group is contained in a part of the pattern that matched multiple times, the last match is returned.
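
To make the group() behaviour described above concrete, here is a minimal sketch using the same pattern as the question. The helper name extract_links, the constant URL_PATTERN, and the sample strings are made up for illustration; they are not part of the original code.

```python
import re

# Same pattern as in the question, written as a raw string and compiled once.
URL_PATTERN = re.compile(r'((https?|ftp)://|www\.)[^\s/$.?#].[^\s]*')


def extract_links(strings):
    """Return the matched text (not the match object) for each string containing a link."""
    links = []
    for s in strings:
        match = URL_PATTERN.search(s)
        if match is not None:
            # group() with no arguments returns the whole match as a string
            links.append(match.group())
    return links


# Hypothetical sample data standing in for the CSV cells
sample = [
    "see https://example.com/page for details",
    "no link here",
    "visit www.example.org now",
]

m = URL_PATTERN.search(sample[0])
whole = m.group(0)      # entire match, same as m.group()
prefix = m.group(1)     # first parenthesized group: scheme plus "://", or "www."
scheme = m.group(2)     # second (nested) group: the scheme alone
pair = m.group(1, 2)    # several arguments give a tuple

# For a "www." link the scheme group takes no part in the match, so it is None
www_scheme = URL_PATTERN.search(sample[2]).group(2)

for link in extract_links(sample):
    print(link)
```

Because the pattern contains only two groups, asking for m.group(3) would raise IndexError, as the documentation excerpt states.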
