需求:
有两个表A,和B,分别为300万和400万数据,B中path字段包含目录信息,需要匹配A中的path信息,及B.path in A.path且需要包含信息。
问题:
1 如果直接关联查询,如用like,instr,查询非常慢,执行一天都执行不出来结果。执行计划无法使用索引,直接全表扫描,效率奇差
2 采用 full text索引后,搜索出来的结果都是模糊查询,一大堆结果,所以无法联合查询
解决办法:
1 修改表引擎为MyISAM,设置表A.path字段为full text索引,采 “IN BOOLEAN MODE”查询
2 采用生产者消费者模式,然后逐一查询,
做个记号,不知道有没有其他更好的方法,目前依旧很慢
#encoding:utf-8 import sys sys.path.append('/home/fastqweb/dyh/script') reload(sys) sys.setdefaultencoding('utf-8') from setting import * from util.mysqlclient import mysqlClient from util.log import Log loghander = Log(log_path, 'matchinfo.log') import Queue import time import threading mutex = threading.Lock() def mysql_connect(loghander): mysql_db = mysqlClient(mysql_config['IP'], mysql_config['USER'], mysql_config['PASSWORD'], mysql_config['DB'], mysql_config['PORT']) try: mysql_db.open() except Exception, e: print e loghander.error('mysql can not connect,the ip is {}, the user is {}, the password is {}, ' 'the db is {}, the port is {}'.format(mysql_config['IP'], mysql_config['USER'], mysql_config['PASSWORD'], mysql_config['DB'], mysql_config['PORT'])) exit(0) return mysql_db def get_gf_histroy_data(q): print "I am hear" mysql_db = mysql_connect(loghander) sql = 'select * from B' path_cursor = mysql_db.select(sql) while True: text = path_cursor.fetchone() if text: while q.full(): print 'full' time.sleep(5) q.put(text) else: q.put('FINISH') break def generate_searchword(dir): b = dir.split('/') data = [] for item in b: if item: item = '+' + item data.append(item) return ' '.join(data) def get_sample_machine(q): print "I am in" mysql_db = mysql_connect(loghander) while True: mutex.acquire() status = q.empty() if not status: text = q.get() mutex.release() if text and text != 'FINISH': backup_dir = text[8] if backup_dir: new_dir = generate_searchword(backup_dir) path_sql = "select * from A where match(machine_path) against(\'{}\'IN BOOLEAN MODE)".format(new_dir) path_cursor = mysql_db.select(path_sql) result_text = path_cursor.fetchall() if result_text: for item in result_text: if backup_dir in item[12]: mutex.acquire() loghander.info('path match result:' + str(item )) mutex.release() if text == 'FINISH': break else: mutex.release() if __name__ == '__main__': loghander.info('begin ') q = Queue.Queue(maxsize=1000) #g1 = gevent.spawn(get_gf_histroy_data, q) tread_list =[threading.Thread(target=get_gf_histroy_data, args=(q,))] for i in range(100): tread_list.append(threading.Thread(target=get_sample_machine, args=(q,))) for item in tread_list: item.start() for item in tread_list: item.join() loghander.info('finish')