Using Python to Recognize Elements on Websites and Apps

January 22, 2023 · Source: 数据运营python

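Anti-scraping measures create two common obstacles. First, some websites display sensitive elements such as phone numbers as images, so they cannot be scraped as text. Second, some products only have a mobile app. Their network traffic is encrypted in transit, so even intercepted packets cannot be decrypted. In both situations, you can take a screenshot of the interface and recognize the text in the image.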

1. Approach

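Take a screenshot and save an image of the interface you want to process. Locate the element that holds the information you need, and read its coordinates from its attributes. Use those coordinates to crop the corresponding region out of the saved screenshot. Finally, recognize the text in the cropped image with the pytesseract package.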
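
The coordinate step can be sketched as a small pure function. This is a minimal sketch, not the author's code. The helper name `crop_box` and its `inset_x` parameter are hypothetical. It assumes only that the element's position and size arrive as dicts shaped like Selenium's `element.location` and `element.size`. The returned tuple uses the (left, upper, right, lower) order that Pillow's `Image.crop` expects.

```python
def crop_box(location, size, inset_x=0):
    """Return a Pillow crop box (left, upper, right, lower) for an element.

    location: dict like Selenium's element.location, e.g. {'x': 100, 'y': 50}
    size:     dict like Selenium's element.size, e.g. {'width': 80, 'height': 30}
    inset_x:  hypothetical parameter; pixels trimmed from both left and right edges
    """
    left = int(location["x"]) + inset_x
    upper = int(location["y"])
    right = int(location["x"] + size["width"]) - inset_x
    lower = int(location["y"] + size["height"])
    # An inset wider than half the element would leave nothing to crop
    if right <= left or lower <= upper:
        raise ValueError("element box is empty after applying the inset")
    return (left, upper, right, lower)
```

For example, `crop_box({'x': 100, 'y': 50}, {'width': 80, 'height': 30}, inset_x=10)` gives `(110, 50, 170, 80)`, which can be passed straight to `Image.crop`. If the inset is too large, the function raises an error instead of silently producing a zero-size image.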

2. Installing the dependencies

Install Pillow

pip install Pillow

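Install tesseract-ocr
GitHub: https://github.com/tesseract-ocr/tesseract
Install it directly from there. It is a standalone program, not a pip package. pytesseract calls this executable, so it must be on your PATH.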

Install pytesseract

pip install pytesseract
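
pytesseract only wraps the tesseract program, so it helps to confirm that the executable can be found before running any OCR. The sketch below uses only the standard library. The helper name `find_tesseract` is hypothetical, and the Windows path in the comment is only an example of where an installer might place the program. If tesseract is not on PATH, you can point pytesseract at it directly through `pytesseract.pytesseract.tesseract_cmd`. Also note that `pytesseract.get_tesseract_version()` fails if the program cannot be run.

```python
import shutil
from typing import Optional


def find_tesseract() -> Optional[str]:
    """Return the full path of the tesseract executable on PATH, or None."""
    return shutil.which("tesseract")


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        # pytesseract cannot work without the tesseract program. Either add it to PATH
        # or set the path explicitly (example location, adjust to your install):
        # pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
        print("tesseract not found on PATH")
    else:
        print("tesseract found at", path)
```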

3. Implementation

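    from PIL import Image
    import pytesseract
    from selenium.webdriver.common.by import By

    # Assumes `driver` is an already-open Selenium WebDriver showing the target page
    screenshotPath = r"e:\pythonimage\image01.png"     # raw strings keep the backslashes literal
    saveImagePath = r"E:\pythonimage\yanzhengma01.png"
    driver.save_screenshot(screenshotPath)                   # screenshot of the visible page
    imgElement = driver.find_element(By.ID, "genCheckCode")  # locate the captcha (find_element_by_id was removed in Selenium 4.3)
    location = imgElement.location   # x, y coordinates of the captcha
    size = imgElement.size           # width and height of the captcha
    # Build the crop box (left, upper, right, lower), trimming 10 px off the left and right
    box = (int(location['x']) + 10, int(location['y']),
           int(location['x'] + size['width'] - 10), int(location['y'] + size['height']))
    with Image.open(screenshotPath) as image:   # open the screenshot; closed automatically
        frame4 = image.crop(box)                # crop our region out of the screenshot
    frame4.save(saveImagePath)                  # keep the cropped captcha for inspection
    text = pytesseract.image_to_string(frame4).strip()  # recognize the captcha with image_to_string
    frame4.close()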
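
One caveat applies to this approach. Selenium reports `location` and `size` in CSS pixels. On a scaled display (for example, 150% scaling on Windows), the screenshot is usually saved at device resolution, so the crop lands in the wrong place. A minimal fix, assuming the screenshot is exactly `devicePixelRatio` times larger than the CSS layout, is to scale the crop box before cropping. The helper name `scale_box` is hypothetical. In Selenium, the ratio can be read with `driver.execute_script("return window.devicePixelRatio")`.

```python
def scale_box(box, ratio):
    """Scale a (left, upper, right, lower) box from CSS pixels to screenshot pixels.

    ratio: the page's device pixel ratio, e.g. the value of
    driver.execute_script("return window.devicePixelRatio") in Selenium.
    """
    left, upper, right, lower = box
    # Round to whole pixels because Image.crop works on integer coordinates
    return (round(left * ratio), round(upper * ratio),
            round(right * ratio), round(lower * ratio))
```

For example, with a ratio of 1.5, the box `(110, 50, 170, 80)` becomes `(165, 75, 255, 120)`. A ratio of 1 leaves the box unchanged, so the scaling can be applied unconditionally.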


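    Original author: 数据运营python
    Original URL: https://www.jianshu.com/p/022bb4a7cd61
    This article was reposted from the web solely to share knowledge. If it infringes any rights, please contact the blogger to have it removed.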