🔥 Juejin Booklet (Xiaoce) Crawler

GitHub repository: stars welcome

Uses Node's https module to fetch the HTML of booklets you have already purchased, converts that HTML into Markdown files, and saves them locally.
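
The HTML-to-Markdown step can be pictured with a deliberately small sketch. htmlToMarkdown is a hypothetical, regex-based converter that only covers headings, paragraphs, bold/italic, links, inline code and pre/code blocks; it is not the project's actual converter, and a real tool would normally rely on an HTML parser or a dedicated conversion library.

```typescript
// Hypothetical, regex-based HTML-to-Markdown converter for a small subset of
// tags. Illustration only: not the project's converter, and not robust
// against arbitrary HTML.
function htmlToMarkdown(html: string): string {
  const fence = "`".repeat(3);
  let md = html;

  // <pre><code>...</code></pre> -> fenced code block (handled first so the
  // inline-code rule below does not consume it)
  md = md.replace(
    /<pre(?:\s[^>]*)?>\s*<code(?:\s[^>]*)?>([\s\S]*?)<\/code>\s*<\/pre>/gi,
    "\n\n" + fence + "\n$1\n" + fence + "\n\n",
  );
  // <h1>..<h6> -> "#".."######"
  md = md.replace(
    /<h([1-6])(?:\s[^>]*)?>([\s\S]*?)<\/h\1>/gi,
    (_m: string, level: string, text: string) =>
      "\n\n" + "#".repeat(Number(level)) + " " + text.trim() + "\n\n",
  );
  // Inline formatting; (?:\s[^>]*)? stops <b> matching <br> and <i> matching <img>
  md = md.replace(/<(strong|b)(?:\s[^>]*)?>([\s\S]*?)<\/\1>/gi, "**$2**");
  md = md.replace(/<(em|i)(?:\s[^>]*)?>([\s\S]*?)<\/\1>/gi, "*$2*");
  md = md.replace(/<code(?:\s[^>]*)?>([\s\S]*?)<\/code>/gi, "`$1`");
  md = md.replace(/<a\s[^>]*?href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi, "[$2]($1)");
  // Line and paragraph breaks (<p(?:\s...)?> does not match <pre>)
  md = md.replace(/<br\s*\/?>/gi, "\n");
  md = md.replace(/<p(?:\s[^>]*)?>([\s\S]*?)<\/p>/gi, "\n\n$1\n\n");
  // Drop any remaining tags, then decode common entities (&amp; last)
  md = md.replace(/<[^>]+>/g, "");
  const entities: Record<string, string> = {
    "&lt;": "<", "&gt;": ">", "&quot;": '"', "&#39;": "'", "&nbsp;": " ",
  };
  md = md.replace(/&(?:lt|gt|quot|#39|nbsp);/g, (e: string) => entities[e] ?? e);
  md = md.replace(/&amp;/g, "&");
  // Collapse runs of blank lines and trim the result
  return md.replace(/\n{3,}/g, "\n\n").trim();
}
```

For example, `<h2>Intro</h2><p>Hello <strong>world</strong></p>` becomes `## Intro`, a blank line, then `Hello **world**`.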

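Note: there are currently two versions of this project. v2 does not need Chromium as a headless browser; v1 uses Chromium as a headless browser to simulate a user logging in to the site.

Choose the version that fits your needs.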

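Usage

⚠️ Note: Juejin does not support access from networks outside China, so do not use a proxy.

Method 1: run directly with npx

In any local directory, run npx @oliyg/juejinxiaoce and enter your email (username), password, and booklet ID when prompted. The job is finished when All Done appears.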

➜  Desktop npx @oliyg/juejinxiaoce
npx: installed 98 in 10.748s
email: <your email / username>
password: <your password>
bookId: <booklet ID>
===navagating to main page
===login...
===getting book section list
===getting book HTML content
Common interview techniques
===writing html...
===getting book HTML content
===write html file success
===writing markdown...
===write markdown file success
The road ahead, let's walk it together
===writing html...
===write html file success
===writing markdown...
===write markdown file success

======
All Done...Enjoy.
======

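In the directory where you ran the command you will find a folder named md xxx that contains the Markdown documents. In the example above the command was run from the Desktop directory, so the folder is created on the desktop: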

➜  md 1548483715543 ls -al
total 40
drwxr-xr-x  4 oli  staff   128  1 26 14:22 .
drwx------+ 9 oli  staff   288  1 26 14:21 ..
-rw-r--r--  1 oli  staff  4915  1 26 14:21 Common interview techniques.md
-rw-r--r--  1 oli  staff  8465  1 26 14:22 The road ahead, let's walk it together.md
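
The folder name in this example, md 1548483715543, looks like "md " followed by a millisecond timestamp such as Date.now() returns: 1548483715543 decodes to 2019-01-26 06:21:55 UTC, which is 14:21 in UTC+8 and lines up with the ls listing. That reading is inferred from the sample output rather than documented by the project. The sketch below builds paths that way; outputDir and sectionFileName are hypothetical names, and the set of replaced characters is an assumption (the sample keeps commas and spaces in file names).

```typescript
import * as path from "node:path";

// Hypothetical helper. The "md <timestamp>" naming is inferred from the
// sample output ("md 1548483715543"), not taken from the project's source.
function outputDir(baseDir: string, timestampMs: number = Date.now()): string {
  return path.join(baseDir, `md ${timestampMs}`);
}

// Hypothetical helper: turn a section title into a Markdown file name.
// Only characters commonly invalid in file names are replaced; commas and
// spaces are kept, matching the sample listing.
function sectionFileName(title: string): string {
  return title.replace(/[\/\\:*?"<>|]/g, "_").trim() + ".md";
}
```

Calling outputDir(process.cwd()) would place the folder in the directory the command was run from, as described above.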

Method 2: install with npm i

Install globally with npm i -g, then run the juejinxiaoce command:

➜  Desktop npm i -g @oliyg/juejinxiaoce
/Users/oli/.nvm/versions/node/v8.12.0/bin/juejinxiaoce -> /Users/oli/.nvm/versions/node/v8.12.0/lib/node_modules/@oliyg/juejinxiaoce/bin/juejinxiaoce
+ @oliyg/juejinxiaoce@2.2.1
added 98 packages from 201 contributors in 5.89s
➜  Desktop juejinxiaoce
email:
password:
bookId:
===navagating to main page
===login...
...
...

The booklet ID can be found in the booklet's URL.
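
To pull the ID out of a link programmatically, a sketch like the one below works under one assumption: that booklet URLs have the shape https://<host>/book/<bookId>, optionally followed by further path segments. Both that URL layout and the helper name extractBookId are assumptions for illustration; check them against the address bar of an actual booklet page.

```typescript
import { URL } from "node:url";

// Hypothetical helper. ASSUMPTION: booklet URLs look like
// https://<host>/book/<bookId>[/...]; verify against the real address bar.
function extractBookId(bookUrl: string): string | null {
  const segments = new URL(bookUrl).pathname.split("/").filter(Boolean);
  const i = segments.indexOf("book");
  // The ID is the path segment right after "book", if there is one
  return i >= 0 && i + 1 < segments.length ? segments[i + 1] : null;
}
```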

After running, wait for the All Done...Enjoy. message; the conversion is then complete.

Changelog

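  • v2.2.0 Added command-line mode
  • v2.0.0 Uses Node's native https module to send requests and fetch content directly; Chromium is no longer needed, so there are no software permission issues
  • v1.1.2 Uses Google's puppeteer as a headless browser to fetch content; requires Chromium, and may run into permission issues on macOS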
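
The v2.0.0 entry describes fetching pages with plain HTTPS requests instead of a headless browser. A minimal sketch of such a request with Node's built-in https module follows; fetchText is a hypothetical name, and the project's real endpoints, headers and login flow are not described here, so the headers argument (for example a session cookie) is only a placeholder.

```typescript
import * as https from "node:https";

// Hypothetical helper: GET a URL over HTTPS and resolve with the body text.
// Any auth headers (e.g. a Cookie) are placeholders, not the tool's real flow.
function fetchText(url: string, headers: Record<string, string> = {}): Promise<string> {
  return new Promise((resolve, reject) => {
    // An invalid URL throws synchronously here, which rejects the Promise
    const req = https.get(url, { headers }, (res) => {
      const status = res.statusCode ?? 0;
      if (status < 200 || status >= 300) {
        res.resume(); // discard the body so the socket is released
        reject(new Error(`HTTP ${status} for ${url}`));
        return;
      }
      res.setEncoding("utf8");
      let body = "";
      res.on("data", (chunk: string) => { body += chunk; });
      res.on("end", () => resolve(body));
      res.on("error", reject);
    });
    req.on("error", reject);
  });
}
```

Dropping the browser removes the Chromium download and the macOS permission problems listed under FAQ, at the cost of reproducing the site's requests by hand.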

FAQ

  • v1.1.2

    • Error: spawn EACCES

      • Usually seen on macOS; make sure Chromium has been installed correctly
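
spawn EACCES is the error Node reports when it is not permitted to execute the file it tries to spawn, which for v1 is typically the Chromium binary that puppeteer downloads. A small check like the one below (isExecutable is a hypothetical helper) tells you whether the current user can execute a given path on POSIX systems.

```typescript
import * as fs from "node:fs";

// Hypothetical helper: true if the current user may execute filePath.
// A false result for the Chromium binary is consistent with "spawn EACCES".
function isExecutable(filePath: string): boolean {
  try {
    fs.accessSync(filePath, fs.constants.X_OK);
    return true;
  } catch {
    // Missing file or no execute permission
    return false;
  }
}
```

Passing it the path returned by puppeteer.executablePath() is one way to check v1's browser; if it returns false, granting execute permission to that file (for example with chmod +x) usually resolves the error.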

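Disclaimer

  • No username or password is provided; you must log in with your own account and password
  • For technical discussion, learning, and research only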

Privacy

  • This project does not store or transmit any private user data

License

The MIT License (MIT)
Copyright (c) 2019 OliverYoung

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the “Software”), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE
OR OTHER DEALINGS IN THE SOFTWARE.

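    Original author: JS菌
    Original URL: https://segmentfault.com/a/1190000018033546
    This article is reposted from the web for knowledge-sharing purposes only; in case of infringement, please contact the blog owner to have it removed.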