WebMagic 0.4.0 Released: a Java Crawler Framework

  • Language: Java
  • Size: 0.26 MB
  • Downloads: 13
  • Views: 46
  • Published: 2023-05-04
  • Category: Java language basics
  • Publisher: js2021
  • File format: .zip
  • Points required: 2
 Tags: Java crawler, java web, 4.0, IC

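Example Introduction

[Summary] WebMagic 0.4.0 released, a Java crawler framework
1. Fixed the problem in 0.3.2 and earlier versions where the connection pool did not take effect (#30). The release uses the new connection-pool mechanism in HttpClient 4.3.1 to implement connection reuse. In testing, download speed improved by roughly 90%. Test code: Kr36NewsModel.java. 2. Added a synchronous crawling API. For small-scale crawls...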
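
The connection-reuse idea mentioned above can be illustrated with a small, self-contained sketch. It does not use HttpClient or WebMagic and is not how either library implements pooling; `ConnectionPoolSketch`, `Connection`, `borrow`, and `release` are hypothetical names chosen for illustration. The point is only that a pool hands back an idle connection when one is available, so repeated requests do not pay the cost of opening a new connection each time.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only: a tiny fixed-size pool that reuses "connections". */
class ConnectionPoolSketch {

    /** Hypothetical stand-in for an expensive-to-open HTTP connection. */
    static final class Connection {
        final int id;

        Connection(int id) {
            this.id = id;
        }
    }

    private final BlockingQueue<Connection> idle;
    private final AtomicInteger opened = new AtomicInteger();
    private final int maxTotal;

    ConnectionPoolSketch(int maxTotal) {
        this.maxTotal = maxTotal;
        this.idle = new ArrayBlockingQueue<>(maxTotal);
    }

    /** Reuses an idle connection if possible, opens a new one up to maxTotal, otherwise waits. */
    Connection borrow() {
        Connection c = idle.poll();
        if (c != null) {
            return c; // reuse: nothing new is opened
        }
        while (true) {
            int n = opened.get();
            if (n >= maxTotal) {
                try {
                    return idle.take(); // pool exhausted: block until a release
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for a connection", e);
                }
            }
            if (opened.compareAndSet(n, n + 1)) {
                return new Connection(n + 1); // "open" a new connection
            }
        }
    }

    /** Returns a connection to the pool so that a later request can reuse it. */
    void release(Connection c) {
        idle.offer(c);
    }

    int openedCount() {
        return opened.get();
    }

    public static void main(String[] args) {
        ConnectionPoolSketch pool = new ConnectionPoolSketch(2);
        for (int i = 0; i < 5; i++) {
            Connection c = pool.borrow();
            // ... a request would be sent over c here ...
            pool.release(c);
        }
        // Five sequential requests share a single connection.
        System.out.println("requests=5, connections opened=" + pool.openedCount());
    }
}
```

Run sequentially as in `main`, the five requests open only one connection. If connections were never released back to the pool, or the pool were bypassed, each request would open its own.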

[Core code]
File list
└── webmagic-master
    ├── en_docs
    │   └── README.md
    ├── pom.xml
    ├── README.md
    ├── release-note.md
    ├── user-manual.md
    ├── webmagic-core
    │   ├── pom.xml
    │   ├── README.md
    │   └── src
    │       ├── main
    │       │   ├── java
    │       │   │   └── us
    │       │   │       └── codecraft
    │       │   │           └── webmagic
    │       │   │               ├── downloader
    │       │   │               │   ├── Downloader.java
    │       │   │               │   ├── HttpClientDownloader.java
    │       │   │               │   ├── HttpClientGenerator.java
    │       │   │               │   └── package.html
    │       │   │               ├── package.html
    │       │   │               ├── Page.java
    │       │   │               ├── pipeline
    │       │   │               │   ├── CollectorPipeline.java
    │       │   │               │   ├── ConsolePipeline.java
    │       │   │               │   ├── FilePipeline.java
    │       │   │               │   ├── package.html
    │       │   │               │   ├── Pipeline.java
    │       │   │               │   └── ResultItemsCollectorPipeline.java
    │       │   │               ├── processor
    │       │   │               │   ├── example
    │       │   │               │   │   ├── BaiduBaikePageProcesser.java
    │       │   │               │   │   ├── GithubRepoPageProcesser.java
    │       │   │               │   │   └── OschinaBlogPageProcesser.java
    │       │   │               │   ├── package.html
    │       │   │               │   ├── PageProcessor.java
    │       │   │               │   └── SimplePageProcessor.java
    │       │   │               ├── Request.java
    │       │   │               ├── ResultItems.java
    │       │   │               ├── scheduler
    │       │   │               │   ├── package.html
    │       │   │               │   ├── PriorityScheduler.java
    │       │   │               │   ├── QueueScheduler.java
    │       │   │               │   └── Scheduler.java
    │       │   │               ├── selector
    │       │   │               │   ├── AndSelector.java
    │       │   │               │   ├── BaseElementSelector.java
    │       │   │               │   ├── CssSelector.java
    │       │   │               │   ├── ElementSelector.java
    │       │   │               │   ├── Html.java
    │       │   │               │   ├── OrSelector.java
    │       │   │               │   ├── package.html
    │       │   │               │   ├── PlainText.java
    │       │   │               │   ├── RegexResult.java
    │       │   │               │   ├── RegexSelector.java
    │       │   │               │   ├── ReplaceSelector.java
    │       │   │               │   ├── Selectable.java
    │       │   │               │   ├── Selector.java
    │       │   │               │   ├── Selectors.java
    │       │   │               │   ├── SmartContentSelector.java
    │       │   │               │   ├── XpathSelector.java
    │       │   │               │   └── XsoupSelector.java
    │       │   │               ├── Site.java
    │       │   │               ├── Spider.java
    │       │   │               ├── Task.java
    │       │   │               └── utils
    │       │   │                   ├── EnvironmentUtil.java
    │       │   │                   ├── Experimental.java
    │       │   │                   ├── FilePersistentBase.java
    │       │   │                   ├── NumberUtils.java
    │       │   │                   ├── package.html
    │       │   │                   ├── ThreadUtils.java
    │       │   │                   └── UrlUtils.java
    │       │   └── resources
    │       │       └── log4j.xml
    │       └── test
    │           ├── java
    │           │   └── us
    │           │       └── codecraft
    │           │           └── webmagic
    │           │               ├── downloader
    │           │               │   └── HttpClientDownloaderTest.java
    │           │               ├── HtmlTest.java
    │           │               ├── scheduler
    │           │               │   └── PrioritySchedulerTest.java
    │           │               ├── selector
    │           │               │   ├── ExtractorsTest.java
    │           │               │   └── RegexSelectorTest.java
    │           │               ├── SpiderTest.java
    │           │               └── utils
    │           │                   ├── EnvironmentUtilTest.java
    │           │                   └── UrlUtilsTest.java
    │           └── resources
    │               └── log4j.xml
    ├── webmagic-extension
    │   ├── pom.xml
    │   ├── README.md
    │   └── src
    │       ├── main
    │       │   └── java
    │       │       └── us
    │       │           └── codecraft
    │       │               └── webmagic
    │       │                   ├── downloader
    │       │                   │   └── FileCache.java
    │       │                   ├── example
    │       │                   │   ├── BaiduBaike.java
    │       │                   │   ├── GithubRepo.java
    │       │                   │   └── OschinaBlog.java
    │       │                   ├── model
    │       │                   │   ├── AfterExtractor.java
    │       │                   │   ├── annotation
    │       │                   │   │   ├── ComboExtract.java
    │       │                   │   │   ├── ExtractBy.java
    │       │                   │   │   ├── ExtractByUrl.java
    │       │                   │   │   ├── Formatter.java
    │       │                   │   │   ├── HelpUrl.java
    │       │                   │   │   ├── package.html
    │       │                   │   │   └── TargetUrl.java
    │       │                   │   ├── ConsolePageModelPipeline.java
    │       │                   │   ├── Extractor.java
    │       │                   │   ├── FieldExtractor.java
    │       │                   │   ├── formatter
    │       │                   │   │   ├── BasicTypeFormatter.java
    │       │                   │   │   ├── DateFormatter.java
    │       │                   │   │   ├── ObjectFormatter.java
    │       │                   │   │   └── ObjectFormatters.java
    │       │                   │   ├── HasKey.java
    │       │                   │   ├── ModelPageProcessor.java
    │       │                   │   ├── ModelPipeline.java
    │       │                   │   ├── OOSpider.java
    │       │                   │   ├── package.html
    │       │                   │   ├── PageModelCollectorPipeline.java
    │       │                   │   └── PageModelExtractor.java
    │       │                   ├── MultiPageModel.java
    │       │                   ├── pipeline
    │       │                   │   ├── CollectorPageModelPipeline.java
    │       │                   │   ├── FilePageModelPipeline.java
    │       │                   │   ├── JsonFilePageModelPipeline.java
    │       │                   │   ├── JsonFilePipeline.java
    │       │                   │   ├── MultiPagePipeline.java
    │       │                   │   └── PageModelPipeline.java
    │       │                   ├── scheduler
    │       │                   │   ├── FileCacheQueueScheduler.java
    │       │                   │   └── RedisScheduler.java
    │       │                   ├── selector
    │       │                   │   └── JsonPathSelector.java
    │       │                   └── utils
    │       │                       ├── DoubleKeyMap.java
    │       │                       ├── ExtractorUtils.java
    │       │                       └── MultiKeyMapBase.java
    │       └── test
    │           ├── java
    │           │   └── us
    │           │       └── codecraft
    │           │           └── webmagic
    │           │               ├── downloader
    │           │               │   └── FileCacheTest.java
    │           │               ├── formatter
    │           │               │   └── DateFormatterTest.java
    │           │               ├── MockDownloader.java
    │           │               ├── MockPageModelPipeline.java
    │           │               ├── MockPipeline.java
    │           │               ├── model
    │           │               │   └── GithubRepoTest.java
    │           │               ├── processor
    │           │               │   └── GithubRepoProcessor.java
    │           │               ├── scheduler
    │           │               │   └── RedisSchedulerTest.java
    │           │               └── selector
    │           │                   └── JsonPathSelectorTest.java
    │           └── resouces
    │               └── log4j.xml
    ├── webmagic-lucene
    │   ├── pom.xml
    │   ├── README.md
    │   └── src
    │       └── main
    │           ├── java
    │           │   └── us
    │           │       └── codecraft
    │           │           └── webmagic
    │           │               └── pipeline
    │           │                   └── LucenePipeline.java
    │           └── test
    │               └── java
    │                   └── us
    │                       └── codecraft
    │                           └── webmagic
    │                               └── lucene
    │                                   └── OschinaBlog.java
    ├── webmagic-samples
    │   ├── pom.xml
    │   ├── README.md
    │   └── src
    │       ├── main
    │       │   ├── java
    │       │   │   └── us
    │       │   │       └── codecraft
    │       │   │           └── webmagic
    │       │   │               ├── main
    │       │   │               │   └── QuickStarter.java
    │       │   │               ├── model
    │       │   │               │   └── samples
    │       │   │               │       ├── Blog.java
    │       │   │               │       ├── GithubRepo.java
    │       │   │               │       ├── IteyeBlog.java
    │       │   │               │       ├── Kr36NewsModel.java
    │       │   │               │       ├── News163.java
    │       │   │               │       ├── OschinaAnswer.java
    │       │   │               │       └── OschinaBlog.java
    │       │   │               └── samples
    │       │   │                   ├── DiandianBlogProcessor.java
    │       │   │                   ├── HuxiuProcessor.java
    │       │   │                   ├── InfoQMiniBookProcessor.java
    │       │   │                   ├── IteyeBlogProcessor.java
    │       │   │                   ├── NjuBBSProcessor.java
    │       │   │                   ├── OschinaBlogPageProcesser.java
    │       │   │                   ├── OschinaPageProcesser.java
    │       │   │                   ├── QzoneBlogProcessor.java
    │       │   │                   ├── scheduler
    │       │   │                   │   ├── DelayQueueScheduler.java
    │       │   │                   │   ├── LevelLimitScheduler.java
    │       │   │                   │   └── ZipCodePageProcessor.java
    │       │   │                   ├── SinaBlogProcesser.java
    │       │   │                   └── TianyaPageProcesser.java
    │       │   └── resources
    │       │       └── log4j.xml
    │       └── test
    │           └── java
    │               └── us
    │                   └── codecraft
    │                       └── webmagic
    │                           ├── model
    │                           │   └── ProcessorBenchmark.java
    │                           ├── processor
    │                           │   └── SinablogProcessorTest.java
    │                           ├── samples
    │                           │   └── scheduler
    │                           │       └── DelayQueueSchedulerTest.java
    │                           └── SpiderTest.java
    ├── webmagic-saxon
    │   ├── pom.xml
    │   ├── README.md
    │   └── src
    │       ├── main
    │       │   └── java
    │       │       └── us
    │       │           └── codecraft
    │       │               └── webmagic
    │       │                   └── selector
    │       │                       └── Xpath2Selector.java
    │       └── test
    │           └── java
    │               └── us
    │                   └── codecraft
    │                       └── webmagic
    │                           └── selector
    │                               └── XpathSelectorTest.java
    ├── webmagic-selenium
    │   ├── pom.xml
    │   ├── README.md
    │   └── src
    │       ├── main
    │       │   └── java
    │       │       └── us
    │       │           └── codecraft
    │       │               └── webmagic
    │       │                   └── downloader
    │       │                       └── selenium
    │       │                           ├── SeleniumDownloader.java
    │       │                           └── WebDriverPool.java
    │       └── test
    │           └── java
    │               └── us
    │                   └── codecraft
    │                       └── webmagic
    │                           ├── downloader
    │                           │   ├── selenium
    │                           │   │   ├── SeleniumDownloaderTest.java
    │                           │   │   └── WebDriverPoolTest.java
    │                           │   └── SeleniumTest.java
    │                           └── samples
    │                               └── HuabanProcessor.java
    └── zh_docs
        ├── README.md
        └── us
            └── codecraft
                └── webmagic
                    ├── downloader
                    │   ├── Destroyable-cmnt.xml
                    │   ├── Downloader-cmnt.xml
                    │   ├── FileDownloader-cmnt.xml
                    │   ├── HttpClientDownloader-cmnt.xml
                    │   ├── HttpClientPool-cmnt.xml
                    │   └── package.cmnt
                    ├── model
                    │   ├── AfterExtractor-cmnt.xml
                    │   ├── annotation
                    │   │   ├── ComboExtract-cmnt.xml
                    │   │   ├── ExtractBy2-cmnt.xml
                    │   │   ├── ExtractBy2.Type-cmnt.xml
                    │   │   ├── ExtractBy3-cmnt.xml
                    │   │   ├── ExtractBy3.Type-cmnt.xml
                    │   │   ├── ExtractBy-cmnt.xml
                    │   │   ├── ExtractByRaw-cmnt.xml
                    │   │   ├── ExtractByRaw.Type-cmnt.xml
                    │   │   ├── ExtractBy.Type-cmnt.xml
                    │   │   ├── ExtractByUrl-cmnt.xml
                    │   │   ├── HelpUrl-cmnt.xml
                    │   │   ├── package.cmnt
                    │   │   └── TargetUrl-cmnt.xml
                    │   ├── ConsolePageModelPipeline-cmnt.xml
                    │   ├── HasKey-cmnt.xml
                    │   ├── OOSpider-cmnt.xml
                    │   ├── package.cmnt
                    │   └── PageModelPipeline-cmnt.xml
                    ├── package.cmnt
                    ├── Page-cmnt.xml
                    ├── PagedModel-cmnt.xml
                    ├── pipeline
                    │   ├── ConsolePipeline-cmnt.xml
                    │   ├── FilePipeline-cmnt.xml
                    │   ├── JsonFilePageModelPipeline-cmnt.xml
                    │   ├── JsonFilePipeline-cmnt.xml
                    │   ├── package.cmnt
                    │   ├── PagedPipeline-cmnt.xml
                    │   └── Pipeline-cmnt.xml
                    ├── processor
                    │   ├── package.cmnt
                    │   ├── PageProcessor-cmnt.xml
                    │   └── SimplePageProcessor-cmnt.xml
                    ├── Request-cmnt.xml
                    ├── ResultItems-cmnt.xml
                    ├── scheduler
                    │   ├── FileCacheQueueScheduler-cmnt.xml
                    │   ├── package.cmnt
                    │   ├── QueueScheduler-cmnt.xml
                    │   ├── RedisScheduler-cmnt.xml
                    │   └── Scheduler-cmnt.xml
                    ├── selector
                    │   ├── AndSelector-cmnt.xml
                    │   ├── CssSelector-cmnt.xml
                    │   ├── Html-cmnt.xml
                    │   ├── JsonPathSelector-cmnt.xml
                    │   ├── OrSelector-cmnt.xml
                    │   ├── package.cmnt
                    │   ├── PlainText-cmnt.xml
                    │   ├── RegexSelector-cmnt.xml
                    │   ├── ReplaceSelector-cmnt.xml
                    │   ├── Selectable-cmnt.xml
                    │   ├── Selector-cmnt.xml
                    │   ├── SelectorFactory-cmnt.xml
                    │   ├── SmartContentSelector-cmnt.xml
                    │   └── XpathSelector-cmnt.xml
                    ├── Site-cmnt.xml
                    ├── Spider-cmnt.xml
                    ├── Task-cmnt.xml
                    └── utils
                        ├── DoubleKeyMap-cmnt.xml
                        ├── FilePersistentBase-cmnt.xml
                        ├── MultiKeyMapBase-cmnt.xml
                        ├── package.cmnt
                        ├── ThreadUtils-cmnt.xml
                        └── UrlUtils-cmnt.xml

134 directories, 232 files
