Example Introduction
【Example Description】
Hadoop.The.Definitive.Guide.4th.Edition.2015.3.pdf
FOURTH EDITION
Hadoop: The Definitive Guide
Tom White
Beijing · Cambridge · Farnham · Köln · Sebastopol · Tokyo
O'REILLY

Hadoop: The Definitive Guide, Fourth Edition
by Tom White
Copyright © 2015 Tom White. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editors: Mike Loukides and Meghan Blanchette
Indexer: Lucie Haskins
Production Editor: Matthew Hacker
Cover Designer: Ellie Volckhausen
Copyeditor: Jasmine Kwityn
Interior Designer: David Futato
Proofreader: Rachel Head
Illustrator: Rebecca Demarest

June 2009: First Edition
October 2010: Second Edition
May 2012: Third Edition
April 2015: Fourth Edition

Revision History for the Fourth Edition:
2015-03-19: First release
2015-04-17: Second release
See http://oreilly.com/catalog/errata.csp?isbn=9781491901632 for release details.

The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Hadoop: The Definitive Guide, the cover image of an African elephant, and related trade dress are trademarks of O'Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O'Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.

While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work.
Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

ISBN: 978-1-491-90163-2

For Eliane, Emilia, and Lottie

Table of Contents

Foreword
Preface

Part I. Hadoop Fundamentals

1. Meet Hadoop
  Data
  Data Storage and Analysis
  Querying All Your Data
  Beyond Batch
  Comparison with Other Systems
    Relational Database Management Systems
    Grid Computing
    Volunteer Computing
  A Brief History of Apache Hadoop
  What's in This Book

2. MapReduce
  A Weather Dataset
    Data Format
  Analyzing the Data with Unix Tools
  Analyzing the Data with Hadoop
    Map and Reduce
    Java MapReduce
  Scaling Out
    Data Flow
    Combiner Functions
    Running a Distributed MapReduce Job
  Hadoop Streaming
    Ruby
    Python

3. The Hadoop Distributed Filesystem
  The Design of HDFS
  HDFS Concepts
    Blocks
    Namenodes and Datanodes
    Block Caching
    HDFS Federation
    HDFS High Availability
  The Command-Line Interface
    Basic Filesystem Operations
  Hadoop Filesystems
    Interfaces
  The Java Interface
    Reading Data from a Hadoop URL
    Reading Data Using the FileSystem API
    Writing Data
    Directories
    Querying the Filesystem
    Deleting Data
  Data Flow
    Anatomy of a File Read
    Anatomy of a File Write
    Coherency Model
  Parallel Copying with distcp
    Keeping an HDFS Cluster Balanced

4. YARN
  Anatomy of a YARN Application Run
    Resource Requests
    Application Lifespan
    Building YARN Applications
  YARN Compared to MapReduce 1
  Scheduling in YARN
    Scheduler Options
    Capacity Scheduler Configuration
    Fair Scheduler Configuration
    Delay Scheduling
    Dominant Resource Fairness
  Further Reading

5. Hadoop I/O
  Data Integrity
    Data Integrity in HDFS
    LocalFileSystem
    ChecksumFileSystem
  Compression
    Codecs
    Compression and Input Splits
    Using Compression in MapReduce
  Serialization
    The Writable Interface
    Writable Classes
    Implementing a Custom Writable
    Serialization Frameworks
  File-Based Data Structures
    SequenceFile
    MapFile
    Other File Formats and Column-Oriented Formats

Part II. MapReduce

6. Developing a MapReduce Application
  The Configuration API
    Combining Resources
    Variable Expansion
  Setting Up the Development Environment
    Managing Configuration
    GenericOptionsParser, Tool, and ToolRunner
  Writing a Unit Test with MRUnit
    Mapper
    Reducer
  Running Locally on Test Data
    Running a Job in a Local Job Runner
    Testing the Driver
  Running on a Cluster
    Packaging a Job
    Launching a Job
    The MapReduce Web UI
    Retrieving the Results
    Debugging a Job
    Hadoop Logs
    Remote Debugging
  Tuning a Job
    Profiling Tasks
  MapReduce Workflows
    Decomposing a Problem into MapReduce Jobs
    JobControl
    Apache Oozie

7. How MapReduce Works
  Anatomy of a MapReduce Job Run
    Job Submission
    Job Initialization
    Task Assignment
    Task Execution
    Progress and Status Updates
    Job Completion
  Failures
    Task Failure
    Application Master Failure
    Node Manager Failure
    Resource Manager Failure
  Shuffle and Sort
    The Map Side
    The Reduce Side
    Configuration Tuning
  Task Execution
    The Task Execution Environment
    Speculative Execution
    Output Committers

8. MapReduce Types and Formats
  MapReduce Types
    The Default MapReduce Job
  Input Formats
    Input Splits and Records
    Text Input
    Binary Input
    Multiple Inputs
    Database Input (and Output)
  Output Formats
    Text Output
    Binary Output