
Learning.Spark.Lightning-Fast.Big.Data.Analysis.pdf


  • Language: Others
  • File size: 6.09 MB
  • Downloads: 1
  • Views: 80
  • Published: 2020-06-19
  • Category: General programming questions
  • Uploaded by: robot666
  • File format: .pdf
  • Points required: 2
 

Description

[Summary]
Learning Spark, in PDF format. One of the few books on Spark, and worth a close read.
Learning Spark
by Holden Karau, Andy Konwinski, Patrick Wendell, and Matei Zaharia

Copyright 2015 Databricks. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editors: Ann Spencer and Marie Beaugureau
Production Editor: Kara Ebrahim
Copyeditor: Rachel Monaghan
Proofreader: Charles Roumeliotis
Indexer: Ellen Troutman
Interior Designer: David Futato
Cover Designer: Ellie Volckhausen
Illustrator: Rebecca Demarest

February 2015: First Edition

Revision History for the First Edition
2015-01-26: First Release
See http://oreilly.com/catalog/errata.csp?isbn=9781449358624 for release details.

The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Learning Spark, the cover image of a small-spotted catshark, and related trade dress are trademarks of O'Reilly Media, Inc.

While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk.
If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-449-35862-4
LSI

Foreword

In a very short time, Apache Spark has emerged as the next generation big data processing engine, and is being applied throughout the industry faster than ever. Spark improves over Hadoop MapReduce, which helped ignite the big data revolution, in several key dimensions: it is much faster, much easier to use due to its rich APIs, and it goes far beyond batch applications to support a variety of workloads, including interactive queries, streaming, machine learning, and graph processing.

I have been privileged to be closely involved with the development of Spark all the way from the drawing board to what has become the most active big data open source project today, and one of the most active Apache projects! As such, I'm particularly delighted to see Matei Zaharia, the creator of Spark, teaming up with other longtime Spark developers Patrick Wendell, Andy Konwinski, and Holden Karau to write this book.

With Spark's rapid rise in popularity, a major concern has been lack of good reference material. This book goes a long way to address this concern, with 11 chapters and dozens of detailed examples designed for data scientists, students, and developers looking to learn Spark. It is written to be approachable by readers with no background in big data, making it a great place to start learning about the field in general. I hope that many years from now, you and other readers will fondly remember this as the book that introduced you to this exciting new field.

Ion Stoica, CEO of Databricks and Co-director, AMPLab, UC Berkeley

Preface

As parallel data analysis has grown common, practitioners in many fields have sought easier tools for this task.
Apache Spark has quickly emerged as one of the most popular, extending and generalizing MapReduce. Spark offers three main benefits. First, it is easy to use: you can develop applications on your laptop, using a high-level API that lets you focus on the content of your computation. Second, Spark is fast, enabling interactive use and complex algorithms. And third, Spark is a general engine, letting you combine multiple types of computations (e.g., SQL queries, text processing, and machine learning) that might previously have required different engines. These features make Spark an excellent starting point to learn about big data in general.

This introductory book is meant to get you up and running with Spark quickly. You'll learn how to download and run Spark on your laptop and use it interactively to learn the API. Once there, we'll cover the details of available operations and distributed execution. Finally, you'll get a tour of the higher-level libraries built into Spark, including libraries for machine learning, stream processing, and SQL. We hope that this book gives you the tools to quickly tackle data analysis problems, whether you do so on one machine or hundreds.

Audience

This book targets data scientists and engineers. We chose these two groups because they have the most to gain from using Spark to expand the scope of problems they can solve. Spark's rich collection of data-focused libraries (like MLlib) makes it easy for data scientists to go beyond problems that fit on a single machine while using their statistical background. Engineers, meanwhile, will learn how to write general-purpose distributed programs in Spark and operate production applications. Engineers and data scientists will both learn different details from this book, but will both be able to apply Spark to solve large distributed problems in their respective fields.

Data scientists focus on answering questions or building models from data.
They often have a statistical or math background and some familiarity with tools like Python, R, and SQL. We have made sure to include Python and, where relevant, SQL examples for all our material, as well as an overview of the machine learning library in Spark. If you are a data scientist, we hope that after reading this book you will be able to use the same mathematical approaches to solve problems, except much faster and on a much larger scale.

The second group this book targets is software engineers who have some experience with Java, Python, or another programming language. If you are an engineer, we hope that this book will show you how to set up a Spark cluster, use the Spark shell, and write Spark applications to solve parallel processing problems. If you are familiar with Hadoop, you have a bit of a head start on figuring out how to interact with HDFS and how to manage a cluster, but either way, we will cover basic distributed execution concepts.

Regardless of whether you are a data scientist or engineer, to get the most out of this book you should have some familiarity with one of Python, Java, Scala, or a similar language. We assume that you already have a storage solution for your data and we cover how to load and save data from many common ones, but not how to set them up. If you don't have experience with one of those languages, don't worry: there are excellent resources available to learn these. We call out some of the books available in "Supporting Books".

How This Book Is Organized

The chapters of this book are laid out in such a way that you should be able to go through the material front to back. At the start of each chapter, we will mention which sections we think are most relevant to data scientists and which sections we think are most relevant for engineers.
That said, we hope that all the material is accessible to readers of either background.

The first two chapters will get you started with getting a basic Spark installation on your laptop and give you an idea of what you can accomplish with Spark. Once we've got the motivation and setup out of the way, we will dive into the Spark shell, a very useful tool for development and prototyping. Subsequent chapters then cover the Spark programming interface in detail, how applications execute on a cluster, and higher-level libraries available on Spark (such as Spark SQL and MLlib).

Supporting Books

If you are a data scientist and don't have much experience with Python, the books Learning Python and Head First Python (both O'Reilly) are excellent introductions. If you have some Python experience and want more, Dive into Python (Apress) is a great book to help you get a deeper understanding of Python.

If you are an engineer and after reading this book you would like to expand your data analysis skills, Machine Learning for Hackers and Doing Data Science are excellent books (both O'Reilly).

This book is intended to be accessible to beginners. We do intend to release a deep-dive follow-up for those looking to gain a more thorough understanding of Spark's internals.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic: Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width: Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.
Constant width bold: Shows commands or other text that should be typed literally by the user.
Constant width italic: Shows text that should be replaced with user-supplied values or by values determined by context.

This element signifies a tip or suggestion.
Warning: This element indicates a warning or caution.
