Description
Streaming Systems, original English edition, PDF, with table of contents
Streaming Systems
by Tyler Akidau, Slava Chernyak, and Reuven Lax

Copyright © 2018 Tyler Akidau, Slava Chernyak, and Reuven Lax. All rights reserved.

Printed in the United States of America.

Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com/safari). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editors: Rachel Roumeliotis and Jeff Bleiel
Production Editor: Nicholas Adams
Copyeditor: Octal Publishing, Inc.
Proofreader: Kim Cofer
Indexer: Ellen Troutman-Zaig
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest

August 2018: First Edition

Revision History for the First Edition
2018-07-12: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781491983874 for release details.

The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Streaming Systems, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.

While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-491-98387-4
[LSI]

Preface Or: What Are You Getting Yourself Into Here?
Hello, adventurous reader, welcome to our book! At this point, I assume that you're either interested in learning more about the wonders of stream processing or hoping to spend a few hours reading about the glory of the majestic brown trout. Either way, I salute you! That said, those of you in the latter bucket who don't also have an advanced understanding of computer science should consider how prepared you are to deal with disappointment before forging ahead; caveat piscator, and all that.

To set the tone for this book from the get-go, I wanted to give you a heads-up about a couple of things. First, this book is a little strange in that we have multiple authors, but we're not pretending that we somehow all speak and write in the same voice like we're weird identical triplets who happened to be born to different sets of parents. Because as interesting as that sounds, the end result would actually be less enjoyable to read. Instead, we've opted to each write in our own voices, and we've granted the book just enough self-awareness to be able to make reference to each of us where appropriate, but not so much self-awareness that it resents us for making it only into a book and not something cooler like a robot dinosaur with a Scottish accent.

As far as voices go, there are three you'll come across:

Tyler
That would be me. If you haven't explicitly been told someone else is speaking, you can assume that it's me, because we added the other authors somewhat late in the game, and I was basically like, "hells no" when I thought about going back and updating everything I'd already written. I'm the technical lead for the Data Processing Languages and Systems group at Google, responsible for Google Cloud Dataflow, Google's Apache Beam efforts, as well as Google-internal data processing systems such as Flume, MillWheel, and MapReduce.
I'm also a founding Apache Beam PMC member.

Figure P-1. The cover that could have been

Slava
Slava was a long-time member of the MillWheel team at Google, and later an original member of the Windmill team that built MillWheel's successor, the heretofore unnamed system that powers the Streaming Engine in Google Cloud Dataflow. Slava is the foremost expert on watermarks and time semantics in stream processing systems the world over, period. You might find it unsurprising then that he's the author of Chapter 3, Watermarks.

Reuven
Reuven is at the bottom of this list because he has more experience with stream processing than both Slava and me combined and would thus crush us if he were placed any higher. Reuven has created or led the creation of nearly all of the interesting systems-level magic in Google's general-purpose stream processing engines, including applying an untold amount of attention to detail in providing high-throughput, low-latency, exactly-once semantics in a system that nevertheless utilizes fine-grained checkpointing. You might find it unsurprising that he's the author of Chapter 5, Exactly-Once and Side Effects. He also happens to be an Apache Beam PMC member.

Navigating This Book
Now that you know who you'll be hearing from, the next logical step would be to find out what you'll be hearing about, which brings us to the second thing I wanted to mention.
There are conceptually two major parts to this book, each with four chapters, and each followed up by a chapter that stands relatively independently on its own.

The fun begins with Part I, The Beam Model (Chapters 1-4), which focuses on the high-level batch plus streaming data processing model originally developed for Google Cloud Dataflow, later donated to the Apache Software Foundation as Apache Beam, and also now seen in whole or in part across most other systems in the industry. It's composed of four chapters:

Chapter 1, Streaming 101, which covers the basics of stream processing, establishing some terminology, discussing the capabilities of streaming systems, distinguishing between two important domains of time (processing time and event time), and finally looking at some common data processing patterns.

Chapter 2, The What, Where, When, and How of Data Processing, which covers in detail the core concepts of robust stream processing over out-of-order data, each analyzed within the context of a concrete running example and with animated diagrams to highlight the dimension of time.

Chapter 3, Watermarks (written by Slava), which provides a deep survey of temporal progress metrics, how they are created, and how they propagate through pipelines. It ends by examining the details of two real-world watermark implementations.

Chapter 4, Advanced Windowing, which picks up where Chapter 2 left off, diving into some advanced windowing and triggering concepts like processing-time windows, sessions, and continuation triggers.

Between Parts I and II, providing an interlude as timely as the details contained therein are important, stands Chapter 5, Exactly-Once and Side Effects (written by Reuven).
In it, he enumerates the challenges of providing end-to-end exactly-once (or effectively-once) processing semantics and walks through the implementation details of three different approaches to exactly-once processing: Apache Flink, Apache Spark, and Google Cloud Dataflow.

Next begins Part II, Streams and Tables (Chapters 6-9), which dives deeper into the conceptual and investigates the lower-level "streams and tables" way of thinking about stream processing, recently popularized by some upstanding citizens in the Apache Kafka community but, of course, invented decades ago by folks in the database community, because wasn't everything? It too is composed of four chapters:

Chapter 6, Streams and Tables, which introduces the basic idea of streams and tables, analyzes the classic MapReduce approach through a streams-and-tables lens, and then constructs a theory of streams and tables sufficiently general to encompass the full breadth of the Beam Model (and beyond).

Chapter 7, The Practicalities of Persistent State, which considers the motivations for persistent state in streaming pipelines, looks at two common types of implicit state, and then analyzes a practical use case (advertising attribution) to inform the necessary characteristics of a general state management mechanism.

Chapter 8, Streaming SQL, which investigates the meaning of streaming within the context of relational algebra and SQL, contrasts the inherent stream and table biases within the Beam Model and classic SQL as they exist today, and proposes a set of possible paths forward toward incorporating robust streaming semantics in SQL.

Chapter 9, Streaming Joins, which surveys a variety of different join types, analyzes their behavior within the context of streaming, and finally looks in detail at a useful but ill-supported streaming join use case: temporal validity windows.

Finally, closing out the book is Chapter 10, The Evolution of Large-Scale Data Processing, which strolls through a focused history of the MapReduce
lineage of data processing systems, examining some of the important contributions that have evolved streaming systems into what they are today.

Takeaways
As a final bit of guidance, if you were to ask me to describe the things I most want readers to take away from this book, I would say this:

The single most important thing you can learn from this book is the theory of streams and tables and how they relate to one another. Everything else builds on top of that. No, we won't get to this topic until Chapter 6. That's okay; it's worth the wait, and you'll be better prepared to appreciate its awesomeness by then.

Time-varying relations are a revelation. They are stream processing incarnate: an embodiment of everything streaming systems are built to achieve and a powerful connection to the familiar tools we all know and love from the world of batch. We won't learn about them until Chapter 8, but again, the journey there will help you appreciate them all the more.

A well-written distributed streaming engine is a magical thing. This arguably goes for distributed systems in general, but as you learn more about how these systems are built to provide the semantics they do (in particular, the case studies from Chapters 3 and 5), it becomes all the more apparent just how much heavy lifting they're doing for you.

LaTeX/Tikz is an amazing tool for making diagrams, animated or otherwise. A horrible, crusty tool with sharp edges and tetanus, but an incredible tool nonetheless. I hope the clarity the animated diagrams in this book bring to the complex topics we discuss will inspire more people to give LaTeX/Tikz a try (in "Figures", we provide a link to the full source for the animations from this book).

Conventions Used in This Book
The following typographical conventions are used in this book:

Italic
Indicates new terms, URLs, email addresses, filenames,
and file extensions.

Constant width
Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.