Observability Engineering


Language: Others
Size: 14.34M
Downloads: 2
Views: 83
Published: 2022-09-05
Category: Clojure
Published by: bronzels
File format: .pdf
Points required: 5

Description


[Summary] Observability Engineering


[Core code]

Table of Contents

Foreword. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi

Preface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv

Part I. The Path to Observability

1. What Is Observability?. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

The Mathematical Definition of Observability 4

Applying Observability to Software Systems 4

Mischaracterizations About Observability for Software 7

Why Observability Matters Now 8

Is This Really the Best Way? 9

Why Are Metrics and Monitoring Not Enough? 9

Debugging with Metrics Versus Observability 11

The Role of Cardinality 13

The Role of Dimensionality 14

Debugging with Observability 16

Observability Is for Modern Systems 17

Conclusion 17

2. How Debugging Practices Differ Between Observability and Monitoring. . . . . . . . . . . 19

How Monitoring Data Is Used for Debugging 19

Troubleshooting Behaviors When Using Dashboards 21

The Limitations of Troubleshooting by Intuition 23

Traditional Monitoring Is Fundamentally Reactive 24

How Observability Enables Better Debugging 26

Conclusion 28

3. Lessons from Scaling Without Observability. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

An Introduction to Parse 29

Scaling at Parse 31

The Evolution Toward Modern Systems 33

The Evolution Toward Modern Practices 36

Shifting Practices at Parse 38

Conclusion 41

4. How Observability Relates to DevOps, SRE, and Cloud Native. . . . . . . . . . . . . . . . . . . . . 43

Cloud Native, DevOps, and SRE in a Nutshell 43

Observability: Debugging Then Versus Now 45

Observability Empowers DevOps and SRE Practices 46

Conclusion 48

Part II. Fundamentals of Observability

5. Structured Events Are the Building Blocks of Observability. . . . . . . . . . . . . . . . . . . . . . 51

Debugging with Structured Events 52

The Limitations of Metrics as a Building Block 53

The Limitations of Traditional Logs as a Building Block 55

Unstructured Logs 55

Structured Logs 56

Properties of Events That Are Useful in Debugging 57

Conclusion 59

6. Stitching Events into Traces. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

Distributed Tracing and Why It Matters Now 61

The Components of Tracing 63

Instrumenting a Trace the Hard Way 65

Adding Custom Fields into Trace Spans 68

Stitching Events into Traces 70

Conclusion 71

7. Instrumentation with OpenTelemetry. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

A Brief Introduction to Instrumentation 74

Open Instrumentation Standards 74

Instrumentation Using Code-Based Examples 75

Start with Automatic Instrumentation 76

Add Custom Instrumentation 78

Send Instrumentation Data to a Backend System 80

Conclusion 82

8. Analyzing Events to Achieve Observability. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

Debugging from Known Conditions 84

Debugging from First Principles 85

Using the Core Analysis Loop 86

Automating the Brute-Force Portion of the Core Analysis Loop 88

This Misleading Promise of AIOps 91

Conclusion 92

9. How Observability and Monitoring Come Together. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

Where Monitoring Fits 96

Where Observability Fits 97

System Versus Software Considerations 97

Assessing Your Organizational Needs 99

Exceptions: Infrastructure Monitoring That Can’t Be Ignored 101

Real-World Examples 101

Conclusion 103

Part III. Observability for Teams

10. Applying Observability Practices in Your Team. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

Join a Community Group 107

Start with the Biggest Pain Points 109

Buy Instead of Build 109

Flesh Out Your Instrumentation Iteratively 111

Look for Opportunities to Leverage Existing Efforts 112

Prepare for the Hardest Last Push 114

Conclusion 115

11. Observability-Driven Development. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

Test-Driven Development 117

Observability in the Development Cycle 118

Determining Where to Debug 119

Debugging in the Time of Microservices 120

How Instrumentation Drives Observability 121

Shifting Observability Left 123

Using Observability to Speed Up Software Delivery 123

Conclusion 125

12. Using Service-Level Objectives for Reliability. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

Traditional Monitoring Approaches Create Dangerous Alert Fatigue 127

Threshold Alerting Is for Known-Unknowns Only 129

User Experience Is a North Star 131

What Is a Service-Level Objective? 132

Reliable Alerting with SLOs 133

Changing Culture Toward SLO-Based Alerts: A Case Study 135

Conclusion 138

13. Acting on and Debugging SLO-Based Alerts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139

Alerting Before Your Error Budget Is Empty 139

Framing Time as a Sliding Window 141

Forecasting to Create a Predictive Burn Alert 142

The Lookahead Window 144

The Baseline Window 151

Acting on SLO Burn Alerts 152

Using Observability Data for SLOs Versus Time-Series Data 154

Conclusion 156

14. Observability and the Software Supply Chain. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157

Why Slack Needed Observability 159

Instrumentation: Shared Client Libraries and Dimensions 161

Case Studies: Operationalizing the Supply Chain 164

Understanding Context Through Tooling 164

Embedding Actionable Alerting 166

Understanding What Changed 168

Conclusion 170

Part IV. Observability at Scale

15. Build Versus Buy and Return on Investment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173

How to Analyze the ROI of Observability 174

The Real Costs of Building Your Own 175

The Hidden Costs of Using “Free” Software 175

The Benefits of Building Your Own 176

The Risks of Building Your Own 177

The Real Costs of Buying Software 179

The Hidden Financial Costs of Commercial Software 179

The Hidden Nonfinancial Costs of Commercial Software 180

The Benefits of Buying Commercial Software 181

The Risks of Buying Commercial Software 182

Buy Versus Build Is Not a Binary Choice 182

Conclusion 183

16. Efficient Data Storage. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185

The Functional Requirements for Observability 185

Time-Series Databases Are Inadequate for Observability 187

Other Possible Data Stores 189

Data Storage Strategies 190

Case Study: The Implementation of Honeycomb’s Retriever 193

Partitioning Data by Time 194

Storing Data by Column Within Segments 195

Performing Query Workloads 197

Querying for Traces 199

Querying Data in Real Time 200

Making It Affordable with Tiering 200

Making It Fast with Parallelism 201

Dealing with High Cardinality 202

Scaling and Durability Strategies 202

Notes on Building Your Own Efficient Data Store 204

Conclusion 205

17. Cheap and Accurate Enough: Sampling. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207

Sampling to Refine Your Data Collection 207

Using Different Approaches to Sampling 209

Constant-Probability Sampling 209

Sampling on Recent Traffic Volume 210

Sampling Based on Event Content (Keys) 210

Combining per Key and Historical Methods 211

Choosing Dynamic Sampling Options 211

When to Make a Sampling Decision for Traces 211

Translating Sampling Strategies into Code 212

The Base Case 212

Fixed-Rate Sampling 213

Recording the Sample Rate 213

Consistent Sampling 215

Target Rate Sampling 216

Having More Than One Static Sample Rate 218

Sampling by Key and Target Rate 218

Sampling with Dynamic Rates on Arbitrarily Many Keys 220

Putting It All Together: Head and Tail per Key Target Rate Sampling 222

Conclusion 223

18. Telemetry Management with Pipelines. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225

Attributes of Telemetry Pipelines 226

Routing 226

Security and Compliance 227

Workload Isolation 227

Data Buffering 228

Capacity Management 228

Data Filtering and Augmentation 229

Data Transformation 230

Ensuring Data Quality and Consistency 230

Managing a Telemetry Pipeline: Anatomy 231

Challenges When Managing a Telemetry Pipeline 233

Performance 233

Correctness 233

Availability 233

Reliability 234

Isolation 234

Data Freshness 234

Use Case: Telemetry Management at Slack 235

Metrics Aggregation 235

Logs and Trace Events 236

Open Source Alternatives 238

Managing a Telemetry Pipeline: Build Versus Buy 239

Conclusion 240

Part V. Spreading Observability Culture

19. The Business Case for Observability. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243

The Reactive Approach to Introducing Change 243

The Return on Investment of Observability 245

The Proactive Approach to Introducing Change 246

Introducing Observability as a Practice 248

Using the Appropriate Tools 249

Instrumentation 250

Data Storage and Analytics 250

Rolling Out Tools to Your Teams 251

Knowing When You Have Enough Observability 252

Conclusion 253

20. Observability’s Stakeholders and Allies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255

Recognizing Nonengineering Observability Needs 255

Creating Observability Allies in Practice 258

Customer Support Teams 258

Customer Success and Product Teams 259

Sales and Executive Teams 260

Using Observability Versus Business Intelligence Tools 261

Query Execution Time 262

Accuracy 262

Recency 262

Structure 263

Time Windows 263

Ephemerality 264

Using Observability and BI Tools Together in Practice 264

Conclusion 265

21. An Observability Maturity Model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267

A Note About Maturity Models 267

Why Observability Needs a Maturity Model 268

About the Observability Maturity Model 269

Capabilities Referenced in the OMM 270

Respond to System Failure with Resilience 271

Deliver High-Quality Code 273

Manage Complexity and Technical Debt 274

Release on a Predictable Cadence 275

Understand User Behavior 276

Using the OMM for Your Organization 277

Conclusion 277

22. Where to Go from Here. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279

Observability, Then Versus Now 279

Additional Resources 281

Predictions for Where Observability Is Going 282

Index. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
