Resource Introduction
An excellent regularization textbook from Stanford (Statistical Learning with Sparsity: The Lasso and Generalizations, by Hastie, Tibshirani, and Wainwright). It covers the relevant statistical background as well as sparsity, and gives a clear account of much of the underlying theory.
Contents

Preface
1 Introduction
2 The Lasso for Linear Models
  2.1 Introduction
  2.2 The Lasso Estimator
  2.3 Cross-Validation and Inference
  2.4 Computation of the Lasso Solution
    2.4.1 Single Predictor: Soft Thresholding
    2.4.2 Multiple Predictors: Cyclic Coordinate Descent
    2.4.3 Soft-Thresholding and Orthogonal Bases
  2.5 Degrees of Freedom
  2.6 Uniqueness of the Lasso Solutions
  2.7 A Glimpse at the Theory
  2.8 The Nonnegative Garrote
  2.9 ℓq Penalties and Bayes Estimates
  2.10 Some Perspective
  Exercises
3 Generalized Linear Models
  3.1 Introduction
  3.2 Logistic Regression
    3.2.1 Example: Document Classification
    3.2.2 Algorithms
  3.3 Multiclass Logistic Regression
    3.3.1 Example: Handwritten Digits
    3.3.2 Algorithms
    3.3.3 Grouped-Lasso Multinomial
  3.4 Log-Linear Models and the Poisson GLM
    3.4.1 Example: Distribution Smoothing
  3.5 Cox Proportional Hazards Models
    3.5.1 Cross-Validation
    3.5.2 Pre-Validation
  3.6 Support Vector Machines
    3.6.1 Logistic Regression with Separable Data
  3.7 Computational Details and glmnet
  Bibliographic Notes
  Exercises
4 Generalizations of the Lasso Penalty
  4.1 Introduction
  4.2 The Elastic Net
  4.3 The Group Lasso
    4.3.1 Computation for the Group Lasso
    4.3.2 Sparse Group Lasso
    4.3.3 The Overlap Group Lasso
  4.4 Sparse Additive Models and the Group Lasso
    4.4.1 Additive Models and Backfitting
    4.4.2 Sparse Additive Models and Backfitting
    4.4.3 Approaches Using Optimization and the Group Lasso
    4.4.4 Multiple Penalization for Sparse Additive Models
  4.5 The Fused Lasso
    4.5.1 Fitting the Fused Lasso
      4.5.1.1 Reparametrization
      4.5.1.2 A Path Algorithm
      4.5.1.3 A Dual Path Algorithm
      4.5.1.4 Dynamic Programming for the Fused Lasso
    4.5.2 Trend Filtering
    4.5.3 Nearly Isotonic Regression
  4.6 Nonconvex Penalties
  Bibliographic Notes
  Exercises
5 Optimization Methods
  5.1 Introduction
  5.2 Convex Optimality Conditions
    5.2.1 Optimality for Differentiable Problems
    5.2.2 Nondifferentiable Functions and Subgradients
  5.3 Gradient Descent
    5.3.1 Unconstrained Gradient Descent
    5.3.2 Projected Gradient Methods
    5.3.3 Proximal Gradient Methods
    5.3.4 Accelerated Gradient Methods
  5.4 Coordinate Descent
    5.4.1 Separability and Coordinate Descent
    5.4.2 Linear Regression and the Lasso
    5.4.3 Logistic Regression and Generalized Linear Models
  5.5 A Simulation Study
  5.6 Least Angle Regression
  5.7 Alternating Direction Method of Multipliers
  5.8 Minorization-Maximization Algorithms
  5.9 Biconvexity and Alternating Minimization
  5.10 Screening Rules
  Bibliographic Notes
  Appendix
  Exercises
6 Statistical Inference
  6.1 The Bayesian Lasso
  6.2 The Bootstrap
  6.3 Post-Selection Inference for the Lasso
    6.3.1 The Covariance Test
    6.3.2 A General Scheme for Post-Selection Inference
      6.3.2.1 Fixed-λ Inference for the Lasso
      6.3.2.2 The Spacing Test for LAR
    6.3.3 What Hypothesis Is Being Tested?
    6.3.4 Back to Forward Stepwise Regression
  6.4 Inference via a Debiased Lasso
  6.5 Other Proposals for Post-Selection Inference
  Bibliographic Notes
  Exercises
7 Matrix Decompositions, Approximations, and Completion
  7.1 Introduction
  7.2 The Singular Value Decomposition
  7.3 Missing Data and Matrix Completion
    7.3.1 The Netflix Movie Challenge
    7.3.2 Matrix Completion Using Nuclear Norm
    7.3.3 Theoretical Results for Matrix Completion
    7.3.4 Maximum Margin Factorization and Related Methods
  7.4 Reduced-Rank Regression
  7.5 A General Matrix Regression Framework
  7.6 Penalized Matrix Decomposition
  7.7 Additive Matrix Decomposition
  Bibliographic Notes
  Exercises
8 Sparse Multivariate Methods
  8.1 Introduction
  8.2 Sparse Principal Components Analysis
    8.2.1 Some Background
    8.2.2 Sparse Principal Components
      8.2.2.1 Sparsity from Maximum Variance
      8.2.2.2 Methods Based on Reconstruction
    8.2.3 Higher-Rank Solutions
      8.2.3.1 Illustrative Application of Sparse PCA
    8.2.4 Sparse PCA via Fantope Projection
    8.2.5 Sparse Autoencoders and Deep Learning
    8.2.6 Some Theory for Sparse PCA
  8.3 Sparse Canonical Correlation Analysis
    8.3.1 Example: Netflix Movie Rating Data
  8.4 Sparse Linear Discriminant Analysis
    8.4.1 Normal Theory and Bayes' Rule
    8.4.2 Nearest Shrunken Centroids
    8.4.3 Fisher's Linear Discriminant Analysis
      8.4.3.1 Example: Simulated Data with Five Classes
    8.4.4 Optimal Scoring
      8.4.4.1 Example: Face Silhouettes
  8.5 Sparse Clustering
    8.5.1 Some Background on Clustering
      8.5.1.1 Example: Simulated Data with Six Classes
    8.5.2 Sparse Hierarchical Clustering
    8.5.3 Sparse K-Means Clustering
    8.5.4 Convex Clustering
  Bibliographic Notes
  Exercises
9 Graphs and Model Selection
  9.1 Introduction
  9.2 Basics of Graphical Models
    9.2.1 Factorization and Markov Properties
      9.2.1.1 Factorization Property
      9.2.1.2 Markov Property
      9.2.1.3 Equivalence of Factorization and Markov Properties
    9.2.2 Some Examples
      9.2.2.1 Discrete Graphical Models
      9.2.2.2 Gaussian Graphical Models
  9.3 Graph Selection via Penalized Likelihood
    9.3.1 Global Likelihoods for Gaussian Models
    9.3.2 Graphical Lasso Algorithm
    9.3.3 Exploiting Block-Diagonal Structure
    9.3.4 Theoretical Guarantees for the Graphical Lasso
    9.3.5 Global Likelihood for Discrete Models
  9.4 Graph Selection via Conditional Inference
    9.4.1 Neighborhood-Based Likelihood for Gaussians
    9.4.2 Neighborhood-Based Likelihood for Discrete Models
    9.4.3 Pseudo-Likelihood for Mixed Models
  9.5 Graphical Models with Hidden Variables
  Bibliographic Notes
  Exercises
10 Signal Approximation and Compressed Sensing
  10.1 Introduction
  10.2 Signals and Sparse Representations
    10.2.1 Orthogonal Bases
    10.2.2 Approximation in Orthogonal Bases
    10.2.3 Reconstruction in Overcomplete Bases
  10.3 Random Projection and Approximation
    10.3.1 Johnson–Lindenstrauss Approximation
    10.3.2 Compressed Sensing
  10.4 Equivalence between ℓ0 and ℓ1 Recovery
    10.4.1 Restricted Nullspace Property
    10.4.2 Sufficient Conditions for Restricted Nullspace
    10.4.3 Proofs
      10.4.3.1 Proof of Theorem 10.1
      10.4.3.2 Proof of Proposition 10.1
  Bibliographic Notes
  Exercises
11 Theoretical Results for the Lasso
  11.1 Introduction
    11.1.1 Types of Loss Functions
    11.1.2 Types of Sparsity Models
  11.2 Bounds on Lasso ℓ2-Error
    11.2.1 Strong Convexity in the Classical Setting
    11.2.2 Restricted Eigenvalues for Regression
    11.2.3 A Basic Consistency Result
  11.3 Bounds on Prediction Error
  11.4 Support Recovery in Linear Regression
    11.4.1 Variable-Selection Consistency for the Lasso
      11.4.1.1 Some Numerical Studies
    11.4.2 Proof of Theorem 11.3
  11.5 Beyond the Basic Lasso
  Bibliographic Notes
  Exercises
Bibliography
Author Index
Index

Preface

In this monograph, we have attempted to summarize the actively developing field of statistical learning with sparsity. A sparse statistical model is one having only a small number of nonzero parameters or weights. It represents a classic case of "less is more": a sparse model can be much easier to estimate and interpret than a dense model. In this age of big data, the number of features measured on a person or object can be large, and might be larger than the number of observations. The sparsity assumption allows us to tackle such problems and extract useful and reproducible patterns from big datasets.

The ideas described here represent the work of an entire community of researchers in statistics and machine learning, and we thank everyone for their continuing contributions to this exciting area. We particularly thank our colleagues at Stanford, Berkeley, and elsewhere; our collaborators; and our past and current students working in this area. These include Alekh Agarwal, Arash Amini, Francis Bach, Jacob Bien, Stephen Boyd, Andreas Buja, Emmanuel Candès, Alexandra Chouldechova, David Donoho, John Duchi, Brad Efron, Will Fithian, Jerome Friedman, Max G'Sell, Iain Johnstone, Michael Jordan, Ping Li, Po-Ling Loh, Michael Lim, Jason Lee, Richard Lockhart, Rahul Mazumder, Balasubramanian Narasimhan, Sahand Negahban, Guillaume Obozinski, Mee-Young Park, Junyang Qian, Garvesh Raskutti, Pradeep Ravikumar, Saharon Rosset, Prasad Santhanam, Noah Simon, Dennis Sun, Yukai Sun, Jonathan Taylor, Ryan Tibshirani,¹ Stefan Wager, Daniela Witten, Bin Yu, Yuchen Zhang, Ji Zhou, and Hui Zou. We also thank our editor John Kimmel for his advice and support.

Stanford University: Trevor Hastie, Robert Tibshirani
University of California, Berkeley: Martin Wainwright

¹ Some of the bibliographic references, for example in Chapters 4 and 6, are to Tibshirani2, R.J., rather than Tibshirani, R.; the former is Ryan Tibshirani, the latter is Robert (son and father).
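To give a flavor of the book's core material, here is a minimal sketch (not taken from the book) of the lasso fitted by cyclic coordinate descent with soft thresholding, the approach outlined in Sections 2.4.1 and 2.4.2 of the contents above. The function names and the synthetic data are illustrative assumptions; the book's own computational tool is the R package glmnet (Section 3.7).

```python
import numpy as np

def soft_threshold(z, gamma):
    # Soft-thresholding operator: S(z, gamma) = sign(z) * max(|z| - gamma, 0)
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    # Cyclic coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.
    # Assumes the columns of X are centered and scaled and y is centered.
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with predictor j removed from the current fit
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho_j = X[:, j] @ r_j / n
            beta[j] = soft_threshold(rho_j, lam)  # unit-variance columns, so no rescaling
    return beta

# Hypothetical synthetic example: only 2 of 10 true coefficients are nonzero.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * rng.standard_normal(200)
y -= y.mean()
print(np.round(lasso_cd(X, y, lam=0.1), 2))  # most coefficients shrink exactly to zero
```

The ℓ1 penalty is what produces the exact zeros in the printed coefficient vector; with a ridge (ℓ2) penalty the small coefficients would merely shrink toward zero without vanishing.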