Neural Networks and Deep Learning

Language: Others
File size: 11.00 MB
Downloads: 37
Views: 277
Published: 2022-04-23
Category: General programming problems
Published by: 呵呵呵嗯
File format: .pdf
Points required: 2

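Description

[Summary] Neural Networks and Deep Learning

What is this book about? Neural networks are one of the most beautiful programming paradigms ever invented. In the conventional approach to programming, we tell the computer what to do, breaking big problems into many small, precisely defined tasks that the computer can easily perform. In a neural network, by contrast, we do not tell the computer how to solve our problem. Instead, it learns from observational data and works out its own way of solving the problem.

Learning automatically from data sounds promising. Until 2006, however, we did not know how to train neural networks to outperform traditional approaches, except on a few specialized problems. What changed in 2006 was the discovery of techniques for learning in so-called "deep neural networks". These techniques are now known as "deep learning". They have since been developed further, and today deep neural networks and deep learning achieve outstanding performance on many important problems in computer vision, speech recognition, and natural language processing. They are being deployed at large scale by companies such as Google, Microsoft, and Facebook.

The purpose of this book is to help you master the core concepts of neural networks, including modern deep learning techniques. After working through the book, you will be able to use neural networks and deep learning to solve complex pattern recognition problems, and you will have a foundation for applying them to problems of your own devising.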

Table of Contents
1 Deep Learning for AI
1.1 Who should read this book?
1.2 Machine Learning
1.3 Historical Perspective and Neural Networks
1.4 Recent Impact of Deep Learning Research
1.5 Challenges for Future Research
2 Linear algebra
2.1 Scalars, vectors, matrices and tensors
2.2 Multiplying matrices and vectors
2.3 Identity and inverse matrices
2.4 Linear dependence, span, and rank
2.5 Norms
2.6 Special kinds of matrices and vectors
2.7 Eigendecomposition
2.8 Singular Value Decomposition
2.9 The trace operator
2.10 Determinant
2.11 Example: Principal components analysis
3 Probability and Information Theory
3.1 Why probability?
3.2 Random variables
3.3 Probability distributions
3.3.1 Discrete variables and probability mass functions
3.3.2 Continuous variables and probability density functions
3.4 Marginal probability
3.5 Conditional probability
3.6 The chain rule
3.7 Independence and conditional independence
3.8 Expectation, variance, and covariance
3.9 Information theory
3.10 Common probability distributions
3.10.1 Bernoulli Distribution
3.10.2 Multinoulli Distribution
3.10.3 Gaussian Distribution
3.10.4 Dirac Distribution
3.10.5 Mixtures of Distributions and Gaussian Mixture
3.11 Useful properties of common functions
3.12 Bayes’ rule
3.13 Technical details of continuous variables
3.14 Example: Naive Bayes
4 Numerical Computation
4.1 Overflow and underflow
4.2 Poor conditioning
4.3 Gradient-Based Optimization
4.4 Constrained optimization
4.5 Example: linear least squares
5 Machine Learning Basics
5.1 Learning algorithms
5.1.1 The task, T
5.1.2 The performance measure, P
5.1.3 The experience, E
5.2 Example: Linear regression
5.3 Generalization, Capacity, Overfitting and Underfitting
5.3.1 Generalization
5.3.2 Capacity
5.3.3 Occam’s Razor, Underfitting and Overfitting
5.4 Estimating and Monitoring Generalization Error
5.5 Estimators, Bias, and Variance
5.5.1 Point Estimation
5.5.2 Bias
5.5.3 Variance
5.5.4 Trading off Bias and Variance and the Mean Squared Error
5.5.5 Consistency
5.6 Maximum likelihood estimation
5.6.1 Properties of Maximum Likelihood
5.6.2 Regularized Likelihood
5.7 Bayesian Statistics
5.8 Supervised learning
5.8.1 Estimating Conditional Expectation by Minimizing Squared Error
5.8.2 Estimating Probabilities or Conditional Probabilities by Maximum Likelihood
5.9 Unsupervised learning
5.9.1 Principal Components Analysis
5.10 Weakly supervised learning
5.11 The Smoothness Prior, Local Generalization and Non-Parametric Models
5.12 Manifold Learning and the Curse of Dimensionality
5.13 Challenges of High-Dimensional Distributions
6 Feedforward Deep Networks
6.1 Formalizing and Generalizing Neural Networks
6.2 Parametrizing a Learned Predictor
6.2.1 Family of Functions
6.2.2 Loss Function and Conditional Log-Likelihood
6.2.3 Training Criterion and Regularizer
6.2.4 Optimization Procedure
6.3 Flow Graphs and Back-Propagation
6.3.1 Chain Rule
6.3.2 Back-Propagation in a General Flow Graph
6.4 Universal Approximation Properties and Depth
6.5 Feature / Representation Learning
6.6 Piecewise Linear Hidden Units
6.7 Historical Notes
7 Regularization
7.1 Classical Regularization: Parameter Norm Penalty
7.1.1 L2 parameter regularization
7.1.2 L1 regularization
7.1.3 L∞ regularization
7.2 Classical regularization as constrained optimization
7.3 Regularization from a Bayesian perspective
7.4 Early stopping as a form of regularization
7.5 Regularization and under-constrained problems
7.6 Parameter Sharing
7.7 Sparse Representations
7.8 Dataset augmentation
7.9 Classical regularization as noise robustness
7.10 Semi-supervised Training
7.11 Unsupervised Pretraining
7.11.1 The pretraining protocol
7.12 Bagging and other ensemble methods
7.13 Dropout
7.14 Multi-Task Learning
8 Optimization for training deep models
8.1 Optimization for model training
8.1.1 Early Stopping
8.1.2 Plateaus, saddle points, and other flat regions
8.1.3 Cliffs and Exploding Gradients
8.1.4 Vanishing and Exploding Gradients - An Introduction to the Issue of Learning Long-Term Dependencies
8.2 Optimization algorithms
8.2.1 Approximate Natural Gradient and Second-Order Methods
8.2.2 Optimization strategies and meta-algorithms
8.2.3 Coordinate descent
8.2.4 Greedy supervised pre-training
8.3 Hints and Curriculum Learning
9 Structured Probabilistic Models: A Deep Learning Perspective
9.1 The Challenge of Unstructured Modeling
9.2 A Graphical Syntax for Describing Model Structure
9.2.1 Directed Models
9.2.2 Undirected Models
9.2.3 The Partition Function
9.2.4 Energy-Based Models
9.2.5 Separation and D-Separation
9.2.6 Operations on a Graph
9.2.7 Factor Graphs
9.3 Advantages of Structured Modeling
9.4 Learning about Dependencies
9.4.1 Latent Variables Versus Structure Learning
9.4.2 Latent Variables for Feature Learning
9.5 Markov Chain Monte Carlo Methods
9.6 Inference and Approximate Inference Over Latent Variables
9.7 The Deep Learning Approach to Structured Probabilistic Modeling
9.7.1 Example: The Restricted Boltzmann Machine
10 Unsupervised and Transfer Learning
10.1 Auto-Encoders
10.1.1 Regularized Auto-Encoders
10.1.2 Representational Power, Layer Size and Depth
10.1.3 Reconstruction Distribution
10.2 Linear Factor Models
10.2.1 Probabilistic PCA and Factor Analysis
10.2.2 Manifold Interpretation of PCA and Linear Auto-Encoders
10.2.3 ICA
10.2.4 Sparse Coding as a Generative Model
10.3 RBMs
10.4 Greedy Layerwise Unsupervised Pre-Training
10.5 Transfer Learning and Domain Adaptation
11 Convolutional Networks
11.1 The convolution operation
11.2 Motivation
11.3 Pooling
11.4 Variants of the basic convolution function
11.5 Data types
11.6 Efficient convolution algorithms
11.7 Deep learning history
12 Sequence Modeling: Recurrent and Recursive Nets
12.1 Unfolding Flow Graphs and Sharing Parameters
12.2 Recurrent Neural Networks
12.2.1 Computing the gradient in a recurrent neural network
12.2.2 Recurrent Networks as Generative Directed Acyclic Models
12.2.3 RNNs to represent conditional probability distributions
12.3 Bidirectional RNNs
12.4 Recursive Neural Networks
12.5 Auto-Regressive Networks
12.5.1 Logistic Auto-Regressive Networks
12.5.2 Neural Auto-Regressive Networks
12.5.3 NADE
12.6 Facing the Challenge of Long-Term Dependencies
12.6.1 Echo State Networks: Choosing Weights to Make Dynamics Barely Contractive
12.6.2 Combining Short and Long Paths in the Unfolded Flow Graph
12.6.3 Leaky Units and a Hierarchy of Different Time Scales
12.6.4 The Long-Short-Term-Memory Architecture and Other Gated RNNs
12.6.5 Deep RNNs
12.6.6 Better Optimization
12.6.7 Clipping Gradients
12.6.8 Regularizing to Encourage Information Flow
12.6.9 Organizing the State at Multiple Time Scales
12.7 Handling temporal dependencies with n-grams, HMMs, CRFs and other graphical models
12.7.1 N-grams
12.7.2 Efficient Marginalization and Inference for Temporally Structured Outputs
12.7.3 HMMs
12.7.4 CRFs
12.8 Combining Neural Networks and Search
12.8.1 Joint Training of Neural Networks and Sequential Probabilistic Models
12.8.2 MAP and Structured Output Models
12.8.3 Back-prop through Search
12.9 Historical Notes
13 The Manifold Perspective on Auto-Encoders
13.1 Manifold Learning via Regularized Auto-Encoders
13.2 Probabilistic Interpretation of Reconstruction Error as Log-Likelihood
13.3 Sparse Representations
13.3.1 Sparse Auto-Encoders
13.3.2 Predictive Sparse Decomposition
13.4 Denoising Auto-Encoders
13.4.1 Learning a Vector Field that Estimates a Gradient Field
13.4.2 Turning the Gradient Field into a Generative Model
13.5 Contractive Auto-Encoders
14 Distributed Representations: Disentangling the Underlying Factors
14.1 Assumption of Underlying Factors
14.2 Exponential Gain in Representational Efficiency from Distributed Representations
14.3 Exponential Gain in Representational Efficiency from Depth
14.4 Additional Priors Regarding The Underlying Factors
15 Confronting the Partition Function
15.1 Estimating the partition function
15.1.1 Annealed importance sampling
15.1.2 Bridge sampling
15.1.3 Extensions
15.2 Stochastic maximum likelihood and contrastive divergence
15.3 Pseudolikelihood
15.4 Score matching and ratio matching
15.5 Denoising score matching
15.6 Noise-contrastive estimation
16 Approximate inference
16.1 Inference as optimization
16.2 Expectation maximization
16.3 MAP inference: Sparse coding as a probabilistic model
16.4 Variational inference and learning
16.4.1 Discrete latent variables
16.4.2 Calculus of variations
16.4.3 Continuous latent variables
16.5 Stochastic inference
16.6 Learned approximate inference
17 Deep generative models
17.1 Restricted Boltzmann machines
17.2 Deep belief networks
17.3 Deep Boltzmann machines
17.3.1 Interesting properties
17.3.2 Variational learning with SML
17.3.3 Layerwise pretraining
17.3.4 Multi-prediction deep Boltzmann machines
17.3.5 Centered deep Boltzmann machines
17.4 Boltzmann machines for real-valued data
17.4.1 Gaussian-Bernoulli RBMs
17.4.2 mcRBMs
17.4.3 Spike and slab restricted Boltzmann machines
17.5 Convolutional Boltzmann machines
17.6 Other Boltzmann machines
17.7 Directed generative nets
17.7.1 Variational autoencoders
17.7.2 Generative adversarial networks
17.8 A generative view of autoencoders
17.9 Generative stochastic networks
17.10 Methodological notes
18 Large scale deep learning
18.1 Fast CPU implementations
18.2 GPU implementations
18.3 Asynchronous parallel implementations
18.4 Dynamically structured nets
18.5 Model compression
19 Practical methodology
19.1 When to gather more data, control capacity, or change algorithms
19.2 Machine Learning Methodology 101
19.3 Manual hyperparameter tuning
19.4 Hyper-parameter optimization algorithms
19.5 Tricks of the Trade for Deep Learning
19.5.1 Debugging Back-Prop
19.5.2 Automatic Differentiation and Symbolic Manipulations of Flow Graphs
19.5.3 Momentum and Other Averaging Techniques as Cheap Second Order Methods
20 Applications
20.1 Computer vision
20.1.1 Preprocessing
20.1.2 Convolutional nets
20.2 Speech Recognition
20.3 Natural language processing and neural language models
20.3.1 Neural language models
20.4 Structured outputs
20.5 Other applications
Bibliography
Index
