Efficient_Processing_of_Deep_Neural_Networks

Language: Others
File size: 22.02 MB
Downloads: 4
Views: 92
Published: 2022-02-13
Category: Clojure
Publisher: 小旭商城
File format: .pdf
Points required: 10

Related tags: deep learning, neural networks, hardware design, IC design, FPGA

Description

[Summary] Efficient_Processing_of_Deep_Neural_Networks
[Core content]
Contents

Preface
Acknowledgments
PART I Understanding Deep Neural Networks
1 Introduction
1.1 Background on Deep Neural Networks
1.1.1 Artificial Intelligence and Deep Neural Networks
1.1.2 Neural Networks and Deep Neural Networks
1.2 Training versus Inference
1.3 Development History
1.4 Applications of DNNs
1.5 Embedded versus Cloud
2 Overview of Deep Neural Networks
2.1 Attributes of Connections Within a Layer
2.2 Attributes of Connections Between Layers
2.3 Popular Types of Layers in DNNs
2.3.1 CONV Layer (Convolutional)
2.3.2 FC Layer (Fully Connected)
2.3.3 Nonlinearity
2.3.4 Pooling and Unpooling
2.3.5 Normalization
2.3.6 Compound Layers
2.4 Convolutional Neural Networks (CNNs)
2.4.1 Popular CNN Models
2.5 Other DNNs
2.6 DNN Development Resources
2.6.1 Frameworks
2.6.2 Models
2.6.3 Popular Datasets for Classification
2.6.4 Datasets for Other Tasks
2.6.5 Summary
PART II Design of Hardware for Processing DNNs
3 Key Metrics and Design Objectives
3.1 Accuracy
3.2 Throughput and Latency
3.3 Energy Efficiency and Power Consumption
3.4 Hardware Cost
3.5 Flexibility
3.6 Scalability
3.7 Interplay Between Different Metrics
4 Kernel Computation
4.1 Matrix Multiplication with Toeplitz
4.2 Tiling for Optimizing Performance
4.3 Computation Transform Optimizations
4.3.1 Gauss’ Complex Multiplication Transform
4.3.2 Strassen’s Matrix Multiplication Transform
4.3.3 Winograd Transform
4.3.4 Fast Fourier Transform
4.3.5 Selecting a Transform
4.4 Summary
5 Designing DNN Accelerators
5.1 Evaluation Metrics and Design Objectives
5.2 Key Properties of DNN to Leverage
5.3 DNN Hardware Design Considerations
5.4 Architectural Techniques for Exploiting Data Reuse
5.4.1 Temporal Reuse
5.4.2 Spatial Reuse
5.5 Techniques to Reduce Reuse Distance
5.6 Dataflows and Loop Nests
5.7 Dataflow Taxonomy
5.7.1 Weight Stationary (WS)
5.7.2 Output Stationary (OS)
5.7.3 Input Stationary (IS)
5.7.4 Row Stationary (RS)
5.7.5 Other Dataflows
5.7.6 Dataflows for Cross-Layer Processing
5.8 DNN Accelerator Buffer Management Strategies
5.8.1 Implicit versus Explicit Orchestration
5.8.2 Coupled versus Decoupled Orchestration
5.8.3 Explicit Decoupled Data Orchestration (EDDO)
5.9 Flexible NoC Design for DNN Accelerators
5.9.1 Flexible Hierarchical Mesh Network
5.10 Summary
6 Operation Mapping on Specialized Hardware
6.1 Mapping and Loop Nests
6.2 Mappers and Compilers
6.3 Mapper Organization
6.3.1 Map Spaces and Iteration Spaces
6.3.2 Mapper Search
6.3.3 Mapper Models and Configuration Generation
6.4 Analysis Framework for Energy Efficiency
6.4.1 Input Data Access Energy Cost
6.4.2 Partial Sum Accumulation Energy Cost
6.4.3 Obtaining the Reuse Parameters
6.5 Eyexam: Framework for Evaluating Performance
6.5.1 Simple 1-D Convolution Example
6.5.2 Apply Performance Analysis Framework to 1-D Example
6.6 Tools for Map Space Exploration
PART III Co-Design of DNN Hardware and Algorithms
7 Reducing Precision
7.1 Benefits of Reduce Precision
7.2 Determining the Bit Width
7.2.1 Quantization
7.2.2 Standard Components of the Bit Width
7.3 Mixed Precision: Different Precision for Different Data Types
7.4 Varying Precision: Change Precision for Different Parts of the DNN
7.5 Binary Nets
7.6 Interplay Between Precision and Other Design Choices
7.7 Summary of Design Considerations for Reducing Precision
8 Exploiting Sparsity
8.1 Sources of Sparsity
8.1.1 Activation Sparsity
8.1.2 Weight Sparsity
8.2 Compression
8.2.1 Tensor Terminology
8.2.2 Classification of Tensor Representations
8.2.3 Representation of Payloads
8.2.4 Representation Optimizations
8.2.5 Tensor Representation Notation
8.3 Sparse Dataflow
8.3.1 Exploiting Sparse Weights
8.3.2 Exploiting Sparse Activations
8.3.3 Exploiting Sparse Weights and Activations
8.3.4 Exploiting Sparsity in FC Layers
8.3.5 Summary of Sparse Dataflows
8.4 Summary
9 Designing Efficient DNN Models
9.1 Manual Network Design
9.1.1 Improving Efficiency of CONV Layers
9.1.2 Improving Efficiency of FC Layers
9.1.3 Improving Efficiency of Network Architecture After Training
9.2 Neural Architecture Search
9.2.1 Shrinking the Search Space
9.2.2 Improving the Optimization Algorithm
9.2.3 Accelerating the Performance Evaluation
9.2.4 Example of Neural Architecture Search
9.3 Knowledge Distillation
9.4 Design Considerations for Efficient DNN Models
10 Advanced Technologies
10.1 Processing Near Memory
10.1.1 Embedded High-Density Memories
10.1.2 Stacked Memory (3-D Memory)
10.2 Processing in Memory
10.2.1 Non-Volatile Memories (NVM)
10.2.2 Static Random Access Memories (SRAM)
10.2.3 Dynamic Random Access Memories (DRAM)
10.2.4 Design Challenges
10.3 Processing in Sensor
10.4 Processing in the Optical Domain
11 Conclusion
Bibliography
Authors’ Biographies
