Introduction
Reinforcement Learning and Optimal Control; MIT; Author: Dimitri P. Bertsekas; DRAFT TEXTBOOK; December 14, 2018
ABOUT THE AUTHOR

Dimitri Bertsekas studied Mechanical and Electrical Engineering at the National Technical University of Athens, Greece, and obtained his Ph.D. in system science from the Massachusetts Institute of Technology. He has held faculty positions with the Engineering-Economic Systems Department of Stanford University, and the Electrical Engineering Department of the University of Illinois, Urbana. Since 1979 he has been teaching at the Electrical Engineering and Computer Science Department of the Massachusetts Institute of Technology (MIT), where he is currently the McAfee Professor of Engineering. His teaching and research span several fields, including deterministic optimization, dynamic programming and stochastic control, large-scale and distributed computation, and data communication networks. He has authored or coauthored numerous research papers and seventeen books, several of which are currently used as textbooks in MIT classes, including "Dynamic Programming and Optimal Control," "Data Networks," "Introduction to Probability," and "Nonlinear Programming."

Professor Bertsekas was awarded the INFORMS 1997 Prize for Research Excellence in the Interface Between Operations Research and Computer Science for his book "Neuro-Dynamic Programming" (co-authored with John Tsitsiklis), the 2001 AACC John R. Ragazzini Education Award, the 2009 INFORMS Expository Writing Award, the 2014 AACC Richard Bellman Heritage Award, the 2014 Khachiyan Prize for Life-Time Accomplishments in Optimization, the 2015 George B. Dantzig Prize, and the 2018 John von Neumann Theory Prize. In 2001, he was elected to the United States National Academy of Engineering for "pioneering contributions to fundamental research, practice and education of optimization/control theory, and especially its application to data communication networks."

ATHENA SCIENTIFIC OPTIMIZATION AND COMPUTATION SERIES

1. Abstract Dynamic Programming, 2nd Edition, by Dimitri P. Bertsekas, 2018, ISBN 978-1-886529-46-5, 360 pages
2. Dynamic Programming and Optimal Control, Two-Volume Set, by Dimitri P. Bertsekas, 2017, ISBN 1-886529-08-6, 1270 pages
3. Nonlinear Programming, 3rd Edition, by Dimitri P. Bertsekas, 2016, ISBN 1-886529-05-1, 880 pages
4. Convex Optimization Algorithms, by Dimitri P. Bertsekas, 2015, ISBN 978-1-886529-28-1, 576 pages
5. Convex Optimization Theory, by Dimitri P. Bertsekas, 2009, ISBN 978-1-886529-31-1, 256 pages
6. Introduction to Probability, 2nd Edition, by Dimitri P. Bertsekas and John N. Tsitsiklis, 2008, ISBN 978-1-886529-23-6, 544 pages
7. Convex Analysis and Optimization, by Dimitri P. Bertsekas, Angelia Nedic, and Asuman E. Ozdaglar, 2003, ISBN 1-886529-45-0, 560 pages
8. Network Optimization: Continuous and Discrete Models, by Dimitri P. Bertsekas, 1998, ISBN 1-886529-02-7, 608 pages
9. Network Flows and Monotropic Optimization, by R. Tyrrell Rockafellar, 1998, ISBN 1-886529-06-X, 634 pages
10. Introduction to Linear Optimization, by Dimitris Bertsimas and John N. Tsitsiklis, 1997, ISBN 1-886529-19-1, 608 pages
11. Parallel and Distributed Computation: Numerical Methods, by Dimitri P. Bertsekas and John N. Tsitsiklis, 1997, ISBN 1-886529-01-9, 718 pages
12. Neuro-Dynamic Programming, by Dimitri P. Bertsekas and John N. Tsitsiklis, 1996, ISBN 1-886529-10-8, 512 pages
13. Constrained Optimization and Lagrange Multiplier Methods, by Dimitri P. Bertsekas, 1996, ISBN 1-886529-04-3, 410 pages
14. Stochastic Optimal Control: The Discrete-Time Case, by Dimitri P. Bertsekas and Steven E. Shreve, 1996, ISBN 1-886529-03-5

Contents
1. Exact Dynamic Programming
1.1. Deterministic Dynamic Programming
1.1.1. Deterministic Problems
1.1.2. The Dynamic Programming Algorithm
1.1.3. Approximation in Value Space
1.1.4. Model-Free Approximate Solution - Q-Learning
1.2. Stochastic Dynamic Programming
1.3. Examples, Variations, and Simplifications
1.3.1. Deterministic Shortest Path Problems
1.3.2. Discrete Deterministic Optimization
1.3.3. Problems with a Terminal State
1.3.4. Forecasts
1.3.5. Problems with Uncontrollable State Components
1.3.6. Partial State Information and Belief States
1.3.7. Linear Quadratic Optimal Control
1.4. Reinforcement Learning and Optimal Control - Some Terminology
1.5. Notes and Sources

2. Approximation in Value Space
2.1. Variants of Approximation in Value Space
2.1.1. Off-Line and On-Line Methods
2.1.2. Simplifying the Lookahead Minimization
2.1.3. Model-Free Approximation in Value and Policy Space
2.1.4. When is Approximation in Value Space Effective?
2.2. Multistep Lookahead
2.2.1. Multistep Lookahead and Rolling Horizon
2.2.2. Multistep Lookahead and Deterministic Problems
2.3. Problem Approximation
2.3.1. Enforced Decomposition
2.3.2. Probabilistic Approximation - Certainty Equivalent Control
2.4. Rollout and Model Predictive Control
2.4.1. Rollout for Deterministic Problems
2.4.2. Stochastic Rollout and Monte Carlo Tree Search
2.4.3. Model Predictive Control
2.5. Notes and Sources

3. Parametric Approximation
3.1. Approximation Architectures
3.1.1. Linear and Nonlinear Feature-Based Architectures
3.1.2. Training of Linear and Nonlinear Architectures
3.1.3. Incremental Gradient and Newton Methods
3.2. Neural Networks
3.2.1. Training of Neural Networks
3.2.2. Multilayer and Deep Neural Networks
3.3. Sequential Dynamic Programming Approximation
3.4. Q-factor Parametric Approximation
3.5. Notes and Sources

4. Infinite Horizon Reinforcement Learning
4.1. An Overview of Infinite Horizon Problems
4.2. Stochastic Shortest Path Problems
4.3. Discounted Problems
4.4. Exact and Approximate Value Iteration
4.5. Policy Iteration
4.5.1. Exact Policy Iteration
4.5.2. Policy Iteration for Q-factors
4.5.3. Limited Lookahead Policies and Rollout
4.5.4. Approximate Policy Iteration - Error Bounds
4.6. Simulation-Based Policy Iteration with Parametric Approximation
4.6.1. Self-Learning and Actor-Critic Systems
4.6.2. A Model-Based Variant
4.6.3. A Model-Free Variant
4.6.4. Issues Relating to Approximate Policy Iteration
4.7. Exact and Approximate Linear Programming
4.8. Q-Learning
4.9. Additional Methods - Temporal Differences
4.10. Approximation in Policy Space
4.11. Notes and Sources
4.12. Appendix: Mathematical Analysis
4.12.1. Proofs for Stochastic Shortest Path Problems
4.12.2. Proofs for Discounted Problems
4.12.3. Convergence of Exact Policy Iteration
4.12.4. Error Bounds for Approximate Policy Iteration

5. Aggregation
5.1. Aggregation Frameworks
5.2. Classical and Biased Forms of the Aggregate Problem
5.3. Bellman's Equation for the Aggregate Problem
5.4. Algorithms for the Aggregate Problem
5.5. Some Examples
5.6. Spatiotemporal Aggregation for Deterministic Problems
5.7. Notes and Sources

References
Index

Preface

In this book we consider large and challenging multistage decision problems, which can be solved in principle by dynamic programming (DP for short), but whose exact solution is computationally intractable.
We discuss solution methods that rely on approximations to produce suboptimal policies with adequate performance. These methods are collectively referred to as reinforcement learning, and also go by alternative names such as approximate dynamic programming and neuro-dynamic programming.

Our subject has benefited greatly from the interplay of ideas from optimal control and from artificial intelligence. One of the aims of the book is to explore the common boundary between these two fields and to form a bridge that is accessible by workers with background in either field.

Our primary focus will be on approximation in value space. Here, the control at each state is obtained by limited lookahead with cost function approximation, i.e., by optimization of the cost over a limited horizon, plus an approximation of the optimal future cost, starting from the end of this horizon. The latter cost, which we generally denote by J̃, is a function of the state where we may be at the end of the horizon. It may be computed by a variety of methods, possibly involving simulation and/or some given or separately derived heuristic/suboptimal policy. The use of simulation often allows for model-free implementations that do not require the availability of a mathematical model, a major idea that has allowed the use of dynamic programming beyond its classical boundaries.

We focus selectively on four types of methods for obtaining J̃:

(a) Problem approximation: Here J̃ is the optimal cost function of a related simpler problem, which is solved by exact DP. Certainty equivalent control and enforced decomposition schemes are discussed in some detail.

(b) Rollout and model predictive control: Here J̃ is the cost function of some known heuristic policy. The needed cost values to implement a rollout policy are often calculated by simulation. While this method applies to stochastic problems, the reliance on simulation favors deterministic problems, including challenging combinatorial problems for which heuristics may be readily implemented. Rollout may also ...
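To make the ideas in this excerpt concrete, below is a minimal Python sketch of approximation in value space for a deterministic problem: the control is chosen by one-step lookahead, and the future-cost approximation J̃ is obtained by rollout, i.e., by simulating a given base heuristic from the successor state. The names step, controls, base_heuristic, is_terminal, and the finite simulation horizon are illustrative assumptions, not definitions from the book.

def rollout_cost(state, base_heuristic, step, is_terminal, horizon):
    """Estimate the cost-to-go approximation J~(state) by simulating the base heuristic."""
    total = 0.0
    for _ in range(horizon):
        if is_terminal(state):
            break
        u = base_heuristic(state)            # control chosen by the heuristic
        state, stage_cost = step(state, u)   # deterministic transition and its stage cost
        total += stage_cost
    return total

def lookahead_control(state, controls, step, base_heuristic, is_terminal, horizon=50):
    """One-step lookahead: minimize stage cost plus rollout estimate of the future cost."""
    best_u, best_q = None, float("inf")
    for u in controls(state):
        next_state, stage_cost = step(state, u)
        q = stage_cost + rollout_cost(next_state, base_heuristic, step, is_terminal, horizon)
        if q < best_q:
            best_u, best_q = u, q
    return best_u

The same lookahead skeleton covers the other case mentioned above: under problem approximation, the rollout estimate would be replaced by the exact optimal cost of a simplified problem computed off-line by DP.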