Introduction
Reinforcement Learning and Optimal Control; MIT; Author: Dimitri P. Bertsekas; DRAFT TEXTBOOK; December 14, 2018
ABOUT THE AUTHOR

Dimitri Bertsekas studied Mechanical and Electrical Engineering at the National Technical University of Athens, Greece, and obtained his Ph.D. in system science from the Massachusetts Institute of Technology. He has held faculty positions with the Engineering-Economic Systems Department of Stanford University, and the Electrical Engineering Department of the University of Illinois, Urbana. Since 1979 he has been teaching at the Electrical Engineering and Computer Science Department of the Massachusetts Institute of Technology (MIT), where he is currently the McAfee Professor of Engineering. His teaching and research span several fields, including deterministic optimization, dynamic programming and stochastic control, large-scale and distributed computation, and data communication networks. He has authored or coauthored numerous research papers and seventeen books, several of which are currently used as textbooks in MIT classes, including "Dynamic Programming and Optimal Control," "Data Networks," "Introduction to Probability," and "Nonlinear Programming."

Professor Bertsekas was awarded the INFORMS 1997 Prize for Research Excellence in the Interface Between Operations Research and Computer Science for his book "Neuro-Dynamic Programming" (co-authored with John Tsitsiklis), the 2001 AACC John R. Ragazzini Education Award, the 2009 INFORMS Expository Writing Award, the 2014 AACC Richard Bellman Heritage Award, the 2014 Khachiyan Prize for Life-Time Accomplishments in Optimization, the 2015 George B. Dantzig Prize, and the 2018 John von Neumann Theory Prize. In 2001, he was elected to the United States National Academy of Engineering for "pioneering contributions to fundamental research, practice and education of optimization/control theory, and especially its application to data communication networks."

ATHENA SCIENTIFIC OPTIMIZATION AND COMPUTATION SERIES

1. Abstract Dynamic Programming, 2nd Edition, by Dimitri P. Bertsekas, 2018, ISBN 978-1-886529-46-5, 360 pages
2. Dynamic Programming and Optimal Control, Two-Volume Set, by Dimitri P. Bertsekas, 2017, ISBN 1-886529-08-6, 1270 pages
3. Nonlinear Programming, 3rd Edition, by Dimitri P. Bertsekas, 2016, ISBN 1-886529-05-1, 880 pages
4. Convex Optimization Algorithms, by Dimitri P. Bertsekas, 2015, ISBN 978-1-886529-28-1, 576 pages
5. Convex Optimization Theory, by Dimitri P. Bertsekas, 2009, ISBN 978-1-886529-31-1, 256 pages
6. Introduction to Probability, 2nd Edition, by Dimitri P. Bertsekas and John N. Tsitsiklis, 2008, ISBN 978-1-886529-23-6, 544 pages
7. Convex Analysis and Optimization, by Dimitri P. Bertsekas, Angelia Nedic, and Asuman E. Ozdaglar, 2003, ISBN 1-886529-45-0, 560 pages
8. Network Optimization: Continuous and Discrete Models, by Dimitri P. Bertsekas, 1998, ISBN 1-886529-02-7, 608 pages
9. Network Flows and Monotropic Optimization, by R. Tyrrell Rockafellar, 1998, ISBN 1-886529-06-X, 634 pages
10. Introduction to Linear Optimization, by Dimitris Bertsimas and John N. Tsitsiklis, 1997, ISBN 1-886529-19-1, 608 pages
11. Parallel and Distributed Computation: Numerical Methods, by Dimitri P. Bertsekas and John N. Tsitsiklis, 1997, ISBN 1-886529-01-9, 718 pages
12. Neuro-Dynamic Programming, by Dimitri P. Bertsekas and John N. Tsitsiklis, 1996, ISBN 1-886529-10-8, 512 pages
13. Constrained Optimization and Lagrange Multiplier Methods, by Dimitri P. Bertsekas, 1996, ISBN 1-886529-04-3, 410 pages
14. Stochastic Optimal Control: The Discrete-Time Case, by Dimitri P. Bertsekas and Steven E. Shreve, 1996, ISBN 1-886529-03-5

Contents
1. Exact Dynamic Programming
1.1. Deterministic Dynamic Programming
1.1.1. Deterministic Problems
1.1.2. The Dynamic Programming Algorithm
1.1.3. Approximation in Value Space
1.1.4. Model-Free Approximate Solution - Q-Learning
1.2. Stochastic Dynamic Programming
1.3. Examples, Variations, and Simplifications
1.3.1. Deterministic Shortest Path Problems
1.3.2. Discrete Deterministic Optimization
1.3.3. Problems with a Terminal State
1.3.4. Forecasts
1.3.5. Problems with Uncontrollable State Components
1.3.6. Partial State Information and Belief States
1.3.7. Linear Quadratic Optimal Control
1.4. Reinforcement Learning and Optimal Control - Some Terminology
1.5. Notes and Sources

2. Approximation in Value Space
2.1. Variants of Approximation in Value Space
2.1.1. Off-Line and On-Line Methods
2.1.2. Simplifying the Lookahead Minimization
2.1.3. Model-Free Approximation in Value and Policy Space
2.1.4. When is Approximation in Value Space Effective?
2.2. Multistep Lookahead
2.2.1. Multistep Lookahead and Rolling Horizon
2.2.2. Multistep Lookahead and Deterministic Problems
2.3. Problem Approximation
2.3.1. Enforced Decomposition
2.3.2. Probabilistic Approximation - Certainty Equivalent Control
2.4. Rollout and Model Predictive Control
2.4.1. Rollout for Deterministic Problems
2.4.2. Stochastic Rollout and Monte Carlo Tree Search
2.4.3. Model Predictive Control
2.5. Notes and Sources

3. Parametric Approximation
3.1. Approximation Architectures
3.1.1. Linear and Nonlinear Feature-Based Architectures
3.1.2. Training of Linear and Nonlinear Architectures
3.1.3. Incremental Gradient and Newton Methods
3.2. Neural Networks
3.2.1. Training of Neural Networks
3.2.2. Multilayer and Deep Neural Networks
3.3. Sequential Dynamic Programming Approximation
3.4. Q-factor Parametric Approximation
3.5. Notes and Sources

4. Infinite Horizon Reinforcement Learning
4.1. An Overview of Infinite Horizon Problems
4.2. Stochastic Shortest Path Problems
4.3. Discounted Problems
4.4. Exact and Approximate Value Iteration
4.5. Policy Iteration
4.5.1. Exact Policy Iteration
4.5.2. Policy Iteration for Q-factors
4.5.3. Limited Lookahead Policies and Rollout
4.5.4. Approximate Policy Iteration - Error Bounds
4.6. Simulation-Based Policy Iteration with Parametric Approximation
4.6.1. Self-Learning and Actor-Critic Systems
4.6.2. A Model-Based Variant
4.6.3. A Model-Free Variant
4.6.4. Issues Relating to Approximate Policy Iteration
4.7. Exact and Approximate Linear Programming
4.8. Q-Learning
4.9. Additional Methods - Temporal Differences
4.10. Approximation in Policy Space
4.11. Notes and Sources
4.12. Appendix: Mathematical Analysis
4.12.1. Proofs for Stochastic Shortest Path Problems
4.12.2. Proofs for Discounted Problems
4.12.3. Convergence of Exact Policy Iteration
4.12.4. Error Bounds for Approximate Policy Iteration

5. Aggregation
5.1. Aggregation Frameworks
5.2. Classical and Biased Forms of the Aggregate Problem
5.3. Bellman's Equation for the Aggregate Problem
5.4. Algorithms for the Aggregate Problem
5.5. Some Examples
5.6. Spatiotemporal Aggregation for Deterministic Problems
5.7. Notes and Sources

References
Index

Preface

In this book we consider large and challenging multistage decision problems, which can be solved in principle by dynamic programming (DP for short), but whose exact solution is computationally intractable.
We discuss solution methods that rely on approximations to produce suboptimal policies with adequate performance. These methods are collectively referred to as reinforcement learning, and also go by alternative names such as approximate dynamic programming and neuro-dynamic programming.

Our subject has benefited greatly from the interplay of ideas from optimal control and from artificial intelligence. One of the aims of the book is to explore the common boundary between these two fields and to form a bridge that is accessible by workers with background in either field.

Our primary focus will be on approximation in value space. Here, the control at each state is obtained by limited lookahead with cost function approximation, i.e., by optimization of the cost over a limited horizon, plus an approximation of the optimal future cost, starting from the end of this horizon. The latter cost, which we generally denote by J̃, is a function of the state where we may be at the end of the horizon. It may be computed by a variety of methods, possibly involving simulation and/or some given or separately derived heuristic/suboptimal policy. The use of simulation often allows for model-free implementations that do not require the availability of a mathematical model, a major idea that has allowed the use of dynamic programming beyond its classical boundaries.

We focus selectively on four types of methods for obtaining J̃:

(a) Problem approximation: Here J̃ is the optimal cost function of a related simpler problem, which is solved by exact DP. Certainty equivalent control and enforced decomposition schemes are discussed in some detail.

(b) Rollout and model predictive control: Here J̃ is the cost function of some known heuristic policy. The needed cost values to implement a rollout policy are often calculated by simulation. While this method applies to stochastic problems, the reliance on simulation favors deterministic problems, including challenging combinatorial problems for which heuristics may be readily implemented. Rollout may also ...
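To make the ideas in this excerpt concrete, below is a minimal Python sketch of approximation in value space for a deterministic problem: the control is chosen by one-step lookahead, and the future-cost approximation J̃ is obtained by rollout, i.e., by simulating a given base heuristic from the successor state. The names step, controls, base_heuristic, is_terminal, and the finite simulation horizon are illustrative assumptions, not definitions from the book.

def rollout_cost(state, base_heuristic, step, is_terminal, horizon):
    """Estimate the cost-to-go approximation J~(state) by simulating the base heuristic."""
    total = 0.0
    for _ in range(horizon):
        if is_terminal(state):
            break
        u = base_heuristic(state)            # control chosen by the heuristic
        state, stage_cost = step(state, u)   # deterministic transition and its stage cost
        total += stage_cost
    return total

def lookahead_control(state, controls, step, base_heuristic, is_terminal, horizon=50):
    """One-step lookahead: minimize stage cost plus rollout estimate of the future cost."""
    best_u, best_q = None, float("inf")
    for u in controls(state):
        next_state, stage_cost = step(state, u)
        q = stage_cost + rollout_cost(next_state, base_heuristic, step, is_terminal, horizon)
        if q < best_q:
            best_u, best_q = u, q
    return best_u

The same lookahead skeleton covers the other case mentioned above: under problem approximation, the rollout estimate would be replaced by the exact optimal cost of a simplified problem computed off-line by DP.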