About This Resource
Solutions to the exercises in Bishop's classic Pattern Recognition and Machine Learning. For anyone who loves this book, this is a real lifesaver!
Pattern Recognition and Machine Learning
Solutions to the Exercises: Tutors' Edition
Markus Svensén and Christopher M. Bishop
Copyright © 2002-2009

This is the solutions manual (Tutors' Edition) for the book Pattern Recognition and Machine Learning (PRML; published by Springer in 2006). This release was created September 8, 2009. Any future releases (e.g. with corrections to errors) will be announced on the PRML web-site (see below) and published via Springer.

PLEASE DO NOT DISTRIBUTE
Most of the solutions in this manual are intended as a resource for tutors teaching courses based on PRML and the value of this resource would be greatly diminished if it was to become generally available. All tutors who want a copy should contact Springer directly.

The authors would like to express their gratitude to the various people who have provided feedback on earlier releases of this document.

The authors welcome all comments, questions and suggestions about the solutions as well as reports on (potential) errors in text or formulae in this document; please send any such feedback to prml-fb@microsoft.com

Further information about PRML is available from http://research.microsoft.com/~cmbishop/PRML

Contents
Chapter 1: Introduction
Chapter 2: Probability Distributions
Chapter 3: Linear Models for Regression
Chapter 4: Linear Models for Classification
Chapter 5: Neural Networks
Chapter 6: Kernel Methods
Chapter 7: Sparse Kernel Machines
Chapter 8: Graphical Models
Chapter 9: Mixture Models and EM
Chapter 10: Approximate Inference
Chapter 11: Sampling Methods
Chapter 12: Continuous Latent Variables
Chapter 13: Sequential Data
Chapter 14: Combining Models

Chapter 1 Introduction

1.1 Substituting (1.1) into (1.2) and then differentiating with respect to $w_i$ we obtain

$$\sum_{n=1}^{N} \left( \sum_{j=0}^{M} w_j x_n^j - t_n \right) x_n^i = 0. \tag{1}$$

Re-arranging terms then gives the required result.

1.2 For the regularized sum-of-squares error function given by (1.4) the corresponding linear equations are again obtained by differentiation, and take the same form as (1.122), but with $A_{ij}$ replaced by $\widetilde{A}_{ij}$, given by

$$\widetilde{A}_{ij} = A_{ij} + \lambda I_{ij}. \tag{2}$$

1.3 Let us denote apples, oranges and limes by $a$, $o$ and $l$ respectively. The marginal probability of selecting an apple is given by

$$p(a) = p(a|r)p(r) + p(a|b)p(b) + p(a|g)p(g) = \frac{3}{10} \times 0.2 + \frac{1}{2} \times 0.2 + \frac{3}{10} \times 0.6 = 0.34 \tag{3}$$

where the conditional probabilities are obtained from the proportions of apples in each box.

To find the probability that the box was green, given that the fruit we selected was an orange, we can use Bayes' theorem

$$p(g|o) = \frac{p(o|g)\,p(g)}{p(o)}. \tag{4}$$

The denominator in (4) is given by

$$p(o) = p(o|r)p(r) + p(o|b)p(b) + p(o|g)p(g) = \frac{4}{10} \times 0.2 + \frac{1}{2} \times 0.2 + \frac{3}{10} \times 0.6 = 0.36 \tag{5}$$

from which we obtain

$$p(g|o) = \frac{3}{10} \times \frac{0.6}{0.36} = \frac{1}{2}. \tag{6}$$

1.4 We are often interested in finding the most probable value for some quantity. In the case of probability distributions over discrete variables this poses little problem. However, for continuous variables there is a subtlety arising from the nature of probability densities and the way they transform under non-linear changes of variable.

Consider first the way a function $f(x)$ behaves when we change to a new variable $y$ where the two variables are related by $x = g(y)$. This defines a new function of $y$ given by

$$\widetilde{f}(y) = f(g(y)). \tag{7}$$

Suppose $f(x)$ has a mode (i.e. a maximum) at $\widehat{x}$ so that $f'(\widehat{x}) = 0$. The corresponding mode of $\widetilde{f}(y)$ will occur for a value $\widehat{y}$ obtained by differentiating both sides of (7) with respect to $y$

$$\widetilde{f}'(\widehat{y}) = f'(g(\widehat{y}))\, g'(\widehat{y}) = 0. \tag{8}$$

Assuming $g'(\widehat{y}) \neq 0$ at the mode, then $f'(g(\widehat{y})) = 0$. However, we know that $f'(\widehat{x}) = 0$, and so we see that the locations of the mode expressed in terms of each of the variables $x$ and $y$ are related by $\widehat{x} = g(\widehat{y})$, as one would expect.
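The relation $\widehat{x} = g(\widehat{y})$ for an ordinary function can be checked numerically. The sketch below is only an illustration under stated assumptions: the function `f` (a Gaussian-shaped bump with its maximum at $x = 6$), the brute-force helper `argmax_on_grid`, and the grid sizes are all hypothetical choices, and `g` is borrowed from the change of variables that this solution introduces in its worked example.

```python
import math

def f(x):
    # Assumed example function with a single maximum at x = 6.
    return math.exp(-0.5 * (x - 6.0) ** 2)

def g(y):
    # Assumed change of variables x = g(y), defined for 0 < y < 1.
    return math.log(y) - math.log(1.0 - y) + 5.0

def argmax_on_grid(func, lo, hi, n=200001):
    # Hypothetical helper: brute-force maximiser on an evenly spaced grid.
    step = (hi - lo) / (n - 1)
    best_t, best_v = lo, func(lo)
    for i in range(1, n):
        t = lo + i * step
        v = func(t)
        if v > best_v:
            best_t, best_v = t, v
    return best_t

# Mode of f with respect to x.
x_hat = argmax_on_grid(f, 0.0, 12.0)

# Mode of the transformed function f(g(y)) with respect to y.
y_hat = argmax_on_grid(lambda y: f(g(y)), 1e-6, 1.0 - 1e-6)

# For a plain function the two modes should agree after mapping back.
print("x_hat    =", x_hat)
print("g(y_hat) =", g(y_hat))
```

Up to the grid resolution, the two printed values should coincide, which is the behaviour the argument above predicts for functions (as opposed to densities).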
Thus, finding a mode with respect to the variable $x$ is completely equivalent to first transforming to the variable $y$, then finding a mode with respect to $y$, and then transforming back to $x$.

Now consider the behaviour of a probability density $p_x(x)$ under the change of variables $x = g(y)$, where the density with respect to the new variable is $p_y(y)$ and is given by (1.27). Let us write $g'(y) = s|g'(y)|$ where $s \in \{-1, +1\}$. Then (1.27) can be written

$$p_y(y) = p_x(g(y))\, s\, g'(y).$$

Differentiating both sides with respect to $y$ then gives

$$p_y'(y) = s\, p_x'(g(y)) \{g'(y)\}^2 + s\, p_x(g(y))\, g''(y). \tag{9}$$

Due to the presence of the second term on the right hand side of (9) the relationship $\widehat{x} = g(\widehat{y})$ no longer holds. Thus the value of $x$ obtained by maximizing $p_x(x)$ will not be the value obtained by transforming to $p_y(y)$, then maximizing with respect to $y$, and then transforming back to $x$. This causes modes of densities to be dependent on the choice of variables. In the case of a linear transformation, the second term on the right hand side of (9) vanishes, and so the location of the maximum transforms according to $\widehat{x} = g(\widehat{y})$.

This effect can be illustrated with a simple example, as shown in Figure 1. We begin by considering a Gaussian distribution $p_x(x)$ over $x$ with mean $\mu = 6$ and standard deviation $\sigma = 1$, shown by the red curve in Figure 1. Next we draw a sample of $N = 50{,}000$ points from this distribution and plot a histogram of their values, which as expected agrees with the distribution $p_x(x)$.

Now consider a non-linear change of variables from $x$ to $y$ given by

$$x = g(y) = \ln(y) - \ln(1 - y) + 5. \tag{10}$$

The inverse of this function is given by

$$y = g^{-1}(x) = \frac{1}{1 + \exp(-x + 5)} \tag{11}$$

which is a logistic sigmoid function, and is shown in Figure 1 by the blue curve.

[Figure 1: Example of the transformation of the mode of a density under a non-linear change of variables, illustrating the different behaviour compared to a simple function. See the text for details.]

If we simply transform $p_x(x)$ as a function of $x$ we obtain the green curve $p_x(g(y))$ shown in Figure 1, and we see that the mode of the density $p_x(x)$ is transformed via the sigmoid function to the mode of this curve. However, the density over $y$ transforms instead according to (1.27) and is shown by the magenta curve on the left side of the diagram. Note that this has its mode shifted relative to the mode of the green curve.

To confirm this result we take our sample of 50,000 values of $x$, evaluate the corresponding values of $y$ using (11), and then plot a histogram of their values. We see that this histogram matches the magenta curve in Figure 1 and not the green curve.

1.5 Expanding the square we have

$$\begin{aligned}
\mathbb{E}\left[(f(x) - \mathbb{E}[f(x)])^2\right] &= \mathbb{E}\left[f(x)^2 - 2 f(x)\,\mathbb{E}[f(x)] + \mathbb{E}[f(x)]^2\right] \\
&= \mathbb{E}[f(x)^2] - 2\,\mathbb{E}[f(x)]\,\mathbb{E}[f(x)] + \mathbb{E}[f(x)]^2 \\
&= \mathbb{E}[f(x)^2] - \mathbb{E}[f(x)]^2
\end{aligned}$$

as required.

1.6 The definition of covariance is given by (1.41) as

$$\mathrm{cov}[x, y] = \mathbb{E}[xy] - \mathbb{E}[x]\,\mathbb{E}[y].$$

Using (1.33) and the fact that $p(x, y) = p(x)p(y)$ when $x$ and $y$ are independent, we obtain

$$\mathbb{E}[xy] = \sum_x \sum_y p(x, y)\, x y = \sum_x p(x)\, x \sum_y p(y)\, y = \mathbb{E}[x]\,\mathbb{E}[y]$$

and hence $\mathrm{cov}[x, y] = 0$. The case where $x$ and $y$ are continuous variables is analogous, with (1.33) replaced by (1.34) and the sums replaced by integrals.

1.7 The transformation from Cartesian to polar coordinates is defined by

$$x = r \cos\theta \tag{12}$$
$$y = r \sin\theta \tag{13}$$

and hence we have $x^2 + y^2 = r^2$, where we have used the well-known trigonometric result (2.177). Also the Jacobian of the change of variables is easily seen to be

$$\frac{\partial(x, y)}{\partial(r, \theta)} = \begin{vmatrix} \dfrac{\partial x}{\partial r} & \dfrac{\partial x}{\partial \theta} \\[2mm] \dfrac{\partial y}{\partial r} & \dfrac{\partial y}{\partial \theta} \end{vmatrix} = \begin{vmatrix} \cos\theta & -r \sin\theta \\ \sin\theta & r \cos\theta \end{vmatrix} = r$$

where again we have used (2.177). Thus the double integral in (1.125) becomes

$$\begin{aligned}
I^2 &= \int_0^{2\pi} \int_0^{\infty} \exp\left(-\frac{r^2}{2\sigma^2}\right) r \, dr \, d\theta && (14) \\
&= 2\pi \int_0^{\infty} \exp\left(-\frac{u}{2\sigma^2}\right) \frac{1}{2} \, du && (15) \\
&= \pi \left[ \exp\left(-\frac{u}{2\sigma^2}\right) \left(-2\sigma^2\right) \right]_0^{\infty} && (16) \\
&= 2\pi\sigma^2 && (17)
\end{aligned}$$

where we have used the change of variables $r^2 = u$. Thus

$$I = \left(2\pi\sigma^2\right)^{1/2}.$$

Finally, using the transformation $y = x - \mu$, the integral of the Gaussian distribution becomes

$$\int_{-\infty}^{\infty} \mathcal{N}(x|\mu, \sigma^2)\, dx = \frac{1}{(2\pi\sigma^2)^{1/2}} \int_{-\infty}^{\infty} \exp\left(-\frac{y^2}{2\sigma^2}\right) dy = \frac{I}{(2\pi\sigma^2)^{1/2}} = 1$$

as required.

1.8 From the definition (1.46) of the univariate Gaussian distribution, we have

$$\mathbb{E}[x] = \int_{-\infty}^{\infty} \left(\frac{1}{2\pi\sigma^2}\right)^{1/2} \exp\left\{-\frac{1}{2\sigma^2}(x - \mu)^2\right\} x \, dx.$$
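The histogram experiment described in Solution 1.4 can be reproduced with a short script. The following is a minimal sketch using only the Python standard library, not the authors' own code: the random seed, the number of histogram bins, and the names `grid_argmax`, `naive_mode`, `density_mode` and `hist_mode` are assumptions made for illustration. It draws 50,000 Gaussian samples with mean 6 and standard deviation 1, maps them through (11), and compares the location of the histogram peak with the mode of $p_x(g(y))$ and the mode of the transformed density from (1.27).

```python
import math
import random

random.seed(0)  # assumed seed, fixed only so that runs are repeatable

MU, SIGMA, N = 6.0, 1.0, 50_000
BINS = 25  # assumed histogram resolution on the interval (0, 1)

def g(y):
    # Change of variables (10): x = g(y).
    return math.log(y) - math.log(1.0 - y) + 5.0

def g_inv(x):
    # Inverse map (11): a logistic sigmoid.
    return 1.0 / (1.0 + math.exp(-x + 5.0))

def p_x(x):
    # Gaussian density over x with mean MU and standard deviation SIGMA.
    z = (x - MU) / SIGMA
    return math.exp(-0.5 * z * z) / (SIGMA * math.sqrt(2.0 * math.pi))

def p_y(y):
    # Density over y from (1.27): p_x(g(y)) |g'(y)|, with g'(y) = 1 / (y (1 - y)).
    return p_x(g(y)) / (y * (1.0 - y))

def grid_argmax(func, lo, hi, n=100001):
    # Hypothetical helper: brute-force maximiser on an evenly spaced grid.
    step = (hi - lo) / (n - 1)
    best_t, best_v = lo, func(lo)
    for i in range(1, n):
        t = lo + i * step
        v = func(t)
        if v > best_v:
            best_t, best_v = t, v
    return best_t

# Mode of p_x(g(y)): the x-mode pushed through the sigmoid.
naive_mode = g_inv(MU)

# Mode of the properly transformed density p_y(y).
density_mode = grid_argmax(p_y, 1e-6, 1.0 - 1e-6)

# Histogram of the transformed samples y = g_inv(x).
counts = [0] * BINS
for _ in range(N):
    y = g_inv(random.gauss(MU, SIGMA))
    counts[min(int(y * BINS), BINS - 1)] += 1
hist_mode = (counts.index(max(counts)) + 0.5) / BINS  # centre of the fullest bin

print("mode of p_x(g(y))   :", round(naive_mode, 3))
print("mode of p_y(y)      :", round(density_mode, 3))
print("histogram peak (bin):", round(hist_mode, 3))
```

Under these assumptions the histogram peak should fall close to the mode of $p_y(y)$ rather than at $g^{-1}(6)$, in line with the statement in Solution 1.4 that the histogram follows the transformed density and not the naively transformed function. The bin width limits how precisely the histogram peak can be located.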