Example Description
Regression Modeling Strategies.pdf
Frank E. Harrell, Jr.
Regression Modeling Strategies
With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis
Second Edition
Springer

Frank E. Harrell, Jr.
Department of Biostatistics
School of Medicine
Vanderbilt University
Nashville, TN, USA

ISSN 0172-7397, ISSN 2197-568X (electronic)
Springer Series in Statistics
ISBN 978-3-319-19424-0, ISBN 978-3-319-19425-7 (eBook)
DOI 10.1007/978-3-319-19425-7
Library of Congress Control Number: 2015942921
Springer Cham Heidelberg New York Dordrecht London
© Springer Science+Business Media New York 2001
© Springer International Publishing Switzerland 2015

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.
Printed on acid-free paper
Springer International Publishing AG Switzerland is part of Springer Science+Business Media (www.springer.com)

To the memories of Frank E. Harrell, Sr., Richard Jackson, L. Richard Smith, John Burdeshaw, and Todd Nick, and with appreciation to [...] and Charlotte Harrell, two high school math teachers: [...] Wailes (née Gaston) and Floyd Christian, two college professors: David Hurst (who advised me to choose the field of biostatistics) and Doug Stocks, and my graduate advisor P. K. Sen.

Preface

There are many books that are excellent sources of knowledge about individual statistical tools (survival models, general linear models, etc.), but the art of data analysis is about choosing and using multiple tools. In the words of Chatfield [100, p. 420], "... students typically know the technical details of regression for example, but not necessarily when and how to apply it. This argues the need for a better balance in the literature and in statistical teaching between techniques and problem solving strategies." Whether analyzing risk factors, adjusting for biases in observational studies, or developing predictive models, there are common problems that few regression texts address. For example, there are missing data in the majority of datasets one is likely to encounter (other than those used in textbooks!) but most regression texts do not include methods for dealing with such data effectively, and most texts on missing data do not cover regression modeling.

This book links standard regression modeling approaches with
- methods for relaxing linearity assumptions that still allow one to easily obtain predictions and confidence limits for future observations, and to do formal hypothesis tests,
- non-additive modeling approaches not requiring the assumption that interactions are always linear × linear,
- methods for imputing missing data and for penalizing variances for incomplete data,
- methods for handling large numbers of predictors without resorting to problematic stepwise variable selection techniques,
- data reduction methods (unsupervised learning methods, some of which are based on multivariate psychometric techniques too seldom used in statistics) that help with the problem of "too many variables to analyze and not enough observations" as well as making the model more interpretable when there are predictor variables containing overlapping information,
- methods for quantifying predictive accuracy of a fitted model,
- powerful model validation techniques based on the bootstrap that allow the analyst to estimate predictive accuracy nearly unbiasedly without holding back data from the model development process, and
- graphical methods for understanding complex models.

On the last point, this text has special emphasis on what could be called "presentation graphics for fitted models" to help make regression analyses more palatable to non-statisticians. For example, nomograms have long been used to make equations portable, but they are not drawn routinely because doing so is very labor-intensive.
An R function called nomogram in the package described below draws nomograms from a regression fit, and these diagrams can be used to communicate modeling results as well as to obtain predicted values manually even in the presence of complex variable transformations.

Most of the methods in this text apply to all regression models, but special emphasis is given to some of the most popular ones: multiple regression using least squares and its generalized least squares extension for serial (repeated measurement) data, the binary logistic model, models for ordinal responses, parametric survival regression models, and the Cox semiparametric survival model. There is also a chapter on nonparametric transform-both-sides regression. Emphasis is given to detailed case studies for these methods as well as for data reduction, imputation, model simplification, and other tasks. Except for the case study on survival of Titanic passengers, all examples are from biomedical research. However, the methods presented here have broad application to other areas including economics, epidemiology, sociology, psychology, engineering, and predicting consumer behavior and other business outcomes.

This text is intended for Masters or PhD level graduate students who have had a general introductory probability and statistics course and who are well versed in ordinary multiple regression and intermediate algebra. The book is also intended to serve as a reference for data analysts and statistical methodologists. Readers without a strong background in applied statistics may wish to first study one of the many introductory applied statistics and regression texts that are available. The author's course notes Biostatistics for Biomedical Research on the text's web site cover basic regression and many other topics. The paper by Nick and Hardin [476] also provides a good introduction to multivariable modeling and interpretation. There are many excellent intermediate level texts on regression analysis.
One of them is by Fox, which also has a companion software-based text [200, 201]. For readers interested in medical or epidemiologic research, Steyerberg's excellent text Clinical Prediction Models [586] is an ideal companion for Regression Modeling Strategies. Steyerberg's book provides further explanations, examples, and simulations of many of the methods presented here. And no text on regression modeling should fail to mention the seminal work of John Nelder [450].

The overall philosophy of this book is summarized by the following statements.
- Satisfaction of model assumptions improves precision and increases statistical power.
- It is more productive to make a model fit step by step (e.g., transformation estimation) than to postulate a simple model and find out what went wrong.
- Graphical methods should be married to formal inference.
- Overfitting occurs frequently, so data reduction and model validation are important.
- In most research projects, the cost of data collection far outweighs the cost of data analysis, so it is important to use the most efficient and accurate modeling techniques, to avoid categorizing continuous variables, and to not remove data from the estimation sample just to be able to validate the model.
- The bootstrap is a breakthrough for statistical modeling, and the analyst should use it for many steps of the modeling strategy, including derivation of distribution-free confidence intervals and estimation of optimism in model fit that takes into account variations caused by the modeling strategy.
- Imputation of missing data is better than discarding incomplete observations.
- Variance often dominates bias, so biased methods such as penalized maximum likelihood estimation yield models that have a greater chance of accurately predicting future observations.
- Software without multiple facilities for assessing and fixing model fit may only seem to be user-friendly.
- Carefully fitting an improper model is better than badly fitting (and overfitting) a well-chosen one.
- Methods that work for all types of regression models are the most valuable.
- Using the data to guide the data analysis is almost as dangerous as not doing so.
- There are benefits to modeling by deciding how many degrees of freedom (i.e., number of regression parameters) can be "spent," deciding where they should be spent, and then spending them.

On the last point, the author believes that significance tests and P-values are problematic, especially when making modeling decisions. Judging by the increased emphasis on confidence intervals in scientific journals, there is reason to believe that hypothesis testing is gradually being de-emphasized. Yet the reader will notice that this text contains many P-values. How does that make sense when, for example, the text recommends against simplifying a model when a test of linearity is not significant? First, some readers may wish to emphasize hypothesis testing in general, and some hypotheses have special interest, such as in pharmacology where one may be interested in whether the effect of a drug is linear in log dose. Second, many of the more interesting hypothesis tests in the text are tests of complexity (nonlinearity, interaction) of the overall model. Null hypotheses of linearity of effects in particular are
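
The preface's point about bootstrap validation (estimating the optimism in model fit without holding back data) can be illustrated with a minimal sketch. The Python code below is not taken from the book, which works in R; the function names, the simulated data, and the choice of R^2 from ordinary least squares as the accuracy measure are all assumptions made for illustration. It refits the model on each bootstrap resample, measures how much better that fit looks on its own resample than on the original data, and subtracts the average difference from the apparent R^2.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients; X is assumed to already hold an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def r_squared(X, y, beta):
    """R^2 of the predictions X @ beta against the observed y."""
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def optimism_corrected_r2(X, y, n_boot=200, seed=0):
    """Return (apparent R^2, optimism-corrected R^2) using bootstrap resampling."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # Apparent accuracy: model fitted and evaluated on the same data.
    apparent = r_squared(X, y, fit_ols(X, y))
    optimism = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)       # resample rows with replacement
        beta_b = fit_ols(X[idx], y[idx])       # repeat the whole fitting procedure
        in_sample = r_squared(X[idx], y[idx], beta_b)
        on_original = r_squared(X, y, beta_b)  # same coefficients, original data
        optimism[b] = in_sample - on_original
    return apparent, apparent - optimism.mean()


# Hypothetical simulated data: 60 observations, 10 predictors, only one of which matters.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(60), rng.normal(size=(60, 10))])
y = 0.5 * X[:, 1] + rng.normal(size=60)

apparent, corrected = optimism_corrected_r2(X, y)
print(f"apparent R^2 = {apparent:.3f}, optimism-corrected R^2 = {corrected:.3f}")
```

Every observation contributes to the fit here, which is the property the preface stresses. In a real analysis, any variable selection or transformation step would have to be repeated inside the loop as well, so that the optimism estimate reflects the whole modeling strategy.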