Example overview
【Description】Building Machine Learning Systems with Python_ Sec.pdf
【Core code】
Table of Contents

Preface

Chapter 1: Getting Started with Python Machine Learning
- Machine learning and Python – a dream team
- What the book will teach you (and what it will not)
- What to do when you are stuck
- Getting started
- Introduction to NumPy, SciPy, and matplotlib
- Installing Python
- Chewing data efficiently with NumPy and intelligently with SciPy
- Learning NumPy
- Indexing
- Handling nonexisting values
- Comparing the runtime
- Learning SciPy
- Our first (tiny) application of machine learning
- Reading in the data
- Preprocessing and cleaning the data
- Choosing the right model and learning algorithm
- Before building our first model…
- Starting with a simple straight line
- Towards some advanced stuff
- Stepping back to go forward – another look at our data
- Training and testing
- Answering our initial question
- Summary

Chapter 2: Classifying with Real-world Examples
- The Iris dataset
- Visualization is a good first step
- Building our first classification model
- Evaluation – holding out data and cross-validation
- Building more complex classifiers
- A more complex dataset and a more complex classifier
- Learning about the Seeds dataset
- Features and feature engineering
- Nearest neighbor classification
- Classifying with scikit-learn
- Looking at the decision boundaries
- Binary and multiclass classification
- Summary

Chapter 3: Clustering – Finding Related Posts
- Measuring the relatedness of posts
- How not to do it
- How to do it
- Preprocessing – similarity measured as a similar number of common words
- Converting raw text into a bag of words
- Counting words
- Normalizing word count vectors
- Removing less important words
- Stemming
- Stop words on steroids
- Our achievements and goals
- Clustering
- K-means
- Getting test data to evaluate our ideas on
- Clustering posts
- Solving our initial challenge
- Another look at noise
- Tweaking the parameters
- Summary

Chapter 4: Topic Modeling
- Latent Dirichlet allocation
- Building a topic model
- Comparing documents by topics
- Modeling the whole of Wikipedia
- Choosing the number of topics
- Summary

Chapter 5: Classification – Detecting Poor Answers
- Sketching our roadmap
- Learning to classify classy answers
- Tuning the instance
- Tuning the classifier
- Fetching the data
- Slimming the data down to chewable chunks
- Preselection and processing of attributes
- Defining what is a good answer
- Creating our first classifier
- Starting with kNN
- Engineering the features
- Training the classifier
- Measuring the classifier's performance
- Designing more features
- Deciding how to improve
- Bias-variance and their tradeoff
- Fixing high bias
- Fixing high variance
- High bias or low bias
- Using logistic regression
- A bit of math with a small example
- Applying logistic regression to our post classification problem
- Looking behind accuracy – precision and recall
- Slimming the classifier
- Ship it!
- Summary

Chapter 6: Classification II – Sentiment Analysis
- Sketching our roadmap
- Fetching the Twitter data
- Introducing the Naïve Bayes classifier
- Getting to know the Bayes' theorem
- Being naïve
- Using Naïve Bayes to classify
- Accounting for unseen words and other oddities
- Accounting for arithmetic underflows
- Creating our first classifier and tuning it
- Solving an easy problem first
- Using all classes
- Tuning the classifier's parameters
- Cleaning tweets
- Taking the word types into account
- Determining the word types
- Successfully cheating using SentiWordNet
- Our first estimator
- Putting everything together
- Summary

Chapter 7: Regression
- Predicting house prices with regression
- Multidimensional regression
- Cross-validation for regression
- Penalized or regularized regression
- L1 and L2 penalties
- Using Lasso or ElasticNet in scikit-learn
- Visualizing the Lasso path
- P-greater-than-N scenarios
- An example based on text documents
- Setting hyperparameters in a principled way
- Summary

Chapter 8: Recommendations
- Rating predictions and recommendations
- Splitting into training and testing
- Normalizing the training data
- A neighborhood approach to recommendations
- A regression approach to recommendations
- Combining multiple methods
- Basket analysis
- Obtaining useful predictions
- Analyzing supermarket shopping baskets
- Association rule mining
- More advanced basket analysis
- Summary

Chapter 9: Classification – Music Genre Classification
- Sketching our roadmap
- Fetching the music data
- Converting into a WAV format
- Looking at music
- Decomposing music into sine wave components
- Using FFT to build our first classifier
- Increasing experimentation agility
- Training the classifier
- Using a confusion matrix to measure accuracy in multiclass problems
- An alternative way to measure classifier performance using receiver-operator characteristics
- Improving classification performance with Mel Frequency Cepstral Coefficients
- Summary

Chapter 10: Computer Vision
- Introducing image processing
- Loading and displaying images
- Thresholding
- Gaussian blurring
- Putting the center in focus
- Basic image classification
- Computing features from images
- Writing your own features
- Using features to find similar images
- Classifying a harder dataset
- Local feature representations
- Summary

Chapter 11: Dimensionality Reduction
- Sketching our roadmap
- Selecting features
- Detecting redundant features using filters
- Correlation
- Mutual information
- Asking the model about the features using wrappers
- Other feature selection methods
- Feature extraction
- About principal component analysis
- Sketching PCA
- Applying PCA
- Limitations of PCA and how LDA can help
- Multidimensional scaling
- Summary

Chapter 12: Bigger Data
- Learning about big data
- Using jug to break up your pipeline into tasks
- An introduction to tasks in jug
- Looking under the hood
- Using jug for data analysis
- Reusing partial results
- Using Amazon Web Services
- Creating your first virtual machines
- Installing Python packages on Amazon Linux
- Running jug on our cloud machine
- Automating the generation of clusters with StarCluster
- Summary

Appendix: Where to Learn More Machine Learning
- Online courses
- Books
- Question and answer sites
- Blogs
- Data sources
- Getting competitive
- All that was left out
- Summary

Index
Tags: pdf
Building a machine learning system with Python in six steps (Building Machine Learning Systems with Python_ Sec.pdf)