
Python Text Processing Guide [Classic]

  • Language: Python
  • File size: 5.78M
  • Downloads: 13
  • Views: 80
  • Published: 2022-01-11
  • Category: Python Language Basics
  • Uploaded by: hy009
  • File format: .pdf
  • Points required: 1
 Tags: python, guide, py, text, classic

Example Description

[Summary] Python Text Processing Guide [Classic]

[Core Content]

Table of Contents
Preface 1
Chapter 1: Getting Started 7
Categorizing types of text data 8
Providing information through markup 8
Meaning through structured formats 9
Understanding freeform content 9
Ensuring you have Python installed 9
Providing support for Python 3 10
Implementing a simple cipher 10
Time for action – implementing a ROT13 encoder 11
Processing structured markup with a filter 15
Time for action – processing as a filter 15
Time for action – skipping over markup tags 18
State machines 22
Supporting third-party modules 23
Packaging in a nutshell 23
Time for action – installing SetupTools 23
Running a virtual environment 25
Configuring virtualenv 25
Time for action – configuring a virtual environment 25
Where to get help? 28
Summary 28
Chapter 2: Working with the IO System 29
Parsing web server logs 30
Time for action – generating transfer statistics 31
Using objects interchangeably 35
Time for action – introducing a new log format 35
Accessing files directly 37
Time for action – accessing files directly 37
Context managers 39
Handling other file types 41
Time for action – handling compressed files 41
Implementing file-like objects 42
File object methods 43
Enabling universal newlines 45
Accessing multiple files 45
Time for action – spell-checking HTML content 46
Simplifying multiple file access 50
Inplace filtering 51
Accessing remote files 52
Time for action – spell-checking live HTML pages 52
Error handling 55
Time for action – handling urllib2 errors 55
Handling string IO instances 57
Understanding IO in Python 3 58
Summary 59
Chapter 3: Python String Services 61
Understanding the basics of string object 61
Defining strings 62
Time for action – employee management 62
Building non-literal strings 68
String formatting 68
Time for action – customizing log processor output 68
Percent (modulo) formatting 74
Mapping key 75
Conversion flags 76
Minimum width 76
Precision 76
Width 77
Conversion type 77
Using the format method approach 78
Time for action – adding status code data 79
Making use of conversion specifiers 83
Creating templates 86
Time for action – displaying warnings on malformed lines 86
Template syntax 88
Rendering a template 88
Calling string object methods 89
Time for action – simple manipulation with string methods 89
Aligning text 92
Detecting character classes 92
Casing 93
Searching strings 93
Dealing with lists of strings 94
Treating strings as sequences 95
Summary 96
Chapter 4: Text Processing Using the Standard Library 97
Reading CSV data 98
Time for action – processing Excel formats 98
Time for action – CSV and formulas 101
Reading non-Excel data 103
Time for action – processing custom CSV formats 103
Writing CSV data 106
Time for action – creating a spreadsheet of UNIX users 106
Modifying application configuration files 110
Time for action – adding basic configuration read support 110
Using value interpolation 114
Time for action – relying on configuration value interpolation 114
Handling default options 116
Time for action – configuration defaults 116
Writing configuration data 118
Time for action – generating a configuration file 119
Reconfiguring our source 122
A note on Python 3 122
Time for action – creating an egg-based package 122
Understanding the setup.py file 131
Working with JSON 132
Time for action – writing JSON data 132
Encoding data 134
Decoding data 135
Summary 136
Chapter 5: Regular Expressions 137
Simple string matching 138
Time for action – testing an HTTP URL 138
Understanding the match function 140
Learning basic syntax 140
Detecting repetition 140
Specifying character sets and classes 141
Applying anchors to restrict matches 143
Wrapping it up 144
Advanced pattern matching 145
Grouping 145
Time for action – regular expression grouping 146
Using greedy versus non-greedy operators 149
Assertions 150
Performing an 'or' operation 152
Implementing Python-specific elements 153
Other search functions 153
search 153
findall and finditer 153
split 154
sub 154
Compiled expression objects 155
Dealing with performance issues 156
Parser flags 156
Unicode regular expressions 157
The match object 158
Processing bind zone files 158
Time for action – reading DNS records 159
Summary 164
Chapter 6: Structured Markup 165
XML data 166
SAX processing 168
Time for action – event-driven processing 168
Incremental processing 171
Time for action – driving incremental processing 171
Building an application 172
Time for action – creating a dungeon adventure game 172
The Document Object Model 176
xml.dom.minidom 176
Time for action – updating our game to use DOM processing 176
Creating and modifying documents programmatically 183
XPath 185
Accessing XML data using ElementTree 186
Time for action – using XPath in our adventure 187
Reading HTML 194
Time for action – displaying links in an HTML page 194
BeautifulSoup 195
Summary 196
Chapter 7: Creating Templates 197
Time for action – installing Mako 198
Basic Mako usage 199
Time for action – loading a simple Mako template 199
Generating a template context 203
Managing execution with control structures 204
Including Python code 205
Time for action – reformatting the date with Python code 205
Adding functionality with tags 206
Rendering files with %include 206
Generating multiline comments with %doc 207
Documenting Mako with %text 207
Defining functions with %def 208
Time for action – defining Mako def tags 208
Importing %def sections using %namespace 210
Time for action – converting mail message to use namespaces 210
Filtering output 213
Expression filters 214
Filtering the output of %def blocks 214
Setting default filters 215
Inheriting from base templates 215
Time for action – updating base template 215
Growing the inheritance chain 218
Time for action – adding another inheritance layer 219
Inheriting attributes 221
Customizing 222
Custom tags 222
Time for action – creating custom Mako tags 223
Customizing filters 226
Overviewing alternative approaches 226
Summary 227
Chapter 8: Understanding Encodings and i18n 229
Understanding basic character encodings 230
ASCII 230
Limitations of ASCII 231
KOI8-R 232
Unicode 232
Using Unicode with Python 3 233
Understanding Unicode 234
Design goals 234
Organizational structure 236
Backwards compatibility 236
Encoding 237
UTF-32 237
UTF-8 237
Encodings in Python 238
Time for action – manually decoding 239
Reading Unicode 240
Writing Unicode strings 241
Time for action – copying Unicode data 242
Time for action – fixing our copy application 244
The codecs module 245
Time for action – changing encodings 245
Adopting good practices 248
Internationalization and Localization 249
Preparing an application for translation 250
Time for action – preparing for multiple languages 250
Time for action – providing translations 253
Looking for more information on internationalization 254
Summary 255
Chapter 9: Advanced Output Formats 257
Dealing with PDF files using PLATYPUS 258
Time for action – installing ReportLab 258
Generating PDF documents 259
Time for action – writing PDF with basic layout and style 259
Writing native Excel data 266
Time for action – installing xlwt 266
Building XLS documents 267
Time for action – generating XLS data 267
Working with OpenDocument files 271
Time for action – installing ODFPy 272
Building an ODT generator 273
Time for action – generating ODT data 273
Summary 277
Chapter 10: Advanced Parsing and Grammars 279
Defining a language syntax 280
Specifying grammar with Backus-Naur Form 281
Grammar-driven parsing 282
PyParsing 283
Time for action – installing PyParsing 283
Time for action – implementing a calculator 284
Parse actions 287
Time for action – handling type translations 287
Suppressing parts of a match 289
Time for action – suppressing portions of a match 289
Processing data using the Natural Language Toolkit 297
Time for action – installing NLTK 298
NLTK processing examples 298
Removing stems 298
Discovering collocations 299
Summary 300
Chapter 11: Searching and Indexing 301
Understanding search complexity 302
Time for action – implementing a linear search 302
Text indexing 304
Time for action – installing Nucular 304
An introduction to Nucular 305
Time for action – full text indexing 307
Time for action – measuring index benefit 310
Scripts provided by Nucular 312
Using XML files 312
Advanced Nucular features 313
Time for action – field-qualified indexes 314
Performing an enhanced search 317
Time for action – performing advanced Nucular queries 317
Indexing and searching other data 320
Time for action – indexing Open Office documents 320
Other index systems 325
Apache Lucene 325
ZODB and zc.catalog 325
SQL text indexing 325
Summary 326
Appendix A: Looking for Additional Resources 327
Python resources 328
Unofficial documentation 328
Python enhancement proposals 328
Self-documenting 329
Using other documentation tools 331
Community resources 332
Following groups and mailing lists 332
Finding a users' group 333
Attending a local Python conference 333
Honorable mention 333
Lucene and Solr 333
Generating C-based parsers with GNU Bison 334
Apache Tika 335
Getting started with Python 3 335
Major language changes 336
Print is now a function 336
Catching exceptions 337
Using metaclasses 338
New reserved words 338
Major library changes 339
Changes to list comprehensions 339
Migrating to Python 3 339
Time for action – using 2to3 to move to Python 3 340
Summary 342
Appendix B: Pop Quiz Answers 343
Chapter 1: Getting Started 343
ROT13 Processing Answers 343
Chapter 2: Working with the IO System 344
File-like objects 344
Chapter 3: Python String Services 344
String literals 344
String formatting 345
Chapter 4: Text Processing Using the Standard Library 345
CSV handling 345
JSON formatting 346
Chapter 5: Regular Expressions 346
Regular expressions 346
Understanding the Pythonisms 346
Chapter 6: Structured Markup 347
SAX processing 347
Chapter 7: Creating Templates 347
Template inheritance 347
Chapter 8: Understanding Encodings and i18n 347
Character encodings 347
Python encodings 348
Internationalization 348
Chapter 9: Advanced Output Formats 348
Creating XLS documents 348
Chapter 11: Searching and Indexing 349
Introduction to Nucular 349
Index 351
