实例介绍
Dan Gusfield的字符串处理的经典书籍Algorithms on Strings, Trees and Sequences,完整版,550页。pdf版本
Algorithms on Strings, Trees, and sequences Algorithms on Strings, Trees, and sequences COMPUTER SCIENCE AND COMPUTATIONAL BIOLOGY Dan Gusfield Universiry of califonia, Davis 醪 CAMBRIDGE UNIVERSITY PRESS PUBLISHED BY THE PRESS SYNDICATE OF THE UNIVERSITY OF CAMBRIDGE The Pitt Building, Trumpington Street, Cambridge CB2 IRP 40 West 20th Street, New York, NY 10011-4211, USA 10 Stamford road, Oakleigh, Melbourne 3166, australi C Dan Gusfield 1997 This book is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press st pI Reprinted 1999(with corrections) Printed in the United States of America Typeset in Times Roman Library of Congress Cataloging-in-Publication Data Gusfield. Dan Algorithms on strings, trees, and sequences: computer science and mutational biology /Dan Gusfield.I ISBN0-52|-585198(hc) 1. Computer algorithms. 2. Molecular biology - Data processing L, Title QA769.A433871997 005.73-dc21 86-46612 A catalog record for this book is available from the British library ISBN 52158519 8 hardback Dedicated to the memory of Gene lawler Teacher, Colleague, and friend Contents Preface X111 I Exact String Matching: The Fundamental String Problem 1 Exact Matching: Fundamental Preprocessing and First algorithms 1.1 The naive method 1.2 The preprocessing approach 1.3 Fundamental preprocessing of the pattem 1.4 Fundamental preprocessing in linear time 1.5 The simplest linear-time exact matching algorithm 10 1.6 Exercises 2 Exact Matching: Classical Comparison-Based Methods 16 2. 1 Introduction 2.2 The Boyer-Moore Algorithm 16 2.3 The Knuth-Morris-Pratt algorithm 2.4 Real-time string matching 27 2.5 Exercises 9 3 Exact Matching: A Deeper look at Classical Methods 35 3.1 A Boyer-Moore variant with a"simple"" linear time bound 35 3. 2 Coles linear worst-case bound for boyer-Moore 39 3.3 The original preprocessing for Knuth-Morris-Pratt 48 3.4 Exact matching with a set of patterns 3.5 Three applications of exact set matching 61 3.6 Regular expression pattern matching 65 3.7 Exercises 67 4 Seminumerical String Matching 70 4. 1 Arithmetic versus comparison-based methods 4. 2 The Shift-And method 70 4.3 The match-count problem and Fast Fourier Transform 4.4 Karp-Rabin fingerprint methods for exact match 77 4.5 Exercises 84 vIII CONTENTS iI Suffix Trees and Their uses 87 5 Introduction to Suffix Trees 89 5.1 A short history 5.2 Basic definitions 90 5.3 A motivating example 91 5.4 a naive algorithm to build a suffix tree 6 Linear- Time Construction of suffix trees 94 6.1 Ukkonen,'s linear-time suffix tree algorithm 94 6.2 Weiner's linear-time suffix tree algorithm 107 6.3 McCreight's suffix tree algorithm 115 6.4 Generalized suffix tree for a set of strings 116 6.5 Practical implementation issues 6.6 Exercises 119 7 First Applications of Suffix Trees 122 7.1 APLI: Exact string matching 122 7. 2 APL2: Suffix trees and the exact set matching problem 123 7.3 APL3: The substring problem for a database of pattems 124 7.4 APLA: Longest common substring of two strings 125 7.5 APLS: Recognizing DNA contamination 125 7.6 APL6 Common substrings of more than two strings 127 7.7 APL7: Building a smaller directed graph for exact matching 129 7.8 APL& A reverse role for suffix trees, and major space reduction 132 7.9 APL9: Space-efficient longest common substring algorithm 135 7.10 APL10: All-pairs suffix-prefix matching 135 7.11 Introduction to repetitive structures in molecular strings 138 7. 12 APLll: Finding all maximal repetitive structures in linear time 143 7. 13 APL12: Circular string linearization 148 7.14 APL13: Suffix arrays-more space reduction 149 7.15 APL14: Suffix trees in genome-scale projects 156 7.16 APL15: A Boyer-Moore approach to exact set matching 157 7.17 APL16: Ziv-Lempel data compression 164 7.18 APL17: Minimum length encoding of DNA 167 7. 19 Additional applications 168 7.20 exercises 168 8 Constant-Time Lowest Common Ancestor retrieval 181 8.1 Introduction 18 8. 2 The assumed machine model 182 8.3 Complete binary trees: a very simple case 182 8.4 How to solve lca queries in 3 183 8.5 First steps in mapping to B 184 8.6 The mapping of T to 3 186 8.7 The linear-time preprocessing of T 188 8.8 Answering an lca query in constant time 189 8.9 The binary tree is only conceptual 191 CONTENTS 8.10 For the purists: how to avoid bit-level operations 192 8. 11 Exercises 193 9 More applications of Suffix Trees 196 9.1 Longest common extension: a bridge to inexact matching 196 9. 2 Finding all maximal palindromes in linear time 197 9.3 Exact matching with wild cards 199 9, 4 The k- h problem 200 9.5 Approximate palindromes and repeats 201 9.6 Faster methods for tandem repeats 02 9. A linear-time solution to the multiple common substring problem 205 9.8 Exercises 207 II Inexact Matching, Sequence Alignment, Dynamic Programming 209 10 The Importance of Sub)sequence Comparison in Molecular Biology 212 1 Core String Edits, Alignments, and Dynamic Programming 215 11.1 Introduction 215 11.2 The edit distance between two strings 215 11.3 Dynamic programming calculation of edit distance 217 11. 4 Edit graphs 223 11.5 Weighted edit distance 224 11.6 String similarity 225 11.7 Local alignment: finding substrings of high similarity 230 118Ga 235 119Ex 245 12 Refining Core String Edits and Alignments 54 12.1 Computing alignments in only linear space 254 12.2 Faster algorithms when the number of differences are bounded 259 12.3 Exclusion methods: fast expected running tim 270 12. 4 Yet more suffix trees and more hybrid dynamic programming 12.5 A faster(combinatorial)algorithm for longest common subsequence 287 12.6 Convex gap weights 293 12. 7 The Four- Russians speedu 302 12. 8 Exercises 308 13 Extending the Core Problems 13.1 Parametric sequence alignment 312 13. 2 Computing suboptimal alignments 321 13,3 Chaining diverse local alignments 325 13.4 Exercises 329 14 Multiple string Comparison-The Holy grail 332 14.1 Why multiple string comparison? 332 14.2 Three"big-picture " biological uses for multiple string comparison 35 14.3 Family and superfamily representation 336 X CONTENTS 14.4 Multiple sequence comparison for structural inference 34 145 Introduction to computing multiple string alignments 342 14.6 Multiple alignment with the sum-of-pairs(SP)objective function 343 14.7 Multiple alignment with consensus objective functions 351 14.8 Multiple alignment to a(phylogenetic) tree 354 14.9 Comments on bounded-error approximations 358 14.10 Common multiple alignment methods 14.11 Exercises 366 15 Sequence Databases and Their Uses- The mother Lode 370 5.1 Success stories of database search 370 15.2 The database industry 373 15.3 Algorithmic issues in database search 375 15. 4 Real sequence database search 376 15.5 FASTA 15.6 BLAST 379 15.7 PAM: the first major amino acid substitution matrices 381 15.8 PROSITE 385 15.9 BLOCKS and bloSum 385 15.10 The blosum substitution matrices 386 15.11 Additional considerations for database searching 387 15.12 Exercises 391 I Currents, Cousins and Cameos 393 16 Maps, Mapping, Sequencing, and Superstrings 395 6.1 A look at some DNa mapping and sequencing problems 395 16.2 Mapping and the genome project 395 16.3 Physical versus genetic maps 396 6.4 Physical mapping 398 16.5 Physical mapping: STS-content mapping and ordered clone libraries 398 16.6 Physical mapping: radiation-hybrid mapping 401 16.7 Physical mapping: fingerprinting for general map construction 406 16.8 Computing the tightest layout 407 16.9 Physical mapping: last comments 411 6.10 An introduction to map alignment 16.11 Large-scale sequencing and sequence assembly 6.12 Directed sequencing 415 16.13 Top-down, bottom-up sequencing: the picture using YACs 416 16.14 Shotgun DNA sequencing 420 16.15 Sequence assembl 20 16.16 Final comments on top-down, bottom-up sequencing 424 16.17 The shortest superstring problem 16.18 Sequencing by hybridization 437 16.19 Exercises 442 【实例截图】
【核心代码】
标签:
小贴士
感谢您为本站写下的评论,您的评论对其它用户来说具有重要的参考价值,所以请认真填写。
- 类似“顶”、“沙发”之类没有营养的文字,对勤劳贡献的楼主来说是令人沮丧的反馈信息。
- 相信您也不想看到一排文字/表情墙,所以请不要反馈意义不大的重复字符,也请尽量不要纯表情的回复。
- 提问之前请再仔细看一遍楼主的说明,或许是您遗漏了。
- 请勿到处挖坑绊人、招贴广告。既占空间让人厌烦,又没人会搭理,于人于己都无利。
关于好例子网
本站旨在为广大IT学习爱好者提供一个非营利性互相学习交流分享平台。本站所有资源都可以被免费获取学习研究。本站资源来自网友分享,对搜索内容的合法性不具有预见性、识别性、控制性,仅供学习研究,请务必在下载后24小时内给予删除,不得用于其他任何用途,否则后果自负。基于互联网的特殊性,平台无法对用户传输的作品、信息、内容的权属或合法性、安全性、合规性、真实性、科学性、完整权、有效性等进行实质审查;无论平台是否已进行审查,用户均应自行承担因其传输的作品、信息、内容而可能或已经产生的侵权或权属纠纷等法律责任。本站所有资源不代表本站的观点或立场,基于网友分享,根据中国法律《信息网络传播权保护条例》第二十二与二十三条之规定,若资源存在侵权或相关问题请联系本站客服人员,点此联系我们。关于更多版权及免责申明参见 版权及免责申明
网友评论
我要评论