Example Description
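Practical Fundamentals of Speech Recognition (实用语音识别基础), 21st Century Outstanding Technical Textbooks for Higher Education. ISBN: 711803746. Authors: Wang Bingxi (王炳锡), Qu Dan (屈丹), Peng Xuan (彭煊). Publisher: National Defense Industry Press (国防工业出版社). Starting from the basic theory of speech recognition and following the thread "from theory to practice", this book explains the latest key technologies in the international speech recognition field. It describes in detail how a speech recognition system is built, covering corpus construction, speech signal preprocessing, feature extraction, feature transformation, and model building. Addressing the problem of making such systems practical, it also presents key techniques for improving system performance, so that speech recognition can leave the laboratory and move toward practical use. The book has four parts (17 chapters): Part 1 introduces the basic theory of speech recognition; Part 2 describes the process of building a practical speech recognition system; Part 3 lists, for the engineering of speech recognition systems, the …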
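Cataloging-in-Publication (CIP) Data
Practical Fundamentals of Speech Recognition (实用语音识别基础) / Wang Bingxi (王炳锡) et al. Beijing: National Defense Industry Press (国防工业出版社), 2005
21st Century Outstanding Textbooks for Higher Education
ISBN 7-118-03746-X
I. 实… II. 王… III. Speech recognition; higher education; textbook IV. H012
CIP record, Archives Library of Chinese Publications (中国版本图书馆): (2004) No. 127395
Published and distributed by National Defense Industry Press (23 Zizhuyuan South Road, Haidian District, Beijing; postal code 100044)
Printed by Xinyi Printing House (新艺印刷厂); distributed by Xinhua Bookstore
Format 787 × 1092 1/16; 24 printed sheets; 539 thousand characters
First edition January 2005; first printing, Beijing, January 2005
Print run: 1-4000 copies; price: 38.00 yuan
(If this book has printing or binding errors, the publisher will replace it.)
National Defense Bookstore: (010)6842842; mail orders: (010)6841474; distribution fax: (010)69411535; distribution business: (010)68472754

Computer technology is the greatest invention of the twentieth century and the fastest-developing field of science and technology of our time; it has penetrated almost every area of human social activity. The emergence of computer networks has let human information exchange transcend time and space and made knowledge shareable, triggering profound changes in economic structure and ways of life and greatly advancing the development and progress of human society.

The intersection, fusion, and interpenetration of computer science with many other sciences has produced many new disciplines and pushed science and technology forward.

The rise of multimedia technology has made it a key technology for information exchange between computers and people. The research topics it has opened up in multimedia information processing involve information acquisition that imitates the human senses, and processing, understanding, and judgment that imitate human intelligence. This requires not only a deep understanding of the physiological organs but also corresponding study of the role of the central nervous system and of psychological effects, which makes it an extremely challenging field of research.

Speech recognition is one of the key technologies of human-machine speech communication, and also one of its hard problems. Through the unremitting efforts of many scientists and engineers, breakthroughs have been made at several levels and gratifying results achieved.

Practical Fundamentals of Speech Recognition reviews and summarizes the theory and key technologies of speech recognition. Starting from practical use, it introduces modern nonlinear processing theory and methods, provides the necessary approaches and solutions for moving speech recognition from the laboratory into real applications, and reflects the frontier and development trends of the discipline. I believe the publication of this book will advance research on speech recognition and make a useful contribution to the development of information processing.

Academician of the Chinese Academy of Engineering
Winner of China's 2002 State Preeminent Science and Technology Award
May 2004

Preface

Humanity has a dream: to give machines the ability to "hear" and "speak" human language. In the information age this dream is gradually becoming reality, and a fairy-tale world of wonders is slowly approaching. Speech recognition is precisely the research that addresses how machines can "hear" and understand human language.

At the start of the new century, with the information revolution in full swing, speech recognition has also entered a period of all-round development. It is clearly important for speech recognition research to review the field's development in theory and practice at the right time, to summarize research results, and to clarify the direction of future development. The author has taught and researched speech recognition for more than twenty years, has done some work, accumulated some experience, and formed some ideas and approaches that he hopes to share with academic colleagues. Some of them are surely biased; they are only a limited view and one school of thought.

A speech signal is non-stationary, time-varying, complex, and information-rich; it is a mixture of semantic information and personal characteristics. Current speech recognition takes the speech signal as raw material and processes it at three levels: the physical (acoustic) level, the linguistic level (natural language understanding), and the cerebral-neural (intelligence) level. Research at the physical level is relatively deep, with considerable accumulation in both theory and practice. Natural language understanding combined with linguistics has also been studied a fair amount, but with less notable results. Intelligence research that draws on the brain's nervous system has only just begun. For speech recognition to match human performance, I believe work is also needed at the psychological level, where research is currently almost a blank.

Whatever its purpose, speech recognition comes down to three modules: parameter description, model approximation, and inference and decision. One always hopes that the set of parameters describing a given characteristic of speech is stable and easy to extract, but no such correspondence has yet been found; the various characteristics are intertwined in the parameter set and differ only in the degree of intertwining. Model approximation has a theoretical basis; representative approaches are template matching, statistical probability models, and discriminative models, plus background models for certain special uses (anti-models, filler models). Because our understanding of the physical nature of speech signals is limited, these models can only be approximations. Inference and decision has developed from simple minimum-distortion-distance decisions and hypothesis testing to applications of data fusion and evidence theory. This knowledge has played an important role to varying degrees, but it remains far from practical use. I believe the physical nature of the speech signal is turbulence, a complex chaotic phenomenon, while a model of how the human ear perceives speech remains a hard problem; at present it is simply treated as a filter bank filtering the speech signal. Speech recognition should also involve processing by the brain's nervous system and the participation of human psychological activity. Some researchers are studying speech emotion recognition, and others combine image processing to recognize mouth shapes; I believe all of this research will play a supporting role in speech recognition.

Speech recognition is both a theoretical problem and an engineering problem. It synthesizes theoretical results from many disciplines, such as acoustics, phonetics, linguistics, physiology, digital signal processing, information engineering, communication theory, electronics, computer science, pattern recognition, and artificial intelligence, and combines them with the characteristics of speech signals to produce a body of speech recognition theory. Practical use, however, raises an engineering problem: the problems faced when speech recognition results leave the laboratory are more numerous, more complex, and harder than speech recognition itself. The first is noise interference of all kinds, followed by, under various channel conditions, spectral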
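distortion, as well as the differing needs of different users and differing application settings (factories, workshops, streets, bars, karaoke halls, and so on), for example speech recognition against a background of multiple talkers, or against background noise such as music and radio broadcasts.

It is worth noting that the successful use of voice telephone dialing and dictation (voice typing) has brought great commercial benefit and given speech recognition researchers confidence. Our own research also suggests that we should ground ourselves in practical use and bring complex, messy application environments into research on practical speech recognition. Simplifying the problem scientifically and handling the application environment rationally are the basis for making speech recognition practical.

The topics this raises are: the study and extraction of robust parameters for the semantic content of speech signals; the study and extraction of robust parameters for the personal characteristics of speech signals; accent adaptation, channel adaptation, microphone adaptation, and background-environment adaptation; speech, language, psychological, and intelligence models; and the fusion, inference, and decision of multiple parameters, multiple modes, and multiple models. Research on evaluation standards and methods for speech recognition systems deserves the attention of the academic community. It is an important link in taking speech recognition systems out of the laboratory and into actual use, and then into special-purpose systems developed for particular needs, turned into products, and brought to market. I believe evaluation standards and methods play a guiding role and drive speech recognition research from another angle. Practical Fundamentals of Speech Recognition took shape amid these considerations. It is called "fundamentals" because it really does consist of basic theory and concepts, together with basic techniques and methods; it is called "practical" because it tries to push speech recognition out of the laboratory and organizes its content according to actual applications. Within this framework, it mainly introduces new practical theory, taking care as far as possible to be mathematically rigorous and logically tight. In the engineering material it takes the theoretical background of engineering undergraduates as its starting point, relies mainly on our own research approaches and results, and emphasizes operability.

The book is organized as follows. It has 4 parts and 17 chapters. Chapter 1 briefly introduces the development history of speech recognition and the current state and future trends of speech recognition technology. Part 1, Basic Theory (Chapters 2 to 5), introduces the basic theory of speech recognition: Chapter 2 covers the mechanism of hearing and the basics of Chinese speech; Chapters 3 to 5 explain speech signal processing methods in detail, including time-domain processing, time-frequency analysis, and cepstral and homomorphic processing. Part 2, Speech Recognition Systems (Chapters 6 to 10), describes in detail how a practical speech recognition system is built, covering in turn the principles of corpus construction, speech signal preprocessing, feature extraction, feature transformation, and recognition models. Chapter 10, on recognition models, focuses on five commonly used models: dynamic time warping, hidden Markov models, support vector machines, artificial neural networks, and Gaussian mixture models. Part 3, Key Processing Techniques in Speech Recognition (Chapters 11 to 13), addresses the problem of making speech recognition systems practical and presents key techniques for improving system performance: Chapter 11 introduces speaker adaptation and speaker normalization; Chapter 12 presents some currently effective noise suppression methods; Chapter 13 proposes signal compensation methods for the mismatch that speech recognition systems face under different channel conditions. These techniques are important steps toward practical speech recognition. Part 4, Speech Recognition Applications (Chapters 14 to 17), introduces four main applications of speech recognition systems: speaker recognition, keyword recognition, language identification, and continuous speech recognition. Each of the four parts can be used independently and arranged flexibly in teaching. Each chapter ends with references supporting its content for readers who wish to study further, and the book closes with an English-Chinese glossary for readers consulting foreign-language material. Chapters 1, 6, 7, 11, 12, and 16 were written by Qu Dan; Chapters 2, 3, 5, 9, and 15 by Peng Xuan; Chapters 8 and 10 jointly by Wang Bo and Peng Xuan; and Chapters 4, 13, 14, and 17 by Ma Zhanwu, Wang Wei, Hou Fenglei, and Xu Wang respectively. Wang Bingxi planned and supervised the whole book, Qu Dan compiled and revised it, and Wang Bingxi reviewed and finalized it.

The distinguishing features of this book: it explains today's most advanced speech recognition theory and techniques, and reflects the latest …

Preface

Man has long dreamed of having a machine that can "listen to" and speak human languages, which would let him enjoy a fairy-tale world of wonder. In the information era this ideal is gradually becoming a reality, thanks to state-of-the-art technology in speech recognition, whose task is to solve the problem of machine understanding of human speech.

At the beginning of the century, information technology is developing by leaps and bounds, and speech recognition has also stepped into a period of all-round development.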
It is absolutely necessary to review the history of speech recognition in both theory and practice, to summarize the research achievements, and to clarify the intended direction of development. The author has been teaching and researching speech recognition for over twenty years and has obtained some achievements, experience, ideas and thoughts which, biased as they may be owing to the author's own limited knowledge, he would like to share with academic colleagues.

The speech signal is non-stationary, time-varying and complex, carrying a large amount of information, both semantic and personal. Present speech recognition processes the original speech signal at three levels: the physical (acoustic) level, the linguistic level (natural language processing), and the cerebral nerve level (intelligence). In-depth research at the physical level has accumulated much in both theory and practice. The achievements of natural language understanding combined with linguistics are not as notable. Intelligence research that draws on the human brain has only just started. Speech recognition comparable to human performance calls for more study at the mental level, which is now almost a blank.

Speech recognition, whatever its application, involves three procedures: parameter description, model approximation, and reasoning decision. One would expect the parameter set describing a given speech characteristic to be stable and easily extracted, but so far no such relationship has been found. The various characteristics interlace in the parameter space, and the only difference lies in the degree of interlacing. Model approximation has a theoretical basis; representative approaches are template matching, statistical probability models and discriminative models, along with additional background models for some applications (anti-models and filler models). Owing to limited knowledge of the physical nature of speech signals, these models are only approximations.
Reasoning decision has developed from minimum-distortion-distance decisions and hypothesis testing to data fusion and evidence theory. This knowledge has played an important role to varying degrees, but practical application is still a long way off. Speech signals are essentially turbulence in physical nature, a complex chaotic phenomenon, but a model of how the human ear perceives speech is difficult to obtain. At present, perception is simply regarded as filtering the speech signal with filter banks. Speech recognition also involves processing by the brain and human psychological activity. Emotional speech recognition, and mouth-shape recognition combined with image processing, have also been studied in recent years; these play an auxiliary role in speech recognition.

Speech recognition is not only a theoretical problem but also an engineering problem. It integrates theoretical achievements from many disciplines, for example acoustics, phonetics, linguistics, physiology, digital signal processing, information engineering, communication theory, electronic technology, computer science, pattern recognition and artificial intelligence. Combined with the characteristics of speech signals, these bring forward a series of speech recognition theories. But to apply them in real environments, many engineering problems must be solved. When the achievements of speech recognition come out of the laboratory, they face more complex and difficult problems. The first problem is noise interference of various kinds. The next is spectral distortion under various channel conditions. In addition, the different requirements of users and different application environments, such as a factory or workshop, the street, or a bar, pose new tasks in real environments.
For example, speech recognition against a background of many speakers and speech recognition amid music and radio noise are both difficult tasks.

It is worth noting that the successful applications of voice telephone dialing and spoken typing have obtained great commercial benefits, which adds to the confidence of researchers in speech recognition. Our research also tells us that we should be grounded in practicality and bring complex and multifarious application environments into research on applied speech recognition. Scientific simplification of the problems and rational handling of the application environments are the basis for applying speech recognition.

The resulting tasks include the study and extraction of robust semantic parameters of speech signals; the study and extraction of robust individual parameters of speech signals; accent adaptation, channel adaptation, telephone transmitter adaptation and environment adaptation; intelligent models of speech, language and mentality; and the fusion, reasoning and decision of multiple parameters, patterns and models. The academic community should attach importance to the evaluation standards and methods of speech recognition systems, which will bridge the gap between laboratory research and real application in the form of task-specific systems and complete market products. I think evaluation standards and methods guide development, which is another motivation for speech recognition research. Practical Fundamentals of Speech Recognition was completed with such considerations in mind. It is called "fundamentals" because it includes fundamental theories and concepts, technologies and methods.
It is called "practical" because it attempts to take speech recognition out of the laboratory and arranges its contents according to application. Within this framework, the book emphasizes new practical theories, and attention has been paid to mathematical precision and rigorous logic. In the engineering material, we briefly introduce our research ideas and achievements and pay attention to operability, building on the theoretical background of engineering undergraduates.

The contents of this book are arranged in four parts and seventeen chapters. Chapter 1 briefly introduces the development history, research status and expected trends of speech technologies. Part 1 (Chapters 2-5) presents the fundamentals of speech recognition. Chapter 2 covers the perceptual mechanism and the basics of Mandarin speech. Chapters 3 to 5 present speech processing methods in detail, including time-domain processing, time-frequency analysis, and cepstral and homomorphic processing. Part 2 (Chapters 6-10), on speech recognition systems, introduces in detail how a practical speech recognition system is constructed, comprising the principles of building a speech corpus, pre-processing of speech signals, feature extraction, feature transformation and recognition models. Chapter 10 emphasizes five common models: dynamic time warping, the hidden Markov model, the support vector machine, the artificial neural network and the Gaussian mixture model. Part 3 (Chapters 11-13) presents key processing technologies in speech recognition that improve the performance of practical speech recognition systems. Chapter 11 covers speaker adaptation and speaker normalization. Chapter 12 covers methods of noise suppression. Chapter 13 provides signal compensation methods for the mismatch of speech recognition systems in various environments.
All the technologies in Part 3 are important factors in applying speech recognition systems. Part 4 (Chapters 14-17), on applications of speech recognition, introduces four primary applications of speech recognition systems: speaker recognition, keyword spotting, language identification and continuous speech recognition. Each of the four parts can be used independently in teaching. Each chapter is accompanied by references for further study. An English-Chinese glossary of special terminology is attached at the end of the book. Chapters 1, 6, 7, 11, 12 and 16 were compiled by Qu Dan; Chapters 2, 3, 5, 9 and 15 by Peng Xuan; and Chapters 8 and 10 by Wang Bo and Peng Xuan together. The remaining Chapters 4, 13, 14 and 17 were compiled by Ma Zhanwu, Wang Wei, Hou Fenglei and Xu Wang respectively. Qu Dan revised all of the material. The whole book was planned, supervised and finally checked by Professor Wang Bingxi.

This book is intended to present the most popular theories, state-of-the-art technologies and prospective development trends. The chapters are arranged step by step from theory to application, with a rational knowledge structure, close relationships between chapters, and continuity between parts of the book. The book is based on our achievements and also provides a