Book

Speech Processing

A Dynamic and Optimization-Oriented Approach
By Li Deng, Douglas O'Shaughnessy
      Edition 1st Edition
      First Published 2003
      eBook Published 12 December 2018
      Pub. Location Boca Raton
      Imprint CRC Press
      DOI https://doi.org/10.1201/9781482276237
      Pages 752
      eBook ISBN 9781315214702
      Subjects Engineering & Technology

      Get Citation

      Deng, L., & O'Shaughnessy, D. (2003). Speech Processing: A Dynamic and Optimization-Oriented Approach (1st ed.). CRC Press. https://doi.org/10.1201/9781482276237

      ABSTRACT

Based on years of instruction and field expertise, this volume offers the tools needed to understand the scientific, computational, and technological aspects of speech processing. The book emphasizes mathematical abstraction, the dynamics of the speech process, and the engineering optimization practices that promote effective problem solving in this area, and it draws on many years of the authors' own research on speech processing. Speech Processing helps readers build the analytical skills needed to meet future scientific and technological challenges in the field, and it considers the complex transition from human speech processing to computer speech processing.

      TABLE OF CONTENTS

      part 1|2 pages

      Part I ANALYTICAL BACKGROUND AND TECHNIQUES

      chapter 1|2 pages

      Discrete-Time Signals, Systems, and Transforms

chapter |5 pages

∫_{−∞}^{∞} X(ω) exp(jωnT₀) dω

… −∞ < ω < ∞   (1.4), where each replica is shifted horizontally by an integer multiple of the angular sampling frequency ω₀ = 2π/T₀. Proof: Using the inverse Fourier transform, Eq. 1.1.2, at a sampling time t = nT₀, we have x(nT₀) = (1/2π) ∫_{−∞}^{∞} X(ω) exp(jωnT₀) dω. This integral can be decomposed into an infinite sum of integrals, each of length 2π/T₀, to give x(nT₀) = (1/2π) Σ_{k=−∞}^{∞} ∫_{−(2k+1)π/T₀}^{−(2k−1)π/T₀} X(ω) exp(jωnT₀) dω. Using a variable substitution, this becomes
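
The result being proved here is the standard sampling property that the spectrum of the sampled signal is a sum of replicas of X(ω) shifted by integer multiples of ω₀ = 2π/T₀. As a numerical illustration (not from the book), the replica sum can be checked with NumPy for a Gaussian pulse whose Fourier transform is known in closed form; all signal choices below are assumptions made for the demo:

```python
import numpy as np

T0 = 0.5                      # sampling period (arbitrary choice)
w0 = 2 * np.pi / T0           # angular sampling frequency

def x(t):                     # Gaussian pulse
    return np.exp(-t**2 / 2)

def X(w):                     # its continuous-time Fourier transform
    return np.sqrt(2 * np.pi) * np.exp(-w**2 / 2)

n = np.arange(-200, 201)                      # sample indices
w = np.linspace(-1.5 * w0, 1.5 * w0, 13)      # test frequencies

# Spectrum of the sampled signal: X_s(w) = sum_n x(n*T0) * exp(-j*w*n*T0)
Xs = np.array([np.sum(x(n * T0) * np.exp(-1j * wi * n * T0)) for wi in w])

# Replica sum: (1/T0) * sum_k X(w - k*w0)
k = np.arange(-20, 21)
replicas = np.array([np.sum(X(wi - k * w0)) / T0 for wi in w])

print(np.allclose(Xs.real, replicas), np.max(np.abs(Xs.imag)) < 1e-9)
```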

chapter |1 pages

Using the definition of the band-pass signal, Eq. 1.6, which specifies only a narrow frequency range of ω_h − ω_l over which the spectrum is non-zero, we see that each term in the above, X(ω − 2πk(ω_h − ω_l)), is non-zero only in the range ω_l < |ω − 2πk(ω_h − ω_l)| < ω_h.

      chapter 1|13 pages

      2 Discrete-Time Systems and z-Transforms

      chapter 2|24 pages

      Analysis of Discrete-Time Speech Signals

      chapter 2|2 pages

      7 Summary

      chapter 3|7 pages

      Probability and Random Processes

chapter |3 pages

p(x) = [Γ(α + β) / (Γ(α) Γ(β))] x^(α−1) (1 − x)^(β−1),  0 < x < 1; α > 0; β > 0

The following summary statistics can be easily verified: E(x) = α/(α + β) and var(x) = αβ / [(α + β)²(α + β + 1)]. The beta distribution can be generalized to its multivariate counterpart, called the Dirichlet distribution, which has found useful applications in speech processing (Chapter 13). A vector-valued random variable, x = (x₁, ..., x_K)ᵀ, has a Dirichlet

      chapter 3|7 pages

      2 Conditioning, Total Probability Theorem, and Bayes' Rule

The notion of conditioning has fundamental importance in probability theory, statistics, and their engineering applications, including speech processing. We first discuss several key concepts related to conditioning. 3.2.1 Conditional probability, conditional PDF, and conditional independence

      chapter 3|5 pages

      3 Conditional Expectations

chapter 3|7 pages

5 Markov Chain and Hidden Markov Sequence

In this section, we provide basics for two very important random sequences related to the general Markov sequence discussed above. 3.5.1 Markov chain as discrete-state Markov sequence A Markov chain, or discrete-state Markov sequence, is a special case of a general Markov sequence. The state space of a Markov chain is of a discrete nature and is finite. The matrix of transition probabilities A = [a_ij], with a_ij = P(x_t = s_j | x_{t−1} = s_i), is called the transition matrix of the Markov chain. Given the transition probabilities of a Markov chain, the state-occupation probability p_i(t) = P[x_t = s_i] can be easily computed. The computation is recursive according to p_j(t) = Σ_i p_i(t−1) a_ij.
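
As an illustration only (not from the book), a minimal NumPy sketch of this state-occupation recursion, assuming a made-up 3-state transition matrix:

```python
import numpy as np

# Transition matrix A: a[i, j] = P(x_t = s_j | x_{t-1} = s_i); each row sums to 1.
A = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.0, 0.3, 0.7]])

p = np.array([1.0, 0.0, 0.0])   # initial state-occupation probabilities p_i(0)

# Recursion: p_j(t) = sum_i p_i(t-1) * a_ij
for t in range(1, 11):
    p = p @ A
    print(t, np.round(p, 4))
```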

      chapter 4|4 pages

      Linear Model and Dynamic System Model

chapter |14 pages

o[n] = Σ_{k=1}^{K} θ_k o[n − k] + v[n]

This can be put in the canonical form of a linear model with the parameter vector θ = [θ₁, θ₂, ..., θ_K]ᵀ. This is a time-invariant linear model because the parameter vector θ is not a function of time n. To turn this linear model into a time-varying one, we impose time dependence

      chapter 4|1 pages

      4 Time-Varying Linear Dynamic System Model

4.4.1 From time-invariant model to time-varying model The linear state-space model defined earlier by Eqs. 4.14 and 4.15 is time invariant. This is so because the parameters Θ = {A, u, Q, C, R} that characterize this model do not change as a function of time k. When these model parameters are constant, it can be easily shown that the first-, second-, and higher-order statistics are all invariant with
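
For illustration (not from the book), a minimal NumPy simulation of a time-invariant linear state-space model with constant parameters Θ = {A, u, Q, C, R}; the equations below follow the standard form x(k+1) = A x(k) + u + w(k), o(k) = C x(k) + v(k), and all numeric values are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Time-invariant parameters Theta = {A, u, Q, C, R} (made-up values)
A = np.array([[0.95, 0.10],
              [0.00, 0.90]])     # state-transition matrix
u = np.array([0.10, 0.00])       # constant input to the state equation
Q = 0.01 * np.eye(2)             # state-noise covariance
C = np.array([[1.0, 0.0]])       # observation matrix
R = np.array([[0.05]])           # observation-noise covariance

x = np.zeros(2)
observations = []
for k in range(100):
    w = rng.multivariate_normal(np.zeros(2), Q)   # state noise w(k) ~ N(0, Q)
    v = rng.multivariate_normal(np.zeros(1), R)   # observation noise v(k) ~ N(0, R)
    x = A @ x + u + w                             # state equation
    o = C @ x + v                                 # observation equation
    observations.append(o)

print(len(observations), observations[-1])
```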

      chapter 4|5 pages

      5 Non-Linear Dynamic System Model

4.5.1 From linear model to nonlinear model Many physical systems are characterized by nonlinear relationships between various physical variables. Taking an example of the speech process, we can use a dynamic system model to describe the speech production process. The state equation can be used to describe the dynamic articulation process, while the observation equation can be used to

      chapter 5|1 pages

      Optimization Methods and Estimation Theory

      chapter 5|1 pages

      1 Classical Optimization Techniques

chapter |20 pages

f(x*), …, ∂^r f(x*) / (∂x_{i₁} ∂x_{i₂} ⋯ ∂x_{i_r})

where for each r, 1 ≤ r ≤ …, the number of summations equals r.

chapter |6 pages

The several estimators discussed in the above preliminary section are, although fundamental to estimation theory, often not widely used in engineering and signal/speech processing applications. This is because they are either difficult to find and to compute (while having highly desirable qualities, such as an MVU estimator), or because they are too empirical in nature (such as the method of moments). The requirement for knowing

chapter 5|14 pages

6 Maximum Likelihood Estimation

Maximum likelihood estimation (MLE) is one very commonly used technique for estimation of deterministic parameters in statistical signal processing, and in speech processing in particular. In addition to its asymptotically optimal properties, the MLE technique is especially powerful in handling the complex situations when the data are partially, rather than fully, observable. In this section, we will discuss the simplest case

chapter |3 pages

θ̂ = E(θ|o) = μ_θ + C_θ Hᵀ (H C_θ Hᵀ + C_w)⁻¹ (o − H μ_θ)

which is the classic estimator. 5.7.2 Bayesian linear model Now consider the Bayesian linear model (for which the MMSE computation is also relatively easy): o = Hθ + w, where θ is the unknown (random) parameter to be estimated with prior PDF N(μ_θ, C_θ). When θ and o are jointly Gaussian, the conditional PDF, p(θ|o), is also Gaussian and E(θ|o) = E(θ) + C_θo C_oo⁻¹ (o − E(o)). Applying this property to o and θ that are jointly Gaussian in the Bayesian linear model,

chapter |1 pages

M_θ̂ = E[(θ − θ̂)(θ − θ̂)ᵀ]   (5.36)

where now C_θo = E[(θ − E(θ))(o − E(o))ᵀ] is the p × N cross-covariance matrix. The error covariance matrix is M_θ̂ = C_θθ − C_θo C_oo⁻¹ C_oθ, where C_θθ = E[(θ − E(θ))(θ − E(θ))ᵀ] is the p × p covariance matrix.
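
A small NumPy sketch (illustrative, with made-up statistics) of the jointly Gaussian conditional-mean estimator and its error covariance, E(θ|o) = E(θ) + C_θo C_oo⁻¹ (o − E(o)) and M = C_θθ − C_θo C_oo⁻¹ C_oθ:

```python
import numpy as np

# Assumed joint Gaussian statistics for theta (p = 2) and o (N = 3)
mu_theta = np.array([1.0, -0.5])
mu_o = np.array([0.2, 0.0, 0.4])
C_tt = np.array([[1.0, 0.3],
                 [0.3, 0.5]])              # p x p covariance of theta
C_to = np.array([[0.4, 0.1, 0.0],
                 [0.0, 0.2, 0.1]])         # p x N cross-covariance
C_oo = np.array([[1.0, 0.2, 0.0],
                 [0.2, 0.8, 0.1],
                 [0.0, 0.1, 0.6]])         # N x N covariance of o

o = np.array([0.5, -0.1, 0.7])             # an observed vector

gain = C_to @ np.linalg.inv(C_oo)
theta_hat = mu_theta + gain @ (o - mu_o)   # conditional mean E(theta | o)
M = C_tt - gain @ C_to.T                   # error covariance

print(theta_hat)
print(M)
```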

chapter |11 pages

θ̂[n] = θ̂[n−1] + K[n] (o[n] − hᵀ[n] θ̂[n−1]),  K[n] = M[n−1] h[n] / (σ² + hᵀ[n] M[n−1] h[n]),  M[n] = (I − K[n] hᵀ[n]) M[n−1]

where M[n] is the error covariance matrix: M[n] = E[(θ − θ̂[n])(θ − θ̂[n])ᵀ]. 8 State Estimation Most of the materials presented so far in this chapter have concerned the problem of estimating parameters in statistical models. Some common statistical models used in signal processing, speech processing in particular, have been covered in Chapters 3 and 4. These models can be classified into two groups. The first group has the models with no "hidden" random variables or states. That is, all random variables defined in the
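
A minimal NumPy sketch (illustrative only) of this sequential update, assuming scalar observations with known noise variance sigma2 and a synthetic regressor sequence h[n]:

```python
import numpy as np

rng = np.random.default_rng(1)

p = 3
theta_true = np.array([0.5, -1.0, 2.0])    # used only to synthesize data
sigma2 = 0.1                               # observation-noise variance

theta_hat = np.zeros(p)                    # initial estimate
M = 10.0 * np.eye(p)                       # initial error covariance

for n in range(200):
    h = rng.normal(size=p)                                    # regressor h[n]
    o = h @ theta_true + rng.normal(scale=np.sqrt(sigma2))    # observation o[n]

    K = M @ h / (sigma2 + h @ M @ h)                  # gain K[n]
    theta_hat = theta_hat + K * (o - h @ theta_hat)   # estimate update
    M = (np.eye(p) - np.outer(K, h)) @ M              # error-covariance update

print(np.round(theta_hat, 3))
```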

      chapter 6|1 pages

      Statistical Pattern Recognition

      chapter 6|2 pages

      1 Bayes' Decision Theory

Bayes' decision theory is the foundation for optimal pattern classifier design, and provides the "fundamental equation" for modern speech recognition. The theory quantifies the concept of "accuracy" in pattern classification and recognition in statistical terms. That is, it defines the measure of accuracy in terms of the minimum expected risk, which can be achieved via the use of the Bayes decision rule.
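
As a toy illustration (not the book's example), the Bayes decision rule assigns an observation to the class with the largest posterior, or equivalently the largest prior-times-likelihood; a sketch with two Gaussian classes and assumed parameters:

```python
import numpy as np

priors = np.array([0.6, 0.4])     # assumed class priors P(c)
means = np.array([0.0, 2.0])      # class-conditional Gaussian means
var = 1.0                         # shared variance

def bayes_decide(x):
    # argmax_c [log p(x | c) + log P(c)]; the shared constant term is dropped
    log_post = -0.5 * (x - means) ** 2 / var + np.log(priors)
    return int(np.argmax(log_post))

for x in [-0.5, 1.0, 1.3, 3.0]:
    print(x, "-> class", bayes_decide(x))
```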

      chapter 6|15 pages

      2 Minimum Classification Error Criterion for Recognizer Design

The practical issues discussed above, which weigh against the theoretical optimality of the MAP-based classifier design, motivate the search for alternative designs of classifiers. In this section, we discuss one such alternative, namely the use of the minimum classification error (MCE) criterion. This particular approach was originally proposed in [Juang 92],

      part 2|2 pages

      Part II FUNDAMENTALS OF SPEECH SCIENCE

      chapter 7|16 pages

      Phonetic Process

      7.1 Introduction

      chapter 8|11 pages

      Phonological Process

      8.1 Introduction

      chapter 8|6 pages

      5 Feature Geometry — Internal Organization of Speech Sounds

      8.5.1 Introduction The discussion in the preceding section illustrates that the basic units of phonological representation are not phonemes but features. Features play the same linguistically meaningful distinctive role as phonemes, but the use of features offers straightforward

      part 3|2 pages

      Part III COMPUTATIONAL PHONOLOGY AND PHONETICS

      chapter 9|19 pages

      Computational Phonology

      chapter 9|17 pages

      4 Use of High-Level Linguistic Constraints

      chapter 10|9 pages

      Computational Models for Speech Production

chapter |12 pages

Q(Θ|Θ̂) = Σ_S p(S | o₁ᵀ, Θ̂) log P(o₁ᵀ, S | Θ)   (10.4)

To simplify the writing, denote by N_t(i) the quantity −½ log(2π) − ½ log|Σ_i| − ½ (o_t − g_t(Λ_i))ᵀ Σ_i⁻¹ (o_t − g_t(Λ_i))   (10.5)

      chapter 10|5 pages

      4 Hidden Dynamic Model Implemented Using Piece-wise Linear Approximation

      In this section, we will describe a particular computational implementation of the hidden dynamic model in Eqs. 10.40 and 10.41, where a deterministic target vector is assumed and the general nonlinear function h[x(k)] in Eq. 10.41 is approximated by a set of

chapter |1 pages

p({o, m}_N | Θ) = Π_n p(o_n | m_n, Θ) p(m_n | Θ)

where we define the (token-dependent) mixture weighting factors to be the posteriors p({m}_N | {o}_N, Θ̂)   (10.52), where Θ̂ denotes the model parameters associated with the immediately previous iteration of the EM algorithm. Since {x}_N is continuously distributed and {M}_N is

chapter |3 pages

Q(Θ|Θ̂) = Σ_n Σ_m { [log p(x₁ⁿ | m) + Σ_k log p(x_kⁿ | x_{k−1}ⁿ, m) + Σ_k log p(o_kⁿ | x_kⁿ, m)] + log p(m | Θ) }   (10.56)

after substituting Eq. 10.52 into Eq. 10.56, changing the order of the summations in

chapter |5 pages

Setting the derivatives equal to zero, we obtain the estimates for Q_m and R_m:

chapter |4 pages

P(s_t | s_{t−1}) = Π_{l=1}^{L} P(s_t^(l) | s_{t−1}^(l))

The transition structure of this constrained (uncoupling) factorial Markov chain can be parameterized by L distinct K^(l) × K^(l) matrices. This is significantly simpler than the original K^L × K^L matrix as in the unconstrained case. This model is called an overlapping feature model because the independent dynamics of the features at different tiers cause many ways in which different feature values
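
An illustrative sketch (not from the book): when the tiers evolve independently, the joint transition matrix of the factorial chain is the Kronecker product of the per-tier transition matrices, so only the small per-tier matrices need to be stored; the two tiers and their values below are assumptions:

```python
import numpy as np

# Per-tier transition matrices (L = 2 tiers with K1 = 2 and K2 = 3 states)
A1 = np.array([[0.8, 0.2],
               [0.3, 0.7]])
A2 = np.array([[0.6, 0.3, 0.1],
               [0.2, 0.5, 0.3],
               [0.1, 0.2, 0.7]])

# For independent tiers, P(s_t | s_{t-1}) = prod_l P(s_t^(l) | s_{t-1}^(l)),
# which as a joint matrix is the Kronecker product of the per-tier matrices.
A_joint = np.kron(A1, A2)

print(A_joint.shape)                            # (6, 6) = (K1*K2, K1*K2)
print(np.allclose(A_joint.sum(axis=1), 1.0))    # rows still sum to 1
```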

chapter |2 pages

6 Summary

s_j(y) · (1 − s_j(y)) · w_il,  1 ≤ i ≤ I,  1 ≤ l ≤ L   (10.89)

where y = Σ_{l=1}^{L} w_il x_l. Besides the MLP, the radial basis function (RBF) is an attractive alternative choice as another form of universal function approximator for implementing the articulatory-to-acoustic mapping.
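
A toy NumPy sketch (illustrative only, with assumed layer sizes) of a one-hidden-layer sigmoid MLP used as an articulatory-to-acoustic mapping, showing the s(y)(1 − s(y)) factor that appears in its gradient:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

L, I, D = 6, 10, 12                    # assumed sizes: L inputs, I hidden units, D outputs
W1 = 0.1 * rng.normal(size=(I, L))     # input-to-hidden weights w_il
W2 = 0.1 * rng.normal(size=(D, I))     # hidden-to-output weights

x_art = rng.normal(size=L)             # articulatory vector

y = W1 @ x_art                         # hidden pre-activations y_i = sum_l w_il * x_l
s = sigmoid(y)                         # hidden outputs
acoustic = W2 @ s                      # predicted acoustic vector

# Derivative of each hidden output w.r.t. its own weights uses s*(1 - s):
dW1 = np.outer(s * (1.0 - s), x_art)   # d s_i / d w_il = s_i (1 - s_i) x_l

print(acoustic.shape, dW1.shape)
```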

      chapter 11|6 pages

      Computational Models for Auditory Speech Processing

chapter |31 pages

d²y/dx² ≈ (y_{i+1} − 2 y_i + y_{i−1}) / (Δx)²

For the boundary point at i = 1, the forward-difference approximation to the derivative is used, which gives (u₂ − u₁)/Δx; no simplification was made and all the terms involved are taken into account. Further, because there is no coupling at the edges, the longitudinal stiffness coupling coefficient
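
A small NumPy sketch (illustrative) of the central second-difference approximation used here, with a forward difference at the left boundary point:

```python
import numpy as np

def second_derivative(y, dx):
    """Approximate d2y/dx2 on a uniform grid with spacing dx (interior points only)."""
    d2 = np.zeros_like(y)
    d2[1:-1] = (y[2:] - 2.0 * y[1:-1] + y[:-2]) / dx**2   # (y_{i+1} - 2 y_i + y_{i-1}) / (dx)^2
    return d2

def forward_difference(y, dx):
    """Forward-difference first derivative at the boundary: (u2 - u1) / dx."""
    return (y[1] - y[0]) / dx

x = np.linspace(0.0, 1.0, 101)
y = np.sin(2 * np.pi * x)
dx = x[1] - x[0]
print(second_derivative(y, dx)[50], forward_difference(y, dx))
```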

      part 4|2 pages

      Part IV SPEECH TECHNOLOGY IN SELECTED AREAS

      chapter 12|10 pages

      Speech Recognition

      chapter 12|14 pages

      4 Use of HMMs in Acoustic Modeling

chapter |12 pages

A = Σ_{k=0}^{N−1} E[z(k) | o, Θ],  B = Σ_{k=0}^{N−1} E[z(k) z(k)ᵀ | o, Θ],  C = Σ_{k=0}^{N−1} E[z(k+1) | o, Θ],  D = Σ_{k=0}^{N−1} E[z(k+1) z(k)ᵀ | o, Θ]

Eq. 12.8 is another third-order nonlinear algebraic equation of the form given in Eq. 12.12.

chapter |3 pages

0 ≤ i ≤ 2Q,  i ≤ j ≤ 2Q,  1 ≤ v ≤ V   (12.24)

Re-estimation of the remaining model parameters via maximization of Q₂(·) is described in detail below. Referring to the constraint expressed in Eq. 12.20, we note that the constraint has been imposed only on μ and b. The remaining parameters are free of the constraint and hence can be re-estimated using the conventional formulae [Baum 72]:

chapter |1 pages

E{log f(b_i, σ²)}   (12.44)

where the first term, the conditional expectation (E step in the EM algorithm) involving the log-likelihood function for the observation data, was derived previously in [Deng 92b] and is rewritten as

      chapter 12|5 pages

      11 Statistical Language Modeling

      chapter 12|2 pages

      12 Summary

      chapter 13|48 pages

      Speech Enhancement

      chapter 14|21 pages

      Speech Synthesis

      chapter 14|1 pages

      10 Summary
