Let us start by recalling that a Markov chain is a random process generating a sequence $S = s_1 s_2 \cdots s_n$ such that the probability of generating, or emitting, each $s_i$ is fixed beforehand and depends only on the previously generated symbols $s_1 \cdots s_{i-1}$; we will denote this probability by $P(s_i \mid s_1 \cdots s_{i-1})$
. An important subclass of Markov chains consists of those of finite history, namely those in which the probability of generating any symbol $s_i$ depends only on the last $k$ previous symbols, for some constant $k$; that is, $P(s_i \mid s_1 \cdots s_{i-1}) = P(s_i \mid s_{i-k} \cdots s_{i-1})$ ($k$ is called the order of the Markov chain).
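To make the definition concrete, the following is a minimal sketch, in Python, of how a finite-history Markov chain emits a sequence: each new symbol is drawn from a distribution indexed by the last $k$ symbols only. The alphabet, probabilities, and initial history below are illustrative assumptions, not values from the text.

```python
import random

def sample_markov_chain(prob, start, n, rng=random.Random(0)):
    """Generate a sequence s_1 ... s_n from an order-k Markov chain.

    prob maps a k-tuple (the last k emitted symbols) to a dict
    {symbol: P(symbol | history)}; start is the initial history.
    """
    history = tuple(start)
    seq = list(start)
    while len(seq) < n:
        dist = prob[history]
        s = rng.choices(list(dist), weights=list(dist.values()))[0]
        seq.append(s)
        history = history[1:] + (s,)  # keep only the last k symbols
    return "".join(seq)

# A made-up order-2 chain (k = 2) over the alphabet {a, b}.
prob = {
    ("a", "a"): {"a": 0.1, "b": 0.9},
    ("a", "b"): {"a": 0.6, "b": 0.4},
    ("b", "a"): {"a": 0.4, "b": 0.6},
    ("b", "b"): {"a": 0.9, "b": 0.1},
}
print(sample_markov_chain(prob, ("a", "b"), 20))
```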
Consider now for concreteness a Markov chain model of order 1, that is, $P(s_i \mid s_1 \cdots s_{i-1}) = P(s_i \mid s_{i-1})$. In such a Markov chain, the probability of emitting $s_i$ depends only on the previous symbol $s_{i-1}$. In an HMM, however, the model can have
different states, in which the probability of emitting $s_i$ depends instead only on the current state, which is unknown, or hidden, from an observer. Moreover, at each step, the model can also probabilistically choose to transition to another hidden state. In other words, if we observe $S$, we cannot specify the state of the HMM in which each $s_i$ was generated. Assuming that we know the HMM which generated $S$, that is, we know the possible transitions between the hidden states and their probabilities, as well as the emission probabilities in each state, we can instead try to find the most probable sequence of states of the HMM which generated $S$. We will solve this problem in Section 7.2,
with the Viterbi algorithm, which is in close kinship with the Bellman–Ford algorithm for finding a shortest path in a graph. Another interesting question is how to compute the probability that the HMM generated $S$, which we solve in Section 7.3 by adapting the Viterbi algorithm into the forward and backward algorithms. Finally, in Section 7.4 we discuss how the parameters of an HMM are obtained in the first place, by showing a technique for learning the transition and emission probabilities from data.
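Before turning to these problems, it may help to see the generative process itself. The sketch below uses a hypothetical two-state model (whose states loosely mimic GC-rich and GC-poor regions; all probabilities are made up) to show how an HMM emits a symbol from its current hidden state and then probabilistically transitions; an observer sees only the emitted sequence $S$, never the state path.

```python
import random

def generate_hmm(transitions, emissions, start_state, n, rng=random.Random(0)):
    """Return (hidden state path, emitted sequence), both of length n."""
    state = start_state
    states, symbols = [], []
    for _ in range(n):
        # Emit a symbol according to the current hidden state's distribution.
        dist = emissions[state]
        symbols.append(rng.choices(list(dist), weights=list(dist.values()))[0])
        states.append(state)
        # Probabilistically transition (possibly back to the same state).
        t = transitions[state]
        state = rng.choices(list(t), weights=list(t.values()))[0]
    return "".join(states), "".join(symbols)

# Hypothetical two-state model: "c" (GC-rich) and "n" (GC-poor).
transitions = {"c": {"c": 0.9, "n": 0.1}, "n": {"c": 0.1, "n": 0.9}}
emissions = {
    "c": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
    "n": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
}
states, seq = generate_hmm(transitions, emissions, "c", 30)
print(states)  # the hidden path: unobservable in practice
print(seq)     # the observed sequence S
```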
7.1 Definition and basic problems
Before giving the formal definition of an HMM, let us give a concrete example for the segmentation problem mentioned above.

Example 7.1 The segmentation problem
In prokaryote genomes, the coding and noncoding regions have some easily recognizable statistical properties in their nucleotide content. The simplest statistical property to consider is the GC content, since coding regions typically have more G and C nucleotides than A and T. The segmentation problem is to partition a genome into coding and noncoding regions.
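As an illustration of this statistical signal, the following sketch computes the GC content of a sliding window over a toy sequence; the window length and the example string are assumptions chosen for demonstration only.

```python
def gc_content(window):
    """Fraction of G and C nucleotides in a string."""
    return sum(c in "GC" for c in window) / len(window)

def windowed_gc(genome, w):
    """GC content of each length-w window, one value per start position."""
    return [gc_content(genome[i:i + w]) for i in range(len(genome) - w + 1)]

genome = "ATATATTAGCGCGGCCGCATATTA"  # a made-up toy "genome"
for i, gc in enumerate(windowed_gc(genome, 8)):
    print(i, f"{gc:.2f}")  # high values hint at a GC-rich, putatively coding, region
```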