The Hierarchical Dirichlet Process Hidden Semi-Markov Model

In my work at DARPA, I’ve been exposed to hidden Markov models in temporal pattern recognition applications as diverse as speech, handwriting and gesture recognition, musical score following, and bioinformatics. My background is in stochastic modeling and optimization, and hidden Markov models are a fascinating intersection between that background and my more recent work in machine learning. Recently, I’ve come across a new twist on the Markov model: the Hierarchical Dirichlet Process Hidden Markov Model.

What is a Markov model?

Say in DC, we have three types of weather: (1) sunny, (2) rainy and (3) foggy. Let’s assume for the moment that the weather doesn’t change from rainy to sunny in the middle of the day. Weather prediction is all about trying to guess what the weather will be like tomorrow based on a history of observations of weather. If we assume the days preceding today give us a good prediction for today’s weather, we need the probability for each state change:

$$ P(w_n | w_{n-1}, w_{n-2},\ldots, w_1) $$

So, if the last three days were sunny, sunny, foggy, we know that the probability that tomorrow would be rainy is given by:

$$ P(w_4 = \text{rainy}| w_3 = \text{foggy}, w_2 = \text{sunny}, w_1 = \text{sunny}) $$

This all works very well, but the state space grows very quickly. Just based on the above, specifying $P(w_4 \mid w_3, w_2, w_1)$ already takes $3^4$ numbers. To fix this, we make the Markov assumption that everything really depends on the previous state alone, or:

$$ P(w_n | w_{n-1}, w_{n-2},\ldots, w_1) \approx P(w_n| w_{n-1}) $$

which allows us to calculate the joint probability of a weather sequence using only one-day dependencies:

$$ P(w_1, \ldots, w_n) = P(w_1) \prod_{i=2}^n P(w_i \mid w_{i-1}) $$

and now we only have nine numbers (a 3×3 transition matrix) to characterize statistically.
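To make those nine numbers concrete, here is a minimal Ruby sketch of the weather chain; the transition probabilities are made up for illustration:

```ruby
# Hypothetical 3x3 transition matrix for the weather chain:
# TRANS[today][tomorrow] = P(tomorrow | today). Each row sums to 1.
TRANS = {
  sunny: { sunny: 0.8, rainy: 0.05, foggy: 0.15 },
  rainy: { sunny: 0.2, rainy: 0.6,  foggy: 0.2  },
  foggy: { sunny: 0.2, rainy: 0.3,  foggy: 0.5  },
}

# Joint probability of a weather sequence given its first day,
# using the Markov assumption: a product of one-step transitions.
def sequence_prob(seq)
  seq.each_cons(2).reduce(1.0) { |p, (prev, nxt)| p * TRANS[prev][nxt] }
end

sequence_prob(%i[sunny sunny foggy rainy])  # 0.8 * 0.15 * 0.3
```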

What is a hidden Markov model?

In keeping with the example above, suppose you were locked in a room and asked about the weather outside, and the only evidence you have is whether or not the ceiling drips from the rain outside. We are still in the same world with the same assumptions, and the probability of each weather sequence is still given by:

$$ P(w_1, \ldots, w_n) = P(w_1) \prod_{i=2}^n P(w_i \mid w_{i-1}) $$

but we have to factor in that the actual weather is hidden from us. We can do that using Bayes’ rule, where $u_i$ is true if the ceiling drips on day $i$ and false otherwise:

$$P(w_1, \ldots, w_n \mid u_1,\ldots,u_n)=\frac{P(u_1,\ldots,u_n \mid w_1, \ldots, w_n)\,P(w_1,\ldots,w_n)}{P(u_1,\ldots,u_n)}$$

Here the probability $P(u_1,\ldots,u_n)$ is the prior probability of seeing a particular sequence of ceiling-leak events, such as $\{\text{True}, \text{False}, \text{True}\}$. With this, you can answer questions like:

Suppose the day you were locked in it was sunny. The next day the ceiling leaked. Assuming that the prior probability of the ceiling leaking on any day is 0.5, what is the probability that the second day was rainy?

In a regular Markov model the states are directly visible to the observer, so the state transition probabilities are the only parameters. In a hidden Markov model (HMM), by contrast, the system being modeled is assumed to be a Markov process with unobserved (or hidden) states: the state is not directly visible, but an output that depends on the state is. Each state has a probability distribution over the possible output tokens, so the sequence of tokens generated by an HMM gives some information about the sequence of states. In this context, ‘hidden’ refers to the state sequence through which the model passes, not to the parameters of the model; the model is still referred to as a ‘hidden’ Markov model even if these parameters are known exactly.
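The locked-room question above can be worked out in a few lines of Ruby. The transition row and the leak (emission) probabilities below are hypothetical numbers for illustration:

```ruby
# P(tomorrow's weather | today was sunny) -- hypothetical values.
FROM_SUNNY = { sunny: 0.8, rainy: 0.05, foggy: 0.15 }

# P(ceiling leaks | weather) -- the emission probabilities, also made up.
LEAK = { sunny: 0.1, rainy: 0.8, foggy: 0.3 }

# Bayes' rule: P(day 2 rainy | leak on day 2, day 1 sunny) is the joint
# probability of (rainy, leak) normalized over all possible day-2 states.
def p_rainy_given_leak
  joint = FROM_SUNNY.map { |w, p| [w, p * LEAK[w]] }.to_h
  joint[:rainy] / joint.values.sum
end

p_rainy_given_leak  # 0.04 / 0.165, roughly 0.24
```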

OK, so what is a Hierarchical Dirichlet Process Hidden Semi-Markov Model?

Hidden Markov models are generative models in which the joint distribution of observations and hidden states is modeled, or equivalently both the prior distribution of hidden states (the transition probabilities) and the conditional distribution of observations given states (the emission probabilities). Instead of implicitly assuming a uniform prior distribution over the transition probabilities, it is also possible to create hidden Markov models with other types of prior distributions. An obvious candidate, given the categorical distribution of the transition probabilities, is the Dirichlet distribution, the conjugate prior of the categorical distribution.

In fact, it is possible to use a Dirichlet process in place of a Dirichlet distribution. This type of model allows for an unknown and potentially infinite number of states. It is common to use a two-level Dirichlet process, analogous to a model with two levels of Dirichlet distributions. Such a model is called a hierarchical Dirichlet process hidden Markov model, or HDP-HMM for short; it is also known as the “Infinite Hidden Markov Model”.

The Hierarchical Dirichlet Process Hidden Markov Model (HDP-HMM) is a natural Bayesian nonparametric extension of the traditional HMM. The single parameter of the underlying Dirichlet process (termed the concentration parameter) controls the relative density or sparseness of the resulting transition matrix. Using the theory of Dirichlet processes, it is possible to integrate out the infinitely many transition parameters, leaving only three hyperparameters which can be learned from data. These three hyperparameters define a hierarchical Dirichlet process capable of capturing a rich set of transition dynamics: they control the time scale of the dynamics, the sparsity of the underlying state-transition matrix, and the expected number of distinct hidden states in a finite sequence.

This is really cool. If you formulate an HMM with a countably infinite number of hidden states, you would have infinitely many parameters in the state transition matrix. The key idea is that the theory of Dirichlet processes can implicitly integrate out all but the three parameters which define the prior over transition dynamics.

It is also possible to use a two-level prior Dirichlet distribution, in which one Dirichlet distribution (the upper distribution) governs the parameters of another Dirichlet distribution (the lower distribution), which in turn governs the transition probabilities. The upper distribution governs the overall distribution of states, determining how likely each state is to occur; its concentration parameter determines the density or sparseness of states. Such a two-level prior, with both concentration parameters set to produce sparse distributions, might be useful for example in unsupervised part-of-speech tagging, where some parts of speech occur much more commonly than others; learning algorithms that assume a uniform prior distribution generally perform poorly on this task. The parameters of models of this sort, with non-uniform prior distributions, can be learned using Gibbs sampling or extended versions of the expectation-maximization algorithm.
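The “infinitely many states” idea becomes tangible through the stick-breaking construction of the Dirichlet process. The Ruby sketch below (with a hypothetical concentration parameter and truncation level) draws the top-level state weights: each weight is a random fraction of the stick left over from the previous breaks, so the weights decay and their sum approaches one:

```ruby
# Truncated stick-breaking draw of GEM(gamma), the top-level state
# weights in an HDP. gamma is the concentration parameter; larger
# gamma spreads mass over more states. k is a truncation level.
def stick_breaking(gamma, k, rng = Random.new(42))
  remaining = 1.0
  Array.new(k) do
    # v ~ Beta(1, gamma) via inverse CDF: F(v) = 1 - (1 - v)**gamma
    v = 1.0 - rng.rand**(1.0 / gamma)
    piece = remaining * v
    remaining *= (1.0 - v)
    piece
  end
end

weights = stick_breaking(1.0, 20)
# The first 20 weights account for almost the whole unit stick;
# the sliver left over is the mass on states beyond the truncation.
```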

So how can we use this?

A common problem in speech recognition is segmenting an audio recording of a meeting into temporal segments corresponding to individual speakers. This problem is often called speaker diarization. It is particularly challenging since you don’t know the number of people participating in the meeting, which is exactly where the HDP-HMM’s unbounded state space shines; modified HDP-HMMs have been very effective at achieving state-of-the-art speaker diarization results.

Other interesting applications of HDP-HMMs include modeling otherwise intractable linear dynamical systems, which describe dynamical phenomena as diverse as human motion, financial time-series, maneuvering targets, and the dance of honey bees. (See this paper for more details.) Results have shown that the HDP-HMM can identify periods of higher volatility in the daily returns on the IBOVESPA stock index (Sao Paulo Stock Exchange). Most interesting to me was the application of HDP-HMMs to a set of six dancing honey bee sequences, aiming to segment the sequences into distinct dances.

You can see some other cool motion capture examples here.

Review: Abundance

Humanity is now entering a period of radical transformation in which technology has the potential to significantly raise the basic standards of living for every man, woman and child on the planet.

The future can be a scary place

It can be easy to develop a gloomy view of the future. Malthus was the first public voice to compare population growth to the world’s diminishing resources and arrive at the conclusion that our days were numbered. Jared Diamond has argued well that we are gorging ourselves way past sustainability and flirting with our own collapse. Other books I’ve read recently, including A Short History of Nearly Everything and Sapiens, take a long view of history and masterfully explain how humans came to dominate the planet and how we are now in the midst of an unprecedented experiment with our ecosystem, the world economy and even our own biology.

Add this to the angst in my conservative evangelical community, which is beset with rapid culture change1, secularization and a nearly complete societal swap of an epistemology based on transcendent (i.e. God’s) design for a fluid soup of cultural opinion and emotion. But pessimism isn’t limited to my crowd; it’s practiced well on both sides of the aisle, with jeremiads about income inequality, environmental destruction and corporate power and malfeasance arriving daily from both the Clinton and Sanders camps. 2

Economically, the risks are also very real. The 2008 financial crisis highlighted the systemic risk, addiction to growth and optimistic future projections that are baked into our system. Just as our epistemology now rests on emotion, it seems that our economic theory does as well. It is becoming increasingly difficult to track all of the bubbles and capital mis-allocations that have resulted from 7 years of ZIRP, NIRP and QE. How much more money can we print before the serial, or parallel, and long overdue day of reckoning arrives? In 2008/9, while the equity markets went down, the bond markets compensated. What if next time there is a concurrent bond market and equity collapse? By some calculations, interest rates are at seven-hundred-year lows and a third of Europe is now at negative rates. The high yield market is precarious, and if that falls, treasuries will get bid to the stratosphere; at some point you’ve got to get a real return, and that is a long way down from the market’s current position.

And technology seems to make it all worse. Communication, information and transportation technology pulls us all together into one collective mush that is controlled by the market and state, as we all slavishly let world-fashion trends define what we see in the mirror. Everything from the climate to the markets is influenced by a common mass of humanity participating in the same economic dance. What we are left with is an ersatz diversity based on skin color and political preference, instead of the truly distinct cultures that existed before the communication and global transportation revolutions of the last 100 years.

What this perspective misses is that technology has saved our bacon many times and it might just do it again. Mr. Diamandis, the chairman and chief executive of the X Prize Foundation and the founder of more than a dozen high-tech companies, boldly makes the case that the glass is not just half-full, it is about to become much bigger. He makes his case in his latest book: Abundance.

Technology to the rescue

How awesome would it be if technology were about to solve the challenges of overpopulation, food, water, energy, education, health care and freedom? If we carefully look back instead of nervously forward, technology has clearly made some amazing contributions. Take one of the most talked-about societal problems, one driving a lot of the progressive tax-policy discussion: income inequality. Here Diamandis’s discussion of poverty is especially insightful.

If you look at the data, the number of people in the world living in absolute poverty has fallen by more than half since the 1950s. At the current rate of decline it will reach zero by around 2035. Groceries today cost 13 times less than 150 years ago in inflation-adjusted dollars. In short, the standard of living has improved: 95% of Americans now living below the poverty line have not only electricity and running water but also internet access, a refrigerator and a television—luxuries that Andrew Carnegie’s millions couldn’t have bought at any price a century ago.

You can make other comparisons such as information wealth. I’m eager to plot when the average citizen gained near information parity with the president. (I’m thinking that a basic citizen with an iPhone today has more access to information than George Bush had when he started his presidency.) And who would have dreamed that a family could consolidate their GPS, video camera, library and photo-albums in 112 grams in their pocket?

Through a mix of sunny-side-up data and technical explanation, Diamandis makes a good point that a focus on immediate events and bad news often blinds us to long-term trends and good news. A nice surprise of the book is that he doesn’t just preach the technology gospel; he delves into our cognitive biases, bringing Daniel Kahneman into the mix and explaining how our modern analytical minds aren’t inclined to see the beautiful wake behind us, but rather focus on the potentially choppy waters ahead. While prudence is always advised, Diamandis makes the case that the resultant pessimism is easy to overstate and can diminish our potential.

Through many historical examples, he makes the point that massive goodness results when technology transforms a scarce quantity into a plentiful one. One fun example is aluminum. In the Atlantic, Sarah Lascow describes how, while aluminum is the most common metal in the Earth’s crust, it binds tightly to other elements and was consequently very scarce. It wasn’t until 1825 that anyone was able to produce even a sample of aluminum, and even that wasn’t pure. Napoleon honored guests by setting their table places with aluminum silverware, even over gold. It is a fascinating story that two different chemists3 figured out how to use cryolite—an aluminum compound—in a solution that, when shot through with electricity, would produce pure aluminum. The data show the resultant price drop from \$12 a pound in 1880, to \$4.86 in 1888, to 78 cents in 1893, to, by the 1930s, just 20 cents a pound. And technology leads to more exciting technology in unanticipated ways. In 1903, the Wright Brothers used aluminum to build a lightweight and strong crankcase for their aircraft, which further connected the scientific community around the world to make even more rare things plentiful.

Diamandis certainly plays his hand well and I’m inclined to side with him on many of his arguments. I’ll always side with the definite optimists before I join the scoffer’s gallery. After all, the pessimists were the cool kids in school, but it is the nerds who get things done. I’m a big believer that engineers are the ultimate creators of all wealth, and here Diamandis is preaching to the choir.

The case for abundance from technology

To summarize his argument, he makes four basic points:

First, we are both individually and collectively terrible at predicting the future, particularly when it comes to technology, which often exceeds our expectations in producing wealth. He claims technologies in computing, energy, medicine and a host of other areas are improving at such an exponential rate that they will soon enable breakthroughs we now barely think possible. Yes, we don’t have HAL, jet-packs or our moon-base in 2015, but we do have rapid DNA sequencing, instant access to the world’s information and weapons that can burn up whole cities in under a second.

Second, these technologies have empowered do-it-yourself innovators to achieve startling advances — in vehicle engineering, medical care and even synthetic biology — with scant resources and little manpower, so we can stop depending on big corporations or national laboratories.

Third, technology has created a generation of techno-philanthropists (think Bill Gates or Mark Zuckerberg) who are pouring their billions into solving seemingly intractable problems like hunger and disease and not hoarding their wealth robber-baron style.

Fourth, “the rising billion.” These are the world’s poor, who are now (thanks again to technology) able to lessen their burdens in profound ways and start contributing. “For the first time ever,” Diamandis says, “the rising billion will have the remarkable power to identify, solve and implement their own abundance solutions.”

Ok, should we bet the farm on this?

Diamandis is banking on revolutionary changes from technology and, from my perspective, expectations are already sky high. (Really, P/E ratios close to 100 for companies like Amazon and Google?) In fairness, by a future of abundance he doesn’t mean luxury, but rather a future “providing all with a life of possibility”. While that sounds great, to those of us in the West this might just be a reversion to the mean from the advances of the last 100 years.

However, I loved the vision he mapped out. Will there be enough food to feed a world population of 20 billion? What about 50 billion? Diamandis tells us about “vertical farms” within cities with the potential to provide vegetables, fruits and proteins to local consumers on a mass scale. Take that Malthus.

While he does a good job of lining up potential technical solutions with major potential problems, he doesn’t address what I consider the elephant in the room: are we developing morally in a way that leads us to use technology in a way that will broadly benefit the world? Markets are pretty uncaring instruments, and I would at least like to hear the case that the future’s bigger pie will be broadly shared. As it is, I’m pretty unconvinced.

Also, his heroes are presented as pure goodness and their stories are a bit hagiographic for my tastes. For example, Dean Kamen’s water technology is presented as an imminent leap forward while in reality his technology is widely considered far too expensive for widespread adoption. While he exalts the impact of small groups of driven entrepreneurs, how much can they actually do without big corporations to scale their innovations? In all his case studies, the stories are very well told, but the takeaway is not quite convincing against a backdrop of such a strong desire for technology to guide us into a future of global abundance. And even though he acknowledges the magnitude of our global problems and hints, in places, at the complexity of overcoming them, he doesn’t address that these systems can have negative exponential feedback loops as well. In my view, technology is just an amoral accelerator that requires moral wisdom.

No, but you should read this book anyway


In all, this was a great read and his perspective is interesting, insightful and inspiring. It forces us to at least consider the outcome that the glass half full might actually overflow thanks to technology, as it certainly has in the past. Who can argue against hoping for more “radical breakthroughs for the benefit of humanity”? All considered, this book is a great resource for leaders, technologists and anyone in need of some far too scarce good news.

  1. Ravi Zacharias writes that “The pace of cultural change over the last few decades has been unprecedented in human history, but the speed of those changes has offered us less time to reflect on their benefits.” 
  2. Consider that about 30 percent of the world’s fish populations have either collapsed or are on their way to collapse. Or, global carbon emissions rose by a record 5.9 percent in 2010, a worrisome development considering that the period was characterized by slow economic growth. 
  3. Charles Martin Hall was 22 when he figured out how to create pure globs of aluminum. Paul Héroult was 23 when he figured out how to do the same thing, using the same strategy, that same year. Hall lived in Oberlin, Ohio; Héroult lived in France. 

Some tax-time automation

I often struggle to find the right balance between automation and manual work. As it is tax time, and Chase bank only gives you 90 days of statements, I find myself every year going back through our statements to find any business expenses and do our overall financial review for the year. In the past I’ve played around with MS Money, Quicken, Mint and kept my own spreadsheets. Now, I just download the statements at the end of the year and use Acrobat to combine them and Ruby to massage the combined PDF into a spreadsheet.1

To do my analysis I need everything in CSV format. After getting one PDF, I end up looking at the structure of the document, which looks like:

Earn points [truncated] and 1% back per $1 spent on all other Visa Card purchases.

Date of Transaction Merchant Name or Transaction Description $ Amount
01/23 -865.63

12/29  NEW JERSEY E-ZPASS 888-288-6865 NJ  25.00

0000001 FIS33339 C 2 000 Y 9 26 15/01/26 Page 1 of 2

I realize that I want all lines that have a number like MM/DD followed by some spaces and a bunch of text, followed by a decimal number and some spaces. In regular expression syntax, that looks like:
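In Ruby, one way to write that pattern:

```ruby
# MM/DD date, whitespace, description text, whitespace, decimal amount
LINE = /^(\d{2}\/\d{2})\s+(.+?)\s+(-?[\d,]*\.\d{2})\s*$/
```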


which is literally just a way of describing to the computer where my data are.

Through using Ruby, I can easily get my expenses as CSV:
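Here is a minimal sketch of that step. It assumes the combined PDF has first been dumped to plain text (for example with `pdftotext -layout statements.pdf statements.txt`); the file names are hypothetical:

```ruby
require "csv"

# Keep only lines shaped like "12/29  MERCHANT DESCRIPTION  25.00":
# an MM/DD date, a description, and a decimal amount.
LINE = /^(\d{2}\/\d{2})\s+(.+?)\s+(-?[\d,]*\.\d{2})\s*$/

# Pull [date, description, amount] rows out of the statement text.
def expense_rows(text)
  text.each_line.filter_map do |line|
    m = LINE.match(line)
    [m[1], m[2].strip, m[3].delete(",")] if m
  end
end

# Write the rows out for the year-end spreadsheet.
def write_csv(rows, path)
  CSV.open(path, "w") do |csv|
    csv << %w[date description amount]
    rows.each { |row| csv << row }
  end
end

if File.exist?("statements.txt")
  write_csv(expense_rows(File.read("statements.txt")), "expenses.csv")
end
```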

Boom. Hope this helps some of you who might otherwise be doing a lot of typing. Also, if you want to combine PDFs on the command line, you can use PDFtk thus:

pdftk file1.pdf file2.pdf cat output -

  1. The manual download takes about 10 minutes. When I get some time, I’m up for the challenge of automating this eventually with my own screen scraper and web automation using some awesome combination of Ruby and Capybara. I also use PDFtk to combine PDF files.