Markov Models and Hidden Markov Models are excellent tools for modeling stochastic processes such as the random walk of stock prices in a market. They give us a set of probabilities we can use to decide whether to enter or exit a position.
In the world of trading, traders use indicators, momentum bands, trends, and other methods to try to infer what price action will be in the short-term future. Based on the work of Louis Bachelier, financial markets are often perceived as random walks; he also anticipated what is now known as the efficient market hypothesis: the current price is neither bullish nor bearish but the net effect of those two forces settling on a price.
Any day trader knows that this isn't the whole story: other factors always come into play in determining the direction of a stock. Several variables influence speculation and move price in various directions, a latent velocity that rears its head in an admittedly stochastic (random) fashion. To determine whether a potential position has the odds in our favor, we need a model that can realistically project the future based on the current state.
Enter: Markov Models and Hidden Markov Models.
Markov Models
Think of a Markov Model as a friend who lives in the moment without any care for the past: they live in the now and take things one step at a time. This is an inherent property of a Markov Model, known as the Markov property, which makes it "memoryless". A Markov process models something that moves between states over time. In software we can think of it as a state machine, and it differs from a Bayesian process, where priors are used to inform posteriors.
We model seemingly random behavior through a stochastic process, which has randomness and uncertainty injected into the very fabric of the model. The process indexes states at discrete instances of time as time passes, which makes it a perfect sandbox for data scientists, product managers, and anyone looking to simulate a random environment before making business or product decisions.
Ex. How to model state transitions through a mobile app
Let’s say you are looking to model a user’s path through various screens in your mobile music player app. We begin by defining the various screens:
- Home
- Search
- Library
- Audio Player
- Playlist
We would model these screens as the state space S = {Home, Search, Library, Audio Player, Playlist}. To model the transitions between these states, we would create what is called a transition matrix, where each row is a probability vector expressing the probability of landing in each state given the prior state.
For a first-time user, you want to understand the probability of each step along their path.
You would read the Home row of such a matrix as follows: from Home, 40% of users go to Search while 20% go to the Audio Player.
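To make this concrete, here is a minimal sketch in Python of what such a transition matrix could look like. Only the two Home probabilities quoted above come from the example; every other entry is a hypothetical placeholder, chosen simply so that each row sums to 1.

import numpy as np

# States in a fixed order, so rows and columns line up with the list above.
states = ["Home", "Search", "Library", "Audio Player", "Playlist"]

# Hypothetical transition matrix P: P[i, j] is the probability of moving
# from state i to state j in one step. Only Home -> Search (0.40) and
# Home -> Audio Player (0.20) come from the example; the rest are made up.
P = np.array([
    [0.10, 0.40, 0.20, 0.20, 0.10],  # Home
    [0.15, 0.20, 0.25, 0.30, 0.10],  # Search
    [0.05, 0.15, 0.20, 0.40, 0.20],  # Library
    [0.05, 0.10, 0.15, 0.60, 0.10],  # Audio Player
    [0.05, 0.10, 0.15, 0.50, 0.20],  # Playlist
])

# Each row is a probability vector, so it must sum to 1.
assert np.allclose(P.sum(axis=1), 1.0)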
From this point there are several different things you can do with this information:
- 1️⃣ Simulate User Flow Over Time
- 2️⃣ Find the Steady-State Distribution
- 3️⃣ Diagnose Bottlenecks and Loops
- 4️⃣ A/B Test Different Flows
- 5️⃣ Predict Churn or Feature Activation
1️⃣ Simulate User Flow Over Time
With this transition matrix, you can run simulations to enumerate the various paths a user can take and determine the probability that they end up in certain states at particular time steps. Especially with our example of first-time users, we can estimate the probability of churn along various paths and at certain states.
The result of this simulation is another matrix, to which you can apply linear algebra in order to understand the more common paths taken by users.
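As a rough sketch of how such a simulation could be run, assuming the hypothetical matrix P from the earlier sketch, the distribution over screens after t steps is just the initial distribution multiplied by P raised to the t-th power (the resulting numbers are purely illustrative):

import numpy as np

# Hypothetical transition matrix from the earlier sketch
# (rows/columns: Home, Search, Library, Audio Player, Playlist).
P = np.array([
    [0.10, 0.40, 0.20, 0.20, 0.10],
    [0.15, 0.20, 0.25, 0.30, 0.10],
    [0.05, 0.15, 0.20, 0.40, 0.20],
    [0.05, 0.10, 0.15, 0.60, 0.10],
    [0.05, 0.10, 0.15, 0.50, 0.20],
])

# A first-time user starts on the Home screen with probability 1.
pi_0 = np.array([1.0, 0.0, 0.0, 0.0, 0.0])

# The distribution over screens after t steps is pi_0 @ P^t.
for t in range(1, 6):
    pi_t = pi_0 @ np.linalg.matrix_power(P, t)
    print(f"step {t}:", np.round(pi_t, 3))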
2️⃣ Find the Steady-State Distribution
Just as the Great Pacific Garbage Patch shows where ocean currents coalesce, you can think of these simulations as illuminating where your users are likely to end up. With a mobile audio player app, you would expect a high percentage of users to settle in the Audio Player; any divergence would be cause for alarm and could correlate with churn.
With enough transitions through states, we expect the system to settle on a certain vector π that remains the same under any further application of the transition matrix P:

πP = π

π acts as the left eigenvector of P with an eigenvalue of 1. This inherently captures the “memorylessness” of a Markov model: an irreducible, aperiodic chain on a finite state space forgets where it started, yet it always ends up in its steady state.

So in our example, the vector we are solving for is π = (π_Home, π_Search, π_Library, π_AudioPlayer, π_Playlist). Adhering to πP = π, together with the constraint that the entries of π sum to 1, we would solve the resulting system of linear equations. Let’s say our net result is a row vector whose entries sum to 1 (which follows from normalizing the left eigenvector).
Because the steady-state distribution can be modeled, you also have a strong proxy for user retention and for the “sticky” features that engage users, which in this case would be the Audio Player, since roughly 30% of users spend most of their time there.
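One way to compute that steady state, again using the hypothetical P from above, is to take the left eigenvector of P associated with eigenvalue 1 and normalize it so its entries sum to 1 (since the matrix is made up, the output is illustrative and won't match the 30% figure quoted above):

import numpy as np

# Hypothetical transition matrix from the earlier sketch.
P = np.array([
    [0.10, 0.40, 0.20, 0.20, 0.10],
    [0.15, 0.20, 0.25, 0.30, 0.10],
    [0.05, 0.15, 0.20, 0.40, 0.20],
    [0.05, 0.10, 0.15, 0.60, 0.10],
    [0.05, 0.10, 0.15, 0.50, 0.20],
])

# Left eigenvectors of P are right eigenvectors of P transposed.
eigenvalues, eigenvectors = np.linalg.eig(P.T)

# Pick the eigenvector whose eigenvalue is (numerically) 1 ...
pi = np.real(eigenvectors[:, np.argmin(np.abs(eigenvalues - 1.0))])

# ... and normalize it so the probabilities sum to 1.
pi = pi / pi.sum()
print(np.round(pi, 3))  # steady-state share of time spent in each screen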
3️⃣ Diagnose Bottlenecks and Loops
Since these are essentially paths, you expect them to run smoothly without anomalous drops in probability. Such drops are indicative of bottlenecks and can signal poor UX, poor information architecture, weak feature adoption, or a bug in your software. If one of the states you're measuring is “Exit” and a single state shows a high probability of transitioning into it, that state is a smoking gun; states that trap probability like this are known as sink states.
Loops can also be identified from the probability distribution, as sketched below. A loop between states like Search and Playlist may indicate poor recommendations or personalization. This is a good way to weed out uncharacteristic behavior in your product.
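A crude way to surface candidate loops, assuming the same hypothetical P, is to look for pairs of states with a high round-trip probability; the threshold here is arbitrary and purely illustrative:

import numpy as np

states = ["Home", "Search", "Library", "Audio Player", "Playlist"]
# Hypothetical transition matrix from the earlier sketch.
P = np.array([
    [0.10, 0.40, 0.20, 0.20, 0.10],
    [0.15, 0.20, 0.25, 0.30, 0.10],
    [0.05, 0.15, 0.20, 0.40, 0.20],
    [0.05, 0.10, 0.15, 0.60, 0.10],
    [0.05, 0.10, 0.15, 0.50, 0.20],
])

# Flag state pairs whose back-and-forth probability is unusually high.
threshold = 0.04  # arbitrary, for illustration only
for i in range(len(states)):
    for j in range(i + 1, len(states)):
        round_trip = P[i, j] * P[j, i]
        if round_trip > threshold:
            print(f"possible loop: {states[i]} <-> {states[j]} "
                  f"(round-trip probability {round_trip:.3f})")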
4️⃣ A/B Test Different Flows
You can build different transition matrices for different onboarding versions and A/B test them. With enough iterations in your simulation, you can monitor the steady states of the testing variants. We can start with two transition matrices, P_A and P_B, one per variant.
We’re looking to understand how these flows evolve with time, so we express the probability vectors in terms of the transition matrices in this manner: π_t = π_0 P_A^t for variant A and π_t = π_0 P_B^t for variant B, where π_0 is the initial distribution over screens.
Unlike a traditional A/B test, where you have a single success metric and two outcomes captured through a numerator and denominator, these matrices give you insight into the overall paths. You can determine whether the user experience showed signs of wandering, backtracking, or looping behavior. You could also use this to determine the average number of steps it takes to reach a desired state, which is ideal for onboarding flows and first-time user experiences.
More importantly, an A/B test has a clearly defined success criterion, and with a software product there are tangible business impacts depending on how users interact with it. In the case of the mobile audio player, the business may model the LTV (lifetime value) of users based on which combination of features they interact with. So you could theoretically run an A/B test on how users engage with podcasts versus music playlists and attribute a cost/reward to each feature. You would then calculate the expected value for each testing variant, weighting the reward of each state by how often users occupy it, to determine which variant nets the most business value.
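A minimal sketch of that calculation, with two made-up 3-state variants (Home, Podcasts, Playlists) and a hypothetical per-step reward vector standing in for LTV contribution, could look like this:

import numpy as np

def steady_state(P):
    """Left eigenvector of P with eigenvalue 1, normalized to sum to 1."""
    eigenvalues, eigenvectors = np.linalg.eig(P.T)
    pi = np.real(eigenvectors[:, np.argmin(np.abs(eigenvalues - 1.0))])
    return pi / pi.sum()

# Hypothetical transition matrices for two onboarding variants
# (states: Home, Podcasts, Playlists); all numbers are made up.
P_A = np.array([[0.2, 0.5, 0.3],
                [0.1, 0.7, 0.2],
                [0.1, 0.3, 0.6]])
P_B = np.array([[0.2, 0.3, 0.5],
                [0.1, 0.6, 0.3],
                [0.1, 0.2, 0.7]])

# Hypothetical value of a step spent in each state (a stand-in for LTV).
reward = np.array([0.0, 1.5, 1.0])

for name, P in [("A", P_A), ("B", P_B)]:
    pi = steady_state(P)
    print(f"variant {name}: steady state {np.round(pi, 3)}, "
          f"expected value per step {pi @ reward:.3f}")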
For those versed in physics, you can think of the transition matrix as something like the Lagrangian: a single mathematical artifact that possesses all the information about how something evolves dynamically with time.
5️⃣ Predict Churn or Feature Activation
You can add an absorbing state like “Exit” or “Churn” and use it to determine the expected number of steps before a user churns, which feature drove the most exits, and where friction occurs across several different dimensions.
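As a sketch of how that could be computed, assuming a small hypothetical chain with an absorbing “Exit” state, the fundamental matrix N = (I - Q)^-1 of the transient states gives the expected number of steps before a user is absorbed (churns):

import numpy as np

# Hypothetical chain with an absorbing "Exit" state; all numbers are illustrative.
states = ["Home", "Search", "Audio Player", "Exit"]
P = np.array([
    [0.10, 0.40, 0.30, 0.20],   # Home
    [0.20, 0.20, 0.40, 0.20],   # Search
    [0.05, 0.10, 0.75, 0.10],   # Audio Player
    [0.00, 0.00, 0.00, 1.00],   # Exit: absorbing, it never transitions out
])

# Q holds transitions among the transient (non-Exit) states.
Q = P[:3, :3]

# Fundamental matrix N = (I - Q)^-1. Its row sums are the expected number
# of steps taken before absorption, from each starting state.
N = np.linalg.inv(np.eye(3) - Q)
expected_steps = N.sum(axis=1)
for state, steps in zip(states[:3], expected_steps):
    print(f"starting at {state}: ~{steps:.1f} steps before exiting")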
Order-k Markov process
To model this “limited horizon”, where we can only infer the price at time t from the price at t-1, the Markov property looks as follows:

P(X_t | X_{t-1}, X_{t-2}, ..., X_1) = P(X_t | X_{t-1})
This assumption makes it an order-1 Markov process; more generally, an order-k Markov process assumes there is no direct dependence between the state at time t and the states that are more than k time steps before it. Now that we've seen how only the prior time increment influences the next price, we also want the underlying statistics of the system itself to stay unchanged. We assume this so that our predictions are consistent; much as a machine learning model assumes the relationship it has learned among the features is preserved, we assume the same systemic, statistical features persist over time. The formula below showcases this by comparing two discrete time steps and showing that the conditional probabilities remain unchanged:

P(X_{t+1} = j | X_t = i) = P(X_t = j | X_{t-1} = i) for all times t and all states i, j
The aim here is to simplify the natural complexity of the myriad inputs that drive price action and instead use only the price of the asset as it currently stands, without accounting for the past. To understand the probability of moves happening in a one-step sequence, we use what is called a state transition matrix.
Going from state i to state j, there are many different prices that can occur. The outcome we are looking for is a probability distribution, so we create a matrix across all pairs of prices that can occur. Let's take, for example, the stock price for PLTR, which is $15.98:
| i \ j | $15.98 | $15.99 | $16.00 | $16.01 | $16.02 |
| --- | --- | --- | --- | --- | --- |
| $15.98 | 0.25 | 0.25 | 0.25 | 0.25 | 0 |
| $15.99 | 0 | 0.7 | 0.1 | 0.1 | 0.1 |
| $16.00 | 0 | 0.3 | 0.5 | 0.1 | 0.1 |
| $16.01 | 0 | 0.2 | 0.1 | 0.6 | 0.1 |
| $16.02 | 0 | 0.1 | 0.2 | 0.2 | 0.5 |
Probability of Sequences
If we were to predict the likelihood of the price action being as follows:
$15.98 → $15.99 → $16.00 → $16.01 → $16.02
this would imply a bullish position that is about to rally. Using the Markov property, we rewrite the probability of this sequence as a product of one-step conditional probabilities:

P($15.98 → $15.99 → $16.00 → $16.01 → $16.02) = P($15.99 | $15.98) × P($16.00 | $15.99) × P($16.01 | $16.00) × P($16.02 | $16.01)
We would use the chain rule of probability to multiply those probabilities together, reading each one off the transition matrix above:

0.25 × 0.1 × 0.1 × 0.1 = 0.00025
So the likelihood of this exact sequence in price action is 0.025%.
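A quick way to check this figure is to encode the transition matrix above and multiply the one-step probabilities along the path:

import numpy as np

# Transition matrix from the table above; rows and columns are
# $15.98, $15.99, $16.00, $16.01, $16.02 (the blank cell taken as 0).
P = np.array([
    [0.25, 0.25, 0.25, 0.25, 0.00],
    [0.00, 0.70, 0.10, 0.10, 0.10],
    [0.00, 0.30, 0.50, 0.10, 0.10],
    [0.00, 0.20, 0.10, 0.60, 0.10],
    [0.00, 0.10, 0.20, 0.20, 0.50],
])

# The path $15.98 -> $15.99 -> $16.00 -> $16.01 -> $16.02 as state indices.
path = [0, 1, 2, 3, 4]

# Under the Markov property, the sequence probability is the product of
# the one-step transition probabilities along the path.
probability = np.prod([P[i, j] for i, j in zip(path, path[1:])])
print(f"{probability:.5f} = {probability * 100:.3f}%")  # 0.00025 = 0.025%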
Using Markov Models in Python notebooks
Below is the code I use to initialize a Markov Model and generate states from it, using one year of historical closing prices for TSLA and generating a sequence of the next 10 states.
import numpy as np
import yfinance as yf

class MarkovModel:
    def __init__(self, historical_prices, initial_state):
        """
        Initialize the Markov model.

        Parameters:
        - historical_prices: 1-D array of historical stock prices.
        - initial_state: The initial state of the system.
        """
        self.historical_prices = np.asarray(historical_prices).ravel()
        self.current_state = initial_state
        self.num_states = len(self.historical_prices)

    def next_state(self):
        """
        Transition to the next state based on the historical stock prices.

        The day-over-day price changes are passed through a softmax to
        produce a probability distribution over states, and the next state
        is sampled from that distribution.
        """
        price_changes = np.diff(self.historical_prices)
        probabilities = np.exp(price_changes) / np.sum(np.exp(price_changes))
        next_state = np.random.choice(range(self.num_states - 1), p=probabilities)
        self.current_state = next_state
        return next_state

# Download TSLA historical data from Yahoo Finance
symbol = "TSLA"
start_date = "2023-01-01"
end_date = "2024-01-01"
data = yf.download(symbol, start=start_date, end=end_date)

# Extract closing prices as a flat 1-D array
closing_prices = np.asarray(data['Close']).ravel()

# Normalize the prices so the price changes are on a comparable scale
normalized_prices = (closing_prices - np.mean(closing_prices)) / np.std(closing_prices)

# Initial state corresponding to the first observed price
initial_state = 0

# Create a Markov model
markov_model = MarkovModel(normalized_prices, initial_state)

# Generate a sequence of states
num_steps = 10
state_sequence = [markov_model.next_state() for _ in range(num_steps)]

# Display the generated sequence
print("Generated State Sequence:", state_sequence)
Hidden Markov Models
In the Markov Model above, the states of the system are directly observable; in its counterpart, the Hidden Markov Model, the system states are not directly observable.
When we say observable and non-observable states, we mean that:
- Observable states can be something like stock price, interest rates, events on the economic calendar.
- Non-observable states can be something like investor sentiment or the creditworthiness of a borrower.
To break this down further:
Hidden Markov Model = Hidden Markov Chain + Observed Variables
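As a minimal sketch of fitting one, assuming the hmmlearn package is installed, here synthetic daily returns stand in for the observed variable while the fitted components play the role of the hidden regimes (for example, bullish versus bearish sentiment):

import numpy as np
from hmmlearn.hmm import GaussianHMM  # assumes hmmlearn is installed

# Synthetic daily returns standing in for the observed variable: a calmer
# regime followed by a more volatile one. Real data would come from prices.
rng = np.random.default_rng(0)
observations = np.concatenate([
    rng.normal(0.001, 0.01, 200),
    rng.normal(-0.002, 0.03, 200),
]).reshape(-1, 1)

# Fit a 2-state Gaussian HMM: the hidden Markov chain governs the regime,
# while the Gaussian emissions generate the returns we actually observe.
model = GaussianHMM(n_components=2, covariance_type="full", n_iter=100)
model.fit(observations)

# Recover the most likely hidden state for each observation.
hidden_states = model.predict(observations)
print(hidden_states[:20])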