Bayesian probabilities give us greater confidence in our predictions when we have some certainty about the past and general uncertainty about the future. As humans we naturally use Bayesian reasoning to navigate the world, and Bayesian mathematics can help machine learning models “learn” as well as give traders an edge as they enter high-probability trade positions.
During World War II, the German threat blanketed Europe in a general sense of hopelessness. If it wasn’t their military might and use of blitzkrieg that triggered this, it was their technological prowess. Codebreaking was an important role for any participant in the war, but breaking the Germans’ codes seemed impossible. Once the rumor of an unbreakable German encryption machine called “The Enigma” was proven true, mathematicians felt added pressure to decrypt the messages directing the slaughter of their fellow countrymen and women.
That is, until Alan Turing set forth his efforts at Bletchley Park to build machines that stood a better chance of decrypting these messages. We all know by now that Turing’s efforts were fruitful: not only were he and his team able to decrypt German codes, but the computer revolution took flight. To understand one crucial part of Turing’s approach, let’s go back roughly 200 years to the 18th century.
Reverend Thomas Bayes was a second-generation minister, born in 1702. A man of many interests, he left behind notebooks filled with his various ideas upon his death. His friend Richard Price scoured these notebooks posthumously, searching every nook and cranny until he discovered an essay titled “An Essay Towards Solving a Problem in the Doctrine of Chances.” Price’s motivation was to rebut the philosopher David Hume, whose piece “Of Miracles” argued that no amount of testimony could make a miracle credible. Price hoped the logic laid out by Bayes would help prove the existence of God by showing that a large number of negative observations cannot eliminate the chance of a miracle. The method in question: Bayes’ Theorem.
Bayes’ Theorem allows us to extract precise information from vague data and to find specific solutions within a huge universe of possibilities. More specifically, it lets us assign a reasonable probability to an unknown event using probabilities we already know with some certainty (from the past). Rather than approaching probabilities as we would with Markov models, the Bayesian approach bakes in the probability of an event with respect to the known probability of an earlier event, known as the Bayesian prior.
The formula is as follows, with A and B being separate events:

P(A|B) = P(B|A) × P(A) / P(B)
We can start with an example to see how this is used. Suppose that, on a given day, there is an estimated 0.54 probability that the stock market will be bullish; call this event A. Next, we want to know the probability that NVDA will continue to rise; call NVDA’s price going up or down event B. We can draw a decision tree to map the possible outcomes in our sample space, with a discrete set of assumptions at each branch (for example, when the market is bullish, we project an 82% probability that NVDA will be bullish as well).
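Since the decision tree itself is not reproduced here, the four leaves it implies can be enumerated in a few lines of Python (the 0.82 conditional comes from the text; the 0.45 bear-market conditional is the value implied by the totals derived below):

```python
p_market_bull = 0.54

# Assumed branch probabilities: P(NVDA bullish | market state)
# (0.82 is stated in the text; 0.45 is implied by the totals that follow)
p_nvda_bull_given = {"bull": 0.82, "bear": 0.45}

# Joint probability of each leaf: P(market state) * P(NVDA state | market state)
leaves = {}
for market, p_m in [("bull", p_market_bull), ("bear", 1 - p_market_bull)]:
    for nvda in ("bull", "bear"):
        p_n = p_nvda_bull_given[market] if nvda == "bull" else 1 - p_nvda_bull_given[market]
        leaves[(market, nvda)] = p_m * p_n
        print(f"market {market}, NVDA {nvda}: {p_m * p_n:.4f}")
```

The four leaf probabilities cover the whole sample space, so they sum to 1.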
On a given day, we can then derive the probability that NVDA will be bullish by summing the probabilities of its two branches:

P(NVDA bullish) = 0.54 × 0.82 + 0.46 × 0.45 = 0.4428 + 0.207 ≈ 0.65
So there is a ~65% chance of NVDA being bullish on a given day, regardless of market direction. We can derive the same for NVDA being bearish as follows:

P(NVDA bearish) = 0.54 × 0.18 + 0.46 × 0.55 = 0.0972 + 0.253 ≈ 0.35
There would be a 35% chance of NVDA being bearish on a given day, regardless of market direction. You’ll notice that the two probabilities add up to 1.0, which confirms our read on the probabilities while taking the Bayesian priors into account.
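Both totals can be verified in a few lines using the law of total probability (0.82 is the conditional from the decision tree; 0.45 is the conditional implied by the ~65%/~35% split):

```python
p_market_bull = 0.54
p_nvda_bull_given_bull = 0.82   # from the decision tree in the text
p_nvda_bull_given_bear = 0.45   # implied by the totals in the text

# Law of total probability: sum over both market states
p_nvda_bull = (p_market_bull * p_nvda_bull_given_bull
               + (1 - p_market_bull) * p_nvda_bull_given_bear)
p_nvda_bear = (p_market_bull * (1 - p_nvda_bull_given_bull)
               + (1 - p_market_bull) * (1 - p_nvda_bull_given_bear))

print(f"P(NVDA bullish) = {p_nvda_bull:.4f}")    # ~0.65
print(f"P(NVDA bearish) = {p_nvda_bear:.4f}")    # ~0.35
print(f"Sum = {p_nvda_bull + p_nvda_bear:.1f}")  # 1.0
```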
We can also infer the reverse as well — what is the probability the market is bearish or bullish depending on whether NVDA is bearish or bullish?
Within trading, Bayesian analysis would have a trader continually update their expectation (mean return) and confidence (variance) as they accumulate more information about both the market and their (potential) positions in it.
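A minimal sketch of that updating loop, assuming returns are normally distributed with a known observation variance (a standard normal-normal conjugate update; all numbers below are illustrative, not from the article):

```python
def update_belief(prior_mean, prior_var, obs, obs_var):
    """One Bayesian update of a normal belief about mean return,
    given a newly observed return with assumed noise variance obs_var."""
    # Posterior precision is the sum of prior and observation precisions
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    # Posterior mean is a precision-weighted average of prior and observation
    post_mean = post_var * (prior_mean / prior_var + obs / obs_var)
    return post_mean, post_var

# Start with a vague prior: 0% expected daily return, high uncertainty
mean, var = 0.0, 1.0
for r in [0.012, -0.004, 0.009, 0.015]:  # hypothetical daily returns
    mean, var = update_belief(mean, var, r, obs_var=0.02**2)
    print(f"expected return = {mean:+.4f}, variance = {var:.6f}")
```

Each new observation shrinks the variance, so the trader's confidence tightens as evidence accumulates, while the expectation drifts toward the observed returns.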
Now back to Alan Turing. Turing used a Bayesian process of his own invention, known as Banburismus, to help crack the codes produced by the German Enigma. Banburismus is a labor-intensive process that uses sequential conditional probabilities to infer information about the likely settings of the Enigma machine, drawing on what was known from previously decrypted messages. A decisive weakness was that messages contained predictable phrases, reportedly including the closing “Heil Hitler,” and Turing was able to build on his process using the general syntax of the German language.
Bayes’ influence is far-reaching and today we even use “Bayesian reasoning engines” embedded in software to drive the recognition process in finding a signal through the noise.
Python Code
def bayes_theorem(p_a, p_b_given_a, p_b_given_not_a):
    # Calculate P(not A)
    p_not_a = 1 - p_a
    # Calculate P(B) via the law of total probability
    p_b = (p_a * p_b_given_a) + (p_not_a * p_b_given_not_a)
    # Calculate P(A|B) using Bayes' Theorem
    p_a_given_b = (p_a * p_b_given_a) / p_b
    return p_a_given_b

# Define probabilities (conditionals, not the joint values 0.4428 and 0.207,
# which already have P(A) and P(not A) multiplied in)
p_a = 0.54              # Probability of the market being bullish
p_b_given_a = 0.82      # Probability of NVDA being bullish given the market is bullish
p_b_given_not_a = 0.45  # Probability of NVDA being bullish given the market is not bullish

# Calculate P(A|B)
result = bayes_theorem(p_a, p_b_given_a, p_b_given_not_a)

# Print the result
print(f"The probability of the market being bullish given NVDA is bullish is: {result:.4f}")
Machine Learning
When building an ML model for trading, a linear regression may suffice for forecasting a stock’s price action. But we would employ Bayesian methods here because we are uncertain about many of the inputs involved, one of them being the future price.
Using Bayesian probabilities, the estimated coefficient indicates the direction and magnitude of a factor’s impact on market returns. The output is a probability distribution, a departure from traditional statistics, where the coefficient is normally a fixed point estimate. The Bayesian perspective allows the coefficient to be dynamic, changing over time as new data is incorporated. This gets at the heart of a model’s “learning”: we iterate on our Bayesian update to continually refine the coefficient based on the past and current information fed to the system in real time.
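As a minimal sketch of that idea, assuming a single-factor model y ≈ beta·x + noise with a normal prior on the coefficient and a known noise variance (the data, the `update_coefficient` helper, and all numbers are hypothetical, not from the article):

```python
import random

def update_coefficient(prior_mean, prior_var, x, y, noise_var):
    """One Bayesian update of a regression coefficient beta in y ~ beta*x + noise,
    with a normal prior on beta and an assumed known noise variance."""
    post_var = 1.0 / (1.0 / prior_var + x * x / noise_var)
    post_mean = post_var * (prior_mean / prior_var + x * y / noise_var)
    return post_mean, post_var

random.seed(42)
true_beta = 1.2                  # hypothetical "true" factor loading
beta_mean, beta_var = 0.0, 10.0  # vague prior: direction and magnitude unknown

for _ in range(500):
    x = random.gauss(0, 1)                    # simulated factor return
    y = true_beta * x + random.gauss(0, 0.1)  # simulated stock return
    beta_mean, beta_var = update_coefficient(beta_mean, beta_var, x, y, 0.01)

# The coefficient is a distribution, not a fixed number
print(f"beta ~ Normal(mean={beta_mean:.3f}, var={beta_var:.2e})")
```

With each observation the posterior mean moves toward the data-implied loading and the posterior variance shrinks, which is exactly the dynamic coefficient described above.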
The coefficients would need to be designed so that we can decay information that has already been priced into the stock, so the look-back window would need to be sufficiently short and the real-time processing of information would need to be highly performant.
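One simple sketch of such decay is exponential down-weighting with a half-life, so older observations fade from the estimate (the half-life and return series here are illustrative assumptions, not a prescription):

```python
def decayed_mean(returns, half_life=10):
    """Exponentially weighted mean return: an observation `half_life` bars old
    counts half as much as the newest one, so stale information fades out."""
    n = len(returns)
    # Age 0 = most recent observation, age n-1 = oldest
    weights = [0.5 ** ((n - 1 - i) / half_life) for i in range(n)]
    return sum(w * r for w, r in zip(weights, returns)) / sum(weights)

# Hypothetical series: 50 stale -2% days followed by 5 recent +1% days
series = [-0.02] * 50 + [0.01] * 5
print(f"plain mean:   {sum(series) / len(series):+.4f}")
print(f"decayed mean: {decayed_mean(series):+.4f}")  # pulled toward the recent days
```

Shortening the half-life sharpens the focus on recent bars; lengthening it approaches the plain sample mean.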