Calcworkshop

Bayes Theorem Easily Explained w/ 7 Examples!

Last Updated: September 25, 2020

What is Bayes’ Theorem?


Jenn, Founder of Calcworkshop®, 15+ Years Experience (Licensed & Certified Teacher)

To best understand Bayes’ Theorem, also referred to as Bayes’ Rule, I find it helpful to start with a story.

In Harry Potter and the Goblet of Fire, the fourth book in the Harry Potter series by J.K. Rowling, the Dark Mark has been released over the Quidditch World Cup, and total pandemonium has ensued.

Harry, Hermione, Ron, and Winky, the scared house-elf, have all been found at the crime scene.

Moreover, there is the discovery of a wand in Winky’s hand. But is she to blame for the dastardly deed?

While many of you already know the outcome of this part of the story, let’s assume you don’t for a moment – and play along just a bit longer. In other words, let’s pretend you are a Ministry wizard or witch and have happened upon the scene, and it is up to you to solve the crime.

Where do you begin?

Seeing that you are a very clever witch or wizard, you quickly employ Bayesian statistics to help you explain this mystery.

How To Use Bayes Rule?

First, let's take a look at our suspects: Harry, Hermione, Ron, Winky, or perhaps a mystery suspect.

Now, let’s present our evidence – the wand.

Your job is to determine the likelihood that each suspect is responsible for conjuring the Dark Mark, given the fact the wand was found at the scene of the crime.

We do this by reversing the question. Instead of asking, "What is the chance that Winky committed the crime (B is true), given the discovery of the wand (A is true)?" we ask, "Given that the wand was found (A is true), what is the probability that Winky conjured the Dark Mark (B is true)?"

At first glance, it appears that these are the same question, but mathematically, the slight change or reversal helps us solve the puzzle and is the key to understanding Bayes’ Rule!

Bayes' Theorem states that when a sample space is a disjoint union of events, and event A overlaps this disjoint union, then the probability that one of the disjoint partitioned events is true, given that A is true, is:

Bayes' Theorem Formula:

P(Bi | A) = P(Bi) P(A | Bi) / [ P(B1) P(A | B1) + P(B2) P(A | B2) + ... + P(Bn) P(A | Bn) ]

For example, the disjoint union of events is the suspects: Harry, Hermione, Ron, Winky, or a mystery suspect.

And event A that overlaps this disjoint partitioned union is the wand. Therefore, all Bayes’ Theorem says is, “if the wand is true, what is the probability that one of the suspects is true?”

That’s it!

When To Use Bayes Theorem?

So, Bayes’ Rule represents the probability of an event based on the prior knowledge of the conditions that might be related to that event, as Analytics Vidhya accurately states.

If we already know the conditional probability, we use Bayes’ Theorem to find the reverse probabilities. All this means is that we are going to use a Tree Diagram in reverse.

How Do We Use Bayes’ Theorem?

There are various ways to use Bayes’ Rule, such as Venn diagrams and Punnett squares, but I think the easiest way to understand how this works is to picture a tree diagram. We’re going to start at the end branches and backtrack along the branch-stems to find the beginning.

Example – Tree Diagram

While it is known that in a criminal trial, it must be shown that a defendant is guilty beyond a reasonable doubt (i.e., innocent until proven guilty), let’s assume that in a criminal trial by jury, the probability the defendant is convicted, given they are guilty, is 82%.

The probability that the defendant is acquitted, given innocence, is 80%.

And, suppose that 85% of all defendants are indeed guilty. Now, suppose a particular defendant is convicted of a crime. Find the probability they are innocent.

First, let’s create a tree diagram to help us make sense of all the information we have been given.

(Figure: Bayes' Theorem tree diagram)

Now, we are ready to answer the question, “supposing a defendant is convicted, find the probability the defendant is innocent.” All this means is that we are being asked to find the probability of innocence, given conviction. So, we will use Bayes’ Rule while working backward along our tree branches, as illustrated below.

(Figure: Bayes' Theorem conditional probability calculation)

This means that the probability a convicted defendant is in fact innocent is about 4.13%.
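To make that arithmetic easy to check, here is a minimal Python sketch of the same backward tree calculation (the variable names are mine, not from the lesson):

```python
# Reverse tree-diagram calculation: P(innocent | convicted)
p_guilty = 0.85
p_innocent = 1 - p_guilty                 # 0.15
p_conv_given_guilty = 0.82
p_conv_given_innocent = 1 - 0.80          # acquittal given innocence is 80%

# Total probability of conviction, summing over both first-level branches
p_convicted = (p_guilty * p_conv_given_guilty
               + p_innocent * p_conv_given_innocent)

# Bayes' rule reverses the conditioning
print((p_innocent * p_conv_given_innocent) / p_convicted)  # ≈ 0.0413
```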

Now another incredibly important application of Bayes’ Theorem is found with sensitivity, specificity, and prevalence as it applies to positivity rates for a disease.

  • Sensitivity is the true positive rate, or the probability that a person tests positive for a disease when they do indeed have it.
  • Specificity is the true negative rate or the probability that a person tests negative for a disease when they do not have the condition.
  • Prevalence is the probability of having the disease.

(Figure: Prevalence, Sensitivity, and Specificity)

Example – Positivity Rate

Okay, so let’s see how Bayes’ rule helps us to determine the correct positivity rate.

Assume a new test is developed for cancer detection with sensitivity 0.79 and specificity 0.95, with prevalence 0.04. Determine the likelihood that an individual has cancer if their test result is positive.

(Figure: Understanding sensitivity and specificity from a tree diagram)

So, given that this particular test is positive, the probability that an individual has cancer is about 39.7%.
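Written out with Bayes' rule (notation mine), the calculation behind that figure is:

\[P(\text{cancer} \mid +) = \frac{(0.04)(0.79)}{(0.04)(0.79) + (0.96)(1 - 0.95)} = \frac{0.0316}{0.0796} \approx 0.397\]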

Together in this video, we will use two-way tables and tree diagrams to assist us in applying Bayes’ rule.

Additionally, we will walk through countless examples in detail as we endeavor to find the probability of an event based on prior knowledge of conditions related to the event and learn how to identify prevalence, specificity, and sensitivity correctly.

Bayes Theorem – Lesson & Examples (Video)

1 hr 17 min

  • Introduction to Video: Bayes' Rule
  • 00:00:24 – Overview of Total Probability Theorem and Bayes' Rule
  • Exclusive Content for Members Only
  • 00:09:12 – Use Bayes' Rule to find the probability a product is made by a particular machine (Example #1)
  • 00:24:59 – Use Bayes' Theorem to find the probability (Examples #2-3)
  • 00:38:04 – If a random product is found defective, which plant is most likely responsible? (Example #4)
  • 00:50:17 – Find the probability using Bayes' Rule (Examples #5-6)
  • 01:03:49 – Overview of Prevalence, Specificity and Sensitivity
  • 01:07:25 – Given the prevalence, specificity and sensitivity of a disease, create a tree diagram and find the probability (Example #7)
  • Practice Problems with Step-by-Step Solutions
  • Chapter Tests with Video Solutions

Get access to all the courses and over 450 HD videos with your subscription

Monthly and Yearly Plans Available


Still wondering if CalcWorkshop is right for you? Take a Tour and find out how a membership can take the struggle out of learning math.


Bayes' Theorem

Bayes can do magic!

Ever wondered how computers learn about people?


An internet search for "movie automatic shoe laces" brings up "Back to the Future".

Has the search engine watched the movie? No, but it knows from lots of other searches what people are probably looking for.

And it calculates that probability using Bayes' Theorem.

Bayes' Theorem is a way of finding a probability when we know certain other probabilities.

The formula is:

P(A|B) = P(A) P(B|A) / P(B)

Let us say P(Fire) means how often there is fire, and P(Smoke) means how often we see smoke, then:

P(Fire|Smoke) means how often there is fire when we can see smoke
P(Smoke|Fire) means how often we can see smoke when there is fire

So the formula kind of tells us "forwards" P(Fire|Smoke) when we know "backwards" P(Smoke|Fire)

Example: dangerous fires are rare (1%) but smoke is fairly common (10%) due to barbecues, and 90% of dangerous fires make smoke

We can then discover the probability of dangerous Fire when there is Smoke:

P(Fire|Smoke) = P(Fire) P(Smoke|Fire) / P(Smoke) = (1% × 90%) / 10% = 9%

So it is still worth checking out any smoke to be sure.


Example: Picnic Day

You are planning a picnic today, but the morning is cloudy

  • Oh no! 50% of all rainy days start off cloudy!
  • But cloudy mornings are common (about 40% of days start cloudy)
  • And this is usually a dry month (only 3 of 30 days tend to be rainy, or 10%)

What is the chance of rain during the day?

We will use Rain to mean rain during the day, and Cloud to mean cloudy morning.

The chance of Rain given Cloud is written P(Rain|Cloud)

So let's put that in the formula:

P(Rain|Cloud) = P(Rain) P(Cloud|Rain) / P(Cloud)

  • P(Rain) is Probability of Rain = 10%
  • P(Cloud|Rain) is Probability of Cloud, given that Rain happens = 50%
  • P(Cloud) is Probability of Cloud = 40%

P(Rain|Cloud) = (0.1 × 0.5) / 0.4 = 0.125

Or a 12.5% chance of rain. Not too bad, let's have a picnic!
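As a quick sanity check, the whole calculation fits in a few lines of Python (a sketch; the helper name is mine):

```python
def bayes(p_a, p_b_given_a, p_b):
    """P(A|B) from P(A), P(B|A) and P(B), via Bayes' Theorem."""
    return p_a * p_b_given_a / p_b

# Picnic example: P(Rain|Cloud) = P(Rain) P(Cloud|Rain) / P(Cloud)
print(bayes(0.10, 0.50, 0.40))  # 0.125
```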

Just 4 Numbers

Imagine 100 people at a party. You tally how many wear pink or not, and whether each is a man or not, and get these numbers:

            Pink   Not Pink   Total
  Man         5       35        40
  Not Man    20       40        60
  Total      25       75       100

Bayes' Theorem is based on just those 4 inner counts! (The totals follow from them.)

Now let us calculate some probabilities:

  • the probability of being a man is P(Man) = 40/100 = 0.4
  • the probability of wearing pink is P(Pink) = 25/100 = 0.25
  • the probability that a man wears pink is P(Pink|Man) = 5/40 = 0.125
  • the probability that a person wearing pink is a man is P(Man|Pink) = ...


And then the puppy arrives! Such a cute puppy.

But all your data is ripped up ! Only 3 values survive:

  • P(Man) = 0.4,
  • P(Pink) = 0.25 and
  • P(Pink|Man) = 0.125

Can you discover P(Man|Pink) ?

Imagine a pink-wearing guest leaves money behind ... was it a man? We can answer this question using Bayes' Theorem:

P(Man|Pink) = P(Man) P(Pink|Man) / P(Pink)

P(Man|Pink) = (0.4 × 0.125) / 0.25 = 0.2

Being General

Why does it work?

Let us replace the numbers with letters:

            B      Not B
  A         s        t
  Not A     u        v

Now let us look at probabilities. So we take some ratios:

  • the overall probability of "A" is P(A) = (s+t) / (s+t+u+v)
  • the probability of "B given A" is P(B|A) = s / (s+t)

And then multiply them together like this:

P(A) P(B|A) = (s+t)/(s+t+u+v) × s/(s+t) = s / (s+t+u+v)

Now let us do that again but use P(B) and P(A|B):

P(B) P(A|B) = (s+u)/(s+t+u+v) × s/(s+u) = s / (s+t+u+v)

Both ways get the same result of s/(s+t+u+v)

So we can see that:

P(B) P(A|B) = P(A) P(B|A)

Nice and symmetrical isn't it?

It actually has to be symmetrical as we can swap rows and columns and get the same top-left corner.

And it is also Bayes' Formula ... just divide both sides by P(B):

P(A|B) = P(A) P(B|A) / P(B)

Remembering

First think "AB AB AB" then remember to group it like: "AB = A BA / B"

P( A | B ) = P( A ) P( B | A ) / P( B )

Cat Allergy?

One of the famous uses for Bayes Theorem is False Positives and False Negatives .

For those we have two possible cases for "A", such as Pass / Fail (or Yes/No etc)

Example: Allergy or Not?

cat

Hunter says she is itchy. There is a test for Allergy to Cats, but this test is not always right:

  • For people that really do have the allergy, the test says "Yes" 80% of the time
  • For people that do not have the allergy, the test says "Yes" 10% of the time ("false positive")

If 1% of the population have the allergy, and Hunter's test says "Yes" , what are the chances that Hunter really has the allergy?

We want to know the chance of having the allergy when test says "Yes", written P(Allergy|Yes)

Let's get our formula:

P(Allergy|Yes) = P(Allergy) P(Yes|Allergy) / P(Yes)

  • P(Allergy) is Probability of Allergy = 1%
  • P(Yes|Allergy) is Probability of test saying "Yes" for people with allergy = 80%
  • P(Yes) is Probability of test saying "Yes" (to anyone) = ??%

Oh no! We don't know what the general chance of the test saying "Yes" is ...

... but we can calculate it by adding up those with , and those without the allergy:

  • 1% have the allergy, and the test says "Yes" to 80% of them
  • 99% do not have the allergy and the test says "Yes" to 10% of them

Let's add that up:

P(Yes) = 1% × 80% + 99% × 10% = 10.7%

Which means that about 10.7% of the population will get a "Yes" result.

So now we can complete our formula:

P(Allergy|Yes) = (1% × 80%) / 10.7% ≈ 7.48%

P(Allergy|Yes) = about 7%

This is the same result we got on False Positives and False Negatives .

In fact we can write a special version of the Bayes' formula just for things like this:

P(A|B) = P(A)P(B|A) / [ P(A)P(B|A) + P(not A)P(B|not A) ]
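That special version translates directly into code; here is a small Python sketch (function name mine), checked against the allergy numbers above:

```python
def bayes_two_cases(p_a, p_b_given_a, p_b_given_not_a):
    """P(A|B) with the denominator expanded over A and not-A."""
    p_b = p_a * p_b_given_a + (1 - p_a) * p_b_given_not_a
    return p_a * p_b_given_a / p_b

# P(Allergy) = 1%, P(Yes|Allergy) = 80%, P(Yes|no allergy) = 10%
print(bayes_two_cases(0.01, 0.80, 0.10))  # ≈ 0.0748
```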

"A" With Three (or more) Cases

We just saw "A" with two cases (A and not A), which we took care of in the bottom line.

When "A" has 3 or more cases we include them all in the bottom line:

P(A1|B) = P(A1)P(B|A1) / [ P(A1)P(B|A1) + P(A2)P(B|A2) + P(A3)P(B|A3) + ...etc ]


Example: The Art Competition has entries from three painters: Pam, Pia and Pablo

  • Pam put in 15 paintings, 4% of her works have won First Prize.
  • Pia put in 5 paintings, 6% of her works have won First Prize.
  • Pablo put in 10 paintings, 3% of his works have won First Prize.

What is the chance that Pam will win First Prize?

P(Pam|First) = P(Pam) P(First|Pam) / [ P(Pam) P(First|Pam) + P(Pia) P(First|Pia) + P(Pablo) P(First|Pablo) ]

Put in the values, with P(Pam) = 15/30, P(Pia) = 5/30 and P(Pablo) = 10/30:

P(Pam|First) = (15/30 × 4%) / [ 15/30 × 4% + 5/30 × 6% + 10/30 × 3% ]

Multiply all by 30 (makes calculation easier):

P(Pam|First) = (15 × 4%) / [ 15 × 4% + 5 × 6% + 10 × 3% ] = 0.6 / 1.2 = 50%

A good chance!

Pam isn't the most successful artist, but she did put in lots of entries.
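The three-or-more-case formula generalizes the same way in code; a short Python sketch (names mine) reproduces Pam's 50% chance:

```python
def bayes_partition(priors, likelihoods, i):
    """P(A_i|B) over a partition A_1..A_n, given P(B|A_k) for each k."""
    total = sum(p * l for p, l in zip(priors, likelihoods))
    return priors[i] * likelihoods[i] / total

priors = [15/30, 5/30, 10/30]      # Pam, Pia, Pablo entry shares
likelihoods = [0.04, 0.06, 0.03]   # First Prize rate for each painter
print(bayes_partition(priors, likelihoods, 0))  # Pam: 0.5
```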

Now, back to Search Engines.

Search Engines take this idea and scale it up a lot (plus some other tricks).

It makes them look like they can read your mind!

It can also be used for mail filters, music recommendation services and more.

Free Mathematics Tutorials

Bayes' theorem examples with solutions.

Bayes' theorem for finding conditional probabilities is explained and used to solve examples, with detailed explanations. Diagrams are used to give a visual explanation of the theorem. The numerical results obtained are also discussed in order to understand the possible applications of the theorem.

Bayes' theorem


Use of Bayes' Theorem: Examples with Detailed Solutions

Example 3

Three factories produce light bulbs to supply the market. Factory A produces 20% of the bulbs, factory B produces 50%, and factory C produces 30%. Of the bulbs produced, 2% from factory A, 1% from factory B and 3% from factory C are defective. A bulb is selected at random in the market and found to be defective. What is the probability that this bulb was produced by factory B?

Solution to Example 3

Let \( P(A) = 20\% \), \( P(B) = 50\% \) and \( P(C) = 30\% \) represent the probabilities that a bulb selected at random is from factory A, B and C respectively. Let \( P(D) \) be the probability that a defective bulb is selected. Let \( P(D | A) = 2\% \), \( P(D | B) = 1\% \) and \( P(D | C) = 3\%\) represent the conditional probabilities that a bulb is defective given that it is selected from factory A, B and C respectively. We now calculate the conditional probability that the bulb was produced by factory B given that it is defective, written as \( P(B | D) \) and given by Bayes' theorem:

\( P(B | D) = \dfrac{P(D | B) P(B) }{ P(D | A) P(A) + P(D | B) P(B) + P(D | C) P(C)}\)

\( = \dfrac{1\% \times 50\%}{ 2\% \times 20\% + 1\% \times 50\% + 3\% \times 30\%} \approx 0.2778\)

Although factory B produces 50% of the bulbs, the probability that the selected (defective) bulb comes from this factory is low because the bulbs produced by this factory have a low probability (1%) of being defective.



Bayes' Theorem and Conditional Probability



Bayes' theorem is a formula that describes how to update the probabilities of hypotheses when given evidence. It follows simply from the axioms of conditional probability , but can be used to powerfully reason about a wide range of problems involving belief updates.

Given a hypothesis \(H\) and evidence \(E\), Bayes' theorem states that the relationship between the probability of the hypothesis before getting the evidence \(P(H)\) and the probability of the hypothesis after getting the evidence \(P(H \mid E)\) is

\[P(H \mid E) = \frac{P(E \mid H)} {P(E)} P(H).\]

Many modern machine learning techniques rely on Bayes' theorem. For instance, spam filters use Bayesian updating to determine whether an email is real or spam, given the words in the email. Additionally, many specific techniques in statistics, such as calculating \(p\)-values or interpreting medical results , are best described in terms of how they contribute to updating hypotheses using Bayes' theorem.

Explaining Counterintuitive Results


Probability problems are notorious for yielding surprising and counterintuitive results. One famous example--or a pair of examples--is the following:

1. A couple has 2 children and the older child is a boy. If the probabilities of having a boy or a girl are both \(50\%\), what's the probability that the couple has two boys?

We already know that the older child is a boy, so the probability of two boys is equivalent to the probability that the younger child is a boy, which is \(50\%\).

2. A couple has two children, of which at least one is a boy. If the probabilities of having a boy or a girl are both \(50\%\), what is the probability that the couple has two boys?

At first glance, this appears to be asking the same question. We might reason as follows: "We know that one is a boy, so the only question is whether the other one is a boy, and the chances of that being the case are \(50\%\). So again, the answer is \(50\%\)." This makes perfect sense. It also happens to be incorrect.

Bayes' theorem centers on relating different conditional probabilities . A conditional probability is an expression of how probable one event is given that some other event occurred (a fixed value). For instance, "what is the probability that the sidewalk is wet?" will have a different answer than "what is the probability that the sidewalk is wet given that it rained earlier?"

For a joint probability distribution over events \(A\) and \(B\), \(P(A \cap B)\), the conditional probability of \(A\) given \(B\) is defined as

\[P(A\mid B) = \frac{P(A\cap B)}{P(B)}.\]

In the sidewalk example, where \(A\) is "the sidewalk is wet" and \(B\) is "it rained earlier," this expression reads as "the probability the sidewalk is wet given that it rained earlier is equal to the probability that the sidewalk is wet and it rains over the probability that it rains."

Note that \(P(A \cap B)\) is the probability of both \(A\) and \(B\) occurring, which is the same as the probability of \(A\) occurring times the probability that \(B\) occurs given that \(A\) occurred: \(P(B \mid A) \times P(A).\) Using the same reasoning, \(P(A \cap B)\) is also the probability that \(B\) occurs times the probability that \(A\) occurs given that \(B\) occurs: \(P(A \mid B) \times P(B)\). The fact that these two expressions are equal leads to Bayes' Theorem. Expressed mathematically, this is:

\[\begin{align} P(A \mid B) &= \frac{P(A\cap B)}{P(B)}, \text{ if } P(B) \neq 0, \\ P(B \mid A) &= \frac{P(B\cap A)}{P(A)}, \text{ if } P(A) \neq 0, \\ \Rightarrow P(A\cap B) &= P(A\mid B)\times P(B)=P(B\mid A)\times P(A), \\ \Rightarrow P(A \mid B) &= \frac{P(B \mid A) \times P(A)} {P(B)}, \text{ if } P(B) \neq 0. \end{align}\]

Notice that our results for dependent events and for Bayes' theorem are both valid when the events are independent. In these instances, \(P(A \mid B) = P(A)\) and \(P(B \mid A) = P(B)\), so the expressions simplify.

Bayes' Theorem \[P(A \mid B) = \frac{P(B \mid A)} {P(B)} P(A)\]

While this is an equation that applies to any probability distribution over events \(A\) and \(B\), it has a particularly nice interpretation in the case where \(A\) represents a hypothesis \(H\) and \(B\) represents some observed evidence \(E\). In this case, the formula can be written as

\[P(H \mid E) = \frac{P(E \mid H)}{P(E)} P(H).\]

This relates the probability of the hypothesis before getting the evidence \(P(H)\), to the probability of the hypothesis after getting the evidence, \(P(H \mid E)\). For this reason, \(P(H)\) is called the prior probability , while \(P(H \mid E)\) is called the posterior probability . The factor that relates the two, \(\frac{P(E \mid H)}{P(E)}\), is called the likelihood ratio . Using these terms, Bayes' theorem can be rephrased as "the posterior probability equals the prior probability times the likelihood ratio."

If a single card is drawn from a standard deck of playing cards, the probability that the card is a king is 4/52, since there are 4 kings in a standard deck of 52 cards. Rewording this, if \(\text{King}\) is the event "this card is a king," the prior probability is \(P(\text{King}) = \frac{4}{52} = \frac{1}{13}.\)

If evidence is provided (for instance, someone looks at the card) that the single card is a face card, then the posterior probability \(P(\text{King} \mid \text{Face})\) can be calculated using Bayes' theorem:

\[P(\text{King} \mid \text{Face}) = \frac{P(\text{Face} \mid \text{King})}{P(\text{Face})} P(\text{King}).\]

Since every king is also a face card, \(P(\text{Face} \mid \text{King}) = 1\). Since there are 3 face cards in each suit (Jack, Queen, King), the probability of a face card is \(P(\text{Face}) = \frac{3}{13}\). Combining these gives a likelihood ratio of \(\frac{1}{\hspace{2mm} \frac3{13}\hspace{2mm} } = \frac{13}{3}\).

Using Bayes' theorem gives \(P(\text{King} \mid \text{Face}) = \frac{13}{3} \cdot \frac{1}{13} = \frac{1}{3}\). \(_\square\)

You randomly choose a treasure chest to open, and then randomly choose a coin from that treasure chest. If the coin you choose is gold, then what is the probability that you chose chest A?

Bayes' theorem clarifies the two-children problem from the first section:

1. A couple has two children, the older of which is a boy. What is the probability that they have two boys?
2. A couple has two children, one of which is a boy. What is the probability that they have two boys?

Define three events, \(A\), \(B\), and \(C\), as follows:

\[ \begin{align} A & = \mbox{ both children are boys}\\ B & = \mbox{ the older child is a boy}\\ C & = \mbox{ one of their children is a boy.} \end{align}\]

Question 1 is asking for \(P(A \mid B)\), and Question 2 is asking for \(P(A \mid C)\). The first is computed using the simpler version of Bayes' theorem:

\[P(A \mid B) = \frac{P(A)P(B \mid A)}{P(B)} = \frac{ \frac{1}{4}\cdot 1 }{\frac{1}{2}} = \frac{1}{2}.\]

To find \(P(A \mid C)\), we must determine \(P(C)\), the prior probability that the couple has at least one boy. This is equal to \(1 - P(\mbox{both children are girls}) = 1 - \frac{1}{4}=\frac{3}{4}\). Therefore the desired probability is

\[P(A \mid C) = \frac{P(A)P(C \mid A)}{P(C)} = \frac{\frac{1}{4}\cdot 1}{\frac{3}{4}} = \frac{1}{3}. \ _\square \]

For a similarly paradoxical problem, see the Monty Hall problem.

Venn diagrams are particularly useful for visualizing Bayes' theorem, since both the diagrams and the theorem are about looking at the intersections of different spaces of events.

A disease is present in 5 out of 100 people, and a test that is 90% accurate (meaning that the test produces the correct result in 90% of cases) is administered to 100 people. If one person in the group tests positive, what is the probability that this one person has the disease?

The intuitive answer is that the one person is 90% likely to have the disease. But we can visualize this to show that it’s not accurate. First, draw the total population and the 5 people who have the disease:

The circle A represents 5 out of 100, or 5% of the larger universe of 100 people.

Next, overlay a circle to represent the people who get a positive result on the test. We know that 90% of those with the disease will get a positive result, so we need to cover 90% of circle A; but we also know that 10% of the population who do not have the disease will get a positive result, so we need to cover 10% of the non-disease-carrying population (the total universe of 100 less circle A).

Circle B is covering a substantial portion of the total population. It actually covers more area than the total portion of the population with the disease. This is because 14 out of the total population of 100 (90% of the 5 people with the disease + 10% of the 95 people without the disease) will receive a positive result. Even though this is a test with 90% accuracy, this visualization shows that any one patient who tests positive (Circle B) for the disease only has a 32.14% (4.5 in 14) chance of actually having the disease.
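In symbols (notation mine), the picture corresponds to

\[P(\text{disease} \mid +) = \frac{(0.9)(5)}{(0.9)(5) + (0.1)(95)} = \frac{4.5}{14} \approx 32.14\%.\]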

Main article: Bayesian theory in science and math

Bayes’ theorem can show the likelihood of getting false positives in scientific studies. An in-depth look at this can be found in Bayesian theory in science and math .

Many medical diagnostic tests are said to be \(X\)% accurate, for instance 99% accurate, referring specifically to the probability that the test result is correct given your condition (or lack thereof). This is not the same as the posterior probability of having the disease given the result of the test. To see this in action, consider the following problem.

The world had been harmed by a widespread Z-virus, which already turned 10% of the world's population into zombies.

The scientists then invented a test kit with the sensitivity of 90% and specificity of 70%: 90% of the infected people will be tested positive while 70% of the non-infected will be tested negative.

If the test kit showed a positive result, what would be the probability that the tested subject was truly a zombie?

If the solution is in a form of \(\frac{a}{b}\), where \(a\) and \(b\) are coprime positive integers, submit your answer as \(a+b\).

A disease test is advertised as being 99% accurate: if you have the disease, you will test positive 99% of the time, and if you don't have the disease, you will test negative 99% of the time.

If 1% of all people have this disease and you test positive, what is the probability that you actually have the disease?

Balls numbered 1 through 20 are placed in a bag. Three balls are drawn out of the bag without replacement. What is the probability that all the balls have odd numbers on them?

In this situation, the events are not independent. There will be a \(\frac{10}{20} = \frac{1}{2}\) chance that any particular ball is odd. However, the probability that all the balls are odd is not \(\frac{1}{8}\).

We do have that the probability that the first ball is odd is \(\frac{1}{2}.\) For the second ball, given that the first ball was odd, there are only 9 odd numbered balls that could be drawn from a total of 19 balls, so the probability is \(\frac{9}{19}\). For the third ball, since the first two are both odd, there are 8 odd numbered balls that could be drawn from a total of 18 remaining balls. So the probability is \(\frac{8}{18}\).

So the probability that all 3 balls are odd numbered is \(\frac{10}{20} \times \frac{9}{19} \times \frac{8}{18} = \frac{2}{19}.\) Notice that \(\frac{2}{19} \approx 0.105\), whereas \(\frac{1}{8} = 0.125.\) \(_\square\)
A family has two children. Given that one of the children is a boy, what is the probability that both children are boys? We assume that the probability of a child being a boy or girl is \(\frac{1}{2}\). We solve this using Bayes’ theorem. We let \(B\) be the event that the family has one child who is a boy. We let \(A\) be the event that both children are boys. We want to find \(P(A \mid B) = \frac{P(B \mid A) \times P(A)}{P(B)}\). We can easily see that \(P(B \mid A) = 1\). We also note that \(P(A) = \frac{1}{4}\) and \(P(B) = \frac{3}{4}\). So \(P(A \mid B) = \frac{1 \times \frac{1}{4}}{\frac{3}{4}} = \frac{1}{3} \). \(_\square\)
A family has two children. Given that one of the children is a boy, and that he was born on a Tuesday, what is the probability that both children are boys?

Your first instinct to this question might be to answer \(\frac{1}{3}\), since this is obviously the same question as the previous one. Knowing the day of the week a child is born on can't possibly give you additional information, right?

Let's assume that the probability of being born on a particular day of the week is \(\frac{1}{7}\) and is independent of whether the child is a boy or a girl. We let \(B\) be the event that the family has one child who is a boy born on Tuesday and \(A\) be the event that both children are boys, and apply Bayes' Theorem. We notice right away that \(P(B \mid A)\) is no longer equal to one. Given that there are 7 days of the week, there are 49 possible combinations for the days of the week the two boys were born on, and 13 of these have a boy who was born on a Tuesday, so \(P( B \mid A) = \frac{13}{49}\). \(P(A)\) remains unchanged at \(\frac{1}{4}\).

To calculate \(P(B)\), we note that there are \(14^2 = 196\) possible ways to select the gender and the day of the week the child was born on. Of these, there are \(13^2 = 169\) ways which do not have a boy born on Tuesday, and \(196 - 169 = 27\) which do, so \(P(B) = \frac{27}{196}\). This gives us that \(P(A \mid B) = \frac{ \frac{13}{49} \times \frac{1}{4}} {\frac{27}{196}} = \frac{13}{27}\). \(_\square\)

Note: This answer is certainly not \(\frac{1}{3}\), and is actually much closer to \(\frac{1}{2}\).
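Because the sample space here is small, the \(\frac{13}{27}\) answer can also be verified by brute-force enumeration; a minimal Python sketch (assuming day 0 stands for Tuesday):

```python
from itertools import product

children = list(product(["boy", "girl"], range(7)))   # (gender, day) pairs
outcomes = list(product(children, repeat=2))          # 196 equally likely families

# Evidence: at least one child is a boy born on day 0 ("Tuesday")
evidence = [o for o in outcomes if ("boy", 0) in o]
both_boys = [o for o in evidence if o[0][0] == "boy" and o[1][0] == "boy"]

print(len(both_boys), "/", len(evidence))  # 13 / 27
```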

Zeb's coin box contains 8 fair, standard coins (heads and tails) and 1 coin which has heads on both sides. He selects a coin randomly and flips it 4 times, getting all heads. If he flips this coin again, what is the probability it will be heads? (The answer value will be from 0 to 1, not as a percentage.)

There are 10 boxes containing blue and red balls.

The number of blue balls in the \(n^\text{th}\) box is given by \(B(n) = 2^n\). The number of red balls in the \(n^\text{th}\) box is given by \(R(n) = 1024 - B(n)\).

A box is picked at random, and a ball is chosen randomly from that box. If the ball is blue, and the probability that the \(10^\text{th}\) box was picked can be expressed as \( \frac ab\), where \(a\) and \(b\) are coprime positive integers, find \(a+b\).

More probability questions

Photo credit: http://www.spi-global.com/




Bayesian Statistics: A Beginner's Guide

Article updated April 2022 for Python 3.8

Over the last few years we have spent a good deal of time on QuantStart considering option price models, time series analysis and quantitative trading. It has become clear to me that many of you are interested in learning about the modern mathematical techniques that underpin not only quantitative finance and algorithmic trading, but also the newly emerging fields of data science and statistical machine learning .

Quantitative skills are now in high demand not only in the financial sector but also at consumer technology startups, as well as larger data-driven firms. Hence we are going to expand the topics discussed on QuantStart to include not only modern financial techniques, but also statistical learning as applied to other areas, in order to broaden your career prospects if you are quantitatively focused.

In order to begin discussing the modern techniques, we must first gain a solid understanding in the underlying mathematics and statistics that underpins these models. One of the key modern areas is that of Bayesian Statistics . We have not yet discussed Bayesian methods in any great detail on the site. This article has been written to help you understand the "philosophy" of the Bayesian approach, how it compares to the traditional/classical frequentist approach to statistics and the potential applications in both quantitative finance and data science.

In the article we will:

  • Define Bayesian statistics (or Bayesian inference)
  • Compare Classical ("Frequentist") statistics and Bayesian statistics
  • Derive the famous Bayes' rule, an essential tool for Bayesian inference
  • Interpret and apply Bayes' rule for carrying out Bayesian inference
  • Carry out a concrete probability coin-flip example of Bayesian inference

What is Bayesian Statistics?

Bayesian statistics is a particular approach to applying probability to statistical problems . It provides us with mathematical tools to update our beliefs about random events in light of seeing new data or evidence about those events .

In particular, Bayesian inference interprets probability as a measure of believability or confidence that an individual may possess about the occurrence of a particular event.

We may have a prior belief about an event, but our beliefs are likely to change when new evidence is brought to light. Bayesian statistics gives us a solid mathematical means of incorporating our prior beliefs, and evidence, to produce new posterior beliefs.

Bayesian statistics provides us with mathematical tools to rationally update our subjective beliefs in light of new data or evidence.

This is in contrast to another form of statistical inference, known as classical or frequentist statistics, which assumes that probabilities are the frequency of particular random events occurring in a long run of repeated trials.

For example, as we roll a fair (i.e. unweighted) six-sided die repeatedly, we would see that each number on the die tends to come up 1/6 of the time.

Frequentist statistics assumes that probabilities are the long-run frequency of random events in repeated trials.

When carrying out statistical inference, that is, inferring statistical information from probabilistic systems, the two approaches - frequentist and Bayesian - have very different philosophies.

Frequentist statistics tries to eliminate uncertainty by providing estimates . Bayesian statistics tries to preserve and refine uncertainty by adjusting individual beliefs in light of new evidence.

Frequentist vs Bayesian Examples

In order to make clear the distinction between the two differing statistical philosophies, we will consider two examples of probabilistic systems:

  • Coin flips - What is the probability of an unfair coin coming up heads?
  • Election of a particular candidate for UK Prime Minister - What is the probability of seeing an individual candidate winning, who has not stood before?

The frequentist and Bayesian philosophies interpret each of these examples differently: the frequentist views the probability as the long-run frequency of the outcome across many repeated trials, while the Bayesian views it as a degree of belief about the outcome, to be updated as new evidence arrives.

Thus in the Bayesian interpretation a probability is a summary of an individual's opinion. A key point is that different (intelligent) individuals can have different opinions (and thus different prior beliefs), since they have differing access to data and ways of interpreting it. However, as both of these individuals come across new data that they both have access to, their (potentially differing) prior beliefs will lead to posterior beliefs that begin converging towards each other, under the rational updating procedure of Bayesian inference.

In the Bayesian framework an individual would apply a probability of 0 when they have no confidence in an event occurring, while they would apply a probability of 1 when they are absolutely certain of an event occurring. A probability assigned between 0 and 1 allows weighted confidence in other potential outcomes.

In order to carry out Bayesian inference, we need to utilise a famous theorem in probability known as Bayes' rule and interpret it in the correct fashion . In the following box, we derive Bayes' rule using the definition of conditional probability . However, it isn't essential to follow the derivation in order to use Bayesian methods, so feel free to skip the box if you wish to jump straight into learning how to use Bayes' rule.

Deriving Bayes' Rule

We begin by considering the definition of conditional probability, which gives us a rule for determining the probability of an event $A$, given the occurrence of another event $B$. An example question in this vein might be "What is the probability of rain occurring given that there are clouds in the sky?"

The mathematical definition of conditional probability is as follows:

$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$

This simply states that the probability of $A$ occurring given that $B$ has occurred is equal to the probability that they have both occurred, relative to the probability that $B$ has occurred.

Or in the language of the example above: the probability of rain given that we have seen clouds is equal to the probability of rain and clouds occurring together, relative to the probability of seeing clouds at all.

If we multiply both sides of this equation by $P(B)$ we get:

$$P(A \cap B) = P(A|B) P(B)$$

But, we can simply make the same statement about $P(B|A)$, which is akin to asking "What is the probability of seeing clouds, given that it is raining?":

$$P(B|A) = \frac{P(B \cap A)}{P(A)}$$

Note that $P(A \cap B) = P(B \cap A)$ and so by substituting the above and multiplying by $P(A)$, we get:

$$P(A \cap B) = P(B|A) P(A)$$

We are now able to set the two expressions for $P(A \cap B)$ equal to each other:

$$P(A|B) P(B) = P(B|A) P(A)$$

If we now divide both sides by $P(B)$ we arrive at the celebrated Bayes' rule:

$$P(A|B) = \frac{P(B|A) P(A)}{P(B)}$$

However, it will be helpful for later usage of Bayes' rule to modify the denominator, $P(B)$, on the right hand side of the above relation to be written in terms of $P(B|A)$. We can actually write:

$$P(B) = \sum_{a \in A} P(B \cap a)$$

This is possible because the events $A$ are an exhaustive partition of the sample space.

So that by substituting the definition of conditional probability we get:

$$P(B) = \sum_{a \in A} P(B|a) P(a)$$

Finally, we can substitute this into Bayes' rule from above to obtain an alternative version of Bayes' rule, which is used heavily in Bayesian inference:

$$P(A|B) = \frac{P(B|A) P(A)}{\sum_{a \in A} P(B|a) P(a)}$$

Now that we have derived Bayes' rule we are able to apply it to statistical inference.

Applying Bayes' Rule for Bayesian Inference

As we stated at the start of this article the basic idea of Bayesian inference is to continually update our prior beliefs about events as new evidence is presented. This is a very natural way to think about probabilistic events. As more and more evidence is accumulated our prior beliefs are steadily "washed out" by any new data.

Consider a (rather nonsensical) prior belief that the Moon is going to collide with the Earth. For every night that passes, the application of Bayesian inference will tend to correct our prior belief to a posterior belief that the Moon is less and less likely to collide with the Earth, since it remains in orbit.

In order to demonstrate a concrete numerical example of Bayesian inference it is necessary to introduce some new notation.

Firstly, we need to consider the concept of parameters and models . A parameter could be the weighting of an unfair coin, which we could label as $\theta$. Thus $\theta = P(H)$ would describe the probability distribution of our beliefs that the coin will come up as heads when flipped. The model is the actual means of encoding this flip mathematically. In this instance, the coin flip can be modelled as a Bernoulli trial.

Bernoulli Trial

A Bernoulli trial is a random experiment with only two outcomes, usually labelled as "success" or "failure", in which the probability of the success is exactly the same every time the trial is carried out. The probability of the success is given by $\theta$, which is a number between 0 and 1. Thus $\theta \in [0,1]$.

Over the course of carrying out some coin flip experiments (repeated Bernoulli trials) we will generate some data , $D$, about heads or tails.

A natural example question to ask is "What is the probability of seeing 3 heads in 8 flips (8 Bernoulli trials), given a fair coin ($\theta=0.5$)?".

A model helps us to ascertain the probability of seeing this data, $D$, given a value of the parameter $\theta$. The probability of seeing data $D$ under a particular value of $\theta$ is given by the following notation: $P(D|\theta)$.
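For the example question above, this likelihood is just a binomial probability; e.g. with SciPy (a quick illustrative check):

```python
from scipy.stats import binom

# P(D | theta): probability of 3 heads in 8 Bernoulli trials with theta = 0.5
print(binom.pmf(k=3, n=8, p=0.5))  # 0.21875
```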

However, if you consider it for a moment, we are actually interested in the alternative question - "What is the probability that the coin is fair (or unfair), given that I have seen a particular sequence of heads and tails?".

Thus we are interested in the probability distribution which reflects our belief about different possible values of $\theta$, given that we have observed some data $D$. This is denoted by $P(\theta|D)$. Notice that this is the converse of $P(D|\theta)$. So how do we get between these two probabilities? It turns out that Bayes' rule is the link that allows us to go between the two situations.

Bayes' Rule for Bayesian Inference

$$P(\theta|D) = \frac{P(D|\theta) \, P(\theta)}{P(D)}$$

  • $P(\theta)$ is the prior . This is the strength in our belief of $\theta$ without considering the evidence $D$. Our prior view on the probability of how fair the coin is.
  • $P(\theta|D)$ is the posterior . This is the (refined) strength of our belief of $\theta$ once the evidence $D$ has been taken into account. After seeing 4 heads out of 8 flips, say, this is our updated view on the fairness of the coin.
  • $P(D|\theta)$ is the likelihood . This is the probability of seeing the data $D$ as generated by a model with parameter $\theta$. If we knew the coin was fair, this tells us the probability of seeing a number of heads in a particular number of flips.
  • $P(D)$ is the evidence . This is the probability of the data as determined by summing (or integrating) across all possible values of $\theta$, weighted by how strongly we believe in those particular values of $\theta$. If we had multiple views of what the fairness of the coin is (but didn't know for sure), then this tells us the probability of seeing a certain sequence of flips for all possibilities of our belief in the coin's fairness.

The entire goal of Bayesian inference is to provide us with a rational and mathematically sound procedure for incorporating our prior beliefs, with any evidence at hand, in order to produce an updated posterior belief. What makes it such a valuable technique is that posterior beliefs can themselves be used as prior beliefs under the generation of new data. Hence Bayesian inference allows us to continually adjust our beliefs under new data by repeatedly applying Bayes' rule.

There was a lot of theory to take in within the previous two sections, so I'm now going to provide a concrete example using the age-old tool of statisticians: the coin-flip.

Coin-Flipping Example

In this example we are going to consider multiple coin-flips of a coin with unknown fairness. We will use Bayesian inference to update our beliefs on the fairness of the coin as more data (i.e. more coin flips) becomes available. The coin will actually be fair, but we won't learn this until the trials are carried out. At the start we have no prior belief on the fairness of the coin, that is, we can say that any level of fairness is equally likely.

In statistical language we are going to perform $N$ repeated Bernoulli trials with $\theta = 0.5$. We will use a uniform distribution as a means of characterising our prior belief that we are unsure about the fairness. This states that we consider each level of fairness (or each value of $\theta$) to be equally likely.

We are going to use a Bayesian updating procedure to go from our prior beliefs to posterior beliefs as we observe new coin flips. This is carried out using a particularly mathematically succinct procedure using conjugate priors . We won't go into any detail on conjugate priors within this article, as it will form the basis of the next article on Bayesian inference. It will however provide us with the means of explaining how the coin flip example is carried out in practice.

The uniform distribution is actually a more specific case of another probability distribution, known as a Beta distribution . Conveniently, under the binomial model, if we use a Beta distribution for our prior beliefs it leads to a Beta distribution for our posterior beliefs. This is an extremely useful mathematical result, as Beta distributions are quite flexible in modelling beliefs. However, I don't want to dwell on the details of this too much here, since we will discuss it in the next article. At this stage, it just allows us to easily create some visualisations below that emphasises the Bayesian procedure!

In the following figure we can see 6 particular points at which we have carried out a number of Bernoulli trials (coin flips). In the first sub-plot we have carried out no trials and hence our probability density function (in this case our prior density) is the uniform distribution. It states that we have equal belief in all values of $\theta$ representing the fairness of the coin.

The next panel shows 2 trials carried out and they both come up heads. Our Bayesian procedure using the conjugate Beta distributions now allows us to update to a posterior density. Notice how the weight of the density is now shifted to the right hand side of the chart. This indicates that our prior belief of equal likelihood of fairness of the coin, coupled with 2 new data points, leads us to believe that the coin is more likely to be unfair (biased towards heads) than it is tails.

The following two panels show 10 and 20 trials respectively. Notice that even though we have seen 2 tails in 10 trials we are still of the belief that the coin is likely to be unfair and biased towards heads. After 20 trials, we have seen a few more tails appear. The density of the probability has now shifted closer to $\theta=P(H)=0.5$. Hence we are now starting to believe that the coin is possibly fair.

After 50 and 500 trials respectively, we are now beginning to believe that the fairness of the coin is very likely to be around $\theta=0.5$. This is indicated by the shrinking width of the probability density, which is now clustered tightly around $\theta=0.46$ in the final panel. Were we to carry out another 500 trials (since the coin is actually fair) we would see this probability density become even tighter and centred closer to $\theta=0.5$.

(Figure: Bayesian update using the Beta-Binomial model)

Thus it can be seen that Bayesian inference gives us a rational procedure to go from an uncertain situation with limited information to a more certain situation with significant amounts of data. In the next article we will discuss the notion of conjugate priors in more depth, which heavily simplify the mathematics of carrying out Bayesian inference in this example.

For completeness, I've provided the Python code (heavily commented) for producing this plot. It makes use of SciPy 's statistics model, in particular, the Beta distribution :
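The original listing has not survived in this copy, so here is a minimal reconstruction in the same spirit, using scipy.stats.beta for the conjugate posterior; the per-panel head counts are illustrative assumptions chosen to match the narrative above, not the article's actual data:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Snapshots: flips observed so far, and heads seen by that point (assumed counts)
trials = [0, 2, 10, 20, 50, 500]
heads = [0, 2, 8, 13, 27, 232]

x = np.linspace(0, 1, 200)
fig, axes = plt.subplots(2, 3, figsize=(10, 6), sharex=True)

for ax, n, h in zip(axes.flat, trials, heads):
    # Uniform prior = Beta(1, 1); after h heads in n flips, the conjugate
    # posterior over theta = P(H) is Beta(1 + h, 1 + n - h)
    ax.plot(x, stats.beta(1 + h, 1 + n - h).pdf(x))
    ax.set_title(f"{n} trials, {h} heads")
    ax.set_xlabel(r"$\theta$")

plt.tight_layout()
plt.show()
```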

I'd like to give special thanks to my good friend Jonathan Bartlett, who runs TheStatsGeek.com , for reading drafts of this article and for providing helpful advice on interpretation and corrections. Thanks Jon!

Related Articles

  • Bayesian Inference of a Binomial Proportion - The Analytical Approach


Bayes Theorem

Bayes theorem is a theorem in probability and statistics, named after the Reverend Thomas Bayes, that helps in determining the probability of an event based on some event that has already occurred. Bayes rule has many applications, such as Bayesian inference and, in the healthcare sector, determining the chances of developing health problems with an increase in age, among many others.

The Bayes theorem is based on finding P(A | B) when P(B | A) is given. Here, we will aim at understanding the use of the Bayes rule in determining the probability of events, its statement, formula, and derivation with the help of examples.

What is Bayes Theorem?

Bayes theorem , in simple words, determines the conditional probability of event A given that event B has already occurred based on the following:

  • Probability of B given A
  • Probability of A
  • Probability of B

Bayes Law is a method to determine the probability of an event based on the occurrences of prior events. It is used to calculate conditional probability. Bayes theorem calculates the probability of a hypothesis based on the evidence observed. Now, let us state and prove Bayes Theorem. Bayes rule states that the conditional probability of an event A, given the occurrence of another event B, is equal to the product of the likelihood of B given A and the probability of A, divided by the probability of B. It is given as:

\(P(A|B) = \dfrac{P(B|A)P(A)}{P(B)}\)

Here, P(A) = how likely A happens (prior knowledge): the probability that a hypothesis is true before any evidence is present.

P(B) = how likely B happens (marginalization): the probability of observing the evidence.

P(A|B) = how likely A happens given that B has happened (posterior): the probability that a hypothesis is true given the evidence.

P(B|A) = how likely B happens given that A has happened (likelihood): the probability of seeing the evidence if the hypothesis is true.

Bayes Theorem: \(P(H|E) = \dfrac{P(H)\,P(E|H)}{P(E)}\)

Bayes Theorem - Statement

The statement of Bayes Theorem is as follows: Let \(E_{1}, E_{2}, E_{3}, ..., E_{n}\) be a set of events associated with a sample space S, where all events \(E_{1}, E_{2}, E_{3}, ..., E_{n}\) have non-zero probability of occurrence and they form a partition of S. Let A be any event which occurs with \(E_{1}\) or \(E_{2}\) or \(E_{3}\) ... or \(E_{n}\); then according to Bayes Theorem,

\(P(E_{i} | A) = \dfrac{P(E_{i})P(A|E_{i})}{\sum_{k=1}^{n}P(E_{k})P(A|E_{k})} , i=1,2,3,...,n\)

  • Here E\(_i\) ∩ E\(_j\) = φ, where i ≠ j. (i.e) They are mutually exclusive events
  • The union of all the events of the partition, should give the sample space.
  • 0 ≤ P(E\(_{i}\)) ≤ 1

Bayes Theorem Proof

To prove the Bayes Theorem, we will use the total probability and conditional probability formulas .

  • The total probability of an event A is calculated when not enough data is known about event A, then we use other events related to event A to determine its probability.
  • Conditional probability is the probability of event A given that other related events have already occurred.

Let \(E_{1}, E_{2}, ..., E_{n}\) be a partition of the sample space S, and let A be an event that has occurred. Let us express A in terms of the \(E_{i}\):

A = A ∩ (\(E_{1} \cup E_{2} \cup ... \cup E_{n}\))

A = (A ∩ \(E_{1}\)) ∪ (A ∩ \(E_{2}\)) ∪ ... ∪ (A ∩ \(E_{n}\))

P(A) = P[(A ∩ \(E_{1}\)) ∪ (A ∩ \(E_{2}\)) ∪ ... ∪ (A ∩ \(E_{n}\))]

We know that when A and B are disjoint sets, then P(A∪B) = P(A) + P(B)

Thus here, P(A) = P(A ∩ \(E_{1}\)) + P(A ∩ \(E_{2}\)) + ... + P(A ∩ \(E_{n}\))

According to the multiplication theorem for dependent events, we have

P(A) = P(\(E_{1}\)) P(A|\(E_{1}\)) + P(\(E_{2}\)) P(A|\(E_{2}\)) + ... + P(\(E_{n}\)) P(A|\(E_{n}\))

Thus the total probability is \(P(A) = \sum_{i=1}^{n}P(E_{i})P(A|E_{i}) , i=1,2,3,...,n\) --- (1)

Recalling the conditional probability, we get

\(P(E_{i}|A) = \dfrac{P(E_{i}\cap A)}{P(A)} , i=1,2,3,...,n\) ---(2)

Using the formula for conditional probability of \(P(A|E_{i})\), we have

\(P(E_{i}\cap A) = P(A|E_{i}) P(E_{i})\) --- (3)

Substituting equations (1) and (3) in equation (2), we get

\(P(E_{i}|A) = \dfrac{P(A|E_{i}) P(E_{i})}{\sum_{k=1}^{n}P(E_{k})P(A|E_{k})}, i=1,2,3,...,n\)

Hence, Bayes rule is proved.

Bayes Theorem Formula

Bayes formula exists for events and random variables. Bayes theorem formulas are derived from the definition of conditional probability. It can be derived for events A and B, as well as continuous random variables X and Y. Let us first see the formula for events.

Bayes Theorem Formula for Events

The formula for events derived from the definition of conditional probability is:

\(P(A|B) = \dfrac{P(B|A)P(A)}{P(B)}, P(B) \neq 0\)


Derivation:

According to the definition of conditional probability, \(P(A|B) = \dfrac{P(A \cap B)}{P(B)}, P(B) \neq 0\), and we know that \(P(A \cap B) = P(B \cap A) = P(B|A)P(A)\), which implies

\(P(A|B) = \dfrac{P(B|A)P(A)}{P(B)}\)

Hence, the Bayes theorem formula for events is derived.

Bayes Theorem for Continuous Random Variables

The formula for continuous random variables X and Y derived from the definition of the conditional probability of continuous variables is:

\(f_{X|Y=y}(x) = \dfrac{f_{Y|X=x}(y)f_{X}(x)}{f_{Y}(y)}\)

According to the definition of conditional density or conditional probability of continuous random variables, we know that \(f_{X|Y=y}(x)=\dfrac{f_{X,Y}(x,y)}{f_{Y}(y)}\) and \(f_{Y|X=x}(y)=\dfrac{f_{X,Y}(x,y)}{f_{X}(x)}\), which implies

\(f_{X|Y=y}(x) = \dfrac{f_{Y|X=x}(y)\,f_{X}(x)}{f_{Y}(y)}\)

Hence, the Bayes Theorem formula for continuous random variables is derived.


Terms Related to Bayes Theorem

As we have studied about Bayes theorem in detail, let us understand the meanings of a few terms related to the concept which have been used in the Bayes theorem formula and derivation:

  • Conditional Probability - Conditional Probability is the probability of an event A based on the occurrence of another event B. It is denoted by P(A|B) and represents the probability of A given that event B has already happened.
  • Joint Probability - Joint probability measures the probability of two or more events occurring together and at the same time. For two events A and B, it is denoted by \(P(A \cap B)\).
  • Random Variables - Random variable is a real-valued variable whose possible values are determined by a random experiment. The probability of such variables is also called the experimental probability.
  • Posterior Probability - Posterior probability is the probability of an event that is calculated after all the information related to the event has been accounted for. It is also known as conditional probability.
  • Prior Probability - Prior probability is the probability of an event that is calculated before considering the new information obtained. It is the probability of an outcome that is determined based on current knowledge before the experiment is performed.

Important Notes on Bayes Law:

  • Bayes theorem is used to determine conditional probability.
  • When two events A and B are independent, P(A|B) = P(A) and P(B|A) = P(B)
  • Conditional probability can be calculated using the Bayes theorem for continuous random variables.

☛ Related Topics:

  • Probability and statistics
  • Bayes' Theorem Calculator

Bayes Theorem Examples

Example 1: Amy has two bags. Bag I has 7 red and 4 blue balls and bag II has 5 red and 9 blue balls. Amy draws a ball at random and it turns out to be red. Determine the probability that the ball was from the bag I.

Solution: Let A be the event of drawing a red ball, and let X and Y be the events that the ball is from bag I and bag II, respectively. We know that the probability of choosing a bag for drawing a ball is 1/2, that is,

P(X) = P(Y) = 1/2

Since there are 7 red balls out of a total of 11 balls in bag I, P(drawing a red ball from bag I) = P(A|X) = 7/11

Similarly, P(drawing a red ball from bag II) = P(A|Y) = 5/14

We need to determine the value of P(the ball drawn is from the bag I given that it is a red ball), that is, P(X|A). To determine this we will use Bayes Theorem. Using Bayes theorem, we have the following:

\(P(X|A) = \dfrac{P(A|X)P(X)}{P(A|X)P(X)+P(A|Y)P(Y)}\)

= [(7/11)(1/2)] / [(7/11)(1/2) + (5/14)(1/2)] = 98/153 ≈ 0.64

Answer: ∴ The probability that the drawn ball is from bag I is approximately 0.64
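A quick exact-arithmetic check of this example in Python (variable names mine):

```python
from fractions import Fraction

p_x = p_y = Fraction(1, 2)     # each bag is equally likely
p_red_x = Fraction(7, 11)      # P(red | bag I)
p_red_y = Fraction(5, 14)      # P(red | bag II)

posterior = (p_red_x * p_x) / (p_red_x * p_x + p_red_y * p_y)
print(posterior, float(posterior))  # 98/153 ≈ 0.6405
```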

Example 2: Assume that the chances of a person having a skin disease are 40%. Assume that using skin creams and drinking enough water reduces the risk of skin disease by 30%, and that prescription of a certain drug reduces its chance by 20%. At a time, a patient can choose any one of the two options with equal probabilities. It is given that after picking one of the options, the patient selected at random has the skin disease. Find the probability that the patient picked the option of skin creams and drinking enough water, using the Bayes theorem.

Solution: Assume E1: The patient uses skin creams and drinks enough water; E2: The patient uses the drug; A: The selected patient has the skin disease

P(E1) = P(E2) = 1/2

Using the probabilities known to us, we have

P(A|E1) = 0.4 × (1-0.3) = 0.28

P(A|E2) = 0.4 × (1-0.2) = 0.32

Using Bayes rule, the probability that the selected patient uses skin creams and drinks enough water is given by,

\(P(E1|A) = \dfrac{P(A|E1)P(E1)}{P(A|E1)P(E1)+P(A|E2)P(E2)}\)

= (0.28 × 0.5)/(0.28 × 0.5 + 0.32 × 0.5)

= 0.14/(0.14 + 0.16)

Answer: ∴ The probability that the patient picked the first option is 0.14/0.30 ≈ 0.47

Example 3: A man is known to speak the truth 3/4 times. He draws a card and reports it is king. Find the probability that it is actually a king.

Let E be the event that the man reports that a king is drawn from the pack of cards,

A be the event that a king is drawn, and

B be the event that a king is not drawn.

Then we have P(A) = probability that a king is drawn = 1/4

P(B) = probability that a king is not drawn = 3/4

P(E|A) = probability that the man reports a king when a king is actually drawn = P(truth) = 3/4

P(E|B) = probability that the man reports a king when a king is not actually drawn = P(lie) = 1/4

Then, according to the Bayes formula, the probability that it is actually a king = P(A|E)

=\(\dfrac{P(A)P(E|A)}{P(A)P(E|A)+P(B)P(E|B)}\)

= [1/4 × 3/4] ÷ [(1/4 × 3/4) + (3/4 × 1/4)]

= 3/16 ÷ 6/16

= 1/2

Answer: ∴ The probability that the drawn card is actually a king = 0.5
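The same answer can be verified by simulation. The sketch below is our own and uses the problem's stated numbers, P(a king is drawn) = 1/4 and P(truth) = 3/4, together with the textbook convention that a lie about a non-king is always reported as "king".

```python
# Monte Carlo check of Example 3: estimate P(king | reported king).
import random

random.seed(1)
trials = 1_000_000
reported_king = 0
actually_king = 0

for _ in range(trials):
    king = random.random() < 1 / 4        # is a king actually drawn?
    truthful = random.random() < 3 / 4    # does the man tell the truth?
    reports_king = king if truthful else not king
    if reports_king:
        reported_king += 1
        actually_king += king             # True counts as 1

print(actually_king / reported_king)      # ≈ 0.5
```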


FAQs on Bayes Theorem

State the Bayes theorem probability formula.

Bayes theorem is a statistical formula to determine the conditional probability of an event. It describes the probability of an event based on prior knowledge of events that have already happened. Bayes rule is named after the Reverend Thomas Bayes, and the Bayesian probability formula for random events is \(P(A|B) = \dfrac{P(B|A)P(A)}{P(B)}\), where

  • P(A) = how likely A happens
  • P(B) = how likely B happens
  • P(A|B) = how likely A is to happen given that B has happened
  • P(B|A) = how likely B is to happen given that A has happened

What Does the Bayes Theorem State?

Let \(E_{1}, E_{2}, E_{3}, ..., E_{n}\) be a set of events associated with a sample space S, where all events \(E_{1}, E_{2}, E_{3}, ..., E_{n}\) are mutually exclusive and exhaustive events of the sample space S. Let A be an event related to S, then according to Bayesian probability, \(P(E_{i} | A) = \dfrac{P(E_{i})P(A|E_{i})}{\sum_{k=1}^{n}P(E_{k})P(A|E_{k})} , i=1,2,3,...,n\).

Is Conditional Probability the Same as Bayes Theorem?

Conditional probability is the probability of the occurrence of an event based on the occurrence of another event, whereas Bayes' theorem is derived from the definition of conditional probability. Bayes' law relates the two conditional probabilities P(A|B) and P(B|A).

How to Use Bayes Theorem?

To determine the probability of an event A given that the related event B has already occurred, that is, P(A|B) using the Bayes Theorem, we calculate the probability of the event B, that is, P(B); the probability of event B given that event A has occurred, that is, P(B|A); and the probability of the event A individually, that is, P(A). Then, we substitute these values into the Bayes formula \(P(A|B) = \dfrac{P(B|A)P(A)}{P(B)}\) to determine the probability.
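A direct transcription of these steps into code might look like the sketch below; the function and argument names are ours, and the sample numbers are purely illustrative.

```python
# P(A|B) = P(B|A) * P(A) / P(B), the simple form of Bayes' theorem.
def bayes(p_a, p_b_given_a, p_b):
    if p_b == 0:
        raise ValueError("P(B) must be non-zero")
    return p_b_given_a * p_a / p_b

# Illustrative values: P(A) = 0.2, P(B|A) = 0.6, P(B) = 0.45.
print(bayes(0.2, 0.6, 0.45))   # ≈ 0.267
```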

Is Bayes Rule Used for Independent Events?

If two events A and B are independent, then P(A|B) = P(A) and P(B|A) = P(B). Bayes theorem is of no help here: conditioning on one event does not change the probability of the other, so there is nothing to update.

What is the Bayes Theorem in Machine Learning?

Bayes theorem provides a method to determine the probability of a hypothesis based on its prior probability, the probabilities of observing various data given the hypothesis, and the observed data itself. It helps in updating beliefs as new data arrives. Hence, whenever a machine learning problem involves reasoning about conditional probability, the Bayes rule is used.

Bayes' Theorem Calculator


The Bayes' theorem calculator helps you calculate the probability of an event using Bayes' theorem. The Bayes' theorem calculator finds a conditional probability of an event based on the values of related known probabilities.

Bayes' rule or Bayes' law are other names that people use to refer to Bayes' theorem, so if you are looking for an explanation of what these are, this article is for you. Below you can find the Bayes' theorem formula with a detailed explanation as well as an example of how to use Bayes' theorem in practice.

You can check out our conditional probability calculator to read more about this subject!

💡 For a more general introduction to probabilities and how to calculate them, check out our probability calculator.

What is Bayes' theorem?

Bayes' theorem is named after Reverend Thomas Bayes, who worked on conditional probability in the eighteenth century. Bayes' rule calculates what can be called the posterior probability of an event, taking into account the prior probability of related events.

To give a simple example – looking blindly for socks in your room has lower chances of success than taking into account places that you have already checked. If you have a recurring problem with losing your socks, our sock loss calculator may help you. On the other hand, taking an egg out of the fridge and boiling it does not influence the probability of other items being there. These may be funny examples, but Bayes' theorem was a tremendous breakthrough that has influenced the field of statistics since its inception.

The importance of Bayes' law to statistics can be compared to the significance of the Pythagorean theorem to math. Nowadays, the Bayes' theorem formula has many widespread practical uses. You may use them every day without even realizing it! To find more about it, check the Bayesian inference section below.

So how does Bayes' formula actually look?

What is the Bayes' formula?

In its simplest form, we are calculating the conditional probability denoted as P(A|B) – the likelihood of event A occurring provided that B is true. Bayes' rule is expressed with the following equation:

P(A|B) = [P(B|A) × P(A)] / P(B),

where:

  • P(A), P(B) – Probability of event A and event B occurring, respectively;
  • P(A|B) – Conditional probability of event A occurring given that B has happened; and similarly
  • P(B|A) – Conditional probability of event B occurring given that A has happened.

Bayes' rule formula – tests

The Bayes' theorem can be extended to two or more cases of event A. This can be useful when testing for false positives and false negatives . The probability of event B is then defined as:

P(B) = P(A) × P(B|A) + P(not A) × P(B|not A) ,

where P(not A) is the probability of event A not occurring.

The following equation is true: P(not A) + P(A) = 1 as either event A occurs or it does not.

The extended Bayes' rule formula would then be:

P(A|B) = [P(B|A) × P(A)] / [P(A) × P(B|A) + P(not A) × P(B|not A)]
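As a sketch, the extended formula translates directly into code; the function name and the allergy-test numbers below are our own illustrative assumptions, not values from the text.

```python
# Extended Bayes' rule with P(B) expanded by the law of total probability.
def bayes_extended(p_a, p_b_given_a, p_b_given_not_a):
    """P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|not A)P(not A)]."""
    p_not_a = 1 - p_a                       # P(not A) + P(A) = 1
    numerator = p_b_given_a * p_a
    return numerator / (numerator + p_b_given_not_a * p_not_a)

# Assumed allergy-test numbers: 1% prevalence, 95% sensitivity,
# 4% false positive rate. P(allergy | positive test):
print(bayes_extended(0.01, 0.95, 0.04))     # ≈ 0.193
```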

In medicine, Bayes' theorem can help improve the accuracy of allergy tests by quantifying the chances that a test result is wrong. What is the likelihood that someone has an allergy? A false positive is when the results show someone with no allergy as having it. A false negative is when someone with an allergy is shown not to have it in the results. Bayes' formula gives the probability of each of these outcomes.

Bayes' theorem for dummies – Bayes' theorem example

Now that you know Bayes' theorem formula, you probably want to know how to make calculations using it.

Suppose you want to go out but aren't sure if it will rain. Do you need to take an umbrella? Let's assume you checked past data, and it shows that 6 of this month's 30 days are usually rainy. In this case, the probability of rain would be 0.2 or 20%. To quickly convert fractions to percentages, check out our fraction to percentage calculator. Let's also assume clouds in the morning are common; 45% of days start cloudy. Additionally, 60% of rainy days start cloudy. So what are the chances it will rain if it is an overcast morning?

To make calculations easier, let's convert the percentage to a decimal fraction, where 100% is equal to 1, and 0% is equal to 0. Now, let's match the information in our example with variables in Bayes' theorem:

  • A is the rain event.
  • B is the cloudy morning event.
  • P(A) is the probability of rain. In this case, 20% or 0.2.
  • Likewise P(B) is the probability of clouds occurring – 45% or 0.45.
  • P(A|B) is the probability of rain occurring given the cloudy morning – this is what we want to calculate.
  • Similarly P(B|A) is the probability of clouds on a rainy day – 60% or 0.6.

P(A|B) = (0.6 × 0.2) / 0.45 ≈ 0.27

In this case, the probability of rain occurring provided that the day started with clouds equals about 0.27 or 27%. Roughly a 27% chance of rain. So how about taking the umbrella just in case? Or do you prefer to look up at the clouds?

A quick side note: in our example, the chance of rain on a given day is 20%. Providing more information about related probabilities (cloudy days, and clouds on a rainy day) gave us a more accurate, condition-specific result. The example shows the usefulness of conditional probabilities. Now that we have seen how the Bayes' theorem calculator does its magic, feel free to use it instead of doing the calculations by hand.

💡 If you'd like to learn how to calculate a percentage, you might want to check our percentage calculator.

Bayesian inference – real life applications

Bayesian inference is a method of statistical inference based on Bayes' rule. While Bayes' theorem looks at past probabilities to determine the posterior probability, Bayesian inference continuously recalculates and updates those probabilities as more evidence becomes available. It is most powerful where there is a large, continually growing body of data.

This technique is also known as Bayesian updating and has an assortment of everyday uses that range from genetic analysis , risk evaluation in finance, search engines and spam filters to even courtrooms. Jurors can decide using Bayesian inference whether accumulating evidence is beyond a reasonable doubt in their opinion.

Similarly, spam filters get smarter the more data they get. Seeing what types of emails are spam and what words appear more frequently in those emails leads spam filters to update the probability and become more adept at recognizing those foreign prince attacks. 😉
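A toy sketch of this kind of updating appears below: starting from a prior probability that a message is spam, each observed word multiplies in its likelihood ratio, as in a naive Bayes filter. The word probabilities are invented for illustration, and a real filter would handle many more features and smoothing.

```python
# Sequential Bayesian updating: the posterior after one word becomes
# the prior for the next (assuming words are conditionally independent).
def update(prior, p_word_given_spam, p_word_given_ham):
    evidence = p_word_given_spam * prior + p_word_given_ham * (1 - prior)
    return p_word_given_spam * prior / evidence

p_spam = 0.5                                  # prior before reading the email
word_likelihoods = {                          # assumed P(word|spam), P(word|ham)
    "prince": (0.20, 0.001),
    "transfer": (0.15, 0.010),
    "meeting": (0.01, 0.100),
}

for word in ["prince", "transfer"]:
    p_spam = update(p_spam, *word_likelihoods[word])
    print(word, round(p_spam, 4))             # probability climbs with evidence
```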

When should I use Bayes' theorem?

To know when to use Bayes' formula instead of the conditional probability definition to compute P(A|B) , reflect on what data you are given:

  • If you know the probability P(A) and the conditional probability P(B|A) , use Bayes' formula.
  • If you know the probability of intersection P(A∩B) , use the conditional probability formula.

How do I use Bayes' theorem?

To find the conditional probability P(A|B) using Bayes' formula, you need to:

  1. Make sure the probability P(B) is non-zero.
  2. Take the probabilities P(B|A) and P(A) and compute their product.
  3. Divide the result from Step 2 by P(B).
  4. That's it! You've just successfully applied Bayes' theorem!

How can I prove Bayes theorem?

The simplest way to derive Bayes' theorem is via the definition of conditional probability. Let A, B be two events of non-zero probability. Then:

  1. Write down the conditional probability formula for A conditioned on B: P(A|B) = P(A∩B) / P(B).
  2. Repeat Step 1, swapping the events: P(B|A) = P(A∩B) / P(A).
  3. Solve the above equations for P(A∩B). We obtain P(A|B) × P(B) = P(B|A) × P(A).
  4. Solve for P(A|B): what you get is exactly Bayes' formula: P(A|B) = P(B|A) × P(A) / P(B).



Mathematics LibreTexts

9.6: Bayes' Theorem

David Lippman, Pierce College via The OpenTextBookStore

In this section we concentrate on the more complex conditional probability problems we began looking at in the last section.

Suppose a certain disease has an incidence rate of 0.1% (that is, it afflicts 0.1% of the population). A test has been devised to detect this disease. The test does not produce false negatives (that is, anyone who has the disease will test positive for it), but the false positive rate is 5% (that is, about 5% of people who take the test will test positive, even though they do not have the disease). Suppose a randomly selected person takes the test and tests positive. What is the probability that this person actually has the disease?

There are two ways to approach the solution to this problem. One involves an important result in probability theory called Bayes' theorem. We will discuss this theorem a bit later, but for now we will use an alternative and, we hope, much more intuitive approach.

Let's break down the information in the problem piece by piece.

Suppose a certain disease has an incidence rate of 0.1% (that is, it afflicts 0.1% of the population). The percentage 0.1% can be converted to a decimal number by moving the decimal place two places to the left, to get 0.001. In turn, 0.001 can be rewritten as a fraction: 1/1000. This tells us that about 1 in every 1000 people has the disease. (If we wanted we could write P (disease)=0.001.)

A test has been devised to detect this disease. The test does not produce false negatives (that is, anyone who has the disease will test positive for it). This part is fairly straightforward: everyone who has the disease will test positive, or alternatively everyone who tests negative does not have the disease. (We could also say P (positive | disease)=1.)

The false positive rate is 5% (that is, about 5% of people who take the test will test positive, even though they do not have the disease). This is even more straightforward. Another way of looking at it is that of every 100 people who are tested and do not have the disease, 5 will test positive even though they do not have the disease. (We could also say that \(P\)(positive | no disease)=0.05.)

Suppose a randomly selected person takes the test and tests positive. What is the probability that this person actually has the disease? Here we want to compute \(P\)(disease|positive). We already know that \(P\)(positive|disease)=1, but remember that conditional probabilities are not equal if the conditions are switched.

Rather than thinking in terms of all these probabilities we have developed, let's create a hypothetical situation and apply the facts as set out above. First, suppose we randomly select 1000 people and administer the test. How many do we expect to have the disease? Since about 1/1000 of all people are afflicted with the disease, \(\frac{1}{1000}\) of 1000 people is 1. (Now you know why we chose 1000.) Only 1 of 1000 test subjects actually has the disease; the other 999 do not.

We also know that 5% of all people who do not have the disease will test positive. There are 999 disease-free people, so we would expect \((0.05)(999)=49.95\) (so, about 50) people to test positive who do not have the disease.

Now back to the original question, computing P (disease|positive). There are 51 people who test positive in our example (the one unfortunate person who actually has the disease, plus the 50 people who tested positive but don't). Only one of these people has the disease, so

P(disease | positive) \(\approx \frac{1}{51} \approx 0.0196\)

or less than 2%. Does this surprise you? This means that of all people who test positive, over 98% do not have the disease.

The answer we got was slightly approximate, since we rounded 49.95 to 50. We could redo the problem with 100,000 test subjects, 100 of whom would have the disease and \((0.05)(99,900)=4995\) test positive but do not have the disease, so the exact probability of having the disease if you test positive is

P(disease | positive) \(= \frac{100}{5095} \approx 0.0196\)

which is pretty much the same answer.

But back to the surprising result. Of all people who test positive, over 98% do not have the disease. If your guess for the probability that a person who tests positive has the disease was wildly different from the right answer (2%), don't feel bad. The exact same problem was posed to doctors and medical students at Harvard Medical School 25 years ago, and the results were published in a 1978 New England Journal of Medicine article. Only about 18% of the participants got the right answer. Most of the rest thought the answer was closer to 95% (perhaps they were misled by the false positive rate of 5%).

So at least you should feel a little better that a bunch of doctors didn't get the right answer either (assuming you thought the answer was much higher). But the significance of this finding and similar results from other studies in the intervening years lies not in making math students feel better but in the possibly catastrophic consequences it might have for patient care. If a doctor thinks that a positive test result nearly guarantees that a patient has a disease, they might begin an unnecessary and possibly harmful treatment regimen on a healthy patient. Or worse, as in the early days of the AIDS crisis when being HIV-positive was often equated with a death sentence, the patient might take a drastic action and commit suicide.

As we have seen in this hypothetical example, the most responsible course of action for treating a patient who tests positive would be to counsel the patient that they most likely do not have the disease and to order further, more reliable, tests to verify the diagnosis.

One of the reasons that the doctors and medical students in the study did so poorly is that such problems, when presented in the types of statistics courses that medical students often take, are solved by use of Bayes' theorem, which is stated as follows:

Bayes’ Theorem

\(P(A | B)=\frac{P(A) P(B | A)}{P(A) P(B | A)+P(\bar{A}) P(B | \bar{A})}\)

In our earlier example, this translates to

\(P(\text { disease } | \text { positive })=\frac{P(\text { disease }) P(\text { positive } | \text { disease })}{P(\text { disease }) P(\text { positive } | \text { disease })+P(\text { no disease }) P(\text { positive } | \text { no disease })}\)

Plugging in the numbers gives

\(P(\text { disease } | \text { positive })=\frac{(0.001)(1)}{(0.001)(1)+(0.999)(0.05)} \approx 0.0196\)

which is exactly the same answer as our original solution.
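Both routes to this answer are easy to script. The sketch below (variable names are ours) reproduces the disease example first by counting a hypothetical population, as in the natural-frequencies argument, and then by plugging into Bayes' theorem directly.

```python
# The 0.1% incidence example, computed two ways.
population = 100_000
incidence = 0.001                 # 0.1% of people have the disease
false_positive_rate = 0.05        # 5% of healthy people test positive
sensitivity = 1.0                 # the test produces no false negatives

# Natural frequencies: count the people in each group.
diseased = population * incidence                                  # 100
healthy_positives = (population - diseased) * false_positive_rate  # 4995
p_counting = diseased / (diseased + healthy_positives)

# Bayes' theorem, plugging in the same numbers.
p_formula = (incidence * sensitivity) / (
    incidence * sensitivity + (1 - incidence) * false_positive_rate
)

print(p_counting, p_formula)      # both ≈ 0.0196
```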

The problem is that you (or the typical medical student, or even the typical math professor) are much more likely to be able to remember the original solution than to remember Bayes' theorem. Psychologists such as Gerd Gigerenzer, author of Calculated Risks: How to Know When Numbers Deceive You, have advocated that the method involved in the original solution (which Gigerenzer calls the method of "natural frequencies") be employed in place of Bayes' theorem. Gigerenzer performed a study and found that those educated in the natural frequency method were able to recall it far longer than those who were taught Bayes' theorem. When one considers the possible life-and-death consequences associated with such calculations, it seems wise to heed his advice.

A certain disease has an incidence rate of 2%. If the false negative rate is 10% and the false positive rate is 1%, compute the probability that a person who tests positive actually has the disease.

Imagine 10,000 people who are tested. Of these 10,000, 200 will have the disease; 10% of them, or 20, will test negative and the remaining 180 will test positive. Of the 9800 who do not have the disease, 98 will test positive. So of the 278 total people who test positive, 180 will have the disease. Thus

\(P(\text { disease } | \text { positive })=\frac{180}{278} \approx 0.647\)

so about 65% of the people who test positive will have the disease.

Using Bayes theorem directly would give the same result:

\(P(\text { disease } | \text { positive })=\frac{(0.02)(0.90)}{(0.02)(0.90)+(0.98)(0.01)}=\frac{0.018}{0.0278} \approx 0.647\)

Try it Now 5

A certain disease has an incidence rate of 0.5%. If there are no false negatives and if the false positive rate is 3%, compute the probability that a person who tests positive actually has the disease.

Out of 100,000 people, 500 would have the disease. Of those, all 500 would test positive. Of the 99,500 without the disease, 2,985 would falsely test positive and the other 96,515 would test negative.

\(\mathrm{P}(\text { disease } | \text { positive })=\frac{500}{500+2985}=\frac{500}{3485} \approx 14.3 \%\)

OPINION article

On Bayesian problem-solving: helping Bayesians solve simple Bayesian word problems

Miroslav Sirota*

  • 1 Department of Psychology, Kingston University, London, UK
  • 2 Department of Management, Kingston University, London, UK

Resolving the “Bayesian Paradox”—Bayesians who Failed to Solve Bayesian Problems

A well-supported conclusion a reader would draw from the vast amount of research on Bayesian inference could be distilled into one sentence: “People are profoundly Bayesians, but they fail to solve Bayesian word problems.” Indeed, two strands of research tell different stories about our ability to make Bayesian inferences—our ability to infer posterior probability from prior probability and new evidence according to Bayes's theorem. People see, move, coordinate, remember, learn, reason and argue consistently with complex probabilistic Bayesian computations, but they fail to solve, computationally much simpler, Bayesian word problems.

On the one hand, a first strand of research shows that people are profoundly Bayesians. Strong evidence indicates that the brain represents probability distributions and certain neural circuits perform Bayesian computations ( Pouget et al., 2013 ). Bayesian computation models account for a wide range of observations on sensory perception, motoric behavior and sensorimotor coordination (see Chater et al., 2010 ; Pouget et al., 2013 ). Bayesian computations approximate observed patterns in inductive reasoning, memory, language production, and language comprehension ( Chater et al., 2010 ). Even 12-month-old preverbal infants present behavior consistent with the behavior of a Bayesian ideal observer: infants integrate multiple sources of information to form rational expectations about situations they have never encountered before ( Téglás et al., 2011 ). In everyday life, people form cognitive judgments predicting the occurrence of everyday events consistent with a Bayesian ideal observer ( Griffiths and Tenenbaum, 2006 ).

On the other hand, however, a second strand of research shows that people fail to make the simplest possible Bayesian inference once they are presented with Bayesian word problems. Indeed, people tend to largely ignore or neglect base-rate information in probability judgment tasks such as social judgment or textbook problem tasks ( Kahneman and Tversky, 1973 ; Bar-Hillel, 1980 ) or they tend to fail to be Bayesians in a completely opposite way—by overweighting base-rate information ( Teigen and Keren, 2007 ). In fact, people require costly and intense training with most statistical formats to achieve good performance with probabilistic inferences that deteriorates with time very quickly ( Sedlmeier and Gigerenzer, 2001 ).

So people are Bayesians who fail to solve simple Bayesian word problems. As with most paradoxes, a solution to this “Bayesian paradox” lies in taking a closer look at conceptualizations: at what constitutes a Bayesian inference in these two strands of research. Such an analysis uncovers important differences in design, Bayesian classification criteria, and statistical approaches (Vallée-Tourangeau et al., 2015). However, the crucial difference that we highlight here lies in the cognitive processes involved in performing the task. What is described as a “Bayesian inference” in the two strands conflates very different processes. Implicit processes—implicit calculations with probabilities mostly acquired from experience—are involved in the Bayesian computations approximating the performance of various cognitive functions and in the estimation of experienced real-life outcomes. Explicit processes—explicit calculations with probabilities typically extracted from a textual description—are involved in solving Bayesian textbook problems or social judgment problems. The different information source, experience or description, for example, has been shown to lead to dramatically different choices and decisions (e.g., Hertwig et al., 2004). With this distinction, of course, we do not intend to imply that all the cognitive processes involved in estimating probabilities are necessarily implicit and engage only with probabilities from experience, or vice versa. Rather, we wish to point out that the different experimental paradigms outlined here typically require different cognitive processes operating over different types of information.

This postulated distinction between cognitive processes involved in these different types of Bayesian inference tasks can be mapped onto a distinction between biologically primary (pan-cultural, evolutionary purposeful) cognitive abilities and biologically secondary (culturally specific) cognitive abilities ( Geary, 1995 ). It could also be linked to the debate on how people form probability judgments, either through automatic frequency encoding of sequentially presented information (e.g., Hasher and Zacks, 1984 ) or through heuristic inferences from aggregated information (e.g., representativeness heuristic, Kahneman and Tversky, 1974 ).

Which type of evidence should we call upon to help us decide whether people are Bayesians or not? Both implicit and explicit processes are relevant for assessing this ability. Having Bayesian eyes, hands and minds is arguably important for survival. Yet, our environment has changed dramatically in the Twentieth century—it became crowded with explicit aggregated statistical information. Learning from described aggregated information condenses the learning process compared with learning from experience. Imagine, for example, an experienced UK physician relocating to Nigeria. Her experience would provide her with an adequate knowledge of the disease base rates, sensitivity and specificity of medical tests within the UK population; however her experience may not be applicable or may even be deleterious in Nigeria given that those pieces of information may differ. The doctor would greatly benefit from reading explicit aggregated statistical information on base rates of diseases, sensitivity and specificity of medical tests in the local population to avoid making errors and the long learning process based on personal experience. Most importantly, she should be able to integrate this information into her diagnostic judgments when facing a given set of symptoms in a patient in Nigeria. More generally, in their probability-laden environment, all people (not just physicians) may come across a lot of problems similar to Bayesian textbook problems, of which cancer or prenatal screening are just examples (e.g., Navarrete et al., 2014 ). It is clear, therefore, that we should focus on improving the explicit processes that underpin Bayesian reasoning as a problem-solving ability.

Bayesian Problem-Solving

Although the processes involved in solving Bayesian textbook problems resemble the processes involved in solving other mathematical problems, research on Bayesian reasoning has evolved in parallel to the research on problem solving. Reframing processes involved in Bayesian textbook reasoning in terms of the processes examined in the problem-solving literature can benefit Bayesian reasoning research efforts. The problem-solving literature not only extends the sound methodological toolkit to explore underpinning mental processes (e.g., thinking aloud protocols), but it also offers alternative concepts enacting novel insights, different explanations and more elaborate models generating deeper understanding of Bayesian problem-solving. We outline three examples of such theoretical benefits in the context of facilitating Bayesian problem-solving.

First, applying problem-solving concepts to Bayesian reasoning offers a novel and productive perspective. For example, we could think of Bayesian textbook problems in a problem-solving framework as a combination of insight and analytical problems. Typically, the problem-solving literature distinguishes two classes of problems: analytical and insight problems (Gilhooly and Murphy, 2005). With analytical problems, people can work out an incremental solution and rarely experience an Aha! moment in the process. Consider, for instance, this multi-digit addition problem: “Sum up the following numbers: 13, 27, 12, 32, 25, 11”; participants announcing an answer rarely do so with Eureka glee (although they might experience relief). With insight problems, people have to overcome an initial impasse to reach a completely new way of thinking about the problem; they need to transform the initial problem representation into a new representation which will lead them to the goal state. Consider, for instance, the following problem: “Place 17 animals in 4 enclosures in such a manner that there will be an odd number of animals in each enclosure” (adapted from Metcalfe and Wiebe, 1987). You probably tried 17/4 and it did not work: the problem masquerades as an arithmetic puzzle. However, in contrast to an analytic problem, the initial problem presentation cannot be transformed step-by-step into a solution (in this case the solution involves overlapping sets). This distinction suggests that decomposing the question of “What facilitates Bayesian reasoning?” into “What facilitates the insight?” and “What facilitates the computation?” will pave the way for better understanding of what factors facilitate problem-structure understanding and what factors facilitate the computational operations in Bayesian problem-solving (see also Johnson and Tubau, 2015).

Second, rephrasing Bayesian reasoning as a form of problem-solving offers different explanations of the processes implicated, for example, those involved in representational training (e.g., Sedlmeier and Gigerenzer, 2001 ; Mandel, 2015 ; Sirota et al., 2015a ). In representational training, participants learn to transform the statistical format representation of a problem—they learn to translate single-event probabilities into natural frequencies. For example, the statements “a 1% probability that a woman has breast cancer” and “if a woman has cancer then there is an 80% probability that she will get a positive mammogram” are translated as “10 out of every 1000 women have breast cancer” and “8 out of the 10 who have breast cancer will get a positive mammogram.” The problem-solving approach posits that the underlying mechanism of such representational training consists of the acquisition of an appropriate problem representation—a nested-sets representation of the Bayesian problem, regardless of frequencies or probabilistic information contained in such problem—during the learning phase, which is then transferred to similar problems in the testing phase (for evidence see Sirota et al., 2015a ). This goes beyond the default explanation that participants translated single-event probabilities into natural frequencies ( Sedlmeier and Gigerenzer, 2001 ) and it accounts for the training success in terms of the specific mental processes involved in problem representation learning and its transfer (for the importance of a good representation in different problems of a belief revision not depending on natural frequencies, see Mandel, 2014 ).

Third, recruiting problem-solving models offers a better understanding of well-known effects in Bayesian reasoning than we currently have, for example, the format effect. Statistical formats such as natural frequencies represent probably the most cost-effective (and the most discussed) tool to facilitate Bayesian problem-solving, given that visual aids offer mixed evidence of their effectiveness (e.g., Cosmides and Tooby, 1996 ; Sirota et al., 2014b ). Natural frequencies enhance Bayesian problem-solving when compared with formats involving normalization such as probability formats (e.g., Gigerenzer and Hoffrage, 1995 ; Cosmides and Tooby, 1996 ; Barbey and Sloman, 2007 ). Natural frequencies, introduced by Kleiter (1994) , integrate the base-rate information in their structure making the base-rate information per se redundant. For example, the statement “8 women out of the 10 who have breast cancer will get a positive mammogram” includes the base-rate information of the 10 (out of 1000) women with cancer from our previous example.

According to the general framework of mathematical verbal problem solving ( Kintsch and Greeno, 1985 ; Kintsch, 1988 ), which integrates formal mathematical and linguistic knowledge, two processes should be differentiated here: the processes involved in representing the problem and those involved in producing a solution (for specific approaches to probability representation, see Johnson-Laird et al., 1999 ; Mandel, 2008 ). In the problem representation phase, a mental representation is constructed from the text that triggers available knowledge schemas stored in long-term memory. Familiar cues in the text activate a correct mental representation of the problem more easily than unfamiliar or misleading ones; this enables an easier integration with existing knowledge. In the problem solution phase, rules or strategies corresponding to the problem representation are implemented. We suggest that the facilitative effect of natural frequencies in Bayesian inference problems is due to a similar process. A wording of the task with frequencies (e.g., explicit set reference language such as “10 out of the remaining 90”)—not the numerical format by itself—may trigger a representation of the problem as nested sets, while a wording of the task with probabilities which conceal the nested set structure due to normalizing, does not. Such an explanation casts natural frequencies as a familiar format rather than a privileged one. Some authors view natural frequencies as a privileged format because they are processed by a specialized frequency-coding mechanism shaped by evolutionary forces ( Gigerenzer and Hoffrage, 1995 ). If true (and some specific conditions are fulfilled, Barrett et al., 2006 ) then processing of a privileged format should not be cognitively demanding at all or at least less cognitively demanding than processing of a computationally equivalent and equally familiar format (e.g., Cosmides and Tooby, 1996 ). It means, for instance, that measures of cognitive capacity should not be predictive of performance in Bayesian reasoning. However, several recent studies have provided evidence rebutting the claim of easier processing of natural frequencies ( Sirota and Juanchich, 2011 ; Lesage et al., 2013 ; Sirota et al., 2014a ).

Our environment is laden with statistical information and demands that people successfully solve problems that are exactly the same as, or similar to, classical Bayesian textbook problems. Although some brain function appears to implement Bayesian computations, people's abilities to solve Bayesian word problems could still be substantially improved. We should therefore strive to understand and improve people's performance with this kind of problem. We suggest thinking about the involved processes as processes akin to those engaged during problem-solving (see also Johnson and Tubau, 2015; Sirota et al., 2015b). Such a re-classification would not only resolve contradictions in research on Bayesian inference, it would also facilitate the application of conceptual and methodological tools from problem-solving research. It would allow us to ask what enacts the insight about the problem structure, what facilitates the relevant computations, and how exactly people implement these processes. It would allow us to conceptually re-frame observed effects such as representational training effects. It would also allow us to shed more light on the underlying processes by utilizing elaborate process-oriented models developed in this area.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

We thank David Mandel, Ulrich Hoffrage and the anonymous reviewer for helpful suggestions on an earlier version of this manuscript.

Bar-Hillel, M. (1980). The base-rate fallacy in probability judgments. Acta Psychol. 44, 211–233. doi: 10.1016/0001-6918(80)90046-3


Barbey, A. K., and Sloman, S. A. (2007). Base-rate respect: from ecological rationality to dual processes. Behav. Brain Sci. 30, 241–254. doi: 10.1017/S0140525X07001653


Barrett, H. C., Frederick, D. A., Haselton, M. G., and Kurzban, R. (2006). Can manipulations of cognitive load be used to test evolutionary hypotheses? J. Pers. Soc. Psychol. 91, 513. doi: 10.1037/0022-3514.91.3.513


Chater, N., Oaksford, M., Hahn, U., and Heit, E. (2010). Bayesian models of cognition. Wiley Interdiscipl. Rev. Cogn. Sci. 1, 811–823. doi: 10.1002/wcs.79

Cosmides, L., and Tooby, J. (1996). Are humans good intuitive statisticians after all? Rethinking some conclusions from the literature on judgment under uncertainty. Cognition 58, 1–73. doi: 10.1016/0010-0277(95)00664-8

Geary, D. C. (1995). Reflections of evolution and culture in children's cognition: implications for mathematical development and instruction. Am. Psychol. 50, 24. doi: 10.1037/0003-066X.50.1.24

Gigerenzer, G., and Hoffrage, U. (1995). How to improve Bayesian reasoning without instruction - Frequency formats. Psychol. Rev. 102, 684–704. doi: 10.1037/0033-295X.102.4.684

Gilhooly, K. J., and Murphy, P. (2005). Differentiating insight from non-insight problems. Think. Reason. 11, 279–302. doi: 10.1080/13546780442000187

Griffiths, T. L., and Tenenbaum, J. B. (2006). Optimal predictions in everyday cognition. Psychol. Sci. 17, 767–773. doi: 10.1111/j.1467-9280.2006.01780.x

Hasher, L., and Zacks, R. T. (1984). Automatic processing of fundamental information: the case of frequency of occurrence. Am. Psychol. 39, 1372. doi: 10.1037/0003-066x.39.12.1372

Hertwig, R., Barron, G., Weber, E. U., and Erev, I. (2004). Decisions from experience and the effect of rare events in risky choice. Psychol. Sci. 15, 534–539. doi: 10.1111/j.0956-7976.2004.00715.x

Johnson-Laird, P. N., Legrenzi, P., Girotto, V., Legrenzi, M. S., and Caverni, J. P. (1999). Naive probability: a mental model theory of extensional reasoning. Psychol. Rev. 106, 62–88. doi: 10.1037/0033-295X.106.1.62

Johnson, E. D., and Tubau, E. (2015). Comprehension and computation in Bayesian problem solving. Front. Psychol. 6:938. doi: 10.3389/fpsyg.2015.00938

Kahneman, D., and Tversky, A. (1973). On the psychology of prediction. Psychol. Rev. 80, 237. doi: 10.1037/h0034747

Kahneman, D., and Tversky, A. (1974). “Subjective probability: a judgment of representativeness,” in The Concept of Probability in Psychological Experiments , ed C.-A. S. Staël Von Holstein (Dordrecht: Springer), 25–48.


Kintsch, W. (1988). The role of knowledge in discourse comprehension: a construction-integration model. Psychol. Rev. 95, 163. doi: 10.1037/0033-295X.95.2.163

Kintsch, W., and Greeno, J. G. (1985). Understanding and solving word arithmetic problems. Psychol. Rev. 92, 109. doi: 10.1037/0033-295X.92.1.109

Kleiter, G. D. (1994). “Natural sampling: rationality without base rates,” in Contributions to Mathematical Psychology, Psychometrics, and Methodology , eds G. Fischer and D. Laming (New York, NY: Springer), 375–388. doi: 10.1007/978-1-4612-4308-3_27

Lesage, E., Navarrete, G., and De Neys, W. (2013). Evolutionary modules and Bayesian facilitation: the role of general cognitive resources. Think. Reason. 19, 27–53. doi: 10.1080/13546783.2012.713177

Mandel, D. R. (2008). Violations of coherence in subjective probability: a representational and assessment processes account. Cognition 106, 130–156. doi: 10.1016/j.cognition.2007.01.001

Mandel, D. R. (2014). Visual representation of rational belief revision: another look at the Sleeping Beauty problem. Front. Psychol. 5:1232. doi: 10.3389/fpsyg.2014.01232

Mandel, D. R. (2015). Instruction in information structuring improves Bayesian judgment in intelligence analysts. Front. Psychol. 6:387. doi: 10.3389/fpsyg.2015.00387

Metcalfe, J., and Wiebe, D. (1987). Intuition in insight and noninsight problem solving. Mem. Cogn. 15, 238–246. doi: 10.3758/BF03197722

Navarrete, G., Correia, R., and Froimovitch, D. (2014). Communicating risk in prenatal screening: the consequences of Bayesian misapprehension. Front. Psychol. 5:1272. doi: 10.3389/fpsyg.2014.01272

Pouget, A., Beck, J. M., Ma, W. J., and Latham, P. E. (2013). Probabilistic brains: knowns and unknowns. Nat. Neurosci. 16, 1170–1178. doi: 10.1038/nn.3495

Sedlmeier, P., and Gigerenzer, G. (2001). Teaching Bayesian reasoning in less than two hours. J. Exp. Psychol. Gen. 130, 380–400. doi: 10.1037/0096-3445.130.3.380

Sirota, M., and Juanchich, M. (2011). Role of numeracy and cognitive reflection in Bayesian reasoning with natural frequencies. Stud. Psychol. 53, 151–161.

Sirota, M., Juanchich, M., and Hagmayer, Y. (2014a). Ecological rationality or nested sets? Individual differences in cognitive processing predict Bayesian reasoning. Psychon. Bull. Rev. 21, 198–204. doi: 10.3758/s13423-013-0464-6

Sirota, M., Kostovičová, L., and Juanchich, M. (2014b). The effect of iconicity of visual displays on statistical reasoning: evidence in favor of the null hypothesis. Psychon. Bull. Rev. 21, 961–968. doi: 10.3758/s13423-013-0555-4

Sirota, M., Kostovičová, L., and Vallee-Tourangeau, F. (2015a). How to train your Bayesian: A problem-representation transfer rather than a format-representation shift explains training effects. Q. J. Exp. Psychol. 68, 1–9. doi: 10.1080/17470218.2014.972420

Sirota, M., Kostovičová, L., and Vallée-Tourangeau, F. (2015b). Now you Bayes, now you don't: effects of set-problem and frequency-format mental representations on statistical reasoning. Psychon. Bull. Rev. doi: 10.3758/s13423-015-0810-y. [Epub ahead of print].

Téglás, E., Vul, E., Girotto, V., Gonzalez, M., Tenenbaum, J. B., and Bonatti, L. L. (2011). Pure reasoning in 12-month-old infants as probabilistic inference. Science 332, 1054–1059. doi: 10.1126/science.1196404

Teigen, K. H., and Keren, G. (2007). Waiting for the bus: when base-rates refuse to be neglected. Cognition 103, 337–357. doi: 10.1016/j.cognition.2006.03.007

Vallée-Tourangeau, G., Sirota, M., Juanchich, M., and Vallée-Tourangeau, F. (2015). Beyond getting the numbers right: what does it mean to be a “successful” Bayesian reasoner? Front. Psychol. 6:712. doi: 10.3389/fpsyg.2015.00712

Keywords: Bayesian problem-solving, Bayesian research paradox, natural frequencies, Bayesian reasoning, mathematical problem solving

Citation: Sirota M, Vallée-Tourangeau G, Vallée-Tourangeau F and Juanchich M (2015) On Bayesian problem-solving: helping Bayesians solve simple Bayesian word problems. Front. Psychol. 6:1141. doi: 10.3389/fpsyg.2015.01141

Received: 31 March 2015; Accepted: 22 July 2015; Published: 10 August 2015.


Copyright © 2015 Sirota, Vallée-Tourangeau, Vallée-Tourangeau and Juanchich. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Miroslav Sirota, [email protected]


Bayes’ Theorem Questions

Bayes’ theorem questions with solutions are given here for students to practice and understand how to apply Bayes’ theorem as a special case of conditional probability. These questions are designed as per the CBSE class 12 syllabus. Every year, a question carrying good weightage is asked based on Bayes’ theorem; practising these questions will help in the preparation for the board exams.

Famous mathematician Thomas Bayes gave this theorem to solve the problem of finding reverse probability by using conditional probability.

The theorem is stated as follows.

Bayes’ theorem for two events A and B (with P(B) ≠ 0) is given as:

\(P(A|B)=\dfrac{P(B|A)P(A)}{P(B)}\)

Bayes’ Theorem Questions With Solutions

To get a better understanding of Bayes’ theorem, let us apply it by solving a few questions.

Question 1:

Three persons A, B and C have applied for a job in a private company. The chances of their selection are in the ratio 1 : 2 : 4. The probabilities that A, B and C can introduce changes to improve the profits of the company are 0.8, 0.5 and 0.3, respectively. If the change does not take place, find the probability that it is due to the appointment of C.

Let E1: person A gets selected

E2: person B gets selected

E3: person C gets selected

A: changes are introduced but profit does not happen

Now, P(E1) = 1/(1+2+4) = 1/7

P(E2) = 2/7 and P(E3) = 4/7

P(A|E1) = P(profit does not happen with the changes introduced by A) = 1 – P(profit happens with the changes introduced by A) = 1 – 0.8 = 0.2

P(A|E2) = P(profit does not happen with the changes introduced by B) = 1 – P(profit happens with the changes introduced by B) = 1 – 0.5 = 0.5

P(A|E3) = P(profit does not happen with the changes introduced by C) = 1 – P(profit happens with the changes introduced by C) = 1 – 0.3 = 0.7

We have to find the probability that the profit did not happen due to the selection of C, that is, P(E3|A). By Bayes' theorem,

\(\begin{array}{l}P(E_{3}|A)=\frac{P(A|E_{3})P(E_{3})}{P(A|E_{1})P(E_{1})+P(A|E_{2})P(E_{2})+P(A|E_{3})P(E_{3})}\end{array} \)

\(\begin{array}{l}P(E_{3}|A)=\frac{0.7\times \frac{4}{7}}{0.2\times \frac{1}{7}+0.5\times \frac{2}{7}+0.7\times \frac{4}{7}}\end{array} \)

= (2.8/7) / (0.2/7 + 1/7 + 2.8/7) = 2.8/4 = 0.7

∴ The required probability is 0.7.
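As a quick check of this answer, the following Python sketch (our own, using exact fractions) evaluates the same Bayes computation for the three candidates.

```python
# Question 1 by direct computation: P(E3|A) = P(A|E3)P(E3) / Σ P(A|Ei)P(Ei).
from fractions import Fraction

priors = [Fraction(1, 7), Fraction(2, 7), Fraction(4, 7)]       # selection ratio 1:2:4
no_profit = [Fraction(1, 5), Fraction(1, 2), Fraction(7, 10)]   # 0.2, 0.5, 0.7

joint = [p * l for p, l in zip(priors, no_profit)]
p_c = joint[2] / sum(joint)
print(p_c, float(p_c))                                          # 7/10 = 0.7
```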

Question 2:

A bag contains 4 balls. Two balls are drawn at random without replacement and are found to be blue. What is the probability that all balls in the bag are blue?

Let E1 = the bag contains two blue balls

E2 = the bag contains three blue balls

E3 = the bag contains four blue balls

A = event of drawing two blue balls

P(E1) = P(E2) = P(E3) = ⅓

\(P(A|E_{1}) = \dfrac{^{2}C_{2}}{^{4}C_{2}} = \dfrac{1}{6}\)

\(P(A|E_{2}) = \dfrac{^{3}C_{2}}{^{4}C_{2}} = \dfrac{1}{2}\)

\(P(A|E_{3}) = \dfrac{^{4}C_{2}}{^{4}C_{2}} = 1\)

By Bayes' theorem,

\(P(E_{3}|A)=\dfrac{P(A|E_{3})P(E_{3})}{P(A|E_{1})P(E_{1})+P(A|E_{2})P(E_{2})+P(A|E_{3})P(E_{3})}\)

= [⅓ × 1]/[⅓ × ⅙ + ⅓ × ½ + ⅓ × 1] = 1/(⅙ + ½ + 1) = 6/10 = 3/5

∴ The probability that all the balls in the bag are blue is 3/5 = 0.6.

Check your answers with the Bayes’ theorem calculator.

Question 3:

In a neighbourhood, 90% of the children fell sick due to flu and 10% due to measles, and no other disease. The probability of observing rashes for measles is 0.95 and for flu is 0.08. If a child develops rashes, find the probability that the child has flu.

Let F: children with flu

M: children with measles

R: children showing the symptom of rash

P(F) = 90% = 0.9

P(M) = 10% = 0.1

P(R|F) = 0.08

P(R|M) = 0.95

\(\begin{array}{l}P(F|R)=\frac{P(R|F)P(F)}{P(R|M)P(M)+P(R|F)P(F)}\end{array} \)

\(\begin{array}{l}P(F|R)=\frac{0.08\times 0.9}{0.95\times 0.1+0.08\times 0.9}\end{array} \)

= 0.072/(0.095 + 0.072) = 0.072/0.167 ≈ 0.43

⇒ P(F|R) = 0.43

Question 4:

There are three otherwise identical cards: both sides of the first card are coloured red, both sides of the second card are coloured blue, and the third card has one side coloured red and the other side blue. One card is randomly selected among these three cards and put down; the visible side of the card is red. What is the probability that the other side is blue?

Let RR: card with both side red

BB: card with both side blue

RB: card with one side red and other side blue

A: event that the visible side of the chosen card is red

P(RR) = ⅓ , P(BB) = ⅓ and P(RB) = ⅓

By the theorem of total probability , P(A) = P(A|RR).P(RR) + P(A|BB).P(BB) + P(A|RB).P(RB)

∴ P(A) = 1 × ⅓ + 0 × ⅓ + ½ × ⅓ = ½

\(\begin{array}{l}P(RB|A)=\frac{P(RB\cap A)}{P(A)}=\frac{P(A\cap RB)}{P(A)}=\frac{P(A|RB)P(RB)}{P(A)}\end{array} \)

= (½ × ⅓) ÷ ½ = ⅓.

Question 5:

There are three urns containing white and black balls: the first urn has 3 white and 2 black balls, the second urn has 2 white and 3 black balls, and the third urn has 4 white and 1 black ball. An urn is chosen at random, and from it one ball is drawn at random; the ball is white. What is the probability that it came from the third urn?

Let E1 = event that the ball is chosen from the first urn

E2 = event that the ball is chosen from the second urn

E3 = event that the ball is chosen from the third urn

A = event that the chosen ball is white

Then, P(E1) = P(E2) = P(E3) = ⅓.

P(A|E1) = ⅗

P(A|E2) = ⅖

P(A|E3) = ⅘

By Bayes' theorem,

\(P(E_{3}|A)=\dfrac{P(A|E_{3})P(E_{3})}{P(A|E_{1})P(E_{1})+P(A|E_{2})P(E_{2})+P(A|E_{3})P(E_{3})} = \frac{4/5 \times 1/3}{3/5 \times 1/3 + 2/5 \times 1/3 + 4/5 \times 1/3} = \frac{4/5}{9/5} = \frac{4}{9}\)

∴ The probability that the white ball came from the third urn is 4/9.

Question 6:

It is observed that 50% of mails are spam. There is software that filters spam mail before it reaches the inbox. Its accuracy in detecting a spam mail is 99%, and the chance of tagging a non-spam mail as spam is 5%. If a certain mail is tagged as spam, find the probability that it is not a spam mail.

Let E1 = event of spam mail

E2 = event of non-spam mail

A = event of the mail being detected as spam

Now, P(E1) = 0.5 and P(E2) = 0.5

P(A|E1) = 0.99 and P(A|E2) = 0.05

\(\begin{array}{l}P(E_{2}|A)=\frac{P(A|E_{2})P(E_{2})}{P(A|E_{1})P(E_{1})+P(A|E_{2})P(E_{2})}\end{array} \)

\(\begin{array}{l}=\frac{0.05 \times0.5}{0.99\times 0.5 + 0.05 \times0.5}=\frac{0.025}{0.520}\end{array} \)

= 5/104 ≈ 4.8%

Also Check:

  • Multiplication Rule of Probability
  • Probability for Class 12
  • Probability Distribution
  • Binomial Distribution

Question 7:

An unbiased die is rolled and a bag is chosen according to the number that turns up: bag A if 1 turns up, bag B if 2 or 3 turns up, and bag C if 4, 5 or 6 turns up.

Bag A contains 3 white balls and 2 black balls, bag B contains 3 white balls and 4 black balls, and bag C contains 4 white balls and 5 black balls. The die is rolled and the corresponding bag is chosen; if a white ball is then drawn, find the probability that it was drawn from bag B.

Let E1 = event of choosing bag A

E2 = event of choosing bag B

E3 = event of choosing bag C

A = event of choosing a white ball

Then, P(E1) = ⅙, P(E2) = 2/6 = ⅓, P(E3) = 3/6 = ½

And P(A|E1) = ⅗, P(A|E2) = 3/7, P(A|E3) = 4/9

\(\begin{array}{l}P(E_{2}|A)=\frac{P(A|E_{2})P(E_{2})}{P(A|E_{1})P(E_{1})+P(A|E_{2})P(E_{2})+P(A|E_{3})P(E_{3})}\end{array} \)

\(\begin{array}{l}=\frac{3/7 \times 1/3}{3/5\times 1/6 + 3/7\times 1/3+4/9\times 1/2}= \frac{1/7}{1/10 + 1/7+2/9}\end{array} \)

⇒ P(E2|A) = 90/293.

Question 8:

An insurance company has insured 4000 doctors, 8000 teachers and 12000 businessmen. The chances of a doctor, a teacher and a businessman dying before the age of 58 are 0.01, 0.03 and 0.05, respectively. If one of the insured people dies before 58, find the probability that he is a doctor.

Let E1 = event of the insured person being a doctor

E2 = event of the insured person being a teacher

E3 = event of the insured person being a businessman

A = event of death of an insured person before the age of 58

P(E1) = 4000/(4000+8000+12000) = ⅙

P(E2) = 8000/(4000+8000+12000) = ⅓

P(E3) = 12000/(4000+8000+12000) = ½

P(A|E1) = 0.01, P(A|E2) = 0.03 and P(A|E3) = 0.05

\(\begin{array}{l}P(E_{1}|A)=\frac{P(A|E_{1})P(E_{1})}{P(A|E_{1})P(E_{1})+P(A|E_{2})P(E_{2})+P(A|E_{3})P(E_{3})}\end{array} \)

\(\begin{array}{l}=\frac{0.01\times 1/6}{0.01\times 1/6+0.03\times 1/3+0.05\times 1/2}=\frac{0.01}{0.01+0.06+0.15}= \frac{1}{22}\end{array} \)

⇒ P(E1|A) = 1/22

Question 9:

A card is lost from a pack of 52 cards. From the remaining cards, two are drawn randomly and found to be both clubs. Find the probability that the lost card is also a club.

Let E1 = the lost card is a club

E2 = the lost card is not a club

A = both drawn cards are clubs

P(E1) = 13/52 = ¼ and P(E2) = 39/52 = ¾

P(A|E1) = P(drawing both club cards when the lost card is a club) = 12/51 × 11/50

P(A|E2) = P(drawing both club cards when the lost card is not a club) = 13/51 × 12/50

\(\begin{array}{l}P(E_{1}|A)=\frac{P(A|E_{1})P(E_{1})}{P(A|E_{1})P(E_{1})+P(A|E_{2})P(E_{2})}\end{array} \)

\(\begin{array}{l}=\frac{12/51 \times 11/50 \times 1/4}{12/51 \times 11/50 \times 1/4+13/51 \times 12/50 \times 3/4}=\frac{12\times 11}{12\times11+3\times 13\times 12}\end{array} \)

⇒ P(E1|A) = 11/50.
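This answer, too, can be checked by simulation. In the sketch below (ours; the trial count is arbitrary), one random card is removed from the deck, two of the remaining 51 are drawn, and we condition on both draws being clubs.

```python
# Monte Carlo check of Question 9: P(lost card is a club | two clubs drawn).
import random

random.seed(7)
deck = ["club"] * 13 + ["other"] * 39
both_clubs = 0
lost_was_club = 0

for _ in range(500_000):
    lost, first, second = random.sample(deck, 3)   # lost card, then two draws
    if first == "club" and second == "club":
        both_clubs += 1
        lost_was_club += lost == "club"

print(lost_was_club / both_clubs)                  # ≈ 11/50 = 0.22
```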

Question 10:

In shop A, 30 tins of pure ghee and 40 tins of adulterated ghee are kept for sale, while in shop B, there are 50 tins of pure ghee and 60 tins of adulterated ghee. One tin of ghee is purchased from one of the shops at random and is found to be adulterated. Find the probability that it was purchased from shop B.

Let E1 = event of choosing shop A

E2 = event of choosing shop B

A = event of purchasing an adulterated tin of ghee

P(E1) = ½ and P(E2) = ½

P(A|E1) = P(purchasing adulterated ghee from shop A) = 40/70 = 4/7

P(A|E2) = P(purchasing adulterated ghee from shop B) = 60/110 = 6/11

By Bayes' theorem,

\(P(E_{2}|A)=\frac{6/11\times 1/2}{4/7 \times 1/2+6/11 \times 1/2}=\frac{21}{43}\)

∴ The probability that the tin was purchased from shop B is 21/43.

Practice Questions on Bayes’ Theorem

1. A, B and C have chances of being selected as a manager at a private firm in the ratio 4 : 1 : 2. The chances of them introducing changes in marketing strategies are 0.3, 0.8 and 0.5, respectively. If a change has taken place, find the probability that it is due to the selection of B.

2. A man speaks the truth 4 out of 5 times. He throws a die and reports that it is a six. Find the probability that it is actually a six.

3. A sack contains 4 balls. Two balls are drawn at random (without replacement) and are found to be red. What is the probability that all balls in the bag are red?


AIP Publishing Logo

  • Previous Article
  • Next Article

AUTHOR DECLARATIONS

Conflict of interest, author contributions, data availability, solving inference problems of bayesian networks by probabilistic computing.

ORCID logo

  • Split-Screen
  • Article contents
  • Figures & tables
  • Supplementary Data
  • Peer Review
  • Open the PDF for in another window
  • Reprints and Permissions
  • Cite Icon Cite
  • Search Site

Seokmin Hong; Solving inference problems of Bayesian networks by probabilistic computing. AIP Advances 1 July 2023; 13 (7): 075226. https://doi.org/10.1063/5.0157394


Recently, the probabilistic computing approach has shown broad applicability to problems ranging from combinatorial optimization and machine learning to quantum simulation, where a randomly fluctuating bit called a p-bit constitutes the basic building block. This new type of computing scheme tackles domain-specific and computationally hard problems that can be solved more efficiently with probabilistic algorithms than with their classical deterministic counterparts. Here, we apply the probabilistic computing scheme to various inference problems of Bayesian networks with non-linear synaptic connections and without auxiliary p-bits. The results are supported by nanomagnet-based SPICE (Simulation Program with Integrated Circuit Emphasis) simulations, a behavioral model, and hardware implementations using a field-programmable gate array. Two types of Monte Carlo sampling methods are tested, namely rejection and importance sampling, where clamping of p-bits is applicable in the latter, as in Boltzmann networks. Partial parallelism that can be used for the sequential update of each p-bit in Bayesian networks is presented. Finally, the model is directly applied to temporal Bayesian networks with relevant inference problems. We believe that the proposed approaches provide valuable tools and practical methods for various inference problems in Bayesian networks.

Bayesian (belief) networks 1–3 are widely used probabilistic representations of the world that characterize simplified probabilistic dependencies among variables. They are directed acyclic graphs whose nodes are random variables carrying probability information. The relationships, or connections, are defined by conditional probability tables (CPTs) describing the conditional probability of a child node given the states of its parent nodes. Bayesian networks are particularly useful for quantifying uncertain knowledge of the world and find various applications, including robotics, image classification, neuroscience, and medical decision support. Although they are substantially more concise representations than full joint probability tables, worst-case inference problems remain computationally intractable, or NP (Non-deterministic Polynomial Time)-hard, 4 hence the call for energy-efficient, dedicated hardware implementations.

The probabilistic computing (p-computing) scheme 5–8 has recently been proposed and applied to domain-specific and computationally hard problems ranging from combinatorial optimization and machine learning to quantum simulation, where a randomly fluctuating bit called a p-bit constitutes the basic building block. More broadly, random-enabled computing with emerging devices has been applied to Bayesian (belief) networks both theoretically 9–14 and experimentally, 15–18 showing the promise of this new computing paradigm. In a typical implementation of Bayesian networks in the context of p-computing, nodes are represented by p-bits, connected in such a way that the given conditional probability dependencies between parent and child nodes are properly implemented. These synaptic connections are conventionally linear, requiring auxiliary p-bits for a node with multiple parent nodes. In addition, sequential updates of the random p-bits from parent to child are necessary, unlike in Boltzmann networks, where random-order or grouped updates are possible. The pinning or clamping of relevant p-bits during problem solving is not straightforwardly applicable either, compared to Boltzmann networks with their symmetric connections. In this paper, we explore these issues and provide alternative or new ways of applying probabilistic computing to Bayesian networks, using SPICE (Simulation Program with Integrated Circuit Emphasis) simulations, a behavioral model, and hardware implementations based on a Field-Programmable Gate Array (FPGA).

FIG. 1. (a) A typical example of a Bayesian network with four nodes: Cloudy (C), Sprinkler (S), Rain (R), and WetGrass (W). Directed connections run from parent to child with given conditional probability tables (CPTs). (b) The corresponding hardware design using p-bits, where the randomness comes from stochastic magnetic tunnel junctions. The results from Bayes' rule, the behavioral model (labeled "PPSL"), and SPICE simulations are compared in (d), with the time-evolution result from SPICE in (c).


A simple but representative example Ref. 3 of Bayesian networks is shown in Fig. 1(a) with four nodes and connecting arrows. Each node has a local probability table known as the conditional probability table (CPT) containing conditional probabilities of the on-site node for all possible combinations of its conditioning parent nodes. For example, the WetGrass ( W ) node has two parent nodes, namely Sprinkler ( S ) and Rain ( R ), with a CPT with four cases of its parent nodes ( true / false ). The relevant circuit representation is shown in Fig. 1(b) with four p-bits where each p-bit has a one-to-one correspondence with each node in Fig. 1(a) . Each output of the p-bit is connected to the output of its child node p-bit where the rectangular box represents the incorporation of its CPT electrically without additional p-bits using a multiplexer.
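To make the network concrete in software, quite apart from the paper's p-bit hardware, here is a minimal sketch of Fig. 1(a) with each CPT stored as a plain lookup table and samples drawn in parent-to-child (topological) order. The CPT numbers are the widely used textbook values for this example and are assumptions here; the paper's entries may differ:

```python
import random

# Conditional probability tables for Fig. 1(a), keyed by the parents' states.
P_C = 0.5                                      # P(Cloudy = true)
P_S = {True: 0.1, False: 0.5}                  # P(Sprinkler = true | Cloudy)
P_R = {True: 0.8, False: 0.2}                  # P(Rain = true | Cloudy)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.0}  # P(WetGrass = true | S, R)

def sample_joint():
    """One ancestral (parent-to-child, topological-order) sample."""
    c = random.random() < P_C
    s = random.random() < P_S[c]
    r = random.random() < P_R[c]
    w = random.random() < P_W[(s, r)]
    return c, s, r, w

print(sample_joint())  # e.g. (True, False, True, True)
```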

FIG. 2. Two types of sampling methods are applied: (a) rejection sampling and (b) importance sampling for the inference problem P(Rain = true | Cloudy = true, WetGrass = true). Note that in rejection sampling the system evolves freely, whereas in importance sampling the Cloudy and WetGrass nodes are clamped high (true) in time. The resulting inference probability is shown in (c) for varying I_0, and the error rate is shown in (d) as a function of the number of samples (trials) generated.

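In software terms, the two schemes of Fig. 2 correspond roughly to the sketch below: rejection sampling discards samples that disagree with the evidence, while likelihood weighting, a simple importance-sampling variant, clamps the evidence nodes and weights each sample by the probability of the values it was forced to take. This is an illustrative analogue, not the paper's hardware procedure, and it again assumes the textbook CPT values:

```python
import random

# Textbook CPTs for Fig. 1(a), as in the previous sketch (assumed values).
P_C = 0.5
P_S = {True: 0.1, False: 0.5}
P_R = {True: 0.8, False: 0.2}
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.0}

def rejection(n=100_000):
    """Sample freely; keep only samples that match the evidence."""
    hits = kept = 0
    for _ in range(n):
        c = random.random() < P_C
        s = random.random() < P_S[c]
        r = random.random() < P_R[c]
        w = random.random() < P_W[(s, r)]
        if c and w:                  # evidence: Cloudy = WetGrass = true
            kept += 1
            hits += r
    return hits / kept

def likelihood_weighting(n=100_000):
    """Clamp the evidence nodes; weight each sample by their likelihood."""
    num = den = 0.0
    for _ in range(n):
        weight = P_C                 # Cloudy clamped to true
        s = random.random() < P_S[True]
        r = random.random() < P_R[True]
        weight *= P_W[(s, r)]        # WetGrass clamped to true
        num += weight * r
        den += weight
    return num / den

print(rejection(), likelihood_weighting())  # both ≈ 0.976 for these CPTs
```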

Unlike Boltzmann networks, the typical Monte Carlo simulation of a Bayesian network requires careful parent-to-child sequential updates 9,10 of each node (sometimes known as "topological order" 2,3 ) to maintain the proper conditional probabilities in the network. In the case of Fig. 1(a), two orders are possible, namely Cloudy–Sprinkler–Rain–WetGrass and Cloudy–Rain–Sprinkler–WetGrass, both of which satisfy the parent-to-child update rule. A powerful parallel, simultaneous-update method in Gibbs sampling has been demonstrated 27 to reduce the update complexity from O(N) to O(1) in undirected networks. Although this method is not straightforwardly applicable to directed networks such as Bayesian networks, a similar idea can be applied using the property that, given its parent nodes, each node is conditionally independent of its non-descendant nodes. 3 Nodes that are conditionally independent of each other can therefore be updated together and belong to the same group. In the example of Fig. 1(a), one can update the Rain and Sprinkler nodes simultaneously given the state of the Cloudy node, so that the overall update sequence is reduced from four steps to three. As shown in Fig. 3(a), the Rain and Sprinkler nodes are grouped together for simultaneous update, and the result in Fig. 3(b) is consistent with the one from Bayes' rule for the full joint probabilities, denoted as the binary number [Cloudy Sprinkler Rain WetGrass]_2.

To show the hardware compatibility of the proposed method regarding grouped updates and the synaptic connection, FPGA results are presented. The FPGA implementation of the Bayesian network uses a Zybo Z7-20 development board following the previous work 27 and is described in Fig. 3(c), where a multiplexer (MUX) for the synaptic connection of Eq. (3), a lookup table (LUT) for tanh function evaluations, and a linear-feedback shift register (LFSR) for random number generation are shown, with dedicated phase-shifted clocks for the sequential updates of the four p-bits. A total of 10^4 samples were obtained at a clock speed of 15 MHz with 8-bit MUXs, matching the 8-bit number precision of the synaptic weights. The MUX delay is guaranteed to be less than an n-th of a clock period 27 for proper updates of the synaptic connections, where n is the number of phase-shifted clocks. For grouped updates, the clock phases of Sprinkler and Rain are matched so that they are updated simultaneously. The samples are collected through a Universal Serial Bus (USB) JTAG (Joint Test Action Group) interface using the MATLAB AXI (Advanced eXtensible Interface) Master IP. With PPSL, although the update sequence is inherently asynchronous due to the random fluctuations of the nanomagnets at each p-bit, the required parent-to-child update rule can still be satisfied by the aforementioned p-bit design of Fig. 1(b). 9
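The grouping rule described above fits in a few lines of code: because a node is conditionally independent of its non-descendants given its parents, nodes at the same longest-path depth from the roots can share an update phase. The sketch below is generic and is not the paper's FPGA logic; the function name update_groups is our own:

```python
# Group the nodes of a DAG by longest-path depth from the roots. Given
# all earlier groups, the nodes within one group are conditionally
# independent of each other, so their updates can be issued in parallel.
def update_groups(parents):
    depth = {}
    def d(node):
        if node not in depth:
            depth[node] = 1 + max((d(p) for p in parents[node]), default=-1)
        return depth[node]
    groups = {}
    for node in parents:
        groups.setdefault(d(node), []).append(node)
    return [groups[k] for k in sorted(groups)]

parents = {"Cloudy": [], "Sprinkler": ["Cloudy"],
           "Rain": ["Cloudy"], "WetGrass": ["Sprinkler", "Rain"]}
print(update_groups(parents))
# [['Cloudy'], ['Sprinkler', 'Rain'], ['WetGrass']] -> 3 phases instead of 4
```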

FIG. 3. (a) Schematic of the Bayesian network with the different groups labeled. The groups are chosen such that each variable is conditionally independent of the others in the same group given its parent groups. Partial parallelism is possible within a group: the random values of its nodes are updated simultaneously. In this example, Sprinkler and Rain are updated together. (b) The corresponding result from the FPGA with 10^4 samples shows good agreement with the exact result from Bayes' rule, where the states represent the binary number [Cloudy Sprinkler Rain WetGrass]_2 for the full joint probabilities. The simplified structure of the FPGA implementation is shown in (c).


The described probabilistic computing scheme for Bayesian networks can easily be extended to temporal Bayesian networks, 3 where a repeated network structure represents the sequence of states at discrete times. In the example of Fig. 4(a), the Rain and Umbrella nodes evolve in time with the given CPTs, and each node at a discrete time can be represented by p-bits as before. Here, two types of inference problems are treated, namely P(Rain_N = true | Umbrella_1 = true, …, Umbrella_N = true) and P(Rain_1 = true | Umbrella_1 = true, …, Umbrella_N = true), which are sometimes called filtering and smoothing, 3 respectively. The same rejection and importance sampling approaches are adopted, and the PPSL results show reasonably good agreement as the number of stages N increases. Beyond the direct evaluation of inference probabilities, one is often interested in their relative magnitudes. In the inference problem of Fig. 5(a), one asks for the most probable sequence of Rain_0, Rain_1, and Rain_2 given Umbrella_1 = true and Umbrella_2 = true. Here one compares the probabilities of all eight possibilities for the three consecutive Rain states, namely P(Rain_0, Rain_1, Rain_2 | Umbrella_1 = true, Umbrella_2 = true), denoted as the binary number [Rain_0 Rain_1 Rain_2]_2 in Fig. 5(b). The PPSL results using rejection and importance sampling agree well with those from Bayes' rule, indicating that three successive Rain = true states are most probable in this problem. Since we are comparing the relative magnitudes of the probabilities, this approach can be less prone to errors from various non-ideal components in the network than the direct calculation of the probabilities themselves.

FIG. 4. (a) Schematic of a temporal Bayesian network, where the Rain and Umbrella nodes evolve in discrete time with given CPTs. (b) The filtering probability P(Rain_N = true | Umbrella_1 = true, …, Umbrella_N = true) for the two types of sampling methods. (c) The smoothing probability P(Rain_1 = true | Umbrella_1 = true, …, Umbrella_N = true) for both types of sampling. Rain_0 has equal probabilities of being true and false.


FIG. 5. (a) Schematic of a temporal Bayesian network for the problem of finding the most likely sequence of Rain_0, Rain_1, and Rain_2 given Umbrella_1 = true and Umbrella_2 = true. (b) The corresponding probabilities for the eight possibilities from PPSL using rejection and importance sampling show good agreement with the result from Bayes' rule, where the states are denoted as binary numbers [Rain_0 Rain_1 Rain_2]_2.

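For reference, exact filtering on such a chain is a short recursion (predict through the transition CPT, then reweight by the observation CPT), giving the values that sampled estimates should approach. The transition and observation numbers below (0.7, 0.9, 0.2) are the widely used textbook values for the Rain/Umbrella example and are assumptions here, with Rain_0 uniform as stated in the caption of Fig. 4:

```python
# Exact forward filtering on the Rain/Umbrella chain of Fig. 4(a), as a
# reference for sampled estimates. CPT entries are assumed textbook values.
P_TRANS = {True: 0.7, False: 0.3}   # P(Rain_t = true | Rain_{t-1})
P_OBS   = {True: 0.9, False: 0.2}   # P(Umbrella_t = true | Rain_t)

def filtered(n_stages):
    """P(Rain_N = true | Umbrella_1..N = true) by the forward recursion."""
    belief = 0.5                    # Rain_0 equally likely true/false
    for _ in range(n_stages):
        # Predict: push the belief through the transition model.
        prior = belief * P_TRANS[True] + (1 - belief) * P_TRANS[False]
        # Update: reweight by the likelihood of the observed umbrella.
        num = prior * P_OBS[True]
        belief = num / (num + (1 - prior) * P_OBS[False])
    return belief

for n in (1, 2, 5):
    print(n, round(filtered(n), 3))  # 0.818, 0.883, ... (approaches a fixed point)
```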

In this paper, we apply probabilistic computing to various inference problems in Bayesian networks using non-linear synaptic connections. Two types of Monte Carlo sampling methods, namely rejection and importance sampling, are applied to representative inference problems, and clamping of the conditioned p-bits is shown in importance sampling, with an appropriate weight factor needed for each sample. Partially parallel updates of grouped p-bits are demonstrated with the FPGA implementation, which relaxes the careful parent-to-child sequential updates needed in Bayesian networks. Finally, inference in temporal Bayesian networks, including filtering and smoothing, is treated as a straightforward extension of the model. We believe the approaches presented here will be important for hardware applications of probabilistic computing to Bayesian inference and for tackling practical problems in large-scale networks.

The author thanks Sunil Sa, Dasol Jeong, and Hyunbin Kim for their technical help in FPGA and OukJae Lee for useful discussions on the applications of Bayesian networks. This work was supported by the KIST Institutional Program and the National Research Foundation of Korea (NRF) program (Grant No. NRF-2020M3F3A2A01081635).

The author has no conflicts to disclose.

Seokmin Hong : Conceptualization (lead); Data curation (lead); Formal analysis (lead); Funding acquisition (lead); Software (lead); Validation (lead); Visualization (lead); Writing – original draft (lead); Writing – review & editing (lead).

The data that support the findings of this study are available from the corresponding author upon reasonable request.

