Null Hypothesis and Void?

DataScience Deep Dive
4 min read · Dec 24, 2020

The null hypothesis is an idea that human beings' innate curiosity has never much appreciated, compared with the thrill of finding reasons behind patterns. Relaxing the null hypothesis makes for great ideas and greater curiosity.

Which information and which perspectives we take into account, when we try to decide whether a pattern we’ve matched is real or void (a false alarm), affects our ability to tell whether something is truly there, or whether we’ve simply let our imagination, bias, and curiosity run free.

This is where science, and deeper still, mathematics, steps in and presents us with p-values and alpha values. These two values keep in check how far and how freely we can run.

Scientists use the null hypothesis to be very specific about their findings. This is because a successful experiment doesn’t actually prove a relationship between a dependent and an independent variable. Instead, it shows only that there is enough evidence that we can no longer convincingly believe there is no relationship between the dependent and the independent variable.

No matter what you’re experimenting on, good experiments come down to one question: Is your p-value less than your alpha value? Let’s dive into what each of these values represents, and why they’re so important to experimental design.

p-value: The probability of observing a test statistic at least as extreme as the one observed, by random chance, assuming that the null hypothesis is true.

If you calculate a p-value and it comes out to 0.02, you can interpret this as saying “There is a 2% chance of obtaining the results I’m seeing when the null hypothesis is true.”
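As a minimal sketch of where such a p-value comes from in practice, here is a one-sample t-test on made-up data using SciPy (the sample values and the hypothesized mean of 0 are my own illustration, not from the article):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.normal(loc=0.3, scale=1.0, size=50)  # hypothetical measurements

# H0: the population mean is 0. ttest_1samp returns the test statistic and
# the probability of seeing one at least as extreme if H0 were true.
t_stat, p_value = stats.ttest_1samp(sample, popmean=0.0)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

A p-value of, say, 0.02 from this call would carry exactly the interpretation above: a 2% chance of results this extreme under the null.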

α (alpha value): The marginal threshold at which you’re okay with rejecting the null hypothesis.

An alpha value can be any value between 0 and 1. However, the most common alpha value in science is 0.05 (although this is currently a somewhat controversial topic in the scientific community).

If you set an alpha value of α = 0.05, you’re essentially saying “I’m okay with accepting my alternative hypothesis as true if there is less than a 5% chance that the results that I’m seeing are actually due to randomness.”

When you conduct an experiment, your goal is to calculate a p-value and compare it to the alpha value. If p &lt; α, then you reject the null hypothesis and accept that there is not “no relationship” between the dependent and independent variables. Note that any good scientist will admit that this doesn’t prove that there is a direct relationship between the dependent and independent variables — just that they now have enough evidence to the contrary to show that they can no longer believe that there is no relationship between them.

In simple terms:

p &lt; α: Reject the Null Hypothesis and accept the Alternative Hypothesis

p ≥ α: Fail to reject the Null Hypothesis.
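The decision rule above can be sketched as a tiny helper function (the function name and wording of the returned strings are my own):

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the textbook conclusion for a given p-value and alpha."""
    if p_value < alpha:
        return "Reject the null hypothesis"
    # Note: p exactly equal to alpha falls here -- we fail to reject.
    return "Fail to reject the null hypothesis"

print(decide(0.02))  # p < alpha
print(decide(0.20))  # p >= alpha
```

Note that the boundary case p = α counts as a failure to reject, matching the p ≥ α rule above.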

Type I and Type II Errors

There are times when researchers reject the null hypothesis when they should not have. The opposite can happen as well: you might fail to reject the null hypothesis when it should have been rejected. These are known as type I and type II errors, respectively.

Data scientists choose a significance level, alpha (α), that they will use as the threshold for rejecting the null hypothesis. This significance level is also the probability that you reject the null hypothesis when it is actually true. This scenario is a type I error, more commonly known as a False Positive.

Another quantity is beta (β), which is the probability that you fail to reject the null hypothesis when it is actually false. Type II errors are also referred to as False Negatives. Beta is related to something called Power, which is the probability of rejecting the null hypothesis given that it actually is false. Mathematically, Power = 1 − β. When designing an experiment, scientists will frequently choose a power level they want for an experiment and from that obtain their type II error rate.
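Power can be made concrete with a rough Monte Carlo sketch: simulate many experiments in which the null really is false, and count how often a t-test rejects it at level α. The sample size, true effect, and trial count below are arbitrary choices of mine for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n, true_effect, trials = 0.05, 30, 0.6, 2000

rejections = 0
for _ in range(trials):
    # The null (mean = 0) is false by construction: the true mean is 0.6.
    sample = rng.normal(loc=true_effect, scale=1.0, size=n)
    _, p = stats.ttest_1samp(sample, popmean=0.0)
    if p < alpha:
        rejections += 1

power = rejections / trials  # estimated Power
beta = 1 - power             # estimated type II error rate, Power = 1 - beta
print(f"estimated power = {power:.2f}, estimated beta = {beta:.2f}")
```

Raising the sample size or the true effect size drives the estimated power toward 1, which is exactly the trade-off scientists weigh when planning an experiment.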

When scientists design experiments, they need to weigh the risks of type I and type II errors and make decisions about the alpha level and power. For example, when looking at:

  • H0: patient is healthy
  • H1: patient is not healthy

A type II error, in this case, would mean concluding that the patient is healthy even though he or she isn’t!

Making errors around the null hypothesis is inevitable for us as humans, but when critical decisions rest on a model, we must err on the side of science and mathematics.

Here is a project I did on hypothesis testing, which queries a customer order database to test hypotheses about the effect of discounts on the total price paid for an order: https://github.com/Sue-Mir/Module2_Project_Hypothesis_Testing/blob/master/Student_Hypothesis_I.ipynb For example:

  • H0 : Discount amount DOES NOT have a statistically significant effect on the quantity of a product in an order.
  • H1 : Discount amount DOES have a statistically significant effect on the quantity of a product in an order.

The null hypothesis in the example project above was rejected because the p-value was &lt; 0.5. An alpha threshold of 0.5 was judged acceptable in this case, given that the analysis was not ‘life and health dependent’. But then bias steps in again and whispers that perhaps a company depends strongly on the quantity of a product in an order. This is the constant balance we seek between the alternative hypothesis and the null and void.
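A test like the project's discount hypothesis can be sketched as a two-sample comparison. The real project queries a database; here the two groups of order quantities are entirely fabricated by me for illustration, and I use Welch's t-test since the groups need not share a variance:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
qty_no_discount = rng.poisson(lam=20, size=200)  # hypothetical order quantities
qty_discounted = rng.poisson(lam=23, size=200)   # hypothetical, slightly higher

# H0: discount amount does not affect the mean quantity per order.
# equal_var=False selects Welch's t-test (no equal-variance assumption).
t_stat, p_value = stats.ttest_ind(qty_discounted, qty_no_discount,
                                  equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

Comparing this p-value against the chosen alpha then yields the reject / fail-to-reject conclusion described earlier.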
