What Is an A/B Test? How Are Statistical A/B Tests Done? Part 1

Mehmet Akturk
Feb 8, 2021


This article covers statistical A/B testing: how to choose the right methods, which points need attention in A/B tests, and a worked example. The topic is split into two parts. This first part is conceptual and result-oriented; the second part covers the tests running in the background and the theoretical principles behind them.

https://gotvantage.com/easy-ab-testing/ab-testing/

Note: I will use the following abbreviations below:

  • Data Science — DS
  • Artificial Intelligence — AI
  • Machine Learning — ML
  • Big Data — BD

Let’s start…

Summarized in one sentence, an A/B test investigates whether an innovation or improvement made to a website, in a front-end or back-end project, creates a “meaningful difference” according to defined criteria. For example: is there a significant difference in conversion rates between a red and a green buy button for a product? Another example is investigating whether a new ML project on the live system yields significantly higher revenue than the old system.

The “significant difference” part is the crucial one here. In real-world usage, significance is defined by statistical tests.

An incomplete or incorrect understanding of statistical tests leads to errors and unreliable results in the study.

As an example, a quick search on how Google and Booking use these tests is worthwhile; it shows how enthusiastic the team at Google is about them. There is also a course on A/B testing on Udacity:
https://www.udacity.com/course/ab-testing–ud979

Photo by Giordano Rossoni on Unsplash

In addition, Netflix’s famous recommendation system was tested with these methods. The point here is testing the model’s performance on the live system, not testing the recommendation algorithm itself. For this study, see the following article: http://dl.acm.org/citation.cfm?id=2843948

A/B tests are used most often in the following situations:

1. Optimizing “Customer Journey” steps in e-commerce, that is, increasing conversion rates.

2. An innovation on the front-end side: changes such as a color, menu, button, or ad space change.

3. Measuring the revenue-driven performance of a live ML (BD, DS) project.

Other items could be added to these three, but for a website, tests generally revolve around these situations.

So we want to measure the effect of an innovation, and we know that statistical tests should be used for this.

What path should be followed and what should be considered?

1. Correct experimental design and correct data collection are required.
The area where the test will be applied (touchpoint), the target audience, the active conditions (for example, which variants are live), and the sample size must be determined correctly. A separate test must be designed for each scenario: a change tested across different sites of the same brand, different changes on the same page of the same site, or different products tried on the same page of the same site.

https://moz.com/blog/how-do-you-know-if-your-data-is-accurate

For example, to decide on a button color change, test and control groups must be created. Questions then arise: how should these two groups be selected, how should the data be collected, and how large should the sample be? These decisions need a theoretical basis, because the tests applied carry theoretical assumptions. In our example, we must decide whether visitors are grouped by cookie_id, by site visit time, or by some other approach.
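One common way to group by cookie_id is to hash the id so that the same visitor always lands in the same group. The following is only a minimal sketch under that assumption; the function name `assign_group`, the experiment label, and the 50/50 split are hypothetical choices, not something prescribed by this article.

```python
import hashlib


def assign_group(cookie_id: str, experiment: str = "buy_button_color") -> str:
    """Deterministically map a visitor to 'test' or 'control' (roughly 50/50).

    Both the experiment label and the 50/50 split are illustrative assumptions.
    """
    # Salting with the experiment name keeps assignments independent
    # between different experiments that share the same visitors.
    key = f"{experiment}:{cookie_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return "test" if bucket < 50 else "control"


# Hypothetical visitor ids, for illustration only.
for cid in ["visitor-001", "visitor-002", "visitor-003"]:
    print(cid, assign_group(cid))
```

Because the assignment depends only on the id and the experiment label, a returning visitor keeps seeing the same variant, which helps keep the two groups independent of each other.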

Approaches such as online sample selection for e-commerce also appear to have started emerging in the literature.

For our example, let’s assume the following: the test is applied to the buy button, two different colors are active, the target audience is evening visitors, the sample is formed by choosing two different groups according to evening visiting hours, and the sample size is sufficient.
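What counts as a “sufficient” sample size can be estimated before the test starts. The sketch below uses the standard normal-approximation formula for comparing two proportions; the baseline rate of 10%, the target rate of 12%, the 5% significance level, and the 80% power are all assumed values chosen for illustration, and `sample_size_per_group` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p1: float, p2: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed per group for a two-sided test of p1 vs p2."""
    if p1 == p2:
        raise ValueError("p1 and p2 must differ to define an effect size")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


# Assumed rates: 10% for the old button color, 12% hoped for with the new one.
print(sample_size_per_group(0.10, 0.12))
```

The smaller the difference to be detected, the larger the required sample, so it is worth fixing the minimum difference of interest before collecting data.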

Let’s move on to step two.

2. The correct measurement tool (test method) must be selected.

https://www.freepik.com/premium-vector/human-eye-measured-with-vernier-caliper-creative-concept-measuring-tool-precise-dimension-measurement-scaling-high-accuracy-precision_10429255.htm

At this stage the question is: of the people who saw the button, how many bought (or how many clicked)? This is a conversion rate problem, and its counterpart in statistics is the “p ratio test”, that is, a test comparing two proportions.

In such a case, it would be wrong to decide just by comparing cumulative totals or raw conversion rates, because that ignores sample size and random variation.

If the problem were about the revenue of a live ML project, the test method would not be the “p ratio test” but a dependent (paired) or independent t-test.
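For the revenue case, an independent t-test could be sketched as follows. This is an illustration under several assumptions: it uses Welch's variant (which does not assume equal variances), the revenue figures are made up, and the p-value is a normal approximation that is only reasonable for large samples. With SciPy installed, `scipy.stats.ttest_ind(a, b, equal_var=False)` gives the exact t-based p-value.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t_test(a, b):
    """Return (t statistic, Welch degrees of freedom, approximate p-value)."""
    n_a, n_b = len(a), len(b)
    # Squared standard errors of each group mean (sample variance / n).
    se2_a = variance(a) / n_a
    se2_b = variance(b) / n_b
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    # Normal approximation to the two-sided p-value (large samples only).
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p_approx


# Made-up daily revenue per user for the old and the new system.
old_revenue = [10, 12, 9, 11, 13, 10, 12, 11]
new_revenue = [12, 14, 13, 15, 12, 14, 13, 15]
t_stat, dof, p_val = welch_t_test(old_revenue, new_revenue)
print(f"t = {t_stat:.3f}, df = {dof:.1f}, approximate p = {p_val:.4f}")
```

If the same users are measured before and after the change, the dependent (paired) t-test on the per-user differences is the appropriate choice instead.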

3. Correct statistical hypotheses need to be established and tested.

https://ersoykubraa.medium.com/a-b-testing-ef7e8dbb9859

The picture above is a great help when performing an A/B test.

The hypotheses to be established for this problem are:

H0: P1 - P2 = 0

H1: P1 - P2 ≠ 0

In other words;

H0: There is no statistically significant difference between the conversion rate value of the old system and the conversion rate value of the new system.

H1: There is a statistically significant difference between the conversion rate value of the old system and the conversion rate value of the new system.
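These hypotheses can be tested with a two-proportion z-test. The sketch below is a minimal version using the pooled standard error under H0; the visitor and purchase counts are invented for illustration, and `two_proportion_z_test` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided z-test of H0: P1 - P2 = 0. Returns (z, p_value)."""
    p_a = conv_a / n_a  # sample conversion rate of group A
    p_b = conv_b / n_b  # sample conversion rate of group B
    # Under H0 both groups share one rate, so the two samples are pooled.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Invented counts: the same 10% vs 12% rates at two different sample sizes.
z_small, p_small = two_proportion_z_test(50, 500, 60, 500)
z_large, p_large = two_proportion_z_test(500, 5000, 600, 5000)
print(f"500 per group:  z = {z_small:.2f}, p = {p_small:.4f}")
print(f"5000 per group: z = {z_large:.2f}, p = {p_large:.4f}")
```

The observed rates are identical in both runs, yet H0 is rejected at the 5% level only with the larger sample. This is why raw conversion rate differences alone are not enough to make a decision.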

Important note!

The letter p is lowercase in the first part because those two p values are measured on the sample.

The letter P is capitalized in the hypotheses because we are trying to generalize from the sample (n) to the population (N); the hypotheses are statements about the population.

With the correct test method and the correct hypotheses established, we can move on to the next step.

Other cases and possible hypothesis tests will be discussed in the second article.

Let’s leave it here for now. See you in my next and last article on this topic.

http://www.plusxp.com/2011/02/back-to-the-future-the-game-episode-1-review/

References
1. https://gotvantage.com/easy-ab-testing/ab-testing/
2. https://www.veribilimiokulu.com/blog/istatistiksel-a-b-testleri-nasil-yapilir/
3. https://medium.com/r?url=https%3A%2F%2Funsplash.com%3Futm_source%3Dmedium%26utm_medium%3Dreferral
4. https://moz.com/blog/how-do-you-know-if-your-data-is-accurate
5. https://www.freepik.com/premium-vector/human-eye-measured-with-vernier-caliper-creative-concept-measuring-tool-precise-dimension-measurement-scaling-high-accuracy-precision_10429255.htm
6. https://ersoykubraa.medium.com/a-b-testing-ef7e8dbb9859
7. http://www.plusxp.com/2011/02/back-to-the-future-the-game-episode-1-review/

Written by Mehmet Akturk

Experienced Ph.D. with a demonstrated history of working in the higher education industry. Skilled in Data Science, AI, NLP, Deep Learning, Big Data, & Mathematics.
