AB TEST conversion test
http://math.tut.fi/~ruohonen/S_1.pdf
Test: the Difference between Two Means
Hypothesis testing steps:
1) Define null hypothesis, H0: e.g., two samples belong to the same population, or there is no trend. Usually we would like to reject it.
2) Choose the test statistic for the given data, e.g., mean, trend, and a test level α, e.g., 5%.
3) Consider or create the null distribution: assume H0 is true, and obtain the distribution of the statistic under H0.
4) Compare the test statistic to the null distribution. Obtain the p-value: the probability, under the null distribution, of a statistic at least as extreme as the one observed. If the p-value is less than the test level, p < α, then the null hypothesis is rejected.
Distribution of the test statistic
case1: Difference between Two Means (z-test)
case2: Difference between Two Means (sigma unknown, t-test)
Student t distribution with n-1 d.o.f. for a single sample (n1+n2-2 for two samples with a pooled variance), used when the standard deviation is estimated from the sample
case3: Proportion
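Case 3 is the one that applies to a conversion test. A minimal sketch of a two-sided z-test for the difference between two conversion rates, assuming the usual pooled-proportion normal approximation. The function name two_proportion_ztest and the example counts are made up for illustration and do not come from these notes.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for H0: both groups share one conversion rate."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Under H0 the best estimate of the common rate pools both groups.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 100/1000 conversions in A, 130/1000 in B.
z, p_value = two_proportion_ztest(100, 1000, 130, 1000)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

The normal approximation is only reasonable when both groups have enough conversions and non-conversions; for small counts an exact test would be a safer choice.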
Power
Critical Value
In hypothesis testing, a critical value is a point on the test distribution that is compared to the test statistic to determine whether to reject the null hypothesis
• Example of test statistic: t-value
If the absolute value of your test statistic is greater than the critical value, you can declare statistical significance and reject the null hypothesis
• Example: t-value > critical t-value
α: the threshold against which p-values are compared
• For results with a 95% level of confidence: α = 0.05
• α = probability of a type I error
• p-value: the probability, assuming H0 is true, of observing a statistic at least as extreme as the one observed
• Statistical significance: comparison between α and the p-value
• p-value < 0.05: reject H0; p-value > 0.05: fail to reject H0
Type II error (β) is the failure to reject a false H0
• Direct relationship between power and type II error: Power = 1 – β
• Example: β = 0.2 gives Power = 1 – β = 0.8 (80%)
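The critical-value rule and the p-value rule are two views of the same decision. A small sketch, assuming a two-sided test with a standard normal null distribution (a z statistic is used here instead of the t-value in the example above, because the Python standard library has no t distribution). The function name decide is hypothetical.

```python
from statistics import NormalDist


def decide(z, alpha=0.05):
    """Return (critical value, p-value, reject H0?) for a two-sided z-test."""
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha / 2)    # about 1.96 for alpha = 0.05
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return critical, p_value, abs(z) > critical


for z in (1.50, 2.10):
    critical, p_value, reject = decide(z)
    print(f"z={z:.2f} critical={critical:.3f} p={p_value:.4f} reject={reject}")
```

For any z, abs(z) > critical holds exactly when p_value < alpha, so either comparison can be reported.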
The effect size
It depends on the type of difference and the data
• Easy example: comparison between 2 means
• The bigger the effect (the absolute difference), the bigger the power
• = the bigger the probability of picking up the difference
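To see how power grows with the effect size, here is a rough sketch for the two-means case. It assumes a two-sided z-test, a known common standard deviation sigma, and equal group sizes; the function name power_two_means and the numbers are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def power_two_means(delta, sigma, n_per_group, alpha=0.05):
    """Power of a two-sided z-test for a true mean difference delta."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    se = sigma * sqrt(2 / n_per_group)      # std. error of the difference
    shift = abs(delta) / se                 # how far H1 sits from H0, in SE units
    # Probability that the statistic lands beyond either critical value.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


for delta in (0.1, 0.3, 0.5):
    print(delta, round(power_two_means(delta, sigma=1.0, n_per_group=63), 3))
```

With delta = 0 the function returns α itself, which is the type I error rate.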
Determine Sample size
the difference of two means
Proportions of two groups
Derivation...
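The derivation is not written out in these notes. As a placeholder, the sketch below uses the common normal-approximation formulas for a two-sided test with equal group sizes: n = 2σ²(z_{1-α/2} + z_{1-β})² / δ² for two means, and n = (z_{1-α/2} + z_{1-β})² (p1(1-p1) + p2(1-p2)) / (p1-p2)² for two proportions. These are textbook approximations and may differ from the form intended here; the function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def _z_sum(alpha, power):
    """z_{1-alpha/2} + z_{1-beta}, with power = 1 - beta."""
    std_normal = NormalDist()
    return std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)


def n_per_group_means(delta, sigma, alpha=0.05, power=0.8):
    """Sample size per group to detect a mean difference delta."""
    return ceil(2 * (sigma * _z_sum(alpha, power) / delta) ** 2)


def n_per_group_proportions(p1, p2, alpha=0.05, power=0.8):
    """Sample size per group to detect conversion rates p1 vs p2."""
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil(_z_sum(alpha, power) ** 2 * variance_sum / (p1 - p2) ** 2)


print(n_per_group_means(delta=0.5, sigma=1.0))
print(n_per_group_proportions(p1=0.10, p2=0.12))
```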
Review of "Winner's Curse: Bias Estimation for Total Effects of Features in Online Controlled Experiments"
Blog : https://medium.com/airbnb-engineering/selection-bias-in-online-experimentation-c3d67795cceb
What Is The Winner’s Curse?
Are You Suffering From Winner’s Curse?
Winner's curse: a phenomenon in common value auctions where the winner tends to overpay for the value of the item
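To illustrate the winner's curse, consider an example: 10 experiments were run, and the standard deviation of each experiment's result is fixed at 1%.
Each experiment was run independently with a sufficiently large sample. The test result is the observed effect (the test statistic), and the row below it is the actual true effect.
For a t-test at significance level 0.05, an observed effect greater than the critical value of 1.96 is declared to show an effect (red).
The total observed effect (bottom-up) of the 3 selected experiments is 2.7% + 2.6% + 3.3% = 8.6%. However, the true effect is 1% + 1% + 4% = 6%. The upward bias in this case is 2.6%.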
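The arithmetic of this example, written out with only the three selected experiments quoted in the notes (values in percent):

```python
# Observed and true effects (in %) of the three selected experiments.
observed_selected = [2.7, 2.6, 3.3]
true_selected = [1.0, 1.0, 4.0]

total_observed = sum(observed_selected)          # bottom-up estimate
total_true = sum(true_selected)
upward_bias = total_observed - total_true

print(round(total_observed, 1), round(total_true, 1), round(upward_bias, 1))
```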
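When there is a true effect, i.e., it is greater than 0, there are two cases: the true effect is smaller than the t-test critical value of 1.96 (Case A), or larger (Case B).
Because the observed effect can take any value from a normal distribution whose mean is the true effect, it can be larger or smaller than 1.96 even when a true effect exists.
In Case A, only observed effects greater than 1.96 are selected, so an upward bias always occurs. In Case B, by contrast, an observed effect smaller than the true effect can still exceed 1.96, so either an upward or a downward bias can occur. Indeed, test 10 above was selected with an observed effect of 3.3% although its true effect is 4%. If test 10 were the only one selected, the total effect would also be biased downward.
The authors therefore go on to argue that the winner's curse occurs on average, and the proof is as follows. However, this is an expected result: in both cases only the upper part (region a in the graph above) of the full range of possible outcomes is cut out and averaged.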
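The "on average" claim can be checked with a quick simulation. This sketch assumes observed effects are drawn from N(true effect, 1) and that an experiment is selected when its observed effect exceeds 1.96. The list of true effects is hypothetical (the notes only give the true effects of the three selected experiments), as is the function name.

```python
import random

CRITICAL = 1.96
SIGMA = 1.0
# Hypothetical true effects (in %) for 10 experiments; not the table from the paper.
TRUE_EFFECTS = [0, 0, 0, 0, 0, 0, 0, 1, 1, 4]


def average_selection_bias(n_rounds=20000, seed=0):
    """Average of (total observed - total true) over the selected experiments."""
    rng = random.Random(seed)
    total_bias = 0.0
    for _ in range(n_rounds):
        for true_effect in TRUE_EFFECTS:
            observed = rng.gauss(true_effect, SIGMA)
            if observed > CRITICAL:          # only "winners" are counted
                total_bias += observed - true_effect
    return total_bias / n_rounds


print(round(average_selection_bias(), 3))
```

A single round can still show a downward bias (for instance when only the experiment with true effect 4 is selected with a low draw), but the average over many rounds is positive.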
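The selection bias varies with the true effect; the result is shown below.
When the true effect is below 1.96, the selection bias grows as the true effect approaches 1.96; when it is above 1.96, the bias gradually decreases.
Deriving it..? Plotting the derived expression gives the same curve.
The smaller the p-value threshold, the smaller the selection bias.
This follows by substituting, in the expression derived above, the critical value corresponding to each p-value for 1.96 and plotting the result.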
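The derived expression itself is not shown in these notes. Under the assumption that the observed effect X is N(μ, σ²) and an experiment is selected when X > c, the expected bias contributed by one experiment is E[(X − μ)·1{X > c}] = σ·φ((c − μ)/σ), where φ is the standard normal density. This is a standard truncated-normal result and may be written differently in the paper. A sketch that evaluates it (function names are hypothetical):

```python
from statistics import NormalDist


def selection_bias(true_effect, critical=1.96, sigma=1.0):
    """Expected bias sigma * phi((c - mu) / sigma) of one experiment."""
    return sigma * NormalDist().pdf((critical - true_effect) / sigma)


def critical_for_alpha(alpha):
    """Two-sided critical value for a given p-value threshold."""
    return NormalDist().inv_cdf(1 - alpha / 2)


# Bias as a function of the true effect: it peaks at true effect = 1.96.
for mu in (0.0, 1.0, 1.96, 3.0, 4.0):
    print(mu, round(selection_bias(mu), 4))

# Bias for a true effect of 0 under stricter p-value thresholds.
for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(selection_bias(0.0, critical_for_alpha(alpha)), 4))
```

The second loop uses a true effect of 0; for true effects well above the threshold, a stricter threshold does not necessarily reduce this quantity.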
Okay, What Shall We Do Then?
Review of "A Comparison of Approaches to Advertising Measurement"
Terminology
Ad effectiveness metrics : how we report the effectiveness of ad campaigns
=> Is it right to say “exposure to ads increased the share of consumers buying by 0.4 percentage points, or an increase in purchase likelihood of 50%”?
=> No, because not all consumers who were assigned to the test group were exposed to ads during the study.
The incremental conversion rate (ICR) is the actual conversion rate minus the counterfactual conversion rate, 1.8%-1.0% in our example
=>
Review of "A Comparison of Approaches to Advertising Measurement"
Introduction
Facebook lift study: an A/B test with two important differences
- The control group: scaled so that the size of the test and control groups are the same
- Reached audience: members of the test group who are shown the advert at least once during the test period
- Unreached audience: members who have not seen the advert during the test period
-> The activity of the unreached audience introduces variance that is not present in a standard A/B test
- Multi-cell: the target population is split into multiple cells, each with a control and test group of its own, as illustrated in Figure 1
-> Used to compare two marketing strategies where the target audience exhibits a selection bias
Summary
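How does Facebook calculate incrementality and lift?
The conversions of both the control and the test group include the reached (R) and the unreached (U) audiences
The reached (R) and unreached (U) audiences are assumed to have the same conversion rate
The reach r, the share of the test group that saw the ad, is assumed to be the same as in the control group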
In the control group the conversion rates are the same in the unreached and reached audiences
The incrementality is the difference in conversions between the test and scaled control groups and originates solely from the reached audiences
The test statistic is the lift (L), defined as incrementality divided by the number of reached conversions in the scaled control
Facebook’s Null Hypothesis Significance Test determines if there is a non-zero lift at 90% confidence level (two-tailed)
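The definitions above can be sketched as a small calculation. This is my reading of the notes, not Facebook's implementation: the control conversions are scaled by the ratio of group sizes, and the reached conversions in the scaled control are taken as r times the scaled control conversions (which follows from the equal-conversion-rate and equal-reach assumptions). The function name, argument names and numbers are hypothetical.

```python
def facebook_lift(test_conversions, control_conversions,
                  test_size, control_size, reach):
    """Return (incrementality, lift) under the assumptions in the notes."""
    # Scale the control group to the size of the test group.
    scale = test_size / control_size
    scaled_control = control_conversions * scale
    # Incrementality: extra conversions in test vs. scaled control.
    incrementality = test_conversions - scaled_control
    # Reached conversions in the scaled control, assuming equal conversion
    # rates in the reached and unreached control audiences.
    reached_scaled_control = reach * scaled_control
    lift = incrementality / reached_scaled_control
    return incrementality, lift


# Hypothetical numbers: 10,000 test users, 5,000 control users, 50% reach.
incrementality, lift = facebook_lift(180, 50, 10_000, 5_000, reach=0.5)
print(incrementality, lift)
```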
Derivation of the lift distributions
Power and Minimum Sample size
While we have derived the necessary CMF to calculate power and sample size, we also explore the possibility to proceed by simulating the distribution for L using a large number of samples
Steps for the numerical CMF of L (under H0)
1. Estimates for E(CC) and r can be taken from previous Facebook advertising results.
2. E(CT) is obtained from the equation below.
3. Under H0, E(L) = 0, so ...
4. Treat CC and CT as Poisson random variables with means λC and λT respectively.
5. Draw samples of CC and CT.
6. Scale them to obtain samples for RS and CS.
7. The critical value c is calculated as the 95th percentile of this distribution.
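A rough sketch of the steps under H0, using only the Python standard library. Assumptions that are mine and not from the notes: E(L) = 0 is taken to mean E(CT) equals the scaled E(CC); RS is taken as r times CS; Poisson draws use Knuth's multiplication method (adequate for moderate means only); and all names and input values are hypothetical.

```python
import math
import random


def poisson_sample(rng, lam):
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def critical_lift_under_h0(expected_control, reach, scale=1.0,
                           n_samples=4000, seed=0):
    """95th percentile of simulated lift L when there is no true lift."""
    rng = random.Random(seed)
    lam_c = expected_control
    lam_t = expected_control * scale      # assumed reading of E(L) = 0
    lifts = []
    for _ in range(n_samples):
        c_c = poisson_sample(rng, lam_c)  # control conversions
        c_t = poisson_sample(rng, lam_t)  # test conversions
        c_s = c_c * scale                 # scaled control conversions
        r_s = reach * c_s                 # reached conversions, scaled control
        if r_s > 0:                       # skip draws where L is undefined
            lifts.append((c_t - c_s) / r_s)
    lifts.sort()
    return lifts[int(0.95 * len(lifts))]


print(round(critical_lift_under_h0(expected_control=100, reach=0.5), 3))
```

The 95th percentile matches the upper tail of a two-tailed test at the 90% confidence level; an observed lift above this value would be declared significant on that side.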
Steps for the numerical CMF of L (under H1)