Statistical Significance Calculator
Enter your control and variant data to find out if your A/B test results are real — not luck. Get instant p-values, confidence intervals, and relative uplift.
Haven't run your test yet? Calculate how many visitors you need before you start.
A/B Test Power Calculator →
Control
Variant
Test Results
p-value
Relative uplift
Control rate
Variant rate
Confidence interval
The true difference in conversion rate is likely within this range.
How to Run an A/B Test
Five steps to a statistically valid experiment — from hypothesis to launch decision.
Define your hypothesis
Start with a specific, testable claim. Not 'test red vs. blue' but 'changing our CTA from Submit to Get Free Audit will increase form submissions by at least 10%.' A strong hypothesis defines what you're measuring and what you expect to see. It keeps the test honest and stops you changing the goal mid-experiment.
Calculate your required sample size
Before you start, work out how many conversions you need per group. This depends on your baseline conversion rate, the minimum lift you care about detecting (5%? 20%?), and your confidence level (95% is standard). Skipping this step is the most common cause of inconclusive tests — you either stop too early or run far longer than necessary. Use our A/B Test Power Calculator to get the exact number before your test starts.
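If you'd like to sanity-check the number, the standard closed-form approximation for a two-proportion z-test is easy to compute yourself. A minimal Python sketch (the function name and the 80% power default are illustrative, not necessarily what our calculator uses):

```python
from math import ceil
from scipy.stats import norm

def sample_size_per_group(baseline, relative_lift, alpha=0.05, power=0.80):
    """Visitors needed per group for a two-sided two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)      # variant rate implied by the lift
    z_alpha = norm.ppf(1 - alpha / 2)        # 1.96 at 95% confidence
    z_beta = norm.ppf(power)                 # 0.84 at 80% power
    pooled = (p1 + p2) / 2
    numerator = (z_alpha * (2 * pooled * (1 - pooled)) ** 0.5
                 + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

# 5% baseline, aiming to detect a 20% relative lift
print(sample_size_per_group(0.05, 0.20))     # ~8,158 visitors per group under these defaults
```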
Assign traffic randomly
Split your audience 50/50 between control and variant using a proper randomisation method. Avoid manual splits like 'Monday traffic to control, Wednesday to variant' — day-of-week behaviour differences will corrupt your results. Most A/B testing tools handle this automatically.
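For the curious: rather than a literal coin flip, testing tools typically bucket each visitor with a deterministic hash of their ID, so a returning visitor always lands in the same group. A rough Python sketch of that idea (the experiment name is a made-up example):

```python
import hashlib

def assign_variant(user_id: str, experiment: str = "cta-copy-test") -> str:
    """Deterministically bucket a user 50/50 based on a hash of their ID."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100           # stable value in 0..99
    return "control" if bucket < 50 else "variant"

print(assign_variant("user-12345"))  # the same user always gets the same arm
```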
Run the test for the full duration
Set the end date before you start, then don't look at results until you hit your sample size. Every time you check and consider stopping, you inflate your false-positive rate. A 15% lift after one week can easily vanish by week four. Patience here is not optional — it's the method.
Analyse results with this calculator
Once you've hit your sample size, enter your numbers above. If p < 0.05, your result is statistically significant at 95% confidence — launch the winner. If p > 0.05, you don't have enough evidence yet. Either extend the test or call it a draw.
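Under the hood, a significance check like this is typically a two-proportion z-test on the pooled conversion rate. Here is a minimal Python sketch of that standard test; it isn't necessarily the exact implementation behind this calculator:

```python
from scipy.stats import norm

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    p_value = 2 * norm.sf(abs(z))            # two-sided p-value
    return z, p_value

z, p = two_proportion_z_test(500, 10_000, 580, 10_000)  # 5.0% vs 5.8%
print(f"z = {z:.2f}, p = {p:.4f}")           # p ≈ 0.012: significant
```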
Common A/B Testing Pitfalls
Most bad test results come from one of these six mistakes — not from the variant being bad.
Peeking at results
Checking results daily and stopping when you like what you see pushes your real false-positive rate from 5% to 25% or higher. Decide your sample size upfront. Then don't look until you hit it.
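You don't have to take the inflation on faith. Simulate many A/A tests (where there is no real difference), peek at fixed checkpoints, and count how often at least one peek crosses p < 0.05. A rough Python sketch, with illustrative checkpoint counts:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
rate, n_total, peeks, trials = 0.05, 20_000, 10, 2_000
false_positives = 0

for _ in range(trials):                      # A/A tests: no real difference
    a = rng.random(n_total) < rate           # control conversions
    b = rng.random(n_total) < rate           # "variant" is identical
    for n in np.linspace(n_total / peeks, n_total, peeks, dtype=int):
        pooled = (a[:n].sum() + b[:n].sum()) / (2 * n)
        se = (pooled * (1 - pooled) * 2 / n) ** 0.5
        z = (b[:n].mean() - a[:n].mean()) / se if se > 0 else 0.0
        if 2 * norm.sf(abs(z)) < 0.05:       # looks "significant" -> stop early
            false_positives += 1
            break

print(f"False-positive rate with peeking: {false_positives / trials:.1%}")
```

With ten peeks this typically lands around 20%, not the 5% you planned for.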
Stopping too early
A 20% lift after two weeks looks convincing. It may be noise. Early data is the most volatile. The effect tends to stabilise — or disappear — as more data comes in. Run the full duration you planned.
Sample size too small
50 conversions per group isn't enough to detect anything meaningful. Underpowered tests either miss real winners or declare noise as signal. Calculate sample size before you start, not after you're disappointed with the results.
Testing too many variants
Running 5 variants against one control pushes your real false-positive rate to ~23%. Test one change at a time. If you must test multiple variants, apply a Bonferroni correction — divide your p-value threshold by the number of comparisons.
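The correction itself is trivial to apply; what matters is remembering to apply it. A quick Python sketch:

```python
def bonferroni_threshold(alpha: float, comparisons: int) -> float:
    """Per-comparison threshold that keeps the overall false-positive rate at alpha."""
    return alpha / comparisons

# 5 variants against one control = 5 comparisons
print(bonferroni_threshold(0.05, 5))  # 0.01: each variant must hit p < 0.01
# Without correction, the family-wise error is 1 - 0.95**5 ≈ 0.226 (~23%)
```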
Ignoring seasonality
Testing across Black Friday, a major email send, or a competitor promotion contaminates your data. Run full weeks (Monday–Sunday) to average out day-of-week effects. Avoid starting tests during known traffic anomalies.
Confusing statistical with practical significance
p = 0.03 on a 0.2% lift is significant but not meaningful. Always define your minimum detectable effect based on business impact before you run the test — not after you see the results.
Frequently Asked Questions
What is a p-value?
A p-value is the probability of seeing a difference at least as large as yours if the control and variant actually perform identically. A p-value of 0.05 means that, if there were no real difference, a result this extreme would show up only 5% of the time. The conventional cutoff is p < 0.05, which means you're willing to accept a 5% false-positive rate. Lower p-values indicate stronger evidence.
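If you want to see that 5% false-positive rate in action, simulate A/A tests where nothing actually differs: roughly 5% will still come out 'significant'. A quick Python sketch with illustrative sample sizes:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
n, rate, trials, hits = 10_000, 0.05, 5_000, 0

for _ in range(trials):                  # A/A tests with no true difference
    a = rng.binomial(n, rate)            # control conversions
    b = rng.binomial(n, rate)            # identical "variant"
    pooled = (a + b) / (2 * n)
    se = (pooled * (1 - pooled) * 2 / n) ** 0.5
    z = (b / n - a / n) / se
    if 2 * norm.sf(abs(z)) < 0.05:
        hits += 1

print(f"{hits / trials:.1%} 'significant' by luck alone")  # close to 5%
```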
What's the difference between a p-value and a confidence interval?
A p-value answers a yes/no question: is this result statistically significant? A confidence interval answers a different question: how large is the effect? For example, 'the true lift is between 4% and 18%.' Confidence intervals are often more useful for business decisions because they show magnitude, not just significance.
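For reference, the interval reported for the absolute difference can be approximated with a standard Wald interval: the observed difference plus or minus 1.96 standard errors. A minimal Python sketch (not necessarily the exact method this calculator uses):

```python
from scipy.stats import norm

def diff_confidence_interval(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Wald interval for the absolute difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = (p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b) ** 0.5
    z = norm.ppf(1 - (1 - confidence) / 2)   # 1.96 for 95%
    diff = p_b - p_a
    return diff - z * se, diff + z * se

low, high = diff_confidence_interval(500, 10_000, 580, 10_000)
print(f"95% CI for the lift: {low:+.2%} to {high:+.2%}")  # +0.17% to +1.43%
```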
How long should I run an A/B test?
Until you hit your pre-calculated sample size — and for at least 1–2 full weeks to smooth out day-of-week variation. Calculate the required sample size before you start, based on your baseline conversion rate and the minimum lift you want to detect. Never set your test duration based on how quickly you see 'significant' results.
Can I stop the test early if results are clearly winning?
No. Stopping early when results look strong is the most common testing mistake — it inflates your false-positive rate from 5% to 25% or higher. Decide your sample size upfront and honour it. If you're tempted to stop early, have a teammate lock the test end date in your testing tool before you start.
What's the difference between statistical and practical significance?
Statistical significance means the result is probably not luck. Practical significance means it is large enough to act on. A 0.1% lift can be statistically significant with a huge sample but not worth the engineering effort to launch. Always define your minimum detectable effect based on business impact — not just whether you can detect something.
What sample size do I need?
It depends on your baseline conversion rate and the lift you want to detect. A rough guide: a 5% baseline with a 20% relative lift target requires ~1,900 conversions per group at 95% confidence. A 1% baseline with the same target requires ~9,500 per group. Use our A/B Test Power Calculator to get the exact number for your test.
My p-value is 0.06 — is that close enough?
No. 0.06 is not statistically significant at the 95% threshold. You have two options: collect more data and retest if your original sample size was too small, or accept the test as inconclusive. Do not lower your threshold to 0.10 to force significance — doing so doubles your false-positive rate.
How do I know if my result is a fluke?
That is exactly what this calculator tells you. If p < 0.05, a result this large would be rare under pure chance, so you can treat it as real at the 95% confidence level. If p > 0.05, you cannot rule out luck: run the test longer, increase your sample size, or call it a draw and move on to the next hypothesis.
Planning your next test?
Calculate the sample size you need before you start — so you know exactly when to stop.
Work with Jarrah
Ready to scale your winners?
We run paid media and CRO programs built on rigorous testing — not hunches.