The Complete Guide to Split Testing with AI
Learn how to run experiments that actually move the needle. Avoid common pitfalls, understand your results, and let AI do the heavy lifting.
What Makes abee.pro Different
Traditional A/B testing tools make you do all the work: come up with ideas, write copy, analyze results, decide what to test next. abee.pro flips the script.
AI-Generated Hypotheses
Our AI analyzes your goals and generates dozens of test hypotheses based on proven conversion principles. No more staring at a blank page.
Continuous Optimization
Winners are automatically promoted. Losers are retired. New challengers are generated. Your experiments run 24/7 without babysitting.
Statistical Rigor
Bayesian statistics give you early indicators while frequentist tests confirm significance. Know when you have a real winner, not a fluke.
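To make the distinction concrete, here is a minimal sketch (in TypeScript, not abee.pro's internal code) of both checks on the same data: a Bayesian estimate of the probability that a variation beats control, using a normal approximation to the Beta posteriors, and a frequentist two-proportion z-test that supplies the confirming p-value.

```typescript
// A minimal sketch of the two kinds of checks, not abee.pro's internal code.

// Standard normal CDF via the Abramowitz-Stegun erf approximation.
function normCdf(z: number): number {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t +
      0.254829592) * t;
  const erf = 1 - poly * Math.exp(-x * x);
  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

interface Arm { views: number; conversions: number; }

// Bayesian early indicator: P(variation beats control), using a normal
// approximation to the Beta(1 + conversions, 1 + non-conversions) posteriors.
function probToBeatControl(control: Arm, variation: Arm): number {
  const posterior = ({ views, conversions }: Arm) => {
    const a = 1 + conversions;
    const b = 1 + views - conversions;
    return { mean: a / (a + b), variance: (a * b) / ((a + b) ** 2 * (a + b + 1)) };
  };
  const c = posterior(control);
  const v = posterior(variation);
  return normCdf((v.mean - c.mean) / Math.sqrt(c.variance + v.variance));
}

// Frequentist confirmation: two-sided p-value from a two-proportion z-test.
function twoProportionPValue(control: Arm, variation: Arm): number {
  const p1 = control.conversions / control.views;
  const p2 = variation.conversions / variation.views;
  const pooled =
    (control.conversions + variation.conversions) / (control.views + variation.views);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / control.views + 1 / variation.views));
  return 2 * (1 - normCdf(Math.abs((p2 - p1) / se)));
}

// Example: 4.0% vs 5.0% conversion rate after 2,000 visitors per arm.
const control = { views: 2000, conversions: 80 };
const challenger = { views: 2000, conversions: 100 };
console.log(probToBeatControl(control, challenger).toFixed(3));  // ~0.94: promising
console.log(twoProportionPValue(control, challenger).toFixed(3)); // ~0.13: not yet significant
```

In this example the Bayesian indicator already favors the challenger while the p-value has not cleared 0.05; that gap is exactly the difference between the "Early Indicators" readout and the "Significant Winner" badge.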
Learns From History
Every test teaches the AI something new. Patterns that work for your audience get reinforced. Dead ends get avoided.
Under the Hood: How the AI Works
abee.pro uses a sophisticated multi-agent system to generate, evaluate, and refine test ideas. Here's a peek behind the curtain.
The Strategist
Analyzes your test history and decides the optimal balance between exploration (trying new themes) and exploitation (doubling down on what works). Early on, it explores widely. As patterns emerge, it focuses on winners.
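The Strategist's exact algorithm isn't published; the sketch below only illustrates the exploration/exploitation idea with a simple decaying epsilon-greedy rule over themes. The theme records and decay schedule are assumptions made for the example.

```typescript
// Illustrative only: a decaying epsilon-greedy rule over hypothesis themes.
interface ThemeRecord { theme: string; wins: number; trials: number; }

function pickNextTheme(history: ThemeRecord[]): string {
  const totalTrials = history.reduce((sum, t) => sum + t.trials, 0);
  // Explore widely early on; exploit more as the test history grows.
  const exploreRate = Math.max(0.1, 1 / Math.sqrt(1 + totalTrials));
  if (Math.random() < exploreRate) {
    // Exploration: try the least-tested theme.
    return history.reduce((a, b) => (a.trials <= b.trials ? a : b)).theme;
  }
  // Exploitation: double down on the best win rate so far.
  const winRate = (t: ThemeRecord) => t.wins / Math.max(1, t.trials);
  return history.reduce((a, b) => (winRate(a) >= winRate(b) ? a : b)).theme;
}

const history: ThemeRecord[] = [
  { theme: "urgency", wins: 3, trials: 5 },
  { theme: "social proof", wins: 1, trials: 4 },
  { theme: "clarity", wins: 0, trials: 1 },
];
console.log(pickNextTheme(history)); // usually "urgency", occasionally a less-tested theme
```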
The Hypothesis Generator
Creates test ideas based on your prompt, conversion psychology principles, and what's worked before. Each hypothesis targets a specific theme: urgency, social proof, clarity, benefit-focus, and more.
The Critic
Scores every hypothesis on relevance, novelty, testability, and predicted impact. Only the strongest ideas make it through. Weak hypotheses get sent back for refinement or discarded entirely.
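As a rough mental model only (the actual criteria and weights aren't published), you can think of the Critic as a weighted score with a gate:

```typescript
// Hypothetical illustration of the gating idea; criteria, weights, and
// thresholds here are assumptions, not abee.pro's published scoring.
interface HypothesisScores {
  relevance: number;   // 0-1: does it fit this page and audience?
  novelty: number;     // 0-1: is it meaningfully different from past tests?
  testability: number; // 0-1: can a single experiment confirm or refute it?
  impact: number;      // 0-1: predicted effect on the goal metric
}

type Verdict = "test" | "refine" | "discard";

function critique(s: HypothesisScores): Verdict {
  const score =
    0.3 * s.relevance + 0.2 * s.novelty + 0.2 * s.testability + 0.3 * s.impact;
  if (score >= 0.7) return "test";   // strong ideas go through
  if (score >= 0.4) return "refine"; // weak ideas are sent back
  return "discard";
}
```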
The Devil's Advocate
Challenges every hypothesis with hard questions: “Will this actually change behavior?” “Is this different enough to matter?” Ideas that can't withstand scrutiny don't get tested.
The Editor
Polishes the final copy for clarity, impact, and brand voice. Makes sure your variations are professional and ready for real users.
The Retrospective Analyst
After each test, extracts learnings and updates the knowledge base. What worked? What didn't? Why? These insights inform future tests.
The result? A tireless optimization team that generates better ideas than most humans, tests them rigorously, and gets smarter with every experiment.
Best Practices for Split Testing
Even with AI doing the heavy lifting, following these principles will dramatically improve your results.
One Test Per Page
If you're testing your homepage headline AND your pricing page CTA, you won't know which change drove results. Run one experiment per page at a time.
Unique Goals Per Experiment
If your homepage test and signup page test both use "completed_signup" as the goal, attribution becomes impossible: which test caused the signup? Create a separate goal for each experiment so every conversion is attributed unambiguously.
Give Tests Time to Run
Statistical significance requires sufficient data. Don't call a winner after 50 visitors. The “Early Indicators” feature shows Bayesian probabilities, but wait for the “Significant Winner” badge before making permanent changes.
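For a sense of what "sufficient data" means in practice, here is a standard sample-size estimate for comparing two conversion rates at 95% confidence and 80% power. It is a planning heuristic, not the exact rule behind the badge.

```typescript
// Rough sample size per variation for a two-sided test (95% confidence, 80% power).
function sampleSizePerVariation(baselineCR: number, relativeLift: number): number {
  const p1 = baselineCR;
  const p2 = baselineCR * (1 + relativeLift);
  const zAlpha = 1.96; // 95% confidence, two-sided
  const zBeta = 0.84;  // 80% power
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  return Math.ceil(((zAlpha + zBeta) ** 2 * variance) / (p2 - p1) ** 2);
}

// Detecting a 20% relative lift on a 3% baseline needs ~14,000 visitors per variation.
console.log(sampleSizePerVariation(0.03, 0.2)); // 13895
```

That is why 50 visitors tells you essentially nothing.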
Describe Your Context, Not Your Strategy
The prompt builder is conversational—it asks questions to understand your situation. Your job is to describe what you're testing and who it's for. The AI decides how to optimize it. Don't try to direct the testing strategy; let the AI explore angles you might not have considered.
Trust the Process
Not every test will be a winner. That's not failure—that's learning. A test that shows no difference tells you that element isn't the bottleneck. The AI uses this information to focus on more promising areas.
Creating Your First Experiment
Define Your Goal
What action do you want users to take? Click a button? Sign up? Add to cart? Create a goal in abee.pro and add the tracking code to fire when that action happens.
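abee.pro generates the actual tracking snippet for you in the dashboard; the browser sketch below only illustrates the shape of "fire the goal when the action happens". The endpoint URL and payload are assumptions, not the real API.

```typescript
// Hypothetical goal-tracking sketch; use the snippet abee.pro generates for your goal.
function getVisitorId(): string {
  // Your own visitor identifier; here a random id persisted in localStorage.
  const existing = localStorage.getItem("visitor_id");
  if (existing) return existing;
  const id = crypto.randomUUID();
  localStorage.setItem("visitor_id", id);
  return id;
}

async function trackGoal(goalKey: string): Promise<void> {
  await fetch("https://api.abee.pro/v1/goals/track", { // illustrative URL, not the real endpoint
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ goal: goalKey, visitor: getVisitorId() }),
  });
}

// Fire the goal when the tracked action happens, e.g. a completed signup form.
document.querySelector("#signup-form")?.addEventListener("submit", () => {
  void trackGoal("completed_signup");
});
```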
Create an Experiment
Give it a descriptive name and key (e.g., “homepage_hero”). Select your primary goal. Enable Auto-Optimize to let the AI generate and manage variations.
Write Your Prompt
Tell the AI what you're testing and why. Include context about your product, audience, and what you want to achieve. The AI will ask clarifying questions if needed.
Integrate the Code
Use the provided API endpoints to fetch the assigned variation and render it on your page. The integration modal shows copy-paste code for both client-side and server-side implementations.
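As a hypothetical client-side example (copy the real snippet from the integration modal; the endpoint and response fields below are assumptions), fetching and rendering the assigned variation might look like this:

```typescript
// Hypothetical integration sketch; the endpoint and response shape are assumptions.
interface VariationResponse {
  variation: string; // assumed field: which variation was assigned
  content: string;   // assumed field: the copy to render
}

async function renderAssignedHeadline(experimentKey: string): Promise<void> {
  const visitorId = localStorage.getItem("visitor_id") ?? "anonymous";
  const res = await fetch(
    `https://api.abee.pro/v1/experiments/${experimentKey}/assignment?visitor=${visitorId}` // illustrative URL
  );
  const data: VariationResponse = await res.json();
  const headline = document.querySelector("#hero-headline");
  if (headline) headline.textContent = data.content; // render the assigned copy
}

void renderAssignedHeadline("homepage_hero");
```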
Start the Experiment
Click the Play button to activate your experiment. Traffic will be split between your control and the AI-generated challenger. Watch the results roll in.
Understanding Your Results
The experiment page gives you everything you need to understand how your test is performing. Here's what each section tells you.
Quick Stats
At the top, you'll see lifetime totals: Total Views (how many visitors entered the experiment), Conversions (how many completed your goal), and Conversion Rate (conversions / views).
Winner Callout
When a variation achieves statistical significance, a prominent callout appears showing the winner, its lift over control, and the p-value. You can keep the experiment running to find even bigger wins, or end it and update your site whenever you're satisfied.
Variation Cards
Each variation gets a card showing its performance:
- Views: How many visitors saw this variation
- Conversions: How many completed the goal
- CR (Conversion Rate): The percentage that converted
- Uplift: How much better or worse than control, relative to control's rate (e.g., +15.2%); see the worked example after this list
- Early Indicators: Bayesian probability of beating control, shown before significance is reached
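Here's a small worked example of how those numbers relate (uplift is relative to the control's conversion rate, not a difference in percentage points):

```typescript
// Worked example: how the card metrics are derived from raw counts.
const controlCard = { views: 1000, conversions: 40 }; // CR = 4.0%
const variantCard = { views: 1000, conversions: 46 }; // CR = 4.6%

const controlCR = controlCard.conversions / controlCard.views;
const variantCR = variantCard.conversions / variantCard.views;
const uplift = (variantCR - controlCR) / controlCR; // relative to control

console.log(`CR: ${(variantCR * 100).toFixed(1)}%, uplift: +${(uplift * 100).toFixed(1)}%`);
// "CR: 4.6%, uplift: +15.0%"
```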
Goal Performance Chart
Track conversion trends over time. See how each variation performs day-by-day or hour-by-hour. Iteration markers show when new challengers were introduced.
Activity Log
For AI-optimized experiments, the Activity Log shows what the AI is thinking:
- Hypothesis: The reasoning behind the current test
- Theme: The psychological lever being tested (urgency, social proof, etc.)
- Summary: What was learned from completed iterations
- Learnings: Specific insights extracted for future tests
Hypotheses Explorer
Dive deep into the AI's hypothesis registry. See which ideas have been tested, which are queued, and how each performed. Track win/loss records by theme to understand what resonates with your audience.
Common Mistakes to Avoid
🚫 Peeking and Stopping Early
Checking results hourly and stopping when you see a “winner” leads to false positives. Statistical significance exists for a reason. Let tests reach the required sample size.
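To see why, the simulation below runs A/A tests (two identical variations, so any "winner" is a false positive) and stops the first time a naive z-test crosses the 95% threshold. It is an illustration of the statistics, not abee.pro code: checked once at the end, a test is "significant" about 5% of the time; checked every 100 visitors, the rate climbs far higher.

```typescript
// Simulate an A/A test with repeated peeking: returns true if any peek
// declares a "winner" even though both arms are identical.
function peekedAATest(baselineCR: number, visitorsPerArm: number, peekEvery: number): boolean {
  let convA = 0;
  let convB = 0;
  for (let n = 1; n <= visitorsPerArm; n++) {
    if (Math.random() < baselineCR) convA++;
    if (Math.random() < baselineCR) convB++;
    if (n % peekEvery === 0) {
      const pooled = (convA + convB) / (2 * n);
      const se = Math.sqrt(pooled * (1 - pooled) * (2 / n));
      const z = se > 0 ? Math.abs(convB / n - convA / n) / se : 0;
      if (z > 1.96) return true; // stopped early on a phantom winner
    }
  }
  return false;
}

let falsePositives = 0;
const simulations = 2000;
for (let i = 0; i < simulations; i++) {
  if (peekedAATest(0.05, 5000, 100)) falsePositives++;
}
// Far above the ~5% you'd get from a single look at the end.
console.log(`${((falsePositives / simulations) * 100).toFixed(1)}% false positives`);
```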
🚫 Testing Too Many Things at Once
If you change the headline, button color, and image simultaneously, you'll never know which change mattered. Test one element at a time, or use proper multivariate testing (coming soon).
🚫 Ignoring Segment Differences
A variation might lose overall but win for mobile users. Or vice versa. Consider whether your audience segments might respond differently.
🚫 Testing Low-Traffic Pages
If a page gets 10 visitors per week, it'll take months to reach significance. Focus experiments on high-traffic pages where you can learn quickly.
🚫 Not Acting on Winners
Finding a winner is only valuable if you implement it. Don't let successful tests sit in limbo. Update your production copy and start the next experiment.
Start Optimizing Today
Join thousands of teams using AI to optimize their conversion rates. Your first experiment is free.