Generate A/B Test Ideas for Your Funnel
You'll end up with: A prioritized backlog of funnel A/B test ideas—each with a clear hypothesis, primary metric, stage, effort tags, and a one-line implementation note—ready to run or hand to a designer or developer.
Common mistake: brainstorming clever copy variants without tying each test to one stage metric, or without enough volume to learn anything, so either nothing ships or you run everything at once. Fix: in Step 1, lock one north-star metric for this session; in Step 4, kill or downgrade tests that need traffic you don't have; in Step 5, rank down to five; in Step 6, fully spec only the top three.
- Your funnel stages from first touch to conversion (bullets are OK)
- One primary conversion you care about this quarter (e.g. booked calls, checkout, trial signups)
- Approx traffic or list size at the weakest stage (order of magnitude is fine)
- One place you already see numbers (analytics export, screenshot, or honest gut like "~70% bounce on pricing")
- Claude and Google Sheets open in two tabs
Lock the funnel, stages, and one north-star metric
Mirror your funnel and pick one primary metric before any test ideas—stops random headline brainstorms.
1. Open https://claude.ai and start a new chat. Keep this single thread open through Step 5.
2. Paste and fill every bracket:

   I am prepping a batch of funnel A/B tests. Do NOT propose test ideas yet.

   Product / offer (one sentence): [...]
   Who we sell to (one sentence): [...]
   Funnel stages in order from first touch to conversion (bullets are fine): [...]
   North-star metric for THIS batch (pick exactly one, e.g. booked calls, trial signups, checkout completion, qualified leads): [...]
   Planning horizon (e.g. next 30 days): [...]

   Reply with ONLY:
   (a) A markdown table: Stage | Job_of_stage | Typical_drop_off_guess
   (b) Up to 5 missing numbers or facts you still need before ideation
   (c) One sentence restating the single north-star metric for this batch

   Rules: No hypotheses. No copy or layout ideas. No tool recommendations.

3. Read Claude's reply. If it lists tests anyway, send: "Stop. No test ideas in this thread yet. Regenerate (a)-(c) only." If a stage is missing, answer briefly, then send: "Regenerate the table only."
Inventory friction (symptoms, not solutions)
List observable frictions per stage—evidence vs assumption—so later hypotheses map to real symptoms.
1. In the same Claude chat as Step 1, paste:

   Using ONLY the funnel table and north-star from above, for each stage list 3–6 friction bullets.
   Each bullet must be an observable symptom (confusion, anxiety, mismatch, speed, trust, proof gap, pricing clarity), not a solution or test yet.
   Tag every bullet either Evidence or Assumption.
   Evidence must cite one of: analytics number, support ticket theme, sales-call pattern (last 3 calls), refund/churn reason, or email reply pattern.
   If you cannot find at least one Evidence bullet for a stage, write NEEDS_DATA and name the smallest metric to collect.
   Do not propose tests.

2. Add your own Evidence tags where you have them. Do not leave every line as Assumption.
Convert frictions into a burst of test hypotheses
Generate 15–25 atomic hypotheses in If / Then / Because form, typed as copy, layout, offer, proof, speed, or pricing presentation.
1. In the same chat, paste the funnel table from Step 1 and the friction lists from Step 2.
2. Ask Claude:

   Generate at least 15 and at most 25 A/B test hypotheses.

   Rules:
   - One atomic change per row (no bundles like "rewrite page and change offer").
   - Use this exact sentence shape: If we change [element] for [segment], we expect [metric] to [up/down] because [one-line mechanism].
   - Type must be exactly one of: Copy | Layout | Offer | Proof | Speed | Pricing presentation
   - Primary metric must match a stage metric (CTR, bounce, form start, form complete, reply rate, booking rate, checkout, AOV, etc.), not a vanity metric unless it is tied to the stage job.

   Output a markdown table with columns: Stage | Hypothesis | Type | Primary_metric | Mechanism_one_line

3. Scan for duplicate mechanisms. If two rows differ only in adjectives, merge or delete the duplicates and ask Claude to output the cleaned table only.
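If you would rather not eyeball 15–25 rows for rule violations, a small script can do a first pass. The sketch below is only an illustration: it assumes you have copied each table row into a Python dict keyed by the column names, and `check_row`, the regular expression, and the "and" heuristic for bundled changes are hypothetical helpers, not part of the workflow itself. The sample rows are made up.

```python
import re

# Allowed values for the Type column, copied from the prompt rules.
ALLOWED_TYPES = {"Copy", "Layout", "Offer", "Proof", "Speed", "Pricing presentation"}

# Loose match for the required sentence shape:
# If we change [element] for [segment], we expect [metric] to [up/down] because [mechanism].
HYPOTHESIS_SHAPE = re.compile(
    r"^If we change .+ for .+, we expect .+ to .+ because .+$",
    re.IGNORECASE,
)


def check_row(row: dict) -> list[str]:
    """Return a list of rule violations for one hypothesis row (empty if none)."""
    problems = []
    hypothesis = row["Hypothesis"].strip()
    if not HYPOTHESIS_SHAPE.match(hypothesis):
        problems.append("does not follow the If / Then / Because shape")
    if row["Type"] not in ALLOWED_TYPES:
        problems.append(f"Type {row['Type']!r} is not one of the six allowed values")
    # Crude bundle detector (assumption): "and" in the change clause often
    # means two changes were packed into one row.
    change_clause = hypothesis.split(", we expect")[0]
    if " and " in change_clause:
        problems.append("change clause contains 'and'; may bundle two changes")
    return problems


# Made-up example rows for illustration only.
good = {
    "Hypothesis": "If we change the pricing-page headline for first-time visitors, "
                  "we expect bounce to go down because the plan differences are clearer.",
    "Type": "Copy",
}
bundled = {"Hypothesis": "Rewrite page and change offer", "Type": "Design"}

print(check_row(good))
print(check_row(bundled))
```

The checks are deliberately loose. Treat a flagged row as a prompt to reread it, not as proof that it is wrong.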
Apply traffic, ethics, and sequencing guardrails
Label each hypothesis Feasible, Stretch, or Do not run yet using your real volumes and brand no-gos.
1. In the same chat, paste approximate weekly volumes (use whatever you have: site visitors, landing page uniques, outbound emails sent, replies, trials, checkouts):

   Weekly visitors or emails by stage:
   - Stage 1: [...]
   - Stage 2: [...]
   (add rows until every stage is covered)

   Ethical / brand no-gos I will not run (list, e.g. hidden fees, fake scarcity, dark patterns, misleading claims): [...]

2. Ask Claude to take the full hypothesis table from Step 3 and add a column:

   Feasibility = Feasible | Stretch | Do not run yet

   Rules:
   - One sentence per row explaining the label.
   - Assume conservative baseline conversion rates when unsure.
   - Mark Do not run yet if learning would likely take more than ~4 weeks at stated volume OR the test conflicts with a no-go.
   - If two tests would compete for the same audience at the same time, note "sequence after [other test]" in that sentence.

3. Require at least three rows labeled Do not run yet unless volume is huge. If Claude marks everything Feasible, reply: "Assume conservative conversion rates; downgrade anything needing more than four weeks at this volume. Re-output the full table with Feasibility column."
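To sanity-check Claude's labels yourself, the four-week rule can be turned into rough arithmetic. The sketch below is a back-of-the-envelope estimate, not a power calculation, and it rests on several assumptions: "learning" is taken to mean reaching about 100 conversions in total on the stage metric, the 2-week boundary between Feasible and Stretch is invented for illustration (the guide does not define one), and the function names and example volumes are hypothetical.

```python
import math


def weeks_to_learn(weekly_volume: float, baseline_rate: float,
                   target_conversions: int = 100) -> float:
    """Estimate weeks needed to collect target_conversions at the baseline rate."""
    weekly_conversions = weekly_volume * baseline_rate
    if weekly_conversions <= 0:
        return math.inf  # no volume or zero conversion rate: never finishes
    return target_conversions / weekly_conversions


def label_feasibility(weekly_volume: float, baseline_rate: float,
                      conflicts_with_no_go: bool = False,
                      max_weeks: float = 4.0,
                      stretch_weeks: float = 2.0) -> str:
    """Apply the guardrails: no-go conflicts and >4-week tests are not run yet."""
    if conflicts_with_no_go:
        return "Do not run yet"
    weeks = weeks_to_learn(weekly_volume, baseline_rate)
    if weeks > max_weeks:
        return "Do not run yet"
    if weeks > stretch_weeks:  # assumed cutoff, adjust to taste
        return "Stretch"
    return "Feasible"


# Made-up volumes with a conservative 2% baseline conversion rate.
for visitors in (10_000, 2_000, 500):
    print(visitors, label_feasibility(visitors, 0.02))
```

If your estimate and Claude's label disagree for a row, that is a good row to question in the chat before moving on.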
Prioritize to the top 5 (ICE)
Score Impact, Confidence, and Ease 1–5 each; sort and keep five ranked tests with a one-line why now.
1. In the same chat, paste:

   Using only rows labeled Feasible or Stretch from the latest table, score each row:
   - Impact (1-5): expected effect on the north-star for this batch
   - Confidence (1-5): evidence strength behind the mechanism
   - Ease (1-5): implementation speed for you (S/M/L mapped to numbers is fine)

   ICE_total = Impact + Confidence + Ease (max 15).

2. Sort descending by ICE_total. Output exactly the top 5 rows with columns:

   Rank (1-5) | Stage | Hypothesis | Type | Primary_metric | ICE_total | Why_now (one non-generic sentence)

3. If ties clog the ranking, reply: "Break ties by proximity to money for the north-star; re-output top 5 only."
4. Sanity-check: disagree out loud with at least one rank. If you cannot, ask Claude which assumption would most change the ranking if wrong.
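The scoring rule above is simple enough to reproduce outside the chat if you want to check the ranking or re-score rows by hand. This is a minimal sketch that uses the additive ICE_total defined in this step; the `ScoredTest` class, `top_five` function, and sample rows are hypothetical. Ties keep their input order here, so the "proximity to money" tie-break still has to be applied by you.

```python
from dataclasses import dataclass


@dataclass
class ScoredTest:
    hypothesis: str
    feasibility: str  # "Feasible", "Stretch", or "Do not run yet"
    impact: int       # 1-5
    confidence: int   # 1-5
    ease: int         # 1-5

    def __post_init__(self):
        # Reject scores outside the 1-5 scale used in this step.
        for name in ("impact", "confidence", "ease"):
            score = getattr(self, name)
            if not 1 <= score <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {score}")

    @property
    def ice_total(self) -> int:
        # Additive ICE as defined in this guide (max 15).
        return self.impact + self.confidence + self.ease


def top_five(rows: list[ScoredTest]) -> list[ScoredTest]:
    """Keep Feasible/Stretch rows, sort by ICE_total descending, return the top 5."""
    eligible = [r for r in rows if r.feasibility in ("Feasible", "Stretch")]
    # sorted() is stable, so tied rows stay in their original order.
    return sorted(eligible, key=lambda r: r.ice_total, reverse=True)[:5]


# Made-up rows for illustration only.
rows = [
    ScoredTest("Shorter signup form", "Feasible", 4, 3, 5),
    ScoredTest("New pricing tiers", "Do not run yet", 5, 5, 5),
    ScoredTest("Testimonials above the fold", "Stretch", 3, 4, 4),
]
for rank, row in enumerate(top_five(rows), start=1):
    print(rank, row.hypothesis, row.ice_total)
```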
Build the Sheet backlog and fully spec the top 3
Create a reusable Google Sheet: five rows from the shortlist; ranks 1–3 get control, variant, run duration heuristic, and instrumentation.
1. Open https://sheets.google.com and create a new spreadsheet named: Funnel AB backlog - [YOUR BUSINESS] - [DATE]
2. Row 1 headers (exact text):

   Rank | Stage | Hypothesis | Type | Primary metric | ICE total | Effort S/M/L | Run notes | Falsify / watch-outs | Status

3. Paste the top 5 table from Step 5 into rows 2–6 under those columns (fill Effort S/M/L and Status yourself; Status can be Backlog).
4. For ranks 1–3 only, add four new columns after Status (insert columns or place them to the right):

   Control | Variant | Minimum run rule | Instrumentation

   - Control and Variant: describe in plain English what stays and what changes.
   - Minimum run rule: use this heuristic unless you have a power calculator: "Two full business weeks OR 100 conversions on this step metric, whichever is later." Adjust the number only if Claude's volume notes justify it, and write the final rule in the cell.
   - Instrumentation: name the exact report, event, or sheet column you will read (e.g. GA4 landing page conversion, ESP click map, checkout step funnel).

5. Optional: freeze row 1 and turn Status into a data validation list: Backlog | Planned | Running | Done | Killed.
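The "whichever is later" heuristic for the Minimum run rule column can be worked out per test before you write the final rule in the cell. The sketch below assumes "two full business weeks" means 14 calendar days and that you can estimate average daily conversions on the step metric; `minimum_run_days` and the sample rates are hypothetical.

```python
import math


def minimum_run_days(conversions_per_day: float,
                     min_conversions: int = 100,
                     min_days: int = 14) -> int:
    """Days to run: the later of min_days and reaching min_conversions."""
    if conversions_per_day <= 0:
        raise ValueError("conversions_per_day must be positive")
    days_for_conversions = math.ceil(min_conversions / conversions_per_day)
    return max(min_days, days_for_conversions)


# Made-up daily conversion rates on the step metric.
print(minimum_run_days(20))  # volume is high, so the two-week floor applies
print(minimum_run_days(4))   # volume is low, so the conversion count decides
```

If the result runs well past four weeks, that is a signal the test probably belonged under Do not run yet in Step 4.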
All done!
You now have: A prioritized backlog of funnel A/B test ideas—each with a clear hypothesis, primary metric, stage, effort tags, and a one-line implementation note—ready to run or hand to a designer or developer.