Strategy · January 3, 2026

How to Use LLMs to Predict Amazon Purchase Intent Before Launch

Lucrivo
Amazon seller intelligence and strategy

A peer-reviewed method using LLMs as synthetic consumers can predict whether Amazon shoppers will buy your product — before you order inventory, run a single ad, or request a single review — and the workflow is accessible to any seller with ChatGPT or Claude today.

If you're launching a new product on Amazon in 2026, you're making a capital commitment based on lagging indicators. You're looking at keyword search volume (which tells you awareness, not intent), competitor BSR (which tells you what's selling, not whether yours will), and 1-star review mining (which tells you what's wrong with existing products, not whether your differentiation resonates).

By the time you have real market feedback, you've already ordered inventory.

But there's a new approach that inverts this: use LLMs to simulate 100+ consumer purchase decisions before you commit a dollar. And according to peer-reviewed research published in October 2025, these synthetic consumers reproduce human purchase intent ratings at roughly 90% of human test-retest reliability.

This isn't theory. This is a validated method that serious sellers and agencies are using right now to kill losing concepts early and tighten positioning on viable ones.

The Research Behind the 90% Claim (And Why It Matters)

The core study was published October 27, 2025 by Benjamin F. Maier at PyMC Labs and Kli Pappas at Colgate-Palmolive: "LLMs Reproduce Human Purchase Intent via Semantic Similarity Elicitation of Likert Ratings."

The methodology is called SSR: Semantic Similarity-based Rating. Here's what they actually tested:

  • 57 real consumer product surveys totaling 9,300 human responses
  • GPT-4o and Gemini-2.0-flash as the synthetic consumers
  • Product categories: primarily personal care, household goods, and everyday consumer products

The key finding: when an LLM is given a demographic persona, shown a product image, asked to respond naturally in text (not rate on a scale), and that response is then mapped to a 1–5 purchase intent score via semantic similarity to anchor statements, the model achieves 90% of human test-retest reliability with distributional similarity exceeding 0.85.

Crucially, it outperformed supervised ML approaches trained on in-sample data — LightGBM only achieved 65% correlation — meaning the LLMs aren't just pattern-matching on survey data: they appear to have internalized how consumers reason about products.

Why Traditional Amazon Launch Validation Fails

Let's be honest about what most sellers do for pre-launch validation:

  1. Keyword volume as a demand proxy: You check Helium 10 or Jungle Scout for search volume. But search volume tells you awareness, not purchase intent. "Wireless earbuds" has 500K monthly searches — that doesn't tell you if your specific product will convert.

  2. Competitor BSR guessing: You look at the top 10 listings and estimate their sales. This tells you what's currently selling, not whether your differentiation will capture share.

  3. 1-star review mining: You read competitor complaints and design around them. This identifies problems with existing options, but doesn't validate that your solution resonates.

All three are lagging indicators. By the time you have this data, the market has already spoken — and you've already committed capital.

Traditional focus groups and consumer panels cost $5,000–$15,000 and take 4–6 weeks. Most Amazon sellers skip this step entirely because it's prohibitively expensive for a single SKU test.

The SSR method changes the economics: you can run 100+ synthetic consumer interviews in an afternoon for the cost of $20 in API credits.

The SSR Method Explained Simply

Harvard Business Review describes the core workflow as "persona conditioning + stimulus + free-text impression + semantic scoring."

Here's what that means in practice:

1. Persona Conditioning

You define the demographic and psychographic profile of a potential buyer:

  • Age, gender, income, location
  • Shopping habits, pain points, values
  • Current solutions they use

2. Stimulus (Product Image + Description)

You show the LLM your product images and a brief description (not your full listing copy — just the core value prop).

3. Free-Text Impression

You ask the LLM to respond naturally: "What's your first impression? Would you consider buying this?"

The key is letting the model respond in natural language, not forcing it to pick a number on a scale. This unstructured response captures nuance that Likert scales miss.

4. Semantic Scoring

You then map the LLM's response to a 1–5 purchase intent score using semantic similarity to pre-defined anchor statements:

  • 5: "I would definitely buy this"
  • 4: "I'm very interested and would likely purchase"
  • 3: "It seems okay, I might consider it"
  • 2: "I'm not very interested"
  • 1: "I would not buy this"

The semantic similarity model compares the LLM's natural language response to these anchors and assigns a score.
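
If you want to reproduce this scoring step programmatically instead of asking the model to self-score (see Step 4 below), here is a minimal Python sketch using the open-source sentence-transformers library. The embedding model and the nearest-anchor mapping are simplifying assumptions on our part; the paper derives a full probability distribution over the anchors rather than picking the single closest one.

# Minimal sketch of the semantic scoring step, assuming the open-source
# sentence-transformers library (not the paper's exact implementation).
from sentence_transformers import SentenceTransformer, util

ANCHORS = {
    5: "I would definitely buy this",
    4: "I'm very interested and would likely purchase",
    3: "It seems okay, I might consider it",
    2: "I'm not very interested",
    1: "I would not buy this",
}

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model choice

def score_response(response_text: str) -> int:
    """Return the 1-5 anchor whose meaning is closest to the free-text impression."""
    response_emb = model.encode(response_text, convert_to_tensor=True)
    anchor_embs = model.encode(list(ANCHORS.values()), convert_to_tensor=True)
    similarities = util.cos_sim(response_emb, anchor_embs)[0]
    return list(ANCHORS.keys())[int(similarities.argmax())]

print(score_response("Honestly, a once-a-week patch sounds great. I'd probably try it."))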

Step-by-Step Workflow for Amazon Sellers

You don't need Python or ML expertise. Here's how to run this with ChatGPT or Claude:

Step 1: Build 5–8 Buyer Personas

Define your target customer segments. For each persona, write:

Example Persona:

Name: Sarah, 34
Demographics: Female, household income $85K, Seattle
Psychographics: Health-conscious, busy professional, shops Amazon 2x/week
Current solution: Uses generic vitamin D supplements from Costco
Pain points: Forgets to take pills, unsure if dosage is right, wants vegan options
Values: Transparency, convenience, quality over price

Create personas that represent different buyer motivations in your category.
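
If you plan to automate the interviews later (see the API sketch in Step 2), it helps to keep personas as structured records you can loop over. A minimal Python sketch; the field names are our own, not part of any standard:

from dataclasses import dataclass

@dataclass
class Persona:
    name: str
    age: int
    demographics: str
    psychographics: str
    current_solution: str
    pain_points: str
    values: str

personas = [
    Persona(
        name="Sarah",
        age=34,
        demographics="Female, household income $85K, Seattle",
        psychographics="Health-conscious, busy professional, shops Amazon 2x/week",
        current_solution="Generic vitamin D supplements from Costco",
        pain_points="Forgets to take pills, unsure if dosage is right, wants vegan options",
        values="Transparency, convenience, quality over price",
    ),
    # ...add 4-7 more covering different buyer motivations in your category
]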

Step 2: Feed the LLM Your Product Context

For each persona, start a new conversation with this prompt structure:

You are Sarah, a 34-year-old female professional living in Seattle with a 
household income of $85K. You're health-conscious but busy, and you shop on 
Amazon twice per week. You currently use generic vitamin D supplements from 
Costco but forget to take them regularly. You're unsure if your dosage is 
correct, and you prefer vegan options. You value transparency, convenience, 
and quality over price.

I'm going to show you a product. Please respond naturally about your first 
impression and whether you would consider buying it.

[Attach product images]

Product: VitaPatch Daily Vitamin D Patches — Vegan, 5000 IU per patch, 
apply once per week, subscription available. $24.99 for 8 patches (2-month supply).

What's your first impression? Would you consider buying this?
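
The same interview can be run through the OpenAI API instead of the ChatGPT interface (the Anthropic API works analogously with its own message format). A minimal sketch; the model name, image URL, and exact wording are placeholders, not a prescription:

# One synthetic interview via the OpenAI Python SDK.
# Assumes the OPENAI_API_KEY environment variable is set.
from openai import OpenAI

client = OpenAI()

persona_prompt = (
    "You are Sarah, a 34-year-old female professional in Seattle with a household "
    "income of $85K. You're health-conscious but busy, shop on Amazon twice per week, "
    "currently use generic vitamin D supplements from Costco, and often forget to take "
    "them. You value transparency, convenience, and quality over price. Respond "
    "naturally, in your own voice."
)

product_blurb = (
    "Product: VitaPatch Daily Vitamin D Patches - Vegan, 5000 IU per patch, apply once "
    "per week, subscription available. $24.99 for 8 patches (2-month supply). "
    "What's your first impression? Would you consider buying this?"
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; use whichever frontier model you prefer
    messages=[
        {"role": "system", "content": persona_prompt},
        {"role": "user", "content": [
            {"type": "text", "text": product_blurb},
            {"type": "image_url", "image_url": {"url": "https://example.com/vitapatch-main.jpg"}},
        ]},
    ],
)

print(response.choices[0].message.content)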

Step 3: Probe for Objections

After the initial response, ask follow-up questions:

What specific concerns or hesitations do you have?
What would make you more likely to buy?
How does this compare to what you currently use?
What additional information would you need before purchasing?

Step 4: Map the Semantic Score

Take the LLM's natural language response and compare it to the anchor statements. Most modern LLMs can self-score if you ask directly:

Based on your impression above, how would you rate your purchase intent on a 
scale of 1-5, where:
5 = I would definitely buy this
4 = I'm very interested and would likely purchase
3 = It seems okay, I might consider it
2 = I'm not very interested
1 = I would not buy this

Step 5: Aggregate Across Personas

Run this workflow for all 5–8 personas. Calculate:

  • Mean purchase intent score (target: 3.5+ to proceed)
  • Distribution: How many 4s and 5s vs. 1s and 2s
  • Objection patterns: What concerns appear across multiple personas
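
Once each interview has a score (self-scored or via the embedding sketch above) and a set of tagged objections, aggregation takes a few lines of Python. The records below are illustrative placeholders, not real panel output:

from collections import Counter
from statistics import mean

# (persona name, intent score, objection tags) -- placeholder data
results = [
    ("Sarah", 4, ["price"]),
    ("Miguel", 3, ["efficacy", "price"]),
    ("Dana", 5, []),
    ("Priya", 2, ["efficacy", "trust"]),
    ("Tom", 4, ["trust"]),
]

scores = [score for _, score, _ in results]
print(f"Mean purchase intent: {mean(scores):.2f}")   # target: 3.5+ to proceed
print(f"High intent (4-5): {sum(s >= 4 for s in scores)} of {len(scores)}")
print(f"Low intent (1-2):  {sum(s <= 2 for s in scores)} of {len(scores)}")

objections = Counter(tag for _, _, tags in results for tag in tags)
print("Objection patterns:", objections.most_common())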

What the Output Actually Tells You

The real value isn't just the score — it's the objection map.

When you analyze the free-text responses across all personas, you'll see patterns:

Example Objection Map for VitaPatch:

  • Price concern (4 of 8 personas): "$24.99 for 8 patches feels expensive compared to $12 for 100 pills"
  • Efficacy doubt (3 of 8 personas): "Do patches actually work as well as oral supplements?"
  • Convenience benefit (6 of 8 personas): "Once-per-week application is way more convenient than daily pills"
  • Trust signal (5 of 8 personas): "Need to see third-party testing or doctor endorsement"

This objection map feeds directly into your listing optimization:

  1. Bullet point hierarchy: Lead with convenience (6/8 mentioned it), then address efficacy doubt with clinical study results
  2. A+ Content story arc: Show the science behind transdermal delivery before price justification
  3. Image stack: Include a comparison chart showing cost-per-dose vs. traditional pills
  4. Rufus-optimized copy: Use natural language like "works as well as oral supplements" (the exact phrase customers use when questioning efficacy)

This bridges directly to how Rufus AI evaluates your listing — the semantic framing that emerges from your synthetic panel is the same language Rufus will surface in AI-powered search results.

Where This Method Breaks Down (And When You Still Need Human Validation)

The researchers are explicit about limitations. The study validated the method on personal care products with established consumer behavioral patterns.

Where SSR is less reliable:

  1. Novel product categories: If there's no comparable reference frame (e.g., a truly first-to-market invention), LLMs lack the training data to simulate realistic purchase reasoning

  2. Niche B2B products: Complex purchase decisions involving procurement committees, compliance requirements, or technical specifications that require domain expertise

  3. Emotionally complex purchases: High-ticket items with strong identity signals (luxury goods, certain fashion categories) where the "why" of purchase is harder to articulate

  4. Cultural specificity: Products where purchase intent varies dramatically by geography/culture and you're targeting a market the LLM wasn't heavily trained on

As Resultsense notes in their validation framework, synthetic consumers should complement, not replace, human research for final go/no-go decisions.

Practical workflow:

  1. Use LLM pre-testing to kill clearly losing concepts early (< 2.5 average intent score)
  2. Use LLM pre-testing to tighten positioning on viable concepts (3.0–3.5 score range)
  3. Use real human validation (PickFu, focus groups, Vine outreach) only for concepts that score 3.5+ — the ones worth spending money to validate further
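
Expressed as a simple triage function (a sketch; the cutoffs mirror the thresholds above and aren't hard rules):

def triage(mean_intent: float) -> str:
    """Go/no-go filter over the synthetic panel's mean purchase intent score."""
    if mean_intent < 2.5:
        return "kill: clear loser, stop spending on it"
    if mean_intent < 3.5:
        return "iterate: reposition, then re-run the synthetic panel"
    return "validate: worth paying for human research (PickFu, focus groups, Vine)"

print(triage(3.8))  # -> validate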

This dramatically reduces your validation costs. Instead of spending $5K to test 3 concepts, you spend $20 to test 20 concepts synthetically, then spend $5K to validate the best 2–3 with real humans.

Tools Comparison: ChatGPT vs. Claude vs. Dedicated Platforms

You can run the SSR method manually with any frontier LLM. Here's how they compare:

ChatGPT o3 (via OpenAI API)

  • Best for: Complex reasoning, nuanced objection probing
  • Strengths: Strong persona consistency, handles follow-up questions well
  • Weaknesses: Can be verbose, sometimes over-indexes on being helpful vs. realistic
  • Cost: ~$0.10–0.20 per persona interview (10–20 interactions)

Claude Sonnet 4.5 (via Anthropic API)

  • Best for: Natural conversational flow, authentic-sounding consumer language
  • Strengths: More concise responses, better at skepticism/objections
  • Weaknesses: Can be less consistent across long multi-turn conversations
  • Cost: ~$0.08–0.15 per persona interview

PyMC Labs Synthetic Consumer Platform

  • Best for: Teams wanting a productized workflow with automatic scoring
  • Strengths: Pre-built persona libraries, automated semantic scoring, built-in reporting
  • Weaknesses: More expensive, requires commitment to their platform
  • Cost: Enterprise pricing (starts around $2K/month for small teams)

Altair Media Synthetic Audiences

  • Best for: Agencies managing multiple brand clients
  • Strengths: Claimed 94% accuracy in separate validation, includes video stimulus support
  • Weaknesses: Closed platform, requires onboarding
  • Cost: Per-project pricing (typically $500–1,500 per study)

Most Amazon sellers should start with ChatGPT or Claude: you can replicate the full workflow for under $50. If you're an agency or brand with 10+ launches per year, the productized platforms save time but aren't necessary.

How This Connects to Your Listing Copy

The objection map from your synthetic panel isn't just for go/no-go decisions — it's a direct input into your listing optimization.

Here's the workflow:

  1. Run SSR method → Get objection map with exact customer language
  2. Map objections to listing sections:
    • Most common objection → Bullet point #1
    • Trust concerns → A+ Content module #1
    • Price justification → Image stack comparison chart
  3. Use customer language verbatim in bullets and A+ Content (this is what Rufus AI rewards)
  4. Pre-seed Q&A section with the questions synthetic consumers asked
  5. Write image alt text using the benefit phrases that resonated highest

Locomotive Agency's case study shows a 34% lift in conversion rate when landing page copy was rewritten using objection language from synthetic focus groups. The same principle applies to Amazon listings.

Example:

Before SSR (feature-led): "Made with premium Japanese steel, 7-inch blade, ergonomic handle"

After SSR (objection-led, using customer language): "Stays sharp through daily use — premium Japanese steel holds an edge 3x longer than standard knives, so you're not constantly re-sharpening"

The second version directly addresses the "how long does it stay sharp?" objection that appeared in 6 of 8 synthetic personas.

The Bigger Picture: Where This Is Heading

PwC projects that synthetic data will account for 50%+ of market research inputs by 2027. Amazon's own research tools — Product Opportunity Explorer, Brand Analytics, Search Query Performance — are all backward-looking. They tell you what is selling, not what will sell.

LLM-based purchase intent prediction inverts this. For the first time, small sellers have access to the same type of pre-launch validation that enterprise CPG brands spend six figures on.

But here's the critical nuance: this is not a replacement for all human research. It's a filter. Use it to eliminate bad ideas cheaply and quickly, so you can focus expensive human validation on the concepts most likely to win.

The sellers who adopt this workflow in 2026 will launch fewer products — but the products they do launch will have dramatically higher win rates.

Bottom Line: Start Testing Ideas Before You Order Inventory

The traditional Amazon launch playbook is: guess based on keyword volume → order inventory → hope it sells → pivot if it doesn't.

The LLM-enabled playbook is: test 20 concepts with synthetic consumers → eliminate the 15 losers → validate the 5 winners with real humans → order inventory only for the 2–3 you're confident in.

The workflow is accessible today:

  1. Build 5–8 buyer personas
  2. Feed them your product images and description
  3. Ask for natural language impressions
  4. Map semantic scores
  5. Aggregate objection patterns
  6. Feed objections directly into listing copy

It costs under $50 and takes an afternoon. The alternative is ordering $10K in inventory and hoping you guessed right.

The research is peer-reviewed. The accuracy is validated. The tools are available. The only question is whether you're using them yet.


Read Next: Amazon Rufus AI Is Changing How Listings Get Found — The objection map from LLM pre-testing feeds directly into the semantic, benefit-led listing copy that Rufus rewards.


The Lucrivo Newsletter — Coming Soon! Please check out our content on our website for now — explore the blog, tools, and automations roadmap.

Affiliate Disclosure: Some links on this page may be affiliate links. If you purchase through them, we may earn a commission at no extra cost to you. We only recommend products and services we genuinely believe will add value to Amazon sellers.