Synthetic Data and the Pursuit of Authentic Intelligence

Rob Kaiser, PhD

Chief Methodologist

“Just because you can, doesn’t mean you should.”

AI’s promises are many, and we are excited about the future of insights in an AI-impacted world. It is tempting to use AI inappropriately, or before it has been proven, but doing so may lead down a primrose path. As marketers consider the use of synthetic audiences, which offer whole datasets that purport to replicate human responses through AI training, it’s important to understand what’s truly effective and what’s merely a mirage.

A Framework for AI Insights

We designed a four-quadrant map to help our clients and other insights professionals understand how AI fits as both a data collector and a data source. Along one axis, we move from synthetic to real human (authentic) data. Along the other, we move from surface metrics to deep insights. Together, the two axes give us four quadrants:

  • Basic Understanding reflects synthetic audiences that provide quick, surface-level input.
  • Synthetic Smart uses AI personas to synthesize or summarize.
  • Measurement captures traditional research, now enhanced by AI in analysis and reporting.
  • Enhanced Intelligence represents the goal: Real data amplified by AI to uncover deeper meaning.

If what you need is speed, synthetic audiences might be useful. After all, they can deliver results at the push of a button, without the time and money spent on recruiting and incentives. They can also provide a sandbox for curiosity: a place to experiment before you field a study and to spot what might be broken in your survey before it goes live. But their limitations are vast.

Real insights come from the unexpected—the surprising data point that shifts a message, redefines a category, or uncovers a hidden motivation. Data from synthetic audiences can replay the patterns of the past with extraordinary precision, but it cannot tell you something truly new. Those moments do not come from large language models. They come from people.

A recent Columbia University study explored how synthetic “twins” (AI data designed to match real individuals) compared with their human counterparts. At the individual level, the correlation was weak at best. The researchers found that the data looked real until they asked the twins to respond, as an actual person would, to novel questions. That’s the problem: Large language models can reproduce patterns, but not the lived experiences that create those patterns. They cannot feel, they lack context, and they avoid contradiction. Those are the very things that make humans, well, human, and they are why real human responses are so valuable.

We also know that the use of synthetic data is more or less inevitable, at least in some capacity. It’s designed to address the most annoying but essential step of insights research: engaging with real people. If you are going to use it, we recommend starting with pre-testing research studies. This lets you learn how synthetic data behaves and improve your human-focused research at the same time. If you go beyond that, make sure it’s balanced, differentiating, and continuously refreshed with real and relevant human input, not just stitched together from the same recycled training models.

Making Synthetic Data Smart

A better solution can be found in what we call Synthetic Smart. Instead of creating whole synthetic audiences, we can create AI interfaces you can query. Some researchers even create AI personas that interpret or synthesize real data. When grounded in authentic inputs, these tools can help us interact with results, summarize themes, and explore complexities. The closer they stay to human truth, the better they perform, so we build them with real consumer data that’s relevant to business needs. One caution: Personas (like all synthetic data) are highly sophisticated stereotypes that do not capture the complexity and diversity of human thoughts, feelings, and experiences, and treating them as stand-ins for real people can oversimplify what you learn. Query tools and other alternatives reduce that risk of oversimplification while still allowing intelligent, dynamic interaction with the findings.

We’ve found that synthetic data needs a lot of inputs to provide relevant insights that move past the base LLMs. You’ll want to feed these tools all of your company’s research and plan to give them fresh data and information frequently, even continuously for some applications. This isn’t a shortcut around authentic research. It’s an enhancement, making authentic research even more useful. If you feed these tools only LLM-generated inputs, they start producing stereotypes.

Authenticity Is Timeless

Asking real people real questions is how we capture what is happening now and anticipate what’s to come. AI can and should make this process smarter by automatically coding themes, summarizing feedback, and surfacing insights that once took weeks to analyze. It is not about replacing researchers; it is about freeing them to think.

That brings us to the real opportunity: Enhanced Intelligence. This is where AI amplifies human insight rather than fabricating it. It starts with real voices and strong methods, then uses technology to dig deeper and find meaning at scale without losing authenticity.

At PSB, we have applied this idea through approaches such as Narrative Intelligence, where AI transforms open-ended responses into structured understanding that captures values, emotions, and decision drivers that rating scales miss. We are also applying AI to interpret how people react to images and to explore the “why” within choice models. The common thread? We’re using AI to enhance human truth, not manufacture it.

The risk for the insights industry is not that AI will take over research. The real risk is that we will hand it the microphone before it earns the right to speak. Overreliance on synthetic data leads to confirmation bias and blindness to risks and opportunities outside the training set. The more we train it on the past, the harder it will become to see the future.

Moving Forward: AI as a Partner

In the years ahead, the strongest research strategies will be hybrid, using AI where it accelerates or deepens our understanding while keeping real people at the center. That means knowing when to use each tool:

  • Synthetic data for early exploration and pre-testing.
  • Synthetic Smart for synthesis and interactivity.
  • Traditional measurement for reliable benchmarks.
  • Enhanced Intelligence for discovery and depth.

While AI’s impact is undeniable, our job as researchers has not changed: To understand people. And that means understanding their messy, emotional, and often contradictory nature. No algorithm can replicate that. Replacing the human voice won’t help you make smarter decisions—but using tools to hear it more clearly will.
