The personality you forgot to procure

When a business shortlists an AI assistant, it weighs the usual suspects: benchmark scores, latency, price, context window, ecosystem and governance. What it almost never writes down is the one thing every user will actually notice within thirty seconds — what the thing is like to talk to.

That's a strange blind spot, because in the human workplace we treat exactly this question as a multi-billion-pound problem.

We already spend a fortune on this — for humans

Walk into any leadership away-day and you'll meet the Myers-Briggs Type Indicator. The Myers-Briggs Company reports that 88% of Fortune 500 companies have used it, and roughly two million people take it every year. CliftonStrengths claims north of 34 million people; DISC, over 50 million. There's an entire personality-assessment industry sized at around $10 billion in 2025, built on a single premise: matching communication styles to people and tasks improves outcomes.

Zoom out and the numbers get bigger. US corporate training spend hit $102.8 billion in 2025, a good slice of it on leadership, team-building and communication. And the perceived cost of getting it wrong is enormous: Grammarly and the Harris Poll estimated up to $1.2 trillion a year lost to ineffective communication in US firms, about $12,500 per employee.

A caveat worth stating plainly: much of this science is shaky. MBTI's test–retest reliability is poor, and its own publisher says it should never be used for hiring. So treat the precise figures as directional, not gospel. But the willingness-to-pay is real and well established. Organisations clearly believe that how people communicate carries serious financial weight.

So here's the obvious question nobody's asking: if communication style matters this much between humans, why do we assume it doesn't matter between a human and the AI they'll spend all day talking to?

AI personality is real, measurable, and it moves the needle

It turns out this isn't hand-waving. Researchers have been putting language models through the same psychometric tests we use on people, and the models have profiles.

The landmark paper, "Personality Traits in Large Language Models" (Serapio-García et al., Google DeepMind and Cambridge, 2023), applied validated Big Five instruments to eighteen models and found that personality in their outputs can be reliably measured and deliberately shaped. Follow-on work like PsychoBench (ICLR 2024) found distinct, sometimes stubbornly stable profiles: one widely cited study found ChatGPT registers as the same Myers-Briggs type no matter how you prod it.

A health warning, because this field invites nonsense: these scores are wildly prompt-sensitive, and "personality" here means a behavioural disposition in the output, not an inner life. "Claude is an INTJ" is exactly the clickbait to avoid. The defensible claim is narrower and more useful — models have measurable, consistent stylistic dispositions.

More importantly, those dispositions change how people respond. This is old news in human-computer interaction. Clifford Nass's "Computers Are Social Actors" work in the 1990s showed people unconsciously apply social rules to machines, and prefer ones whose style resembles their own — the similarity-attraction effect. The LLM era confirms it: a 2025 study cheekily called "Vibe Check" found a medium level of personality expression scored best across trust, likeability and intention to adopt, and that closer user–agent alignment improved perceptions further. Not too flat, not too much — and matched to the user.

And the stakes aren't only about whether people like the tool — they're about whether it performs. Every clarifying question a model has to ask, every answer pitched at the wrong level, every misread of an ambiguous brief is friction. Friction is precisely what that $1.2 trillion human-communication figure measures: time and effort lost in translation. The same logic carries straight over. An assistant whose style fits how you actually express goals and context reaches a usable result in fewer turns.

That matters most for agents, because agents run on briefs. Hand a model a goal and some context and let it act, and the quality of the outcome depends heavily on how well it infers what you meant from an underspecified instruction — and how well its output lands without a human reshaping it. A communication mismatch there isn't a bruised ego; it's wasted cycles, wrong turns and re-work. Fit, in other words, is throughput. It's not cosmetic, and it's not just about trust — it's about how much of the job gets done correctly before anyone has to intervene.

Where does the personality come from — the model or the prompt?

Both, and the split matters for how much control you actually have.

The default comes overwhelmingly from post-training. Anthropic openly describes shaping "Claude's character" during fine-tuning, instilling traits like curiosity and honesty about its own views. OpenAI's Model Spec does the equivalent job from the other direction, with explicit rules like "don't be sycophantic." When that scaffolding slips, the result is dramatic: in April 2025 an update made GPT-4o so fawning that OpenAI rolled it back, concluding afterwards that "personality and other behavioural issues should be launch blocking."

On top of that baseline, you can steer with system prompts and persona instructions — a lot. But not infinitely. Anthropic's "Persona Vectors" work shows traits like sycophancy live as measurable directions inside the model that can drift during further fine-tuning. And those stable-MBTI findings show models pushing back against instructed change.

The practical upshot: you never start from a blank slate. Prompting moves personality substantially, but the provider's training sets a default that keeps reasserting itself — and can shift under your feet when they ship an update.

You're already inheriting personalities — usually by accident

Here's where it gets concrete for buyers. Every platform you might adopt has already made these choices for you.

ChatGPT has selectable styles, adjustable warmth, custom instructions and Custom GPTs. Claude has Projects and Styles. Gemini has Gems. Microsoft Copilot Studio lets you build agents with defined personas. Adopt any of them and you inherit a personality — typically without anyone deciding on it, or writing it down.

Go further into multi-agent frameworks and persona becomes a literal building block. In CrewAI, every agent is defined by a role, a goal and a backstory — and the docs confirm that backstory goes straight into the system prompt. Microsoft's Agent Framework lets you assign a different underlying model to each agent. So you're choosing both the persona and which model plays it, often without realising those are the levers.
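
To make those two levers visible, here is a minimal sketch in plain Python of a persona being folded into a system prompt and paired with a model. It deliberately does not use CrewAI's or Microsoft's actual classes: `AgentPersona`, `build_system_prompt`, the prompt template and the model names "model-a" and "model-b" are all hypothetical, chosen only to show where the role, goal and backstory end up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentPersona:
    """Hypothetical stand-in for a framework's agent definition (not a real library class)."""
    role: str
    goal: str
    backstory: str
    model: str  # which underlying model plays this persona


def build_system_prompt(persona: AgentPersona) -> str:
    """Fold the persona fields into a single system prompt string.

    The template is an assumption for illustration; real frameworks use their own wording.
    """
    return (
        f"You are {persona.role}. {persona.backstory}\n"
        f"Your goal: {persona.goal}"
    )


analyst = AgentPersona(
    role="a cautious financial analyst",
    goal="flag risks in supplier contracts",
    backstory="You have spent a decade reviewing procurement deals and prefer plain language.",
    model="model-a",  # placeholder name, not a real model identifier
)

reviewer = AgentPersona(
    role="a blunt second reviewer",
    goal="challenge the analyst's conclusions",
    backstory="You are paid to disagree when the evidence is thin.",
    model="model-b",  # placeholder name
)

# Print what each model would actually be told before it sees any task.
for agent in (analyst, reviewer):
    print(f"[{agent.model}]")
    print(build_system_prompt(agent))
    print()
```

Seen this way, the backstory is not decoration: it is text the model reads before every task, and the `model` field decides whose trained-in defaults sit underneath it.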

Does assigning personas actually improve performance? Honestly, the evidence is mixed. Some studies show distinct personas improve coordination; others find the specific persona barely matters, only that agents are differentiated. And it can backfire — one 2024 study found irrelevant demographic personas shifted task performance by up to 19%. Persona is a real lever, but an unreliable, occasionally biasing one. Which is all the more reason to test rather than assume.

What to actually do

Nothing here demands a new department. It demands moving personality from "thing we noticed afterwards" to "thing we evaluated on purpose":

Put it on the scorecard. Alongside accuracy and price, assess default tone, verbosity, sycophancy, and willingness to disagree. Write down the character you're inheriting.
Run a vibe check with your own people. Test two or three models on real tasks with real staff, and measure trust and satisfaction — not just whether the answer was correct.
Pin it, then re-test on every update. Specify tone explicitly, version it, and treat a personality regression as launch-blocking. OpenAI learned that one the hard way.
Don't over-claim. "Measurable stylistic dispositions that affect outcomes" — yes. "Model X is an INTJ" — no.
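
For teams that want the scorecard and the re-test in a form they can automate, the checklist above might be sketched roughly like this in Python. Everything here is an assumption for illustration: the 1-to-5 rating scale, the `PersonalityScorecard` and `regressions` names, the 0.5 tolerance and the sample scores are invented, and in practice the ratings would come from your own staff reviews.

```python
from dataclasses import dataclass

# Traits taken from the checklist above; the 1-5 rating scale is an assumption.
TRAITS = ("tone", "verbosity", "sycophancy", "willingness_to_disagree")


@dataclass(frozen=True)
class PersonalityScorecard:
    """Hypothetical record of the character you inherited, pinned to versions."""
    model_version: str
    prompt_version: str
    scores: dict  # trait name -> mean staff rating, 1 (low) to 5 (high)


def regressions(baseline, candidate, tolerance=0.5):
    """Return the traits whose score moved more than `tolerance` from the baseline."""
    drifted = {}
    for trait in TRAITS:
        delta = candidate.scores[trait] - baseline.scores[trait]
        if abs(delta) > tolerance:
            drifted[trait] = round(delta, 2)
    return drifted


# Invented sample data: the same tone spec, scored before and after a provider update.
baseline = PersonalityScorecard(
    "v1", "tone-spec-3",
    {"tone": 4.0, "verbosity": 3.0, "sycophancy": 2.0, "willingness_to_disagree": 4.0},
)
candidate = PersonalityScorecard(
    "v2", "tone-spec-3",
    {"tone": 4.2, "verbosity": 3.1, "sycophancy": 3.4, "willingness_to_disagree": 2.9},
)

drift = regressions(baseline, candidate)
launch_blocked = bool(drift)  # any drift beyond tolerance blocks the rollout
print(drift, launch_blocked)
```

The design choice worth copying is the versioning, not the numbers: recording which model and which tone spec produced each scorecard is what lets you tell a provider update apart from a change you made yourself.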

We've built a multi-billion-pound industry around the idea that how humans communicate with each other determines whether work gets done. We're now handing a chunk of that work to machines we'll talk to all day long. Pretending their personality doesn't matter isn't scepticism. It's just the one variable we forgot to procure.