Assessing Trust in AI Writing Tools: Grammarly

Or: yes, I ran a usability study on my spell-checker.

DATE

19th January 2026

CATEGORY

Human-AI Interaction

READING TIME

7 Min

I use Grammarly the way most people use spell-check: reflexively and a little mindlessly. At what point did I stop reading its corrections and just start copying its output? Where did that trust come from?


While diving deep into how human and data systems work together, I decided to actually look.

My team and I ran a usability study on Grammarly's AI writing tools, asking specifically: do users actually trust the outputs of this tool they use every day? Do they trust it the way they trust a smart friend who edits their work? Or do they simply not know what's behind the black box of 'AI'?

Grammarly AI text editing interface

Trust in AI tools is a design problem

Before we started, we needed a framework. We landed on Yocco's model of trust, which breaks it down into four components: integrity (does it follow standards?), ability (is it actually good at what it does?), benevolence (does it have your interests at heart?), and reliability (can you predict how it'll behave?). We layered this with self-determination theory, which argues that people feel most at ease in a system when they feel autonomous and competent.

Here's what's interesting: trust in a machine isn't the same as trust in a person. With a person, you're reading body language, history, reputation. With an AI, you're reading interface cues. You're asking: does this thing tell me why it's making a suggestion? Does it show me where its confidence ends? Does it explain itself, or just hand me a suggestion and hope I nod along?

The usability tests

We recruited eight current Grammarly users: Chicago-based designers who'd each used the tool at least ten times in the past six months. We ran them through four tasks, each designed to probe a different trust dimension:

  • Revising a paragraph using Grammarly's live suggestions

  • Fact-checking a paragraph that had intentional inaccuracies

  • Exploring Grammarly's data privacy controls before pasting in something "sensitive"

  • Adjusting an email's tone from formal to casual

We watched, asked questions, and scored everything on a SUS-style scale. The average score across participants: 56.25 out of 100, a surprising result to say the least. For context, the industry average for SUS is 68, so this is clearly not a good sign for Grammarly.
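
If you've never computed a SUS score, here's a minimal sketch of the standard scoring formula: odd-numbered (positively worded) items contribute their rating minus one, even-numbered (negatively worded) items contribute five minus their rating, and the 0-40 raw sum is scaled by 2.5. The participant responses below are made up for illustration; they aren't our study data.

```python
# Minimal sketch of standard SUS (System Usability Scale) scoring.
# Responses are hypothetical, for illustration only.

def sus_score(responses):
    """Compute one participant's SUS score (0-100) from ten 1-5 ratings."""
    assert len(responses) == 10, "SUS uses exactly ten statements"
    total = 0
    for i, rating in enumerate(responses, start=1):
        # Odd-numbered items are positively worded: contribution = rating - 1.
        # Even-numbered items are negatively worded: contribution = 5 - rating.
        total += (rating - 1) if i % 2 == 1 else (5 - rating)
    return total * 2.5  # Scale the 0-40 raw sum to a 0-100 score.

# Two hypothetical participants' ratings for the ten SUS statements.
participants = [
    [4, 2, 4, 3, 3, 2, 4, 3, 3, 2],
    [3, 3, 3, 4, 2, 3, 3, 4, 3, 3],
]
scores = [sus_score(p) for p in participants]
print(scores)                     # [65.0, 42.5]
print(sum(scores) / len(scores))  # 53.75, an average in our study's range
```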

The unexpected findings

One: 75% of participants went to Grammarly support, not settings, to find the data training toggle. The control exists; it's right there in settings. But nobody looked there first. If users instinctively go to a help article to find a privacy control that lives in their account dashboard, discoverability has failed, and a control nobody can find isn't really controlling anything.

Two: Every single participant was annoyed by the constant "Unlock Pro" prompts. But more than annoyed, half of them assumed that the free version's responses were less trustworthy. Not just less powerful. Less honest. That's a wild outcome. The upsell pattern accidentally taught users that trust is something you pay for.

Three: When asked, nobody said they would paste real work content into Grammarly, even after finding the opt-out toggle. NDA anxiety was real and immediate. Which made me realize: transparency about data controls and actual felt safety are not the same thing. You can build a toggle and still lose the user's confidence entirely.

Four: Everyone defaulted to the AI Chat instead of using Grammarly's specialized tools (Fact Checker, Citation Finder, Humanizer, etc.). Turns out people are just more comfortable in a conversation. The chat interface felt knowable. The feature panel felt like a menu at a restaurant where you don't recognize half the dishes.

"A data training toggle that lives in settings but people instinctively search for in support docs is a discoverability failure.

"A data training toggle that lives in settings but people instinctively search for in support docs is a discoverability failure.

How do we evaluate "context"?

The most interesting task of all was the one where we asked participants to make an email more casual using Grammarly's Humanizer tool. Every single person had to manually correct the AI's output afterward.


Because "casual" is contextual. My casual email to a close collaborator reads differently than my casual email to a professor I've had coffee with once. Grammarly gives you broad tone presets (Everyday Voice, The Precisionist, The Executive etc) but these are personas, not contexts. They flatten nuance.


This is an expectation-calibration problem. The tool behaves in a way that is internally consistent but externally misaligned. And when users feel misunderstood by a tool, they stop relying on it. They mentally move it from "assistant" to "starting point."

My simmering thoughts

I started this study thinking about Grammarly specifically. I ended it thinking about AI tools generally and how often "trust" is treated as a feeling users either have or don't, rather than something a product is actively responsible for building.

If I were Grammarly's design team (hi there!), here's what I'd be sitting with:


  • Trust-building and monetization cannot share the same UX real estate. Every time a "Get Pro" banner interrupts a suggestion, it implicitly asks: how much do you trust an answer you didn't pay for?

  • Clarity beats comprehensiveness. Fewer suggestions, better explained, would do more for user confidence than an exhaustive feature panel that nobody explores.

  • Privacy controls need to be more explicit. An onboarding moment that says "here's what we collect and here's how you control it" is an act of respect.

Fin.