How to measure AI visibility for your brand

December 24, 2024 · ai-visibility · 11 min read

AI visibility does not behave like keyword rankings. There is no stable "position one" to screenshot. Answers vary by platform, by prompt phrasing, and by the week the model was last updated. Rank-tracker dashboards that worked for a decade now measure the wrong thing, and the marketing leaders we talk to can feel it. They are spending on AI visibility and cannot tell their board whether it is working.

This guide is the measurement model we use with clients. It assumes your goal is not a vanity chart. It is a defensible number you can take into a quarterly review and say "this is what changed." Soar is a community marketing agency that has run 4,200+ community campaigns across 280+ brands since 2017, and the framework below is the one we built after measuring AI citations for the first cohort of clients who asked "are we actually showing up in ChatGPT?"

Why AI visibility cannot be measured like keyword rankings

AI visibility measurement requires a share-of-voice model, not a rank model. A single query into ChatGPT can produce a different answer tomorrow, from a different session, for a different user, with different sources cited. The unit that holds still is not "position." It is the percentage of prompts in a defined set where your brand appears, whether cited, linked, or simply described. You have to run the same prompt set on a recurring cadence and watch the trend.

The numbers behind this shift are dramatic. Only 12% of AI citations overlap with Google's top-10 results, so even a brand that dominates classic SEO can be invisible in AI answers. 82% of AI citations are earned media from third-party editorial, community threads, and review sites, not your own pages. And Perplexity pulls 47% of its top-10 cited sources from Reddit. If your measurement stack only looks at your own URLs, it is blind to most of the signal.

The implication for marketing leaders: stop asking "where do we rank?" Start asking "on what percentage of the questions our buyers ask AI do we appear, and how does that compare to our top three competitors?" That is the question leadership actually cares about, and it is measurable.

What are the five KPIs that matter for AI visibility?

The five KPIs that matter are mention frequency, citation share, answer share of voice, sentiment alignment, and source inclusion rate. Together they tell you whether you are present, whether you dominate the conversation, whether you are positioned accurately, and whether the AI actually uses your content as source material. No single metric is enough. A brand can be mentioned often but never cited, or cited often but described inaccurately.

  • Mention frequency - the share of prompts in your fixed set where your brand name appears anywhere in the answer, linked or not. This is the top-of-the-funnel visibility number.
  • Citation share - among the URLs the model links to, how many are yours or third-party coverage of you. This is the nearest equivalent to SERP presence.
  • Answer share of voice - across all the branded mentions in your prompt set, how many are yours vs. competitors. This is the number a CMO will accept in a board review.
  • Sentiment alignment - whether the model describes you the way your positioning document says it should. A brand cited negatively is worse than a brand not cited at all.
  • Source inclusion rate - the percentage of prompts where the model cites a domain you own (site, subdomain, owned subreddit, company LinkedIn) vs. a third party. Tells you how much of your visibility is earned through your own publishing vs. the community.

A report that shows all five on one page, with a 90-day trend line and competitor benchmarks, is the artifact that wins budget conversations. Reports that show one number in isolation lose them.
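
If you want to sanity-check a tool's numbers, the five KPIs are simple enough to compute yourself. Below is a minimal sketch in Python, assuming each prompt run is logged as one record per prompt per platform; the record structure and field names are illustrative, not taken from any particular tool's export.

```python
from dataclasses import dataclass

@dataclass
class PromptResult:
    """One prompt, run once on one platform in a weekly pass. Illustrative fields."""
    prompt_id: str
    platform: str
    brand_mentioned: bool      # your brand name appears anywhere in the answer
    on_message: bool           # the answer describes you the way your positioning doc says
    our_mentions: int          # mentions of your brand in the answer
    competitor_mentions: int   # mentions of tracked competitors in the answer
    linked_urls: int           # total URLs the answer links to
    our_linked_urls: int       # linked URLs that are yours or third-party coverage of you
    owned_domain_cited: bool   # at least one cited URL sits on a domain you own

def kpis(results: list[PromptResult]) -> dict[str, float]:
    assert results, "run the prompt set first"
    n = len(results)
    mentioned = [r for r in results if r.brand_mentioned]
    branded = sum(r.our_mentions + r.competitor_mentions for r in results)
    links = sum(r.linked_urls for r in results)
    return {
        # share of prompts where the brand shows up at all
        "mention_frequency": len(mentioned) / n,
        # your share of all URLs the answers linked to
        "citation_share": sum(r.our_linked_urls for r in results) / links if links else 0.0,
        # your mentions as a share of all branded mentions, yours plus competitors'
        "answer_share_of_voice": sum(r.our_mentions for r in results) / branded if branded else 0.0,
        # of the answers that mention you, how many describe you on-message
        "sentiment_alignment": sum(r.on_message for r in mentioned) / len(mentioned) if mentioned else 0.0,
        # share of prompts where a domain you own is cited
        "source_inclusion_rate": sum(r.owned_domain_cited for r in results) / n,
    }
```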

How do you build a prompt set that actually represents buyer intent?

Start with 75-150 prompts derived from real buyer intent, grouped into three buckets: category prompts ("best [category] for [use case]"), comparison prompts ("[your brand] vs [competitor]"), and problem prompts ("how do I solve [pain point]"). These are not keyword lists. They are full questions, written the way your customer would type them into ChatGPT. A prompt set of 30 is too noisy to trend. A prompt set of 500 is too expensive to run weekly and too diluted to act on.

Source prompts from three places. First, your sales team, meaning the literal questions prospects ask on discovery calls. Second, Reddit and Quora threads where your category is discussed (search queries like "best X for Y" on both platforms). Third, the fan-out sub-queries that search-enabled AI models generate internally: if a prospect asks "best CRM for a 50-person sales team," the model will also retrieve "most affordable CRM for SMB," "HubSpot alternatives," and "CRM with strong Gmail integration." Cover the fan-out, not just the parent query.
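
Keeping the three buckets and the fan-out variants as structured data, rather than a flat keyword list, makes the set easier to review and freeze. Here is a minimal sketch; the brand, competitors, category, and use cases are placeholders, and your real prompts come from sales calls, Reddit threads, and observed fan-out rather than templates.

```python
# Placeholder values for illustration only.
BRAND = "YourBrand"
COMPETITORS = ["CompetitorA", "CompetitorB", "CompetitorC"]
CATEGORY = "CRM"
USE_CASES = ["a 50-person sales team", "a bootstrapped DTC store"]
PAIN_POINTS = ["syncing deals between email and spreadsheets", "tracking churn by cohort"]

def build_prompt_set() -> dict[str, list[str]]:
    return {
        "category": [f"best {CATEGORY} for {u}" for u in USE_CASES],
        "comparison": [f"{BRAND} vs {c}" for c in COMPETITORS],
        "problem": [f"how do I solve {p}" for p in PAIN_POINTS],
        # fan-out: sub-queries a search-enabled model retrieves alongside the parent
        # query; write these by hand for your highest-value category prompts
        "fan_out": [
            f"most affordable {CATEGORY} for SMB",
            f"{COMPETITORS[0]} alternatives",
            f"{CATEGORY} with strong Gmail integration",
        ],
    }

prompts = build_prompt_set()
total = sum(len(group) for group in prompts.values())
print(f"{total} prompts across {len(prompts)} groups; aim for 75-150 before freezing")
```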

Freeze the set quarterly. The single most common mistake in AI visibility measurement is changing the prompt set mid-quarter and then wondering why the numbers jumped. Treat your prompt set like a benchmark portfolio: rebalance on a schedule, not in response to a bad week. This is also the discipline that makes your dashboard legible to a CFO who has never heard of GEO.
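
One way to make the freeze enforceable is to stamp the set with a quarter label and a checksum when you lock it, so any mid-quarter edit shows up as a mismatch before the next weekly run. A sketch, assuming the prompt set is a plain dict like the one above; the file name and JSON layout are arbitrary choices for this example.

```python
import hashlib
import json

def freeze_prompt_set(prompts: dict[str, list[str]], quarter: str,
                      path: str = "prompt_set.json") -> str:
    """Write the prompt set to disk with a quarter label and a content checksum."""
    canonical = json.dumps(prompts, sort_keys=True).encode("utf-8")
    checksum = hashlib.sha256(canonical).hexdigest()[:12]
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"quarter": quarter, "checksum": checksum, "prompts": prompts}, f, indent=2)
    return checksum

# Recompute the hash before each weekly run; if it no longer matches the frozen
# checksum, the set changed mid-quarter and the trend line is no longer comparable.
frozen = freeze_prompt_set({"category": ["best CRM for a 50-person sales team"]},
                           quarter="2026-Q1")
print(f"frozen with checksum {frozen}")
```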

How does measurement differ across ChatGPT, Perplexity, Google AI Overviews, and Claude?

Measurement has to be run per platform because each model pulls from different sources and behaves differently. Google AI Mode shows the strongest preference for brand-owned content: 59.8% of its citations point to brand-owned pages, vs. 44.7% in ChatGPT and 28.9% in Perplexity. Perplexity cites Reddit in 47% of its top-10 sources. Google AI Mode ranks Quora as the #4 most-cited domain, appearing in 7.25% of responses. A single brand can be dominant on one platform and invisible on another.

Pick the two or three platforms where your buyers actually spend time. For B2B SaaS, that is usually ChatGPT and Perplexity. For consumer and DTC categories, Google AI Overviews carries more weight because it rides on top of Google's installed base. For technical audiences and research-heavy decisions, Perplexity and Claude matter more. Running all four every week is expensive and rarely worth the cost for a mid-market brand.

Platform | Where it pulls most-cited sources from | Measurement priority
ChatGPT | Earned media, Reddit, category publications | Highest for B2B SaaS and professional services
Google AI Overviews | Brand-owned pages, Reddit, Quora, YouTube | Highest for consumer, DTC, local
Perplexity | Reddit (47%), news, research sources | Highest for technical and research-heavy buyers
Claude | Long-form documentation, news, editorial | Secondary - run quarterly, not weekly

The table is the report slide. A VP of Marketing who can show platform-by-platform share of voice against two competitors, not an average across platforms, is the one who defends her budget in Q4.

Which AI visibility measurement tools do marketing leaders actually use?

In 2026, four commercial tools cover most mid-market use cases: Profound, Otterly.AI, Ahrefs Brand Radar, and Parse. Each has a different strength, and brands often run two in combination. Profound leads on enterprise-grade prompt volume and platform coverage. Otterly is lighter and faster to set up. Ahrefs Brand Radar launched in early 2026 and layers AI prompt tracking onto Ahrefs' existing backlink and brand-mention infrastructure, which makes it cost-efficient for teams already on Ahrefs. Parse is Soar's internal tool and focuses on tying AI citations to the community signals that produced them.

Tool | Starting price (2026) | Best for | Watch out for
Profound | ~$499/month | Enterprise prompt volume, multi-platform | Overkill for teams tracking fewer than 100 prompts
Otterly.AI | From ~$29/month | SMB and early-stage, faster setup | Thinner sentiment and attribution layer
Ahrefs Brand Radar | Bundled with Ahrefs subscription | Teams already using Ahrefs; mentions + citations in one view | Newer product - reporting still maturing
Parse (Soar) | Free tier + paid plans | Tying AI citation to community source | Best when paired with community marketing execution

Free stacks are viable for smaller brands and are covered in detail in our guide to free AI visibility tracking tools. For mid-market brands serious enough to report share of voice to a board, budget $500–$1,500 per month for tooling alone, separate from agency or in-house labor.

How often should you report AI visibility, and to whom?

Report weekly to the marketing team, monthly to leadership, quarterly to the board. Weekly cadence catches prompt-set drift and surfaces platform-specific surprises (e.g., ChatGPT adding a new retrieval pass). Monthly cadence is the right unit for leadership because it matches the content and community work that drives AI visibility. Weekly swings look like noise, monthly trends look like direction. Quarterly is the right unit for the board, where only the share-of-voice trend against competitors matters.
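
A minimal sketch of the weekly-to-monthly roll-up, assuming each weekly run produces a KPI dict like the one in the earlier sketch; the readings below are made up to show how noisy weeks average into a reportable month.

```python
from statistics import mean

def monthly_rollup(weekly_runs: list[dict[str, float]]) -> dict[str, float]:
    """Average each KPI across the weekly runs in a month.

    Weekly values swing with model updates and retrieval changes; the monthly
    average is the number that goes to leadership.
    """
    keys = weekly_runs[0].keys()
    return {k: round(mean(run[k] for run in weekly_runs), 3) for k in keys}

# Made-up example: four weekly share-of-voice readings that look noisy week to week
weeks = [
    {"answer_share_of_voice": 0.29},
    {"answer_share_of_voice": 0.35},
    {"answer_share_of_voice": 0.31},
    {"answer_share_of_voice": 0.33},
]
print(monthly_rollup(weeks))  # {'answer_share_of_voice': 0.32}
```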

The report template that works: one page per platform, with mention frequency, citation share, and share of voice on a 90-day trend, a small competitor set (3-5 brands), and a short qualitative commentary ("we were picked up in this Reddit thread in week 4, citations lifted 11% the following week"). Avoid vanity metrics like "prompts tested" or "total mentions" because they grow mechanically with prompt-set expansion and confuse leaders.

What to cut from the report: raw screenshots of AI answers (they age out in hours), keyword rankings (wrong measurement frame), and "AI Overviews impressions" unless you have a data source you genuinely trust for that number. The goal is a report your CFO will accept as a proxy for category presence, not a dashboard that impresses the team.

How long before AI visibility measurement becomes meaningful?

Expect 60 days before your baseline is stable and 4–6 months before measurement can defend or kill a program. The first 30 days are noisy because models retrain, platforms change retrieval behavior, and your prompt set is still being tuned. Content updated within 30 days receives 3.2x more AI citations, which means your own content freshness has an outsized effect on early readings and can mask or inflate the baseline.

Months 2-3 are where platform-specific patterns emerge. You see which platform your brand is strongest in, which prompt groups you own, and which competitors outperform you. Months 4-6 are where community-marketing investments start to show up because AI models retrain on fresh third-party content on a 60-90 day cycle, so a Reddit thread that mentions your brand in April will not move citations until June at the earliest. This is why "we bought a measurement tool in January and we are not seeing movement" in February is not a program problem. It is a timing problem.

The practical implication: run at least two full measurement cycles before deciding whether a program is working. Anything less is a premature verdict and usually kills a program that would have worked.

Frequently asked questions

What is share of voice in AI search?

Share of voice in AI search is the percentage of branded mentions, across a fixed prompt set, that are yours rather than a competitor's. If your prompt set produces 100 branded mentions in a month and 32 of them are your brand, your share of voice is 32%. It is the closest equivalent to share of voice in traditional media and the most defensible KPI to present to leadership.

Can I use Google Search Console to measure AI visibility?

No. Google Search Console does not currently surface AI Overview impressions or citations as a dedicated metric in a way you can trend. You need either a commercial tool (Profound, Otterly, Ahrefs Brand Radar, Parse) or a manual prompt testing workflow.

How many prompts do I need to track?

75–150 for a mid-market brand. Fewer than 50 is too noisy to trend. More than 200 is expensive to run weekly and tends to dilute the prompts that actually matter. Rebalance quarterly.

What does "citation share" measure vs. "mention frequency"?

Mention frequency counts every time your brand name appears in an AI answer, whether or not a link is attached. Citation share counts only the cases where the model links to a URL you own or to third-party coverage of you. Citation share is a stronger commercial signal because it indicates the model is treating your sources as authoritative.

How do I measure AI visibility if I do not have budget for a tool?

Run a manual prompt set weekly using ChatGPT, Perplexity, Google AI Mode, and Claude, record results in a spreadsheet, and calculate the five KPIs by hand. It is tedious but possible for a brand with 50–100 prompts. Our guide to free AI visibility tracking tools covers the semi-automated free stack that reduces the manual work.
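
If you go the manual route, the part worth getting right on day one is the log format, because all five KPIs fall out of a handful of columns. A sketch of one workable layout, with illustrative column names, one row per prompt per platform per weekly run:

```python
import csv

# Illustrative columns: enough to compute all five KPIs later, by hand or with a pivot table.
COLUMNS = [
    "week", "platform", "prompt_id", "prompt_text",
    "brand_mentioned", "on_message",
    "our_mentions", "competitor_mentions",
    "linked_urls", "our_linked_urls", "owned_domain_cited",
]

with open("ai_visibility_log.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerow(COLUMNS)
```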

Is AI visibility the same as GEO?

Not exactly. GEO (Generative Engine Optimization) is the strategy of becoming citable by AI models. AI visibility is the measurement layer, which is how you know the strategy is working. The two are complementary. Our broader overview of how to rank on ChatGPT and Claude sits on the strategy side.

What this means for your 2026 plan

If your existing dashboard is a keyword rank report with a single AI Overviews metric bolted on, it is not a measurement system. It is a placeholder. Build the share-of-voice model described above, commit to a fixed prompt set for at least two quarters, and accept that the meaningful trend will appear in months 4-6, not weeks 1-2. That is the hardest part: leadership will ask for a verdict before the data can produce one, and holding the line is the job.

Measurement without a program attached is expensive analytics. If you want a partner who builds the community and AI visibility program and measures it against a defensible share-of-voice model, not a vanity dashboard, get an AI visibility audit and we will review your current prompt set, platform coverage, and benchmark gap against your top three competitors.


Ready to grow through community marketing?

Get a custom strategy tailored to your brand, audience, and the conversations already shaping buying decisions.