Methodology

How Faro Index measures whether AI gets your company right

Most companies have two problems with how AI describes them. They are not measuring what the assistants actually say, and even when they look, they cannot tell which structural fix will move the score. Faro Index was built to close both gaps. This page lays out exactly how we measure accuracy, visibility, machine-readability, and narrative protection across ChatGPT, Perplexity, Gemini, and Claude, and how the same method produced the 503-company benchmark.

Faro Index runs industry-specific, buyer-intent queries across up to four AI platforms, then scores four signals against ground truth taken from a company's own website. The same scoring runs on both live product scans and the benchmark, so a number you see in your dashboard means the same thing as a number in the report.

How a scan works

Every scan follows the same structured process. We send industry-specific, buyer-intent queries to up to four AI platforms, and the platform set depends on plan: ChatGPT and Perplexity on the free scan and Starter, Gemini added on Growth, and Claude added on Pro, which covers all four. We capture the full response text, any cited URLs, and every company mention, then score the results across the four signals and compare the factual claims against ground truth from the company's own site. Product scans use a single scoring framework with independently reported sub-scores, so the accuracy number, the visibility number, and the GEO number each stand on their own rather than collapsing into one figure you cannot act on.

What queries the scan sends

Queries come from a library of more than 800 buyer-intent patterns organized by industry. Each industry set covers four kinds of question: category definition (“what is [category] software?”), vendor evaluation (“what are the best [category] tools?”), direct comparison (“how does [brand] compare to [competitor]?”), and capability (“which [category] tools include [feature]?”).
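A minimal Python sketch of how the four question kinds might be represented and filled in. Only the four example patterns quoted above are included, not the full library, and `QUESTION_TEMPLATES` and `fill_template` are hypothetical names rather than part of Faro Index.

```python
# Hypothetical representation of the four question kinds, using only the
# example patterns quoted on this page (the real library is not shown here).
QUESTION_TEMPLATES = {
    "category_definition": "what is [category] software?",
    "vendor_evaluation": "what are the best [category] tools?",
    "direct_comparison": "how does [brand] compare to [competitor]?",
    "capability": "which [category] tools include [feature]?",
}


def fill_template(template: str, **values: str) -> str:
    """Replace each [name] placeholder with values[name]; fail on leftovers."""
    query = template
    for name, value in values.items():
        query = query.replace(f"[{name}]", value)
    if "[" in query:
        raise ValueError(f"unfilled placeholder in: {query!r}")
    return query


print(fill_template(QUESTION_TEMPLATES["vendor_evaluation"], category="expense management"))
```

Raising on a leftover placeholder is a deliberate choice in this sketch: a half-filled query sent to an assistant would produce a meaningless answer rather than an obvious error.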

We deliberately do not use branded prompts like “tell me about [company]” in the standard scan. A branded prompt shows how AI describes you when it already knows to look for you, which flatters everyone. A buyer-intent prompt shows whether AI surfaces you at all when someone is actively in the market and has not typed your name, and that is the moment that decides a deal.

Query volume and platform coverage scale by plan. The free scan sends 10 queries to 2 platforms. Starter sends 25 to the same 2. Growth sends 50 across 3. Pro sends the full per-industry library, more than 50 queries, across all 4 platforms, with two repetitions per query so that inconsistency itself becomes a finding. If an assistant answers the same question two different ways, that gap is worth knowing, because your buyers are seeing it too.
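The plan tiers reduce to simple arithmetic: expected responses per scan equal queries times platforms times repetitions. This Python sketch assumes a single repetition on every plan except Pro (the page mentions repetition only for Pro) and uses 51 as a stand-in for Pro's "more than 50" queries; `PlanConfig`, `PLANS`, and `expected_responses` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PlanConfig:
    queries: int                 # buyer-intent queries per scan
    platforms: Tuple[str, ...]   # AI platforms queried
    repetitions: int             # times each query is asked per platform


TWO = ("ChatGPT", "Perplexity")
# Assumptions: 1 repetition outside Pro, and 51 standing in for "more than 50".
PLANS = {
    "free": PlanConfig(10, TWO, 1),
    "starter": PlanConfig(25, TWO, 1),
    "growth": PlanConfig(50, TWO + ("Gemini",), 1),
    "pro": PlanConfig(51, TWO + ("Gemini", "Claude"), 2),
}


def expected_responses(plan: PlanConfig) -> int:
    """Responses a complete scan should return: queries x platforms x repetitions."""
    return plan.queries * len(plan.platforms) * plan.repetitions


for name, plan in PLANS.items():
    print(f"{name}: {expected_responses(plan)} responses")
```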

How AI visibility is scored

Visibility measures how often AI names your brand when buyers ask about your category. For each query we check whether the company appeared, how prominently it appeared, and whether the mention carried a sourced URL. Prominence matters as much as presence, so being the primary recommendation counts for more than a passing reference buried in a list.

The Visibility Score runs 0 to 100 and blends how often you appear, how prominently you are positioned, and how many of the scanned platforms include you. A score above 90 means AI recommends you consistently, a score near 50 means it mentions you sometimes, and a score near 20 means it rarely includes you at all.
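One way to read "blends how often you appear, how prominently you are positioned, and how many of the scanned platforms include you" is as a weighted average of three 0 to 1 components scaled to 100. This sketch assumes equal weights and invented prominence values (`PROMINENCE`), because the page publishes neither; `Response` and `visibility_score` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical prominence values. The page says a primary recommendation counts
# for more than a passing reference, but it does not publish the numbers.
PROMINENCE = {"primary": 1.0, "listed": 0.5, "passing": 0.25}


@dataclass
class Response:
    platform: str
    mention: Optional[str]  # None if the brand was absent, else a PROMINENCE key


def visibility_score(
    responses: Sequence[Response],
    platforms: Sequence[str],
    weights: tuple = (1 / 3, 1 / 3, 1 / 3),  # placeholder, not published weights
) -> float:
    """Blend appearance rate, average prominence, and platform coverage into 0-100."""
    if not responses or not platforms:
        return 0.0
    mentioned = [r for r in responses if r.mention is not None]
    frequency = len(mentioned) / len(responses)
    prominence = (
        sum(PROMINENCE[r.mention] for r in mentioned) / len(mentioned) if mentioned else 0.0
    )
    coverage = len({r.platform for r in mentioned}) / len(platforms)
    w_freq, w_prom, w_cov = weights
    return round(100 * (w_freq * frequency + w_prom * prominence + w_cov * coverage), 1)


sample = [
    Response("ChatGPT", "primary"),
    Response("ChatGPT", None),
    Response("Perplexity", "passing"),
    Response("Perplexity", None),
]
print(visibility_score(sample, ["ChatGPT", "Perplexity"]))  # prints 70.8
```

Prominence is averaged only over responses that mention the brand, so absence is penalized once (through frequency) rather than twice.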

How accuracy is measured: Brand Accuracy Rate

Brand Accuracy Rate is the share of AI's factual claims about you that are actually correct, and it is the metric the whole product is built around. Measuring it takes two steps, extraction and comparison.

For extraction, every response that mentions the company is parsed for verifiable factual claims, meaning specific assertions that can be checked against the company's own site: pricing, founding year, product names, leadership, company size, technology, geographic presence, category positioning. Vague or purely subjective statements are not scored, because they cannot be right or wrong.

For comparison, each claim is graded as correct if it matches the site or a trusted source, outdated if it was once true and has gone stale, incorrect if it is simply wrong, or unverifiable if no available source can confirm it. The Brand Accuracy Rate is the percentage of scored claims marked correct. Ground truth is the company's own website content from the same crawl used for GEO scoring, not third-party news or social posts, so the standard we hold AI to is the standard the company sets for itself.
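The comparison step reduces to counting grades. This sketch assumes unverifiable claims stay in the denominator, since the page describes them as graded claims; `Grade` and `brand_accuracy_rate` are hypothetical names. With one correct claim out of five, it returns 20 percent.

```python
from enum import Enum
from typing import Iterable


class Grade(Enum):
    CORRECT = "correct"            # matches the company's own site
    OUTDATED = "outdated"          # was once true, has gone stale
    INCORRECT = "incorrect"        # simply wrong
    UNVERIFIABLE = "unverifiable"  # no available source can confirm it


def brand_accuracy_rate(grades: Iterable[Grade]) -> float:
    """Percentage of scored claims graded CORRECT.

    Assumption: every graded claim, including UNVERIFIABLE ones, counts in the
    denominator. Vague or subjective statements are never graded, so callers
    should not pass them in.
    """
    graded = list(grades)
    if not graded:
        raise ValueError("no scored claims to rate")
    correct = sum(1 for g in graded if g is Grade.CORRECT)
    return 100 * correct / len(graded)


claims = [Grade.CORRECT, Grade.INCORRECT, Grade.INCORRECT, Grade.OUTDATED, Grade.UNVERIFIABLE]
print(brand_accuracy_rate(claims))  # prints 20.0
```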

How GEO score is calculated

GEO Score, which runs 0 to 100, measures how well your site communicates with a model. The crawler reads up to 50 pages per domain, and the score blends six weighted signals drawn from those pages. We tested whether an llms.txt file affects citation across more than 400 keywords and found no measurable effect, so llms.txt does not contribute to the score. The six signals are:

Schema markup coverage. The share of crawled pages carrying structured data in JSON-LD or Microdata. Schema hands a model your facts directly instead of making it infer them from prose, and pages with none are the single biggest driver of low GEO scores.
Content readability. A readability score measured on the first 500 words after the H1. Models extract facts more reliably from clear, lower-complexity prose, and dense jargon scores poorly no matter how accurate it is.
Answer block presence. Whether a page carries a 30 to 80 word declarative paragraph, placed right after the H1, that answers the buyer question for that page. These blocks are the primary source for AI citations, and pages without them are rarely quoted.
Content depth. The share of pages with 500 or more words of substantive text, since thin pages are rarely treated as authoritative.
Structured tables and lists. The presence of real tables and lists, which models cite more often in comparative answers.
Accessibility markup. Heading hierarchy, image alt text, ARIA roles, and caption signals that help automated extraction.
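As an illustration of how two of these signals could be detected, here is a sketch using Python's standard-library `html.parser`. It assumes "right after the H1" means the first `<p>` element following the first `<h1>`, it does not check whether the paragraph is declarative, and it does not handle paragraphs that omit their closing `</p>` tag. `PageSignals`, `parse_page`, `has_answer_block`, and `schema_coverage` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Iterable, List, Optional


class PageSignals(HTMLParser):
    """Collects two GEO inputs from one page: whether it carries structured
    data (JSON-LD or Microdata) and the text of the first <p> after the H1."""

    def __init__(self) -> None:
        super().__init__()
        self.has_schema = False
        self.first_paragraph: Optional[str] = None
        self._seen_h1 = False
        self._in_paragraph = False
        self._chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "script" and (attr_map.get("type") or "").lower() == "application/ld+json":
            self.has_schema = True  # JSON-LD block
        if "itemscope" in attr_map or "itemtype" in attr_map:
            self.has_schema = True  # Microdata
        if tag == "h1":
            self._seen_h1 = True
        elif tag == "p" and self._seen_h1 and self.first_paragraph is None:
            self._in_paragraph = True

    def handle_endtag(self, tag):
        if tag == "p" and self._in_paragraph:
            self._in_paragraph = False
            self.first_paragraph = "".join(self._chunks).strip()

    def handle_data(self, data):
        if self._in_paragraph:
            self._chunks.append(data)


def parse_page(html: str) -> PageSignals:
    parser = PageSignals()
    parser.feed(html)
    parser.close()
    return parser


def has_answer_block(html: str, min_words: int = 30, max_words: int = 80) -> bool:
    """True if the first paragraph after the H1 is 30 to 80 words long."""
    text = parse_page(html).first_paragraph
    return text is not None and min_words <= len(text.split()) <= max_words


def schema_coverage(pages: Iterable[str]) -> float:
    """Share of crawled pages (0.0 to 1.0) carrying JSON-LD or Microdata."""
    pages = list(pages)
    if not pages:
        return 0.0
    return sum(parse_page(p).has_schema for p in pages) / len(pages)
```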

Content recency adds a small bonus only when a JSON-LD dateModified or datePublished confirms an update in the last twelve months. Missing or unreliable dates never lower the score.
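A sketch of how the six signals and the recency rule might combine: the date can only add points, and a missing, malformed, or stale date simply earns nothing. The equal weights, the 3-point `RECENCY_BONUS`, treating twelve months as 365 days, and checking only top-level JSON-LD keys are all assumptions, since the page gives none of those details; `geo_score` and `confirms_recent_update` are hypothetical names.

```python
import json
from datetime import date, timedelta
from typing import Mapping, Optional

# Placeholder equal weights: the page names six weighted signals but does not
# publish the weights. Each signal value is assumed to be 0.0 to 1.0.
SIGNALS = ("schema", "readability", "answer_block", "depth", "tables_lists", "accessibility")
GEO_WEIGHTS = {name: 1 / len(SIGNALS) for name in SIGNALS}
RECENCY_BONUS = 3.0  # hypothetical size of the "small bonus"


def confirms_recent_update(jsonld_text: str, today: date) -> bool:
    """True only if a top-level JSON-LD dateModified or datePublished
    falls within the last twelve months (approximated as 365 days)."""
    try:
        data = json.loads(jsonld_text)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    for key in ("dateModified", "datePublished"):
        value = data.get(key)
        if not isinstance(value, str):
            continue
        try:
            stamped = date.fromisoformat(value[:10])
        except ValueError:
            continue  # unreliable date: ignored, never penalized
        if today - timedelta(days=365) <= stamped <= today:
            return True
    return False


def geo_score(signals: Mapping[str, float], jsonld_text: Optional[str], today: date) -> float:
    """Weighted blend of the six signals on a 0-100 scale, plus the recency bonus."""
    score = 100 * sum(GEO_WEIGHTS[name] * signals[name] for name in SIGNALS)
    if jsonld_text and confirms_recent_update(jsonld_text, today):
        score += RECENCY_BONUS  # bonus only; a missing date never subtracts
    return min(100.0, round(score, 1))
```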

How content leakage is detected

Leakage has two meanings here. The first is AI using your positioning or your facts without citing your site. The second is AI misrepresenting your narrative through a wrong category, confusion with a competitor, or negative framing. After the raw responses are collected, the leakage analysis identifies the primary narrative AI is telling about you, the gaps between that narrative and your own site copy, the competitor associations that appear alongside you, any negative signals, and any uncited overlap where a response mirrors your language without attribution.

The Leakage Protection score runs 0 to 100, and higher is better. It blends primary narrative alignment with your site at 50 percent, absence of negative signals at 25 percent, absence of competitor misassociation at 15 percent, and attribution accuracy at 10 percent. A score of 90 means AI represents you accurately and consistently, while a score of 70 means it represents your story unevenly. This is not social listening, and it does not count mentions. It measures whether what AI says about you is faithful.
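Because the weights are stated, the Leakage Protection blend can be written down directly. This sketch assumes each of the four components is already scored 0 to 100; the component keys and `leakage_protection` are hypothetical names.

```python
from typing import Mapping

# Weights as stated on this page; they sum to 1.0.
LEAKAGE_WEIGHTS = {
    "narrative_alignment": 0.50,
    "no_negative_signals": 0.25,
    "no_competitor_misassociation": 0.15,
    "attribution_accuracy": 0.10,
}


def leakage_protection(components: Mapping[str, float]) -> float:
    """Weighted blend of four 0-100 component scores (higher is better).
    The component names and their 0-100 scale are assumptions for this sketch."""
    missing = LEAKAGE_WEIGHTS.keys() - components.keys()
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    for name, value in components.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    return round(sum(w * components[name] for name, w in LEAKAGE_WEIGHTS.items()), 1)


print(leakage_protection({
    "narrative_alignment": 90,
    "no_negative_signals": 100,
    "no_competitor_misassociation": 80,
    "attribution_accuracy": 60,
}))  # prints 88.0
```

Narrative alignment carries half the weight, so a company with no negative signals can still land near 70 if AI tells a story that drifts from its own site copy.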

A worked example: scanning ourselves

The first company we ever scanned was our own, and the tool came back with a Brand Accuracy Rate of 20 percent. AI was resolving "Faro" to a 3D measurement company, a city in Portugal, and a card-shuffling technique, and only one of five checkable claims about the actual company was correct. We are telling you our starting number on purpose, because the point of the example is not the score but what happened next.

The fix was unglamorous. We added Organization schema with explicit disambiguation on the homepage, FAQPage schema on the pricing page, and a few direct-answer paragraphs written for the exact questions buyers ask. That was one afternoon of work, and Perplexity reflected the change within 48 hours while ChatGPT took longer, which matches the broader pattern we see across the benchmark. Per-claim breakdowns are available for any company we score, including ours, so the number is inspectable rather than a black box. If the method could not move our own score, we would not be asking you to trust it with yours.

Benchmark report methodology

The 2026 benchmark covers 503 companies across 21 industries on all four platforms, using the same scoring that runs on live scans. The 21 industries span ad tech, martech, SaaS, HR tech, dev tools, cybersecurity, fintech, insurtech, edtech, legal and legal tech, real estate, e-commerce, AI infrastructure, healthcare and healthcare systems, hospitality, wealth advisory, higher education, law firms, and DTC brands. In June 2026 we widened coverage beyond pure tech to include the vertical buyer industries in that list, and they are part of the current report rather than a future release. Companies were chosen with a stratified approach so each category includes market leaders, challengers, and independent vendors.

Each benchmark company was scanned with a focused set of roughly 15 buyer-intent queries, asked across all four platforms and repeated four times each, for up to 240 responses per company. That repetition is heavier than the live Pro cadence on purpose, because the benchmark needs to separate a genuine pattern from a one-off answer. The overall average Brand Accuracy Rate across the 503 companies was 90.1 percent, with a median of 90.9 percent, which means roughly one verifiable claim in ten about the average company was wrong or outdated. We excluded failed scans and any scan that returned fewer than 30 percent of expected platform responses, and industry averages use the same scoring as live scans with no after-the-fact adjustment.
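The response count and the exclusion rule are plain arithmetic. This sketch assumes the 30 percent threshold applies to the company's total expected responses (15 queries times 4 platforms times 4 repetitions, or 240) and uses integer math so a scan returning exactly 72 responses is not excluded by floating-point error. All names are hypothetical.

```python
# Benchmark cadence as described above; "roughly 15" queries is modeled as exactly 15.
QUERIES_PER_COMPANY = 15
PLATFORMS = 4
REPETITIONS = 4
EXPECTED_RESPONSES = QUERIES_PER_COMPANY * PLATFORMS * REPETITIONS  # 240
MIN_RESPONSE_PERCENT = 30


def include_in_benchmark(scan_failed: bool, responses_returned: int,
                         expected: int = EXPECTED_RESPONSES) -> bool:
    """Drop failed scans and scans returning fewer than 30 percent of expected
    responses. Integer arithmetic keeps the threshold exact."""
    if scan_failed:
        return False
    return responses_returned * 100 >= MIN_RESPONSE_PERCENT * expected
```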

A few findings shaped how we report the rest. Name ambiguity (short names, common English words, and rebrand lag) was the dominant driver of low accuracy. Weak GEO was a real but secondary factor: companies with thin schema were more often described from third-party sources than from their own. Marketing intelligence was the lowest industry at 54.4 percent, pulled down by a small cohort of four companies, while fintech, long assumed to be the hardest vertical, averaged 91.6 percent. AI Visibility averaged 22.8 with a standard deviation of just 2.8, a band too narrow to separate companies meaningfully this year, which is why we lead with Brand Accuracy Rate and treat visibility as supporting context. A 15 percent random sample of accuracy assessments, drawn with a stored, reproducible seed, was manually reviewed against source; the per-claim review is held internally and is available on request.
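A stored seed is what makes the review sample reproducible: rerunning the draw with the same seed and the same assessment IDs yields the same 15 percent. This sketch uses Python's `random.Random`; the seed value, the ID format, and `review_sample` are placeholders, and sorting the IDs first is a design choice so that input order cannot change the draw.

```python
import random
from typing import Iterable, List


def review_sample(assessment_ids: Iterable[str], fraction: float = 0.15,
                  seed: int = 20260601) -> List[str]:
    """Draw a reproducible random sample of accuracy assessments for manual review.
    The default seed is a placeholder, not the seed used for the benchmark."""
    ids = sorted(assessment_ids)  # fixed order: same IDs + same seed = same sample
    k = round(len(ids) * fraction)
    return random.Random(seed).sample(ids, k)


ids = [f"claim-{i:04d}" for i in range(1000)]
print(len(review_sample(ids)))  # prints 150
```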

For a small set of companies where a live crawl returned no content because of bot protection, GEO scoring fell back to the most recent Wayback Machine snapshot, and those snapshot dates can precede the scan by several months. Scans reflect platform behavior at the time of the scan, and models change continuously, so we treat the benchmark as a baseline to track rather than a verdict, and we recommend weekly monitoring, available on Starter and above, to watch it move. A change of 10 points or more in Visibility or Brand Accuracy Rate triggers an automatic email alert on every paid plan.
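The alert rule is a threshold on the absolute change between two scans, in either direction. In this sketch, `metrics_to_alert`, the metric keys, and the `paid_plan` flag are hypothetical, and a change of exactly 10 points triggers an alert, matching "10 points or more".

```python
from typing import Dict, List

ALERT_THRESHOLD = 10.0  # points, per the text above
TRACKED = ("visibility", "brand_accuracy_rate")


def metrics_to_alert(previous: Dict[str, float], current: Dict[str, float],
                     paid_plan: bool) -> List[str]:
    """Return tracked metrics whose change between two scans is 10 points or more."""
    if not paid_plan:
        return []  # alerts are described only for paid plans
    return [
        name for name in TRACKED
        if name in previous and name in current
        and abs(current[name] - previous[name]) >= ALERT_THRESHOLD
    ]
```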

Full industry breakdowns and data tables are in the 2026 benchmark report. Questions about method go to hello@faroindex.ai.