Readability formulas have been around since the 1940s. The U.S. military commissioned them to make training manuals accessible. Insurance regulators adopted them to ensure policy language was understandable. Healthcare organizations use them to write patient-facing materials that people actually read.

There are dozens of formulas, but 8 are widely used in software. They all try to answer the same question: how many years of education does someone need to understand this text? They disagree on how to answer it.

This guide explains what each formula measures, how they differ, and when to use which one. All examples use textlens, which implements all 8 in a single zero-dependency package. Disclosure: I built textlens.

The big picture

Every readability formula is a proxy. None of them understand meaning. They count things—syllables, characters, sentence boundaries, word lists—and plug those counts into a regression equation that was calibrated against human comprehension tests decades ago.

Because they measure different proxies, they produce different scores for the same text. A passage with short sentences but polysyllabic jargon will score well on Flesch-Kincaid (which weighs sentence length heavily) but poorly on Dale-Chall (which checks vocabulary against a word list).

No single formula covers all dimensions of difficulty. Using multiple formulas and averaging the results—a "consensus grade"—is more reliable than trusting any one score.

The 8 formulas

1. Flesch Reading Ease (1948)

Inputs: syllables per word, words per sentence
Output: 0–100 score (higher = easier to read)
Developed by: Rudolf Flesch

206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words)

The original and most widely recognized readability formula. A score of 60–70 corresponds to standard plain English. Scores below 30 indicate college-level text. Scores above 90 indicate text a 5th grader can understand.

Best for: quick readability checks on content aimed at a general audience.

Limitation: the only formula on this list that doesn't output a grade level. It uses its own 0–100 scale, which makes it harder to compare with the other seven. "Your score is 55" is less actionable than "your text requires a 10th-grade reading level."
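
The formula is trivial to compute once you have the counts. A minimal sketch (syllable counting, the genuinely hard part, is assumed to come from a tokenizer; the counts below are hypothetical):

```typescript
// Flesch Reading Ease from raw counts. Higher = easier.
function fleschReadingEase(words: number, sentences: number, syllables: number): number {
  return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words);
}

// A hypothetical 100-word, 5-sentence passage averaging 1.5 syllables per word:
console.log(fleschReadingEase(100, 5, 150)); // ≈ 59.6, just below "plain English"
```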

2. Flesch-Kincaid Grade Level (1975)

Inputs: syllables per word, words per sentence (same as Flesch RE)
Output: U.S. grade level
Developed by: J. Peter Kincaid for the U.S. Navy

0.39 × (words / sentences) + 11.8 × (syllables / words) - 15.59

Takes the same inputs as Flesch Reading Ease but applies different weights to produce a grade level. A score of 8.2 means an 8th grader should be able to understand the text.

Best for: matching content to a target audience grade level. It's the most widely understood formula because its output is intuitive—everyone knows what "8th grade" means.

Limitation: tends to underestimate difficulty for short texts (fewer than 100 words). Sentence length dominates the calculation, so a passage with three long sentences can score very differently from the same words split into six shorter ones.
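
That sentence-length sensitivity is easy to see in a sketch (counts are hypothetical):

```typescript
// Flesch-Kincaid grade level from raw counts.
function fleschKincaidGrade(words: number, sentences: number, syllables: number): number {
  return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59;
}

// Same 100 words and 150 syllables, split into 5 vs. 10 sentences:
console.log(fleschKincaidGrade(100, 5, 150));  // ≈ 9.9
console.log(fleschKincaidGrade(100, 10, 150)); // ≈ 6.0 — shorter sentences, lower grade
```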

3. Gunning Fog Index (1952)

Inputs: words per sentence, percentage of "complex words" (3+ syllables)
Output: years of formal education needed
Developed by: Robert Gunning

0.4 × ((words / sentences) + 100 × (complex words / words))

Gunning was a newspaper consultant who wanted to help business writers cut jargon. His formula penalizes polysyllabic words more aggressively than Flesch-Kincaid. A Fog score of 12 means the text needs roughly 12 years of education (high school senior).

Best for: business and technical writing where jargon creep is a concern.

Limitation: penalizes multisyllabic words even when they're common. "Understanding," "government," and "information" all count as complex words despite being well-known. This inflates the score for text that's actually accessible.
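
A sketch of the computation, assuming the complex-word count comes from a syllable counter (counts hypothetical):

```typescript
// Gunning Fog from raw counts. "Complex" = 3+ syllables.
function gunningFog(words: number, sentences: number, complexWords: number): number {
  return 0.4 * (words / sentences + 100 * (complexWords / words));
}

// 100 words in 5 sentences with 10 complex words:
// 0.4 × (20 + 10) → Fog 12, a high-school senior.
console.log(gunningFog(100, 5, 10));
```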

4. SMOG Index (1969)

Inputs: count of polysyllabic words (3+ syllables) in a sample
Output: years of education needed
Developed by: G. Harry McLaughlin

1.0430 × √(polysyllables × 30 / sentences) + 3.1291

SMOG stands for "Simple Measure of Gobbledygook." It was designed to predict 100% comprehension (not 50–75% like most other formulas), so it consistently produces higher grade levels. If SMOG says grade 10, the text requires a 10th-grade reading level for full comprehension.

Best for: healthcare and patient-facing materials. Many health literacy guidelines (including those from the NIH and AMA) specifically require SMOG scores.

Limitation: needs at least 30 sentences for statistical accuracy. On shorter text, the polysyllable count is too noisy to be reliable.
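
The formula itself is a one-liner; the 30/sentences factor normalizes samples that aren't exactly 30 sentences (the counts below are hypothetical):

```typescript
// SMOG grade from a sample of text.
function smogIndex(polysyllables: number, sentences: number): number {
  return 1.0430 * Math.sqrt(polysyllables * (30 / sentences)) + 3.1291;
}

// 30 polysyllabic words across a 30-sentence sample:
console.log(smogIndex(30, 30)); // ≈ 8.8
```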

5. Coleman-Liau Index (1975)

Inputs: characters per word, sentences per 100 words
Output: U.S. grade level
Developed by: Meri Coleman and T. L. Liau

0.0588 × L - 0.296 × S - 15.8

Where L is the average number of letters per 100 words and S is the average number of sentences per 100 words.

Unique trait: the only formula on this list that counts characters instead of syllables. This makes it more reliable for text with abbreviations, acronyms, and technical terms where syllable counting is ambiguous. Is "API" one syllable or three? Coleman-Liau sidesteps the question.

Best for: technical documentation, code-adjacent prose, anything with acronyms or domain-specific terminology.

Limitation: character length is a cruder proxy for word difficulty than syllable count. "Strength" is eight characters but one syllable, while "idea" is four characters but three syllables, so character counts can misjudge in both directions. For highly polysyllabic text, syllable-based formulas are more discriminating.
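
A sketch of the computation from raw counts (the counts are hypothetical):

```typescript
// Coleman-Liau from raw counts.
// L = letters per 100 words, S = sentences per 100 words.
function colemanLiau(letters: number, words: number, sentences: number): number {
  const L = (letters / words) * 100;
  const S = (sentences / words) * 100;
  return 0.0588 * L - 0.296 * S - 15.8;
}

// 100 words totaling 450 letters across 5 sentences:
console.log(colemanLiau(450, 100, 5)); // ≈ 9.2
```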

6. Automated Readability Index (1967)

Inputs: characters per word, words per sentence
Output: U.S. grade level (1–14+)
Developed by: Senter and Smith for the U.S. Air Force

4.71 × (characters / words) + 0.5 × (words / sentences) - 21.43

Like Coleman-Liau, ARI uses character counts instead of syllables. It was originally designed for automated processing on machines that couldn't do syllable analysis (like teletypewriters). Character and word counting is computationally cheap, making ARI suitable for real-time scoring.

Best for: real-time readability scoring in editors and CMS tools, where syllable counting would add latency on large documents.

Limitation: overlaps significantly with Coleman-Liau. If you're already using Coleman-Liau, ARI adds little new information. Most implementations include both for completeness.
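
Its cheapness is visible in the sketch: two divisions and no linguistic analysis at all (counts hypothetical):

```typescript
// ARI from raw counts: characters and words only, no syllables.
function automatedReadability(characters: number, words: number, sentences: number): number {
  return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43;
}

// 450 characters, 100 words, 5 sentences:
console.log(automatedReadability(450, 100, 5)); // ≈ 9.8
```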

7. Dale-Chall Readability (1948, revised 1995)

Inputs: percentage of "difficult words" (not on a list of ~3,000 common words), words per sentence
Output: grade level equivalent
Developed by: Edgar Dale and Jeanne Chall

0.1579 × (difficult words / words × 100) + 0.0496 × (words / sentences)

An adjustment of +3.6365 is added if more than 5% of words are "difficult."

Unique trait: the only formula that uses a word familiarity list. While other formulas use word length or syllable count as a proxy for difficulty, Dale-Chall directly checks whether words are common. This catches text with simple sentence structure but obscure vocabulary—something syllable-based formulas miss entirely.

Best for: assessing vocabulary difficulty. If your text uses rare or specialized terms, Dale-Chall will flag it even when sentence length is short.

Limitation: the word list is U.S.-centric and was last updated in 1995. Words that have become common since then (like "internet" or "smartphone") may be flagged as difficult. Proper nouns and technical terms will also inflate the score.
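
A sketch of the scoring logic. The real formula checks against the ~3,000-word familiarity list; the six-word Set here is a toy stand-in for illustration only:

```typescript
// Dale-Chall sketch. FAMILIAR is a toy stand-in for the real ~3,000-word list.
const FAMILIAR = new Set(["the", "patient", "should", "take", "some", "food"]);

function daleChall(words: string[], sentences: number): number {
  const difficult = words.filter((w) => !FAMILIAR.has(w.toLowerCase())).length;
  const pctDifficult = (difficult / words.length) * 100;
  let score = 0.1579 * pctDifficult + 0.0496 * (words.length / sentences);
  if (pctDifficult > 5) score += 3.6365; // adjustment for difficult text
  return score;
}

// All familiar words: low score. Add rare vocabulary and the
// percentage-difficult term plus the +3.6365 adjustment kick in.
console.log(daleChall(["The", "patient", "should", "take", "food"], 1));
console.log(daleChall(["the", "prescribed", "medication"], 1));
```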

8. Linsear Write (1973)

Inputs: count of "easy" words (≤2 syllables) vs "hard" words (3+ syllables) in a sample
Output: U.S. grade level
Developed by: John O'Hayre for the U.S. Air Force

The formula assigns 1 point for each easy word and 3 points for each hard word, divides by the number of sentences, then adjusts: if the raw score is above 20, it divides by 2; if 20 or below, it divides by 2 and subtracts 1.

Best for: technical manuals and procedural documents. It was specifically calibrated for the kind of instructional writing found in maintenance guides.

Limitation: less widely known and validated than the other formulas. Academic research on Linsear Write is sparse compared to Flesch-Kincaid or Dale-Chall. It's most useful as a cross-check alongside other formulas rather than as a standalone metric.
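
A sketch of the scoring, assuming the easy/hard classification has already been done by a syllable counter (counts hypothetical), using the standard published adjustment (divide by 2, subtracting 1 when the raw score is 20 or below):

```typescript
// Linsear Write from pre-classified word counts.
// Easy = 1-2 syllables (1 point each), hard = 3+ syllables (3 points each).
function linsearWrite(easyWords: number, hardWords: number, sentences: number): number {
  const raw = (easyWords + hardWords * 3) / sentences;
  return raw > 20 ? raw / 2 : raw / 2 - 1;
}

// 80 easy and 20 hard words across 5 sentences:
console.log(linsearWrite(80, 20, 5)); // (80 + 60) / 5 = 28 → 28 / 2 = 14
```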

Comparison table

Formula               Input method                 Output            Best for
Flesch Reading Ease   Syllables, sentences         0–100 score       General readability
Flesch-Kincaid        Syllables, sentences         Grade level       Grade targeting
Gunning Fog           Complex words, sentences     Education years   Business writing
SMOG                  Polysyllabic words           Education years   Healthcare
Coleman-Liau          Characters, sentences        Grade level       Technical text
ARI                   Characters, sentences        Grade level       Real-time scoring
Dale-Chall            Word familiarity, sentences  Grade level       Vocabulary difficulty
Linsear Write         Easy/hard words              Grade level       Technical manuals

Which should you use?

It depends on your content and audience.

If you only pick one: Flesch-Kincaid Grade Level. Its output is universally understood, and it's the most widely cited formula in style guides and content tools.

For healthcare: SMOG. Many health literacy organizations require it specifically. It predicts full comprehension, not partial.

For business writing: Gunning Fog. It was designed for exactly this use case—catching corporate jargon and bloated prose.

For vocabulary difficulty: Dale-Chall. If you suspect your text has accessible sentence structure but obscure word choices, Dale-Chall will catch it.

For technical text with acronyms: Coleman-Liau or ARI. Character-based formulas avoid syllable-counting ambiguity on abbreviations and technical terms.

Best practice: use multiple formulas and average the grade levels. This is called a "consensus grade." It smooths out the quirks and biases of individual formulas. If five formulas say grade 8 and one says grade 14, the outlier is probably wrong—and the average will reflect that.
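
The averaging itself is trivial; a sketch with hypothetical grade outputs for one passage:

```typescript
// Consensus grade: the mean of the grade-level outputs.
function consensusGrade(grades: number[]): number {
  return grades.reduce((sum, g) => sum + g, 0) / grades.length;
}

// Five formulas near grade 8, one outlier at 14:
console.log(consensusGrade([8.1, 7.9, 8.4, 9.0, 8.2, 14.0])); // ≈ 9.3
```

If outliers are a frequent concern in your content, a median instead of a mean would suppress them even more aggressively; the mean shown here matches the common "consensus grade" convention.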

Quick demo

textlens implements all 8 formulas and computes a consensus grade from the 7 that output grade levels (Flesch Reading Ease is excluded from the average because it uses a different scale).

import { readability } from 'textlens';

const text = `The patient should take the prescribed medication
twice daily with food. If side effects occur, contact your
healthcare provider immediately. Do not discontinue use
without medical advice.`;

const scores = readability(text);

console.log('Consensus grade:', scores.consensusGrade);
console.log('Flesch Reading Ease:', scores.fleschReadingEase.score);
console.log('Flesch-Kincaid:', scores.fleschKincaidGrade.grade);
console.log('Gunning Fog:', scores.gunningFogIndex.grade);
console.log('SMOG:', scores.smogIndex.grade);
console.log('Dale-Chall:', scores.daleChallScore.grade);

Or use the CLI to score any file without writing code:

npx textlens README.md

Conclusion

No single readability formula is "best." They measure different proxies for text difficulty—syllable count, character count, word familiarity—and they disagree on the same input. That disagreement is a feature, not a bug. Each formula catches things the others miss.

A consensus across multiple formulas gives you a more reliable picture than any individual score. If you're building content tools, evaluating documentation, or scoring user-generated text, running all 8 and averaging the grade levels is the most defensible approach.

textlens gives you all 8 in one import with zero dependencies.