QualityScore

QualityScore is a rubric I invented to evaluate the quality of any Anki flashcard. This is my first attempt and I suspect the rubric will evolve as I gather feedback from the community.

In this context, I define a flashcard as a two-part learning device. The first part (the front) is a prompt of some kind, where the learner is challenged to recall a specific piece of information. The second part (the back) is the reveal, which provides the correct answer to verify that recall.

When designing QualityScore, I tried to capture the universal qualities of the perfect flashcard. I believe that these are things that a medical student, a language learner, an engineer, and a Tolkien nerd would all want in their flashcards. Cards with high QualityScores are optimized for maximum retention, are interesting, last a lifetime, and are worthy of being shared with others.

Rather than a simple pass/fail, QualityScore is calculated as a weighted average across six distinct dimensions. This means some factors contribute much more heavily to the final score than others.

Overview of Dimensions

Dimension	Weight	Description	Scope
Correctness	40%	Factually accurate and free of spelling or grammatical errors.	Front & Back
Atomicity	20%	Tests exactly one concept.	Front
Objectivity	15%	The answer is definitive with no ambiguity and all valid answers are accounted for.	Front & Back
Richness	15%	The card includes interesting context to enhance long-term engagement.	Back
Durability	5%	The information will remain true for 10+ years.	Front & Back
Clarity	5%	The prompt clearly communicates what a correct answer looks like and stands on its own without external context.	Front
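To make the weighting concrete, here is a minimal sketch of the weighted average using the weights from the table above. The 0-100 scale for each dimension, the function name `quality_score`, and the example cards are assumptions for illustration, not the actual implementation behind QualityScore.

```python
# Weights copied from the dimension table; they sum to 1.0.
WEIGHTS = {
    "correctness": 0.40,
    "atomicity": 0.20,
    "objectivity": 0.15,
    "richness": 0.15,
    "durability": 0.05,
    "clarity": 0.05,
}


def quality_score(scores: dict[str, float]) -> float:
    """Weighted average of per-dimension scores (assumed to be on a 0-100 scale)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical card A: perfect except for a bare, one-word back (low richness).
bare_back = {
    "correctness": 100, "atomicity": 100, "objectivity": 100,
    "richness": 20, "durability": 100, "clarity": 100,
}

# Hypothetical card B: perfect except for a wrong key claim (low correctness).
wrong_claim = {
    "correctness": 20, "atomicity": 100, "objectivity": 100,
    "richness": 100, "durability": 100, "clarity": 100,
}

print(round(quality_score(bare_back), 2))
print(round(quality_score(wrong_claim), 2))
```

The same 80-point shortfall costs card A about 12 points (richness, 15% weight) and card B about 32 points (correctness, 40% weight), which is what "some factors contribute much more heavily" means in practice.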

Correctness (worth 40% of QualityScore)

Correctness is the reason why I came up with flashcardaudit.com. Many of my own Anki cards are related to technology. Technology is a rapidly changing field, which meant that my flashcards would quickly become outdated. Without some kind of tool to iterate through all of my cards, it was impossible to know which were stale. This is the problem that ultimately led me to build this entire thing.

Correctness simply checks the factual accuracy of the premises and claims presented on the card.

Can the AI hallucinate and get this wrong? Sure, it does happen. But it's pretty rare these days, and I think it only gets smarter from here. The intelligence level should be sufficient for 99%+ of the flashcards out there.

The importance of this dimension is fairly self-explanatory. Memorizing information that is incorrect is counterproductive to the goal of the Anki user.

The auditor is instructed to grade proportionally to the severity of the infraction. This means the correctness score is very low when key claims are wrong, while only small dings are made for minor spelling, capitalization, or grammatical errors. Those small errors still count, though: if the grammar of a card is incorrect, the learner is being taught to speak about the subject poorly.
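The auditor is an AI following instructions rather than a fixed formula, so the following is only a rough sketch of what grading "proportionally to the severity" could look like. The infraction names and point values are hypothetical, chosen purely for illustration.

```python
# Hypothetical deductions per infraction type (not the auditor's real values).
DEDUCTIONS = {
    "false_key_claim": 70,
    "minor_factual_slip": 25,
    "grammar": 5,
    "spelling": 3,
    "capitalization": 2,
}


def correctness_score(infractions: list[str]) -> int:
    """Start at 100 and subtract one deduction per infraction, floored at 0."""
    total = sum(DEDUCTIONS[kind] for kind in infractions)
    return max(0, 100 - total)


# A card with two cosmetic errors barely moves.
print(correctness_score(["spelling", "capitalization"]))  # 95

# A card with a wrong key claim drops sharply.
print(correctness_score(["false_key_claim", "grammar"]))  # 25
```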

Atomicity (worth 20% of QualityScore)

Atomicity measures the scope of the information requested by the card prompt. If the card asks the learner to retrieve multiple distinct facts, the atomicity score drops.

Many Anki users would agree that each card should be broken down into the smallest possible atomic pieces. This is rule #4, the minimum information principle, in SuperMemo's 20 Rules of Formulating Knowledge.

When you break up the cards into atomic pieces, each piece of knowledge has its memory retention tracked separately by the scheduling algorithm. Highly atomic cards also force the learner to break complex topics into smaller bite-sized pieces, making them easier and more enjoyable to digest.

Low atomicity card fronts:

Explain the steps of the Krebs cycle and its primary end products.
Describe the physical and chemical properties of gold.
Who discovered penicillin and in what year did they win the Nobel Prize for it?

High atomicity card fronts:

Where does the Krebs cycle occur in a eukaryotic cell?
What is the atomic number of gold (Au)?
Which biologist is credited with the discovery of penicillin in 1928?

Trivia questions are atomic in nature as well. Imagine a pub trivia night in which those low atomicity examples are asked. I don't think it would go very well.

Objectivity (worth 15% of QualityScore)

Objectivity favors cards that have only a single right answer. QualityScore is designed so that cards can be shared with other learners and still hold their value. Consider these examples.

Low objectivity fronts:

Who was the best US president?
What is a big city in France?
What is a property of water?

Each question above has many different answers that could all be right. A learner could answer the card with a valid but different response that the card doesn't allow them to verify.

High objectivity fronts:

Which U.S. President signed the Emancipation Proclamation in 1863?
According to the 2020 French Census, what was the second most populous city in France?
What is the boiling point of pure water at sea level in degrees Celsius?

Cards with high objectivity also remove bias. Not everyone agrees on who the best US president was, so forcing a learner to memorize a particular point of view narrows their outlook. It is best to give the learner all the atomic facts about reality and let them form their own conclusions.

Richness (worth 15% of QualityScore)

Richness measures the amount of additional follow-up information provided on the back of a card. Once an answer is revealed, a high quality card will provide the opportunity for the learner to be curious and learn more. Memorizing a single word does not fill the knowledge graph in the brain; it only teaches the learner to recite a response when a specific question is asked.

Consider this low richness example:

# Front

What is the common name for the compound Sodium Chloride?

# Back

Table Salt

This card fails to provide the learner with an opportunity to further explore that topic. What if we added some additional information to it to make it more interesting?

# Front

What is the common name for the compound Sodium Chloride?

# Back

Table Salt

Sodium chloride is commonly called table salt because it is the most ubiquitous salt in
daily life. It is essential for human health and food preservation.

This compound consists of a lattice structure of sodium (Na⁺) and chloride (Cl⁻) ions.
These ions are vital for maintaining fluid balance and nerve transmission.

Historically, salt was so valuable that it served as currency. The word "salary" derives
from the Latin salarium, traditionally explained as money given to Roman soldiers to buy salt.

Beyond the kitchen, it is a critical industrial raw material used in manufacturing glass
and soap and in de-icing roads.

The additional information is not required for the learner to verify that their answer is correct, but doesn't it make the card more fun?

You'll notice that richness is scoped only to the back of the card and not the front. Card fronts should be as minimal as possible. Adding unnecessary richness to the front of the card amounts to excessive hinting, which makes the answer artificially easy to recall. Instead, we want to provide as few hints as possible to better simulate real-world recall scenarios.

Add interesting details and extra information to the back of the card. Keep the card front as short as possible (without sacrificing clarity).

Durability (worth 5% of QualityScore)

Durability measures the long-term accuracy of a card. Some cards are phrased in a way that allows once-correct information to become false over time. When committing facts to permanent memory, the learner should avoid information that will eventually become outdated.

Low durability examples:

Who is the current Prime Minister of the United Kingdom?
What is the population of Tokyo?
Which country exports the most oil?

All of these are requests for atomic and objective pieces of information, but the answers will almost certainly go stale within a few years. One way of ensuring card durability is to “pin” the question to a fixed point in time, such as a date or a specific census.

High durability examples:

Who became the Prime Minister of the United Kingdom in July 2024?
According to the 2020 census, what was the approximate population of the Tokyo metropolis?
Which country was the world's leading exporter of crude oil in 2025?

This somewhat conflicts with the guidance to keep the card front as minimal as possible, but I think it's an acceptable tradeoff.

Clarity (worth 5% of QualityScore)

Clarity is an assessment of whether the front of a card clarifies the success criteria and stands on its own. Cards with high clarity must pass the “stranger test.” If the front of a card were shown to a stranger who had never seen it before, would they understand what information is being requested? Would they even know which subject they are being tested on?

Low clarity examples:

(an image of The Great Wall of China)
The First Amendment.
The attack on Pearl Harbor happened in {{...}}.
What is the primary function of this organelle?

What are these cards asking for? Which facts? What constitutes a right answer? Would a stranger encountering these cards in a random deck understand the task? Now compare these examples, where each card makes a clear, self-contained request.

High clarity examples:

What is the approximate total length of the Great Wall of China in kilometers?
List the **five specific freedoms** guaranteed by the First Amendment to the U.S. Constitution.
The attack on Pearl Harbor happened in {{which year?}}.
What is the primary function of mitochondria?