How it Works

A semi-technical explanation

The flashcardaudit.com monorepo spans roughly 50,000 lines of code, mostly in the Python backend, tooling, and automation. Every line of code in this project was written by an AI agent, though the overall system design is my own creation.

I waited patiently to ship this code because I wanted the freedom to regularly do HUGE refactors to the architecture, schema, and codebase. Many times I had a completely functional system and dismantled the entire thing because I thought of a better way to rebuild it.

If you are reading this, it means I finally ran out of tech debt. I literally could not think of a single thing I could do to improve the backend. Refactoring code is just as important to me as tidying my /Users/$USER/Desktop or my physical desktop. This is my masterpiece, my magnum opus. Let me tell you a little bit about how it works.

An early prototype of the flashcardaudit.com system design
A very early system design attempt

System Design

When designing this system, I strongly felt that an AI auditor emulating the human flashcard experience would produce the highest-quality audit. This means I need to take screenshots and simulate that experience for every card that flows through the system. That is an enormous amount of compute, and handling it at scale takes deliberate engineering.

You can think of the entire backend architecture as a massive dam. As audits flow into the system, they accumulate in a reservoir (the queue) while being released through the spillways at a steady pace. This asynchronous dam-like architecture should allow me to handle any major spikes in audit requests.
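
To make the dam picture concrete, here is a minimal Python sketch of a reservoir that accepts audits as fast as they arrive but releases them at a fixed pace. It is only an illustration under my own assumptions, not the production code: the `AuditReservoir` class, its `release_per_tick` parameter, and the in-memory deque are all hypothetical stand-ins for whatever real queue sits behind the site.

```python
from collections import deque


class AuditReservoir:
    """Hypothetical in-memory stand-in for the audit queue (the reservoir)."""

    def __init__(self, release_per_tick: int) -> None:
        if release_per_tick < 1:
            raise ValueError("release_per_tick must be at least 1")
        self.release_per_tick = release_per_tick
        self._pending: deque[str] = deque()

    def submit(self, audit_id: str) -> None:
        """Accept an audit immediately, no matter how busy the system is."""
        self._pending.append(audit_id)

    def release(self) -> list[str]:
        """Open the spillway once: hand out at most release_per_tick audits, oldest first."""
        batch: list[str] = []
        while self._pending and len(batch) < self.release_per_tick:
            batch.append(self._pending.popleft())
        return batch

    def __len__(self) -> int:
        """Number of audits still waiting in the reservoir."""
        return len(self._pending)
```

A spike of submissions only makes the reservoir deeper; the amount of work released per call to `release()` stays the same, which is the property the dam analogy is describing.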

Batch Inference & Async Processing

I don't believe that auditing flashcards needs to be a high-speed event. Instead of speed, I've chosen to prioritize quality and cost-effectiveness by leveraging batch inference. Batch inference means I can save on costs if I'm willing to wait 24 hours or more to receive audit results. This is where the non-instant delivery times come from. To reach a scale that makes a real-world difference, I learned early on that I needed an asynchronous architecture.
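
The submit-now, collect-later pattern can be sketched roughly as follows. This is an assumption-laden illustration rather than the real implementation: `BatchJob`, `make_batches`, `split_by_status`, and the `is_finished` callback are hypothetical names, and the sketch deliberately leaves out any actual call to an inference provider.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class BatchJob:
    """Hypothetical record of one batch handed off for inference."""
    job_id: str
    card_ids: tuple[str, ...]


def make_batches(card_ids: list[str], batch_size: int) -> list[BatchJob]:
    """Group cards into fixed-size batches; the last batch may be smaller."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        BatchJob(job_id=f"batch-{n}", card_ids=tuple(card_ids[start:start + batch_size]))
        for n, start in enumerate(range(0, len(card_ids), batch_size))
    ]


def split_by_status(
    jobs: list[BatchJob],
    is_finished: Callable[[BatchJob], bool],
) -> tuple[list[BatchJob], list[BatchJob]]:
    """Separate jobs whose results are ready from jobs that are still waiting.

    is_finished is a placeholder for whatever status check the provider offers.
    """
    finished = [job for job in jobs if is_finished(job)]
    waiting = [job for job in jobs if not is_finished(job)]
    return finished, waiting
```

Nothing here blocks while a batch is running: a periodic task can call `split_by_status`, process whatever has finished, and check the rest again later.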

Rendering & Audit

When processing begins, every Anki card is rendered as two images inside a headless browser: the front of the card and the back of the card. Custom HTML/CSS/JS is injected into each card to help the auditor best access the content of that card. I've tried my best to maintain reasonable fidelity with the original card while still keeping things secure. There are limitations here, though; see the Compatibility List for more information on what I can and can't do.
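
As a rough sketch of the injection step only, the snippet below wraps a card's front and back HTML in a page shell that carries extra CSS. Everything in it is an assumption made for illustration: the `PAGE_SHELL` template, the `build_card_pages` function, and the `card-front` / `card-back` class names are hypothetical. Sanitizing the card content and taking the actual screenshots in a headless browser are not shown.

```python
from string import Template

# Hypothetical page shell. The real injected HTML/CSS/JS is not public,
# so this only demonstrates the general shape of the idea.
PAGE_SHELL = Template(
    """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<style>$injected_css</style>
</head>
<body class="card card-$side">$card_html</body>
</html>
"""
)


def build_card_pages(front_html: str, back_html: str, injected_css: str) -> dict[str, str]:
    """Return one full HTML page per side, ready to be opened and screenshotted."""
    return {
        side: PAGE_SHELL.substitute(
            side=side,
            injected_css=injected_css,
            card_html=card_html,
        )
        for side, card_html in (("front", front_html), ("back", back_html))
    }
```

Each card produces exactly two pages, matching the two images the auditor sees.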

Side-by-side view of a rendered Anki card front and back as seen by the AI auditor
A rendered card. The AI auditor views both the front and back of the card.

Results & Publication

After rendering, both the card images and any other media files (audio/video) are packaged and sent to Google. After a waiting period, I fetch the audit results. I assemble all of the results and then update the global registry of Cached Cards. I then publish the data to Cloudflare R2. This is when you will receive an email notifying you of audit completion.
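
The order of the final steps (update the registry, publish, then notify) can be sketched like this. It is a simplified illustration under my own assumptions: `finalize_audit` is a hypothetical function, the registry is modeled as a plain dictionary keyed by card ID, and `publish` and `notify` are injected callables standing in for the real upload and email steps.

```python
from typing import Callable


def finalize_audit(
    new_results: dict[str, dict],
    registry: dict[str, dict],
    publish: Callable[[dict[str, dict]], None],
    notify: Callable[[int], None],
) -> None:
    """Merge fresh results into the registry, publish it, then send the notification.

    publish and notify are placeholders; the email goes out only after
    publishing has returned without raising.
    """
    registry.update(new_results)   # newer results overwrite older entries for the same card
    publish(registry)              # stand-in for the upload step
    notify(len(new_results))       # stand-in for the completion email
```

Passing the side effects in as callables keeps the ordering easy to test without touching any real storage or email service.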

Less is More

One of the things I decided to remove for this first version was any kind of login or password protection. The Anki flashcards I'm most interested in auditing are non-personal and purely academic. So why not just get rid of auth? Aren't we all tired of creating new accounts?

I'm a lifelong minimalist. To me, life is all about stripping things down to the absolute bare essentials, until you are left with what is truly important. This is a philosophy that guides me in all my choices. I'm always trying to keep it simple.

SpaceX Raptor engine iteration diagram
I try to iterate on my codebase like this. (my inspo)

“It is not daily increase but daily decrease; hack away the unessential.”