| File Formats |
| .apkg | Supported | |
| Images |
| .png / .jpg / .jpeg / .webp | Supported | Per-image limit is 7 MB and 3072px on any side. Larger images are automatically resized before being sent to the AI auditor. Images smaller than 50px are upscaled to 100px to improve readability. Cards are capped at 20 images; extras are dropped before being sent to the AI auditor. This limit is a safeguard against excessive processing costs. |
| .gif | Supported | GIFs are rendered natively by the browser, just as they would appear in Anki. Animated GIFs are re-rendered into short video clips so the AI auditor sees the animation, not just a frozen frame. Static GIFs are treated as regular images. |
| .svg | Supported | SVGs render natively in the headless browser, and the AI auditor evaluates them through the card screenshot. Complex SVGs with embedded scripts may not render perfectly. |
| Video & Audio |
| .mp4 / .mov / .avi / .webm / .mkv / .flv | Supported | Video files are transcoded to H.264 MP4 and sent to the AI auditor as multimodal input. Cards are capped at 3 videos; extras are dropped before auditing. Most flashcard videos are short clips — longer videos work fine but are best kept under 30 seconds. The visual render of the card will show a placeholder where the video would appear. |
| .mp3 / .wav / .ogg / .m4a / .flac / .aac | Supported | Audio files are transcoded to MP3 and sent to the AI auditor as multimodal input. Cards are capped at 5 audio files; extras are dropped before auditing. Most flashcard audio is a short pronunciation clip or sentence — longer clips work fine but are best kept under 1 minute. The AI can identify spoken language content (e.g., pronunciation guides, vocabulary audio). |
| Note Types & Features |
| Basic note type | Supported | The GOAT note. Use Basic as much as possible. Focus on studying, not inventing new note types. |
| Cloze deletions | Supported | Each cloze card is rendered and audited individually. The AI auditor sees each deletion in isolation, which is the intended Anki behavior. |
| Custom note types | Partial | If you have custom note types with various fields and templates, this is most likely supported unless you are doing super weird stuff with JavaScript. The renderer injects the #qa and #qa_box containers that most community templates expect. |
| Native image occlusion | Supported | Anki's built-in image occlusion is supported for rendering and auditing. |
| Image Occlusion Enhanced | Supported | The Image Occlusion Enhanced add-on is supported for rendering and auditing. |
| Math & Typesetting |
| MathJax / LaTeX | Supported | Both are supported. MathJax is bundled directly into the render worker, so no internet access is required. Anki's legacy delimiters ([latex], [$], [$$], [chem]) are automatically converted before rendering. |
| Custom fonts | Supported | Fonts included in your APKG media folder are automatically detected and injected into the render. If a card uses a custom font, it will look the same as it does in Anki. |
| Rendering Behavior |
| Custom CSS | Partial | Mostly supported. Custom styling is passed through with minimal transformation. The vast majority of CSS works correctly. Edge cases involving position: fixed or viewport-relative units may produce unexpected layouts in the screenshot. |
| Custom JavaScript | Partial | Custom template scripts run in the headless Chrome sandbox. Each script is wrapped in a try/catch so a broken script won't kill the rest of the render. Cards that rely on asynchronous JS (e.g., setTimeout-based animations) may not render as expected, since the screenshot is taken when the page loads, not when the JS finishes. |
| External network calls | Not supported | If your card has JavaScript that fetches an image or other online resource dynamically, that fetch will fail. The visual render worker has no internet access, for security reasons. This is generally a bad idea anyway; don't do this. |
| AI Limitations |
| Hearing sounds (guitar chords, bird calls, songs) | Partial | Audio is always sent to the AI auditor. Speech-based audio (pronunciation, spoken words) works well. Non-speech sounds like guitar chords, bird calls, or instrumental music may not be recognized accurately — this is a current AI limitation that is improving over time. |
| Very recent events | Not supported | LLMs have a knowledge cutoff date. If a card covers something that could only be known after the cutoff, the AI auditor will simply assume the card's content is correct. Gemini's cutoff can be found in Google's model documentation. The gap between a model's knowledge cutoff and the present will likely shrink over time. |
| Inappropriate content / PII | Not supported | People should be free to use Anki to memorize whatever they want, but the system flags such cards automatically and does not display them in the audit results. No refund is issued for flagged cards. |
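
The media limits in the table can be illustrated with a short Python sketch. This is not the service's actual code. The function names (`media_kind`, `apply_caps`, `target_size`) are hypothetical, and several behaviors are assumptions the table does not confirm: that the first files in card order are the ones kept, that resizing preserves aspect ratio, and that "smaller than 50px" refers to the shorter side. GIF and SVG handling and the 7 MB byte limit are left out.

```python
import os

# Limits quoted from the table above.
MAX_SIDE_PX = 3072      # largest allowed side of an image
MIN_SIDE_PX = 50        # images below this are upscaled
UPSCALE_TO_PX = 100     # target for the upscaled side
CAPS = {"image": 20, "video": 3, "audio": 5}

EXTENSIONS = {
    "image": {".png", ".jpg", ".jpeg", ".webp"},
    "video": {".mp4", ".mov", ".avi", ".webm", ".mkv", ".flv"},
    "audio": {".mp3", ".wav", ".ogg", ".m4a", ".flac", ".aac"},
}


def media_kind(filename):
    """Classify a file by extension; returns None for anything unlisted."""
    ext = os.path.splitext(filename)[1].lower()
    for kind, exts in EXTENSIONS.items():
        if ext in exts:
            return kind
    return None


def apply_caps(filenames):
    """Split a card's media into (kept, dropped) using the per-card caps.

    Assumption: files are kept in card order and later extras are dropped.
    Files with unlisted extensions pass through untouched.
    """
    counts = {kind: 0 for kind in CAPS}
    kept, dropped = [], []
    for name in filenames:
        kind = media_kind(name)
        if kind is None:
            kept.append(name)
        elif counts[kind] < CAPS[kind]:
            counts[kind] += 1
            kept.append(name)
        else:
            dropped.append(name)
    return kept, dropped


def target_size(width, height):
    """Return the (width, height) an image would be resized to.

    Assumption: aspect ratio is preserved, and an image counts as
    "smaller than 50px" when its shorter side is below that threshold.
    """
    longest, shortest = max(width, height), min(width, height)
    if longest > MAX_SIDE_PX:
        scale = MAX_SIDE_PX / longest
    elif shortest < MIN_SIDE_PX:
        # Never upscale past the maximum side length.
        scale = min(UPSCALE_TO_PX / shortest, MAX_SIDE_PX / longest)
    else:
        return width, height
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, under these assumptions a card with 22 PNG files would keep the first 20 and drop the last 2, and a 6144x3072 image would be planned for a resize to 3072x1536.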