Duolingo Vocabulary Exporter Script: Automate Word Tracking & Build Your Own Learning Dataset
Learn how to automatically capture and organize vocabulary from Duolingo using a custom userscript. This guide explains how automation can turn passive learning into structured data, helping you build your own word library, export it to CSV, and optimize your language learning workflow.
Turning Duolingo Into a Data Engine (Not Just an App)
Most people use Duolingo passively — you complete lessons, you forget half the words, and the cycle repeats. But what if every word you encounter could be captured, tracked, and exported automatically?
That’s exactly what this script does. It turns Duolingo into a lightweight data collection system using browser automation techniques: no API and no backend, just DOM observation and the userscript manager’s built-in storage.
If you're into automation, scraping, or building datasets, this is where things get interesting.
What This Script Actually Does
At its core, the script listens to changes in the page and extracts words in real time. Whenever Duolingo renders new content (like hints or challenge text), the script:
- Detects new words dynamically
- Cleans and filters them (removes stop words, punctuation, noise)
- Stores them locally with frequency tracking
- Displays them in a clean UI
- Lets you export everything as CSV
This means you're not just learning — you're building your own vocabulary dataset automatically.
Key Features That Make It Powerful
1. Real-Time Vocabulary Capture
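Using a MutationObserver, the script watches the page for DOM changes. Whenever Duolingo adds or updates elements, the script re-scans the hint tokens and challenge text currently on screen and extracts their words.
This is essentially client-side scraping: the script sends no requests of its own, so there are no rate limits to hit. It only reads what the page has already rendered in your browser.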
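The capture loop can be sketched roughly as follows. This is a minimal illustration, not the full script: extractTokens and startCapture are hypothetical helper names, and the data-test selectors are the ones the script at the end of this article relies on, which may stop matching whenever Duolingo changes its markup.

```javascript
// Hypothetical helper: collect whitespace-separated tokens from matched elements.
// Prefers aria-label (as the script does) and falls back to the visible text.
function extractTokens(elements) {
  const tokens = [];
  for (const el of elements) {
    const text = el.getAttribute("aria-label") || el.innerText || "";
    for (const part of text.split(/\s+/)) {
      if (part) tokens.push(part); // drop empty strings left by leading/trailing spaces
    }
  }
  return tokens;
}

// Hypothetical helper: re-scan on every DOM change and hand each token to a callback.
// Browser-only, because MutationObserver does not exist outside a page context.
function startCapture(root, onToken) {
  const selector = '[data-test="hint-token"], [data-test="challenge-token-text"]';
  const observer = new MutationObserver(() => {
    extractTokens(root.querySelectorAll(selector)).forEach(onToken);
  });
  observer.observe(root, { childList: true, subtree: true, characterData: true });
  return observer; // call observer.disconnect() to stop capturing
}
```

In a userscript this would be started with something like startCapture(document.body, addWordToLibrary). Keeping the token extraction separate from the observer makes it possible to check the parsing logic with plain objects, without a browser.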
2. Smart Filtering System
Not every word is useful. The script removes:
- Common English filler words (like "the", "is", "at")
- Single-character tokens
- Punctuation, digits, and zero-width Unicode characters
This ensures your dataset stays clean and usable.
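The filter described above can be sketched as a single pure function. This is a simplified sketch: cleanWord is a hypothetical name, the stop-word list here is a short sample rather than the script's full list, and the punctuation set is an assumption about which characters count as noise.

```javascript
// Short sample list; the real script ships a longer English stop-word set.
const SAMPLE_STOP_WORDS = new Set(["the", "is", "at", "a", "an", "of", "to"]);

// Returns the normalized word, or null if the token should be discarded.
function cleanWord(raw) {
  if (!raw) return null;
  const clean = raw
    .toLowerCase()
    .trim()
    .replace(/[\u200B-\u200D\uFEFF]/g, "") // zero-width characters
    .replace(/[.,!?;:()0-9"']/g, "");      // punctuation and digits
  if (clean.length <= 1 || SAMPLE_STOP_WORDS.has(clean)) return null;
  return clean;
}
```

For example, cleanWord("Gato,") yields "gato", while cleanWord("the") and cleanWord("7") are both rejected. Note that the stop-word list is English only, so filler words in the language you are learning still pass through unless you add them.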
3. Frequency Tracking
Each word isn’t just stored — it’s tracked.
You get:
- How many times the script has captured a word (repeat sightings within 5 seconds count once)
- When it was first added
This is useful if you want to later:
- Prioritize difficult words
- Build spaced repetition systems
- Train custom NLP models
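As a rough sketch of the tracking step and one way to use it, the snippet below keeps a count and a first-seen date per word and then lists the most frequently seen words. The names recordWord and mostSeen are hypothetical, and the { count, date } record shape mirrors what the script stores.

```javascript
// Add a word to the store, or bump its count if it is already there.
// hasOwnProperty is used so that words like "constructor" are not confused
// with properties inherited from Object.prototype.
function recordWord(words, word, today) {
  if (!Object.prototype.hasOwnProperty.call(words, word)) {
    words[word] = { count: 1, date: today };
  } else {
    words[word].count += 1;
  }
  return words;
}

// Return up to `limit` words, most frequently seen first.
function mostSeen(words, limit) {
  return Object.entries(words)
    .sort(([, a], [, b]) => b.count - a.count)
    .slice(0, limit)
    .map(([word]) => word);
}
```

A word's date is set once and never overwritten, so it always reflects the first capture. Sorting by count is only one possible priority signal; a spaced repetition system would also need review history, which the script does not record.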
4. Built-in UI (No External Dashboard Needed)
A floating widget shows:
- Total words collected
- Recently captured vocabulary
- Quick actions like block/unblock
The UI is minimal but functional — perfect for fast interaction without breaking your learning flow.
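Because the widget is built by interpolating captured page text into HTML strings, it is worth escaping each word first. The sketch below shows one way to do that; escapeHtml and renderWordItem are hypothetical helpers, and the class names are borrowed from the script's own markup.

```javascript
// Escape the five characters that are significant in HTML text and attributes.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Build one row of the word list: the word, its count, and a Block button.
function renderWordItem(word, count) {
  const safe = escapeHtml(word);
  return (
    `<div class="word-item">` +
    `<span class="word-text">${safe}</span> <small>${count}x</small>` +
    `<button class="btn-duo btn-red action-block" data-word="${safe}">Block</button>` +
    `</div>`
  );
}
```

The browser decodes the entities again when it parses the markup, so reading btn.dataset.word in a click handler still returns the original word.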
5. Blacklist System
Sometimes you don’t want certain words in your dataset.
The script allows you to:
- Block unwanted words
- Maintain a clean vocabulary list
- Unblock them anytime (the word is then collected again from scratch, because blocking deletes its entry and count)
This is basically manual data curation on top of automated scraping.
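The block and unblock operations can be sketched as two small functions over the stored data object. This is a simplified sketch that takes the data as an argument instead of reading it from storage; withBlocked and withUnblocked are hypothetical names, and the { words, blacklist } shape follows the script.

```javascript
// Move a word to the blacklist: drop its entry and remember it as blocked.
function withBlocked(data, word) {
  delete data.words[word];
  if (!data.blacklist.includes(word)) data.blacklist.push(word);
  return data;
}

// Remove a word from the blacklist so it can be captured again.
function withUnblocked(data, word) {
  data.blacklist = data.blacklist.filter((w) => w !== word);
  return data;
}
```

Blocking is idempotent, so blocking the same word twice leaves a single blacklist entry. Unblocking does not restore the old count; the word simply becomes eligible for capture again.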
6. One-Click CSV Export
With a single click, you can export everything into a CSV file:
Word, Frequency, Date Added
This opens the door to:
- Excel analysis
- Importing into Anki or other tools
- Feeding data into Python scripts
- Building your own language datasets
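Here is a minimal sketch of how the export text could be built, assuming the same three columns. The helpers csvField and toCsv are hypothetical names, and the quoting rule follows the common CSV convention of wrapping a field in double quotes and doubling any quotes inside it.

```javascript
// Quote a field only when it contains a comma, a double quote, or a line break.
function csvField(value) {
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Turn the stored words object into CSV text with a header row.
function toCsv(words) {
  const header = "Word,Frequency,Date Added";
  const rows = Object.entries(words).map(([word, info]) =>
    [word, info.count, info.date].map(csvField).join(",")
  );
  return [header, ...rows].join("\n");
}
```

Quoting matters mostly for the date column, whose format depends on the browser locale. Spreadsheet tools and most CSV parsers read this format directly.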
Why This Matters (Beyond Duolingo)
This isn’t just about language learning.
This script is a real-world example of browser-based automation + scraping, applied in a clean and practical way.
It demonstrates:
- DOM scraping without APIs
- Event-driven automation
- Lightweight data persistence
- Building micro-tools inside existing platforms
If you’re working in automation or web scraping, this is the kind of approach that scales into:
- Lead extraction tools
- Content monitoring bots
- Data collection pipelines
- RPA workflows inside browsers
Technical Highlights (For Developers)
A few things worth noticing:
- MutationObserver → Instead of polling, it reacts to DOM changes
- Session Cache → Skips a word if it was already processed in the last 5 seconds
- Persistent storage via GM_setValue / GM_getValue → Data lives in the userscript manager, no backend needed
- Regex Cleaning Pipeline → Keeps dataset usable
- Dynamic UI Rendering → Updates instantly after each change
The observer does no work while the page is idle, and the 5-second cache keeps repeated scans from writing the same word to storage over and over.
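The session cache idea can be sketched as a small per-key throttle. This is an illustrative version, not the script's exact code: createThrottle is a hypothetical name, and the clock is passed in as a parameter so the behavior is easy to check.

```javascript
// Returns a function that answers "should this key be processed now?".
// A key is accepted at most once per `windowMs` milliseconds.
function createThrottle(windowMs) {
  const lastSeen = new Map(); // key -> timestamp of the last accepted call
  return function shouldProcess(key, now = Date.now()) {
    if (lastSeen.has(key) && now - lastSeen.get(key) < windowMs) return false;
    lastSeen.set(key, now);
    return true;
  };
}
```

With const shouldProcess = createThrottle(5000), a word seen at t = 10000 ms is accepted, the same word at t = 12000 ms is skipped, and it is accepted again at t = 15000 ms. The Map lives only in memory, so the cache resets on every page load.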
Where You Can Take This Next
If you want to level this up:
- Sync data to a backend (Supabase / PostgreSQL)
- Add translation APIs for each word
- Build a spaced repetition engine
- Integrate with Anki automatically
- Create a central dashboard for multiple users
At that point, you’re not just learning — you’re building a product.
Here is the full script:
// ==UserScript==
// @name Duolingo Vocab Master (UI Readability)
// @version 2.1
// @description Duolingo Exporter
// @match https://www.duolingo.com/*
// @grant GM_setValue
// @grant GM_getValue
// @grant GM_addStyle
// ==/UserScript==
(function() {
'use strict';
const COLORS = {
green: "#58cc02",
red: "#ff4b4b",
blue: "#1cb0f6",
darkGray: "#4b4b4b", // Better readability
lightGray: "#f1f1f1",
border: "#e5e5e5"
};
const STOP_WORDS = new Set(["the", "of", "a", "an", "to", "in", "is", "it", "you", "that", "he", "was", "for", "on", "refer", "are", "with", "as", "i", "his", "they", "be", "at", "one", "have", "this", "from", "or", "had", "by", "but", "what", "some", "we", "can", "out", "other", "were", "all", "there", "when", "up", "use", "your", "how", "she", "each", "has", "been", "my", "me"]);
const sessionCache = new Map();
const getData = () => GM_getValue("duo_vocab_v11", { words: {}, blacklist: [], isMainOpen: false, isBlacklistOpen: false });
const saveData = (data) => GM_setValue("duo_vocab_v11", data);
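// Normalize a raw token and record it in the library (new entry or count + 1).
function addWordToLibrary(word) {
    if (!word) return;
    // Lowercase, strip zero-width characters, then strip punctuation and digits.
    let clean = word.toLowerCase().trim().replace(/[\u200B-\u200D\uFEFF]/g, "").replace(/[.,!?;:()0-9"']/g, "");
    if (clean.length <= 1 || STOP_WORDS.has(clean)) return;
    let data = getData();
    if (data.blacklist.includes(clean)) return;
    // Skip words that were already processed in the last 5 seconds.
    let now = Date.now();
    if (now - (sessionCache.get(clean) || 0) < 5000) return;
    sessionCache.set(clean, now);
    // Own-property check so words like "constructor" are not mistaken for existing entries.
    if (!Object.prototype.hasOwnProperty.call(data.words, clean)) {
        data.words[clean] = { count: 1, date: new Date().toLocaleDateString() };
        triggerPulse();
    } else {
        data.words[clean].count++;
    }
    saveData(data);
    updateUI();
}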
// --- UI STYLES ---
GM_addStyle(`
#duo-launcher {
position: fixed; bottom: 25px; right: 25px; z-index: 10001;
width: 55px; height: 55px; background: ${COLORS.green};
border-radius: 50%; border: none; cursor: pointer;
box-shadow: 0 4px 0 #46a302; display: flex; flex-direction: column;
align-items: center; justify-content: center;
color: white; font-family: "din-round", sans-serif; transition: all 0.2s;
}
#duo-launcher:active { transform: translateY(2px); box-shadow: none; }
#duo-launcher .count-num { font-size: 18px; font-weight: bold; line-height: 1; }
#duo-launcher .count-label { font-size: 8px; font-weight: bold; text-transform: uppercase; margin-top: 2px; }
@keyframes duo-pulse {
0% { transform: scale(1); }
50% { transform: scale(1.15); box-shadow: 0 0 20px ${COLORS.green}; }
100% { transform: scale(1); }
}
.pulse-anim { animation: duo-pulse 0.4s ease-out; }
#duo-master-container {
position: fixed; top: 15px; right: 15px; z-index: 10000;
background: white; border: 2px solid ${COLORS.border}; border-radius: 16px;
width: 300px; max-height: 80vh; display: none; flex-direction: column;
font-family: "din-round", sans-serif; box-shadow: 0 4px 0 ${COLORS.border};
}
#duo-master-container.open { display: flex; }
.duo-header {
padding: 12px; background: ${COLORS.green}; color: white;
border-radius: 13px 13px 0 0; font-weight: bold;
display: flex; justify-content: space-between; align-items: center;
}
.duo-content { overflow-y: auto; padding: 12px; flex-grow: 1; background: #fff; }
.word-item {
display: flex; justify-content: space-between; align-items: center;
padding: 8px 0; border-bottom: 2px solid #f0f0f0;
}
.word-text { color: ${COLORS.green}; font-weight: bold; }
.blacklist-toggle {
padding: 12px; background: #f7f7f7; cursor: pointer;
border-top: 2px solid ${COLORS.border}; font-weight: bold;
display: flex; justify-content: space-between; color: ${COLORS.darkGray};
font-size: 12px; letter-spacing: 0.5px;
}
.blacklist-content { padding: 10px; display: none; background: #fff; max-height: 150px; overflow-y: auto; border-radius: 0 0 16px 16px; }
.blacklist-content.open { display: block; }
.btn-duo {
cursor: pointer; border: none; border-radius: 12px;
padding: 6px 12px; font-size: 10px; font-weight: bold;
text-transform: uppercase; box-shadow: 0 2px 0 rgba(0,0,0,0.1);
}
.btn-red { background: ${COLORS.red}; color: white; }
.btn-blue { background: ${COLORS.blue}; color: white; }
/* BLACKLIST CHIP FIX */
.chip {
display: inline-flex;
align-items: center;
background: ${COLORS.lightGray};
color: ${COLORS.darkGray}; /* Fixed: Dark text on light background */
padding: 5px 10px;
border-radius: 14px;
margin: 3px;
font-size: 12px;
font-weight: 500;
border: 1px solid #ddd;
}
.chip-remove {
margin-left: 8px;
color: ${COLORS.red};
cursor: pointer;
font-weight: 800;
font-size: 14px;
line-height: 1;
}
.chip-remove:hover { transform: scale(1.2); }
`);
// --- DOM SETUP ---
const launcher = document.createElement('button');
launcher.id = "duo-launcher";
document.body.appendChild(launcher);
const container = document.createElement('div');
container.id = "duo-master-container";
document.body.appendChild(container);
function triggerPulse() {
launcher.classList.remove('pulse-anim');
void launcher.offsetWidth;
launcher.classList.add('pulse-anim');
}
function updateUI() {
const data = getData();
const totalWords = Object.keys(data.words).length;
launcher.innerHTML = `<span class="count-num">${totalWords}</span><span class="count-label">Words</span>`;
container.classList.toggle('open', data.isMainOpen);
const words = Object.keys(data.words).reverse().slice(0, 50);
container.innerHTML = `
<div class="duo-header">
<span>LIBRARY (${totalWords})</span>
<div style="display:flex; gap:10px; align-items:center;">
<button id="export-csv" class="btn-duo btn-blue">CSV</button>
<span id="close-ui" style="cursor:pointer; font-size:20px; line-height:1;">✕</span>
</div>
</div>
<div class="duo-content">
${words.map(w => `
<div class="word-item">
<span><span class="word-text">${w}</span> <small style="color:#aaa; font-size:10px; margin-left:4px;">${data.words[w].count}x</small></span>
<button class="btn-duo btn-red action-block" data-word="${w}">Block</button>
</div>
`).join('') || '<p style="text-align:center; color:#ccc; padding:20px;">No words found yet...</p>'}
</div>
<div class="blacklist-toggle" id="toggle-bl">
<span>BLACKLISTED (${data.blacklist.length})</span>
<span>${data.isBlacklistOpen ? '▼' : '▲'}</span>
</div>
<div class="blacklist-content ${data.isBlacklistOpen ? 'open' : ''}">
${data.blacklist.map(w => `
<span class="chip">
${w}
<span class="action-unblock chip-remove" data-word="${w}">×</span>
</span>
`).join('') || '<p style="font-size:11px; color:#aaa; text-align:center;">No words blocked.</p>'}
</div>
`;
// --- ATTACH EVENTS ---
document.getElementById('export-csv').onclick = exportCSV;
document.getElementById('close-ui').onclick = toggleMainUI;
document.getElementById('toggle-bl').onclick = toggleBlacklistUI;
container.querySelectorAll('.action-block').forEach(btn => {
btn.onclick = () => blockWord(btn.dataset.word);
});
container.querySelectorAll('.action-unblock').forEach(btn => {
btn.onclick = () => unblockWord(btn.dataset.word);
});
}
function toggleMainUI() {
let d = getData(); d.isMainOpen = !d.isMainOpen; saveData(d); updateUI();
}
function toggleBlacklistUI() {
let d = getData(); d.isBlacklistOpen = !d.isBlacklistOpen; saveData(d); updateUI();
}
function blockWord(word) {
let data = getData();
delete data.words[word];
if (!data.blacklist.includes(word)) data.blacklist.push(word);
saveData(data);
updateUI();
}
function unblockWord(word) {
let data = getData();
data.blacklist = data.blacklist.filter(w => w !== word);
saveData(data);
updateUI();
}
launcher.onclick = toggleMainUI;
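// Build a CSV from the stored words and trigger a browser download.
function exportCSV() {
    const data = getData();
    // Quote fields containing a comma, quote, or line break so every row keeps three columns.
    const field = (v) => /[",\r\n]/.test(String(v)) ? `"${String(v).replace(/"/g, '""')}"` : String(v);
    const csv = "Word,Frequency,Date Added\n" + Object.entries(data.words).map(([w, i]) => [w, i.count, i.date].map(field).join(",")).join("\n");
    const a = document.createElement('a');
    a.href = URL.createObjectURL(new Blob([csv], {type: 'text/csv'}));
    a.download = 'duo_vocab_export.csv';
    a.click();
    // Release the temporary object URL once the download has started.
    setTimeout(() => URL.revokeObjectURL(a.href), 1000);
}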
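// Re-scan the vocabulary elements whenever the page changes.
const observer = new MutationObserver((mutations) => {
    // Skip batches caused only by this script's own widgets, which updateUI() re-renders.
    const pageChanged = mutations.some(m => !launcher.contains(m.target) && !container.contains(m.target));
    if (!pageChanged) return;
    const elements = document.querySelectorAll('[data-test="hint-token"], [data-test="challenge-token-text"], [style*="dashed"]');
    elements.forEach(el => {
        // Prefer the accessible label; fall back to the visible text.
        const val = el.getAttribute('aria-label') || el.innerText;
        if (val) val.split(/\s+/).forEach(p => addWordToLibrary(p));
    });
});
observer.observe(document.body, { childList: true, subtree: true, characterData: true });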
updateUI();
})();