Build Internationalization

Build-Time Site Translation

The generator now includes a build-time translation materializer for Hugo sites.

The intended model is:

  • keep one canonical authored language under content/
  • generate localized language trees under .generated/i18n/<lang>/content
  • keep translation cache state under .generated/i18n-cache/cache.sqlite
  • let Hugo read language roots from a generated config fragment
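
As a rough illustration of the layout above, the mapping from a canonical source file to its generated counterpart can be sketched as below. The helper name `localizedPath` and its signature are assumptions made for this example, not functions from scripts/i18n-core.mjs.

```javascript
// Hypothetical helper: maps a canonical source path to the path of its
// generated translation. Only the directory layout comes from the list
// above; the function itself is illustrative.
function localizedPath(sourcePath, lang, generatedRoot = ".generated/i18n") {
  const prefix = "content/";
  if (!sourcePath.startsWith(prefix)) {
    throw new Error(`expected a path under ${prefix}: ${sourcePath}`);
  }
  // Keep the path relative to content/ and re-root it under the language tree.
  const relative = sourcePath.slice(prefix.length);
  return `${generatedRoot}/${lang}/content/${relative}`;
}

console.log(localizedPath("content/maths/fractions/index.md", "es"));
// prints ".generated/i18n/es/content/maths/fractions/index.md"
```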

Current entry points:

  • scripts/i18n-build.mjs
  • scripts/i18n-watch.mjs
  • scripts/i18n-audit.mjs
  • scripts/i18n-core.mjs

Site Contract

Each site opts in with an i18n.config.yaml at the site root.

TutorLumin now uses:

Important fields:

  • sourceLanguage
  • sourceLanguageCode
  • targetLanguages
  • generatedRoot
  • cachePath
  • generatedHugoConfigPath
  • provider.kind: noop, command, or openai
  • glossary.protectedTerms
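
A minimal sketch of how these fields might be checked once the YAML has been parsed into an object. The field names come from the list above; the value types, the idea that every field except the glossary is required, and all values in `exampleConfig` are assumptions for illustration only.

```javascript
// Sketch only: which fields are mandatory and what types they hold are
// assumptions, not the documented behavior of the build scripts.
const PROVIDER_KINDS = ["noop", "command", "openai"];
const REQUIRED_STRINGS = [
  "sourceLanguage",
  "sourceLanguageCode",
  "generatedRoot",
  "cachePath",
  "generatedHugoConfigPath",
];

function validateI18nConfig(config) {
  const problems = [];
  for (const field of REQUIRED_STRINGS) {
    if (typeof config[field] !== "string" || config[field] === "") {
      problems.push(`${field} must be a non-empty string`);
    }
  }
  if (!Array.isArray(config.targetLanguages) || config.targetLanguages.length === 0) {
    problems.push("targetLanguages must be a non-empty array");
  }
  if (!PROVIDER_KINDS.includes(config.provider?.kind)) {
    problems.push(`provider.kind must be one of ${PROVIDER_KINDS.join(", ")}`);
  }
  const terms = config.glossary?.protectedTerms;
  if (terms !== undefined && !Array.isArray(terms)) {
    problems.push("glossary.protectedTerms must be an array when present");
  }
  return problems; // an empty array means the config passed every check
}

// Hypothetical values, shaped like a parsed i18n.config.yaml.
const exampleConfig = {
  sourceLanguage: "English",
  sourceLanguageCode: "en",
  targetLanguages: ["es"],
  generatedRoot: ".generated/i18n",
  cachePath: ".generated/i18n-cache/cache.sqlite",
  generatedHugoConfigPath: "config/_default/zz_i18n.generated.toml",
  provider: { kind: "noop" },
  glossary: { protectedTerms: ["TutorLumin"] },
};

console.log(validateI18nConfig(exampleConfig).length === 0 ? "config ok" : "config invalid");
```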

Output Contract

On each build, the materializer:

  1. clears .generated/i18n/
  2. rewrites generated language trees
  3. preserves cache rows in .generated/i18n-cache/cache.sqlite
  4. rewrites the Hugo language fragment at config/_default/zz_i18n.generated.toml

This keeps generated files out of content/, while repeated builds reuse warm translation cache entries.

Supported Source Types

The current build step translates:

  • Markdown pages under content/
  • YAML sidecars such as quiz.yaml
  • SVG text nodes

Non-text assets are copied through unchanged.

For Markdown and YAML, the translator preserves:

  • shortcodes
  • MathML blocks
  • KaTeX-style math delimiters
  • inline code
  • URLs and asset paths
  • glossary-protected brand terms

MathML is handled specially: <mtext> nodes are translated before the wider block is protected.
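
One common way to preserve such spans is to mask them with placeholders before translation and restore them afterwards. The sketch below shows that idea for shortcodes, inline code, and URLs only. The patterns, the placeholder format, and the function names are simplified assumptions, and the uppercase step merely stands in for a real translation provider. It does not cover MathML, math delimiters, or glossary terms.

```javascript
// Simplified assumptions: these patterns are illustrative, not the
// pipeline's actual protection rules.
const PROTECTED_PATTERNS = [
  /\{\{[<%].*?[%>]\}\}/gs, // Hugo shortcodes such as {{< quiz >}}
  /`[^`\n]+`/g,            // inline code
  /https?:\/\/[^\s)]+/g,   // URLs
];

// Replace each protected span with a numbered placeholder token.
function protect(text) {
  const saved = [];
  let masked = text;
  for (const pattern of PROTECTED_PATTERNS) {
    masked = masked.replace(pattern, (match) => {
      saved.push(match);
      return `\u27E6${saved.length - 1}\u27E7`;
    });
  }
  return { masked, saved };
}

// Restore in reverse order so a span saved later, which may itself contain
// an earlier placeholder, is fully expanded.
function restore(translated, saved) {
  let out = translated;
  for (let i = saved.length - 1; i >= 0; i--) {
    out = out.split(`\u27E6${i}\u27E7`).join(saved[i]);
  }
  return out;
}

const { masked, saved } = protect("Run `npm test`, then see https://example.com/docs for {{< quiz >}} help");
const fakeTranslation = masked.toUpperCase(); // stand-in for a provider call
console.log(restore(fakeTranslation, saved));
```

The placeholder characters are assumed not to occur in the source text. A production version would need to guard against that collision and against a provider that alters the tokens.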

Cache Invalidation

Cache reuse is keyed by:

  • target language
  • source text hash
  • provider kind
  • provider model
  • prompt version
  • glossary version
  • pipeline version

If you change the model, prompt, glossary, or pipeline version, rows written under the old values no longer match the cache key and become stale automatically.
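
The key composition and staleness rule can be sketched as follows. The row shape, the camelCase field names, and the `|` separator are hypothetical; the real schema of cache.sqlite is not described here. The staleness check covers only the four values named in the sentence above, since the text does not say whether a change of provider kind also marks rows stale.

```javascript
// Hypothetical field names mirroring the key components listed above.
const KEY_FIELDS = [
  "targetLanguage",
  "sourceHash",
  "providerKind",
  "providerModel",
  "promptVersion",
  "glossaryVersion",
  "pipelineVersion",
];

// The four values whose change is stated to make old rows stale.
const STALENESS_FIELDS = ["providerModel", "promptVersion", "glossaryVersion", "pipelineVersion"];

// Join every key component into one composite lookup key.
function cacheKey(parts) {
  return KEY_FIELDS.map((field) => String(parts[field])).join("|");
}

// A row is stale when any versioned component differs from current settings.
function isStale(row, current) {
  return STALENESS_FIELDS.some((field) => row[field] !== current[field]);
}

const current = {
  providerKind: "openai",
  providerModel: "model-a", // placeholder model name
  promptVersion: "2",
  glossaryVersion: "1",
  pipelineVersion: "1",
};
const oldRow = { ...current, targetLanguage: "es", sourceHash: "abc123", promptVersion: "1" };

console.log(isStale(oldRow, current)); // prints true: the prompt version changed
```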

Audit or purge stale rows with:

npm run i18n:audit -- --site '/path/to/site'
npm run i18n:audit -- --site '/path/to/site' --purge-stale

Commands

One-shot build:

npm run i18n:build -- --site '/home/henry/Design Resources (2)/Websites/Groups/tutorlumin.co.uk'

Build and run Hugo with the generated language config layered explicitly:

npm run i18n:hugo -- --site '/home/henry/Design Resources (2)/Websites/Groups/tutorlumin.co.uk' -- --destination /tmp/tutorlumin-i18n

Watch mode:

npm run i18n:watch -- --site '/home/henry/Design Resources (2)/Websites/Groups/tutorlumin.co.uk'

Temporary language/provider override for validation:

npm run i18n:build -- --site '/home/henry/Design Resources (2)/Websites/Groups/tutorlumin.co.uk' --language es --provider-kind noop

Hugo Integration

The generated config fragment currently lives at:

config/_default/zz_i18n.generated.toml

That file declares languages.<code>.contentDir for the source language and every generated target language.
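
To make the shape of that fragment concrete, here is a hedged sketch of a function that renders it as TOML. `renderLanguageFragment` is a hypothetical name, and the real generator may emit additional per-language keys; only languages.<code>.contentDir is taken from the description above.

```javascript
// Hypothetical renderer: emits one [languages.<code>] table per language,
// each with the contentDir that Hugo should read for it.
function renderLanguageFragment(sourceCode, targetCodes, generatedRoot = ".generated/i18n") {
  // The source language keeps reading the canonical content/ tree.
  const blocks = [`[languages.${sourceCode}]\ncontentDir = "content"`];
  for (const code of targetCodes) {
    blocks.push(`[languages.${code}]\ncontentDir = "${generatedRoot}/${code}/content"`);
  }
  return blocks.join("\n\n") + "\n";
}

console.log(renderLanguageFragment("en", ["es", "fr"]));
```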

Use it through the wrapper or an explicit Hugo --config layer. Do not assume plain hugo will automatically merge it from config/_default/ in every site layout.

Quiz Chrome

Quiz widget chrome is no longer hardcoded in English. The widget now resolves common labels from Hugo i18n keys and falls back to English when a site does not provide translated strings.

Relevant files:

  • layouts/partials/widgets/quiz.html
  • cdn/custom/quiz.js
  • i18n/en.yaml

Current keys:

  • quiz_check_answers
  • quiz_reset
  • quiz_question
  • quiz_select_all_that_apply
  • quiz_explanation
  • quiz_score_summary
  • quiz_no_questions
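
The fallback behavior described above can be illustrated with a small lookup. This is a sketch of the logic only, not the code in cdn/custom/quiz.js or the Hugo partial. The English wording in `ENGLISH_FALLBACK` is placeholder text, not the contents of i18n/en.yaml, and `resolveLabel` is a hypothetical name.

```javascript
// Placeholder English strings for two of the keys listed above.
const ENGLISH_FALLBACK = {
  quiz_check_answers: "Check answers",
  quiz_reset: "Reset",
};

// Prefer the site's translated string; otherwise use English; as a last
// resort return the key itself so a missing label stays visible.
function resolveLabel(key, siteStrings) {
  const translated = siteStrings[key];
  if (typeof translated === "string" && translated.trim() !== "") {
    return translated;
  }
  return ENGLISH_FALLBACK[key] ?? key;
}

const spanishStrings = { quiz_reset: "Reiniciar" };
console.log(resolveLabel("quiz_reset", spanishStrings));         // prints "Reiniciar"
console.log(resolveLabel("quiz_check_answers", spanishStrings)); // prints "Check answers"
```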

Known Limits

  • Some existing TutorLumin pages use multiline quoted YAML front matter that does not round-trip cleanly through the current YAML parser. Those pages are still translated, but their front matter is preserved raw instead of being rewritten.
  • Shared site copy that lives in site config rather than under content/, such as purchase-widget strings, is not yet auto-materialized by this build step.
  • SVG translation currently handles text-bearing nodes, not outlined/path-only lettering.