Browser Page Builder Plan

Nowtype Browser Page Builder Refactor

Last updated: 2026-06-29

Problem

The browser PDF surface currently paginates a rendered DOM tree by calculating break offsets and then clipping and translating that same long DOM into page-sized windows. This fixes some gross failures, but it cannot safely support LaTeX-like page building:

  • approximate internal paragraph cuts can duplicate or crop text because the DOM is still one continuous flow;
  • break scoring and page rendering are coupled through implicit offsets;
  • floats, footnotes, margin notes, and long tables are bolted on after the page cut rather than inserted into page construction;
  • smoke tests can pass while screenshots still show visually bad pages.

The refactor should replace the clipped-DOM compositor with an explicit page composition pipeline.

Architectural Target

Keep Hugo as the render authority and Markdown as the source of truth. The browser page builder is an editing/preview compositor over Hugo-rendered DOM, not a new Markdown renderer.

The browser pipeline should become:

  1. Source DOM

    • current Nowtype/Hugo rendered DOM.
    • contains editable nodes, math, figures, tables, headings, lists, and structured blocks.
  2. Layout Metrics

    • root-relative geometry for blocks, headings, figures, tables, anchors, line boxes where available, and page-affecting insertions.
    • owned by collectNowtypePdfLayoutMetrics().
    • no pagination consumer should read raw offsetTop independently.
  3. Flow Items

    • normalized sequence of page-builder material:
      • boxes: headings, figures, tables, display math, list items;
      • glue: margins and stretchable vertical space;
      • penalties: preferred/forbidden page breaks;
      • insertions: footnotes, margin notes, floats;
      • text fragments: paragraph line boxes or measured range fragments.
  4. Page Builder

    • chooses page breaks over flow items using TeX-like badness/penalties.
    • produces a PagePlan:
      • page role, source item ranges, fragment ranges, floats, notes, anchors, counters, running heads, and diagnostics.
  5. Page Renderer

    • renders each page from the PagePlan.
    • never displays a page by clipping arbitrary portions of one long DOM.
    • may clone whole blocks, range-fragment text, or use measured fragments, but every visible page must be an explicit page tree.
  6. Editor Bridge

    • one canonical editable source remains live.
    • page views map clicks/selections back to source nodes/ranges.
    • preview pages may be read-only until source-range editing is stable.
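
As a rough sketch of the data that stages 3 and 4 exchange, the types below model flow items and a page plan. Every field name here (kind, sourceId, height, stretch, cost, itemRange, and so on) is an assumption made for illustration; the plan fixes the concepts, not a schema, and the PagePlan shown carries only a subset of the fields listed above.

```typescript
// Hypothetical shapes; field names are illustrative, not the real schema.
type FlowItem =
  | { kind: "box"; sourceId: string; role: "heading" | "figure" | "table" | "math" | "listItem"; height: number }
  | { kind: "glue"; height: number; stretch: number; shrink: number }
  | { kind: "penalty"; cost: number } // Infinity would forbid a break here
  | { kind: "insertion"; sourceId: string; role: "footnote" | "marginNote" | "float"; height: number }
  | { kind: "fragment"; sourceId: string; lineStart: number; lineEnd: number; height: number };

interface PagePlan {
  pageIndex: number;
  role: "title" | "toc" | "content";
  itemRange: [number, number]; // half-open [start, end) indices into FlowItem[]
  runningHead: string;
  diagnostics: string[];
}

// Natural height of a slice of flow items; penalties occupy no space.
function naturalHeight(items: FlowItem[], start: number, end: number): number {
  let total = 0;
  for (let i = start; i < end; i++) {
    const item = items[i];
    if (item.kind !== "penalty") total += item.height;
  }
  return total;
}

// A heading, a keep-with-next penalty, some glue, and three paragraph lines.
const exampleItems: FlowItem[] = [
  { kind: "box", sourceId: "h2-intro", role: "heading", height: 28 },
  { kind: "penalty", cost: 10000 },
  { kind: "glue", height: 12, stretch: 6, shrink: 2 },
  { kind: "fragment", sourceId: "p-1", lineStart: 0, lineEnd: 3, height: 54 },
];

const examplePlan: PagePlan = {
  pageIndex: 0,
  role: "content",
  itemRange: [0, exampleItems.length],
  runningHead: "Introduction",
  diagnostics: [],
};
```

Keeping item ranges as indices into one FlowItem[] means the renderer and the diagnostics can both refer to the same material without reading DOM offsets again.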

Key Design Rules

  • Do not split inside a measured line box.
  • Do not duplicate content across adjacent pages unless it is deliberate running chrome or a repeated table header.
  • Prefer breaking between block items; allow paragraph/list fragments only when the renderer can render the matching fragment safely.
  • Headings use keep-with-next penalties rather than an absolute no-break rule that captures whole following paragraphs.
  • Figures/tables are insertions with placement policy, not ordinary paragraphs.
  • Footnotes and margin notes participate in page height before a break is accepted.
  • Every PagePlan must be screenshot-testable.
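
The rules above can be illustrated with a simplified greedy break chooser. This is a sketch loosely inspired by TeX's page builder, not a port of it: the Item shape, the cubic badness formula with its scale factor of 100, and the keep-with-next penalty value of 10000 are all assumptions.

```typescript
// Hypothetical minimal item: the height of the material, plus the penalty for
// breaking right after it (Infinity = forbidden, large = strongly discouraged).
interface Item { height: number; penaltyAfter: number }

// Badness grows with the cube of the unused fraction of the page (assumed formula).
function badness(used: number, pageHeight: number): number {
  const slack = Math.max(0, (pageHeight - used) / pageHeight);
  return 100 * slack * slack * slack;
}

// Returns break indices: each value b means "a page ends after items[b - 1]".
function choosePageBreaks(items: Item[], pageHeight: number): number[] {
  const breaks: number[] = [];
  let pageStart = 0;
  while (pageStart < items.length) {
    let used = 0;
    let bestEnd = -1;
    let bestCost = Infinity;
    let end = pageStart;
    while (end < items.length) {
      const next = used + items[end].height;
      if (next > pageHeight && end > pageStart) break; // page is full
      used = next;
      end++;
      const isLast = end === items.length;
      // Ending the document costs nothing; otherwise pay badness plus penalty.
      const cost = isLast ? 0 : badness(used, pageHeight) + items[end - 1].penaltyAfter;
      if (cost < Infinity && cost <= bestCost) {
        bestCost = cost;
        bestEnd = end;
      }
    }
    // If every candidate break was forbidden, fall back to the fullest fit.
    if (bestEnd === -1) bestEnd = end;
    breaks.push(bestEnd);
    pageStart = bestEnd;
  }
  return breaks;
}

// Two paragraphs, a heading with a keep-with-next penalty, two more paragraphs.
const demoItems: Item[] = [
  { height: 40, penaltyAfter: 0 },
  { height: 40, penaltyAfter: 0 },
  { height: 15, penaltyAfter: 10000 }, // heading
  { height: 40, penaltyAfter: 0 },
  { height: 30, penaltyAfter: 0 },
];
const demoBreaks = choosePageBreaks(demoItems, 100); // [2, 5]
```

In this example the heading would fit at the bottom of the first page, but the penalty makes that break far more expensive than leaving 20 units empty, so the heading moves to the second page with its following paragraphs. Because the penalty is finite, a heading can still end a page when no cheaper break exists.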

First Vertical Slice

The first implementation should not attempt full TeX quality. It should create a new explicit page-plan path for ordinary prose/math chapters:

  1. Build FlowItem[] from the existing compose root:

    • whole-block items for headings, display math, figures, tables, lists, blockquotes, and short paragraphs;
    • measured paragraph fragments only when Range.getClientRects() can map stable line boxes back to a text node range;
    • fallback to whole-block behavior when line mapping is uncertain.
  2. Build PagePlan[]:

    • use current paper size, margins, font size, running heads, title/ToC pages;
    • apply widow/orphan penalties for text fragments;
    • keep headings with at least the first following fragment;
    • record diagnostics for overfull/underfull pages.
  3. Render read-only explicit pages:

    • single-page and spread preview use the same PagePlan;
    • print preview uses the same PagePlan;
    • editing remains routed to canonical Nowtype source until range editing is added.
  4. Gate it behind a runtime flag:

    • default remains the current compositor until image tests pass;
    • flag name: window.QLMarkdownEditor.pdfPageBuilder = true or params.qlMarkdownEditor.pdfPageBuilder = true.
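
For the measured paragraph fragments in step 1, one possible approach is to group the rectangles returned by Range.getClientRects() into line boxes and to fall back to whole-block behavior when the result looks unstable. The sketch below works on plain rect-like objects so it can run outside a browser; the half-height overlap rule and the stability check are assumed heuristics, and all names are hypothetical.

```typescript
// Minimal rect shape; in the browser these values would come from
// Range.getClientRects(), converted to root-relative coordinates.
interface RectLike { top: number; bottom: number; left: number; right: number }
interface LineBox { top: number; bottom: number }

// Groups rects into line boxes: a rect joins the current line when it
// overlaps that line vertically by more than half of its own height.
function groupIntoLines(rects: RectLike[]): LineBox[] {
  const sorted = rects.slice().sort((a, b) => a.top - b.top || a.left - b.left);
  const lines: LineBox[] = [];
  for (const r of sorted) {
    const height = r.bottom - r.top;
    if (height <= 0) continue; // skip empty rects
    const last = lines.length > 0 ? lines[lines.length - 1] : undefined;
    const overlap = last ? Math.min(last.bottom, r.bottom) - Math.max(last.top, r.top) : 0;
    if (last && overlap > height / 2) {
      last.top = Math.min(last.top, r.top);
      last.bottom = Math.max(last.bottom, r.bottom);
    } else {
      lines.push({ top: r.top, bottom: r.bottom });
    }
  }
  return lines;
}

// Assumed heuristic: the mapping is "stable" when there is at least one line
// and consecutive lines do not overlap by more than half a pixel.
function linesAreStable(lines: LineBox[]): boolean {
  for (let i = 1; i < lines.length; i++) {
    if (lines[i].top < lines[i - 1].bottom - 0.5) return false;
  }
  return lines.length > 0;
}

// Two rects on the first line (text plus a slightly shorter inline box), one on the second.
const sampleLines = groupIntoLines([
  { top: 0, bottom: 18, left: 0, right: 100 },
  { top: 1, bottom: 17, left: 100, right: 150 },
  { top: 20, bottom: 38, left: 0, right: 120 },
]);
```

A caller would emit text fragments only when linesAreStable returns true, and otherwise keep the paragraph as a single whole-block item.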

The current first slice is intentionally diagnostic-only. When the flag is on, toggleMarkdown.js stores a sanitized pageBuilder bundle beside the existing pagination state:

  • flowItems: interleaved boxes, glue, and penalties extracted from the current layout metrics;
  • pagePlans: existing content page breaks interpreted as explicit page slices; title/ToC pages remain outside this first diagnostic slice;
  • diagnostics: overheight boxes, page-boundary crossings, underfull pages, and heading orphan warnings.
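
The four diagnostics listed above could be derived from measured boxes and the existing page-start offsets roughly as follows. This is a sketch under stated assumptions: the MeasuredBox and Diagnostic shapes, the 0.6 fill threshold for underfull pages, and the rule that a box belongs to the page containing its top edge are all hypothetical.

```typescript
interface MeasuredBox {
  id: string;
  role: "heading" | "paragraph" | "figure" | "table";
  top: number;    // root-relative offsets
  bottom: number;
}
interface Diagnostic {
  type: "overheight" | "crossing" | "underfull" | "headingOrphan";
  page: number;
  boxId?: string;
}

// boxes must be sorted by top; pageStarts holds the offset where each page begins.
function diagnose(boxes: MeasuredBox[], pageStarts: number[], pageHeight: number, minFill = 0.6): Diagnostic[] {
  const out: Diagnostic[] = [];
  for (let p = 0; p < pageStarts.length; p++) {
    const start = pageStarts[p];
    const isFinal = p === pageStarts.length - 1;
    const end = isFinal ? Infinity : pageStarts[p + 1];
    const onPage = boxes.filter((b) => b.top >= start && b.top < end);
    for (const b of onPage) {
      if (b.bottom - b.top > pageHeight) out.push({ type: "overheight", page: p, boxId: b.id });
      else if (b.bottom > end) out.push({ type: "crossing", page: p, boxId: b.id });
    }
    if (isFinal) continue; // a short or heading-ended final page is not a defect
    const last = onPage.length > 0 ? onPage[onPage.length - 1] : undefined;
    if (last && last.role === "heading") out.push({ type: "headingOrphan", page: p, boxId: last.id });
    const used = last ? Math.min(last.bottom, end) - start : 0;
    if (used / pageHeight < minFill) out.push({ type: "underfull", page: p });
  }
  return out;
}

const sampleDiagnostics = diagnose(
  [
    { id: "p1", role: "paragraph", top: 0, bottom: 70 },
    { id: "h1", role: "heading", top: 72, bottom: 88 },   // last box on page 0
    { id: "p2", role: "paragraph", top: 90, bottom: 118 },
    { id: "p3", role: "paragraph", top: 120, bottom: 150 }, // crosses the break at 140
    { id: "t1", role: "table", top: 152, bottom: 320 },     // taller than a page
  ],
  [0, 90, 140],
  100,
);
```

For this input the result contains, in order, a heading orphan on page 0, a crossing and an underfull warning on page 1, and an overheight table on page 2.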

The renderer still uses the current compositor until the PagePlan output is good enough to become the rendering source.

Status note, 2026-06-29: the paragraph above describes the interactive editor runtime. The CLI book exporter has moved further: scripts/book-pdf now uses a book-export page preparation path that mutates the Chromium DOM for print features such as long-table splitting, footnote decks, float handling, hyphenation hints, math scaling, figure promotion, references, accessibility probes, and running-head probes. Keep these paths distinct when debugging.

Test Contract

Every page-builder change must run a Playwright image capture against a real book fixture, not just numeric smoke checks.

Required test artifacts:

  • one PNG per page or spread;
  • a contact sheet;
  • a JSON manifest with page count, page labels, text/hash diagnostics, pageBuilder diagnostics, and screenshot paths;
  • optional pixel-diff baseline once the first good set is accepted.
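
One possible shape for the JSON manifest is sketched below. The interface names, the field names, and the page-001.png / contact-sheet.png file naming are assumptions for illustration only; the text hashes are taken as precomputed inputs rather than calculated here.

```typescript
// Hypothetical per-page capture data gathered by the test harness.
interface PageCapture {
  label: string;          // page label as displayed, e.g. "iii" or "12"
  textHash: string;       // precomputed hash of the page's visible text
  diagnostics: string[];  // pageBuilder diagnostics for this page
}
interface ManifestPage extends PageCapture { screenshot: string }
interface CaptureManifest {
  fixtureUrl: string;
  pageCount: number;
  contactSheet: string;
  pages: ManifestPage[];
}

// Builds the manifest object; the caller would serialize it with JSON.stringify.
function buildManifest(fixtureUrl: string, outDir: string, captures: PageCapture[]): CaptureManifest {
  const pages = captures.map((c, i) => ({
    label: c.label,
    textHash: c.textHash,
    diagnostics: c.diagnostics.slice(),
    // Zero-padded, 1-based file names keep screenshots sorted in page order.
    screenshot: outDir + "/page-" + ("000" + (i + 1)).slice(-3) + ".png",
  }));
  return {
    fixtureUrl: fixtureUrl,
    pageCount: pages.length,
    contactSheet: outDir + "/contact-sheet.png",
    pages: pages,
  };
}

const sampleManifest = buildManifest(
  "http://phd.localhost:8080/results/microscopic-loop-supercurrent-trsb/",
  "out",
  [
    { label: "1", textHash: "hash-of-page-1", diagnostics: [] },
    { label: "2", textHash: "hash-of-page-2", diagnostics: ["underfull"] },
  ],
);
```

Storing the screenshot path next to each page's diagnostics lets a reviewer jump from a warning in the manifest to the image that shows it.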

Initial fixture:

  • http://phd.localhost:8080/results/microscopic-loop-supercurrent-trsb/

Acceptance for the first vertical slice:

  • no obviously blank non-final pages;
  • no duplicated trailing content across adjacent pages;
  • no cut headings at page bottoms;
  • references appear once;
  • Playwright contact sheet is reviewed before commit.
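
Two of these acceptance criteria can be approximated automatically from extracted page text, as a complement to the contact-sheet review rather than a replacement for it. The sketch below assumes the harness supplies one text string per page with running heads and repeated table headers already removed; the function names and the "last line equals next first line" duplication rule are hypothetical heuristics.

```typescript
// Splits page text into trimmed, non-empty lines.
function nonEmptyLines(text: string): string[] {
  return text.split("\n").map((l) => l.trim()).filter((l) => l.length > 0);
}

// Returns human-readable failures for blank non-final pages and for a
// trailing line that reappears as the first line of the next page.
function checkPages(pageTexts: string[]): string[] {
  const failures: string[] = [];
  for (let i = 0; i < pageTexts.length; i++) {
    const lines = nonEmptyLines(pageTexts[i]);
    const isFinal = i === pageTexts.length - 1;
    if (lines.length === 0 && !isFinal) failures.push("blank page " + (i + 1));
    if (isFinal || lines.length === 0) continue;
    const next = nonEmptyLines(pageTexts[i + 1]);
    if (next.length > 0 && next[0] === lines[lines.length - 1]) {
      failures.push("duplicated line across pages " + (i + 1) + " and " + (i + 2));
    }
  }
  return failures;
}

const sampleFailures = checkPages(["Alpha\nBeta", "Beta\nGamma", "", "Delta"]);
```

For the sample input this reports the duplicated "Beta" line across pages 1 and 2 and the blank page 3. Cut headings and visually bad pages still need the screenshot review.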

Migration Plan

  1. Land image capture harness.
  2. Add feature-flagged FlowItem extraction without changing rendering.
  3. Add PagePlan generation and diagnostics alongside current pagination.
  4. Add explicit read-only page renderer for single-page mode.
  5. Reuse that renderer for spread and print preview.
  6. Move floats, footnotes, margin notes, and long tables from ad-hoc page overlays into PagePlan insertions.
  7. Add source-range click mapping so explicit pages can become editable.

Non-Goals

  • Do not reimplement Hugo shortcodes or render hooks in JS.
  • Do not make browser PDF the final print authority over the TeX path.
  • Do not switch to approximate paragraph splitting in the current clipped-DOM renderer; that approach has already produced duplicated and cropped screenshots.