Browser Page Builder Plan
Nowtype Browser Page Builder Refactor
Last updated: 2026-06-29
Problem
The browser PDF surface currently paginates a rendered DOM tree by calculating break offsets and then clipping/translating that same long DOM into page-sized windows. That fixes some gross failures, but it cannot safely support LaTeX-like page building:
- approximate internal paragraph cuts can duplicate or crop text because the DOM is still one continuous flow;
- break scoring and page rendering are coupled through implicit offsets;
- floats, footnotes, margin notes, and long tables are bolted on after the page cut rather than inserted into page construction;
- smoke tests can pass while screenshots still show visually bad pages.
The refactor should replace the clipped-DOM compositor with an explicit page composition pipeline.
Architectural Target
Keep Hugo as render authority and Markdown as source of truth. The browser page builder is an editing/preview compositor over Hugo-rendered DOM, not a new Markdown renderer.
The browser pipeline should become:
- Source DOM:
  - current Nowtype/Hugo rendered DOM.
  - contains editable nodes, math, figures, tables, headings, lists, and structured blocks.
- Layout Metrics:
  - root-relative geometry for blocks, headings, figures, tables, anchors, line boxes where available, and page-affecting insertions.
  - owned by collectNowtypePdfLayoutMetrics().
  - no pagination consumer should read raw offsetTop independently.
- Flow Items: a normalized sequence of page-builder material:
  - boxes: headings, figures, tables, display math, list items;
  - glue: margins and stretchable vertical space;
  - penalties: preferred/forbidden page breaks;
  - insertions: footnotes, margin notes, floats;
  - text fragments: paragraph line boxes or measured range fragments.
- Page Builder:
  - chooses page breaks over flow items using TeX-like badness/penalties.
  - produces a PagePlan: page role, source item ranges, fragment ranges, floats, notes, anchors, counters, running heads, and diagnostics.
- Page Renderer:
  - renders each page from the PagePlan.
  - never displays a page by clipping arbitrary portions of one long DOM.
  - may clone whole blocks, range-fragment text, or use measured fragments, but every visible page must be an explicit page tree.
- Editor Bridge:
  - one canonical editable source remains live.
  - page views map clicks/selections back to source nodes/ranges.
  - preview pages may be read-only until source-range editing is stable.
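The Page Builder stage can be illustrated with a minimal greedy sketch in the spirit of TeX's page builder: walk the flow items, score every legal breakpoint by badness plus penalty, remember the cheapest one, and commit it as soon as the page overflows or a break is forced. The item shapes, the function names (buildPagePlans, badness, breakCost), and the returned { start, end, badness } records are hypothetical stand-ins for FlowItem and PagePlan; the cost rule is a simplified version of TeX's that ignores shrink and insertions.

```javascript
// Hypothetical flow-item shapes (stand-ins, not an existing Nowtype API):
//   { type: "box", id, height }
//   { type: "glue", height, stretch }
//   { type: "penalty", penalty }  // <= -10000 forces a break, >= 10000 forbids one
const INF_BAD = 10000;
const EJECT = -10000;

// Simplified TeX-style badness: how hard the glue must stretch to fill the page.
function badness(shortfall, stretch) {
  if (shortfall < 0) return Infinity; // overfull (shrink is ignored in this sketch)
  if (shortfall === 0) return 0;
  if (stretch <= 0) return INF_BAD; // underfull with nothing to stretch
  return Math.min(INF_BAD, Math.round(100 * (shortfall / stretch) ** 3));
}

// Simplified version of TeX's page-break cost (no insertion penalties).
function breakCost(b, p) {
  if (b === Infinity) return Infinity;
  if (p <= EJECT) return p; // forced break
  if (b < INF_BAD) return b + p;
  return 100000; // legal but very bad
}

// Penalties below 10000 are legal breaks; glue is legal only right after a box.
function isBreakpoint(items, i, start) {
  const item = items[i];
  if (item.type === "penalty") return item.penalty < INF_BAD;
  return item.type === "glue" && i > start && items[i - 1].type === "box";
}

function buildPagePlans(items, pageHeight) {
  const plans = [];
  let start = 0;
  while (start < items.length) {
    // Discard glue and penalties at the top of a page.
    while (start < items.length && items[start].type !== "box") start++;
    if (start >= items.length) break;
    let height = 0;
    let stretch = 0;
    let best = null;
    for (let i = start; i <= items.length; i++) {
      const atEnd = i === items.length;
      if (atEnd || isBreakpoint(items, i, start)) {
        // The end of the flow behaves like final fill glue plus a forced break.
        const b = atEnd ? (height > pageHeight ? Infinity : 0) : badness(pageHeight - height, stretch);
        const p = atEnd ? EJECT : items[i].type === "penalty" ? items[i].penalty : 0;
        const c = breakCost(b, p);
        if (best === null || c <= best.cost) best = { index: i, cost: c, badness: b };
        if (c === Infinity || p <= EJECT) break; // page is full or a break is forced
      }
      const item = items[i];
      if (item.type !== "penalty") height += item.height;
      if (item.type === "glue") stretch += item.stretch || 0;
    }
    // badness === Infinity marks an overfull page (no cheaper break existed).
    plans.push({ start, end: best.index, badness: best.badness });
    start = best.index + 1;
  }
  return plans;
}
```

Each plan covers the half-open item range [start, end); the breakpoint item itself is dropped. Committing the best break seen so far, rather than searching all break sequences, is what keeps this TeX-like and cheap enough to rerun on every layout change.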
Key Design Rules
- Do not split inside a measured line box.
- Do not duplicate content across adjacent pages unless it is deliberate running chrome or a repeated table header.
- Prefer breaking between block items; allow paragraph/list fragments only when the renderer can render the matching fragment safely.
- Headings use keep-with-next penalties, not absolute no-break capture of whole following paragraphs.
- Figures/tables are insertions with placement policy, not ordinary paragraphs.
- Footnotes and margin notes participate in page height before a break is accepted.
- Every PagePlan must be screenshot-testable.
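The rules about line boxes, keep-with-next headings, and widow/orphan control mostly come down to where penalties sit between flow items. This sketch assumes a hypothetical measured-block shape ({ id, kind, height, lineHeights?, marginAfter, stretch? }) and illustrative penalty values: 150 matches plain TeX's default \clubpenalty and \widowpenalty, while the heading value is an arbitrary finite choice so a heading can still end a page when nothing better exists. toFlowItems is not an existing function.

```javascript
// Illustrative penalty values; tune them against screenshots.
const KEEP_WITH_NEXT = 5000; // finite, so a heading can still end a page in an emergency
const CLUB_PENALTY = 150; // break right after a paragraph's first line (orphan)
const WIDOW_PENALTY = 150; // break right before a paragraph's last line (widow)

// Turn measured blocks into boxes, glue, and penalties.
// Hypothetical block shape: { id, kind, height, lineHeights?, marginAfter, stretch? }
function toFlowItems(blocks) {
  const items = [];
  blocks.forEach((block, index) => {
    const lines = block.lineHeights;
    if (Array.isArray(lines) && lines.length > 0) {
      // Each measured line box is one atomic box, so no break can fall inside it.
      lines.forEach((height, n) => {
        if (n > 0) {
          let penalty = 0;
          if (n === 1) penalty += CLUB_PENALTY;
          if (n === lines.length - 1) penalty += WIDOW_PENALTY;
          items.push({ type: "penalty", penalty });
        }
        items.push({ type: "box", id: `${block.id}:line${n}`, height });
      });
    } else {
      // Whole-block fallback: the block is never split.
      items.push({ type: "box", id: block.id, height: block.height });
    }
    if (index === blocks.length - 1) return; // no trailing glue after the last block
    if (block.kind === "heading") {
      // Keep-with-next: the penalty comes before the glue, so the glue is not
      // itself a breakpoint (glue only breaks when it directly follows a box).
      items.push({ type: "penalty", penalty: KEEP_WITH_NEXT });
    }
    items.push({ type: "glue", height: block.marginAfter || 0, stretch: block.stretch || 0 });
  });
  return items;
}
```

A two-line paragraph receives both the club and the widow penalty on its only interior break, which makes splitting it expensive without forbidding it outright.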
First Vertical Slice
The first implementation should not attempt full TeX quality. It should create a new explicit page-plan path for ordinary prose/math chapters:
- Build FlowItem[] from the existing compose root:
  - whole-block items for headings, display math, figures, tables, lists, blockquotes, and short paragraphs;
  - measured paragraph fragments only when Range.getClientRects() can map stable line boxes back to a text node range;
  - fall back to whole-block behavior when line mapping is uncertain.
- Build PagePlan[]:
  - use the current paper size, margins, font size, running heads, and title/ToC pages;
  - apply widow/orphan penalties for text fragments;
  - keep headings with at least the first following fragment;
  - record diagnostics for overfull/underfull pages.
- Render read-only explicit pages:
  - single-page and spread preview use the same PagePlan;
  - print preview uses the same PagePlan;
  - editing remains routed to canonical Nowtype source until range editing is added.
- Gate it behind a runtime flag:
  - default remains the current compositor until image tests pass;
  - flag name: window.QLMarkdownEditor.pdfPageBuilder = true or params.qlMarkdownEditor.pdfPageBuilder = true.
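Deciding whether Range.getClientRects() yields stable line boxes is largely a geometry question. The sketch below groups client rectangles into lines by vertical centre and returns null whenever the result looks ambiguous, which a caller would treat as the signal to fall back to a whole-block item. It takes plain { top, bottom, left, right } objects so it runs outside a browser; groupLineBoxes and its one-pixel tolerance are hypothetical, and mapping each line back to text-node offsets is not shown.

```javascript
// Group client rectangles into line boxes. DOMRect exposes top/bottom/left/right,
// so a DOMRectList from Range.getClientRects() could be passed in a browser.
// Returns null when the geometry looks ambiguous: the caller should then
// fall back to a whole-block flow item instead of guessing a split.
function groupLineBoxes(rects, tolerance = 1) {
  // Zero-width rects (e.g. at line ends) carry no text and are ignored.
  const usable = Array.from(rects).filter((r) => r.right - r.left > 0);
  if (usable.length === 0) return null;
  usable.sort((a, b) => a.top - b.top || a.left - b.left);

  const lines = [];
  for (const r of usable) {
    if (r.bottom - r.top <= 0) return null; // degenerate height: don't guess
    const last = lines[lines.length - 1];
    const centre = (r.top + r.bottom) / 2;
    // A rect whose vertical centre lies inside the current line belongs to it.
    if (last && centre >= last.top - tolerance && centre <= last.bottom + tolerance) {
      last.top = Math.min(last.top, r.top);
      last.bottom = Math.max(last.bottom, r.bottom);
      last.rectCount += 1;
    } else {
      lines.push({ top: r.top, bottom: r.bottom, rectCount: 1 });
    }
  }

  // Vertically overlapping lines (e.g. tall inline math) make a split unsafe.
  for (let i = 1; i < lines.length; i++) {
    if (lines[i].top < lines[i - 1].bottom - tolerance) return null;
  }
  return lines.map((l) => ({ top: l.top, height: l.bottom - l.top, rectCount: l.rectCount }));
}
```

Returning null rather than a best guess keeps the "fall back when uncertain" rule explicit: a wrong line split duplicates or crops text, while a whole-block item only costs some fill quality.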
The current first slice is intentionally diagnostic-only. When the flag is on,
toggleMarkdown.js stores a sanitized pageBuilder bundle beside the existing
pagination state:
- flowItems: interleaved boxes, glue, and penalties extracted from the current layout metrics;
- pagePlans: existing content page breaks interpreted as explicit page slices; title/ToC pages remain outside this first diagnostic slice;
- diagnostics: overheight boxes, page-boundary crossings, underfull pages, and heading orphan warnings.
The renderer still uses the current compositor until the PagePlan output is
good enough to become the rendering source.
Status note, 2026-06-29: the paragraph above describes the interactive editor
runtime. The CLI book exporter has moved further: scripts/book-pdf now uses a
book-export page preparation path that mutates the Chromium DOM for print
features such as long-table splitting, footnote decks, float handling,
hyphenation hints, math scaling, figure promotion, references, accessibility
probes, and running-head probes. Keep these paths distinct when debugging.
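For the interactive editor's diagnostic-only slice, the listed diagnostics (overheight boxes, page-boundary crossings, underfull pages, heading orphan warnings) can be computed from root-relative box geometry plus the current paginator's break offsets. This is a minimal sketch with hypothetical inputs: boxes sorted by top, ascending break offsets, and a usable page height. diagnosePagination and the 0.6 underfull ratio are illustrative, not the toggleMarkdown.js implementation.

```javascript
// Hypothetical inputs:
//   boxes: [{ id, kind, top, height }] in root-relative pixels, sorted by top
//   breaks: ascending content page-break offsets from the current paginator
//   pageHeight: usable content height of one page
function diagnosePagination(boxes, breaks, pageHeight, { underfullRatio = 0.6 } = {}) {
  const diagnostics = [];
  for (const box of boxes) {
    if (box.height > pageHeight) {
      diagnostics.push({ kind: "overheight-box", id: box.id, height: box.height });
    }
    for (const y of breaks) {
      // A break strictly inside a box means the box straddles two pages.
      if (box.top < y && box.top + box.height > y) {
        diagnostics.push({ kind: "boundary-crossing", id: box.id, breakAt: y });
      }
    }
  }

  // Page slices run from one break to the next; the last runs to the end of content.
  const contentEnd = boxes.reduce((end, b) => Math.max(end, b.top + b.height), 0);
  const edges = [0, ...breaks, contentEnd];
  for (let page = 0; page < edges.length - 1; page++) {
    const from = edges[page];
    const to = edges[page + 1];
    if (page === edges.length - 2) continue; // a short final page is expected
    if (to - from < pageHeight * underfullRatio) {
      diagnostics.push({ kind: "underfull-page", page, used: to - from });
    }
    // Boxes are assigned to the page on which they start.
    const onPage = boxes.filter((b) => b.top >= from && b.top < to);
    const lastBox = onPage[onPage.length - 1];
    if (lastBox && lastBox.kind === "heading") {
      diagnostics.push({ kind: "heading-orphan", page, id: lastBox.id });
    }
  }
  return diagnostics;
}
```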
Test Contract
Every page-builder change must run a Playwright image capture against a real book fixture, not just numeric smoke checks.
Required test artifacts:
- one PNG per page or spread;
- a contact sheet;
- a JSON manifest with page count, page labels, text/hash diagnostics, pageBuilder diagnostics, and screenshot paths;
- optional pixel-diff baseline once the first good set is accepted.
Initial fixture:
http://phd.localhost:8080/results/microscopic-loop-supercurrent-trsb/
Acceptance for the first vertical slice:
- no obviously blank non-final pages;
- no duplicated trailing content across adjacent pages;
- no cut headings at page bottoms;
- references appear once;
- Playwright contact sheet is reviewed before commit.
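Several of these acceptance criteria can be pre-checked mechanically from the JSON manifest before the contact sheet is reviewed; cut headings still need eyes on the screenshots. The sketch assumes a hypothetical manifest shape, { pages: [{ label, text }] }, where text is the visible text of each page with running heads and repeated table headers already stripped (those are the deliberate repeats the design rules allow). checkAcceptance and its tail-overlap heuristic are illustrative only, and the references check is a proxy that counts headings, not entries.

```javascript
// Hypothetical manifest shape: { pages: [{ label, text }] }, where `text` is the
// visible text of a page with running heads and repeated table headers removed.
function checkAcceptance(manifest, { overlapChars = 40, referencesHeading = "References" } = {}) {
  const failures = [];
  const pages = manifest.pages;
  pages.forEach((page, i) => {
    const text = page.text.trim();
    if (text === "" && i < pages.length - 1) {
      failures.push(`blank non-final page ${page.label}`);
    }
    if (i + 1 < pages.length) {
      // If this page's tail reappears near the head of the next page,
      // content was probably duplicated across the break.
      const tail = text.slice(-overlapChars);
      const nextHead = pages[i + 1].text.trim().slice(0, overlapChars * 2);
      if (tail.length >= overlapChars && nextHead.includes(tail)) {
        failures.push(`duplicated content between pages ${page.label} and ${pages[i + 1].label}`);
      }
    }
  });
  // The references heading should appear as its own line on exactly one page.
  const refPages = pages.filter((p) =>
    p.text.split("\n").some((line) => line.trim() === referencesHeading)
  );
  if (refPages.length !== 1) {
    failures.push(`expected one "${referencesHeading}" heading, found ${refPages.length}`);
  }
  return failures;
}
```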
Migration Plan
- Land image capture harness.
- Add feature-flagged FlowItem extraction without changing rendering.
- Add PagePlan generation and diagnostics alongside current pagination.
- Add explicit read-only page renderer for single-page mode.
- Reuse that renderer for spread and print preview.
- Move floats, footnotes, margin notes, and long tables from ad-hoc page overlays into PagePlan insertions.
- Add source-range click mapping so explicit pages can become editable.
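The feature flag that gates the early migration steps can be resolved from either documented location. In this sketch the window and params objects are passed in explicitly so it can run outside a browser; the helper names, the rule that only a literal true enables the builder, and the callback-style paginate wrapper are assumptions, not the current toggleMarkdown.js code.

```javascript
// Reads window.QLMarkdownEditor.pdfPageBuilder or params.qlMarkdownEditor.pdfPageBuilder.
// Assumption: only a literal `true` enables the page builder.
function isPageBuilderEnabled(win, params) {
  return (
    win?.QLMarkdownEditor?.pdfPageBuilder === true ||
    params?.qlMarkdownEditor?.pdfPageBuilder === true
  );
}

// Hypothetical wrapper for the "extraction without changing rendering" step:
// the flag only adds a diagnostic bundle; the current compositor still renders.
function paginate(state, win, params, extractFlowItems, renderWithCurrentCompositor) {
  if (isPageBuilderEnabled(win, params)) {
    state.pageBuilder = { flowItems: extractFlowItems() };
  }
  return renderWithCurrentCompositor();
}
```

Keeping the render call outside the flag check means turning the flag on can never change what users see until a later step deliberately switches the renderer.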
Non-Goals
- Do not reimplement Hugo shortcodes or render hooks in JS.
- Do not make browser PDF the final print authority over the TeX path.
- Do not switch to approximate paragraph splitting in the current clipped-DOM renderer; that already produced duplicated/cropped screenshots.