1. Introduction
Markdown has become the de facto output format of the AI era. Large language models, documentation generators, note-taking applications, and academic writing tools all produce Markdown natively. However, sharing Markdown documents that include images, diagrams, or multi-page structure remains fragile: relative paths break across systems, assets are separated from prose, and authors resort to converting to PDF — sacrificing editability, transparency, and the lightweight character that made Markdown attractive in the first place.
MarkBooks resolve this by packaging Markdown content into a single, self-contained, portable file. The format is a ZIP archive with the extension .mkb containing Markdown pages and their assets. At its simplest, a valid MarkBook is a single index.md file inside a ZIP. At its most capable, a MarkBook can carry multi-chapter documents with illustrations, supporting datasets, cryptographic signatures from multiple authors, and blockchain-anchored publication timestamps.
Every feature beyond index.md in a ZIP is optional. The format is designed so that a minimal reader can be implemented in an afternoon, while a full-featured reader can offer a rich scholarly experience. Readers compete on rendering quality; the specification guarantees portability, legibility, and verifiability.
1.1 Design Principles
- Trivially implementable. A developer can build a conforming reader in an afternoon.
- Machine-generatable. A large language model can produce a valid MarkBook in a single response.
- Human-readable inside. Unzipping a MarkBook yields plain Markdown files editable in any text editor.
- Self-contained. No network access is required to render a MarkBook.
- Gracefully degrading. Readers render what they support and display the rest as plain text.
- Open. This specification is released under CC0. No licence fees. No proprietary tooling required.
2. Identifiers
| Property | Value |
|---|---|
| File extension | .mkb |
| MIME type | application/vnd.markant.markbook+zip |
| Container format | ZIP (ISO/IEC 21320-1 or any compliant implementation) |
| Text encoding | UTF-8 for all .md and .yaml files |
| Uniform Type Identifier (Apple) | com.markant.markbook |
The +zip structured syntax suffix (RFC 6839) indicates the container format, enabling generic ZIP-aware tooling to inspect the archive without format-specific knowledge.
3. Archive Structure
example.mkb (ZIP archive)
├── index.md                 # REQUIRED: entry point
├── markbook.yaml            # OPTIONAL: metadata (§6)
├── pages/                   # OPTIONAL: additional pages
│   ├── chapter-one.md
│   ├── chapter-two.md
│   └── appendix/
│       └── supplementary.md
├── assets/                  # OPTIONAL: rendered media (§3.3)
│   ├── figure-1.png
│   └── diagrams/
│       └── architecture.svg
├── data/                    # OPTIONAL: supporting data (§7)
│   ├── measurements.csv
│   ├── analysis.py
│   └── supplementary.pdf
├── signatures/              # OPTIONAL: author signatures (§11)
│   ├── alice.sig
│   └── bob.sig
├── MANIFEST.sha256          # OPTIONAL: content integrity (§10)
├── MANIFEST.sha256.ots      # OPTIONAL: timestamp proof (§12)
└── .git/                    # OPTIONAL: version history (§9)
3.1 index.md (REQUIRED)
The entry point of the MarkBook. This is the only required file. A valid MarkBook MAY consist of index.md alone inside a ZIP archive.
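As a rough illustration of how small a minimal reader can be, the following sketch opens a MarkBook held in memory and returns the text of index.md. The function names read_entry_point and build_minimal_markbook are hypothetical and not part of this specification; the builder exists only so the example can run without a file on disk.

```python
import io
import zipfile


def read_entry_point(markbook: bytes) -> str:
    """Return the text of index.md from a MarkBook given as raw bytes."""
    with zipfile.ZipFile(io.BytesIO(markbook)) as archive:
        if "index.md" not in archive.namelist():
            raise ValueError("not a valid MarkBook: index.md is missing")
        # utf-8-sig tolerates a stray BOM even though the format forbids one.
        return archive.read("index.md").decode("utf-8-sig")


def build_minimal_markbook(text: str) -> bytes:
    """Build an in-memory MarkBook containing only index.md (demo helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("index.md", text)
    return buffer.getvalue()


demo = build_minimal_markbook("# Hello, World\n\nThis is a MarkBook.\n")
print(read_entry_point(demo))
```

A real reader would pass the returned text to a CommonMark renderer; that step is omitted here.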
3.2 pages/
Additional Markdown pages. Subdirectories within pages/ are permitted for logical grouping. There is no required naming convention; authors MAY organise pages by chapter, section, or any other scheme.
3.3 assets/
Images, diagrams, and other media files referenced by the Markdown pages. Subdirectories within assets/ are permitted. All files in assets/ are intended to be rendered inline by the reader when referenced from Markdown content.
3.4 data/
Supporting data files that accompany the document but are not rendered inline. This directory is intended for datasets, source code, supplementary materials, and other non-rendered content. Implementations SHOULD make files in data/ accessible to the user (e.g., via export or “open externally”) but MUST NOT require their presence for basic rendering.
Markdown content MAY link to data files using standard link syntax:
[Download raw measurements](data/measurements.csv)
Implementations SHOULD handle such links as file export or download actions.
3.5 markbook.yaml
Optional metadata file at the archive root. See §6.
3.6 signatures/
Optional directory containing cryptographic signatures. See §11.
3.7 .git/
Optional embedded Git repository. See §9.
4. Markdown Dialect
MarkBooks use CommonMark as the baseline Markdown specification. All conforming implementations MUST parse and render CommonMark. The format additionally requires wikilink support as specified in §4.1.
4.1 Wikilinks
Wikilinks are supported following the Obsidian convention, with target path before display text:
[[target|display text]]
Where target is a file path relative to the archive root or relative to the current file (see §5 for path resolution rules).
[[pages/chapter-one.md|Chapter One]]
[[index.md|Back to front page]]
If the display text is omitted, the target path is used as the display text:
[[pages/chapter-one.md]]
Standard CommonMark relative links are equally valid:
[Chapter One](pages/chapter-one.md)
Implementations MUST support both link styles.
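One possible way to support both styles is to rewrite wikilinks as CommonMark links before handing the text to a CommonMark parser. The sketch below assumes that approach; the function name and the regular expression are illustrative only. It does not skip code spans or fenced code, and it does not escape targets containing spaces, so a production reader would more likely handle wikilinks inside the parser itself.

```python
import re

# [[target|display text]] or [[target]]; target must not contain '|' or ']'.
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def wikilinks_to_commonmark(markdown: str) -> str:
    """Rewrite wikilinks as CommonMark links, leaving other text untouched."""

    def replace(match: re.Match) -> str:
        target = match.group(1).strip()
        # When display text is omitted, the target path doubles as the label.
        label = (match.group(2) or target).strip()
        return f"[{label}]({target})"

    return WIKILINK.sub(replace, markdown)


print(wikilinks_to_commonmark("See [[pages/chapter-one.md|Chapter One]]."))
# See [Chapter One](pages/chapter-one.md).
print(wikilinks_to_commonmark("See [[pages/chapter-one.md]]."))
# See [pages/chapter-one.md](pages/chapter-one.md).
```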
4.2 Image References
Images use standard Markdown syntax with paths relative to the archive root or relative to the current file:
![Architecture overview](assets/diagrams/architecture.svg)
Every asset referenced by a relative path MUST be present in the archive. External URLs in image references are permitted, but implementations SHOULD NOT require network access to render a MarkBook.
4.3 Extended Syntax and Graceful Degradation
MarkBook authors MAY use any Markdown extensions beyond CommonMark, including but not limited to:
- Tables (GFM-style)
- LaTeX mathematics ($inline$ and $$display$$)
- Task lists (- [ ] / - [x])
- Footnotes
- Definition lists
- Strikethrough
- Syntax-highlighted fenced code blocks
- Cross-references (§8)
The specification does not require readers to render any of these extensions. However, implementations MUST degrade gracefully for any syntax they do not support. Graceful degradation means that the content MUST remain visible and legible: unrecognised syntax SHOULD be displayed as a code block, inline code, or plain text, and is never silently dropped.
This design is intentional. MarkBooks are a container and content format, not a rendering specification. Readers compete on rendering quality; the specification guarantees portability and legibility.
5. Path Rules
All internal references within a MarkBook — links, image sources, data references — are subject to the following rules:
- All paths MUST be relative. Paths MUST NOT begin with /.
- All paths MUST use forward slashes (/) as directory separators, regardless of the host operating system.
- Paths MAY contain .. components for navigation between directories (e.g., ../index.md from a file in pages/). However, any path that resolves above the archive root MUST be treated as a broken link. Implementations SHOULD surface broken links visually (e.g., with a warning indicator) rather than silently ignoring them.
- File and directory names SHOULD be lowercase ASCII, using hyphens for word separation (e.g., my-page.md).
- File and directory names MUST NOT contain characters illegal in ZIP entries or common filesystems: \, :, *, ?, ", <, >, |.
The permissive handling of .. components is intentional: it allows existing Markdown projects with relative cross-references to be packaged as MarkBooks without rewriting links.
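The rules above could be applied as in the following sketch. It assumes references are resolved relative to the directory of the current file; the specification also allows root-relative paths, and the order in which a reader tries the two is not defined here, so that choice is an assumption. The name resolve_reference is hypothetical, and the function is meant for internal references only (an external URL would be rejected because it contains a colon).

```python
import posixpath
from typing import Optional

FORBIDDEN = set('\\:*?"<>|')


def resolve_reference(current_file: str, reference: str) -> Optional[str]:
    """Resolve `reference` against `current_file`; None means a broken link.

    Both arguments are archive paths with forward slashes, e.g. "pages/a.md".
    """
    if not reference or reference.startswith("/"):
        return None  # empty or absolute paths are not allowed
    if any(ch in FORBIDDEN for ch in reference):
        return None  # backslashes and other forbidden characters
    base = posixpath.dirname(current_file)
    resolved = posixpath.normpath(posixpath.join(base, reference))
    if resolved == ".." or resolved.startswith("../"):
        return None  # resolves above the archive root
    return resolved


print(resolve_reference("pages/chapter-one.md", "../index.md"))  # index.md
print(resolve_reference("index.md", "assets/figure-1.png"))      # assets/figure-1.png
print(resolve_reference("index.md", "../outside.md"))            # None
```

Returning None rather than raising lets the caller render the link with a warning indicator, as recommended above.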
6. Metadata
A MarkBook MAY include a markbook.yaml file at the archive root. This file provides structured metadata for display, indexing, and integrity verification.
6.1 Basic Metadata
title: "Attention Is All You Need"
version: "1.0"
language: "en"
created: "2026-04-08"
modified: "2026-04-08"
All fields are OPTIONAL. Implementations SHOULD use these values for display purposes (e.g., window titles, library indexing) but MUST NOT require the file or any field within it.
If markbook.yaml is absent or lacks a title field, implementations SHOULD extract the title from the first ATX heading (#) in index.md.
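The fallback can be sketched as follows. The metadata argument is assumed to be the already-parsed contents of markbook.yaml (an empty dict if the file is absent); parsing YAML is left to whatever library the reader uses. The sketch accepts an ATX heading of any level and skips lines inside fenced code blocks; both choices are assumptions, as is the name resolve_title.

```python
import re
from typing import Optional

# ATX heading: up to three spaces, 1-6 '#', whitespace, text, optional closing '#'s.
ATX_HEADING = re.compile(r"^ {0,3}#{1,6}[ \t]+(.*?)(?:[ \t]+#+)?[ \t]*$")


def resolve_title(metadata: dict, index_markdown: str) -> Optional[str]:
    """Prefer the metadata title; otherwise use the first ATX heading."""
    title = metadata.get("title")
    if isinstance(title, str) and title.strip():
        return title.strip()
    in_fence = False
    for line in index_markdown.splitlines():
        if line.lstrip().startswith(("```", "~~~")):
            in_fence = not in_fence  # entering or leaving a fenced code block
            continue
        if in_fence:
            continue
        match = ATX_HEADING.match(line)
        if match:
            return match.group(1).strip()
    return None


print(resolve_title({}, "# Hello, World\n\nThis is a MarkBook.\n"))  # Hello, World
```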
6.2 Authorship Metadata
For single-author documents:
author: "Author Name"
For multi-author documents, use the authors array (see §11 for signature-related fields):
authors:
  - name: "Alice Chen"
  - name: "Bob Eriksen"
6.3 Extended Metadata
The markbook.yaml file MAY contain additional fields not defined by this specification. Implementations MUST ignore unrecognised fields.
7. Supporting Data
A MarkBook MAY include a data/ directory at the archive root for non-rendered supporting materials. This is intended for use cases where the document is accompanied by datasets, source code, supplementary figures, or other files that readers may wish to access but that are not rendered inline.
Common use cases include:
- Raw datasets underlying figures or tables in the document
- Source code for reproducing computational results
- Supplementary materials (additional figures, extended proofs)
- Machine-readable structured data (JSON, CSV, HDF5)
The data/ directory imposes no structure requirements. Authors MAY organise contents freely, including the use of subdirectories.
Implementations MUST preserve the data/ directory when re-packaging a MarkBook. Implementations SHOULD provide a mechanism for users to access individual files within data/ (e.g., export, “Show in Finder,” or “Open Externally”). Implementations MUST NOT require the presence of data/ or any of its contents for basic document rendering.
8. Cross-References
MarkBook authors MAY attach identifiers to block-level elements and reference those identifiers elsewhere in the document. This enables auto-numbered, hyperlinked references to figures, tables, equations, sections, and other elements.
8.1 Labels
Labels are attached to elements using the attribute syntax {#identifier}, appended to the element:
Equations:
$$ E = mc^2 $$ {#eq:einstein}
Figures:
![Model architecture](assets/diagrams/architecture.svg){#fig:transformer}
Tables:
| Model | BLEU |
|-------|-------|
| Base | 27.3 |
| Big | 28.4 |
: Translation results on WMT 2014. {#tbl:results}
Sections:
## Model Architecture {#sec:architecture}
Code listings:
```python {#lst:training}
model.fit(X_train, y_train, epochs=100)
```
8.2 References
References use @-prefixed identifiers inside square brackets:
As shown in [@fig:transformer], the encoder-decoder structure...
From [@eq:einstein] we derive the energy-momentum relation.
The results in [@tbl:results] demonstrate a clear improvement.
As described in [@sec:architecture], the model uses...
Multiple references MAY be combined in a single bracket, separated by commas:
See [@fig:transformer, @tbl:results] for details.
8.3 Identifier Conventions
Identifiers SHOULD use a namespaced prefix to indicate element type:
| Prefix | Element |
|---|---|
| eq: | Equation |
| fig: | Figure |
| tbl: | Table |
| sec: | Section or heading |
| lst: | Code listing |
| thm: | Theorem, definition, or proof |
Prefixes are a convention, not a requirement. An unprefixed label {#foo} is valid.
8.4 Rendering Behaviour
Implementations that support cross-references SHOULD:
- Auto-number labelled elements by type in document order (e.g., Figure 1, Figure 2, Table 1)
- Resolve [@identifier] to the appropriate numbered label (e.g., “Figure 3”)
- Render resolved references as in-document hyperlinks
Implementations that do not support cross-references MUST display the raw [@identifier] text. This ensures that references remain visible and searchable even in minimal readers, and that the target can be found manually by searching for the corresponding {#identifier} label.
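The behaviour described in this section might be approximated with two passes over the Markdown source, as in the sketch below. The display names ("Listing", "Item" for unprefixed labels), the flat per-type numbering, and the use of #identifier as the link target are all assumptions made for illustration. The labels themselves are left in the text; a real renderer would turn them into anchors.

```python
import re

LABEL = re.compile(r"\{#([A-Za-z0-9_:-]+)\}")
REFERENCE = re.compile(r"\[(@[A-Za-z0-9_:-]+(?:\s*,\s*@[A-Za-z0-9_:-]+)*)\]")

NAMES = {"eq": "Equation", "fig": "Figure", "tbl": "Table",
         "sec": "Section", "lst": "Listing", "thm": "Theorem"}


def number_labels(markdown: str) -> dict:
    """Map each identifier to a display name such as 'Figure 2'."""
    counters, numbered = {}, {}
    for match in LABEL.finditer(markdown):
        identifier = match.group(1)
        if identifier in numbered:
            continue  # the first definition of a label wins
        prefix = identifier.split(":", 1)[0] if ":" in identifier else ""
        kind = NAMES.get(prefix, "Item")
        counters[kind] = counters.get(kind, 0) + 1
        numbered[identifier] = f"{kind} {counters[kind]}"
    return numbered


def resolve_references(markdown: str) -> str:
    """Replace [@id] references with links; unknown ids stay as raw text."""
    numbered = number_labels(markdown)

    def replace(match: re.Match) -> str:
        parts = []
        for token in match.group(1).split(","):
            identifier = token.strip()[1:]  # drop the leading '@'
            if identifier in numbered:
                parts.append(f"[{numbered[identifier]}](#{identifier})")
            else:
                parts.append(f"[@{identifier}]")  # keep unresolved refs visible
        return ", ".join(parts)

    return REFERENCE.sub(replace, markdown)


sample = "## Model Architecture {#sec:architecture}\n\nSee [@sec:architecture]."
print(resolve_references(sample))
```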
9. Versioned Documents and Annotations
A MarkBook MAY contain a .git/ directory at the archive root, making the MarkBook’s contents a Git repository.
9.1 Purpose
The embedded Git repository enables:
- Version history. The complete authoring history of the document is preserved within the archive.
- Reader annotations. Highlights, margin notes, and comments can be stored as commits on named branches without modifying the original content.
- Collaborative review. Multiple readers’ annotations can be merged using standard Git merge operations.
- Non-destructive editing. The original published content is always recoverable from the initial commit or the main branch.
9.2 Branch Conventions
The main branch represents the canonical published content of the MarkBook.
Reader annotations and notes SHOULD be stored on named branches following the convention:
annotations/{reader-name}
For example: annotations/morten, annotations/review-committee.
9.3 Implementation Requirements
Implementations that support versioned MarkBooks SHOULD use libgit2 or an equivalent library to interact with the embedded repository.
Implementations MUST preserve the .git/ directory when re-packaging a MarkBook. Implementations MUST NOT modify the main branch.
Implementations that do not support Git-based features MUST ignore the .git/ directory.
9.4 Packaging Considerations
The ZIP format does not guarantee that POSIX file permissions or symbolic links are preserved; support depends on the archiving tool. Implementations that create MarkBooks containing .git/ directories SHOULD use standard ZIP compression (zip -r) and SHOULD verify that the resulting archive produces a functional Git repository when extracted.
10. Content Integrity
A MarkBook MAY include a MANIFEST.sha256 file at the archive root to enable content integrity verification.
10.1 Manifest Format
The manifest file contains the SHA-256 hash of every content file in the archive, one entry per line, in the format produced by the sha256sum utility:
e3b0c44298fc1c149afb...  index.md
a7ffc6f8bf1ed766518c...  pages/chapter-1.md
2c26b46b68ffc68ff99b...  assets/figure-1.png
9f86d081884c7d659a2f...  data/measurements.csv
Each line consists of the lowercase hexadecimal SHA-256 digest, two spaces, and the file path relative to the archive root.
10.2 Excluded Files
The following files MUST NOT be listed in the manifest:
- MANIFEST.sha256 (the manifest itself)
- MANIFEST.sha256.sig or any files in signatures/
- MANIFEST.sha256.ots (timestamp proof)
These files relate to the integrity verification process itself and cannot self-referentially appear in the manifest.
Files within .git/ SHOULD NOT be listed in the manifest, as Git maintains its own internal integrity mechanisms.
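A manifest honouring these exclusions could be generated roughly as follows. Sorting the entries by path is an assumption made for reproducible output; the specification does not mandate an order. The helper names is_excluded and build_manifest are hypothetical.

```python
import hashlib
import io
import zipfile


def is_excluded(name: str) -> bool:
    """True for entries that are left out of the manifest."""
    return (
        name.endswith("/")                     # directory entries have no content
        or name.startswith("MANIFEST.sha256")  # the manifest, its .sig and .ots
        or name.startswith("signatures/")
        or name.startswith(".git/")
    )


def build_manifest(archive: zipfile.ZipFile) -> str:
    """Return manifest text in the sha256sum line format."""
    lines = []
    for name in sorted(archive.namelist()):
        if is_excluded(name):
            continue
        digest = hashlib.sha256(archive.read(name)).hexdigest()
        lines.append(f"{digest}  {name}")  # digest, two spaces, relative path
    return "\n".join(lines) + "\n"


# Demo on an in-memory archive with one content file and one excluded file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("index.md", "# Hello, World\n")
    archive.writestr("signatures/alice.sig", "placeholder signature")
with zipfile.ZipFile(buffer) as archive:
    print(build_manifest(archive), end="")
```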
10.3 Verification
Implementations that support integrity verification SHOULD:
- Compute the SHA-256 hash of each content file in the archive
- Compare computed hashes against the manifest entries
- Report any mismatches to the user
Implementations MUST warn the user visibly if verification fails. A failed integrity check indicates that the document has been modified since the manifest was generated.
Implementations that do not support integrity verification MUST ignore the MANIFEST.sha256 file.
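The verification steps listed above can be sketched directly against the archive, without extracting it. This is an illustrative outline, not a reference implementation: verify_manifest is a hypothetical name, only the two-space text-mode line format is handled, and files present in the archive but absent from the manifest are not reported.

```python
import hashlib
import io
import zipfile


def verify_manifest(archive: zipfile.ZipFile) -> list:
    """Return a list of problems found; an empty list means every hash matched."""
    problems = []
    manifest = archive.read("MANIFEST.sha256").decode("utf-8")
    names = set(archive.namelist())
    for line in manifest.splitlines():
        if not line.strip():
            continue
        expected, separator, path = line.partition("  ")
        if not separator:
            problems.append(f"malformed line: {line!r}")
        elif path not in names:
            problems.append(f"missing: {path}")
        elif hashlib.sha256(archive.read(path)).hexdigest() != expected.lower():
            problems.append(f"modified: {path}")
    return problems


# Demo: an archive whose manifest matches its single content file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("index.md", "# Hello\n")
    digest = hashlib.sha256(b"# Hello\n").hexdigest()
    archive.writestr("MANIFEST.sha256", f"{digest}  index.md\n")
with zipfile.ZipFile(buffer) as archive:
    print(verify_manifest(archive))  # []
```

A reader would show a visible warning whenever the returned list is non-empty.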
12. Publication Timestamping
A MarkBook MAY include a cryptographic timestamp proof establishing that the document existed in its current form at a specific point in time. This provides a publicly verifiable, trustless record of publication priority.
12.1 Motivation
Establishing the priority of a publication — proving that a specific document with specific content existed at a specific time — is critical in academic research, intellectual property, and legal contexts. Existing mechanisms (journal submission dates, preprint server timestamps) depend on trusting a single third party. A blockchain-anchored timestamp is independently verifiable by anyone, with no infrastructure dependency, and cannot be retroactively altered.
12.2 Mechanism
The RECOMMENDED timestamping method is OpenTimestamps, which anchors SHA-256 hashes in the Bitcoin blockchain via Merkle tree aggregation at effectively zero cost per document.
The timestamp proof is stored as:
MANIFEST.sha256.ots
This file contains the OpenTimestamps proof linking the SHA-256 hash of MANIFEST.sha256 to a specific Bitcoin transaction at a specific block height.
12.3 Metadata
integrity:
  manifest: "MANIFEST.sha256"
  timestamp:
    method: "opentimestamps"
    proof: "MANIFEST.sha256.ots"
    bitcoin_block: 892451
    committed: "2026-04-08T14:23:00Z"
The bitcoin_block and committed fields are informational and SHOULD reflect the confirmed anchor point. Verification MUST be performed against the .ots proof file and the blockchain, not against these metadata fields.
12.4 Pending Timestamps
An OpenTimestamps proof requires confirmation in a Bitcoin block, which may take several hours. A MarkBook MAY be published with a pending .ots proof. The proof file SHOULD be updated once the timestamp is confirmed. Implementations SHOULD distinguish between confirmed and pending timestamps in their display.
12.5 Verification
Verification requires the ots command-line tool or equivalent library, plus access to Bitcoin block headers (80 bytes per block, or roughly 70 MB for the complete chain at block height 892451).
Implementations that support timestamp verification SHOULD:
- Verify the .ots proof against the Bitcoin blockchain
- Display the confirmed timestamp to the user
- Clearly indicate if the timestamp is pending or cannot be verified
Implementations that do not support timestamp verification MUST ignore the .ots file.
12.6 Alternative Timestamping Methods
This specification RECOMMENDS OpenTimestamps but does not prohibit alternative mechanisms. Any system that produces a verifiable, independent proof that a specific hash existed at a specific time is acceptable. The method field in markbook.yaml identifies the mechanism used.
Possible alternatives include:
- RFC 3161 trusted timestamps (legally recognised under EU eIDAS regulation)
- Other blockchain-anchored timestamping services
- Certificate Transparency-style append-only logs
13. Constraints
The following constraints apply to all MarkBooks:
- No executable content. MarkBooks MUST NOT contain scripts, macros, or embedded HTML that requires JavaScript execution. This is a security boundary: a MarkBook is a document, not an application.
- No absolute paths. All internal references MUST be relative (see §5).
- No required network access. A MarkBook MUST be fully renderable offline when all referenced assets are present in the archive. Network access MAY be used for optional operations such as key discovery (§11.6) and timestamp verification (§12.5).
- UTF-8 only. All text files (.md, .yaml) MUST be encoded as UTF-8 without BOM.
14. Content Types
The following asset types are expected to be widely supported by implementations:
| Category | Extensions |
|---|---|
| Images | .png, .jpg, .jpeg, .gif, .svg, .webp |
| Metadata | .yaml |
| Documents | .md |
Implementations MAY support additional asset types but MUST NOT require them for basic rendering.
The data/ directory (§7) may contain files of any type. Implementations are not expected to render these files but SHOULD make them accessible for export.
15. MIME Type Registration
The MIME type application/vnd.markant.markbook+zip is pending registration with IANA under the vendor tree, as specified in RFC 6838 §3.2.
Until registration is complete, implementations SHOULD use this MIME type for content negotiation and file type identification. Operating system file associations SHOULD map the .mkb extension to this MIME type.
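An application can teach its own runtime about the mapping without waiting for system-wide registration. The sketch below uses Python's standard mimetypes module and affects only the current process; the constant name MARKBOOK_MIME is illustrative.

```python
import mimetypes

MARKBOOK_MIME = "application/vnd.markant.markbook+zip"

# Register the mapping for this process only; nothing is written to disk.
mimetypes.add_type(MARKBOOK_MIME, ".mkb")

print(mimetypes.guess_type("paper.mkb")[0])      # application/vnd.markant.markbook+zip
print(mimetypes.guess_extension(MARKBOOK_MIME))  # .mkb
```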
16. Specification Versioning
This is version 1.0 of the MarkBooks specification.
Future versions within the 1.x series MUST maintain backward compatibility: a conforming v1.0 reader MUST be able to open any v1.x MarkBook, ignoring features it does not understand. This is the reason all features beyond index.md are optional — a v1.0 reader that supports only basic Markdown rendering is fully conforming.
Breaking changes that alter required behaviour or invalidate previously valid MarkBooks require a major version increment (v2.0).
17. Design Rationale
17.1 Why ZIP?
ZIP is universally supported, well-specified (ISO/IEC 21320-1), streamable, and understood by every operating system. It allows individual files to be extracted without decompressing the entire archive. It is the same container used by EPUB, DOCX, XLSX, and JAR. Choosing ZIP means MarkBooks can be created with standard command-line tools (zip, unzip) and inspected by any file manager.
17.2 Why not EPUB?
EPUB is the closest existing format to MarkBooks. However, EPUB requires XHTML content documents, an OPF package file, a navigation document (NCX in EPUB 2, an XHTML nav document in EPUB 3), and a specific META-INF directory structure. This complexity makes EPUB difficult to generate programmatically and hard for a large language model to produce correctly in a single response. MarkBooks are “EPUB minus everything you don’t need”: the minimum viable self-contained document format.
17.3 Why CommonMark + extensions rather than a fixed dialect?
Markdown is a family of dialects, and the ecosystem continues to evolve. Mandating a specific set of extensions (e.g., GFM tables, KaTeX math) would freeze the format at the state of the art in 2026. Instead, the specification mandates CommonMark as the baseline and requires graceful degradation for anything beyond it. This means the format can absorb new Markdown extensions — including those that do not yet exist — without a specification revision.
17.4 Why Git for annotations?
Reader annotations are a form of version control: they are changes made to a document by someone other than the author. Git is the most widely deployed version control system in the world, with robust tooling for branching, merging, and diffing. Embedding a Git repository in the MarkBook means annotations are portable (they travel with the document), mergeable (multiple readers’ notes can be combined), and non-destructive (the original content is always recoverable). No custom annotation format is required.
17.5 Why blockchain timestamping?
Academic priority disputes, intellectual property claims, and legal proceedings all require proving that a specific document existed at a specific time. Existing timestamping mechanisms depend on trusting a single institution (a journal, a preprint server, a notary). Blockchain-anchored timestamps are independently verifiable by anyone with access to the blockchain, require no trusted third party, and are immutable once confirmed. OpenTimestamps provides this at zero marginal cost per document.
17.6 Why per-author signatures?
Academic authorship is a collective act, but accountability is individual. Listing five authors on a paper currently provides no mechanism to verify that all five approved the final version. Per-author signatures make this verifiable: each author independently attests to the content. This also provides a cryptographic audit trail for revisions — if a co-author declines to re-sign a revised version, that absence is a meaningful and visible signal.
18. Security Considerations
18.1 Executable Content
MarkBooks MUST NOT contain executable content (§13). Implementations MUST NOT execute scripts, macros, or active content found within a MarkBook, even if embedded in HTML blocks within Markdown files. This constraint exists to ensure that MarkBooks are safe to open from untrusted sources.
18.2 Path Traversal
Paths resolving above the archive root via .. components MUST be treated as broken links, not as references to the host filesystem (§5). Implementations MUST NOT resolve paths outside the archive boundary.
18.3 Signature Trust
A valid cryptographic signature proves that the holder of a specific private key signed the manifest. It does not, by itself, prove the identity of the signer. Key-to-identity binding depends on external trust mechanisms (§11.6) — domain-hosted keys, DNS records, or key registries. Implementations SHOULD clearly communicate the distinction between “signature valid” (the cryptography checks out) and “author verified” (the key is confirmed to belong to the stated author).
18.4 Timestamp Limitations
A blockchain-anchored timestamp proves that a document existed at a specific time. It does not prove that the document was published at that time — the author may have created the timestamp privately and disclosed it later. It also does not prove that the timestamped version was the first version of the content. Timestamps establish a lower bound on publication date, not a definitive publication event.
18.5 ZIP-Specific Risks
ZIP archives may contain filenames with path separators or absolute paths that, if naively extracted, could overwrite files outside the intended directory (the “Zip Slip” vulnerability). Implementations MUST sanitise file paths during extraction and MUST NOT extract files to locations outside the intended output directory.
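A defensive extraction routine might look like the sketch below, which validates every entry name before writing anything. Python's zipfile module already strips leading slashes and .. components when extracting, so the explicit checks here are an extra layer that rejects suspicious archives outright rather than silently rewriting their paths. The name safe_extract is hypothetical.

```python
import io
import os
import tempfile
import zipfile


def safe_extract(archive: zipfile.ZipFile, destination: str) -> list:
    """Extract all entries under `destination`, refusing unsafe entry names."""
    root = os.path.realpath(destination)
    for name in archive.namelist():
        if name.startswith("/") or "\\" in name or ".." in name.split("/"):
            raise ValueError(f"unsafe entry name: {name!r}")
        target = os.path.realpath(os.path.join(root, name))
        if os.path.commonpath([root, target]) != root:
            raise ValueError(f"entry escapes output directory: {name!r}")
    # Nothing is written until every entry name has passed the checks.
    archive.extractall(root)
    return archive.namelist()


# Demo: an archive containing a traversal attempt is rejected as a whole.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("index.md", "# Hello\n")
    archive.writestr("../evil.md", "escape attempt")
with zipfile.ZipFile(buffer) as archive, tempfile.TemporaryDirectory() as out:
    try:
        safe_extract(archive, out)
    except ValueError as error:
        print("rejected:", error)
```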
Appendix A: Minimal Valid MarkBook
The simplest possible MarkBook:
minimal.mkb (ZIP archive)
└── index.md
Containing:
# Hello, World
This is a MarkBook.
Created with:
printf '# Hello, World\n\nThis is a MarkBook.\n' > index.md
zip minimal.mkb index.md
Appendix B: Complete Example with All Optional Features
research-paper.mkb (ZIP archive)
├── index.md
├── markbook.yaml
├── pages/
│   ├── introduction.md
│   ├── methodology.md
│   ├── results.md
│   └── conclusion.md
├── assets/
│   ├── figure-1.svg
│   ├── figure-2.png
│   └── diagrams/
│       └── architecture.svg
├── data/
│   ├── raw-measurements.csv
│   ├── analysis.py
│   └── supplementary-figures.pdf
├── signatures/
│   ├── alice.sig
│   └── bob.sig
├── MANIFEST.sha256
├── MANIFEST.sha256.ots
└── .git/
With markbook.yaml:
title: "A Novel Approach to Sensor Fusion"
version: "1.0"
language: "en"
created: "2026-04-08"
authors:
  - name: "Alice Chen"
    role: "corresponding"
    key_url: "https://alice.example.com/.well-known/markbook/keys/primary.pub"
    signature: "signatures/alice.sig"
    contributions:
      - "Conceptualization"
      - "Methodology"
      - "Writing – original draft"
  - name: "Bob Eriksen"
    role: "equal_contribution"
    key_url: "https://bob.example.com/.well-known/markbook/keys/primary.pub"
    signature: "signatures/bob.sig"
    contributions:
      - "Software"
      - "Data curation"
      - "Visualization"
integrity:
  manifest: "MANIFEST.sha256"
  timestamp:
    method: "opentimestamps"
    proof: "MANIFEST.sha256.ots"
    bitcoin_block: 892451
    committed: "2026-04-08T14:23:00Z"
Appendix C: Verification Quickstart
Verify a MarkBook’s integrity using standard command-line tools:
# Extract the MarkBook
unzip paper.mkb -d paper/
cd paper/
# 1. Verify content integrity (requires sha256sum)
sha256sum -c MANIFEST.sha256
# 2. Verify author signature (requires minisign)
minisign -Vm MANIFEST.sha256 -p alice.pub -x signatures/alice.sig
# 3. Verify publication timestamp (requires ots-cli)
ots verify MANIFEST.sha256.ots
No proprietary tools are required at any step.
This specification is released under CC0 1.0 Universal. No rights reserved. Anyone may implement, extend, or redistribute this specification without restriction.