Walk-through of the NavigaDoc format
Introduction to the NavigaDoc document format
NavigaDoc is a JSON version of our existing NewsML document format, and of some other XML formats we're replacing.
The NavigaDoc format is less verbose and JSON/GraphQL/struct-friendly. The idea is to be able to use the same format all the way from authoring tools to frontend applications.
The JSON format sticks closely enough to the way data and content are modelled in our NewsML format to make bi-directional conversion feasible. The Content Creation API (CCA) converts between NavigaDoc JSON and NewsML on the fly, and all writes to the OC in question have to go through this intermediate API.
Relying on conversion JSON<->XML
We're still storing documents as NewsML XML in the OC Content repository. To ensure a reliable 1:1 conversion, we put some constraints on how NewsML is used in the Writer, and some constraints on what can be used in NavigaDoc. More on that once we are more familiar with the format.
The supported document types, i.e. the document types that can be converted to and from the NewsML XML are:
Article: x-im/article
Article template: x-im/article-template
Image: x-im/image
Imagelink: x-im/imagelink
PDF: x-im/pdf
Concept: the actual type depends on which type of concept it is:
  Author: x-im/author
  Category: x-im/category
  Channel: x-im/channel
  Content profile: x-im/content-profile
  Event: x-im/event
  Organisation: x-im/organisation
  Person: x-im/person
  Place: x-im/place
  Section: x-im/section
  Story: x-im/story
  Topic: x-im/topic
List: x-im/list
Package: x-im/package
Planning: x-im/newscoverage
Assignment: x-im/assignment
The type corresponds to the $.type attribute of the NavigaDoc document (see below).
The attributes of a NavigaDoc document
Top level document structure:
The attributes of a document
uuid: ID of the document
uri: a URL-based identifier for the document
url: the location of the document (if any)
title: title of the document (not headline)
status: workflow status of the document, "draft" etc.
provider: the provider of the document, e.g. "TT" or "NavigaPhotos"
created: when the document was created
type: the type of document, e.g. "x-im/article"
modified: last modified timestamp
published: when the document was, or will be, published
unpublished: when the document was, or will be, unpublished
language: the language code for the document: "en-GB", "fi", "sv" etc.
path: the path on which the document can be exposed when consumed through a website
products: a list of product names, replaced by channel links but preserved for legacy support
content: a list of content blocks
meta: a list of metadata blocks that describe the document
links: a list of link blocks that describe the document's relationships
properties: a list of properties, primarily used when converting from XML
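As a rough illustration, the attributes above could be mapped to a Go struct like this. It is a sketch, not the official NavigaDoc Go types: the JSON key names are assumed to match the attribute names, timestamps are assumed to serialise as RFC 3339 strings, and the shape of Property is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Block is reduced to a few fields here; the document level is the point.
type Block struct {
	Type  string `json:"type,omitempty"`
	Rel   string `json:"rel,omitempty"`
	Title string `json:"title,omitempty"`
}

// Property is a hypothetical name/value pair; the text does not specify its shape.
type Property struct {
	Name  string `json:"name"`
	Value string `json:"value"`
}

// Document mirrors the top-level attributes listed above (assumed JSON keys).
type Document struct {
	UUID        string     `json:"uuid"`
	URI         string     `json:"uri"`
	URL         string     `json:"url,omitempty"`
	Title       string     `json:"title"`
	Status      string     `json:"status"`
	Provider    string     `json:"provider,omitempty"`
	Created     *time.Time `json:"created,omitempty"`
	Type        string     `json:"type"`
	Modified    *time.Time `json:"modified,omitempty"`
	Published   *time.Time `json:"published,omitempty"`
	Unpublished *time.Time `json:"unpublished,omitempty"`
	Language    string     `json:"language,omitempty"`
	Path        string     `json:"path,omitempty"`
	Products    []string   `json:"products,omitempty"`
	Content     []Block    `json:"content,omitempty"`
	Meta        []Block    `json:"meta,omitempty"`
	Links       []Block    `json:"links,omitempty"`
	Properties  []Property `json:"properties,omitempty"`
}

// newDraft builds a small example article; all values are illustrative.
func newDraft() Document {
	created := time.Date(2024, 1, 15, 9, 30, 0, 0, time.UTC)
	return Document{
		UUID:     "1d02738f-7c99-42ba-a6da-3d1b97261523",
		URI:      "im://article/1d02738f-7c99-42ba-a6da-3d1b97261523",
		Title:    "Example article",
		Status:   "draft",
		Type:     "x-im/article",
		Created:  &created,
		Language: "en-GB",
	}
}

func main() {
	out, err := json.MarshalIndent(newDraft(), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Pointer timestamps combined with omitempty keep unset values such as unpublished out of the JSON entirely, which seems closer to "when the document was, or will be" than a zero timestamp would be.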
Note: When a document is stored in Open Content, there is a "core" property named source, which has the value "cca" if the document was saved using CCA. The source property is not supported in the actual document, only as a property in Open Content.
UUID and URI
Articles created by the Writer have a random UUIDv4 and a URI that contains the UUID: im://article/1d02738f-7c99-42ba-a6da-3d1b97261523
Documents from external systems should construct a URI that represents the ID of the document in the external system.
If you have a system called Robot that produces an article with the ID 1234-8754, you could construct a URI like robot://article/1234-8754 and generate a v5 UUID from it.
Generating a v5 UUID
A UUIDv5 is created from a name (uri) in a namespace (url).
In a shell you would do it like this:
In Go, you would do it like this:
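A minimal sketch using only the standard library, assuming the namespace is the predefined URL namespace from RFC 4122. The names uuidV5 and namespaceURL are chosen for this example and are not part of any Naviga library.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// namespaceURL is the predefined URL namespace from RFC 4122
// (6ba7b811-9dad-11d1-80b4-00c04fd430c8).
var namespaceURL = [16]byte{
	0x6b, 0xa7, 0xb8, 0x11, 0x9d, 0xad, 0x11, 0xd1,
	0x80, 0xb4, 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8,
}

// uuidV5 derives a name-based (SHA-1) UUID from a namespace and a name.
func uuidV5(namespace [16]byte, name string) string {
	h := sha1.New()
	h.Write(namespace[:])
	h.Write([]byte(name))
	sum := h.Sum(nil)

	var u [16]byte
	copy(u[:], sum[:16])
	u[6] = (u[6] & 0x0f) | 0x50 // set the version to 5
	u[8] = (u[8] & 0x3f) | 0x80 // set the RFC 4122 variant

	s := hex.EncodeToString(u[:])
	return s[0:8] + "-" + s[8:12] + "-" + s[12:16] + "-" + s[16:20] + "-" + s[20:32]
}

func main() {
	// The same URI always yields the same UUID.
	fmt.Println(uuidV5(namespaceURL, "robot://article/1234-8754"))
}
```

Because the UUID is derived deterministically from the URI, an external system can re-import the same article and arrive at the same document ID without storing a mapping.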
Status and timestamps
Statuses
draft: a working copy
done: work is done, needs to be approved by e.g. an editor
withheld: scheduled for publishing
usable: published
canceled: the article has been published, but was then unpublished
The document status works in collaboration with the published and unpublished timestamps.
For "withheld" documents, the published timestamp is when they will be published.
For "usable" documents, published is the time they were published; if unpublished is set, they will be canceled at that time.
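The way status and timestamps read together can be sketched as a small function. This is one interpretation of the rules above, not the behaviour of any Naviga service; describe and the two example timestamps are hypothetical names and values.

```go
package main

import (
	"fmt"
	"time"
)

// Example timestamps used for illustration only.
var (
	examplePublish   = time.Date(2030, 1, 1, 12, 0, 0, 0, time.UTC)
	exampleUnpublish = time.Date(2030, 2, 1, 12, 0, 0, 0, time.UTC)
)

// describe summarises what a status means given the published and
// unpublished timestamps (nil means "not set").
func describe(status string, published, unpublished *time.Time) string {
	switch status {
	case "draft":
		return "working copy"
	case "done":
		return "awaiting approval"
	case "withheld":
		if published != nil {
			return "scheduled for " + published.Format(time.RFC3339)
		}
		return "scheduled, no publish time set"
	case "usable":
		if unpublished != nil {
			return "published, to be canceled at " + unpublished.Format(time.RFC3339)
		}
		return "published"
	case "canceled":
		return "unpublished after having been published"
	default:
		return "unknown status"
	}
}

func main() {
	fmt.Println(describe("withheld", &examplePublish, nil))
	fmt.Println(describe("usable", &examplePublish, &exampleUnpublish))
}
```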
Building blocks
The document has three primary sets of blocks that describe it:
links
meta
content
These blocks are also recursive and can in turn contain links, properties (metadata equivalent) and content.
Block attributes
id is the block ID
uuid is used when a block references another document
uri is used to reference another entity (document or otherwise)
url is a browseable URL for the block
type is a mime-ish type for the block
title is the title/headline of the block
data is key-value data
rel is the relationship the block has to its parent
name is a name for the block, an alternative to "rel" when "relationship" is a term that doesn't fit
value is a value for the block, useful when we want to store a primitive value
contentType is used to describe the content type of the block/linked entity if it differs from the type of the block
And then we have the nested blocks:
links is a set of link blocks
properties is a set of properties for the block, much like meta is for the document
content is used to nest content under a block
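The block attributes above translate naturally into a recursive struct. The following is a sketch under assumptions: the JSON key names are taken to equal the attribute names, data values are assumed to be strings, and nested properties are modelled as blocks because the text calls them the metadata equivalent.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Block mirrors the block attributes listed above (assumed JSON keys).
type Block struct {
	ID          string            `json:"id,omitempty"`
	UUID        string            `json:"uuid,omitempty"`
	URI         string            `json:"uri,omitempty"`
	URL         string            `json:"url,omitempty"`
	Type        string            `json:"type,omitempty"`
	Title       string            `json:"title,omitempty"`
	Data        map[string]string `json:"data,omitempty"`
	Rel         string            `json:"rel,omitempty"`
	Name        string            `json:"name,omitempty"`
	Value       string            `json:"value,omitempty"`
	ContentType string            `json:"contentType,omitempty"`
	Links       []Block           `json:"links,omitempty"`
	Properties  []Block           `json:"properties,omitempty"` // assumption: block-shaped
	Content     []Block           `json:"content,omitempty"`
}

// depth reports how many levels of nested blocks sit under b (0 for a leaf).
func depth(b Block) int {
	deepest := 0
	for _, set := range [][]Block{b.Links, b.Properties, b.Content} {
		for _, child := range set {
			if d := depth(child) + 1; d > deepest {
				deepest = d
			}
		}
	}
	return deepest
}

// exampleBlock is an image block with one nested author link; values are illustrative.
func exampleBlock() Block {
	return Block{
		ID:   "image-1",
		Type: "x-im/image",
		Data: map[string]string{"width": "1280", "height": "720"},
		Links: []Block{
			{Rel: "author", Type: "x-im/author", Title: "Jane Doe"},
		},
	}
}

func main() {
	out, err := json.MarshalIndent(exampleBlock(), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	fmt.Println("nesting depth:", depth(exampleBlock()))
}
```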
Modelling data with blocks
Block nesting and the key-value structure under data are intended to be used responsibly.
Model your data with nesting, but don't go multi-level without considering complexity costs.
Modelling data with blocks - a video block
Try to keep data keys generic, and don't do things like this:
Use nesting instead:
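As an illustration, the sketch below contrasts flat, feature-specific data keys with nested link blocks for a video. Everything in it is hypothetical: the trimmed-down Block struct, the x-im/video type, the rel values, and all keys and URLs are made up for the example.

```go
package main

import "fmt"

// Block holds only the fields this sketch needs.
type Block struct {
	Type  string
	Rel   string
	Title string
	URL   string
	Data  map[string]string
	Links []Block
}

// flatVideo packs everything into feature-specific data keys (the pattern to avoid).
func flatVideo() Block {
	return Block{
		Type: "x-im/video",
		Data: map[string]string{
			"authorName":    "Jane Doe",
			"coverImageUrl": "https://example.com/cover.jpg",
		},
	}
}

// nestedVideo expresses the same information as link blocks with generic keys.
func nestedVideo() Block {
	return Block{
		Type: "x-im/video",
		Links: []Block{
			{Rel: "author", Type: "x-im/author", Title: "Jane Doe"},
			{Rel: "author", Type: "x-im/author", Title: "John Roe"},
			{
				Rel:  "cover",
				Type: "x-im/image",
				URL:  "https://example.com/cover.jpg",
				Data: map[string]string{"width": "1280", "height": "720"},
			},
		},
	}
}

// titlesByRel collects the titles of all links with the given rel.
func titlesByRel(b Block, rel string) []string {
	var titles []string
	for _, l := range b.Links {
		if l.Rel == rel {
			titles = append(titles, l.Title)
		}
	}
	return titles
}

func main() {
	fmt.Println(flatVideo().Data["authorName"])
	fmt.Println(titlesByRel(nestedVideo(), "author"))
}
```

In the nested form, a second author or the dimensions of the cover image are just another link or a generic "width"/"height" pair, so no new key names have to be invented.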
The link model scales better with feature requests like "we need to credit multiple authors", or "we need to provide the dimensions of the cover image". These innocent requests could result in:
Instead of semantic structure, we are left with arbitrary fields.
The data block
The data block allows arbitrary keys and values, but some keys are expected to contain certain values:
"geometry" is a WKT string
"width", "height", "x", "y", "score" etc. are expected to be numbers
when used with "text", "format" refers to the format of the text
Beyond these conventions, the type of the block must act as a contract between producers and consumers of content.
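A consumer could check the numeric convention with a few lines of code. This sketch assumes that data values are carried as strings; numericKeys and checkNumericData are hypothetical names, and the key list covers only the keys named in the text.

```go
package main

import (
	"fmt"
	"strconv"
)

// numericKeys lists the data keys the text names as numeric. The original
// list is open-ended, so this set is illustrative, not exhaustive.
var numericKeys = []string{"width", "height", "x", "y", "score"}

// checkNumericData returns the numeric keys whose values do not parse as numbers.
// Keys that are absent from data are ignored.
func checkNumericData(data map[string]string) []string {
	var bad []string
	for _, key := range numericKeys {
		value, ok := data[key]
		if !ok {
			continue
		}
		if _, err := strconv.ParseFloat(value, 64); err != nil {
			bad = append(bad, key)
		}
	}
	return bad
}

func main() {
	data := map[string]string{"width": "1920", "height": "tall", "text": "A caption"}
	fmt.Println(checkNumericData(data))
}
```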
Document links
Document links associate the document with concepts, external resources, and other documents.
The order of links doesn't have any semantic meaning.
Document links - subjects
A document may have the same kind of relationship to documents of different types
Or different relationships to documents of the same type
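Both cases can be sketched with a small filter over links. The Link struct is a trimmed-down stand-in for a block, and the rel values and titles are illustrative assumptions; only the x-im types come from the list of supported types.

```go
package main

import "fmt"

// Link holds only the fields this sketch needs.
type Link struct {
	Rel   string
	Type  string
	Title string
}

// exampleLinks shows one rel pointing at two types, and two rels pointing at one type.
func exampleLinks() []Link {
	return []Link{
		{Rel: "subject", Type: "x-im/category", Title: "Sports"},
		{Rel: "subject", Type: "x-im/topic", Title: "Football"},
		{Rel: "mainchannel", Type: "x-im/channel", Title: "Main site"},
		{Rel: "channel", Type: "x-im/channel", Title: "Newsletter"},
	}
}

// filterLinks returns the links matching rel and linkType; an empty string matches anything.
func filterLinks(links []Link, rel, linkType string) []Link {
	var out []Link
	for _, l := range links {
		if (rel == "" || l.Rel == rel) && (linkType == "" || l.Type == linkType) {
			out = append(out, l)
		}
	}
	return out
}

func main() {
	fmt.Println(filterLinks(exampleLinks(), "subject", ""))
	fmt.Println(filterLinks(exampleLinks(), "", "x-im/channel"))
}
```

Since the order of links carries no meaning, consumers are better off selecting by rel and type than by position.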
Document links - authors
Metadata blocks
Metadata blocks carry information about the document.
Metadata blocks are commonly associated with a writer plugin.
Here we see the product of the news value plugin, where somebody is feeding an algorithm (or analytics) information about the editorial valuation of an article.
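Reading such a block on the consuming side might look like the sketch below. The block type x-im/newsvalue and the "score" data key are assumptions made for the example, as is the string-valued data map; check the actual plugin output before relying on them.

```go
package main

import (
	"fmt"
	"strconv"
)

// Block holds only the fields this sketch needs.
type Block struct {
	Type string
	Data map[string]string
}

// exampleMeta is an illustrative set of metadata blocks.
func exampleMeta() []Block {
	return []Block{
		{Type: "x-im/newsvalue", Data: map[string]string{"score": "3"}},
	}
}

// newsValueScore returns the score of the first news value block, if any.
// The second return value is false when no parsable score is found.
func newsValueScore(meta []Block) (float64, bool) {
	for _, b := range meta {
		if b.Type != "x-im/newsvalue" {
			continue
		}
		score, err := strconv.ParseFloat(b.Data["score"], 64)
		if err != nil {
			return 0, false
		}
		return score, true
	}
	return 0, false
}

func main() {
	score, ok := newsValueScore(exampleMeta())
	fmt.Println(score, ok)
}
```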
Metadata blocks - teaser
Content blocks
Content blocks describe the content that typically gets rendered when we display a document.
Examples of a headline and a paragraph.
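A minimal sketch of what a headline and a paragraph could look like as blocks, and how a consumer might pick them out. The type names "headline" and "body", the IDs, and the "text"/"format" data keys are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Block holds only the fields this sketch needs.
type Block struct {
	ID   string
	Type string
	Data map[string]string
}

// exampleContent builds a headline and a paragraph with illustrative values.
func exampleContent() []Block {
	return []Block{
		{
			ID:   "headline-1",
			Type: "headline",
			Data: map[string]string{"text": "Example headline"},
		},
		{
			ID:   "paragraph-1",
			Type: "body",
			Data: map[string]string{"format": "html", "text": "First <strong>paragraph</strong>."},
		},
	}
}

// textOf returns the text of every block of the given type, in document order.
func textOf(content []Block, blockType string) []string {
	var out []string
	for _, b := range content {
		if b.Type == blockType {
			out = append(out, b.Data["text"])
		}
	}
	return out
}

func main() {
	content := exampleContent()
	fmt.Println(strings.Join(textOf(content, "headline"), "\n"))
	fmt.Println(strings.Join(textOf(content, "body"), "\n"))
}
```

Unlike links, content blocks are rendered in sequence, so this sketch preserves their order.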
Content block - Image