Walk-through of the NavigaDoc format
NavigaDoc is a JSON version of our existing NewsML document format, and of some other XML formats we're replacing.
The NavigaDoc format is less verbose and JSON/GraphQL/struct-friendly. The idea is to be able to use the same format all the way from authoring tools to frontend applications.
The JSON format sticks close enough to the way data and content are modelled in our NewsML format to make bi-directional conversion feasible. The Content Creation API (CCA) converts between NavigaDoc JSON and NewsML automatically, on the fly. All writes to the OC in question have to go through this intermediate API, which handles the conversion.
We're still storing documents as NewsML XML in the OC content repository. To ensure a reliable 1:1 conversion, we put some constraints on how NewsML is used in the Writer, and some constraints on what we can use in NavigaDoc. More on that when we are more familiar with the format.
The supported document types, i.e. the document types that can be converted to and from NewsML XML, are:
- Article: x-im/article
- Article template: x-im/article-template
- Image: x-im/image
- PDF: x-im/pdf
- Concept: the actual type depends on which type of concept it is:
  - Author: x-im/author
  - Category: x-im/category
  - Channel: x-im/channel
  - Content profile: x-im/content-profile
  - Event: x-im/event
  - Organisation: x-im/organisation
  - Person: x-im/person
  - Place: x-im/place
  - Section: x-im/section
  - Story: x-im/story
  - Topic: x-im/topic
- List: x-im/list
- Package: x-im/package
- Planning: x-im/newscoverage
- Assignment: x-im/assignment
The type corresponds to the $.type attribute of the NavigaDoc document (see below).
Top level document structure:
{
  "uuid": "1d02738f-7c99-42ba-a6da-3d1b97261523",
  "title": "Proin eget dignissim ipsum",
  "status": "withheld",
  "provider": "acme",
  "modified": "2015-07-01T14:11:20Z",
  "created": "2015-07-01T14:00:02Z",
  "type": "x-im/article",
  "uri": "im://article/1d02738f-7c99-42ba-a6da-3d1b97261523",
  "url": "http://example.org/articles/2015-",
  "published": "2015-07-01T14:27:00+02:00",
  "unpublished": "2015-10-05T15:14:13+02:00",
  "language": "en-GB",
  "links": [{...}],
  "meta": [{...}],
  "content": [{...}],
  "properties": [{...}]
}
- uuid: ID of the document
- uri: a URL-based identifier for the document
- url: the location of the document (if any)
- title: title of the document (not headline)
- status: workflow status of the document, "draft" etc.
- provider: the provider of the document, e.g. "TT" or "NavigaPhotos"
- created: when the document was created
- type: the type of document, e.g. "x-im/article"
- modified: last modified timestamp
- published: when the document was, or will be, published
- unpublished: when the document was, or will be, unpublished
- language: the language code for the document: "en-GB", "fi", "sv" etc.
- path: the path on which the document can be exposed when consumed through a website
- products: a list of product names, replaced by channel links but preserved for legacy support
- content: a list of content blocks
- meta: a list of metadata blocks that describe the document
- links: a list of link blocks that describe the document's relationships
- properties: a list of properties, primarily used when converting from XML
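As a rough illustration of the "struct-friendly" claim, the top-level fields listed above could be mapped to a Go struct along these lines. This is only a sketch: the type name Document, the helper parseDocument and the sample constant are hypothetical, the optional timestamps are modelled as pointers by assumption, and the nested blocks are kept as raw JSON here because their shape is described further down.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Document is a hypothetical Go mapping of the top-level NavigaDoc fields.
// Nested blocks are left as raw JSON in this sketch.
type Document struct {
	UUID        string            `json:"uuid"`
	Type        string            `json:"type"`
	URI         string            `json:"uri,omitempty"`
	URL         string            `json:"url,omitempty"`
	Title       string            `json:"title,omitempty"`
	Status      string            `json:"status,omitempty"`
	Provider    string            `json:"provider,omitempty"`
	Language    string            `json:"language,omitempty"`
	Path        string            `json:"path,omitempty"`
	Products    []string          `json:"products,omitempty"`
	Created     *time.Time        `json:"created,omitempty"`
	Modified    *time.Time        `json:"modified,omitempty"`
	Published   *time.Time        `json:"published,omitempty"`
	Unpublished *time.Time        `json:"unpublished,omitempty"`
	Links       []json.RawMessage `json:"links,omitempty"`
	Meta        []json.RawMessage `json:"meta,omitempty"`
	Content     []json.RawMessage `json:"content,omitempty"`
	Properties  []json.RawMessage `json:"properties,omitempty"`
}

// sampleDocument is a trimmed-down version of the example document.
const sampleDocument = `{
  "uuid": "1d02738f-7c99-42ba-a6da-3d1b97261523",
  "title": "Proin eget dignissim ipsum",
  "status": "withheld",
  "type": "x-im/article",
  "uri": "im://article/1d02738f-7c99-42ba-a6da-3d1b97261523",
  "created": "2015-07-01T14:00:02Z",
  "published": "2015-07-01T14:27:00+02:00",
  "language": "en-GB",
  "content": [{"type": "x-im/paragraph", "data": {"text": "Lorem ipsum"}}]
}`

// parseDocument decodes a NavigaDoc JSON document into the struct above.
func parseDocument(raw []byte) (Document, error) {
	var doc Document
	if err := json.Unmarshal(raw, &doc); err != nil {
		return Document{}, fmt.Errorf("decode NavigaDoc: %w", err)
	}
	return doc, nil
}

func main() {
	doc, err := parseDocument([]byte(sampleDocument))
	if err != nil {
		panic(err)
	}
	fmt.Println(doc.Type, doc.Status, doc.Title)
	fmt.Println("published (UTC):", doc.Published.UTC().Format(time.RFC3339))
	fmt.Println("content blocks:", len(doc.Content))
}
```

The timestamps in the examples are RFC 3339 strings, which is what Go's time.Time expects when decoding JSON, so no custom parsing is needed in this sketch.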
Note: When the document is stored in Open Content, there is a "core" property named source, which has the value "cca" if the document was saved using CCA. The source property is not supported in the actual document, only as a property in Open Content.
Articles created by the Writer have a random UUIDv4 and a URI that contains the UUID:
im://article/1d02738f-7c99-42ba-a6da-3d1b97261523
Documents from external systems should construct a URI that represents the ID of the document in the external system.
If you have a system called Robot that produces an article with the ID 1234-8754, you could construct a URI like
robot://article/1234-8754
and generate a v5 UUID from it.
A UUIDv5 is created from a name (here the URI) in a namespace (here the URL namespace).
In a shell you would do it like this:
$ uuidgen --namespace @url --sha1 --name robot://article/1234-8754
bda1a573-e7ab-5076-adbf-aa3ff9ba8106
In Go, you would do it like this:
package main

import (
	"fmt"

	uuid "github.com/satori/go.uuid"
)

func main() {
	// The URI is the name; uuid.NamespaceURL is the namespace.
	uri := "robot://article/1234-8754"
	uuidV5 := uuid.NewV5(uuid.NamespaceURL, uri)
	fmt.Println(uuidV5.String())
	// Output: bda1a573-e7ab-5076-adbf-aa3ff9ba8106
}
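To make the "name in a namespace" idea concrete, here is a minimal sketch of the same derivation using only the Go standard library, following the UUIDv5 recipe from RFC 4122: SHA-1 over the namespace bytes followed by the name, truncated to 16 bytes, with the version and variant bits set. The function name uuidV5 is hypothetical; the namespace constant is the standard URL namespace UUID (6ba7b811-9dad-11d1-80b4-00c04fd430c8).

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// namespaceURL is the RFC 4122 namespace UUID for URLs.
var namespaceURL = [16]byte{
	0x6b, 0xa7, 0xb8, 0x11, 0x9d, 0xad, 0x11, 0xd1,
	0x80, 0xb4, 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8,
}

// uuidV5 derives a name-based (SHA-1) UUID from a namespace and a name.
func uuidV5(namespace [16]byte, name string) string {
	h := sha1.New()
	h.Write(namespace[:])
	h.Write([]byte(name))
	sum := h.Sum(nil)

	var u [16]byte
	copy(u[:], sum[:16])
	u[6] = (u[6] & 0x0f) | 0x50 // set version 5
	u[8] = (u[8] & 0x3f) | 0x80 // set RFC 4122 variant

	s := hex.EncodeToString(u[:])
	return s[0:8] + "-" + s[8:12] + "-" + s[12:16] + "-" + s[16:20] + "-" + s[20:32]
}

func main() {
	// The same URI always yields the same UUID.
	fmt.Println(uuidV5(namespaceURL, "robot://article/1234-8754"))
}
```

Because the result is deterministic, an external system can recompute the document UUID from its own ID at any time instead of storing a mapping.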
Statuses
- draft: a working copy
- done: work is done, needs to be approved by e.g. an editor
- withheld: scheduled for publishing
- usable: published
- canceled: the article has been published, but was then unpublished
The document status works in collaboration with the published and unpublished timestamps.
- For "withheld" documents the published timestamp is when they will be published.
- For "usable" documents published is the time they were published, and if unpublished is set they will be canceled.
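A consumer might combine the status and the timestamps roughly as follows. This is only a sketch of one possible interpretation, not documented behaviour: the function isLive and the helper mustTime are hypothetical, and treating a "usable" document as no longer live once its unpublished time has passed is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// mustTime parses an RFC 3339 timestamp and panics on malformed input.
func mustTime(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

// isLive reports whether a document would be shown at the given time.
// Assumption: only "usable" documents are live, and a set unpublished
// timestamp marks the moment they stop being live.
func isLive(status string, unpublished *time.Time, now time.Time) bool {
	if status != "usable" {
		return false
	}
	if unpublished != nil && !now.Before(*unpublished) {
		return false
	}
	return true
}

func main() {
	unpublished := mustTime("2015-10-05T15:14:13+02:00")

	fmt.Println(isLive("usable", &unpublished, mustTime("2015-08-01T00:00:00Z")))
	fmt.Println(isLive("usable", &unpublished, mustTime("2015-11-01T00:00:00Z")))
	fmt.Println(isLive("withheld", nil, mustTime("2015-08-01T00:00:00Z")))
}
```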
The document has three primary sets of blocks that describe it:
- links
- meta
- content
These blocks are also recursive and can in turn contain links, properties (metadata equivalent) and content.
- id: the block ID
- uuid: used when a block references another document
- uri: used to reference another entity (document or otherwise)
- url: a browseable URL for the block
- type: a mime-ish type for the block
- title: title/headline of the block
- data: key-value data
- rel: the relationship the block has to its parent
- name: a name for the block. An alternative to "rel" when relationship is a term that doesn't fit
- value: a value for the block. Useful when we want to store a primitive value
- contentType: describes the content type of the block/linked entity if it differs from the type of the block
And then we have the nested blocks:
- links: a set of link blocks
- properties: a set of properties for the block, much like meta is for the document
- content: used to nest content under a block
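The recursive block shape described above could be sketched as a Go struct like the one below. The names Block, parseBlock and maxDepth are hypothetical. Two assumptions are baked in: data values are decoded as strings (non-string values would fail to decode with this type), and properties are kept as raw JSON because their exact shape is not spelled out here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Block is a hypothetical mapping of a NavigaDoc block.
// Links and Content nest further blocks recursively.
type Block struct {
	ID          string            `json:"id,omitempty"`
	UUID        string            `json:"uuid,omitempty"`
	URI         string            `json:"uri,omitempty"`
	URL         string            `json:"url,omitempty"`
	Type        string            `json:"type,omitempty"`
	Title       string            `json:"title,omitempty"`
	Rel         string            `json:"rel,omitempty"`
	Name        string            `json:"name,omitempty"`
	Value       string            `json:"value,omitempty"`
	ContentType string            `json:"contentType,omitempty"`
	Data        map[string]string `json:"data,omitempty"`
	Links       []Block           `json:"links,omitempty"`
	Properties  json.RawMessage   `json:"properties,omitempty"`
	Content     []Block           `json:"content,omitempty"`
}

// sampleBlock is an author link with a nested avatar link.
const sampleBlock = `{
  "rel": "author",
  "title": "Hugo Wetterberg",
  "url": "https://example.com/photographer/hugo",
  "links": [
    {"rel": "avatar", "url": "https://example.com/hugo.png"}
  ]
}`

// parseBlock decodes a single block from JSON.
func parseBlock(raw string) (Block, error) {
	var b Block
	err := json.Unmarshal([]byte(raw), &b)
	return b, err
}

// maxDepth returns how many levels of nesting a block has,
// counting the block itself as level 1.
func maxDepth(b Block) int {
	deepest := 0
	for _, children := range [][]Block{b.Links, b.Content} {
		for _, child := range children {
			if d := maxDepth(child); d > deepest {
				deepest = d
			}
		}
	}
	return deepest + 1
}

func main() {
	b, err := parseBlock(sampleBlock)
	if err != nil {
		panic(err)
	}
	fmt.Println(b.Rel, b.Title, "nesting depth:", maxDepth(b))
}
```

A helper like maxDepth is one way to keep an eye on nesting depth, for example in a test that flags blocks going deeper than a team has agreed on.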
Block nesting and the key-value structure under data are intended to be used responsibly.
Model your data with nesting, but don't go multi-level without considering complexity costs.
Try to keep data keys generic, and don't do things like this:
{
  "type": "sanoma/video-type",
  "title": "Onnellinen lokki",
  "uri": "videprovider://video/1234-5678",
  "data": {
    "bylineName": "Hugo Wetterberg",
    "bylineImage": "https://example.com/hugo.png",
    "bylineLink": "https://example.com/photographer/hugo",
    "coverImage": "https://example.com/seagull.png"
  }
}
Use nesting instead:
{
  "type": "sanoma/video-type",
  "title": "Onnellinen lokki",
  "uri": "videprovider://video/1234-5678",
  "links": [
    {
      "rel": "author",
      "title": "Hugo Wetterberg",
      "url": "https://example.com/photographer/hugo",
      "links": [
        { "rel": "avatar", "url": "https://example.com/hugo.png" }
      ]
    },
    {
      "rel": "cover-image", "url": "https://example.com/seagull.png"
    }
  ]
}
The link model scales better with feature requests like "we need to credit multiple authors", or "we need to provide the dimensions of the cover image". These innocent requests could result in:
"data": {
  "bylineName": "Hugo Wetterberg",
  "bylineImage": "https://example.com/hugo.png",
  "bylineLink": "https://example.com/photographer/hugo",
  "bylineTwoName": "Kristofer Pasanen",
  "bylineTwoImage": "https://example.com/kristofer.png",
  "bylineTwoLink": "https://example.com/photographer/kristofer",
  "coverImage": "https://example.com/seagull.png",
  "coverImageWidth": "1920",
  "coverImageHeight": "1080"
}
Instead of semantic structure, we are left with arbitrary fields.
The data block allows arbitrary keys and values, but:
- some keys are expected to contain certain values:
  - "width", "height", "x", "y", "score" etc. are expected to be numbers
  - when used with "text", "format" refers to the format of the text
- the type of the block must act as a contract between producers and consumers of content
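In the examples on this page the numeric keys are carried as JSON strings (for instance "width": "1600"), so a consumer typically has to parse them. A small sketch of such a helper follows; the name numberField is hypothetical, and it assumes data has been decoded into a map of strings.

```go
package main

import (
	"fmt"
	"strconv"
)

// numberField reads a data key that is expected to hold a number,
// such as "width", "height", "x", "y" or "score".
// It returns false if the key is missing or not numeric.
func numberField(data map[string]string, key string) (float64, bool) {
	raw, ok := data[key]
	if !ok {
		return 0, false
	}
	n, err := strconv.ParseFloat(raw, 64)
	if err != nil {
		return 0, false
	}
	return n, true
}

func main() {
	data := map[string]string{"width": "1600", "height": "900", "text": "caption"}

	if w, ok := numberField(data, "width"); ok {
		fmt.Println("width:", w)
	}
	if _, ok := numberField(data, "text"); !ok {
		fmt.Println("text is not numeric")
	}
}
```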
Document links associate the document with concepts, external resources, and other documents.
The order of links doesn't have any semantic meaning.
A document may have the same kind of relationship to documents of different types, or different relationships to documents of the same type.
{
  "title": "Dalarna",
  "rel": "subject", "type": "x-im/category",
  "uuid": "03d22994-91e4-11e5-8994-feff819cdc9f"
},
{
  "title": "Volvo",
  "rel": "subject", "type": "x-im/topic",
  "uuid": "b201e042-555b-11e5-885d-feff819cdc9f"
},
{
  "title": "Alvesta",
  "rel": "subject", "type": "x-im/place",
  "uuid": "bce38dda-555b-11e5-885d-feff819cdc9f",
  "data": { "geometry": "POINT(14.55600 56.89921)" }
},
{
  "title": "Jane Doe", "rel": "author", "type": "x-im/author",
  "uuid": "bad4314c-7e33-11e5-8bcf-feff819cdc9f",
  "uri": "im://user/58456",
  "links": [
    {
      "rel": "avatar",
      "type": "x-im/image",
      "uuid": "9c188460-c500-11e5-9912-ba0be0483c18",
      "uri": "im://image/janedoe.jpeg"
    }
  ]
}
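Since rel and type vary independently, consumers usually select links by one or both. A minimal sketch of such a lookup is shown below; the Link type, filterLinks and sampleLinks are hypothetical names, and the sample data is a trimmed copy of the links above.

```go
package main

import "fmt"

// Link holds the subset of link fields needed for filtering.
type Link struct {
	Title string
	Rel   string
	Type  string
	UUID  string
}

// filterLinks returns the links matching rel and typ.
// An empty rel or typ matches any value.
func filterLinks(links []Link, rel, typ string) []Link {
	var out []Link
	for _, l := range links {
		if rel != "" && l.Rel != rel {
			continue
		}
		if typ != "" && l.Type != typ {
			continue
		}
		out = append(out, l)
	}
	return out
}

// sampleLinks mirrors the example links: three subjects and one author.
func sampleLinks() []Link {
	return []Link{
		{Title: "Dalarna", Rel: "subject", Type: "x-im/category", UUID: "03d22994-91e4-11e5-8994-feff819cdc9f"},
		{Title: "Volvo", Rel: "subject", Type: "x-im/topic", UUID: "b201e042-555b-11e5-885d-feff819cdc9f"},
		{Title: "Alvesta", Rel: "subject", Type: "x-im/place", UUID: "bce38dda-555b-11e5-885d-feff819cdc9f"},
		{Title: "Jane Doe", Rel: "author", Type: "x-im/author", UUID: "bad4314c-7e33-11e5-8bcf-feff819cdc9f"},
	}
}

func main() {
	links := sampleLinks()

	// Same relationship, different types.
	for _, l := range filterLinks(links, "subject", "") {
		fmt.Println("subject:", l.Title, l.Type)
	}
	// Narrow down to one type.
	for _, l := range filterLinks(links, "subject", "x-im/place") {
		fmt.Println("place:", l.Title)
	}
}
```

Because link order carries no meaning, the sketch makes no promise beyond preserving whatever order the input happened to have.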
Metadata blocks carry information about the document.
Metadata blocks are commonly associated with a writer plugin.
Here we see the product of the news value plugin, where somebody is feeding an algorithm (or analytics) information about the editorial valuation of an article.
{
  "type": "x-im/newsvalue",
  "data": {
    "duration": "86400",
    "description": "1D",
    "score": "4"
  }
}
A teaser block with a linked image:
{
  "type": "x-im/teaser",
  "title": "The squid comes for you",
  "data": {
    "title": "The squid comes for you",
    "text": "10 facts about the mecha-squids that terrorised cowboys during the gold rush.",
    "subject": "A mecha squid racing to catch a gunslinger on horseback"
  },
  "links": [
    {
      "rel": "image",
      "type": "x-im/image",
      "uri": "im://image/WEH99iJHOXu6ssz7h8Ne7kFLmqs.png",
      "uuid": "f7d9d837-5048-54ee-b961-064dfc8467ca",
      "data": {
        "width": "1600",
        "height": "900"
      }
    }
  ]
}
Content blocks describe the content that typically gets rendered when we display a document.
Examples of a headline and a paragraph.
{
  "id": "d0dbf67d385e",
  "type": "x-im/header",
  "data": {
    "format": "html",
    "text": "Lorem ipsum dolor sit"
  }
},
{
  "id": "fafbedf02da1", "type": "x-im/paragraph",
  "data": {
    "format": "html",
    "text": "Mauris eleifend, <a href=\"http://google.com\">Bacon</a> orci nec volutpat."
  }
}
An example of an image block:
{
  "type": "x-im/image",
  "uuid": "1b34f847-fb4c-59e2-a648-42fe168061d2",
  "id": "MTE2LDQxLDE3MywxMDU",
  "links": [
    {
      "type": "x-im/image",
      "links": [
        {
          "title": "Kristofer Pasanen",
          "rel": "author"
        }
      ],
      "data": {
        "height": "1668",
        "width": "2500",
        "text": "Sed libero metus, iaculis sit amet dolor."
      },
      "uri": "im://image/ZcrcVwEZyI29HDnmykq1te8M5-s.jpeg",
      "uuid": "1b34f847-fb4c-59e2-a648-42fe168061d2",
      "rel": "self"
    }
  ]
}