Walk-through of the NavigaDoc format

Introduction to the NavigaDoc document format

NavigaDoc is a JSON version of our existing NewsML document format, and of some other XML formats we're replacing.

The NavigaDoc format is less verbose and JSON/GraphQL/struct-friendly. The idea is to be able to use the same format all the way from authoring tools to frontend applications.

The JSON format sticks close enough to the way data and content are modelled in our NewsML format to make bi-directional conversion feasible. The Content Creation API (CCA) converts between NavigaDoc JSON and NewsML automatically, on the fly. All writes to the OC in question have to go through this intermediate API, which handles the conversion.

Relying on conversion JSON<->XML

We still store documents as NewsML XML in the OC content repository. To ensure a reliable 1:1 conversion, we put some constraints on how NewsML is used in the Writer, and on what we can use in NavigaDoc. More on that once we are more familiar with the format.

The supported document types, i.e. the document types that can be converted to and from the NewsML XML are:

  • Article: x-im/article

  • Article template: x-im/article-template

  • Image: x-im/image

  • Imagelink: x-im/imagelink

  • PDF: x-im/pdf

  • Concept: the actual type depends on the kind of concept:

    • Author: x-im/author

    • Category: x-im/category

    • Channel: x-im/channel

    • Content profile: x-im/content-profile

    • Event: x-im/event

    • Organisation: x-im/organisation

    • Person: x-im/person

    • Place: x-im/place

    • Section: x-im/section

    • Story: x-im/story

    • Topic: x-im/topic

  • List: x-im/list

  • Package: x-im/package

  • Planning: x-im/newscoverage

  • Assignment: x-im/assignment

The type corresponds to the $.type attribute of the NavigaDoc document (see below).
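
As a small illustration, the list above can be captured as a lookup set so that code can check a document's $.type before attempting a conversion. This is only a sketch: the names supportedTypes and isConvertible are made up for this example and are not part of any NavigaDoc or CCA library.

```go
package main

import "fmt"

// supportedTypes holds the document types listed above as convertible
// to and from NewsML XML. The map is a hypothetical helper.
var supportedTypes = map[string]bool{
	"x-im/article": true, "x-im/article-template": true,
	"x-im/image": true, "x-im/imagelink": true, "x-im/pdf": true,
	// Concept types.
	"x-im/author": true, "x-im/category": true, "x-im/channel": true,
	"x-im/content-profile": true, "x-im/event": true,
	"x-im/organisation": true, "x-im/person": true, "x-im/place": true,
	"x-im/section": true, "x-im/story": true, "x-im/topic": true,
	// Remaining document types.
	"x-im/list": true, "x-im/package": true,
	"x-im/newscoverage": true, "x-im/assignment": true,
}

// isConvertible reports whether a document's $.type is in the list.
func isConvertible(docType string) bool {
	return supportedTypes[docType]
}

func main() {
	fmt.Println(isConvertible("x-im/article")) // true
	fmt.Println(isConvertible("x-im/video"))   // false, not in the list
}
```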

The attributes of a NavigaDoc document

Top level document structure:

{
  "uuid": "1d02738f-7c99-42ba-a6da-3d1b97261523",
  "title": "Proin eget dignissim ipsum",
  "status": "withheld",
  "provider": "acme",
  "modified": "2015-07-01T14:11:20Z",
  "created": "2015-07-01T14:00:02Z",
  "type": "x-im/article",
  "uri": "im://article/1d02738f-7c99-42ba-a6da-3d1b97261523",
  "url": "http://example.org/articles/2015-",
  "published": "2015-07-01T14:27:00+02:00",
  "unpublished": "2015-10-05T15:14:13+02:00",
  "language": "en-GB",
  "links": [{...}],
  "meta": [{...}],
  "content": [{...}],
  "properties": [{...}]
}

The attributes of a document

  • uuid: ID of the document

  • uri: a URL-based identifier for the document

  • url: the location of the document (if any)

  • title: title of the document (not headline)

  • status: workflow status of the document, "draft" etc.

  • provider: the provider of the document, e.g. "TT" or "NavigaPhotos"

  • created: when the document was created

  • type: the type of document, e.g. "x-im/article"

  • modified: last modified timestamp

  • published: when the document was, or will be published

  • unpublished: when the document was, or will be unpublished

  • language: the language code for the document: "en-GB", "fi", "sv" etc.

  • path: the path on which the document can be exposed when consumed through a website

  • products: a list of product names, replaced by channel-links but preserved for legacy support

  • content: a list of content blocks

  • meta: a list of metadata blocks that describe the document

  • links: a list of link blocks that describe the document's relationships

  • properties: a list of properties, primarily used when converting from XML
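
Because the format is meant to be struct-friendly, the attribute list above maps naturally onto a Go struct. The following is a minimal sketch, not an official type: the Document struct, its field types (for instance timestamps as *time.Time) and the parseDocument helper are assumptions made for illustration. The block lists are left as raw JSON here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Document mirrors the top-level attributes listed above. The field types
// are assumptions; block lists are kept undecoded in this sketch.
type Document struct {
	UUID        string            `json:"uuid"`
	URI         string            `json:"uri"`
	URL         string            `json:"url,omitempty"`
	Title       string            `json:"title"`
	Status      string            `json:"status"`
	Provider    string            `json:"provider,omitempty"`
	Type        string            `json:"type"`
	Language    string            `json:"language,omitempty"`
	Path        string            `json:"path,omitempty"`
	Products    []string          `json:"products,omitempty"`
	Created     *time.Time        `json:"created,omitempty"`
	Modified    *time.Time        `json:"modified,omitempty"`
	Published   *time.Time        `json:"published,omitempty"`
	Unpublished *time.Time        `json:"unpublished,omitempty"`
	Links       []json.RawMessage `json:"links,omitempty"`
	Meta        []json.RawMessage `json:"meta,omitempty"`
	Content     []json.RawMessage `json:"content,omitempty"`
	Properties  []json.RawMessage `json:"properties,omitempty"`
}

// parseDocument decodes a NavigaDoc JSON document into a Document.
func parseDocument(raw []byte) (Document, error) {
	var doc Document
	err := json.Unmarshal(raw, &doc)
	return doc, err
}

// sample is a trimmed version of the example document shown earlier.
const sample = `{
  "uuid": "1d02738f-7c99-42ba-a6da-3d1b97261523",
  "title": "Proin eget dignissim ipsum",
  "status": "withheld",
  "type": "x-im/article",
  "uri": "im://article/1d02738f-7c99-42ba-a6da-3d1b97261523",
  "published": "2015-07-01T14:27:00+02:00",
  "language": "en-GB",
  "content": [{"type": "x-im/paragraph"}]
}`

func main() {
	doc, err := parseDocument([]byte(sample))
	if err != nil {
		panic(err)
	}
	// Timestamps with a zone offset decode as RFC 3339 values.
	fmt.Println(doc.Type, doc.Status, doc.Published.UTC().Format(time.RFC3339))
}
```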

Note: when the document is stored in Open Content, there is a "core" property named source, which has the value "cca" if the document was saved using CCA. The source property is not supported in the document itself, only as a property in Open Content.

UUID and URI

Articles created by the writer have a random UUIDv4 and a URI that contains the UUID: im://article/1d02738f-7c99-42ba-a6da-3d1b97261523

Documents from external systems should construct a URI that represents the ID of the document in the external system.

If you have a system called Robot that produces an article with the ID 1234-8754, you could construct a URI like robot://article/1234-8754 and generate a v5 UUID from it.

Generating a v5 UUID

A UUIDv5 is created from a name (uri) in a namespace (url).

In a shell you would do it like this:

$ uuidgen --namespace @url --sha1 --name robot://article/1234-8754 
bda1a573-e7ab-5076-adbf-aa3ff9ba8106

In Go, you would do it like this:

package main

import (
	"fmt"

	uuid "github.com/satori/go.uuid"
)

func main() {
	// The URI is the name; uuid.NamespaceURL is the namespace.
	uri := "robot://article/1234-8754"
	uuidV5 := uuid.NewV5(uuid.NamespaceURL, uri)
	fmt.Println(uuidV5.String())
	// Output: bda1a573-e7ab-5076-adbf-aa3ff9ba8106
}
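
To show what "a name in a namespace" means in practice, here is a sketch that derives a v5 UUID using only the Go standard library: the namespace UUID's 16 bytes and the name are hashed with SHA-1, and the version and variant bits are then set on the first 16 bytes of the digest. The function name uuidV5 is made up for this example; the namespace constant is the well-known URL namespace from RFC 4122.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"strings"
)

// namespaceURL is the well-known namespace UUID for URLs (RFC 4122).
const namespaceURL = "6ba7b811-9dad-11d1-80b4-00c04fd430c8"

// uuidV5 hashes the namespace bytes followed by the name with SHA-1 and
// stamps the version and variant bits onto the first 16 bytes of the digest.
func uuidV5(namespace, name string) (string, error) {
	ns, err := hex.DecodeString(strings.ReplaceAll(namespace, "-", ""))
	if err != nil || len(ns) != 16 {
		return "", fmt.Errorf("invalid namespace UUID %q", namespace)
	}
	sum := sha1.Sum(append(ns, name...))
	u := sum[:16]
	u[6] = (u[6] & 0x0f) | 0x50 // version 5
	u[8] = (u[8] & 0x3f) | 0x80 // RFC 4122 variant
	h := hex.EncodeToString(u)
	return h[0:8] + "-" + h[8:12] + "-" + h[12:16] + "-" + h[16:20] + "-" + h[20:32], nil
}

func main() {
	id, err := uuidV5(namespaceURL, "robot://article/1234-8754")
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```

Since the result depends only on the namespace and the name, the same external ID always maps to the same UUID.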

Status and timestamps

Statuses

  • draft: a working copy

  • done: work is done, needs to be approved by e.g. an editor

  • withheld: scheduled for publishing

  • usable: published

  • canceled: the article has been published, but was then unpublished

The document status works together with the published and unpublished timestamps.

For "withheld" documents, published is the time at which they will be published.

For "usable" documents, published is the time at which they were published; if unpublished is set, they will be canceled at that time.
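
One way a consumer might combine status and timestamps is sketched below. This is an interpretation, not documented behaviour: it assumes that only "usable" documents are public, that a "withheld" document stays non-public until its status changes, and that a set unpublished time ends visibility. The names isPublicAt and mustParse are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// isPublicAt reports whether a document would be shown at the time now.
// Assumption: only "usable" documents are public, and only between their
// published and (optional) unpublished timestamps.
func isPublicAt(status string, published, unpublished *time.Time, now time.Time) bool {
	if status != "usable" {
		return false
	}
	if published != nil && now.Before(*published) {
		return false
	}
	if unpublished != nil && !now.Before(*unpublished) {
		return false
	}
	return true
}

// mustParse parses an RFC 3339 timestamp and panics on malformed input.
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	published := mustParse("2015-07-01T14:27:00+02:00")
	unpublished := mustParse("2015-10-05T15:14:13+02:00")

	fmt.Println(isPublicAt("usable", &published, &unpublished, mustParse("2015-08-01T00:00:00Z")))
	fmt.Println(isPublicAt("usable", &published, &unpublished, mustParse("2015-11-01T00:00:00Z")))
	fmt.Println(isPublicAt("withheld", &published, nil, mustParse("2015-08-01T00:00:00Z")))
}
```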

Building blocks

The document has three primary sets of blocks that describe it:

  • links

  • meta

  • content

These blocks are also recursive and can in turn contain links, properties (metadata equivalent) and content.

Block attributes

  • id is the block ID

  • uuid is used when a block references another document.

  • uri is used to reference another entity (document or otherwise)

  • url is a browsable URL for the block.

  • type is a mime-ish type for the block

  • title is the title/headline of the block

  • data is key-value data

  • rel is the relationship the block has to its parent

  • name is a name for the block. An alternative to "rel" when relationship is a term that doesn't fit

  • value is a value for the block. Useful when we want to store a primitive value

  • contentType is used to describe the content type of the block/linked entity if it differs from the type of the block

And then we have the nested blocks:

  • links is a set of link blocks

  • properties is a set of properties for the block, much like meta is for the document

  • content is used to nest content under a block
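
The block attributes and nested blocks above can be expressed as a single recursive Go type. The sketch below is illustrative only: the Block struct, the choice of map[string]string for data (the examples in this document use string values) and the helpers parseBlock and depth are assumptions, not an official API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Block models the attributes listed above. Links, Properties and Content
// make the type recursive.
type Block struct {
	ID          string            `json:"id,omitempty"`
	UUID        string            `json:"uuid,omitempty"`
	URI         string            `json:"uri,omitempty"`
	URL         string            `json:"url,omitempty"`
	Type        string            `json:"type,omitempty"`
	Title       string            `json:"title,omitempty"`
	Data        map[string]string `json:"data,omitempty"`
	Rel         string            `json:"rel,omitempty"`
	Name        string            `json:"name,omitempty"`
	Value       string            `json:"value,omitempty"`
	ContentType string            `json:"contentType,omitempty"`
	Links       []Block           `json:"links,omitempty"`
	Properties  []Block           `json:"properties,omitempty"`
	Content     []Block           `json:"content,omitempty"`
}

// parseBlock decodes a single block from JSON.
func parseBlock(raw string) (Block, error) {
	var b Block
	err := json.Unmarshal([]byte(raw), &b)
	return b, err
}

// depth returns the number of nesting levels in a block, counting itself.
func depth(b Block) int {
	deepest := 0
	for _, group := range [][]Block{b.Links, b.Properties, b.Content} {
		for _, child := range group {
			if d := depth(child); d > deepest {
				deepest = d
			}
		}
	}
	return deepest + 1
}

// sample is a made-up image block with a link that carries its own link.
const sample = `{
  "type": "x-im/image", "id": "MTE2LDQxLDE3MywxMDU",
  "links": [{
    "rel": "self", "type": "x-im/image",
    "data": {"width": "2500", "height": "1668"},
    "links": [{"rel": "author", "title": "Kristofer Pasanen"}]
  }]
}`

func main() {
	b, err := parseBlock(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(depth(b), b.Links[0].Links[0].Title)
}
```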

Modelling data with blocks

Block nesting and the key-value structure under data are intended to be used responsibly.

Model your data with nesting, but don't go multi-level without considering complexity costs.

Modelling data with blocks - a video block

Try to keep data keys generic, and don't do things like this:

{
  "type": "sanoma/video-type",
  "title": "Onnellinen lokki",
  "uri": "videprovider://video/1234-5678",
  "data": {
    "bylineName": "Hugo Wetterberg",
    "bylineImage": "https://example.com/hugo.png",
    "bylineLink": "https://example.com/photographer/hugo",
    "coverImage": "https://example.com/seagull.png"
  }
}

Use nesting instead:

{
  "type": "sanoma/video-type",
  "title": "Onnellinen lokki",
  "uri": "videprovider://video/1234-5678",
  "links": [
    {
      "rel": "author",
      "title": "Hugo Wetterberg",
      "url": "https://example.com/photographer/hugo",
      "links": [
          { "rel": "avatar", "url": "https://example.com/hugo.png" }
      ]
    },
    {
      "rel": "cover-image", "url": "https://example.com/seagull.png"
    }
  ]
}

The link model scales better with feature requests like "we need to credit multiple authors", or "we need to provide the dimensions of the cover image". These innocent requests could result in:

"data": {
  "bylineName": "Hugo Wetterberg",
  "bylineImage": "https://example.com/hugo.png",
  "bylineLink": "https://example.com/photographer/hugo",
  "bylineTwoName": "Kristofer Pasanen",
  "bylineTwoImage": "https://example.com/kristofer.png",
  "bylineTwoLink": "https://example.com/photographer/kristofer",
  "coverImage": "https://example.com/seagull.png",
  "coverImageWidth": "1920",
  "coverImageHeight": "1080"
}

Instead of semantic structure, we are left with arbitrary fields.
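
By contrast, the same two feature requests fit into the link model without new field names. The sketch below assumes a second "author" link and a "data" object with width and height on the cover image link. The JSON is a made-up extension of the video example, and the Link type and the byRel and parseLinks helpers are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Link is a reduced block type with just the attributes this example needs.
type Link struct {
	Rel   string            `json:"rel"`
	Title string            `json:"title,omitempty"`
	URL   string            `json:"url,omitempty"`
	Data  map[string]string `json:"data,omitempty"`
	Links []Link            `json:"links,omitempty"`
}

// byRel returns all links with the given rel, in document order.
func byRel(links []Link, rel string) []Link {
	var out []Link
	for _, l := range links {
		if l.Rel == rel {
			out = append(out, l)
		}
	}
	return out
}

// parseLinks decodes the "links" list of a block.
func parseLinks(raw string) ([]Link, error) {
	var block struct {
		Links []Link `json:"links"`
	}
	err := json.Unmarshal([]byte(raw), &block)
	return block.Links, err
}

// video is the video block extended with a second author and image dimensions.
const video = `{
  "type": "sanoma/video-type",
  "title": "Onnellinen lokki",
  "links": [
    {"rel": "author", "title": "Hugo Wetterberg",
     "url": "https://example.com/photographer/hugo",
     "links": [{"rel": "avatar", "url": "https://example.com/hugo.png"}]},
    {"rel": "author", "title": "Kristofer Pasanen",
     "url": "https://example.com/photographer/kristofer",
     "links": [{"rel": "avatar", "url": "https://example.com/kristofer.png"}]},
    {"rel": "cover-image", "url": "https://example.com/seagull.png",
     "data": {"width": "1920", "height": "1080"}}
  ]
}`

func main() {
	links, err := parseLinks(video)
	if err != nil {
		panic(err)
	}
	// Any number of authors is handled by the same loop.
	for _, author := range byRel(links, "author") {
		fmt.Println(author.Title, byRel(author.Links, "avatar")[0].URL)
	}
	cover := byRel(links, "cover-image")[0]
	fmt.Println(cover.URL, cover.Data["width"], cover.Data["height"])
}
```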

The data block

The data block allows arbitrary keys and values, with two caveats:

  • some keys are expected to contain certain values:

    • "geometry" is a WKT string

    • "width", "height", "x", "y", "score" etc. are expected to be numbers (the examples in this document encode them as JSON strings)

    • when used with "text", "format" refers to the format of the text

  • the type of the block must act as a contract between producers and consumers of content
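
Consumers will typically want typed accessors for these well-known keys. The following sketch assumes data values arrive as strings, as in the examples in this document, and only handles the WKT POINT form of "geometry". The function names number and point are made up for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// number reads a numeric data value such as "width" or "score".
func number(data map[string]string, key string) (float64, error) {
	raw, ok := data[key]
	if !ok {
		return 0, fmt.Errorf("data key %q is missing", key)
	}
	return strconv.ParseFloat(raw, 64)
}

// point parses a WKT POINT such as "POINT(14.55600 56.89921)" into x and y.
// Other WKT geometry types are not handled in this sketch.
func point(wkt string) (x, y float64, err error) {
	body := strings.TrimSpace(wkt)
	if !strings.HasPrefix(body, "POINT") || !strings.HasSuffix(body, ")") {
		return 0, 0, fmt.Errorf("not a WKT point: %q", wkt)
	}
	body = strings.TrimSpace(strings.TrimPrefix(body, "POINT"))
	body = strings.TrimSuffix(strings.TrimPrefix(body, "("), ")")
	parts := strings.Fields(body)
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("expected two coordinates in %q", wkt)
	}
	if x, err = strconv.ParseFloat(parts[0], 64); err != nil {
		return 0, 0, err
	}
	if y, err = strconv.ParseFloat(parts[1], 64); err != nil {
		return 0, 0, err
	}
	return x, y, nil
}

func main() {
	data := map[string]string{"width": "1600", "geometry": "POINT(14.55600 56.89921)"}

	width, err := number(data, "width")
	fmt.Println(width, err)

	x, y, err := point(data["geometry"])
	fmt.Println(x, y, err)
}
```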

Document links associate the document with concepts, external resources, and other documents.

The order of links doesn't have any semantic meaning.

A document may have the same kind of relationship to documents of different types, or different relationships to documents of the same type.

{
  "title": "Dalarna",
  "rel": "subject", "type": "x-im/category",
  "uuid": "03d22994-91e4-11e5-8994-feff819cdc9f"
},
{
  "title": "Volvo",
  "rel": "subject", "type": "x-im/topic",
  "uuid": "b201e042-555b-11e5-885d-feff819cdc9f"
},
{
  "title": "Alvesta",
  "rel": "subject", "type": "x-im/place",
  "uuid": "bce38dda-555b-11e5-885d-feff819cdc9f",
  "data": { "geometry": "POINT(14.55600 56.89921)" }
},
{
  "title": "Jane Doe", "rel": "author", "type": "x-im/author",
  "uuid": "bad4314c-7e33-11e5-8bcf-feff819cdc9f",
  "uri": "im://user/58456",
  "links": [
    {
      "rel":"avatar",
      "type":"x-im/image",
      "uuid":"9c188460-c500-11e5-9912-ba0be0483c18",
      "uri":"im://image/janedoe.jpeg"
    }
  ]
}
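
Since rel and type vary independently, code that reads links usually has to look at both. The sketch below groups the titles of links with a given rel by their type, using a shortened copy of the links above. The Link type and the helpers titlesByType and parseLinks are assumptions made for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Link holds only the attributes needed to tell relationships apart.
type Link struct {
	Title string `json:"title"`
	Rel   string `json:"rel"`
	Type  string `json:"type"`
	UUID  string `json:"uuid"`
}

// titlesByType groups the titles of links that share one rel by their type.
func titlesByType(links []Link, rel string) map[string][]string {
	out := map[string][]string{}
	for _, l := range links {
		if l.Rel == rel {
			out[l.Type] = append(out[l.Type], l.Title)
		}
	}
	return out
}

// parseLinks decodes a JSON list of links.
func parseLinks(raw string) ([]Link, error) {
	var links []Link
	err := json.Unmarshal([]byte(raw), &links)
	return links, err
}

// sample is a shortened copy of the document links shown above.
const sample = `[
  {"title": "Dalarna", "rel": "subject", "type": "x-im/category",
   "uuid": "03d22994-91e4-11e5-8994-feff819cdc9f"},
  {"title": "Volvo", "rel": "subject", "type": "x-im/topic",
   "uuid": "b201e042-555b-11e5-885d-feff819cdc9f"},
  {"title": "Alvesta", "rel": "subject", "type": "x-im/place",
   "uuid": "bce38dda-555b-11e5-885d-feff819cdc9f"},
  {"title": "Jane Doe", "rel": "author", "type": "x-im/author",
   "uuid": "bad4314c-7e33-11e5-8bcf-feff819cdc9f"}
]`

func main() {
	links, err := parseLinks(sample)
	if err != nil {
		panic(err)
	}
	// The same "subject" rel points at three different concept types.
	subjects := titlesByType(links, "subject")
	fmt.Println(len(subjects), subjects["x-im/place"])
}
```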

Metadata blocks

Metadata blocks carry information about the document.

Metadata blocks are commonly associated with a writer plugin.

Here we see the product of the news value plugin, where somebody is feeding an algorithm (or analytics) information about the editorial valuation of an article.

{
  "type" : "x-im/newsvalue",
  "data" : {
    "duration" : "86400",
    "description" : "1D",
    "score" : "4"
  }
}
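
A consumer of this block would likely convert the string values into typed ones. The sketch below assumes that "duration" is a number of seconds (86400 matches the "1D" description) and that "score" is an integer; neither is stated explicitly above. The NewsValue type and the parseNewsValue function are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// NewsValue is a typed view of the data in an x-im/newsvalue block.
type NewsValue struct {
	Score    int
	Lifetime time.Duration
}

// parseNewsValue converts the string values of the data block.
// Assumption: "duration" is expressed in seconds.
func parseNewsValue(data map[string]string) (NewsValue, error) {
	score, err := strconv.Atoi(data["score"])
	if err != nil {
		return NewsValue{}, fmt.Errorf("invalid score: %w", err)
	}
	seconds, err := strconv.ParseInt(data["duration"], 10, 64)
	if err != nil {
		return NewsValue{}, fmt.Errorf("invalid duration: %w", err)
	}
	return NewsValue{Score: score, Lifetime: time.Duration(seconds) * time.Second}, nil
}

func main() {
	nv, err := parseNewsValue(map[string]string{
		"duration":    "86400",
		"description": "1D",
		"score":       "4",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(nv.Score, nv.Lifetime) // 4 24h0m0s
}
```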

Metadata blocks - teaser

{
  "type": "x-im/teaser",
  "title": "The squid comes for you",
  "data": {
    "title": "The squid comes for you",
    "text": "10 facts about the mecha-squids that terrorised cowboys during the gold rush.",
    "subject": "A mecha squid racing to catch a gunslinger on horseback"
  },
  "links": [
    {
      "rel": "image",
      "type": "x-im/image",
      "uri": "im://image/WEH99iJHOXu6ssz7h8Ne7kFLmqs.png",
      "uuid": "f7d9d837-5048-54ee-b961-064dfc8467ca",
      "data": {
        "width": "1600",
        "height": "900"
      }
    }
  ]
}

Content blocks

Content blocks describe the content that typically gets rendered when we display a document.

Examples of a headline and a paragraph.

{
  "id": "d0dbf67d385e",
  "type": "x-im/header",
  "data": {
    "format": "html",
    "text": "Lorem ipsum dolor sit"
  }
},
{
  "id": "fafbedf02da1", "type": "x-im/paragraph",
  "data": {
    "format": "html",
    "text": "Mauris eleifend, <a href=\"http://google.com\">Bacon</a> orci nec volutpat."
  }
}
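
To illustrate how a frontend might consume such blocks, the sketch below renders the two block types above to HTML. The mapping of x-im/header to an h1 element and x-im/paragraph to a p element is an assumption, as are the names ContentBlock, render and sampleBlocks. Because "format" is "html", the text is inserted as-is; a real renderer would need to sanitise it.

```go
package main

import (
	"fmt"
	"strings"
)

// ContentBlock is a reduced block type with the attributes used here.
type ContentBlock struct {
	ID   string
	Type string
	Data map[string]string
}

// render turns content blocks into HTML, in order. Block types it does
// not know about are skipped.
func render(blocks []ContentBlock) string {
	var out strings.Builder
	for _, b := range blocks {
		text := b.Data["text"] // already HTML when "format" is "html"
		switch b.Type {
		case "x-im/header":
			fmt.Fprintf(&out, "<h1>%s</h1>\n", text)
		case "x-im/paragraph":
			fmt.Fprintf(&out, "<p>%s</p>\n", text)
		}
	}
	return out.String()
}

// sampleBlocks mirrors the headline and paragraph shown above.
var sampleBlocks = []ContentBlock{
	{ID: "d0dbf67d385e", Type: "x-im/header", Data: map[string]string{
		"format": "html", "text": "Lorem ipsum dolor sit",
	}},
	{ID: "fafbedf02da1", Type: "x-im/paragraph", Data: map[string]string{
		"format": "html",
		"text":   "Mauris eleifend, <a href=\"http://google.com\">Bacon</a> orci nec volutpat.",
	}},
}

func main() {
	fmt.Print(render(sampleBlocks))
}
```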

Content block - Image

{
  "type" : "x-im/image",
  "uuid" : "1b34f847-fb4c-59e2-a648-42fe168061d2",
  "id" : "MTE2LDQxLDE3MywxMDU",
  "links" : [
    {
      "type" : "x-im/image",
      "links" : [
        {
          "title" : "Kristofer Pasanen",
          "rel" : "author"
        }
      ],
      "data" : {
        "height" : "1668",
        "width" : "2500",
        "text" : "Sed libero metus, iaculis sit amet dolor."
      },
      "uri" : "im://image/ZcrcVwEZyI29HDnmykq1te8M5-s.jpeg",
      "uuid" : "1b34f847-fb4c-59e2-a648-42fe168061d2",
      "rel" : "self"
    }
  ]
}
