Introduction NewsML-NavigaDoc

New document format - NavigaDoc

Background

There was a need for a new document format that is less verbose and JSON/GraphQL/struct-friendly. The goal was to be able to use the same format all the way from authoring tools to frontend applications.

The JSON format sticks close enough to the way data and content are modelled in our NewsML format to make bi-directional conversion feasible. For now, documents are still stored as NewsML in the content repo, OC. Over time, we will move to NavigaDoc at the storage level as well.

What is NavigaDoc?

NavigaDoc is a JSON version of our existing NewsML document format, and of some other XML formats we're replacing.

Editorial notes

The NewsML format contains some fields that are reserved for internal editorial notes: itemMeta/edNote and contentMeta/description. These fields should not be used.

Inline Notes

There's a Writer plugin, the Inline Notes plugin, that adds inline text to the NewsML document. It's sometimes used for notes or comments. Please use this plugin only if you understand that the added text in some situations may follow the article from the Writer to the presentation layer.

IMPORTANT: Please note that the Inline Notes plugin stores the inline text in the article NewsML document, as part of the article content, using the INS tag.

Example 1: an article

Here's a NewsML article and an example output file.

Converting the <data> elements is the stickiest part of the conversion process. There is no single good way to handle them. My guess is that we would need two conversion modes:

  1. Key value: the data can be represented as a flat key-value structure.

  2. Sorry, but that's just text: the object data contains complex data
     structures with nested elements, so it gets converted to a "text"
     and "format" attribute in the data map.

The first case should include handling of a set of well-known key names. For example, we want width and height to be numbers and not strings.

Converting the second case to JSON will become nonsensical fast. The table object is a good example of this.
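
The two modes described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual converter: the function name convert_data, the NUMERIC_KEYS set, the "xml" value used for "format", and the choice to treat repeated tags as non-flat are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of well-known keys whose values should become numbers.
NUMERIC_KEYS = {"width", "height"}


def to_number(value: str):
    """Return an int or float when the string parses as one, else the string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


def convert_data(data_xml: str) -> dict:
    """Convert a <data> element into a JSON-friendly dict (sketch only)."""
    data = ET.fromstring(data_xml)
    children = list(data)
    tags = [child.tag for child in children]

    # Mode 2: nested elements (or repeated tags, an assumption made here)
    # cannot be flattened, so keep the inner markup as plain text.
    nested = any(len(child) > 0 for child in children)
    repeated = len(set(tags)) != len(tags)
    if nested or repeated:
        inner = (data.text or "") + "".join(
            ET.tostring(child, encoding="unicode") for child in children
        )
        return {"text": inner.strip(), "format": "xml"}

    # Mode 1: flat key-value structure, with numeric handling for known keys.
    result = {}
    for child in children:
        value = (child.text or "").strip()
        result[child.tag] = to_number(value) if child.tag in NUMERIC_KEYS else value
    return result


# Usage: a flat image-like object and a table-like object (both made up).
print(convert_data("<data><width>800</width><height>600</height></data>"))
print(convert_data("<data><table><tr><td>a</td></tr></table></data>"))
```

Keeping the fallback as an opaque "text" plus "format" pair means the consumer decides how to interpret the markup, which avoids inventing a JSON shape for something like a table.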

Modeling data

The print meta object is an interesting case for discussing data modelling within the data format, which applies to both the NewsML and JSON versions.

The original looks like this:

<object id="9076h25e322y" type="x-im/print-meta">
    <data>
        <firstPagin>2</firstPagin>
        <multiPageCount>2</multiPageCount>
        <part>A</part>
        <publicationDate>2017-11-28</publicationDate>
        <publicationDateName>29.11.2017</publicationDateName>
        <originalArticleNewspilotID>112233</originalArticleNewspilotID>
        <originalArticleNewspilotGUID>dfec478e-0014-4948-afb7-08fe0038307a</originalArticleNewspilotGUID>
        <newspilotJobId>2211</newspilotJobId>
    </data>
</object>

If we attempt to model some of that data using the data structures we have available, it could look like this (the "part" specification could've gone either way, but it's a relationship to something, so it fits as a link):

<object id="9076h25e322y" type="x-im/print-meta">
    <data>
        <firstPagin>2</firstPagin>
        <multiPageCount>2</multiPageCount>
        <publicationDate>2017-11-28</publicationDate>
        <publicationDateName>29.11.2017</publicationDateName>
    </data>
    <links>
        <link rel="part" title="A" />
        <link rel="alternate" uri="newspilot://id/112233" />
        <link rel="alternate" uri="newspilot://guid/dfec478e-0014-4948-afb7-08fe0038307a" />
        <link rel="job" uri="newspilot://job/2211" />
    </links>
</object>

...if we introduce <properties> it could be written like this instead:

<object id="9076h25e322y" type="x-im/print-meta">
    <properties>
        <property name="part" value="A" />
        <property name="firstPagin" value="2" />
        <property name="multiPageCount" value="2" />
        <property name="publicationDate" value="2017-11-28" title="29.11.2017" />
    </properties>
    <links>
        <link rel="alternate" uri="newspilot://id/112233" />
        <link rel="alternate" uri="newspilot://guid/dfec478e-0014-4948-afb7-08fe0038307a" />
        <link rel="job" uri="newspilot://job/2211" />
    </links>
</object>

The upside of doing it like that is that the data would become available for querying in a system such as GraphQL. It could also be readily serialized and consumed in a strictly typed language.
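
To illustrate why the properties and links variant is easier to consume with fixed types, here is a minimal Python sketch that parses it into dataclasses and serializes the result to JSON. The class names (Property, Link, Block), the function parse_object, and the resulting JSON field names are assumptions made for this example; they are not the NavigaDoc schema.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass, field
from typing import List, Optional

# The <properties> variant of the print meta object, copied from above.
PRINT_META_XML = """
<object id="9076h25e322y" type="x-im/print-meta">
    <properties>
        <property name="part" value="A" />
        <property name="firstPagin" value="2" />
        <property name="multiPageCount" value="2" />
        <property name="publicationDate" value="2017-11-28" title="29.11.2017" />
    </properties>
    <links>
        <link rel="alternate" uri="newspilot://id/112233" />
        <link rel="alternate" uri="newspilot://guid/dfec478e-0014-4948-afb7-08fe0038307a" />
        <link rel="job" uri="newspilot://job/2211" />
    </links>
</object>
"""


# Hypothetical types: every object has the same fixed shape, regardless of
# which property names or link relations it carries.
@dataclass
class Property:
    name: str
    value: str
    title: Optional[str] = None


@dataclass
class Link:
    rel: str
    uri: Optional[str] = None
    title: Optional[str] = None


@dataclass
class Block:
    id: str
    type: str
    properties: List[Property] = field(default_factory=list)
    links: List[Link] = field(default_factory=list)


def parse_object(xml_text: str) -> Block:
    """Parse an <object> with <properties> and <links> into a Block."""
    obj = ET.fromstring(xml_text)
    return Block(
        id=obj.get("id", ""),
        type=obj.get("type", ""),
        properties=[
            Property(p.get("name", ""), p.get("value", ""), p.get("title"))
            for p in obj.findall("properties/property")
        ],
        links=[
            Link(l.get("rel", ""), l.get("uri"), l.get("title"))
            for l in obj.findall("links/link")
        ],
    )


block = parse_object(PRINT_META_XML)
print(json.dumps(asdict(block), indent=2))
```

Because the element names no longer vary per object type, the same three types cover every object, which is what makes a single GraphQL schema or a struct definition possible.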
