New document format - NavigaDoc
Background
There was a need for a new document format that is less verbose and JSON/GraphQL/struct-friendly. The goal was to be able to use the same format all the way from authoring tools to frontend applications.
The JSON format sticks close enough to the way data and content are modelled in our NewsML format to make bi-directional conversion feasible. For now, we will still store documents as NewsML documents in the content repo, OC. Over time, we will move to NavigaDoc even at the storage level.
What is NavigaDoc?
NavigaDoc is a JSON version of our existing NewsML document format, and of some other XML formats we're replacing.
Editorial notes
The NewsML format contains some fields that are reserved for internal editorial notes: itemMeta/edNote and contentMeta/description. These fields should not be used.
Inline Notes
There's a Writer plugin, the Inline Notes plugin, that adds inline text to the NewsML document. It's sometimes used for notes or comments. Please use this plugin only if you understand that the added text in some situations may follow the article from the Writer to the presentation layer.
IMPORTANT: Please note that the Inline Notes plugin stores the inline text in the article NewsML document, as part of the article content, using the INS tag.
Example 1: an article
Here's a NewsML article and an example output file.
Converting the <data> elements is the stickiest part of the conversion process. There is no single good way to handle them. My guess is that we would need two conversion modes:
Key value: the data can be represented as a flat key-value structure.
Sorry, but that's just text: the object data contains complex data structures with nested elements, so it gets converted to a "text" and a "format" attribute in the data map.
The first case should include handling for a set of well-known key names. For example, we want width and height to be numbers, not strings.
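The key-value mode could be sketched roughly as follows. This is a minimal illustration, not the actual converter: the function name `data_to_map`, the `NUMERIC_KEYS` set, and the sample `<data>` element are all assumptions made for the example, and only integer values are coerced.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of well-known keys whose values should become numbers.
NUMERIC_KEYS = {"width", "height"}


def data_to_map(data_xml: str) -> dict:
    """Convert a flat <data> element into a key-value map."""
    root = ET.fromstring(data_xml)
    result = {}
    for child in root:
        value = (child.text or "").strip()
        if child.tag in NUMERIC_KEYS:
            try:
                result[child.tag] = int(value)
            except ValueError:
                # Keep the original string when the value is not an integer.
                result[child.tag] = value
        else:
            result[child.tag] = value
    return result


# Made-up sample input, for illustration only.
example = (
    "<data><width>1024</width><height>768</height>"
    "<alttext>A harbour</alttext></data>"
)
converted = data_to_map(example)
```

Falling back to the original string when a value cannot be parsed keeps the conversion lossless, which matters if the document has to be converted back to NewsML.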
Converting the second case to JSON will become nonsensical fast. The table object is a good example of this.
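One way to picture the second mode is to pass the nested markup through untouched as a string. The sketch below is an assumption-laden illustration: the helper names `is_flat` and `nested_data_to_map`, the default format value "xml", and the sample table markup are all hypothetical and not taken from the real converter.

```python
import xml.etree.ElementTree as ET


def is_flat(data: ET.Element) -> bool:
    """True when no child of <data> has child elements or attributes."""
    return all(len(child) == 0 and not child.attrib for child in data)


def nested_data_to_map(data_xml: str, fmt: str = "xml") -> dict:
    """Keep nested <data> content as a string under "text" and "format"."""
    root = ET.fromstring(data_xml)
    # Serialise the children verbatim instead of modelling them as JSON.
    inner = (root.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in root
    )
    return {"text": inner.strip(), "format": fmt}


# Made-up sample input, for illustration only.
table_xml = "<data><table><tr><td>1</td><td>2</td></tr></table></data>"
table_map = nested_data_to_map(table_xml)
```

A check like `is_flat` could be used to choose between the two modes, so that only data with nested structure falls back to the text representation.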
Modeling data
The print meta object is an interesting case for discussing data modelling within the data format, and the discussion applies to both the NewsML and JSON versions.
The original looks like this:
If we attempt to model some of that data using the data structures we have available, it could look like this (the "part" specification could've gone either way, but it's a relationship to something, so it fits as a link):
...if we introduce <properties> it could be written like this instead:
The upside of doing it like that is that the data would become available for querying in a system such as GraphQL. It could also be readily serialized and consumed in a strictly typed language.