Open Content: Make your content available to your users, developers and readers
If you are thinking about creating your own headless CMS, Open Content will fit right in and will solve several tedious parts of the journey ahead: storage, APIs, scalability, authentication and indexing, to name a few.
Open Content is a handy toolbox: we use it in our own solutions, for example as the content backend for our Digital Writer and Newsroom apps, and to power the Naviga web presentation layer. We also use it in our XLibris archive solution. Our customers use it to power presentation solutions built in-house.
Together with the Naviga Creation and Presentation tools, Open Content delivers a standardised, easy-to-maintain setup. You can also use Open Content as a content-agnostic storage and search engine for digital content.
Using Amazon S3 as the main storage means that capacity is, in theory, unlimited. XML metadata files describe the uploaded content. Properties are defined and extracted from the metadata using XPath 2.0 expressions.
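As a rough illustration of path-based property extraction, the sketch below maps property names to path expressions and evaluates them against a metadata file. Everything in it is an assumption made for illustration: the XML element names, the property names and the extract_properties helper are hypothetical, and Python's standard library supports only a limited XPath subset rather than the XPath 2.0 that Open Content uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata file; the element names are illustrative only.
METADATA_XML = """\
<metadata>
  <headline>Council approves new bridge</headline>
  <authors>
    <author>Ann Lee</author>
    <author>Bo Ek</author>
  </authors>
  <pubdate>2020-03-01</pubdate>
</metadata>
"""

# Hypothetical configuration: property name -> path expression.
PROPERTY_PATHS = {
    "Headline": "./headline",
    "Authors": "./authors/author",
    "Pubdate": "./pubdate",
}


def extract_properties(xml_text, property_paths):
    """Evaluate each path against the XML and return {property: [values]}."""
    root = ET.fromstring(xml_text)
    return {
        name: [el.text.strip() for el in root.findall(path) if el.text]
        for name, path in property_paths.items()
    }


properties = extract_properties(METADATA_XML, PROPERTY_PATHS)
print(properties)
```

Every property is returned as a list, so single-valued and multi-valued properties can be handled the same way by whatever indexes them.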
Indexing is done using Solr, an open source enterprise search platform built on Apache Lucene™, making your content accessible for any purpose.
Different content types (for example articles, images, lists, graphics) are kept separate, and each has its own property setup. Relations between content items are easy to create, which minimises the number of requests needed to fetch the content.
We offer a standard OC setup for both content production and presentation, built on best practices. The standard setups are used with the Naviga Creation and Presentation Platforms.
Open Content is not a video or streaming platform. If you want to store and edit streamed content, we recommend using a specialised platform for that purpose, such as Flowplayer, Youplay or YouTube. If it is convenient to have access to such content within Open Content, simply add those objects and a subset of their metadata to Open Content as well, with a link to the original source.
The architecture describes the upcoming 3.0 version of Open Content.
For info about older versions, please look at the release documentation at https://wiki.infomaker.se/display/OCS/Open+Content
The Open Content stack consists of several parts, all running in the Amazon cloud.
Load Balancer. The OC stack uses the standard Amazon application load balancers.
OC API. The REST API for queries, reads and writes, as well as the OC Admin API. It runs in ECS and scales horizontally.
S3 is the storage where all content items are stored.
RDS is the database where we store a selection of metadata.
SolrCloud is the Solr cluster that executes the queries, manages the indexes etc. It's deployed in an EKS cluster, from 1 Solr node and up. We always recommend at least 2 Solr nodes for redundancy.
The binlog is created by RDS and contains all modifications to the OC content.
Kafka is a streaming platform where we persist all changes to the content items. It also powers the Indexer services. We use the Amazon managed Kafka service.
The Indexer extracts the metadata to index and performs the index updates in Solr. The updates are then committed to the index by Solr. The Indexer runs in ECS containers and scales horizontally.
The Notifier is used to create event-driven workflows.
We always recommend a multi-AZ setup for all parts of the stack. That means the Open Content stack runs in multiple data centers in parallel, enabling high availability.
For Open Content pre 3.0, you'll need to use the master-satellite mechanism (see below) to reach multi-AZ redundancy.
When using Open Content as a creation backend, we always use a Satellite Open Content for the presentation layers. Production and presentation are completely separated, so each of them can be configured and scaled appropriately.
We recommend using the Naviga standard configurations for Creation and Presentation. Both are versioned and maintained by Naviga, and are updated when needed to stay in sync with the Naviga Creation and Presentation tools.
Master - satellite
In complex environments, setting up multiple Open Content Satellites might be a suitable way to scale. All content is stored in an Open Content Master setup, and predefined replication rules make sure the correct content is available in each Satellite. This does not require additional storage: the Satellites are set up as read-only OCs that read the content from the same S3 bucket, saving both time and money. As content can differ, each Satellite maintains its own index.
Image upload to Open Content, including calculating the filename that is required when Open Content is used as the content storage for Writer articles.
This exercise shows how to:
Calculate the filename used when the image is referenced by Writer, using openssl
Create the preview and thumb to be used by Open Content
Create the XML metadata file
Upload the image with preview, thumb and metadata file
For more details, see the ./upload-image.sh file.
When uploading images for Digital Writer, the image file needs to be uploaded to an internal S3 bucket and also copied to an external S3 bucket with a calculated filename.
This exercise will upload 6 concepts to Open Content. These concepts are referenced from the article that will be uploaded in lab 3.
The script ./upload-concepts.sh will upload 6 concepts to Open Content using a curl multipart POST request.
For more details on how this is done, take a look at the script.
Remove all uploaded objects so that the exercises can be run again
This exercise shows how to delete objects in Open Content.
./delete.sh [uuid] deletes the object with the specified uuid.
./delete-mine.sh deletes objects with source set to lab-$(whoami). The upload scripts in these exercises set the source to lab-$(whoami).
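The contents of the delete scripts are not shown here. As a minimal sketch, a delete by uuid could be expressed as an HTTP DELETE against the objects endpoint. Note the assumptions: that the objects endpoint accepts DELETE, that the base URL is the local one used for the API docs, and the build_delete_request helper name. Authentication is left out entirely, and the request is only built, not sent.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumption: base URL of a locally running Open Content.
BASE_URL = "http://localhost:8080/opencontent"


def build_delete_request(uuid, base_url=BASE_URL):
    """Build (but do not send) a DELETE request for one object.

    Assumption: the objects endpoint accepts the DELETE method.
    """
    return Request(f"{base_url}/objects/{quote(uuid, safe='')}", method="DELETE")


request = build_delete_request("f9f87e70-a0d7-4bc8-b2d4-5fab82760839")
print(request.get_method(), request.full_url)
```

Sending the request would be a matter of passing it to urllib.request.urlopen once credentials have been added.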
Important: To enable a scalable and predictable solution, some old features have been removed:
Property extraction based on relations between content types has been removed. In OC 3.0 you need to supply all metadata needed for property extraction within the content item itself.
Query-time evaluation of XPath expressions has been removed from the Search API. Previously, if the value of a configured property was not indexed, OC would fetch the document and evaluate the XPath for the missing properties before returning the search result, to make sure they were always included. In OC 3.0, only what is indexed is returned in a search result. If you change the properties config, the content must be reindexed.
Support for multiple storages and import storage rules have been removed. There can only be one storage.
Identifiers as a feature has been removed.
The import metadata rules function has been removed.
Default search response properties can’t be configured anymore. The client should always specify what properties it wants in the search response. If the client does not specify any properties all will be returned.
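Since the client is expected to name the properties it wants, a search call can be wrapped in a small helper that refuses to build a request without them. This is only a sketch: the /search path and the q and properties query parameter names are assumptions, not taken from this text, so check them against the Swagger documentation before relying on them.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, properties):
    """Build a search URL that always names the wanted properties.

    Assumptions: the '/search' path and the 'q' and 'properties'
    parameter names are hypothetical.
    """
    if not properties:
        # Client-side guard only; the server would return all properties.
        raise ValueError("list the properties you need explicitly")
    params = {"q": query, "properties": ",".join(properties)}
    return f"{base_url}/search?{urlencode(params)}"


search_url = build_search_url(
    "http://localhost:8080/opencontent",
    "contenttype:Article",
    ["uuid", "Headline"],
)
print(search_url)
```

The guard is a client-side convention for keeping responses small and predictable; it does not reflect an error the server would raise.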
Open Content is used to power the XLibris Archive. When importing content to an Open Content based archive you have two main options:
Convert and migrate all your content items to the Content NewsML format used by all Naviga Creation and Presentation tools. Binary artefacts are more or less just copied to the OC Storage, but all metadata files, articles and so on need to be migrated. Depending on the quality and format of your old content, that can be a substantial amount of work. Contact us to discuss the scope of your migration. The benefit is that your content becomes more future-proof and is streamlined into one well-known format. Content items can be reused more easily, and the standard configuration can be used.
The other option is more like a "copy" of the content to the OC Storage, followed by creating a configuration that adapts to your content. You still need your articles and metadata sidecar files in XML format; if you don't have that, you need to migrate your content anyway. The advantage of this model is lower migration costs. On the other hand, you'll have a more complex, customised configuration, and your content stays in its original format. You won't be able to reuse content items as easily as with migrated content.
Consult us to discuss what's best in your specific case.
Open Content is the content repository of the Creation and Presentation universe.
This book is intended for anyone managing or integrating with Open Content. If you're new to Open Content, we recommend starting with the overview. If you're a developer, feel free to jump straight to the API reference.
We urge you to reach out to us at support@infomaker.se if you have any questions. Certain sections are still incomplete, and in other sections we have yet to define well documented best practices.
REST API for content
There is a Swagger-documented REST API for adding, modifying and deleting content, as well as for performing various kinds of queries. The query syntax follows standard Solr syntax, and adds a set of extra convenience functions, such as related content.
REST API for admin
There is also a REST API for all kinds of administration tasks, such as index, properties, extraction and storage management.
Event log API
The events for the last 30 days are recorded and stored in the event log, accessible through the event log API.
Read more about the Open Content REST API; you can also try it yourself.
Onboarding
We offer onboarding, on location or remote, for Open Content developers to get the most out of the available tools and solutions.
How to run Open Content in Docker on your own computer.
The Docker images for Open Content are primarily for development purposes, not production. So if you are a developer looking for how to start Open Content locally for integration testing or trying things out, then this is for you.
Wait until all containers are downloaded and started. You now have an empty Open Content instance without configuration or content.
Configuration is done using the admin UI or the admin API. The UI can be found at http://localhost/admin.
Below is the menu for the Open Content admin UI.
The first thing that has to be configured is storage. This can be done either in the UI at http://localhost/admin or with this curl command:
Open Content configuration in this setup is done using a local copy of our Bitbucket repository for configuration. Use the Open Content admin UI to inspect the detailed settings for the different configuration options.
Go to the opencontent-configuration directory, where the configure.sh script is located.
Configure Open Content for public use
Configure Open Content for public and app use
Configure Open Content for editorial use
Activation of the configuration
The Naviga content solutions can act as a standard end-to-end solution: you use our standard authoring setup in combination with our solutions for presentation on the web and in mobile apps. In that case, we manage everything, including setup, configuration, hosting and support. You are still able to interact with the backend, but we recommend using our higher-level APIs for content creation (such as ingestion of content) instead of the lower-level OC API.
You can also use Naviga content solutions as a headless CMS and build your own presentation layer. In that case, we recommend using our content distribution API to power your presentation solution. You may also use the lower-level OC REST API to power your presentation layer. The distribution API also offers a cache solution. If you use the OC REST API, you need to add your own cache mechanism between OC and your presentation engine. It's possible to just scale up the read capacity of Open Content, but in most cases that is quite an expensive solution.
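The text does not prescribe how a cache in front of the OC REST API should look. One simple possibility is a read-through cache keyed by uuid, sketched below. The ReadThroughCache class and the fetch callable are hypothetical names; fetch stands in for whatever code actually calls Open Content.

```python
class ReadThroughCache:
    """Minimal in-memory read-through cache keyed by object uuid."""

    def __init__(self, fetch):
        # fetch: callable taking a uuid and returning the object (hypothetical).
        self._fetch = fetch
        self._items = {}

    def get(self, uuid):
        """Return the cached object, fetching it on the first request."""
        if uuid not in self._items:
            self._items[uuid] = self._fetch(uuid)
        return self._items[uuid]

    def invalidate(self, uuid):
        """Drop an object so the next get() fetches a fresh copy."""
        self._items.pop(uuid, None)


# Usage with a stand-in fetch function that records its calls.
fetch_calls = []


def fake_fetch(uuid):
    fetch_calls.append(uuid)
    return {"uuid": uuid}


cache = ReadThroughCache(fake_fetch)
cache.get("abc")
cache.get("abc")          # served from the cache, no second fetch
cache.invalidate("abc")
cache.get("abc")          # fetched again after invalidation
print(len(fetch_calls))   # prints 2
```

A production cache would also need eviction and size limits, which are left out here to keep the idea visible.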
Both solutions use one Open Content for production and a separate one to power the presentation layers. When a content item, such as an article, is ready to be published (usable), it's copied to the public content repo by the Replicator service.
The developer-friendly availability platform
A well-documented and flexible platform that makes all content available, all the time.
The backend for your headless CMS
A headless CMS without a content repository is like an electric car without batteries. Instead of building batteries, build your chassis.
Built for Amazon AWS
Run Open Content in AWS, and we can handle upgrades and changes with zero downtime, with unlimited storage and backup possibilities.
Integrated with Naviga Content solutions
Works out of the box with solutions such as Newspilot, Digital Writer, Dashboard and Naviga web.
APIs for everything
Use our user interfaces for admin and search, or use the OC REST APIs. Regardless of approach, it’s all open for integration.
Reliable backend
Spend less time on server issues and let us manage the hosting. Open Content supports a range of setups, from a small single-node setup to large, clustered, high-availability setups.
Scalable to suit your needs
Open Content supports scaling of Solr, the frontend API and indexing. It scales depending on your needs (and wallet).
Proven solution
Used daily by thousands of Creation users, and powers hundreds of apps and sites all over the world.
Content Types
Open Content configuration makes it possible to group content into Content Types (typically: Article, Image, Page, Concept, Job, Planning items, Lists, Packages). We have a standardised configuration for all tools in the Creation suite.
OC Concepts is an entire metadata universe – all stored and made available in Open Content
OC Concepts is a metadata structure built around the IPTC NewsML-G2 standard. One of its most important aspects is, of course, how it is used by the editor, the developer and the end user. All concepts are stored and made available in Open Content.
In our view, metadata like categories and tags are not just text strings. Instead, each piece of metadata is an object, with a unique id, a name and its own set of metadata and links.
Take an author. It could be just a name, but when you think of it as an object with a unique id, first name, last name, email, phone, description, avatar image, high-res image and links, things get really powerful.
These can be shown in your frontend if you want. For example, a search page showing articles for a specific category could also show the long description or image for that category.
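To make the "metadata as an object" idea concrete, here is a small sketch of how a client might model an author concept. The fields follow the attributes listed above, but the AuthorConcept class and its field names are hypothetical, and this is not the NewsML-G2 representation that Open Content actually stores.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorConcept:
    """Hypothetical client-side model of an author concept."""

    id: str
    first_name: str
    last_name: str
    email: str = ""
    phone: str = ""
    description: str = ""
    avatar_image_url: str = ""
    high_res_image_url: str = ""
    links: list = field(default_factory=list)

    @property
    def name(self):
        """Display name built from the first and last name."""
        return f"{self.first_name} {self.last_name}".strip()


author = AuthorConcept(
    id="author-1",  # illustrative id, not a real concept uuid
    first_name="Ann",
    last_name="Lee",
    links=["https://example.com/ann-lee"],
)
print(author.name)  # prints "Ann Lee"
```

Because an article refers to the author by id rather than by a name string, a change to the description or avatar shows up everywhere the concept is used.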
Examples of Concepts:
Author
Category
Persons
Organisations
Topics
Places (POIs or geo areas)
Story
Functional tags
The concepts are administered using our Dashboard application, your journalists use Digital Writer to choose the right concepts, and Everyware and the App Platform will show and let the user follow selected topics or geo areas.
Any digital material can be stored in Open Content using the Open Content REST API. Open Content configuration makes it possible to group content into Content Types (typically: Article, Image, Page, Concept, Planning, Lists, Packages). Content from different systems can be normalised into the same Content Type.
A Content object (item) consists of a primary file and a metadata file describing the primary file.
Normally, XML metadata files are used to describe the uploaded content, and properties are extracted from the metadata files using XPath 2.0 expressions.
Open Content is configured with a browser-based UI or with YAML files.
Typical use cases for Open Content are:
Long-time archive with the XLibris search client
Back-end for web and mobile publishing using Naviga web
Content repo for the Content Creation Suite
Link to the Swagger documentation
Open Content Swagger REST-API documentation
The link above assumes that you are running Open Content locally. The API docs can be found at http://localhost:8080/opencontent/apidocs/
Please note that you need to understand the NewsML document format used in the … The distribution API is not yet available for 3rd parties, but will be during 2020.
An overview of the eventlog and contentlog endpoints
The event log tells you what has happened since a last known event. Depending on your use case, you can either process the eventlog from the beginning (it keeps a history of one month) or start at the last event. Processing all retained events is useful if you want to prepopulate a cache. If you only need the log to invalidate a cache that starts cold and is built ad hoc, it makes more sense to start with the last event.
A request to the eventlog looks like this: GET https://oc.tryout.infomaker.io:8443/opencontent/eventlog
If called without any query parameters you get events from the start of the log:
If you pass in a negative value, -N, you get the last N events in the log, like so: GET https://oc.tryout.infomaker.io:8443/opencontent/eventlog?event=-2
The id attribute in the events can be used to paginate through the eventlog. So if we have processed events up until 406374, we would ask the eventlog for all events after it, like so: GET https://oc.tryout.infomaker.io:8443/opencontent/eventlog?event=406374
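The pagination described above can be sketched as a loop that keeps asking for events after the last id it has seen. The response format is an assumption: the sketch expects each response to decode to a list of events that each carry an "id", and it treats an empty list as "caught up". The fetch_events callable is a hypothetical stand-in for the actual HTTP call and JSON decoding.

```python
from urllib.parse import urlencode

EVENTLOG_URL = "https://oc.tryout.infomaker.io:8443/opencontent/eventlog"


def eventlog_url(event=None):
    """Return the eventlog URL, optionally with the 'event' query parameter."""
    if event is None:
        return EVENTLOG_URL  # no parameter: events from the start of the log
    return f"{EVENTLOG_URL}?{urlencode({'event': event})}"


def read_new_events(fetch_events, last_event_id=None):
    """Yield events after last_event_id until an empty page is returned.

    fetch_events: hypothetical callable that takes a URL and returns a
    list of dicts, each with an 'id' key (assumed response shape).
    """
    while True:
        events = fetch_events(eventlog_url(last_event_id))
        if not events:
            return
        for event in events:
            yield event
            last_event_id = event["id"]


print(eventlog_url(406374))
print(eventlog_url(-2))
```

A consumer would store the last processed id somewhere durable and pass it back in as last_event_id on the next run.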
To fetch the updated object, the normal objects endpoint is used: GET https://oc.tryout.infomaker.io:8443/opencontent/objects/f9f87e70-a0d7-4bc8-b2d4-5fab82760839?version=6
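A small helper can build the object URL from a uuid and an optional version. The object_url name is hypothetical, and which fields of an event carry the uuid and version is not specified here, so the values are passed in explicitly.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://oc.tryout.infomaker.io:8443/opencontent"


def object_url(uuid, version=None, base_url=BASE_URL):
    """Return the objects endpoint URL for a uuid, optionally for one version."""
    url = f"{base_url}/objects/{quote(uuid, safe='')}"
    if version is not None:
        url += "?" + urlencode({"version": version})
    return url


latest_url = object_url("f9f87e70-a0d7-4bc8-b2d4-5fab82760839")
versioned_url = object_url("f9f87e70-a0d7-4bc8-b2d4-5fab82760839", version=6)
print(versioned_url)
```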