New in OC 3.0 (draft)

Open Content 3.0 is a major new version. It is not yet released; the release is planned for mid-2020.

A lot of effort, on all levels, has gone into this release to increase performance, scalability and availability. Many pieces have been optimised, rewritten or redesigned. The APIs are still the same, except for a few functions that have been deprecated.

SolrCloud support

Running a single instance of Solr means that you have a single index running on a single Solr node. Even with quick restore processes, that is not a redundant solution. With version 3.0 we have standardized a multi-node SolrCloud setup as an alternative to the standard setup.

The SolrCloud setup runs in a Kubernetes (https://kubernetes.io/) cluster, starting with 3 Solr nodes plus the necessary orchestration mechanisms. Open Content 3.0 uses Solr 8.x.x.

Support for multiple indexers

Open Content versions prior to 3.0 supported a single indexer process. OC 3.0 allows you to deploy multiple indexers working in parallel, so the indexer is no longer a single point of failure. The new indexer is also faster, and running multiple indexers scales the indexing performance.

We have also offloaded a lot of work from the OC API fronts, for example by moving property extraction to the indexer process. The OC API no longer shares the database with the indexer. This increases OC API performance in general and also makes it more predictable.

Apache Kafka

The Kafka streaming platform (https://kafka.apache.org/) is now a part of the Open Content solution. In addition to the classic Open Content event log, all commits (add, update, delete) are inserted into the Kafka log. Kafka is used internally to power the new indexer processes as well as the upcoming Audit Trail module for the Naviga Writer and Dashboard. The complete content item is stored in Kafka (excluding binary artefacts).
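As a rough illustration of how a log of commits can drive a downstream consumer such as an indexer, the sketch below replays add, update and delete events into an in-memory index. A plain Python list stands in for the Kafka log. The event fields (`op`, `uuid`, `content`) and the `replay` function are assumptions made for this example, not the actual Open Content event format or indexer code.

```python
from typing import Dict, Iterable, List, Optional, TypedDict


class CommitEvent(TypedDict):
    """Hypothetical shape of one commit event; not the real OC schema."""
    op: str                   # "add", "update" or "delete"
    uuid: str                 # identifier of the content item
    content: Optional[dict]   # complete item without binaries; None for deletes


def replay(events: Iterable[CommitEvent]) -> Dict[str, dict]:
    """Apply commit events in log order and return the resulting state."""
    state: Dict[str, dict] = {}
    for event in events:
        if event["op"] in ("add", "update"):
            # Each event carries the complete item, so the latest one wins.
            state[event["uuid"]] = event["content"]
        elif event["op"] == "delete":
            state.pop(event["uuid"], None)
        else:
            raise ValueError(f"unknown op: {event['op']!r}")
    return state


# A list stands in for the Kafka log in this sketch.
log: List[CommitEvent] = [
    {"op": "add", "uuid": "a1", "content": {"headline": "Draft"}},
    {"op": "update", "uuid": "a1", "content": {"headline": "Final"}},
    {"op": "add", "uuid": "b2", "content": {"headline": "Other"}},
    {"op": "delete", "uuid": "b2", "content": None},
]
print(replay(log))  # {'a1': {'headline': 'Final'}}
```

In this sketch the consumer never needs to look anything up elsewhere, because each event holds the whole item.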

Increased upload performance

Bottlenecks in the upload process have been identified, fixed and optimised to get the highest possible upload throughput. Upload of content now scales more or less linearly with the number of OC API fronts used.

Increased read performance

We have made a set of query and read optimizations and eliminated a couple of bottlenecks. Performance when querying for nested properties has improved substantially. Resolving nested properties is now parallelized to maximize hardware utilization, and the number of Solr requests needed to resolve them has decreased substantially. The new SolrCloud multi-node setup is also a good way to scale query performance by adding more Solr nodes. Both the OC API and the SolrCloud cluster now scale almost linearly in read-intensive setups.
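The two ideas mentioned above, fewer requests and parallel resolution, can be sketched as follows. This is a minimal illustration and not the OC API implementation: `fetch_batch` is a hypothetical stand-in for a Solr lookup, and the batch size and worker count are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def chunks(ids: Sequence[str], size: int) -> List[Sequence[str]]:
    """Split ids into consecutive batches of at most `size` entries."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def resolve_nested(
    ids: Sequence[str],
    fetch_batch: Callable[[Sequence[str]], Dict[str, dict]],
    batch_size: int = 50,
    workers: int = 4,
) -> Dict[str, dict]:
    """Resolve referenced items using batched lookups issued in parallel."""
    # Drop duplicate ids (keeping order) so no item is requested twice.
    unique_ids = list(dict.fromkeys(ids))
    resolved: Dict[str, dict] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One call per batch instead of one call per id.
        for part in pool.map(fetch_batch, chunks(unique_ids, batch_size)):
            resolved.update(part)
    return resolved
```

With 120 distinct ids and a batch size of 50, this sketch issues three lookups rather than 120, and those three run concurrently.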

Increased index update performance

Using Solr sharding, we are able to split indexes into smaller pieces, thereby increasing the commit capacity. The indexing process itself has also been redesigned to be more streamlined and efficient. We are now also able to run multiple indexers in parallel to boost indexing performance.
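To illustrate why sharding can raise commit capacity: if each document is routed to exactly one shard by hashing its id, writes are spread across the shards and each shard handles only part of them. The sketch below uses CRC32 as a stand-in hash with a fixed shard count. SolrCloud's own document routing works differently in detail, and the function names here are hypothetical.

```python
import zlib
from collections import Counter
from typing import Iterable


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document id to a shard index in the range [0, num_shards)."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    # CRC32 is deterministic, so a given id always lands on the same shard.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def distribution(doc_ids: Iterable[str], num_shards: int) -> Counter:
    """Count how many of the given ids are routed to each shard."""
    return Counter(shard_for(doc_id, num_shards) for doc_id in doc_ids)


# Example: see how 1000 ids are spread over 3 shards.
print(distribution((f"doc-{n}" for n in range(1000)), 3))
```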

AWS deployment

Open Content 3.0 must be deployed in the AWS cloud. The OC 3.0 setup uses AWS services and deployment templates designed for AWS. Note: On-premise installations are not supported (on-premise installation is possible with Open Content up to 2.2.3).

Metrics

Prometheus (https://prometheus.io/) is supported in the new 3.0 setup. The OC API, SolrCloud cluster, Kafka and the indexer processes all expose metrics that can be graphed and acted on.
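For readers unfamiliar with Prometheus: it collects metrics by scraping them over HTTP in a simple text format. The sketch below renders a single counter in that format using only the Python standard library. The metric name and label are made up for illustration and are not the metrics Open Content actually exposes.

```python
from typing import Dict


def escape_label(value: str) -> str:
    """Escape backslash, double quote and newline in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name: str, help_text: str, label: str,
                   samples: Dict[str, float]) -> str:
    """Render one counter, with one sample per label value, as text."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for label_value, value in sorted(samples.items()):
        lines.append(f'{name}{{{label}="{escape_label(label_value)}"}} {value}')
    return "\n".join(lines) + "\n"


# Hypothetical metric: number of commit events handled, per operation.
print(render_counter(
    "example_indexer_events_total",
    "Commit events processed.",
    "op",
    {"add": 3, "delete": 1},
))
```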
