Open Content 2.3

Wildfly is deployed as a container in Amazon ECS

In 2.3 the Wildfly service is available as a container image. When Open Content is installed on AWS, the Wildfly service is deployed on Amazon ECS. The Wildfly instances belong to a private subnet and do not get public IPs, so the API can only be reached through the public load balancer.

XLibris and OC Admin are also available as container images. They use the latest versions of PHP and Apache httpd. When Amazon ECS is used, XLibris and OC Admin are deployed on their own EC2 instances without public IPs. The public load balancer is the only way to access the services.

API Cache

When Open Content is deployed on AWS, an API cache can be put in front of Wildfly. The cache listens on port 9999, while Wildfly listens on port 8080. A client that wants to use the cache must connect to port 9999. Because both ports are open, it is always possible to talk to Wildfly directly, even when the cache is enabled.

Validation of properties on add and update

Until now there has been no validation of properties when adding content. For example, if an article is uploaded with a property "TextLength" of type Integer that contains a string instead, the upload still succeeds.

In 2.3 properties are validated according to their type. Validation is performed when content is added or updated, and a failed validation results in HTTP 400 Bad Request.
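The kind of type check described above can be pictured with a small client-side pre-check. This is only a sketch: the function name `is_valid_property` and the exact rules per type are assumptions made for illustration, not Open Content's actual validation code.

```python
import re

# Hypothetical rule for the "Integer" type: an optional sign followed by digits.
_INTEGER = re.compile(r"[+-]?[0-9]+")


def is_valid_property(value: str, prop_type: str) -> bool:
    """Return True if value is acceptable for the declared property type."""
    if prop_type == "Integer":
        return _INTEGER.fullmatch(value.strip()) is not None
    if prop_type == "String":
        return True  # any text is a valid String
    raise ValueError(f"unsupported property type in this sketch: {prop_type}")


# A numeric TextLength passes. A word in an Integer property does not,
# which is the case that 2.3 answers with HTTP 400 Bad Request.
print(is_valid_property("1200", "Integer"))
print(is_valid_property("long", "Integer"))
```

Running a check like this before uploading lets a client catch the mistake without a round trip to the server.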

New property type called WKT (Well Known Text)

Until now latitude, longitude and spatial geometries had to be of property type "String". The string was not validated until it was indexed into Solr. When Solr refused to accept invalid WKT, an entry was added to the indexer error log. But because indexing happens in the background, the user who uploaded the content might never discover that something was wrong with it.

In 2.3 there is a new property type called "WKT". A property of this type is validated when content is added or updated. If it contains an invalid WKT string, the add or update returns HTTP 400 Bad Request.
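To illustrate what a valid and an invalid WKT string look like, here is a minimal sketch that checks the syntax of a WKT POINT only. It does not cover other geometry types, and it is not how Open Content or Solr validate WKT. The function name `looks_like_wkt_point` is hypothetical.

```python
import re

# A decimal number with an optional sign, e.g. 18.0686 or -33.
_NUM = r"[-+]?[0-9]+(?:\.[0-9]+)?"

# POINT followed by exactly two numbers (x y) inside parentheses.
_POINT = re.compile(rf"\s*POINT\s*\(\s*{_NUM}\s+{_NUM}\s*\)\s*", re.IGNORECASE)


def looks_like_wkt_point(text: str) -> bool:
    """Return True if text is a syntactically well-formed WKT POINT."""
    return _POINT.fullmatch(text) is not None


print(looks_like_wkt_point("POINT (18.0686 59.3293)"))  # two coordinates
print(looks_like_wkt_point("POINT (18.0686)"))          # one coordinate missing
print(looks_like_wkt_point("59.3293, 18.0686"))         # plain lat,long is not WKT
```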

Failing extractors are no longer suppressed

An XPath expression can be valid and still throw an error, depending on the text it is applied to. Until now Open Content has silently suppressed these errors. As a result, content could be indexed while one or two properties were missing from the index.

In 2.3 no extractor errors are suppressed. If an XPath fails at content upload time, the upload responds with an HTTP 400 Bad Request. If an XPath fails at indexing time, the content is not indexed at all.

Fall back to a default dynamic path / if none is configured

Until now it has been mandatory to configure a dynamic path for the storage.

In 2.3 the absence of a dynamic path leads to content being placed in the root path. For example, if S3 is used for storage and no dynamic path is configured, the uuid will be in the root of the S3 bucket with no prefix added.
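The fallback can be sketched as a small key-building function. The name `storage_key` and the `2019/03` path format are assumptions for illustration. The real dynamic path syntax is defined by the Open Content configuration.

```python
from typing import Optional


def storage_key(uuid: str, dynamic_path: Optional[str] = None) -> str:
    """Build the storage key for a piece of content.

    With no dynamic path (or just "/"), the key is the bare uuid,
    so the object lands in the root of the bucket.
    """
    prefix = (dynamic_path or "").strip("/")
    return f"{prefix}/{uuid}" if prefix else uuid


print(storage_key("3f2a9c"))             # no dynamic path: bare uuid
print(storage_key("3f2a9c", "/"))        # default path "/": still the bare uuid
print(storage_key("3f2a9c", "2019/03"))  # hypothetical dynamic path as prefix
```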

Upgrade to Solr 7.7

Open Content has used version 5.5 of Solr for a long time. In 2.3 it has been upgraded to Solr 7.7. As a consequence, all content has to be reindexed.

A backward-incompatible change is that a query like Pubdate:* can no longer be used to get all content that has a Pubdate. That query needs to be changed to Pubdate:[* TO *]. The reason is that Solr has deprecated TrieDateField for dates, so Open Content uses DatePointField instead. The same applies to index fields of type date, int, long, float and double.

Open Content does not deliver a clustered Solr out of the box yet, but it has been prepared for one by using SolrCloud and allowing multiple ZooKeeper instances to be configured.
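Clients that build queries programmatically may want to rewrite the old form automatically. The following is a minimal sketch under stated assumptions: the function name `rewrite_exists_queries` is hypothetical, the caller supplies the set of affected field names, and the regex does not understand quoted phrases, so a `Pubdate:*` inside quotes would also be rewritten.

```python
import re

# Matches "<field>:*" where the star is the whole value, i.e. it is followed
# by whitespace, a closing parenthesis, or the end of the query.
_EXISTS = re.compile(r"\b([A-Za-z_][A-Za-z0-9_.]*):\*(?=\s|\)|$)")


def rewrite_exists_queries(query: str, point_fields: set) -> str:
    """Rewrite field:* to field:[* TO *] for the given date/numeric fields."""

    def replace(match):
        field = match.group(1)
        if field in point_fields:
            return f"{field}:[* TO *]"
        return match.group(0)  # leave other fields untouched

    return _EXISTS.sub(replace, query)


print(rewrite_exists_queries("Pubdate:* AND Title:*", {"Pubdate"}))
```

Only fields listed in `point_fields` are changed, so string fields such as the hypothetical `Title` above keep their original form.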

Update Swagger apidocs to OpenAPI 3.0

The Swagger support has been reimplemented from scratch and has several improvements.

The specification is updated from Swagger 1.2 to OpenAPI 3.0.

The OpenAPI specification for Open Content is now generated at release time and is a static file, so loading Swagger is much faster.

Swagger UI is updated to the latest version, which is why the look and colours have changed.

Many REST API documentation errors have been fixed.

Upgrade to Java 11

Until now Wildfly, Indexer, Notifier and Replicator have been using Oracle's distribution of OpenJDK 8.

In 2.3 all Java based services use AdoptOpenJDK's distribution of OpenJDK 11.

Upgrade to Wildfly 15

Wildfly 15 has support for Java 11.

Upgrade Saxon and get support for XPath 3.1

In an effort to keep as many third-party libraries as possible up to date, Saxon has also been updated. This means that XPath and XSLT extractors now support XPath 3.1 and XSLT 3.0.

Nested properties API call may result in too long Solr GET request

When using nested properties, Wildfly makes HTTP GET requests to Solr. In some circumstances the URL length hits a limit and Solr refuses the requests. To get around this, Wildfly now sends the requests to Solr using HTTP POST instead.
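The difference between the two request styles can be sketched with Python's standard library. With POST, the query parameters travel form-encoded in the request body instead of in the URL, so the URL stays short however long the query gets. The URL, the collection name `example` and the function name `build_solr_post` are assumptions for illustration. This is not Wildfly's actual code.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_solr_post(select_url: str, params: dict) -> Request:
    """Build a POST request that carries the Solr parameters in the body."""
    body = urlencode(params, doseq=True).encode("utf-8")
    return Request(
        select_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"},
        method="POST",
    )


# A query listing many ids, as a nested properties lookup might produce.
ids = " OR ".join(f"id-{n}" for n in range(2000))
request = build_solr_post(
    "http://localhost:8983/solr/example/select",  # hypothetical Solr endpoint
    {"q": f"uuid:({ids})", "rows": 2000},
)
print(request.get_method(), len(request.full_url), len(request.data))
```

The request is only built here, not sent. Passing it to `urllib.request.urlopen` would send it.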

Sortable flag on index fields is removed

Sorting in Solr is memory intensive. The sortable flag has been an imperfect guard against out-of-memory errors in Solr: by not allowing sorting on an arbitrary number of fields, it offered some protection.

In 2.3 docValues have been enabled for many index field types, which lowers memory consumption when sorting on these index fields. The sortable flag has therefore been removed.

Open Content ignores the sortable flag when it reads an old configuration, and the next time the configuration is activated the sortable flag will no longer be in oc.yaml.

Support for HTTPS in replicator

Until now the replicator has only been able to replicate using HTTP. Now in Open Content 2.3 the replicator can replicate content using HTTPS.

Deprecated and will be removed in 3.0

In a microservice world we want to split out the search and suggest functionality into its own service. The Wildfly service will not use Solr anymore. When using the Wildfly API, you will know that you get the source of truth. When using the Search API, you will know that only Solr is asked for data. Because Wildfly will no longer ask an eventually consistent search engine for data, some small parts of Wildfly will be backwards incompatible.

| # | Functionality | Reason |
| --- | --- | --- |
| 1 | Import Metadata Rules | No one is using this functionality, and data transformation should really be done before the content is added to Open Content. |
| 2 | Possibility to have multiple storages | Hardly anyone is using multiple storages. Also, this flexibility adds unnecessary complexity to Open Content. |
| 3 | Import Storage Rules | Would not be needed if there can only be one storage. |
| 4 | Named relations | This is old functionality. Nested properties in the Search API are a better way to get related content. |
| 5 | Relation extractors in Wildfly | Relation property value extractor is an old feature. Use nested properties in the Search API instead. |
| 6 | Relation extractors in Wildfly | Relation property xpath extractor is an old feature that is really slow. Use nested properties in the Search API instead. |
| 7 | Relation extractors in Wildfly | Relation property contenttype extractor is a rather new feature and is what enables nested properties, but it will only be available in the Search API, not in the Properties API. |
| 8 | Identifiers | Open Content should not be aware of any external id. The object's id in another system should be handled outside of Open Content. |
| 9 | Configured default search response | Having configuration that affects all clients is not good. Instead, the client should specify the properties in each request. |
| 10 | Configured sortings | Having configuration that affects all clients is not good. Instead, the client should specify the sorting in each request. sort.name in the Search API will be removed. |
| 11 | Property having more than one index field | Not used by anyone today. Unnecessary complexity. |
| 12 | Index field and property names can differ | When names differ, this is accidental and leads to confusion. |
| 13 | /basicsearch | Use /search instead because it provides the same functionality. |
