Set up a Solr schema.xml for AEM

Date: 02.03.2017

Reading time: 3 minutes

Why do we need a schema?

Solr knows nothing about your data structure, yet you want it to perform complex operations such as fulltext searches and faceting. To let Solr build a fast index, you need to define which fields to index and which operations to perform at index or query time¹.

There is an excellent book by Trey Grainger² and Timothy Potter that gives a good overview of the capabilities of Solr³. Although it is written for Solr 5, most of the concepts are the same for Solr 6 and need only minimal adjustments.

By default, Solr 6 uses a managed schema (stored in a file named managed-schema)⁴, which you modify through the Schema API⁵. You can change this behavior per core in solrconfig.xml and enable the classic schema.xml, which we’ll use in this example.
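As a minimal sketch, switching a core to the classic schema.xml usually comes down to declaring the schema factory in that core's solrconfig.xml. All other settings are omitted here and stay as they are in your configuration:

```xml
<!-- solrconfig.xml of the core (fragment, other settings omitted) -->
<config>
  <!-- Read the schema from the hand-edited schema.xml.
       With this factory the schema is not editable through the Schema API. -->
  <schemaFactory class="ClassicIndexSchemaFactory"/>
</config>
```

If the core was created with a managed schema, the existing managed-schema file typically has to be renamed to schema.xml as well.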

The Jackrabbit project provides a basic core configuration for Solr 4.x⁶ that you can also use as a base for a custom configuration. I recommend having a look at its schema.xml, which is the base for the following definitions.

Schema.xml for AEM

You can find an example of a basic schema.xml⁷ in the aem-solr GitHub repository⁸, which I’ll explain here.

Unique Key

The uniqueKey field is the identity of an indexed document. If a new document with an already existing uniqueKey is indexed, it replaces the existing entry. For structured content like JCR content, the path is a great identifier and is therefore used here.
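For illustration, assuming the path is kept in the stored field path_exact (the field described in the Path* section), the unique key declaration could look like this. The path in the comment is a made-up example:

```xml
<!-- schema.xml (fragment) -->
<!-- Holds the JCR path of the node, e.g. /content/site/en/page (hypothetical). -->
<field name="path_exact" type="string" indexed="true" stored="true"/>

<!-- Indexing a document with an already known path_exact replaces the old entry. -->
<uniqueKey>path_exact</uniqueKey>
```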

Fields

Path*

You will most likely want to restrict queries to certain paths rather than always search the complete index, which requires some adjustments. The Jackrabbit Oak Solr indexer supports several path fields out of the box that should be added to your schema⁹. The documentation also provides some examples of how those fields are used.

Note: Only the field path_exact is stored in our index and is therefore retrievable. All other path fields are used for indexing only.
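A sketch of how such path fields might be declared. path_exact is taken from the note above. The names path_child and path_des and their fieldtypes are assumptions modelled on the Oak example configuration, so check them against the schema you actually deploy:

```xml
<!-- schema.xml (fragment); names other than path_exact are assumptions -->
<!-- The only path field that is stored and can be returned in results. -->
<field name="path_exact" type="string"          indexed="true" stored="true"/>
<!-- Index-only helpers for "direct children of" and "anywhere below" queries. -->
<field name="path_child" type="children_path"   indexed="true" stored="false"/>
<field name="path_des"   type="descendent_path" indexed="true" stored="false"/>

<!-- Fill the helper fields from the stored path at index time. -->
<copyField source="path_exact" dest="path_child"/>
<copyField source="path_exact" dest="path_des"/>
```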

JCR/Sling and DAM attributes

The schema.xml contains some interesting JCR attributes like jcr_title or jcr_lastModified that can be queried as a string or as a date (e.g. everything modified before a given date). To allow queries for DAM assets, it also includes the DAM mimetype attributes.
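As an illustration, the two JCR attributes could be declared as follows. The field names come from the text above. The date fieldtype is the Trie-based one available in Solr 6 and is an assumption about what the schema uses. The trailing comment shows a range query for documents modified before a (made-up) date:

```xml
<!-- schema.xml (fragment); fieldtype choices are assumptions -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>

<field name="jcr_title"        type="string" indexed="true" stored="true"/>
<field name="jcr_lastModified" type="date"   indexed="true" stored="true"/>

<!-- Example query parameter:
     q=jcr_lastModified:[* TO 2017-03-01T00:00:00Z] -->
```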

Content attributes

For this example I’ll use three different JCR properties that should be indexed:

Fieldname	Index as
headline	Simple String, no fulltext search
title	Simple String, no fulltext search
text	English text, indexed for fulltext search, suggestions etc
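Translated into field definitions, the table above might look like the sketch below. The stored flags are an assumption, and the string and text_en fieldtypes are expected to be defined elsewhere in the schema:

```xml
<!-- schema.xml (fragment) -->
<!-- Exact-match strings: not tokenized, so no fulltext search. -->
<field name="headline" type="string"  indexed="true" stored="true"/>
<field name="title"    type="string"  indexed="true" stored="true"/>
<!-- Analyzed as English text for fulltext search. -->
<field name="text"     type="text_en" indexed="true" stored="true"/>
```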

Fieldtypes

All fieldtypes you can find in the schema.xml are quite simple and by the book. There are primitive fieldtypes like int or string but also types that support fulltext searches like text_en.
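To give an idea of what such definitions look like, here is a reduced sketch of a primitive and a fulltext fieldtype. The analyzer chain for text_en is a simplified assumption (the stock Solr example also uses stop word and synonym files) and not necessarily what the repository's schema.xml contains:

```xml
<!-- schema.xml (fragment); simplified, assumed definitions -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>

<!-- English text: split into words, lowercase, strip possessives, stem. -->
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```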

The two *_path fieldtypes additionally define rules that split the path at its slashes or replace parts of it.
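One way such a path fieldtype can be built is with Solr's path hierarchy tokenizer. The sketch below is an assumption modelled on the Oak example configuration, and the path in the comment is made up:

```xml
<!-- schema.xml (fragment); the type name is an assumption -->
<fieldType name="descendent_path" class="solr.TextField">
  <analyzer type="index">
    <!-- /content/site/en is indexed as
         /content, /content/site and /content/site/en -->
    <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/"/>
  </analyzer>
  <analyzer type="query">
    <!-- The queried path is kept as one token. -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
```

With this setup, a query for /content/site on a field of this type matches every document located at or below that path.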

Summary

For a simple AEM application that performs fulltext searches on predefined fields (like text), the provided schema is a good starting point. You can extend it by adding fields or by using the copyField¹⁰ mechanism to index more properties into the already defined fields.

If your application uses a property named richText which you want to index, the following definition would copy it into the text field and merge the results:

<copyField source="richText" dest="text"/>

The next post covers a sample application you can set up to get better insight into the steps covered so far.
