Apache Solr Guide: Solr Removing Data Import Handler
Saturday 29th October, 2022

A quick decision is required on replacing the Apache Solr Data Import Handler (DIH)

The Data Import Handler (DIH) in Apache Solr is deprecated as of Solr 8.6 and is removed entirely from the Solr distribution in 9.0.

DIH provides a mechanism for importing and indexing content from a data store. Those who rely on it in production systems face an urgent choice: keep using it and fall behind on Solr features, or accept the transition and implement something different sooner rather than later.

DIH is being replaced by a community contribution, which may have certain constraints.

The project states, for example, that it ships only with the MariaDB driver. Manually configuring a JDBC driver for other sources should be doable in theory, but this is not a well-trodden path. Another concern is that the project has been inactive for about 6 months and so has not been updated to track the most recent version (Solr 8.11).

There is a clear and unsettling risk that the community will not maintain the project. I'm sure the Solr contributors who made this decision expect activity to pick up once DIH is removed from Solr releases, but relying on that is a risky strategy.

Because DIH is being removed primarily as part of a major security overhaul in Solr, it may not be replaced by anything equivalent anytime soon, if at all. That is a real risk, and anyone running DIH in production should consider alternative means of populating the search index.

Solr Data Import Handler (DIH) Replacement Options

1- The designated community successor to DIH. A small group of people is expected to maintain this snapshot of the Apache DIH, but it is unknown how much like-for-like compatibility it will offer with the current DIH, or whether there will be enough interest to keep maintaining it.

2- Open source ETL tools. Apache Camel and Apache NiFi are just two of the integration tools that could take over the role of DIH. NiFi is more of a platform with a user interface, whereas Camel is a collection of Java integration libraries. They are frequently used together with a queue such as Kafka, JMS, or ActiveMQ.

3- Commercial ETL tools. There are too many to list here, but most ETL tools can push to a REST endpoint and could carry out the transformations currently performed by DIH. Some vendors offer Solr-specific emitters.

4- A custom publisher application. In other words, build it yourself.

5- Switch to Elasticsearch or the newer Amazon OpenSearch and use the Logstash JDBC input plugin. This should not be done lightly because it would involve a significant amount of rework.
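
To give a feel for option 4, here is a minimal Python sketch of a custom publisher. It is an illustration under stated assumptions, not a drop-in DIH replacement: the `products` table, the collection name, the helper function names, and the `http://localhost:8983/solr` base URL are all hypothetical, and an in-memory SQLite database stands in for the real data store. It reads rows with SQL, turns each row into a document, and builds a POST to Solr's JSON update handler.

```python
import json
import sqlite3
import urllib.request


def fetch_docs(conn, query):
    """Run a SQL query and turn each row into a Solr document (a plain dict)."""
    conn.row_factory = sqlite3.Row
    return [dict(row) for row in conn.execute(query)]


def build_update_request(solr_base_url, collection, docs):
    """Build a POST to Solr's JSON update handler.

    commit=true asks Solr to commit right away; a real publisher would
    more likely batch documents and commit less often.
    """
    url = f"{solr_base_url.rstrip('/')}/{collection}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def publish(solr_base_url, collection, docs):
    """Send the documents to Solr and return the HTTP status code."""
    request = build_update_request(solr_base_url, collection, docs)
    with urllib.request.urlopen(request) as response:
        return response.status


# Hypothetical source data: an in-memory SQLite table stands in for the
# production database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id TEXT, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("1", "Widget", 9.99), ("2", "Gadget", 19.5)],
)

docs = fetch_docs(conn, "SELECT id, name, price FROM products ORDER BY id")
request = build_update_request("http://localhost:8983/solr", "products", docs)

# Uncomment to index for real; this needs a running Solr with a matching collection.
# publish("http://localhost:8983/solr", "products", docs)
```

A production version would also need the pieces DIH used to provide, such as delta imports, field transformations, error handling, and scheduling, which is where most of the effort in this option lies.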

Conclusion: If you use DIH in production systems, it's time to start thinking about how to mitigate its removal from upcoming Solr versions. Although it is an unwelcome diversion, it deserves careful attention.

Check out our Solr consulting services if you need assistance making this choice, or contact us for a free one-to-one session.
