The Apache Solr project, based on Apache Lucene, was originally created by CNET to provide full-text search across the company’s massive media database. Since 2004, Solr has seen many iterations and improvements and has built an enormous community of contributors. It’s a powerful and scalable search engine written in Java with a full complement of libraries for C#, PHP, Python and other languages, and offers HTTP REST-like APIs with support for both XML and JSON.
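As a rough illustration of the HTTP API mentioned above, the sketch below builds a request URL for Solr's standard `/select` handler and asks for a JSON response through the `wt` parameter (`wt=xml` would request XML instead). The host and the `articles` collection name are assumptions made for the example, and nothing is sent over the network.

```python
from urllib.parse import urlencode


def build_select_url(base_url, collection, query, response_format="json", rows=10):
    """Build a URL for Solr's /select request handler.

    `q` is the query string, `wt` selects the response writer (json or xml),
    and `rows` limits the number of returned documents.
    """
    params = {"q": query, "wt": response_format, "rows": rows}
    return f"{base_url}/solr/{collection}/select?{urlencode(params)}"


# Hypothetical local Solr instance and collection name.
url = build_select_url("http://localhost:8983", "articles", "title:lucene")
print(url)
```

The resulting URL could be fetched with any HTTP client; the client libraries for C#, PHP, Python and other languages wrap requests of this kind.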
For larger use cases with a team of dedicated search engineers, Solr is a solid search software solution. But for businesses that want to allocate engineering resources differently, Solr’s strengths — scalability, configurability, extensibility — can also be liabilities. Even small projects can require days of engineering to get up and running. However, today, alternative search solutions built on cloud-native architecture can offer the same degree of configurability and scale in much less time.
Solr scale and cloud search options
Solr can be set up either in standalone mode or in SolrCloud mode. As the name implies, SolrCloud offers more robust features to help businesses scale including index replication, load balancing, failover, and distributed queries with the help of ZooKeeper.
However, many organizations prefer to run Solr in standalone mode because they don’t need heavy indexing and don’t have high-volume use cases. Using Solr in standalone mode requires manual configuration for search features such as failover and index replication, additional nodes for high availability, sharding, and more (how much you require depends on the use case, of course).
Companies with plenty of in-house engineers can afford to manage Solr infrastructure on-premises. Others have outsourced the problem by using Solr-as-a-service solutions or by partnering with cloud providers for managing Solr on shared resources.
Regardless of which direction you go, after you have determined the best Solr hosting solution for your business, the question remains whether Solr is the best search engine for your use case.
Solr search functionality
Solr was built for enterprise use cases with very specific edge cases. So, it makes perfect sense for a company like Morgan Stanley in a highly regulated space to use a tool like Solr for custom solutions. However, for most businesses, adding search can be much easier and less resource-intensive.
Solr is overly configurable for more mainstream use cases. Even for mission-critical search use cases, there are arguably better solutions today.
As stated at the beginning of this article, many of Solr’s strengths are also its weaknesses:
- Setup time: Adding search can now be accomplished in minutes with modern search engines. For Solr, however, because of its complexity, even simple use cases can take 1-2 days — and that was cited as a strength!
- Configuration: Core features are managed across multiple levels of config files, plugins, APIs, as well as Solr dashboard configurations. As the saying goes, with great power comes great responsibility, and in this case, with great configuration comes added complexity, cost, and technical debt. Upgrades to newer versions could cause custom configurations to break.
- Core search features: Core features such as crawling, indexing, search synonyms, or faceted search should be a snap, but with Solr, they can be anything but. Often you may need to develop additional schemas to handle certain kinds of data, configure synonym filters, or manually set up faceted search. Solr doesn’t ship with a crawler by default, either.
- Custom features: More advanced search features such as document search (e.g., PDF and Word files) and multilingual search require plugins. The good news is that many plugins ship with Solr; the bad news is that they require additional management and overhead.
- Scalability: We covered this in the previous section, but when it comes to scaling Solr for load balancing, high availability, failover, etc., it can be a major undertaking.
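To make two of the core features in the list above concrete, here is a minimal sketch of what synonym expansion and faceted counts compute. It is plain Python over an in-memory list, not Solr configuration or Solr's implementation, and the synonym map and product records are made up for the example.

```python
from collections import Counter

# Hypothetical synonym map and product records, for illustration only.
SYNONYMS = {"tv": {"television"}, "television": {"tv"}}

PRODUCTS = [
    {"name": "4K television", "brand": "Acme"},
    {"name": "Portable TV stand", "brand": "Bolt"},
    {"name": "Television wall mount", "brand": "Acme"},
    {"name": "Laptop sleeve", "brand": "Bolt"},
]


def expand_terms(query):
    """Return the lowercased query terms plus any configured synonyms."""
    terms = set(query.lower().split())
    for term in list(terms):
        terms |= SYNONYMS.get(term, set())
    return terms


def search(query, records):
    """Return records whose name shares at least one word with the expanded query."""
    terms = expand_terms(query)
    return [r for r in records if terms & set(r["name"].lower().split())]


def facet_counts(records, field):
    """Count how many records fall under each value of the given field."""
    return Counter(r[field] for r in records)


hits = search("tv", PRODUCTS)
print([h["name"] for h in hits])
print(facet_counts(hits, "brand"))
```

A search for "tv" also matches records that only say "television", and the facet counts show how the matches split by brand, which is the kind of data a faceted navigation sidebar displays.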
If you’re building the next DuckDuckGo or have a need for deploying on-prem for security, Solr is certainly worth a look. For most site search, ecommerce search, and app search use cases, there are many other options. If Solr has become too bulky, too much to manage, or is showing its age, it’s time to look for alternatives.
And speaking of that, let’s have a look at some Apache Solr alternatives.
Search.io is a site search engine built from the ground up for developers. It offers tremendous flexibility and ease of configuration built on top of a cloud-native architecture for elastic scale. Projects that can take weeks or months with Solr can be accomplished in hours or days on Search.io without the need for a battalion of engineers. Because it is fully hosted and battle-tested with thousands of queries per second, you can spend more time working on your core business without having to manage search scale.
Search.io features include:
- Instant indexing with full-text crawler, including document (PDF, DOCX) search
- Easy to add advanced search capabilities via simple YAML-based configuration
- Machine learning included for always-on improvement
Best use cases:
- Site search
- E-commerce search
- Web or mobile app search
Search.io approaches search differently from legacy search engines. Whereas legacy platforms like Lucene build immutable search indexes, Search.io treats search more like a database, which offers some advantages in near real-time read/write speed and data synchronization. It also has built-in machine learning (more specifically, reinforcement learning) for continuous improvement of search performance.
Additionally, Search.io has taken a different approach to configuration and extensibility, moving configuration from config.xml files to a core, built-in feature called pipelines. Pipelines are YAML-based scripts that define a series of steps which are executed sequentially when indexing a record (record pipeline) or performing a query (query pipeline). With pipelines, you can configure the search algorithm to improve search relevance or even A/B test different algorithms to determine which one provides the best search experience.
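The idea of a pipeline as a series of steps executed sequentially can be sketched in a few lines. This is a conceptual illustration in Python, not Search.io's YAML syntax or API; every step name below is hypothetical.

```python
def run_pipeline(steps, item):
    """Apply each step in order; each step receives the previous step's output."""
    for step in steps:
        item = step(item)
    return item


# Hypothetical record-pipeline steps, run when a record is indexed.
def normalize_title(record):
    return {**record, "title": record["title"].strip().lower()}


def add_word_count(record):
    return {**record, "word_count": len(record["body"].split())}


# Hypothetical query-pipeline steps, run when a query is performed.
def lowercase_query(query):
    return {**query, "q": query["q"].lower()}


def boost_recent(query):
    return {**query, "boosts": query.get("boosts", []) + ["recency"]}


record_pipeline = [normalize_title, add_word_count]
query_pipeline = [lowercase_query, boost_recent]

record = run_pipeline(
    record_pipeline,
    {"title": "  Solr Alternatives ", "body": "five options compared"},
)
query = run_pipeline(query_pipeline, {"q": "Solr ALTERNATIVES"})
print(record)
print(query)
```

Because each pipeline is just an ordered list of steps, swapping in a different list is all it takes to try an alternative algorithm, which is what makes A/B testing two pipelines straightforward in principle.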
There is perhaps no more similar alternative to Solr than Elasticsearch. Like Solr, Elasticsearch is another API product built on the same Lucene core and also available as an open source project (but be mindful of Elastic's open source license changes). Elasticsearch is a specialized search engine that has built a massive community around log analytics projects with its popular ELK stack.
Like Solr, Elasticsearch offers tremendous flexibility. For best results, it requires teams of specialist engineers who have the time, resources, and capabilities to eke out higher performance or develop custom features. Elasticsearch is built for scale and ideal for projects that generate massive amounts of data like log analysis. This is where Lucene-based search shines, because log data does not change once it is written. Compared with Solr, Elasticsearch is much easier to configure, and search can be up and running quickly for basic search projects.
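For a sense of what a basic search looks like, the sketch below builds the JSON body of an Elasticsearch `match` query. The `title` field and the search text are assumptions for the example, and the body is only printed here; in practice it would be sent in a request to an index's `_search` endpoint.

```python
import json


def build_match_query(field, text, size=10):
    """Build the request body for a basic full-text `match` query."""
    return {"size": size, "query": {"match": {field: text}}}


# Hypothetical field name and search text.
body = build_match_query("title", "solr alternatives")
print(json.dumps(body, indent=2))
```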
Elasticsearch features include:
- Instant indexing and full-text search, including document (PDF, DOCX) search
- Scalability and resilience for high-volume use cases
- Strong open source community and support
Best use cases:
- Logging and log analytics (along with Logstash)
- Full-text search
- Scraping and combining with public datasets
- Metrics (along with Kibana)
Elasticsearch is available both as a free open source download and as a fully hosted service through Elastic or other providers (including AWS), so there are plenty of options for getting a project started. Elasticsearch is a great fit for logging and analytics projects, but less so for a “pure” search engine use case such as site search or ecommerce search, where it’s overly complex and requires a tremendous amount of expertise.
Like Search.io, Algolia is a new search engine built from the ground up. Originally, Algolia was developed for mobile search use cases, but has since been extended to more traditional search projects. Algolia can boast about its retrieval speed; it’s milliseconds faster than the competition. Those few milliseconds won’t matter for most use cases, but if speed is important, Algolia is worth a look. As a fully-hosted product, Algolia also eliminates the need for cluster management.
Algolia features include:
- Instant indexing and full-text search
- Global language support and advanced language processing
- Fast information retrieval
Best use cases:
- Website search
- App search
- Mobile search
Algolia has quickly grown into a major player because of how simple and easy it is to get started. It’s a great general purpose search engine. But it has its critics too, particularly around pricing and the complexity of managing custom rules and configurations. For example, any time Algolia re-indexes the database, such as for A/B testing, it counts against the monthly search query quota. Features such as machine learning are add-ons that also cost more. Its ranking algorithm is a simple tie-breaking algorithm, which is easier to understand but also less flexible and powerful than other solutions on the market.
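A tie-breaking ranking can be sketched as a sort over an ordered list of criteria, where each later criterion only matters when all earlier ones are equal. The criteria and hit data below are hypothetical and chosen for illustration; this is not Algolia's actual implementation.

```python
# Hypothetical hits with three ranking signals each.
hits = [
    {"id": "a", "typos": 0, "words_matched": 2, "popularity": 10},
    {"id": "b", "typos": 0, "words_matched": 2, "popularity": 50},
    {"id": "c", "typos": 1, "words_matched": 2, "popularity": 99},
    {"id": "d", "typos": 0, "words_matched": 1, "popularity": 80},
]


def tie_break_key(hit):
    """Order by fewest typos, then most matched words, then highest popularity.

    Tuples compare element by element, so a later criterion is consulted
    only when every earlier criterion is tied.
    """
    return (hit["typos"], -hit["words_matched"], -hit["popularity"])


ranked = sorted(hits, key=tie_break_key)
order = [h["id"] for h in ranked]
print(order)
```

Hit "c" has by far the highest popularity but still ranks last because it loses on the first criterion. That predictability is what makes the approach easy to reason about, and it is also why it is less flexible than a ranking that blends signals into a single score.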
Cloud service providers: AWS, Azure, and GCP
The major cloud providers now offer many alternatives to Solr, including search engines that are often built on Elasticsearch or Lucene, such as Microsoft’s Azure Cognitive Search, and hosted open source search, such as Amazon Elasticsearch Service. If you go the cloud provider route, you’ll most likely select the one you’re already working with.
Cloud providers offer both private and public hosted search solutions. If your app is hosted in one of these providers, then it might be worth considering them for your search service as well. Co-locating your search service with your app makes a lot of sense for reducing latency.
The pros and cons of each cloud service provider and software vary a lot. But they have some similarities:
- They’re built to scale, but still require a good deal of hand-holding
- Each solution requires expertise and overhead for managing the search instance
- They’re best suited if you want to co-locate search with your site or app
If you want to build a search application on top of Solr and get additional tooling and support, there’s Lucidworks. Lucidworks offers enterprise support for Solr deployments on public and private cloud instances. If you’re sold on Solr and need support, Lucidworks is worth a look.
Lucidworks features include:
- Solr with optional packages for e-commerce, customer service, and other use cases
- Optional AI package for developing a tightly coupled solution alongside their core platform
- Add-on packages for additional search functionality
So, if Lucidworks is built on Solr, has a team that includes Solr contributors, and offers Solr solutions, why is it #5? Unlike all the other vendors on this list, Lucidworks does not have a self-service option. It’s a true enterprise platform. Its solution add-ons are customizable components that will still require a good deal of engineering or a professional services engagement to piece together. This will appeal to some organizations, especially those building on-premises applications, but the same result could be achieved more quickly and at less cost with some of the newer platforms mentioned above.