Search indexing is often the first topic we discuss with customers when starting a new business engagement. Whether it's a large enterprise-scale site or a small e-commerce store, the first step to adding search to a website is indexing your content through a website crawler or API. Your site's architecture, schema, and content can all affect indexing.
In this article, we'll cover a lot of the topics we discuss with our customers and share some actionable tips for improving a search index.
Note: this is not an article about SEO. While on-site search optimization and SEO are related — the work you do to optimize your search index for on-site search also helps with Google search or Bing visibility — they address different needs. SEO is geared towards internet visibility, whereas on-site search addresses user experience. However, the XML sitemaps, internal links, meta tags, etc., you create for one will help the other!
What is a search index?
A search index helps users quickly find information on a website. It maps search queries to web pages, documents, and other site content. It's analogous to the index of a book because it lets users look up useful information by keyword, but it is far faster and more flexible than a printed index. Search indexes can be built with a web crawler or through API access, and each approach has benefits in different situations.
What is full-text search?
Full-text search indexes every word on your site so that a search engine can quickly navigate large numbers of records. Traditionally, full-text search engines have used an "inverted index": essentially, a map from each keyword to the documents, and the locations within those documents, where it appears.
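The following minimal Python sketch makes the inverted-index idea concrete. It is a toy and does not show how Search.io or any production engine stores its index. Tokenization is a simple lowercase split, and a query matches only documents that contain every term (boolean AND). The product IDs and text are invented.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that isn't a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[str, str]) -> dict[str, dict[str, list[int]]]:
    # Map each term to {doc_id: [positions of the term in that doc]}.
    index: dict[str, dict[str, list[int]]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def search(index: dict[str, dict[str, list[int]]], query: str) -> set[str]:
    # Return the IDs of documents that contain every query term (boolean AND).
    postings = [set(index.get(term, {})) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()


docs = {
    "p1": "Portable Bluetooth speaker with deep bass",
    "p2": "Wall-mounted speaker for home theatre",
    "p3": "Portable phone charger",
}
index = build_inverted_index(docs)
print(index["speaker"])                   # {'p1': [2], 'p2': [2]}
print(search(index, "portable speaker"))  # {'p1'}
```

Storing positions as well as document IDs is what lets an engine support phrase and proximity matching later. A plain term-to-documents map would only answer "does this word appear?"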
AI-powered search engines can now go beyond keywords and use context to return richer results. Take the query "portable sound" as an example. If a keyword-based search engine has the terms "portable" and "speaker" in the index, the results page may include the correct item, but only because the word "portable" happens to match. With AI search, you can get good results even if the query's keywords aren't on the site.
Search crawlers and APIs
There are two primary ways to build a search engine index — a search crawler or directly pulling data from a database via APIs. Each of these has benefits for different situations.
For example, for most static websites, a crawler is fine. It's fast and comprehensive. API-driven indexing is ideal for sites with dynamic or constantly changing data. APIs have their own set of advantages such as the ability to quickly add new data sources.
What is instant indexing?
When you add new content or change existing content, you want those changes to be searchable in real time. Instant indexing is a must-have for retailers and brands selling new products or launching campaigns. When our customers do have problems with instant indexing, it's typically due to one of these issues:
- Content isn't getting indexed fast enough due to complex architecture or an API issue
- Content is in the index, but not getting displayed in results
- PDF and DOC files fail to index
Most problems can be resolved relatively quickly. The first thing to do is check how the crawler views your website documents. In Search.io, customers can use a Preview feature to see which fields are being used and how content appears in each field.
Using a sitemap.xml file to assist the crawler is always a good practice and can help with getting your content indexed quickly. If you’re indexing your site via API, it’s likely that there is an integration issue that needs to be resolved.
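For crawler-based indexing, you can generate a sitemap from whatever list of pages you already have. The sketch below uses Python's standard `xml.etree.ElementTree` to produce the `<urlset>`/`<url>`/`<loc>`/`<lastmod>` structure defined by the sitemaps.org protocol. The example.com URLs and dates are placeholders.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[tuple[str, str]]) -> str:
    """Build sitemap XML from (absolute URL, last-modified date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc          # page address (special chars are escaped)
        ET.SubElement(url, "lastmod").text = lastmod  # W3C date, e.g. 2023-05-01
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


pages = [
    ("https://www.example.com/", "2023-05-01"),
    ("https://www.example.com/products/portable-speaker", "2023-05-03"),
]
print(build_sitemap(pages))
```

Keeping `lastmod` accurate is what helps a crawler decide which pages to revisit first after an update.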
12 ways to optimize and enrich your search index
Whether you are using a search crawler or connecting your site via API, there are many ways to configure and improve a search index. The real-world suggestions below come directly from the conversations we often have with customers who are building their index via crawler or API. Some of these methods are more appropriate for crawler-based indexing, others are relevant to API indexing, and a few apply to both.
Here are 12 ways you can optimize your search index:
1. Open Graph metadata
Facebook released their Open Graph protocol in 2010 and since then it has become widely used by search engines. Search results often include an image preview, and most often this is powered by Open Graph.
By adding Open Graph tags to your content, you can enrich a search index with information such as:
- Title and content type (og:title, og:type)
- Image and canonical URL (og:image, og:url)
- Additional optional properties, such as og:description
There are heaps of other data you can use with Open Graph to enrich a search index besides just title, description, and images, but many people don't know or use them all. For more information, visit https://ogp.me/
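The sketch below shows roughly what a crawler can read from these tags. It pulls `og:*` properties out of a page's HTML using Python's built-in `html.parser`. The page content is invented. Real crawlers also handle edge cases that this sketch ignores. For example, if a page has several og:image values, this sketch keeps only the first one.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> tags into a dict."""

    def __init__(self) -> None:
        super().__init__()
        self.og: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        content = attributes.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first value when a property repeats (e.g. several og:image tags).
            self.og.setdefault(prop, content)


page = """
<head>
  <meta property="og:title" content="How to choose a portable speaker" />
  <meta property="og:type" content="article" />
  <meta property="og:url" content="https://www.example.com/blog/portable-speakers" />
  <meta property="og:image" content="https://www.example.com/img/speaker.jpg" />
  <meta property="og:description" content="Battery life, waterproofing and sound compared." />
</head>
"""
parser = OpenGraphParser()
parser.feed(page)
print(parser.og["og:title"])  # How to choose a portable speaker
```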
2. Schema.org formats
Open Graph is just one of several open protocols for enriching web and search engine indexing data. There are different kinds of schemas you can mark up your page content with. For example, if you're a recipe site, you will have different standards for how you mark up content than, say, an event website.
Schema.org publishes and maintains different schema vocabularies for different kinds of sites. For example, for events such as a concert, lecture, or festival, ticketing information can be added to the HTML (or JSON-LD) markup through the offers property. Repeated events may be structured as separate Event objects.
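Here is a sketch of the event example expressed as JSON-LD and generated from Python. `Event`, `Place`, `Offer`, `startDate`, `location`, and `offers` are schema.org vocabulary. The event itself, the price, and the URLs are invented for illustration. Before relying on this shape, check the schema.org Event type page for the full list of properties.

```python
import json

event = {
    "@context": "https://schema.org",
    "@type": "Event",
    "name": "Summer Jazz Festival",
    "startDate": "2023-12-02T18:00:00+11:00",
    "location": {"@type": "Place", "name": "Example Park", "address": "Sydney NSW"},
    # Ticketing information goes in the offers property.
    "offers": {
        "@type": "Offer",
        "url": "https://www.example.com/tickets/summer-jazz",
        "price": "49.00",
        "priceCurrency": "AUD",
        "availability": "https://schema.org/InStock",
    },
}

# Embed the JSON-LD in the page inside a script tag.
snippet = '<script type="application/ld+json">\n' + json.dumps(event, indent=2) + "\n</script>"
print(snippet)
```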
3. Article publish and modified times
The article publish and modified dates/times are super important for sorting content by recency. Both Open Graph (article:published_time and article:modified_time) and schema.org (datePublished and dateModified) support these timestamps.
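With these timestamps in the index, sorting by recency is straightforward. This sketch assumes each record carries schema.org-style `datePublished` and `dateModified` fields as ISO 8601 strings. It treats the modified time as the "last updated" date and falls back to the publish time when no modified time exists.

```python
from datetime import datetime

articles = [
    {"title": "Holiday gift guide",
     "datePublished": "2022-11-20T09:00:00+00:00", "dateModified": "2023-01-05T10:30:00+00:00"},
    {"title": "Speaker buying guide",
     "datePublished": "2023-03-14T08:00:00+00:00", "dateModified": "2023-03-14T08:00:00+00:00"},
    {"title": "Battery care tips", "datePublished": "2021-06-01T12:00:00+00:00"},
]


def last_updated(article: dict) -> datetime:
    # Prefer the modified time; fall back to the publish time when it is missing.
    stamp = article.get("dateModified") or article["datePublished"]
    return datetime.fromisoformat(stamp)


by_recency = sorted(articles, key=last_updated, reverse=True)
print([a["title"] for a in by_recency])
# ['Speaker buying guide', 'Holiday gift guide', 'Battery care tips']
```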
4. Identify header and footer content
Miscellaneous content, such as your nav, footer, and anything else not specific to the page, should sit inside the header and footer elements so search engines know to ignore it. Marking up header and footer content gives the search engine a better chance of understanding what the page is about, in this case by separating navigational data from body data, so the page can be indexed properly.
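The sketch below imitates, in simplified form, how an indexer might separate body text from page chrome. It ignores any text inside `<header>`, `<footer>`, and `<nav>` elements. Whether a particular crawler behaves this way is an assumption. The code only illustrates why the markup makes the distinction possible.

```python
from html.parser import HTMLParser


class BodyTextParser(HTMLParser):
    """Collect page text, skipping anything inside <header>, <footer> or <nav>."""

    SKIP = {"header", "footer", "nav"}

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0  # how many skipped elements we are currently inside
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only when we are outside every skipped element.
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())


page = """
<body>
  <header><nav><a href="/">Home</a> <a href="/shop">Shop</a></nav></header>
  <main><h1>Portable speaker</h1><p>Waterproof, 12-hour battery.</p></main>
  <footer>&copy; Example Store</footer>
</body>
"""
parser = BodyTextParser()
parser.feed(page)
print(" ".join(parser.chunks))  # Portable speaker Waterproof, 12-hour battery.
```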
5. Augmenting your search index
Search indexes can be enriched with data in a variety of ways such as:
- Adding color metadata via the Google Vision API
- Using third-party data such as product ratings
- Extracting values from incoming data to create filters and facets
Data can be enriched as new information is added to the index. Search engines use this enriched data to return better results and help customers find what they are looking for faster. E-commerce sites update their items frequently, so enrichment can be applied as part of each update.
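The enrichment step can be as simple as a function applied to each record before it is indexed. In this sketch, the ratings feed, SKUs, and field names are hypothetical. The color is pulled from the product title by matching against a keyword list. This keyword match only stands in for color metadata and is not a call to the Google Vision API.

```python
# Hypothetical third-party ratings feed, keyed by SKU.
ratings_feed = {
    "SPK-100": {"rating": 4.6, "review_count": 212},
    "CHG-200": {"rating": 4.1, "review_count": 37},
}

products = [
    {"sku": "SPK-100", "title": "Portable Bluetooth Speaker - Blue", "price": 79.0},
    {"sku": "CHG-200", "title": "Phone Charger - Black", "price": 25.0},
    {"sku": "CBL-300", "title": "USB-C Cable", "price": 12.0},
]

KNOWN_COLORS = {"black", "blue", "red", "white"}  # hypothetical facet vocabulary


def enrich(record: dict) -> dict:
    enriched = dict(record)  # copy so the incoming record isn't mutated
    # 1. Merge third-party data when the feed has an entry for this SKU.
    enriched.update(ratings_feed.get(record["sku"], {}))
    # 2. Extract a facet-friendly color field from the title text.
    words = {w.strip(" -").lower() for w in record["title"].split()}
    colors = sorted(words & KNOWN_COLORS)
    if colors:
        enriched["color"] = colors[0]
    return enriched


index_records = [enrich(p) for p in products]
print(index_records[0])
```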
6. Business performance data
Your index is more than your content. Off-site data, such as product ratings, margins, and inventory levels, can be very useful for ranking results. Many products may be relevant to a customer searching your site, but your business data can be used to push the best ones to the top. With Search.io, we offer a feature called dynamic boosting that can help customers build conversion flywheels using this kind of business data.
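Search.io's dynamic boosting is configured within the product, and the sketch below is not that feature. It is a hypothetical linear blend that shows the general idea of adjusting a relevance score with business signals. The field names (`relevance`, `rating`, `margin`, `stock`) and the weights are assumptions that you would tune for your own data.

```python
# Hypothetical weights: how much each business signal can move the score.
WEIGHTS = {"rating": 0.2, "margin": 0.3, "out_of_stock_penalty": 0.5}


def boosted_score(product: dict, weights: dict = WEIGHTS) -> float:
    score = product["relevance"]                                   # text relevance from the engine
    score += weights["rating"] * product.get("rating", 0.0) / 5.0  # 0-5 stars scaled to 0-1
    score += weights["margin"] * product.get("margin", 0.0)        # margin as a 0-1 fraction
    if product.get("stock", 0) == 0:
        score *= weights["out_of_stock_penalty"]                   # demote items you can't ship
    return score


candidates = [
    {"sku": "A", "relevance": 0.80, "rating": 4.8, "margin": 0.35, "stock": 12},
    {"sku": "B", "relevance": 0.85, "rating": 3.9, "margin": 0.10, "stock": 40},
    {"sku": "C", "relevance": 0.90, "rating": 4.9, "margin": 0.40, "stock": 0},
]
ranked = sorted(candidates, key=boosted_score, reverse=True)
print([p["sku"] for p in ranked])  # ['A', 'B', 'C']
```

Item C has the highest text relevance, but it drops to last because it is out of stock. Item A overtakes B on rating and margin.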
7. Campaign data
Many retailers run quarterly, seasonal, or holiday sales. By adding campaign data to your site index, you can adjust results to display sale items.
You could add a specific sale field, or store both a regular and a display price so that a discount can be calculated. In the latter case, the search engine will know that your display price is lower than your regular price. This is helpful for sorting discounted items so visitors can find the best savings. You can also use an algorithm (via Search.io relevance settings) to boost items based on their sale status or other properties.
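Here is a sketch of the price-comparison approach, which derives sale fields at index time from two price fields. The field names `regular_price`, `display_price`, `on_sale`, and `discount_pct` are hypothetical.

```python
products = [
    {"sku": "SPK-100", "regular_price": 99.0, "display_price": 79.0},
    {"sku": "CHG-200", "regular_price": 30.0, "display_price": 30.0},
    {"sku": "HDP-400", "regular_price": 250.0, "display_price": 175.0},
]


def add_sale_fields(p: dict) -> dict:
    # A positive gap between regular and display price means the item is on sale.
    discount = p["regular_price"] - p["display_price"]
    pct = round(100 * discount / p["regular_price"], 1) if p["regular_price"] else 0.0
    return {**p, "on_sale": discount > 0, "discount_pct": pct}


indexed = [add_sale_fields(p) for p in products]

# Sort sale items by biggest saving first.
best_savings = sorted((p for p in indexed if p["on_sale"]),
                      key=lambda p: p["discount_pct"], reverse=True)
print([(p["sku"], p["discount_pct"]) for p in best_savings])
# [('HDP-400', 30.0), ('SPK-100', 20.2)]
```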
8. Filters and facets
Search filters and facets can be built using your search index. Search.io will infer and create filters automatically, but you can also design custom filters when needed. Determining the best filters to offer comes down to understanding your customers and how they want to slice and dice your products. Check out our guide on filters and facets for more.
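At its core, a facet is a count of the values a field takes across the matching records, and a filter keeps only the records that have the selected values. The sketch below shows that idea on a few invented records. It does not show how Search.io infers filters automatically.

```python
from collections import Counter

records = [
    {"title": "Portable speaker", "brand": "Acme", "color": "blue"},
    {"title": "Bookshelf speaker", "brand": "Acme", "color": "black"},
    {"title": "Party speaker", "brand": "Boomr", "color": "black"},
    {"title": "Phone charger", "brand": "Voltix"},
]


def facet_counts(records: list[dict], fields: list[str]) -> dict[str, Counter]:
    # For each facet field, count how many records carry each value.
    return {f: Counter(r[f] for r in records if f in r) for f in fields}


def apply_filters(records: list[dict], **filters) -> list[dict]:
    # Keep records that match every selected facet value.
    return [r for r in records if all(r.get(k) == v for k, v in filters.items())]


print(facet_counts(records, ["brand", "color"]))
print([r["title"] for r in apply_filters(records, brand="Acme", color="black")])
# ['Bookshelf speaker']
```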
9. Content type
There are different meta tags available to help a search index understand content by type. Is the content going to take visitors to a video, a document, a page, or something else? Use HTML or JSON-LD tags to identify your content as a video, audio, an abstract, etc., so your search index can sort or filter content by type.
10. Personalization
Customers expect and want search results to be personalized. If you offer free shipping for members, that information should be in the data. If there's a discount by location, then you'll want to have geo data in your records, too. By connecting your search index to this data, you can easily personalize search results.
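The sketch below re-orders results using two hypothetical record fields, `free_shipping_for_members` and `discount_regions`, together with a hypothetical user profile. The boost amounts are arbitrary. A real setup would tune them or rely on the search provider's personalization features.

```python
def personalize(results: list[dict], user: dict) -> list[dict]:
    """Re-order results using hypothetical member and location fields."""

    def adjusted(item: dict) -> float:
        score = item["score"]
        if user.get("is_member") and item.get("free_shipping_for_members"):
            score += 0.2  # members see free-shipping items higher
        if user.get("region") in item.get("discount_regions", []):
            score += 0.3  # items discounted in the user's region move up
        return score

    return sorted(results, key=adjusted, reverse=True)


results = [
    {"sku": "A", "score": 1.0},
    {"sku": "B", "score": 0.9, "free_shipping_for_members": True},
    {"sku": "C", "score": 0.85, "discount_regions": ["NSW"]},
]
print([r["sku"] for r in personalize(results, {"is_member": True, "region": "NSW"})])  # ['C', 'B', 'A']
print([r["sku"] for r in personalize(results, {})])  # ['A', 'B', 'C']
```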
11. Integration with other third-party systems
Big businesses often have complex infrastructure with data coming from various systems. Need to integrate with your supply chain management or PIM? You'll want your search solution to support a REST API to enable instant indexing of data between systems.
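The exact request an API-based integration sends depends on your search provider's API. The sketch below builds, but does not send, a JSON POST request with Python's `urllib.request` to push an inventory update. The endpoint URL, the `/records` path, the payload shape, and the bearer-token header are all hypothetical placeholders.

```python
import json
import urllib.request


def build_upsert_request(base_url: str, api_key: str, record: dict) -> urllib.request.Request:
    # Hypothetical endpoint and auth scheme: replace with your provider's documented API.
    body = json.dumps({"record": record}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/records",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
    )


req = build_upsert_request(
    "https://search.example.com/v1/collections/products",  # placeholder URL
    "YOUR_API_KEY",
    {"sku": "SPK-100", "stock": 14},
)
# urllib.request.urlopen(req) would send it; that call is left out of this sketch.
print(req.get_method(), req.full_url)
```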
12. Review the metrics
Site owners should plan to spend some amount of time reviewing their search metrics to identify the keywords customers are querying. Understanding how customers search can help identify opportunities to enrich the index, add or adjust filters, and improve search engine results.
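You can take a first pass at search metrics using a raw query log. This sketch assumes a hypothetical log of (query, result count) pairs and normalizes case. It then reports the most frequent queries and the queries that returned no results. Zero-result queries often give the clearest signal that something is missing from the index.

```python
from collections import Counter

# Hypothetical query log: (query as typed, number of results returned)
query_log = [
    ("portable speaker", 24), ("Portable Speaker", 24), ("bluetooth speaker", 31),
    ("waterproof speaker", 0), ("waterproof speaker", 0), ("charger", 12),
]


def summarize(log: list[tuple[str, int]], top_n: int = 3):
    normalized = [(q.strip().lower(), n) for q, n in log]
    top_queries = Counter(q for q, _ in normalized).most_common(top_n)
    zero_results = Counter(q for q, n in normalized if n == 0)
    return top_queries, zero_results


top, zero = summarize(query_log)
print(top)   # [('portable speaker', 2), ('waterproof speaker', 2), ('bluetooth speaker', 1)]
print(zero)  # Counter({'waterproof speaker': 2})
```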
Building a rich search index can greatly improve search performance and customer satisfaction. By understanding the different types of data that can be included, site owners can make sure they are providing the best possible search experience for their customers.
To learn more about how to set up your search index or take advantage of our personalization and dynamic boosting features, contact us today! We offer a free trial so you can explore all that our solution has to offer.