Indexing External REST API JSON directly into Sitecore Search


Today, I am going to explain, step by step, how external REST API data can be indexed directly into Sitecore Search using API crawlers. This approach lets organizations make data from external systems, such as commerce products, third-party APIs, or microservices, fully searchable without duplicating content inside a CMS.

To begin, let us first understand what Sitecore Search is. Sitecore Search is a cloud-native search solution designed to index, manage, and deliver highly relevant search experiences across websites and digital platforms. It supports structured ingestion, content crawling, relevance tuning, faceting, filtering, ranking rules, and AI-powered personalization. Unlike traditional CMS-based indexing models, Sitecore Search can index content from multiple sources, such as a CMS, headless or REST APIs, commerce platforms, and other external services. This makes it well suited to composable architectures where the CMS is no longer the single source of truth.

Steps for Indexing External REST API JSON

In modern composable architectures, not all of the required data lives inside the CMS.

For example, products come from a commerce API, pricing lives in an ERP system, reviews are stored in a third-party service such as Trustpilot, and inventory information sits in a warehouse system.

Step 1: Understand the API Response

For testing purposes, we will use the public products endpoint from https://dummyjson.com.

The endpoint returns structured JSON: a top-level products array in which each product carries fields such as id, title, and description.
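Before mapping anything, it helps to look at which fields a product actually carries. The sketch below is a local helper, not part of Sitecore Search: describeFirstProduct is a hypothetical name, and sampleResponse is a hand-written placeholder that only mirrors the field names used in this article, not real API output.

```javascript
// Illustrative sample only: the field names follow this article (products, id,
// title, description, category, price); the values are placeholders.
const sampleResponse = {
  products: [
    {
      id: 1,
      title: "Sample product",
      description: "Placeholder description",
      category: "sample-category",
      price: 9.99
    }
  ]
};

// Hypothetical helper: returns the field names of the first product together
// with the JavaScript type of each value, or {} when there are no products.
function describeFirstProduct(body) {
  const products = Array.isArray(body && body.products) ? body.products : [];
  if (products.length === 0) {
    return {};
  }
  const shape = {};
  for (const [key, value] of Object.entries(products[0])) {
    shape[key] = Array.isArray(value) ? "array" : typeof value;
  }
  return shape;
}

console.log(describeFirstProduct(sampleResponse));
```

Running the same helper against the real response body gives a quick inventory of candidate fields to choose from in the next step.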


The response looks simple and easy to integrate, but this is where many implementations go wrong: they index the entire JSON payload directly into the search index.

That creates:
• Unnecessary fields in the index
• Poor search relevance
• A large index size
• Slow query performance

Before indexing anything, think like a search architect, not like a frontend developer: decide which fields users will actually search, filter, and sort on, and index only those.
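One way to make that decision explicit is to keep a short allow-list of fields and drop everything else before it reaches the index. This is a minimal sketch under assumptions: SEARCH_FIELDS, pickSearchFields, and the internalNotes field are hypothetical names chosen for illustration.

```javascript
// Assumed allow-list: the fields this article maps into the index.
const SEARCH_FIELDS = ["id", "title", "description"];

// Hypothetical helper: copies only the allow-listed fields from a product.
function pickSearchFields(product, fields) {
  const picked = {};
  for (const field of fields) {
    if (product[field] !== undefined) {
      picked[field] = product[field];
    }
  }
  return picked;
}

// Placeholder product with one field that has no value for search.
const sampleProduct = {
  id: 1,
  title: "Sample product",
  description: "Placeholder description",
  internalNotes: "Not useful to searchers"
};

console.log(pickSearchFields(sampleProduct, SEARCH_FIELDS));
```

The allow-list also doubles as documentation of what the index is expected to contain.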

Step 2: Define Your Search Attributes


The next step is to explicitly define the attributes that will store your API data. Sitecore Search does not automatically create attributes from your JSON response. If an attribute does not exist, either ingestion fails or the corresponding data is not indexed.
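Because a missing attribute only shows up at scan time, a small local check can catch mismatches earlier. The sketch below is not a Sitecore Search API: definedAttributes is an assumed list that you would keep in step with the attributes you created, and findUndefinedAttributes is a hypothetical helper.

```javascript
// Assumed list: replace with the attribute names defined in your domain.
const definedAttributes = ["type", "id", "name", "description"];

// Hypothetical helper: returns the keys of a document that are not in the
// list of defined attributes.
function findUndefinedAttributes(doc, attributes) {
  return Object.keys(doc).filter(key => !attributes.includes(key));
}

// A document that uses "price" before the attribute has been created.
const candidate = { type: "json", id: 1, name: "Sample product", price: 9.99 };

console.log(findUndefinedAttributes(candidate, definedAttributes));
```

An empty result means every key in the document has a matching attribute in the list.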



Step 3: Add New Source


Now that the attributes are defined, the next step is to register the REST API endpoint as a new source in Sitecore Search.


The source is responsible for telling the platform where to fetch JSON data from.


1. Go to the “Sources” section in the Sitecore Search platform.

2. Click “+ Add Source” to create a new source.

3. Choose “API Crawler” as the connector type.

4. Enter a name and description for the source (e.g., “Products API”).

5. Save the source.



This sets up the endpoint as a content source from which your search engine will pull data.


Step 4: Configure Triggers


Triggers define how the API is called before the indexing process begins.


Once the source is created:


1. Scroll to the “Triggers” section on the source details page.

2. Click “Edit” and add a new trigger.

3. Set the Trigger Type to “JS (JavaScript)”.

4. In the trigger source, define the REST API request(s) to be made.

5. Set a reasonable timeout (e.g., 10,000 ms).

6. Save the trigger.


This tells the crawler how to call your REST endpoint. For the DummyJSON example, the trigger source is:


// Returns the list of requests the crawler should make; here, a single URL.
function extract() {
    return [{
        "url": "https://dummyjson.com/products"
    }];
}
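The trigger above requests a single URL. Many APIs page their results, and DummyJSON, for instance, accepts limit and skip query parameters. Assuming the trigger may return several request objects, a paginated variant could be sketched as follows. PAGE_SIZE and TOTAL_ITEMS are assumed values for illustration; take the real figures from your own API.

```javascript
// BASE_URL comes from the article; PAGE_SIZE and TOTAL_ITEMS are assumptions.
const BASE_URL = "https://dummyjson.com/products";
const PAGE_SIZE = 30;
const TOTAL_ITEMS = 90;

// Builds one request object per page, using limit/skip style paging.
function extract() {
    const requests = [];
    for (let skip = 0; skip < TOTAL_ITEMS; skip += PAGE_SIZE) {
        requests.push({
            "url": BASE_URL + "?limit=" + PAGE_SIZE + "&skip=" + skip
        });
    }
    return requests;
}
```

Hard-coding the total keeps the sketch simple; if the item count changes often, the constant has to be revisited or the paging strategy adjusted.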



Step 5: Configure Document Extractors


The document extractor converts the JSON response into indexable items.


1. Scroll to the “Document Extractors” section on the source details page.

2. Add a new extractor and choose “JS” as the extractor type.

3. Enter a JavaScript function that reads from response.body and maps JSON properties into search items.


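function extract(request, response) {
    // Fall back to an empty list if the response has no products array.
    let products = (response.body && response.body.products) || [];

    // Map each product to one index document, keeping only the needed fields.
    return products.map(item => {
        return {
            "type": "json",
            "id": item.id,
            "name": item.title,
            "description": item.description
        };
    });
}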



Modify this to include any other attributes you created, such as category or price. This step extracts the required values from your JSON and maps them into the corresponding search attributes.
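As a sketch of such a modification, the extractor below also maps category and price. It assumes that attributes named category and price were created in Step 2 and that each product in the response carries fields with those names. Skipping items without an id is a defensive choice added here, not something the platform requires.

```javascript
function extract(request, response) {
    // Fall back to an empty list if the body or its products array is missing.
    const body = response.body || {};
    const products = Array.isArray(body.products) ? body.products : [];

    return products
        // Skip entries that have no usable id.
        .filter(item => item && item.id !== undefined && item.id !== null)
        .map(item => {
            return {
                "type": "json",
                "id": item.id,
                "name": item.title,
                "description": item.description,
                "category": item.category, // assumes a "category" attribute exists
                "price": item.price        // assumes a "price" attribute exists
            };
        });
}
```

The function can be exercised locally by passing a hand-built response object, for example extract({}, { body: { products: [] } }), before pasting it into the document extractor.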


Step 6: Publish and Start the Scan



After the triggers and document extractors are set:


1. Click “Publish” at the top of the source details page.

2. Publishing does two things:

    A. It pushes your configuration live.

    B. It starts the source scanning process.


If everything is configured correctly, the scan begins fetching data from the REST API and runs the extraction logic defined in the document extractor.


You can monitor the “Last Scan Status”; it should change to “Finished” if everything works correctly.



Step 7: Verify Indexed Content


Once the scan finishes:


1. Navigate to “Content Collection” in the left navigation.

2. Click on “Content.”

3. Use the filters to look up your indexed documents.

    A. Filter on the source you assigned (e.g., “Products API”).

4. Verify that the documents display the attributes you mapped (id, name, description, etc.).



This confirms that the indexing process is working correctly.


Integrating external APIs into Sitecore Search opens up powerful possibilities for building dynamic, scalable search experiences. With the right structure and mapping in place, you can seamlessly bring external data into your search ecosystem and keep it fully in sync.


Once you understand the flow, extending this approach to other APIs or data sources becomes simple and efficient.


References


Using the Ingestion API - https://doc.sitecore.com/search/en/developers/search-developer-guide/using-the-ingestion-api-to-add-content-to-an-index.html


Mastering Website Content Indexing with Sitecore Search - https://enlightenwithamit.hashnode.dev/content-indexing-with-sitecore-search



That's All For Today,

Happy Coding

Coders for Life

Chirag Goel

I am a developer who likes to work on a range of emerging technologies.
