The new semantic field: Simplifying semantic search in OpenSearch

Fri, Jul 11, 2025 · Bo Zhang, Fanit Kolchina

Semantic search improves result relevance by using a machine learning (ML) model to generate dense or sparse vector embeddings from unstructured text. Traditionally, enabling semantic search has required several manual steps: defining an embedding field, setting up an ingest pipeline, and including the model ID in every query.

OpenSearch 3.1 streamlines this process with the semantic field type. Now you only need to register and deploy your ML model and then reference its ID in the index mapping. OpenSearch handles the rest of the work: it automatically creates the necessary embedding field, generates embeddings during ingestion, and resolves the model during query execution. The following diagram illustrates semantic search using a semantic field.

How to use a semantic field

To use a semantic field, follow these steps:

Register and deploy a model: Register and deploy an ML model, such as one from Hugging Face, in OpenSearch.
Create an index with a semantic field: Define an index mapping that includes a semantic field and link it to the model using the model ID.
Index documents: Index raw text documents directly—OpenSearch will automatically generate and store the embeddings.
Run a semantic search query: Use a neural query to semantically search your data without manually handling embeddings.

Each of these steps is detailed in the next sections.

Step 1: Register and deploy a model

Start by registering and deploying a text embedding model. For example, the following request registers a pretrained sentence transformer model from Hugging Face:

POST /_plugins/_ml/models/_register?deploy=true
{
  "name": "huggingface/sentence-transformers/all-MiniLM-L6-v2",
  "version": "1.0.2",
  "model_format": "TORCH_SCRIPT"
}
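
If you prefer to script this step, the registration call can be prepared from Python. The following is a minimal sketch using only the standard library. The cluster URL and the helper name build_register_request are assumptions for illustration, and the request is built but not sent:

```python
import json
import urllib.request

# Assumption: a local, unsecured cluster. Adjust the URL and add auth as needed.
BASE_URL = "http://localhost:9200"


def build_register_request(name: str, version: str, model_format: str) -> urllib.request.Request:
    """Build (but do not send) the register-and-deploy request for a pretrained model."""
    body = {"name": name, "version": version, "model_format": model_format}
    return urllib.request.Request(
        url=f"{BASE_URL}/_plugins/_ml/models/_register?deploy=true",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_register_request(
    "huggingface/sentence-transformers/all-MiniLM-L6-v2", "1.0.2", "TORCH_SCRIPT"
)
# Sending is left to the caller, for example: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes it easy to inspect the payload before it reaches the cluster.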

After deployment, retrieve the model’s configuration to verify key details:

GET /_plugins/_ml/models/No0hhZcBnsM8JstbBkjQ
{
    "name": "huggingface/sentence-transformers/all-MiniLM-L6-v2",
    "model_group_id": "Lo0hhZcBnsM8JstbA0hg",
    "algorithm": "TEXT_EMBEDDING",
    "model_version": "1",
    "model_format": "TORCH_SCRIPT",
    "model_state": "DEPLOYED",
    "model_config": {
        "model_type": "bert",
        "embedding_dimension": 384,
        "additional_config": {
            "space_type": "l2"
        },
        ...
    },
    ...
}

The response includes metadata such as the embedding_dimension and space_type. OpenSearch uses this information to automatically create the underlying embedding field when you define the semantic field in your index mapping.
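
To make the relationship concrete, the sketch below derives a knn_vector definition from the two values in the model response. It is an illustration only, not OpenSearch's internal logic: the helper name is hypothetical, and the engine and method name are simply copied from the mapping that OpenSearch returns in Step 2.

```python
def knn_vector_mapping(model_response: dict) -> dict:
    """Derive a knn_vector field definition from a model's configuration."""
    config = model_response["model_config"]
    return {
        "type": "knn_vector",
        "dimension": config["embedding_dimension"],
        "method": {
            # Assumption: engine and method name mirror the mapping shown in Step 2.
            "engine": "faiss",
            "space_type": config["additional_config"]["space_type"],
            "name": "hnsw",
            "parameters": {},
        },
    }


# Trimmed copy of the model response shown above.
model_response = {
    "model_config": {
        "model_type": "bert",
        "embedding_dimension": 384,
        "additional_config": {"space_type": "l2"},
    }
}

embedding_field = knn_vector_mapping(model_response)
```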

Step 2: Create an index with a semantic field

To use the model for indexing and search, create an index with a semantic field and specify the model ID:

PUT /my-nlp-index
{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "text"
      },
      "text": {
        "type": "semantic",
        "model_id": "No0hhZcBnsM8JstbBkjQ"
      }
    }
  }
}

OpenSearch automatically adds a knn_vector field and stores relevant model metadata in the text_semantic_info field. To verify the mapping, send the following request:

GET /my-nlp-index/_mappings
{
    "my-nlp-index": {
        "mappings": {
            "properties": {
                "id": {
                    "type": "text"
                },
                "text": {
                    "type": "semantic",
                    "model_id": "No0hhZcBnsM8JstbBkjQ",
                    "raw_field_type": "text"
                },
                "text_semantic_info": {
                    "properties": {
                        "embedding": {
                            "type": "knn_vector",
                            "dimension": 384,
                            "method": {
                                "engine": "faiss",
                                "space_type": "l2",
                                "name": "hnsw",
                                "parameters": {}
                            }
                        },
                        "model": {
                            "properties": {
                                "id": {
                                    "type": "text",
                                    "index": false
                                },
                                "name": {
                                    "type": "text",
                                    "index": false
                                },
                                "type": {
                                    "type": "text",
                                    "index": false
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
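
A script can run the same check against the mapping response. The sketch below assumes the generated field is named by appending _semantic_info to the semantic field's name, as in the response above. The helper name is hypothetical:

```python
def generated_embedding_type(mappings_response: dict, index: str, field: str) -> str:
    """Return the type of the embedding field generated for a semantic field."""
    properties = mappings_response[index]["mappings"]["properties"]
    if properties[field]["type"] != "semantic":
        raise ValueError(f"{field!r} is not a semantic field")
    # Assumption: the generated field is named "<field>_semantic_info".
    info = properties[f"{field}_semantic_info"]
    return info["properties"]["embedding"]["type"]


# Trimmed copy of the mapping response shown above.
mappings_response = {
    "my-nlp-index": {
        "mappings": {
            "properties": {
                "text": {"type": "semantic", "model_id": "No0hhZcBnsM8JstbBkjQ"},
                "text_semantic_info": {
                    "properties": {
                        "embedding": {"type": "knn_vector", "dimension": 384}
                    }
                },
            }
        }
    }
}

embedding_type = generated_embedding_type(mappings_response, "my-nlp-index", "text")
```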

Step 3: Index documents

With the semantic field, there’s no need to define a custom ingest pipeline: you can index documents directly. The following examples use data from the Flickr image dataset, where each document includes a text field with an image description and an id field for the image ID:

PUT /my-nlp-index/_doc/1
{
    "text": "A West Virginia university women 's basketball team , officials , and a small gathering of fans are in a West Virginia arena .",
    "id": "4319130149.jpg"
}

PUT /my-nlp-index/_doc/2
{
    "text": "A wild animal races across an uncut field with a minimal amount of trees .",
    "id": "1775029934.jpg"
}
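
When loading many documents, the same bodies could be sent in a single request through the Bulk API instead of one request per document. The following sketch only builds the newline-delimited payload for POST /_bulk. The helper name is hypothetical, and sending the payload is left out:

```python
import json


def build_bulk_payload(index: str, docs: dict) -> str:
    """Build a Bulk API body: one action line plus one source line per document."""
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The Bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


docs = {
    "1": {
        "text": "A West Virginia university women 's basketball team , officials , and a small gathering of fans are in a West Virginia arena .",
        "id": "4319130149.jpg",
    },
    "2": {
        "text": "A wild animal races across an uncut field with a minimal amount of trees .",
        "id": "1775029934.jpg",
    },
}

payload = build_bulk_payload("my-nlp-index", docs)
```

No embedding appears in the payload: as with single-document indexing, only raw text is sent.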

OpenSearch automatically generates embeddings using the associated model. You can confirm this by retrieving a document:

GET /my-nlp-index/_doc/1
{
    "_index": "my-nlp-index",
    "_id": "1",
    "_version": 1,
    "_seq_no": 0,
    "_primary_term": 1,
    "found": true,
    "_source": {
        "text": "A West Virginia university women 's basketball team , officials , and a small gathering of fans are in a West Virginia arena .",
        "id": "4319130149.jpg",
        "text_semantic_info": {
            "model": {
                "name": "huggingface/sentence-transformers/all-MiniLM-L6-v2",
                "id": "No0hhZcBnsM8JstbBkjQ",
                "type": "TEXT_EMBEDDING"
            },
            "embedding": [
                -0.086742505
                ...
            ]
        }
    }
}

The response includes the embedding and model metadata in the text_semantic_info field.

Step 4: Run a neural search query

To perform semantic search, use a neural query with the semantic field. OpenSearch uses the model defined in the mapping to generate the query embedding:

GET /my-nlp-index/_search
{
  "_source": {
    "excludes": [
      "text_semantic_info"
    ]
  },
  "query": {
    "neural": {
      "text": {
        "query_text": "wild west",
        "k": 1
      }
    }
  }
}
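
The request body above has no model_id because the model is resolved from the mapping. A small helper can generate the same body for any semantic field. This is a sketch: the function name is hypothetical, and it assumes the generated field follows the <field>_semantic_info naming shown earlier:

```python
def build_neural_query(field: str, query_text: str, k: int, exclude_generated: bool = True) -> dict:
    """Build a neural query against a semantic field; no model ID is needed."""
    body = {"query": {"neural": {field: {"query_text": query_text, "k": k}}}}
    if exclude_generated:
        # Keep the large embedding and model metadata out of the returned _source.
        body["_source"] = {"excludes": [f"{field}_semantic_info"]}
    return body


search_body = build_neural_query("text", "wild west", k=1)
```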

The query returns the following results:

{
    "took": 15,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 0.42294958,
        "hits": [
            {
                "_index": "my-nlp-index",
                "_id": "2",
                "_score": 0.42294958,
                "_source": {
                    "text": "A wild animal races across an uncut field with a minimal amount of trees .",
                    "id": "1775029934.jpg"
                }
            }
        ]
    }
}
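
In application code, you typically need only the ID, score, and text of each hit. The following sketch extracts them from a response shaped like the one above. The helper name is hypothetical:

```python
def summarize_hits(response: dict) -> list:
    """Return (id, score, text) tuples for every hit in a search response."""
    return [
        (hit["_id"], hit["_score"], hit["_source"].get("text"))
        for hit in response["hits"]["hits"]
    ]


# Trimmed copy of the search response shown above.
response = {
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "max_score": 0.42294958,
        "hits": [
            {
                "_index": "my-nlp-index",
                "_id": "2",
                "_score": 0.42294958,
                "_source": {
                    "text": "A wild animal races across an uncut field with a minimal amount of trees .",
                    "id": "1775029934.jpg",
                },
            }
        ],
    }
}

results = summarize_hits(response)
```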

Using the semantic field with sparse models

Using a sparse model with a semantic field is similar to using a dense model, with a few differences.

Sparse models support two modes:

Bi-encoder mode: The same model is used for both document and query embeddings.
Doc-only mode: One model is used to generate document embeddings at ingestion time, and another is used at query time.

To use the bi-encoder mode, define the semantic field as usual:

PUT /my-nlp-index
{
  "mappings": {
    "properties": {
      "id": {
        "type": "text"
      },
      "text": {
        "type": "semantic",
        "model_id": "R42oiZcBnsM8JstbUUgc"
      }
    }
  }
}

To use the doc-only mode, add a search_model_id to the mapping:

PUT /my-nlp-index
{
  "mappings": {
    "properties": {
      "id": {
        "type": "text"
      },
      "text": {
        "type": "semantic",
        "model_id": "R42oiZcBnsM8JstbUUgc",
        "search_model_id": "TY2piZcBnsM8Jstb-Uhv"
      }
    }
  }
}
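
In the mapping, the two modes differ only in whether search_model_id is present. The sketch below builds the field definition for either mode. The function name is hypothetical, and the model IDs are the example IDs used in this post:

```python
from typing import Optional


def semantic_field(model_id: str, search_model_id: Optional[str] = None) -> dict:
    """Build a semantic field definition for bi-encoder or doc-only mode."""
    field = {"type": "semantic", "model_id": model_id}
    if search_model_id is not None:
        # Doc-only mode: a separate model is used at query time.
        field["search_model_id"] = search_model_id
    return field


bi_encoder_field = semantic_field("R42oiZcBnsM8JstbUUgc")
doc_only_field = semantic_field("R42oiZcBnsM8JstbUUgc", "TY2piZcBnsM8Jstb-Uhv")

index_body = {
    "mappings": {"properties": {"id": {"type": "text"}, "text": doc_only_field}}
}
```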

Sparse embeddings use the rank_features field type. This field does not require configuration for dimension or distance space:

GET /my-nlp-index/_mappings
{
    "my-nlp-index": {
        "mappings": {
            "properties": {
                "id": {
                    "type": "text"
                },
                "text": {
                    "type": "semantic",
                    "model_id": "R42oiZcBnsM8JstbUUgc",
                    "search_model_id": "TY2piZcBnsM8Jstb-Uhv",
                    "raw_field_type": "text"
                },
                "text_semantic_info": {
                    "properties": {
                        "embedding": {
                            "type": "rank_features"
                        },
                        "model": {
                            "properties": {
                                "id": {
                                    "type": "text",
                                    "index": false
                                },
                                "name": {
                                    "type": "text",
                                    "index": false
                                },
                                "type": {
                                    "type": "text",
                                    "index": false
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
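
To see why no dimension or distance space is needed, it helps to picture a sparse embedding as a map from tokens to weights rather than a fixed-length array. The sketch below assumes that relevance is computed as a dot product over the tokens shared by the query and the document, which is a common formulation for sparse retrieval. The tokens and weights are made up for illustration and were not produced by a real model:

```python
def sparse_dot_product(query: dict, document: dict) -> float:
    """Sum the products of weights for tokens present in both sparse vectors."""
    return sum(
        weight * document[token]
        for token, weight in query.items()
        if token in document
    )


# Hypothetical token weights; a real sparse model would produce these.
query_embedding = {"wild": 2.0, "west": 1.0}
document_embedding = {"wild": 1.5, "animal": 1.0, "field": 0.5}

score = sparse_dot_product(query_embedding, document_embedding)
```

Only the shared token "wild" contributes to the score, so the two vectors can contain any number of entries.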

Using built-in analyzers

You can optionally specify a built-in search analyzer for sparse queries. This approach provides faster retrieval at the cost of a slight decrease in search relevance:

PUT /my-nlp-index
{
  "mappings": {
    "properties": {
      "id": {
        "type": "text"
      },
      "text": {
        "type": "semantic",
        "model_id": "R42oiZcBnsM8JstbUUgc",
        "semantic_field_search_analyzer": "bert-uncased"
      }
    }
  }
}

Summary

The semantic field makes it easy to bring semantic search into your OpenSearch workflows. By supporting both dense and sparse models with automatic embedding and indexing, it removes the need for custom pipelines or manual field management. Try it out with a pretrained model to streamline your document search experience.

What’s next?

Our next blog post covers advanced usage of the semantic field in OpenSearch, including chunking long text, using externally hosted or custom models, implementing cross-cluster support, and updating the model ID. Stay tuned to deepen your understanding and unlock more powerful search capabilities!
