Building a Sub-50ms Neural Search with Vector Databases and Language Embeddings

The need to effectively manage, retrieve, and analyse large-scale textual data has steered academic and industry research towards more sophisticated, semantically oriented search mechanisms. In today’s post, I’ll delve into the technical details of building an ultrafast, serverless neural search engine from a blend of two technologies: a vector database (Pinecone) and language embeddings generated with the Cohere Embed API.


1. Setting up the Environment

To initiate our implementation, we need to install a few dependencies: the Cohere and Pinecone client libraries, and the Hugging Face Datasets library.

pip install -U cohere pinecone-client datasets

The Cohere API generates vector embeddings from text data. Pinecone stores these embeddings and provides a mechanism for scalable vector search. The Hugging Face Datasets library lets us download an existing dataset for this demonstration.
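
Both services require an API key. The snippets below hard-code placeholder keys for brevity, but as a minimal sketch you could keep them in environment variables instead (a common pattern, not a requirement of either SDK; the variable names here are my own choice):

import os

# Hypothetical environment variable names; substitute your own key management.
COHERE_API_KEY = os.environ["COHERE_API_KEY"]
PINECONE_API_KEY = os.environ["PINECONE_API_KEY"]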

2. Generating Language Embeddings with Cohere API

After setting up the environment, we connect to Cohere using an API key. The Cohere API lets us generate vector representations (embeddings) of the text we want to search. Embeddings are high-dimensional numeric representations of text data that reflect its semantic content.

For this post, we use the TREC (Text REtrieval Conference) question dataset for illustration. TREC is a curated dataset containing thousands of labeled questions, which makes it well suited for a practical demonstration.

import cohere
from datasets import load_dataset

co = cohere.Client("YOUR_API_KEY")

# load the first 1,000 questions from the TREC dataset
trec = load_dataset('trec', split='train[:1000]')

# generate embeddings of the question text with Cohere's 'small' model
embeds = co.embed(texts=trec['text'], model='small', truncate='LEFT').embeddings

Here the embeddings are generated with the ‘small’ model available from Cohere. This model produces embeddings of 1024 dimensions, which is ample for capturing rich semantic detail.
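
As a quick sanity check (a minimal sketch, assuming numpy is installed), we can confirm the shape of the embedding matrix; we will also reuse it when creating the index below:

import numpy as np

# 1,000 questions, each embedded as a 1024-dimensional vector
shape = np.array(embeds).shape
print(shape)  # (1000, 1024)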

3. Storage of Embeddings: Vector Database with Pinecone

Once our embeddings are ready, we use the Pinecone API to store them. Pinecone’s vector database is built for high-dimensional embedding vectors: it scales seamlessly and provides ultrafast similarity search.

import pinecone
pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENVIRONMENT")

# create the index if it doesn't exist yet; dimension must match the embeddings
index_name = 'cohere-pinecone-trec'
if index_name not in pinecone.list_indexes():
    pinecone.create_index(index_name, dimension=shape[1], metric='cosine')
index = pinecone.Index(index_name)

# pair each embedding with a string ID and the original text as metadata
ids = [str(i) for i in range(shape[0])]
meta = [{'text': text} for text in trec['text']]
to_upsert = list(zip(ids, embeds, meta))

# upsert in batches to stay within Pinecone's request size limits
batch_size = 128
for i in range(0, shape[0], batch_size):
    i_end = min(i + batch_size, shape[0])
    index.upsert(vectors=to_upsert[i:i_end])

We provide Pinecone with a list of tuples, each containing a reference ID, an embedding vector, and an optional metadata field. The metadata field can carry additional information about the text in focus, here the original question itself.
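
To confirm the upsert succeeded, we can inspect the index statistics (a minimal sketch using the client's describe_index_stats call):

# the reported vector count should match the number of questions we embedded
print(index.describe_index_stats())  # expect a vector count of 1000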

4. Exploiting Semantic Search: Cohere and Pinecone at Work

With the index populated, we can put the setup to the test with a semantic search. To query, we pass a prompt to the Cohere API, turning it into an embedding, and use the resulting vector as the search query within Pinecone.

query = "What caused the 1929 Great Depression?"

# embed the query with the same model used for the corpus
xq = co.embed(texts=[query], model='small', truncate='LEFT').embeddings[0]

# retrieve the five nearest questions by cosine similarity
res = index.query(vector=xq, top_k=5, include_metadata=True)

for match in res['matches']:
    print(f"{match['score']:.2f}: {match['metadata']['text']}")

In a semantic search context, a query yields documents that share contextual or semantic similarity with it, even if they don’t share the same keywords. This is a key advantage of embeddings over traditional keyword search: contextual relevance is encoded into the numeric structure of the embeddings themselves.
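
To see this in action, try a paraphrase that shares almost no vocabulary with the original query (a hypothetical query for illustration; scores will vary):

# no keyword overlap with "What caused the 1929 Great Depression?"
query = "Why did the world economy collapse in the late 1920s?"
xq = co.embed(texts=[query], model='small', truncate='LEFT').embeddings[0]
res = index.query(vector=xq, top_k=5, include_metadata=True)
for match in res['matches']:
    print(f"{match['score']:.2f}: {match['metadata']['text']}")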

5. Concluding Remarks

We deployed a full-fledged serverless semantic search engine with sub-50ms query latency by combining Cohere and Pinecone. The efficiency of this pairing, and the fact that both services are fully managed, make it a robust, scalable tool for the ever-increasing demand for faster and more relevant data retrieval.
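
To check the latency claim against your own deployment, here is a minimal timing sketch (wall-clock time includes the network round-trip, so results depend on your environment and region):

import time

start = time.perf_counter()
index.query(vector=xq, top_k=5, include_metadata=True)
print(f"query latency: {(time.perf_counter() - start) * 1000:.1f} ms")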

· ai, pinecone, cohere, semantic search