Google Generative AI

Chroma also provides a convenient wrapper around Google's embedding API. This embedding function runs remotely on Google servers, and requires an API key. You can get an API key by signing up for an account at Google MakerSuite.

Models | Input | Dimensionality
models/embedding-001 | English | 768
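
Rather than hard-coding the API key in source files, one common pattern is to read it from an environment variable. The sketch below assumes a variable named GOOGLE_API_KEY; that name is chosen for illustration and is not something Chroma or Google require, and the helper load_api_key is likewise hypothetical.

```python
import os


def load_api_key(var_name: str = "GOOGLE_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The variable name is an assumption made for this sketch; use
    whatever name your own deployment defines.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your API key")
    return key
```

The returned string can then be supplied wherever the examples on this page use a literal "YOUR_API_KEY".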

Basic Usage

Python

This embedding function relies on the google-generativeai Python package, which you can install with pip:

pip install google-generativeai
# import
import chromadb
from chromadb.utils import embedding_functions

# create a client (in-memory here) so the collection calls below have something to run against
client = chromadb.Client()

# use directly
google_ef = embedding_functions.GoogleGenerativeAiEmbeddingFunction(api_key="YOUR_API_KEY")
google_ef(["document1","document2"])

# pass the embedding function to the collection so that .add and .query use it
collection = client.create_collection(name="name", embedding_function=google_ef)
collection = client.get_collection(name="name", embedding_function=google_ef)

You can view a more complete example chatting over documents with Gemini embedding and language models.

For more information, see the official Google Python docs.

Javascript

This embedding function relies on the @google/generative-ai npm package, which you can install with yarn:

yarn add @google/generative-ai
import { ChromaClient, GoogleGenerativeAiEmbeddingFunction } from 'chromadb'

// create a client and the embedding function
const client = new ChromaClient()
const embedder = new GoogleGenerativeAiEmbeddingFunction({googleApiKey: "<YOUR API KEY>"})

// use directly
const embeddings = await embedder.generate(["document1","document2"])

// pass the embedding function to the collection so that .add and .query use it
const collection = await client.createCollection({name: "name", embeddingFunction: embedder})
const collectionGet = await client.getCollection({name:"name", embeddingFunction: embedder})

You can view a more complete example using Node.

For more information, see the official Google JS docs.

Advanced Usage

Call directly

When you pass the embedding function to a Collection, Chroma handles the embedding of documents and queries for you. In some cases, however, you may want to generate the embeddings yourself and handle them outside Chroma.

Python

# google_ef is the embedding function created under Basic Usage
embeddings = google_ef(["document1","document2"])
# [[0.04565250128507614, 0.01611952856183052...], [0.030171213671565056, 0.007690359838306904...]]

Javascript

// generate() is asynchronous, so await the result
const embeddings = await embedder.generate(["document1","document2"])
// [[0.04565250128507614, 0.01611952856183052...], [0.030171213671565056, 0.007690359838306904...]]
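
Once you hold the raw vectors, you can work with them outside Chroma, for example to compare two texts directly. The following is a minimal sketch using only the Python standard library. The helper name cosine_similarity and the short three-dimensional vectors are assumptions made for illustration; real vectors from models/embedding-001 have 768 dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy stand-ins for the two vectors an embedding function would return.
embeddings = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
score = cosine_similarity(embeddings[0], embeddings[1])
```

Scores closer to 1 indicate vectors pointing in more similar directions. Whether cosine similarity is the right measure depends on how you intend to use the embeddings.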

Task Type

Google's embedding endpoint also accepts a task_type/taskType parameter. Choosing the task type that matches your use case may boost performance.

Task Type | Description
RETRIEVAL_QUERY | Specifies the given text is a query in a search/retrieval setting.
RETRIEVAL_DOCUMENT | Specifies the given text is a document in a search/retrieval setting. Using this task type requires a title.
SEMANTIC_SIMILARITY | Specifies the given text will be used for Semantic Textual Similarity (STS).
CLASSIFICATION | Specifies that the embeddings will be used for classification.
CLUSTERING | Specifies that the embeddings will be used for clustering.

The following Python example indexes documents with RETRIEVAL_DOCUMENT and then queries them with RETRIEVAL_QUERY.

# import
import chromadb
from chromadb.utils import embedding_functions

# create a client (in-memory here)
client = chromadb.Client()

# embedding function used when adding documents
google_ef = embedding_functions.GoogleGenerativeAiEmbeddingFunction(api_key="YOUR_API_KEY", task_type='RETRIEVAL_DOCUMENT')

# create the collection with the document-side embedding function
collection = client.create_collection(name="name", embedding_function=google_ef)

# add your documents
collection.add(...)

# create a new EF for querying and re-get your collection with it
google_ef2 = embedding_functions.GoogleGenerativeAiEmbeddingFunction(api_key="YOUR_API_KEY", task_type='RETRIEVAL_QUERY')
collection = client.get_collection(name="name", embedding_function=google_ef2)

# query your documents
collection.query(...)
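
Because the indexing side and the query side use different task types, it can help to keep that pairing in one place. The following is a hypothetical helper, not part of Chroma or the Google SDK; the names TASK_TYPES and task_type_for, and the operation strings "add" and "query", are assumptions made for illustration.

```python
# Task types listed in the table above.
TASK_TYPES = frozenset({
    "RETRIEVAL_QUERY",
    "RETRIEVAL_DOCUMENT",
    "SEMANTIC_SIMILARITY",
    "CLASSIFICATION",
    "CLUSTERING",
})


def task_type_for(operation: str) -> str:
    """Map a collection operation to the retrieval task type paired with it."""
    mapping = {"add": "RETRIEVAL_DOCUMENT", "query": "RETRIEVAL_QUERY"}
    try:
        return mapping[operation]
    except KeyError:
        raise ValueError(f"unknown operation: {operation!r}") from None
```

The returned string could then be passed as task_type when constructing each embedding function, for example task_type=task_type_for("query") for the query-side function.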