Google Generative AI
Chroma also provides a convenient wrapper around Google's embedding API. This embedding function runs remotely on Google servers, and requires an API key. You can get an API key by signing up for an account at Google MakerSuite.
Model | Input | Dimensionality |
---|---|---|
models/embedding-001 | English | 768 |
Basic Usage
Python
This embedding function relies on the google-generativeai Python package, which you can install with pip install google-generativeai.
pip install google-generativeai
# import
import chromadb
from chromadb.utils import embedding_functions
# create a client (an in-memory client is assumed here)
client = chromadb.Client()
# use directly
google_ef = embedding_functions.GoogleGenerativeAiEmbeddingFunction(api_key="YOUR_API_KEY")
google_ef(["document1","document2"])
# pass the embedding function to a collection so that .add and .query embed text for you
collection = client.create_collection(name="name", embedding_function=google_ef)
collection = client.get_collection(name="name", embedding_function=google_ef)
You can view a more complete example of chatting over documents with Gemini embedding and language models.
For more information, see the official Google Python docs.
Javascript
This embedding function relies on the @google/generative-ai npm package, which you can install with yarn add @google/generative-ai.
yarn add @google/generative-ai
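import { ChromaClient, GoogleGenerativeAiEmbeddingFunction } from 'chromadb'
// create a client (default connection settings are assumed here)
const client = new ChromaClient()
const embedder = new GoogleGenerativeAiEmbeddingFunction({googleApiKey: "<YOUR API KEY>"})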
// use directly
const embeddings = await embedder.generate(["document1","document2"])
// pass the embedding function to a collection so that .add and .query embed text for you
const collection = await client.createCollection({name: "name", embeddingFunction: embedder})
const collectionGet = await client.getCollection({name:"name", embeddingFunction: embedder})
You can view a more complete example using Node.
For more information, see the official Google JS docs.
Advanced Usage
Call directly
When you pass the embedding function to a Collection, Chroma embeds documents and queries for you. However, in some cases you may want to generate the embeddings yourself and handle them outside of Chroma.
Python
# google_ef is the embedding function created in Basic Usage
embeddings = google_ef(["document1","document2"])
# [[0.04565250128507614, 0.01611952856183052...], [0.030171213671565056, 0.007690359838306904...]]
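Once you hold the vectors yourself, you can compare them without going through a collection. The following is a minimal sketch that ranks documents against a query by cosine similarity using only the standard library. The three-dimensional vectors are made-up stand-ins for real 768-dimensional output, and cosine_similarity and rank_documents are illustrative helper names, not part of Chroma.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_embedding, doc_embeddings):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [
        (i, cosine_similarity(query_embedding, emb))
        for i, emb in enumerate(doc_embeddings)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for embedding function output (assumption).
doc_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query_embedding = [0.9, 0.1, 0.0]

ranking = rank_documents(query_embedding, doc_embeddings)
print([index for index, _ in ranking])
```

In real use you would replace the toy vectors with the lists returned by the embedding function.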
Javascript
const embeddings = await embedder.generate(["document1","document2"])
// [[0.04565250128507614, 0.01611952856183052...], [0.030171213671565056, 0.007690359838306904...]]
Task Type
Google's embedding endpoint also accepts a task_type/taskType parameter. Setting it to match how the embeddings will be used may improve results for your use case.
Task Type | Description |
---|---|
RETRIEVAL_QUERY | Specifies the given text is a query in a search/retrieval setting. |
RETRIEVAL_DOCUMENT | Specifies the given text is a document in a search/retrieval setting. Using this task type requires a title. |
SEMANTIC_SIMILARITY | Specifies the given text will be used for Semantic Textual Similarity (STS). |
CLASSIFICATION | Specifies that the embeddings will be used for classification. |
CLUSTERING | Specifies that the embeddings will be used for clustering. |
Here is a Python demonstration of how to use RETRIEVAL_QUERY with RETRIEVAL_DOCUMENT.
# import
import chromadb
from chromadb.utils import embedding_functions
# create a client (an in-memory client is assumed here)
client = chromadb.Client()
google_ef = embedding_functions.GoogleGenerativeAiEmbeddingFunction(api_key="YOUR_API_KEY", task_type='RETRIEVAL_DOCUMENT')
# create the collection with the document-side embedding function
collection = client.create_collection(name="name", embedding_function=google_ef)
# add your documents
collection.add(...)
# create a new EF for Query and re-get your collection
google_ef2 = embedding_functions.GoogleGenerativeAiEmbeddingFunction(api_key="YOUR_API_KEY", task_type='RETRIEVAL_QUERY')
collection = client.get_collection(name="name", embedding_function=google_ef2)
# query your documents
collection.query(...)
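Because the task type is passed as a plain string, it can help to check it before building an embedding function. The following is a minimal sketch of such a check. The names come from the Task Type table above, while TaskType and normalize_task_type are hypothetical helpers, not part of Chroma or Google's SDK.

```python
from enum import Enum


class TaskType(str, Enum):
    """Task type names listed in the Task Type table."""
    RETRIEVAL_QUERY = "RETRIEVAL_QUERY"
    RETRIEVAL_DOCUMENT = "RETRIEVAL_DOCUMENT"
    SEMANTIC_SIMILARITY = "SEMANTIC_SIMILARITY"
    CLASSIFICATION = "CLASSIFICATION"
    CLUSTERING = "CLUSTERING"


def normalize_task_type(value):
    """Return the canonical task type name, or raise ValueError."""
    try:
        return TaskType(value.strip().upper()).value
    except ValueError:
        valid = ", ".join(t.value for t in TaskType)
        raise ValueError(
            f"unknown task type {value!r}; expected one of: {valid}"
        ) from None


# Hypothetical usage: clean up user input before passing it as task_type.
print(normalize_task_type(" retrieval_query "))
```

The returned string can then be passed as the task_type argument shown in the demonstration above.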