ChromaDB Driver for SBK Framework

Storage Benchmark Kit

This driver provides performance benchmarking capabilities for ChromaDB, a vector database for AI applications.

Overview

ChromaDB is an open-source vector database designed for AI applications. This driver allows you to benchmark ChromaDB’s performance for storing and retrieving byte arrays as documents within collections.

Features

Configuration

The driver's configuration options and their default values are listed under Configuration File below.

Prerequisites

  1. ChromaDB Server: You need a running ChromaDB instance.
    # Using Docker
    docker run -p 8000:8000 chromadb/chroma

    # Or using Python
    pip install chromadb
    chroma run --host localhost --port 8000

  2. Java Dependencies: The driver automatically includes the required ChromaDB Java client.
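
Before starting a benchmark, it can help to confirm that the server from step 1 is accepting TCP connections. The following is a minimal sketch using only the Python standard library. The helper name can_connect is hypothetical and not part of SBK or ChromaDB, and the host and port are the example values used above. It only checks that the port is open, not that the process listening there is ChromaDB.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Example values from the Prerequisites section; adjust as needed.
    if can_connect("localhost", 8000):
        print("Port 8000 is accepting connections")
    else:
        print("Nothing is listening on localhost:8000")
```

A False result here corresponds to the "Connection Refused" case listed under Troubleshooting.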

Usage

Basic Usage

# Run benchmark with the default ChromaDB connection settings
./sbk -class chromadb -writers 4 -readers 4 -size 1024 -seconds 60

# Custom ChromaDB settings
./sbk -class chromadb -writers 4 -readers 4 -size 1024 -seconds 60 \
  -host localhost \
  -port 8000 \
  -collectionName my_test_collection

Configuration File

You can also modify the default settings in ChromaDB.properties:

host=localhost
port=8000
collectionName=sbk_benchmark
embeddingDimension=384
distanceFunction=cosine
ssl=false
authToken=
timeoutSeconds=30
maxRetries=3
batchSize=100
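
To illustrate how these keys fit together, here is a small Python sketch that parses the same key=value text and derives a server URL from it. This is not the driver's own loader (the driver is Java). The function parse_properties is hypothetical, and two interpretations are assumptions made for illustration: that ssl=true selects an https URL, and that an empty authToken means no token is sent.

```python
from typing import Dict, Optional

# Default values copied from the ChromaDB.properties listing above.
SAMPLE = """\
host=localhost
port=8000
collectionName=sbk_benchmark
embeddingDimension=384
distanceFunction=cosine
ssl=false
authToken=
timeoutSeconds=30
maxRetries=3
batchSize=100
"""


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an '=' separator
            props[key.strip()] = value.strip()
    return props


config = parse_properties(SAMPLE)

# Assumption: ssl=true means the server is reached over https.
scheme = "https" if config["ssl"].lower() == "true" else "http"
base_url = f"{scheme}://{config['host']}:{int(config['port'])}"

# Assumption: an empty authToken means no authentication token is used.
auth_token: Optional[str] = config["authToken"] or None

print(base_url)
print("auth token set:", auth_token is not None)
```

All values are read as strings, so numeric settings such as port, timeoutSeconds, maxRetries, and batchSize need an explicit int() conversion before use.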

Implementation Details

Data Storage

Key Generation

The driver uses a key generation strategy similar to other SBK drivers.

Error Handling

Performance Considerations

Troubleshooting

Common Issues

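  1. Connection Refused: Ensure the ChromaDB server is running and reachable at the configured host and port.
  2. Collection Not Found: The driver creates collections automatically, so check that collectionName matches the collection you expect.
  3. Memory Issues: Monitor JVM heap usage when benchmarking large datasets.
  4. Slow Performance: Try different batchSize values and check the network path between SBK and the ChromaDB server.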
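
As a rough illustration of why batch size matters for item 4, the sketch below splits a set of records into batches and counts how many requests each batch size would need. It is a generic chunking example in Python, not the driver's implementation. How the driver applies batchSize internally is not described here, and the helper name batches, the record count, and the record size are hypothetical.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of items, each with at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 250 hypothetical records of 1024 bytes each.
records = [bytes(1024) for _ in range(250)]

for size in (10, 100):
    request_count = sum(1 for _ in batches(records, size))
    print(f"batch size {size}: {request_count} requests")
```

Larger batches mean fewer round trips but bigger payloads per request, so the best value depends on record size and network latency.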

Dependencies

License

This driver is licensed under the Apache License 2.0, same as the SBK framework.