Want a managed solution? Check out Sourcebot Cloud.

Sourcebot is open source and can be self-hosted using our official Docker image.

Quick Start Guide

1. Create a config

By default, Sourcebot requires a configuration file with a list of code host connections that specify what repositories should be synced (cloned and indexed). To get started, run the following command to create a starter config.json:

touch config.json
echo '{
    "$schema": "https://raw.githubusercontent.com/sourcebot-dev/sourcebot/main/schemas/v3/index.json",
    "connections": {
        // Comments are supported
        "starter-connection": {
            "type": "github",
            "repos": [
                "sourcebot-dev/sourcebot"
            ]
        }
    }
}' > config.json

This config creates a single GitHub connection named starter-connection that specifies Sourcebot as a repo to sync.
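To sync more than one repository, the repos array can hold additional entries. The following is a minimal sketch that writes the same starter config with a heredoc and adds a second entry. The name your-org/your-repo is a hypothetical placeholder, not a value from this guide:

```shell
# Write config.json with a heredoc. Quoting 'EOF' stops the shell from
# trying to expand "$schema" as a variable.
cat > config.json <<'EOF'
{
    "$schema": "https://raw.githubusercontent.com/sourcebot-dev/sourcebot/main/schemas/v3/index.json",
    "connections": {
        "starter-connection": {
            "type": "github",
            "repos": [
                "sourcebot-dev/sourcebot",
                // Hypothetical placeholder: replace with a repo you want synced
                "your-org/your-repo"
            ]
        }
    }
}
EOF
```

The heredoc avoids the separate touch step and makes the file easier to edit in place than a quoted echo string.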

2. Launch your instance

Sourcebot is packaged as a single Docker image. In the same directory as config.json, run the following command to start your instance:

docker run \
    -p 3000:3000 \
    --pull=always \
    --rm \
    -v "$(pwd)":/data \
    -e CONFIG_PATH=/data/config.json \
    --name sourcebot \
    ghcr.io/sourcebot-dev/sourcebot:latest

Navigate to localhost:3000 to start searching the Sourcebot repo.
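The instance may take a moment to become reachable after the container starts. As a rough sketch, a small helper can poll the port with curl until it answers. The function name wait_for_url and its default retry values are assumptions made for illustration, not part of Sourcebot:

```shell
# wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
# Hypothetical helper: polls URL with curl until it responds or attempts run out.
wait_for_url() {
    url="$1"
    attempts="${2:-30}"
    delay="${3:-2}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -s: silent, -f: fail on HTTP status >= 400, -o: discard the body
        if curl -sf -o /dev/null "$url"; then
            echo "Reachable: $url"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "Gave up waiting for $url after $attempts attempts" >&2
    return 1
}
```

For example, run wait_for_url http://localhost:3000 in a second terminal after starting the container.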

Hit an issue? Please let us know on GitHub discussions or by emailing us.

3. Link your code

Sourcebot supports indexing public and private code on multiple code hosts, including GitHub and GitLab.

Missing your code host? Submit a feature request on GitHub.

Architecture

Sourcebot ships as a single Docker container that uses supervisord to run the following components:

  • Web Server: the main Next.js web application that serves the Sourcebot UI.
  • Backend Worker: a Node.js process that incrementally syncs with code hosts (e.g., GitHub, GitLab) and asynchronously indexes the configured repositories.
  • Zoekt: the open-source, trigram-based code search engine that powers Sourcebot under the hood.
  • Postgres: a transactional database that stores business-logic data.
  • Redis Job Queue: a fast in-memory store, used with BullMQ to queue asynchronous work.
  • .sourcebot/ cache: a file-system cache where persistent data is written.

You can use managed Redis and Postgres services that run outside the Sourcebot container by providing the REDIS_URL and DATABASE_URL environment variables, respectively. See the configuration documentation for more options.
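As an illustration, the two variables can be passed with -e alongside the flags from the Quick Start Guide. This is a sketch under stated assumptions: the hostnames, credentials, and database name below are hypothetical placeholders, and start_sourcebot is just a wrapper name chosen for this example:

```shell
# Hypothetical connection strings: replace with your managed services.
REDIS_URL="redis://redis.example.internal:6379"
DATABASE_URL="postgresql://sourcebot:change-me@db.example.internal:5432/sourcebot"

# Wrapper (name is an assumption) that starts Sourcebot against the
# external Redis and Postgres instances defined above.
start_sourcebot() {
    docker run \
        -p 3000:3000 \
        --rm \
        -v "$(pwd)":/data \
        -e CONFIG_PATH=/data/config.json \
        -e REDIS_URL="$REDIS_URL" \
        -e DATABASE_URL="$DATABASE_URL" \
        --name sourcebot \
        ghcr.io/sourcebot-dev/sourcebot:latest
}
```

Call start_sourcebot from the directory that contains config.json. Keeping the URLs in shell variables makes it easy to source them from a secrets file instead of hard-coding them.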

Scalability

One of our design philosophies for Sourcebot is to keep our infrastructure radically simple while balancing scalability concerns. Depending on the number of repositories you have indexed and the size of the instance you run Sourcebot on, you may experience slow search times or other performance degradation. We recommend scaling your instance vertically by increasing the number of CPU cores and the amount of memory.

Sourcebot does not support horizontal scaling at this time, but it is on our roadmap. If this is something your team would be interested in, please contact us at team@sourcebot.dev.

Telemetry

By default, Sourcebot collects anonymized usage data through PostHog to help us improve the performance and reliability of our tool. We don’t collect or transmit any information related to your codebase. In addition, all events are sanitized to ensure that no sensitive details (e.g., IP address, query info) leave your machine.

The data we collect includes general usage statistics and metadata such as query performance (e.g., search duration, error rates) to monitor the application’s health and functionality. This information helps us better understand how Sourcebot is used and where improvements can be made.

If you’d like to disable all telemetry, you can do so by setting the environment variable SOURCEBOT_TELEMETRY_DISABLED to true:

docker run \
  -e SOURCEBOT_TELEMETRY_DISABLED=true \
  /* additional args */ \
  ghcr.io/sourcebot-dev/sourcebot:latest

If you disabled telemetry correctly, you’ll see the following log when starting Sourcebot:

Disabling telemetry since SOURCEBOT_TELEMETRY_DISABLED was set.
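One way to check for that line without reading the full startup output is to search the container logs. The sketch below assumes the container is running under the name sourcebot (as in the Quick Start Guide) and that the message appears verbatim; the helper name telemetry_disabled is made up for this example:

```shell
# telemetry_disabled [CONTAINER]
# Hypothetical helper: succeeds if the container's logs contain the
# telemetry-disabled message shown above.
telemetry_disabled() {
    container="${1:-sourcebot}"
    # docker logs writes to both stdout and stderr, so merge them before searching.
    docker logs "$container" 2>&1 |
        grep -qF "Disabling telemetry since SOURCEBOT_TELEMETRY_DISABLED was set."
}
```

For example, telemetry_disabled sourcebot && echo "Telemetry is off" prints a confirmation only when the message is found.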