Adding Knowledge Connectors

Overview

This guide covers how to add a new knowledge connector to Archestra Platform. Connectors pull data from external tools (Jira, Confluence, GitHub, GitLab, etc.) into knowledge bases on a schedule. Each connector requires:

  1. Zod schemas for config, checkpoint, and the type literal
  2. Connector class extending BaseConnector with validateConfig, testConnection, and sync
  3. Registry entry so the runtime can instantiate the connector by type string
  4. Frontend config fields component for the creation dialog

When the external service provides an official SDK, prefer it over raw fetch calls. Official SDKs typically handle pagination, authentication, and rate limiting for you and ship typed responses. For example, the GitHub connector uses @octokit/rest and the GitLab connector uses @gitbeaker/rest.

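The walkthrough below uses a simplified GitHub connector as an example. To keep the code short it calls the REST API through fetchWithRetry rather than through @octokit/rest.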

Getting Started: Let TypeScript Guide You

Add your connector type literal to ConnectorTypeSchema in backend/src/types/knowledge-connector.ts and run pnpm type-check. TypeScript will report errors in the registry, discriminated unions, and frontend switch statements -- these are exactly the files you need to update.
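The compiler can point at these files because of exhaustiveness checking. Here is a minimal sketch of the pattern, assuming a plain string-literal union in place of the Zod-inferred type. The names connectorLabel and assertNever are hypothetical and are not taken from the codebase.

```typescript
// Stand-in for the type inferred from ConnectorTypeSchema (assumption).
type ConnectorType = "jira" | "confluence" | "github";

// Hypothetical helper: a parameter of type `never` makes the compiler
// reject any call where an unhandled union member could still reach it.
function assertNever(value: never): never {
  throw new Error(`Unhandled connector type: ${String(value)}`);
}

// Hypothetical switch, similar in spirit to the ones type-check flags.
function connectorLabel(type: ConnectorType): string {
  switch (type) {
    case "jira":
      return "Jira";
    case "confluence":
      return "Confluence";
    case "github":
      return "GitHub";
    default:
      // Adding a member to ConnectorType without a matching case
      // turns this line into a compile error.
      return assertNever(type);
  }
}
```

If a fourth literal were added to the union, the default branch would stop compiling until a matching case is written.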

Type Definitions

All connector types live in a single file: backend/src/types/knowledge-connector.ts. The type system uses Zod discriminated unions keyed on a type field.

1. Add the type literal

const GITHUB = z.literal("github");

Add it to the union:

export const ConnectorTypeSchema = z.union([JIRA, CONFLUENCE, GITHUB]);

2. Define config and checkpoint schemas

Config holds the settings a user provides when creating the connector. Checkpoint holds the sync cursor so only new data is fetched on subsequent runs.

export const GithubConfigSchema = z.object({
  type: GITHUB,
  githubBaseUrl: z.string(),
  owner: z.string(),
  repo: z.string().optional(),
  labelsToSkip: z.array(z.string()).optional(),
});
export type GithubConfig = z.infer<typeof GithubConfigSchema>;

export const GithubCheckpointSchema = z.object({
  type: GITHUB,
  lastSyncedAt: z.string().optional(),
  lastIssueNumber: z.number().optional(),
});
export type GithubCheckpoint = z.infer<typeof GithubCheckpointSchema>;

3. Add to discriminated unions

export const ConnectorConfigSchema = z.discriminatedUnion("type", [
  JiraConfigSchema,
  ConfluenceConfigSchema,
  GithubConfigSchema, // <-- add here
]);

export const ConnectorCheckpointSchema = z.discriminatedUnion("type", [
  JiraCheckpointSchema,
  ConfluenceCheckpointSchema,
  GithubCheckpointSchema, // <-- add here
]);

No changes needed to ConnectorDocument, ConnectorSyncBatch, or the Connector interface -- they are connector-agnostic.
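At the type level, the shared type field is what lets TypeScript narrow a config to the right variant. The sketch below uses hand-written types instead of Zod so it has no dependencies. The Jira shape (jiraBaseUrl) is a placeholder assumption and describeConfig is a hypothetical helper; only the GitHub shape mirrors the schema shown above.

```typescript
// Hand-written equivalents of the inferred types. The Jira shape is a
// placeholder assumption; the GitHub shape mirrors GithubConfigSchema.
type JiraConfig = { type: "jira"; jiraBaseUrl: string };
type GithubConfig = {
  type: "github";
  githubBaseUrl: string;
  owner: string;
  repo?: string;
  labelsToSkip?: string[];
};
type ConnectorConfig = JiraConfig | GithubConfig;

// Hypothetical helper: checking `type` narrows the union.
function describeConfig(config: ConnectorConfig): string {
  if (config.type === "github") {
    // Here `config` is GithubConfig, so `owner` and `repo` are available.
    return config.repo ? `${config.owner}/${config.repo}` : config.owner;
  }
  // Here `config` is JiraConfig.
  return config.jiraBaseUrl;
}
```

Accessing config.owner outside the github branch would fail to compile, which is the same guarantee the discriminated union gives connector code.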

Connector Implementation

Create a new directory backend/src/knowledge-base/connectors/github/ with a github-connector.ts file.

The Connector interface

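The Connector interface defines four methods. BaseConnector supplies a default for estimateTotalItems, so a subclass must implement the other three: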

Method | Purpose
validateConfig(config) | Parse raw config with the Zod schema and run domain-specific checks (e.g., URL format). Return { valid, error? }
testConnection({ config, credentials }) | Make a lightweight API call to verify that the credentials work. Return { success, error? }
estimateTotalItems({ config, credentials, checkpoint }) | Return an estimated total item count for progress display, or null if unknown. The base class returns null by default; override it to enable progress tracking.
sync({ config, credentials, checkpoint }) | Async generator that yields ConnectorSyncBatch objects, each containing documents and an updated checkpoint
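As a rough picture of that contract, the sketch below writes the four methods as a TypeScript interface and implements them with a trivial in-memory connector. The shapes are inferred from the table and from the example in this guide, not copied from backend/src/types/knowledge-connector.ts, so the real definitions may differ. StaticConnector and SyncParams are hypothetical names.

```typescript
// Shapes inferred from this guide; the real definitions may differ.
type ConnectorCredentials = { apiToken: string };
type ConnectorDocument = { id: string; title: string; content: string };
type ConnectorSyncBatch = {
  documents: ConnectorDocument[];
  checkpoint: Record<string, unknown>;
  hasMore: boolean;
};
type SyncParams = {
  config: Record<string, unknown>;
  credentials: ConnectorCredentials;
  checkpoint: Record<string, unknown> | null;
};

interface Connector {
  validateConfig(
    config: Record<string, unknown>,
  ): Promise<{ valid: boolean; error?: string }>;
  testConnection(
    params: Omit<SyncParams, "checkpoint">,
  ): Promise<{ success: boolean; error?: string }>;
  estimateTotalItems(params: SyncParams): Promise<number | null>;
  sync(params: SyncParams): AsyncGenerator<ConnectorSyncBatch>;
}

// Hypothetical in-memory connector, useful only to show the contract.
class StaticConnector implements Connector {
  async validateConfig(config: Record<string, unknown>) {
    return typeof config.name === "string"
      ? { valid: true }
      : { valid: false, error: "name must be a string" };
  }

  async testConnection() {
    return { success: true };
  }

  async estimateTotalItems() {
    return null; // total unknown, so no progress display
  }

  async *sync(): AsyncGenerator<ConnectorSyncBatch> {
    yield {
      documents: [{ id: "1", title: "Hello", content: "# Hello" }],
      checkpoint: { type: "static", lastId: "1" },
      hasMore: false,
    };
  }
}
```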

Example implementation

import type {
  ConnectorCredentials,
  ConnectorDocument,
  ConnectorSyncBatch,
  GithubCheckpoint,
  GithubConfig,
} from "@/types/knowledge-connector";
import { GithubConfigSchema } from "@/types/knowledge-connector";
import { BaseConnector } from "../base-connector";

const BATCH_SIZE = 50;

export class GithubConnector extends BaseConnector {
  type = "github" as const;

  async validateConfig(
    config: Record<string, unknown>,
  ): Promise<{ valid: boolean; error?: string }> {
    const parsed = GithubConfigSchema.safeParse({ type: "github", ...config });
    if (!parsed.success) {
      return { valid: false, error: "Invalid GitHub configuration" };
    }
    if (!/^https?:\/\/.+/.test(parsed.data.githubBaseUrl)) {
      return {
        valid: false,
        error: "githubBaseUrl must be a valid HTTP(S) URL",
      };
    }
    return { valid: true };
  }

  async testConnection(params: {
    config: Record<string, unknown>;
    credentials: ConnectorCredentials;
  }): Promise<{ success: boolean; error?: string }> {
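    // Validate the config first so the request targets the configured
    // base URL rather than a hardcoded host.
    const parsed = GithubConfigSchema.safeParse({
      type: "github",
      ...params.config,
    });
    if (!parsed.success) {
      return { success: false, error: "Invalid GitHub configuration" };
    }
    // Make a lightweight API call to verify credentials
    try {
      const response = await this.fetchWithRetry(
        `${parsed.data.githubBaseUrl}/user`,
        {
          headers: {
            Authorization: `Bearer ${params.credentials.apiToken}`,
            Accept: "application/vnd.github.v3+json",
          },
        },
      );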
      if (!response.ok) {
        return { success: false, error: `HTTP ${response.status}` };
      }
      return { success: true };
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      return { success: false, error: `Connection failed: ${message}` };
    }
  }

  async *sync(params: {
    config: Record<string, unknown>;
    credentials: ConnectorCredentials;
    checkpoint: Record<string, unknown> | null;
    startTime?: Date;
    endTime?: Date;
  }): AsyncGenerator<ConnectorSyncBatch> {
    const parsed = GithubConfigSchema.safeParse({
      type: "github",
      ...params.config,
    });
    if (!parsed.success) {
      throw new Error("Invalid GitHub configuration");
    }
    const config = parsed.data;
    const checkpoint = (params.checkpoint as GithubCheckpoint | null) ?? {
      type: "github" as const,
    };

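    if (!config.repo) {
      // repo is optional in the schema, but this simplified example syncs a
      // single repository, so fail fast instead of requesting ".../undefined".
      throw new Error("GitHub example connector requires config.repo");
    }

    // Incremental sync: only request issues updated since the last checkpoint.
    const since = checkpoint.lastSyncedAt
      ? `&since=${encodeURIComponent(checkpoint.lastSyncedAt)}`
      : "";
    let lastSyncedAt = checkpoint.lastSyncedAt;
    let lastIssueNumber = checkpoint.lastIssueNumber;
    let page = 1;
    let hasMore = true;

    while (hasMore) {
      await this.rateLimit();

      const url = `${config.githubBaseUrl}/repos/${config.owner}/${config.repo}/issues?page=${page}&per_page=${BATCH_SIZE}&state=all&sort=updated&direction=asc${since}`;
      const response = await this.fetchWithRetry(url, {
        headers: {
          Authorization: `Bearer ${params.credentials.apiToken}`,
          Accept: "application/vnd.github.v3+json",
        },
      });
      if (!response.ok) {
        // Surface API errors instead of trying to map an error payload.
        throw new Error(`GitHub API request failed: HTTP ${response.status}`);
      }

      const issues = await response.json();
      const documents: ConnectorDocument[] = issues.map((issue: any) => ({
        id: String(issue.number),
        title: issue.title,
        content: `# ${issue.title}\n\n${issue.body ?? ""}`,
        sourceUrl: issue.html_url,
        metadata: { number: issue.number, state: issue.state },
        updatedAt: new Date(issue.updated_at),
      }));

      hasMore = issues.length >= BATCH_SIZE;
      page++;
      if (issues.length > 0) {
        // Results are sorted by updated_at ascending, so the last issue
        // carries the newest timestamp seen so far; use it as the cursor.
        const last = issues[issues.length - 1];
        lastSyncedAt = last.updated_at;
        lastIssueNumber = last.number;
      }

      yield {
        documents,
        checkpoint: { type: "github", lastSyncedAt, lastIssueNumber },
        hasMore,
      };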
    }
  }
}

BaseConnector utilities

BaseConnector provides helpers you should use rather than reimplementing:

Method | Purpose
fetchWithRetry(url, options, maxRetries?) | Fetch with exponential backoff, automatic timeout (30s), and retry on 429/5xx or network errors
rateLimit() | Sleep for the configured delay (default 100ms) between API calls to avoid rate limits
joinUrl(base, path) | Normalize and join URL parts
buildBasicAuthHeader(email, token) | Build a Basic auth header
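To make the retry behavior concrete, here is a minimal sketch of an exponential-backoff loop of the kind fetchWithRetry is described as providing. It is not the BaseConnector implementation: retryWithBackoff, MinimalResponse, and baseDelayMs are hypothetical names, the request is injected as a function so the sketch runs without a network, and the 30s timeout is left out.

```typescript
// Minimal response shape so the sketch does not depend on fetch itself.
type MinimalResponse = { ok: boolean; status: number };

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: retries on 429/5xx responses and on thrown
// (network) errors, doubling the delay after each failed attempt.
async function retryWithBackoff(
  request: () => Promise<MinimalResponse>,
  maxRetries = 3,
  baseDelayMs = 100,
): Promise<MinimalResponse> {
  for (let attempt = 0; ; attempt++) {
    try {
      const response = await request();
      const retryable = response.status === 429 || response.status >= 500;
      if (!retryable || attempt >= maxRetries) {
        // Success, a non-retryable status, or retries exhausted.
        return response;
      }
    } catch (error) {
      if (attempt >= maxRetries) {
        throw error;
      }
    }
    // Wait baseDelayMs, then 2x, 4x, ... before the next attempt.
    await sleep(baseDelayMs * 2 ** attempt);
  }
}
```

With maxRetries set to 3 the request runs at most four times. A 404 is returned immediately, since retrying a client error would not change the outcome.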

The async generator pattern

The sync method is an AsyncGenerator<ConnectorSyncBatch>. Each yield emits a batch of documents plus an updated checkpoint. The runtime persists the checkpoint after each batch, so if a sync is interrupted, it resumes from the last successful batch.

Key points:

  • Call await this.rateLimit() before each API call
  • Set hasMore: true on intermediate batches, false on the final one
  • Always include the type field in the checkpoint object
  • The checkpoint is opaque to the runtime; only your connector reads it
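The following sketch shows one way a runtime could consume such a generator and persist the checkpoint after every batch. The actual runtime code is not shown in this guide, so runSync, SaveCheckpoint, and the toy pagedSync generator are all hypothetical.

```typescript
type Batch = {
  documents: { id: string }[];
  checkpoint: Record<string, unknown>;
  hasMore: boolean;
};

// Hypothetical storage hook; the real persistence API is not shown here.
type SaveCheckpoint = (checkpoint: Record<string, unknown>) => Promise<void>;

// Hypothetical consumer: index each batch, then persist its checkpoint,
// so an interrupted run can restart from the last saved cursor.
async function runSync(
  batches: AsyncGenerator<Batch>,
  indexDocuments: (docs: { id: string }[]) => Promise<void>,
  saveCheckpoint: SaveCheckpoint,
): Promise<number> {
  let total = 0;
  for await (const batch of batches) {
    await indexDocuments(batch.documents);
    await saveCheckpoint(batch.checkpoint);
    total += batch.documents.length;
    if (!batch.hasMore) break;
  }
  return total;
}

// Toy generator that pages through an in-memory list, starting at the
// offset stored in the checkpoint.
async function* pagedSync(
  items: string[],
  checkpoint: { offset?: number } | null,
  pageSize = 2,
): AsyncGenerator<Batch> {
  let offset = checkpoint?.offset ?? 0;
  while (offset < items.length) {
    const page = items.slice(offset, offset + pageSize);
    offset += page.length;
    yield {
      documents: page.map((id) => ({ id })),
      checkpoint: { type: "toy", offset },
      hasMore: offset < items.length,
    };
  }
}
```

Because the checkpoint is saved only after a batch has been indexed, a crash between the two steps re-delivers that batch on the next run rather than skipping it.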

Connector Registry

Register the connector in backend/src/knowledge-base/connectors/registry.ts:

import { GithubConnector } from "./github/github-connector";

const connectorRegistry: Record<ConnectorType, () => Connector> = {
  jira: () => new JiraConnector(),
  confluence: () => new ConfluenceConnector(),
  github: () => new GithubConnector(), // <-- add here
};

The Record<ConnectorType, ...> type ensures TypeScript will error if you add a new type to the union but forget to register a factory.
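A self-contained sketch of the registry and a lookup by type string follows. The connector classes are placeholders, and getConnector is a hypothetical name for the lookup function; the real registry may expose it differently.

```typescript
type ConnectorType = "jira" | "confluence" | "github";

interface Connector {
  type: ConnectorType;
}

// Placeholder classes standing in for the real connectors.
class JiraConnector implements Connector {
  type = "jira" as const;
}
class ConfluenceConnector implements Connector {
  type = "confluence" as const;
}
class GithubConnector implements Connector {
  type = "github" as const;
}

// Omitting a key here, or adding a member to ConnectorType without a
// matching key, is a compile error.
const connectorRegistry: Record<ConnectorType, () => Connector> = {
  jira: () => new JiraConnector(),
  confluence: () => new ConfluenceConnector(),
  github: () => new GithubConnector(),
};

// Hypothetical lookup: the type string usually comes from stored data,
// so it is checked at runtime before indexing into the registry.
function getConnector(type: string): Connector {
  if (!Object.prototype.hasOwnProperty.call(connectorRegistry, type)) {
    throw new Error(`Unknown connector type: ${type}`);
  }
  return connectorRegistry[type as ConnectorType]();
}
```

Storing factories rather than instances means each call returns a fresh connector, so no state leaks between sync runs.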

Frontend Config Fields

Create frontend/src/app/knowledge-bases/_parts/github-config-fields.tsx. This component renders form fields for the connector-specific config. It receives a react-hook-form UseFormReturn and an optional field name prefix (defaults to "config").

"use client";

import type { UseFormReturn } from "react-hook-form";
import {
  FormControl,
  FormDescription,
  FormField,
  FormItem,
  FormLabel,
  FormMessage,
} from "@/components/ui/form";
import { Input } from "@/components/ui/input";

interface GithubConfigFieldsProps {
  form: UseFormReturn<any>;
  prefix?: string;
}

export function GithubConfigFields({
  form,
  prefix = "config",
}: GithubConfigFieldsProps) {
  return (
    <div className="space-y-4">
      <FormField
        control={form.control}
        name={`${prefix}.githubBaseUrl`}
        rules={{ required: "Base URL is required" }}
        render={({ field }) => (
          <FormItem>
            <FormLabel>Base URL</FormLabel>
            <FormControl>
              <Input placeholder="https://api.github.com" {...field} />
            </FormControl>
            <FormDescription>
              GitHub API base URL. Use https://api.github.com for GitHub.com.
            </FormDescription>
            <FormMessage />
          </FormItem>
        )}
      />
      {/* Add fields for owner, repo, etc. */}
    </div>
  );
}

Wire into the create connector dialog

In frontend/src/app/knowledge-bases/_parts/create-connector-dialog.tsx:

  1. Import the new config fields component.
  2. Add a <SelectItem> for the new connector type.
  3. Add a conditional render for the config step.

import { GithubConfigFields } from "./github-config-fields";

// In the SelectContent:
<SelectItem value="github">GitHub</SelectItem>

// In the config step:
{step === 1 && connectorType === "github" && (
  <GithubConfigFields form={form} />
)}

Update the CreateConnectorFormValues type to include the new connector type in the connectorType union.

Database Schema

The database schema in backend/src/database/schemas/knowledge-base-connector.ts does not need changes when adding a new connector. The config and checkpoint columns use jsonb typed with the discriminated union types, so any new variant is stored automatically.

If your connector needs a migration (e.g., a new column), follow the standard Drizzle migration workflow described in CLAUDE.md.

Testing

Create backend/src/knowledge-base/connectors/github/github-connector.test.ts. Mock the external SDK or HTTP calls; test the three interface methods.

Structure your test file with three describe blocks matching the interface:

Block | What to test
validateConfig | Valid config returns { valid: true }, missing required fields return errors, URL format validation
testConnection | Successful API response returns { success: true }, auth failures return errors, invalid config returns errors
sync | Single-page results, pagination across multiple pages, incremental sync using checkpoint, label/filter exclusion, document metadata mapping, API errors propagate

Use vi.mock() to mock the external client library. See backend/src/knowledge-base/connectors/jira/jira-connector.test.ts for a complete example.

Reference Implementations

Connector | Files
Jira | backend/src/knowledge-base/connectors/jira/jira-connector.ts, frontend/src/app/knowledge/knowledge-bases/_parts/jira-config-fields.tsx
Confluence | backend/src/knowledge-base/connectors/confluence/confluence-connector.ts, frontend/src/app/knowledge/knowledge-bases/_parts/confluence-config-fields.tsx
GitHub | backend/src/knowledge-base/connectors/github/github-connector.ts, frontend/src/app/knowledge/knowledge-bases/_parts/github-config-fields.tsx
GitLab | backend/src/knowledge-base/connectors/gitlab/gitlab-connector.ts, frontend/src/app/knowledge/knowledge-bases/_parts/gitlab-config-fields.tsx

The Jira connector is the best starting point -- it demonstrates both Cloud and Server API handling, ADF text extraction, comment filtering, and JQL-based incremental sync. The GitHub and GitLab connectors demonstrate using official SDKs (@octokit/rest and @gitbeaker/rest) with separate issue/PR sync passes and label filtering.