Soniox

The Soniox adapter provides access to Soniox transcription models.

Installation

shell

npm install @soniox/tanstack-ai-adapter

Authentication

Set SONIOX_API_KEY in your environment or pass apiKey when creating the adapter. Get your API key from the Soniox Console.

Basic Usage

import { generateTranscription } from "@tanstack/ai";
import { sonioxTranscription } from "@soniox/tanstack-ai-adapter";

const result = await generateTranscription({
  adapter: sonioxTranscription("stt-async-v3"),
  audio: audioFile,
  modelOptions: {
    enableLanguageIdentification: true,
    enableSpeakerDiarization: true,
  },
});

console.log(result.text);
console.log(result.segments);

Basic Usage - Custom API Key

import { generateTranscription } from "@tanstack/ai";
import { createSonioxTranscription } from "@soniox/tanstack-ai-adapter";

const adapter = createSonioxTranscription("stt-async-v3", process.env.SONIOX_API_KEY!);

const result = await generateTranscription({
  adapter,
  audio: audioFile,
});

Adapter Configuration

Use createSonioxTranscription to customize the adapter instance:

import { createSonioxTranscription } from "@soniox/tanstack-ai-adapter";

const adapter = createSonioxTranscription("stt-async-v3", process.env.SONIOX_API_KEY!, {
  baseUrl: "https://api.soniox.com",
  pollingIntervalMs: 1000,
  timeout: 180000,
});

Options:

apiKey - Override SONIOX_API_KEY (required when using createSonioxTranscription)
baseUrl - Custom API base URL. Default is https://api.soniox.com
headers - Additional request headers
timeout - Transcription timeout in milliseconds (default: 180000)
pollingIntervalMs - Transcription polling interval in milliseconds (default: 1000)
See the Soniox regional endpoints if you need data residency.

Transcription Options

Per-request options are passed via modelOptions:

const result = await generateTranscription({
  adapter: sonioxTranscription("stt-async-v3"),
  audio,
  modelOptions: {
    languageHints: ["en", "es"],
    enableLanguageIdentification: true,
    enableSpeakerDiarization: true,
    context: {
      terms: ["Soniox", "TanStack"],
    },
  },
});

Available options:

languageHints - Array of ISO language codes to bias recognition
languageHintsStrict - When true, rely more heavily on language hints (not supported by all models)
enableLanguageIdentification - Automatically detect spoken language
enableSpeakerDiarization - Identify and separate different speakers
context - Additional context to improve accuracy
clientReferenceId - Optional client-defined reference ID
webhookUrl - Webhook URL for completion notifications
webhookAuthHeaderName - Webhook authentication header name
webhookAuthHeaderValue - Webhook authentication header value
translation - Translation configuration
Check the Soniox API reference for more details.

Language Hints

Soniox automatically detects and transcribes speech in 60+ languages. When you know which languages are likely to appear in your audio, provide languageHints to improve accuracy by biasing recognition toward those languages.

Language hints do not restrict recognition. If you pass the TanStack language option, this adapter merges it into languageHints.

const result = await generateTranscription({
  adapter: sonioxTranscription("stt-async-v3"),
  audio,
  modelOptions: {
    languageHints: ["en", "es"],
  },
});

For more details, see the Soniox language hints documentation.

Context

Provide custom context to improve transcription and translation accuracy. Context helps the model understand your domain, recognize important terms, and apply custom vocabulary.

The context object supports four optional sections:

const result = await generateTranscription({
  adapter: sonioxTranscription("stt-async-v3"),
  audio,
  modelOptions: {
    context: {
      general: [
        { key: "domain", value: "Healthcare" },
        { key: "topic", value: "Diabetes management consultation" },
        { key: "doctor", value: "Dr. Martha Smith" },
      ],
      text: "The patient has a history of...",
      terms: ["Celebrex", "Zyrtec", "Xanax"],
      translationTerms: [
        { source: "Mr. Smith", target: "Sr. Smith" },
        { source: "MRI", target: "RM" },
      ],
    },
  },
});

For more details, see the Soniox context documentation.

Translation

Configure translation for your transcriptions:

const result = await generateTranscription({
  adapter: sonioxTranscription("stt-async-v3"),
  audio,
  modelOptions: {
    translation: {
      type: "one_way",
      targetLanguage: "es",
    },
  },
});

For two-way translation:

const result = await generateTranscription({
  adapter: sonioxTranscription("stt-async-v3"),
  audio,
  modelOptions: {
    translation: {
      type: "two_way",
      languageA: "en",
      languageB: "es",
    },
  },
});

When using translation, the API returns both transcription tokens and translation tokens. The segments array always includes only transcription tokens. To access translation tokens, use providerMetadata and filter by translation_status === "translation".

Accessing Raw Tokens

When using translation or working with multilingual audio, you may need access to raw tokens with per-token language information and translation status. The adapter attaches a non-standard providerMetadata field at runtime:

const result = await generateTranscription({
  adapter: sonioxTranscription("stt-async-v3"),
  audio,
  modelOptions: {
    translation: { type: "one_way", targetLanguage: "es" },
  },
});

const rawTokens = (result as any).providerMetadata?.soniox?.tokens;

if (rawTokens) {
  rawTokens.forEach((token) => {
    // token.text - token text
    // token.start_ms - start time in milliseconds
    // token.end_ms - end time in milliseconds
    // token.language - detected language for this token
    // token.translation_status - translation status (if translation enabled)
    // token.speaker - speaker identifier
    // token.confidence - confidence score
  });
}