Voice OTP delivery places an outbound call from your Sautikit number to the user's phone and reads a numeric code aloud, typically twice so the user can write it down. The call is placed via POST /v1/calls, and a voice-action webhook URL supplies the Say actions that speak the code. The OTP itself lives on your server; Sautikit handles only the call and the text-to-speech.
Voice OTP is not a replacement for authenticator apps or hardware tokens. It is a pragmatic channel when the user's phone must be the authentication device but SMS is unavailable: for example, when the user is roaming on a network with poor message routing, or when local regulations require explicit voice confirmation.
Speak each digit individually with a period separator so the TTS engine pauses between digits. Avoid speaking the code as a continuous number ("eight hundred forty-seven thousand..."). A format like "8. 4. 7. 2. 9. 3." works well across most TTS voices.
You can also play a brief introductory prompt before the code: "This is a verification call from Acme Corp." This reduces the chance that the user hangs up thinking it is spam.
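The formatting advice above can be sketched as a pair of small helpers. This is a minimal illustration, not part of the Sautikit API: the names `formatDigitsForSpeech` and `buildOtpMessage` are hypothetical, and the wording of the spoken message is only an example.

```javascript
// Hypothetical helpers (not part of any Sautikit SDK).

// Turn "847293" into "8. 4. 7. 2. 9. 3." so the TTS engine pauses between digits.
function formatDigitsForSpeech(otp) {
  if (typeof otp !== "string" || !/^\d+$/.test(otp)) {
    throw new TypeError("OTP must be a non-empty string of digits");
  }
  return otp.split("").join(". ") + ".";
}

// Build the full Say text: optional intro, then the code spoken twice.
function buildOtpMessage(otp, intro = "") {
  const spoken = formatDigitsForSpeech(otp);
  const body = `Your code is: ${spoken} I will repeat: ${spoken}`;
  return intro ? `${intro} ${body}` : body;
}
```

Rejecting non-digit input up front keeps stray characters out of the text handed to the TTS engine.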
Generate the OTP once, then attempt delivery in order:
Your server controls the retry logic. Sautikit reports call status (answered, no-answer, busy, failed) to your status_url so you can trigger the next step programmatically.
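One way to drive the fallback from the reported call status is a pure function that picks the next channel. This is a sketch under assumptions: the function name `nextDeliveryStep` and the channel names in the usage example are hypothetical, and the caller supplies the delivery order, since the right order depends on your application.

```javascript
// Statuses that mean the user did not hear the code (taken from the status list above).
const UNDELIVERED_STATUSES = new Set(["no-answer", "busy", "failed"]);

// Hypothetical helper: given the ordered channel list, the index of the channel
// just attempted, and the reported call status, return the next step or null.
function nextDeliveryStep(channels, currentIndex, callStatus) {
  if (!UNDELIVERED_STATUSES.has(callStatus)) {
    return null; // call was answered (or status is unknown): do not escalate
  }
  const nextIndex = currentIndex + 1;
  if (nextIndex >= channels.length) {
    return null; // every channel has been tried
  }
  return { index: nextIndex, channel: channels[nextIndex] };
}
```

For example, with an assumed order of `["sms", "voice", "email"]`, a `busy` result on the voice attempt (index 1) returns `{ index: 2, channel: "email" }`. The same OTP is reused at every step, since the code is generated once.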
Endpoints you call:
POST /v1/calls: initiate the outbound call.
GET /v1/calls/{call_sid}: poll or confirm delivery status after the call.

Voice actions used:
Say: speak the OTP digits aloud using text-to-speech.
Hangup: end the call after the code is spoken.

There is intentionally no GetDigits in this flow. The user does not confirm the code by pressing keys; they hear it and type it into your application's form. Keeping the call short reduces billed duration and user friction.
curl -X POST "https://api.sautikit.com/v1/calls" \
  -H "Authorization: Bearer $SAUTIKIT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "to": "+254722123456",
    "from": "+254700000001",
    "action_url": "https://yourapp.example.com/otp/speak?token=abc123",
    "status_url": "https://yourapp.example.com/otp/status"
  }'

The token query parameter is an opaque handle your server uses to look up the OTP without embedding it in the URL.
import express from "express";
import { lookupOtp } from "./otp-store"; // your OTP lookup logic

const app = express();
app.use(express.json());

// Sautikit calls this when the user answers
app.post("/otp/speak", async (req, res) => {
  const token = req.query.token;
  const otp = await lookupOtp(token); // e.g. "847293"

  if (!otp) {
    // OTP expired or not found; hang up gracefully
    return res.json({
      actions: [
        { say: { text: "We could not find your verification code. Please request a new one." } },
        { hangup: {} },
      ],
    });
  }

  // Format code for speech: "8. 4. 7. 2. 9. 3."
  const spoken = otp.split("").join(". ") + ".";

  return res.json({
    actions: [
      {
        say: {
          text: `This is a verification call from Acme Corp. Your code is: ${spoken} I will repeat: ${spoken}`,
          language: "en-US",
        },
      },
      { hangup: {} },
    ],
  });
});

// Sautikit posts final call status here
app.post("/otp/status", (req, res) => {
  const { CallId, CallStatus } = req.body;
  console.log(`Call ${CallId} ended with status: ${CallStatus}`);
  // "completed", "no-answer", "busy", "failed"
  res.sendStatus(204);
});

app.listen(3000);

The call is independent of your normal authentication flow. After the call ends, the user submits the code they heard in your web or mobile form. Your server validates it against the stored OTP, applying the same expiry and attempt limits you use for SMS OTP.
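The validation step might look like the following sketch. The record shape (`code`, `expiresAt`, `attempts`, `maxAttempts`) and the function names are assumptions for illustration; adapt them to however your OTP store already works for SMS.

```javascript
// Compare two strings without exiting early on the first differing character.
function constantTimeEqual(a, b) {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// Hypothetical validator: checks expiry, then the attempt limit, then the code.
// `record` is the stored OTP entry; `now` is injectable to make testing easy.
function verifyOtp(record, submitted, now = Date.now()) {
  if (!record) return { ok: false, reason: "not-found" };
  if (now >= record.expiresAt) return { ok: false, reason: "expired" };
  if (record.attempts >= record.maxAttempts) {
    return { ok: false, reason: "too-many-attempts" };
  }
  record.attempts += 1; // count every submission, right or wrong
  if (!constantTimeEqual(record.code, String(submitted))) {
    return { ok: false, reason: "mismatch" };
  }
  return { ok: true };
}
```

In a real store you would persist the incremented attempt count and delete the record after a successful match so the code cannot be reused.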
Voice OTP calls are short, typically under 30 seconds once answered. Billing uses a per-minute rate, with the duration rounded up to the nearest billing increment, so a 25-second call where the user hears the code and hangs up is billed for the increment that covers those 25 seconds, not for the exact duration.
Factors that affect cost:
Say actions.

For an OTP programme with high volume, model your cost as: (calls placed) × (average call duration in minutes) × (per-minute rate for your destination country). Not-answered calls are a meaningful fraction of total call attempts; a realistic answer rate is 60–80%.
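The cost model can be turned into a rough estimator. This sketch makes two assumptions that you should check against your actual pricing: unanswered calls are not billed, and the billing increment is passed in as `incrementSeconds` because its real value is not stated here. The function and parameter names are hypothetical.

```javascript
// Rough cost estimate for a batch of voice OTP calls.
// Assumptions: unanswered calls cost nothing; each answered call is billed
// at the average duration rounded up to a whole billing increment.
function estimateVoiceOtpCost({
  callsPlaced,
  answerRate,         // fraction of calls answered, e.g. 0.7
  avgAnsweredSeconds, // average duration of an answered call
  incrementSeconds,   // billing increment (assumed; check your rate card)
  ratePerMinute,      // per-minute rate for the destination country
}) {
  const answeredCalls = callsPlaced * answerRate;
  const billedSeconds =
    Math.ceil(avgAnsweredSeconds / incrementSeconds) * incrementSeconds;
  return answeredCalls * (billedSeconds / 60) * ratePerMinute;
}
```

Rounding the average duration, rather than each call individually, slightly underestimates the cost when call lengths straddle an increment boundary, so treat the result as a planning figure.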