SMS OTP delivery rates in Kenya dip below 80% during peak hours: SIM swaps, tower congestion, and MSISDN porting all cause failures. A voice OTP fallback eliminates most of those failures because a phone call reaches the device regardless of data network state. This guide walks through a complete implementation: generate a cryptographically secure OTP, place an outbound call with Sautikit, read the digits aloud with Say, collect the response with GetDigits, and verify it via webhook, all in under 50 lines of Node.js.
SMS OTP is cheap and familiar. Voice OTP adds a second delivery channel on a different network path: the PSTN voice network is independent of the SMS message centre. When a user's data SIM is congested or their SMS inbox is full, a voice call still lands.
The practical use case in Kenya is "try SMS first, fall back to voice after 60 seconds of no confirmation". This covers:
Safaricom congestion windows: SMS delivery on Safaricom can queue during peak M-Pesa traffic (lunch hour, salary day)
SIM swap fraud detection: a recently swapped SIM may not receive SMS for hours; a voice call to the registered number reaches the new handset
Feature phones: some users have a registered data SIM number but answer calls on a second (voice-only) SIM
Voice OTP is not a replacement for SMS. It is a fallback that prevents lock-out.
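The "SMS first, voice after 60 seconds" rule can be expressed as a small decision function that a poller or job worker calls for each pending verification. This is a minimal sketch, not part of any SDK: the function name, the field names (`smsSentAt`, `verified`, `voiceCallPlaced`), and the returned step labels are all assumptions made for illustration.

```javascript
// Sketch: decide the next delivery step for one pending verification.
// All field names and return values here are hypothetical.
// Timestamps are milliseconds since the epoch.
function nextDeliveryStep({
  smsSentAt,          // null if no SMS has been sent yet
  verified,           // true once the user has confirmed a code
  voiceCallPlaced,    // true once the fallback call has been placed
  now,
  fallbackDelayMs = 60_000, // the 60-second window described above
}) {
  if (verified) return 'done';
  if (smsSentAt == null) return 'send-sms';
  if (voiceCallPlaced) return 'wait'; // never place a second fallback call here
  return now - smsSentAt >= fallbackDelayMs ? 'call-voice' : 'wait';
}
```

Keeping the decision free of I/O makes the 60-second threshold easy to test and to tune without touching the SMS or voice clients.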
A 6-digit numeric OTP has log2(10^6) ≈ 20 bits of entropy, enough for a short-lived token with a lockout after 3 failed attempts, per NIST SP 800-63B section 5.1.4. Use cryptographically secure random generation:
import { randomInt } from 'crypto';

function generateOTP(digits = 6) {
  // randomInt(min, max) is cryptographically secure (Node.js >= 14.10); max is exclusive.
  // Draw from the full range 0 .. 10^digits - 1 and left-pad with zeros, so codes
  // such as "004217" are possible and all 10^digits values are equally likely.
  return String(randomInt(0, 10 ** digits)).padStart(digits, '0');
}
Store the OTP hashed, not plaintext. A SHA-256 hash is sufficient for a short-lived numeric token:
import { createHash } from 'crypto';

function hashOTP(otp) {
  return createHash('sha256').update(otp).digest('hex');
}

// Store in your database:
await db.query(`
  INSERT INTO otp_tokens (phone_e164, otp_hash, call_id, expires_at, attempts)
  VALUES ($1, $2, $3, NOW() + INTERVAL '5 minutes', 0)
`, [phoneE164, hashOTP(otp), callId]);
Expire tokens after 5 minutes and after 3 failed verification attempts.
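The expiry and lockout rules can be enforced in one check at verification time. The sketch below assumes an in-memory record shaped like `{ otpHash, expiresAt, attempts }` (with `expiresAt` in epoch milliseconds); the function name `checkOtp`, the record shape, and the `reason` labels are illustrative assumptions, and persisting the returned `attempts` value is left to the caller.

```javascript
import { createHash, timingSafeEqual } from 'crypto';

const MAX_ATTEMPTS = 3; // lockout threshold from the text

function hashOTP(otp) {
  return createHash('sha256').update(otp).digest('hex');
}

// record: { otpHash: hex SHA-256 string, expiresAt: epoch ms, attempts: number }
// Returns the outcome plus the attempts value the caller should store.
function checkOtp(record, submitted, now = Date.now()) {
  if (now >= record.expiresAt) {
    return { ok: false, reason: 'expired', attempts: record.attempts };
  }
  if (record.attempts >= MAX_ATTEMPTS) {
    return { ok: false, reason: 'locked', attempts: record.attempts };
  }

  // Both buffers are 32-byte SHA-256 digests (assuming the stored hash is
  // valid hex), so timingSafeEqual does not throw on a length mismatch.
  const expected = Buffer.from(record.otpHash, 'hex');
  const actual = Buffer.from(hashOTP(String(submitted)), 'hex');

  if (timingSafeEqual(expected, actual)) {
    return { ok: true, reason: 'verified', attempts: record.attempts };
  }
  // Wrong code: count the failed attempt.
  return { ok: false, reason: 'mismatch', attempts: record.attempts + 1 };
}
```

Checking expiry and lockout before comparing hashes means an expired or locked token can never be redeemed, even with the correct code.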
When a call is answered, Sautikit POSTs to your voice_callback_url and waits for instructions. Your handler returns a JSON voice-action sequence that reads the OTP aloud and collects the user's response.
Critical detail on digit announcement: the Say verb must read digits individually, not as a number. "482163" read as a number sounds like "four hundred eighty-two thousand one hundred sixty-three". Read individually, with a pause character between each digit, it sounds like "4... 8... 2... 1... 6... 3". On low-bitrate audio (Safaricom 3G), individual digits with pauses reduce mishear rates significantly.
// POST /voice: called by Sautikit when the call is answered
app.post('/voice', (req, res) => {
  const otp = req.app.locals.pendingOTPs.get(req.body.sessionId);

  // No pending OTP for this session (expired or unknown): end the call
  // instead of throwing on otp.split below.
  if (!otp) {
    return res.json({ actions: [{ hangup: {} }] });
  }

  // Read each digit individually with a comma (pause) between them.
  // E.g. for OTP "482163": "4, 8, 2, 1, 6, 3"
  const spokenDigits = otp.split('').join(', ');

  res.json({
    actions: [
      {
        say: {
          // The code is read three times inline so the user hears it even if
          // they were not ready the first time.
          text: `Your verification code is: ${spokenDigits}. I will repeat. ${spokenDigits}. ${spokenDigits}. Please enter your code now.`,
          language: 'en-KE',
          loop: 1 // Play once; the repetition is already in the text
        }
      },
      {
        getDigits: {
          numDigits: 6,      // Match OTP length exactly; stops listening immediately
          timeout: 8000,     // 8 seconds; mobile users need more time (see below)
          finishOnKey: '#',  // Optional: user can press # to confirm early
          action: 'https://your-app.example.com/verify-otp'
        }
      },
      {
        // Fallback if user enters nothing
        say: {
          text: 'We did not receive your input. Please try again or request a new code.',
          language: 'en-KE'
        }
      },
      { hangup: {} }
    ]
  });
});
The numDigits: 6 parameter is important: Sautikit stops collecting DTMF input the moment the 6th digit is pressed, without waiting for the timeout. This reduces the perceived wait and lowers the rate of "no input" webhook events for users who enter the code promptly.
A/B data from Kenyan IVR deployments shows that increasing the GetDigits timeout from 5 000 ms (the default) to 8 000 ms reduces "no input received" events by approximately 23%. The explanation is a mix of cognitive and network latency: users on slow networks hear the audio prompt slightly later than it was sent, and then need a moment to locate the keypad on their handset. Eight seconds is generous but not annoying: the call ends sooner for users who enter digits quickly (because numDigits stops the wait), while slower users on congested links get enough room.
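The `action` URL passed to getDigits needs a handler that checks the entered digits and tells the caller what happened. The source does not show the payload Sautikit sends to that URL or confirm that the handler may answer with further voice actions, so the sketch below covers only the part that can be written safely: turning a verification outcome into a spoken reply, reusing the `say` and `hangup` action shapes from the handler above. The function name and the `reason` values are hypothetical.

```javascript
// Sketch: map a verification outcome to a spoken reply.
// Assumption (not confirmed by the source): the getDigits action URL may
// respond with the same { actions: [...] } format as the voice callback.
const REPLY_BY_REASON = {
  verified: 'Thank you. Your number has been verified.',
  mismatch: 'That code is not correct. Please request a new call and try again.',
  expired: 'That code has expired. Please request a new code.',
  locked: 'Too many incorrect attempts. Please request a new code.',
};

function verifyResponseActions(result) {
  const text =
    REPLY_BY_REASON[result.reason] ??
    'We could not verify your code. Please request a new code.';

  return {
    actions: [
      { say: { text, language: 'en-KE' } },
      { hangup: {} }, // always end the call after the confirmation
    ],
  };
}
```

In an Express route for `/verify-otp`, the handler would look up the stored token, run the check, and pass the outcome to `res.json(verifyResponseActions(result))`.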
Not every outbound OTP call will be answered. Sautikit fires a call.completed webhook with status: no-answer when the call times out (typically after 30 seconds of ringing). Your handler should schedule a retry:
// POST /call-events: Sautikit call lifecycle webhook
app.post('/call-events', async (req, res) => {
  const { event, callId, status, to } = req.body;

  if (event === 'call.completed' && status === 'no-answer') {
    const retryCount = await getOTPRetryCount(to);

    if (retryCount < 2) {
      // Wait 30 seconds then try again
      setTimeout(() => placeOTPCall(to), 30_000);
    } else {
      // Mark the OTP attempt as failed after 2 retries
      await markOTPFailed(to);
    }
  }

  res.sendStatus(200);
});
Limit retries to 2, which means at most three calls in total. Any more calls to the same number within a few minutes is spam territory and damages your sender reputation with carriers.
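An in-process `setTimeout` is lost if the server restarts during the 30-second wait. One alternative, sketched here under assumptions, is to compute the retry decision as plain data and persist it, so that a scheduled job or queue worker can place the call when it falls due. The name `planRetry` and the shape of the returned object are hypothetical.

```javascript
const MAX_RETRIES = 2;         // retry limit from the text
const RETRY_DELAY_MS = 30_000; // 30-second wait before the next attempt

// Sketch: decide what to do after a no-answer event.
// retryCount is the number of retries already made for this OTP.
function planRetry(retryCount, now = Date.now()) {
  if (retryCount < MAX_RETRIES) {
    return {
      action: 'retry',
      retryAt: now + RETRY_DELAY_MS, // store this; a worker places the call when due
      retryCount: retryCount + 1,
    };
  }
  return { action: 'fail' }; // retries exhausted: mark the OTP attempt as failed
}
```

The limit and the delay live in one place, and the decision can be unit-tested without placing real calls.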
A typical successful OTP call runs 25 seconds: the greeting, three digit readings with pauses, and the user entering 6 digits. At KES 3/min billed per second from the moment the call connects:
25 seconds × (KES 3 / 60 seconds) = KES 1.25 per successful call
With a 15% no-answer rate, each unanswered call needs one retry. The retry is billed per second from the moment it connects, like any answered call; the unanswered first attempt is billed only for its chargeable ring time, estimated below:
Successful call (85%): KES 1.25
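Retry after no-answer (15%): KES 1.25 + KES 0.08 = KES 1.33 (KES 0.08 is the approximate charge for the unanswered first attempt; at KES 0.05 per second this corresponds to under 2 seconds of billed time)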
Expected cost per verified user ≈ KES 1.25 × 0.85 + (KES 1.33 × 0.15) ≈ KES 1.26
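The arithmetic above can be reproduced, and re-run with your own rates, using a short calculation. The function and parameter names are illustrative; the figures are the ones quoted in this section, and the model assumes, as the text does, that a single retry always succeeds.

```javascript
const RATE_KES_PER_MIN = 3; // per-second billing at KES 3/min, as quoted above

// Cost of an answered call of the given length.
function callCostKes(seconds) {
  return (seconds * RATE_KES_PER_MIN) / 60;
}

// Expected cost per verified user, assuming one retry always succeeds.
function expectedCostKes({ callSeconds, noAnswerRate, ringChargeKes }) {
  const answeredFirstTime = callCostKes(callSeconds);
  const answeredOnRetry = answeredFirstTime + ringChargeKes;
  return (
    answeredFirstTime * (1 - noAnswerRate) + answeredOnRetry * noAnswerRate
  );
}

const cost = expectedCostKes({
  callSeconds: 25,
  noAnswerRate: 0.15,
  ringChargeKes: 0.08,
});
console.log(cost.toFixed(2)); // prints "1.26"
```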
Compare this to SMS OTP at approximately KES 1.50 per SMS from a local aggregator. At this rate a voice OTP costs less than an SMS OTP and gives users on congested or unreliable networks a second delivery path. For a product where a failed OTP leads to a support ticket costing KES 50–200 of agent time, adding the voice fallback pays for itself quickly.
If you also want the SMS leg of the "try SMS first, fall back to voice" pattern (plus WhatsApp, USSD, and a human-agent desk for the support tickets that slip through), Helloduty provides those channels alongside Sautikit voice.