The five AI call tracking scoring dimensions and the testing approach behind every review on ShadowVoices.
Each AI call tracking platform was evaluated through three channels.
Each platform was scored on five AI call tracking dimensions, all equally weighted.
How fast the call audio turns into a usable transcript. We measured p50 and p95 latency on the 820-call benchmark corpus. Sub-300ms p50 cleared the bar. 800ms p50 marked an acceptable floor for real-time workflows. Anything above 1.5s p50 got marked down.
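The p50/p95 cut can be sketched with a nearest-rank percentile; the latency values below are illustrative, not drawn from the 820-call corpus, and the thresholds are the ones stated above.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: smallest value covering at least p% of samples."""
    ordered = sorted(values)
    k = math.ceil(p / 100 * len(ordered))
    return ordered[max(k - 1, 0)]

# Hypothetical per-call transcript latencies in ms (not real benchmark data).
latencies = [180, 210, 240, 260, 290, 310, 350, 420, 900, 1600]
p50 = percentile(latencies, 50)   # 290 -> clears the sub-300ms bar
p95 = percentile(latencies, 95)   # 1600 -> the slow tail p50 hides
```

The point of tracking both: a platform can post a sub-300ms p50 and still blow the real-time budget on its p95 tail, which is why both get measured.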
How well the platform's model labels calls. We measured F1 score on a held-out subset of the corpus, with ground-truth labels manually annotated by two reviewers. We also measured the granularity of the intent taxonomy. Generic models that ship 6 to 8 intent labels score lower than fine-tuned vertical models that ship 40 to 60.
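Per-label F1 is the harmonic mean of precision and recall; a minimal sketch, with hypothetical intent labels and a toy six-call sample standing in for the held-out subset.

```python
def f1_score(truth, predicted, label):
    """F1 for one intent label: harmonic mean of precision and recall."""
    tp = sum(t == label and p == label for t, p in zip(truth, predicted))
    fp = sum(t != label and p == label for t, p in zip(truth, predicted))
    fn = sum(t == label and p != label for t, p in zip(truth, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Hypothetical ground-truth vs model labels on six calls (illustrative labels).
truth     = ["booking", "spam", "booking", "support", "booking", "spam"]
predicted = ["booking", "spam", "support", "support", "booking", "booking"]
score = f1_score(truth, predicted, "booking")  # precision 2/3, recall 2/3
```

Averaging the per-label scores across a 40-to-60-label vertical taxonomy is a harder test than across 6 to 8 generic labels, which is what the granularity check captures.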
How fast and how deeply the AI signal flows back to Google Ads, Meta, and TikTok as conversion events. We measured event-fire latency from call-end to ad-platform pickup. We also measured the granularity of events supported (basic conversion event vs custom-event taxonomies). Reference for the integration patterns: Google's official call assets and conversion documentation.
The cost of provisioning and keeping tracking numbers at network scale. The dominant variable for most cost-sensitive operators. We measured published rates where available and quoted rates where not. Hidden floors, minimum-spend rules, and tier-only discounts were also captured.
Whether an operator can provision a tracking number and validate the workflow without talking to sales. Self-serve cleared the bar; sales-led got marked down. The trial-to-paid path also factored in. Pay-as-you-go $0 trials counted as best-in-class; annual-contract-only plans counted as worst.
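The equal weighting across the five dimensions reduces to a plain mean. A minimal sketch; the dimension keys, the 0-10 scale, and the sample scorecard are all illustrative, not the site's actual field names.

```python
# Hypothetical dimension keys mirroring the five criteria above.
DIMENSIONS = [
    "transcription_latency",
    "classification_accuracy",
    "ad_platform_feedback",
    "number_cost",
    "self_serve",
]

def composite(scores):
    """Equal-weighted mean of the five dimension scores (assumed 0-10 scale)."""
    assert set(scores) == set(DIMENSIONS), "scorecard must cover all five dimensions"
    return sum(scores.values()) / len(DIMENSIONS)

# Illustrative platform scorecard, not a real review result.
platform = {
    "transcription_latency": 8,
    "classification_accuracy": 7,
    "ad_platform_feedback": 9,
    "number_cost": 5,
    "self_serve": 10,
}
overall = composite(platform)  # 7.8
```

Equal weighting means a weak number-cost score drags the composite down as hard as a weak latency score; no dimension can buy its way out of another.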
Generic CRM integration count was not scored. Brand recognition was not scored. Number of "AI-powered features" listed on the marketing site was not scored. None of those correlate with operator-fit for the audience this site serves.
Annual full report with quarterly updates when major platform releases shift the rankings. Open-source transcription models keep moving fast, so the latency benchmarks especially get revisited each quarter. Wikipedia's speech analytics article covers the broader category history for context on where the technology came from.
Further reading: schema.org Review markup specification · Wikipedia entry on software review