Scientific Benchmarks
Most fraud detection vendors won't publish these numbers. We do.
Transparent, peer-reviewable methodology with 95% confidence intervals, adversarial red-team testing against real-world evasion tools, and every limitation explicitly disclosed. All benchmark code is open-source.
Benchmark Dashboard
Illustrative performance metrics from controlled test environments:
- Detection rate: true positive rate on the labeled dataset (n=50,000)
- False positive rate: legitimate users incorrectly flagged
- Decision latency: edge runtime, Vercel PoP, controlled load
Detection by Attack Category
Controlled test environments with adversarial red-team scenarios
Latency Pipeline Breakdown
End-to-end timing from signal collection to decision delivery:
1. Signal collection: canvas, WebGL, audio, hardware probes
2. Client-side hashing: all fingerprints hashed before transmission
3. Network transit: nearest edge PoP, TLS 1.3
4. Scoring: 26-layer pipeline, deterministic scoring
5. Decision delivery: JSON with evidence trail and decision
Test Framework & Dataset
Benchmarks are conducted against standardized fraud detection test suites derived from anonymized production traffic patterns. Datasets include ground-truth labels for supervised evaluation; each sample is independently verified by at least two human analysts.
Test corpus: 50,000+ labeled sessions spanning bot traffic, credential stuffing, trial abuse, account takeover, and legitimate user flows. Class distribution mirrors real-world attack-to-legitimate ratios (approximately 3:97).
All datasets are version-controlled and immutable once published. Dataset provenance and labeling methodology are documented in the accompanying whitepaper.
Reproducibility & Statistical Rigor
All tests use deterministic seeds and fixed datasets. Results are averaged across 10,000+ iterations with 95% confidence intervals reported. The Bayesian fusion pipeline is fully deterministic — identical inputs produce identical outputs.
Confidence intervals computed via bootstrap resampling (n=1,000 resamples per metric). We report both mean and median values where distributions are non-Gaussian.
Benchmark code is open-source and auditable. Third parties can reproduce results using the published test harness and dataset specifications.
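The bootstrap procedure described above can be sketched as follows. This is a minimal percentile-bootstrap illustration, not the published harness: the resample count matches the stated n=1,000, but the seeded RNG, the metric, and the wiring are illustrative assumptions.

```typescript
// Percentile bootstrap for a 95% confidence interval on a benchmark metric.
// A sketch under stated assumptions; the real test harness may differ.

function seededRandom(seed: number): () => number {
  // Small deterministic LCG so runs are reproducible (fixed-seed policy).
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) >>> 0;
    return state / 0x100000000;
  };
}

function bootstrapCI(
  samples: number[],
  statistic: (xs: number[]) => number,
  resamples = 1000,
  seed = 42,
): { estimate: number; lower: number; upper: number } {
  const rand = seededRandom(seed);
  const stats: number[] = [];
  for (let i = 0; i < resamples; i++) {
    // Resample with replacement, then recompute the metric.
    const resample = samples.map(
      () => samples[Math.floor(rand() * samples.length)],
    );
    stats.push(statistic(resample));
  }
  stats.sort((a, b) => a - b);
  return {
    estimate: statistic(samples),                // point estimate on the full sample
    lower: stats[Math.floor(0.025 * resamples)], // 2.5th percentile
    upper: stats[Math.floor(0.975 * resamples)], // 97.5th percentile
  };
}
```

With detection outcomes encoded as 0/1 and the mean as the statistic, this yields a detection rate together with its 95% interval; a median statistic handles the non-Gaussian cases noted above.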
Runtime Environment
Latency benchmarks run on Vercel Edge Functions (global network, 50+ PoPs) under controlled load conditions. Client-side signal collection measured on Chrome 120+ / Firefox 121+ / Safari 17+ across desktop and mobile form factors.
Server-side decision latency measured at the edge runtime boundary — excludes DNS resolution and TCP handshake. Client-side collection timing includes all probe initialization, execution, and SHA-256 hashing.
Network conditions: benchmarks conducted from multiple geographic regions (US-East, EU-West, APAC) to capture real-world latency distribution. Results aggregated across all PoPs.
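Aggregating per-PoP latency samples into the reported summary statistics can be sketched as below. The percentile method and the sample values are illustrative assumptions, not the production reporting code.

```typescript
// Summarize decision-latency samples (milliseconds) into p50 / p95 / mean.
// A hedged sketch: a simple floor-index percentile, not a specific
// interpolation method the real harness may use.

function percentile(sorted: number[], p: number): number {
  const idx = Math.min(sorted.length - 1, Math.floor(p * sorted.length));
  return sorted[idx];
}

function summarizeLatency(samplesMs: number[]): {
  p50: number;
  p95: number;
  mean: number;
} {
  const sorted = [...samplesMs].sort((a, b) => a - b);
  return {
    p50: percentile(sorted, 0.5),
    p95: percentile(sorted, 0.95),
    mean: sorted.reduce((a, b) => a + b, 0) / sorted.length,
  };
}
```

Reporting p95 alongside the mean matters here: edge latency distributions are typically right-skewed, so a mean alone understates tail behavior.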
Adversarial Red-Team Testing
Red-team scenarios include: headless browser evasion (Puppeteer stealth, Playwright, undetected-chromedriver), residential proxy rotation (Bright Data, Oxylabs), browser fingerprint spoofing (Canvas Defender, WebGL fingerprint randomization), and multi-layer attack chains.
Anti-detect browser testing covers Multilogin, GoLogin, Dolphin Anty, and Incogniton profiles with hardware fingerprint randomization enabled.
State-sponsored or novel zero-day evasion techniques are explicitly excluded from these benchmarks. We test against publicly available tools and techniques only — this is a disclosed limitation.
Scoring Pipeline Architecture
The detection engine uses a 26-layer Bayesian fusion pipeline. Each layer evaluates an independent signal dimension and produces a posterior probability estimate. Layer outputs are combined using weighted log-odds fusion with configurable weight capping to mitigate conditional independence violations.
The pipeline is fully deterministic — no machine learning models, no neural networks, no stochastic components. This means: identical inputs always produce identical outputs, decisions are fully explainable down to individual signal contributions, and there is no model drift or retraining requirement.
Signal categories: device fingerprint entropy (canvas, WebGL, audio context, font enumeration), hardware timing analysis (crystal oscillator drift, GPU render timing, memory latency), behavioral biometrics (keystroke entropy, mouse micro-tremor Hurst exponent, click hesitation patterns), network characteristics (TLS JA3/JA4 fingerprinting, IP reputation, ASN analysis), and cross-session correlation (device linkage, velocity patterns, belief state propagation).
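The weighted log-odds fusion with per-layer capping described above can be sketched in miniature. This is not the 26-layer production pipeline: the layer posteriors, weights, prior, and cap value below are illustrative assumptions; only the fusion shape follows the text.

```typescript
// Weighted log-odds fusion with per-layer contribution capping.
// Deterministic and explainable: each layer's capped delta is exactly
// the "signal contribution" an evidence trail could record.

interface LayerOutput {
  name: string;
  posterior: number; // P(fraud | this layer's signal), in (0, 1)
  weight: number;    // relative trust in this layer
}

const logit = (p: number) => Math.log(p / (1 - p));
const sigmoid = (x: number) => 1 / (1 + Math.exp(-x));

function fuseLayers(
  layers: LayerOutput[],
  prior = 0.03,          // ~3:97 attack-to-legitimate base rate from the corpus
  maxContribution = 2.0, // cap on one layer's log-odds shift (assumed value)
): number {
  let logOdds = logit(prior);
  for (const layer of layers) {
    // Each layer contributes its evidence as a weighted log-odds delta...
    const delta = layer.weight * (logit(layer.posterior) - logit(prior));
    // ...capped so one correlated or overconfident signal cannot dominate,
    // mitigating conditional-independence violations.
    logOdds += Math.max(-maxContribution, Math.min(maxContribution, delta));
  }
  return sigmoid(logOdds); // fused posterior P(fraud)
}
```

Because every step is a pure arithmetic function of its inputs, identical inputs always yield identical outputs, which is the determinism property the text claims.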
Compliance & Certifications
- Trust Service Criteria: Security, Availability, Confidentiality
- Privacy by Design (Art. 25), data minimization, right to erasure
- Information Security Management System certification
- Client-side hashing, no raw PII storage, audit-ready evidence trails
Data Governance & Privacy
Data Collected & Stored
- SHA-256 hashed device fingerprints
- Bayesian belief state (α, β parameters)
- Risk scores and decision outcomes
- Aggregated signal statistics
- API request metadata (defined retention period)
Data NOT Collected
- Raw canvas/audio/font data
- Keystroke content or form inputs
- Browsing history or page content
- Personal identifiers in plaintext (emails and IPs are hashed before storage)
- Cookies or local storage contents (the detection engine does not access these)
Privacy by Design: the detection engine operates on derived signals only. Raw sensor data is hashed client-side before any transmission, and all stored fingerprints are one-way hashed, making recovery of the original signal data computationally infeasible.
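The client-side hashing step can be sketched as follows. This uses Node's `crypto` module for a runnable example; in a browser the same digest would come from `crypto.subtle.digest("SHA-256", ...)`. The fingerprint fields are illustrative, not the actual probe set.

```typescript
// Hash a derived fingerprint before it ever leaves the device.
// Only this one-way digest is transmitted; raw values are discarded.

import { createHash } from "node:crypto";

interface RawFingerprint {
  canvasSummary: string;   // derived canvas render summary (hypothetical field)
  webglRenderer: string;   // hypothetical field
  audioContextSum: string; // hypothetical field
}

function hashFingerprint(fp: RawFingerprint): string {
  // Canonical key ordering so identical devices produce identical hashes.
  const canonical = JSON.stringify(fp, Object.keys(fp).sort());
  // One-way SHA-256: the server only ever sees this 64-char hex digest.
  return createHash("sha256").update(canonical).digest("hex");
}
```

The canonicalization step is the detail that matters: without a stable key order, two identical devices could serialize the same signals differently and produce different hashes.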
Methodology Notes & Disclosed Limitations
- Detection accuracy metrics are derived from controlled experiments on labeled datasets (n=50,000+, 95% CI via bootstrap resampling). Real-world performance will vary with traffic composition, attack sophistication, and signal availability across client environments.
- Latency figures represent server-side decision time measured at the edge runtime boundary under controlled load. End-to-end latency (including client-side signal collection) ranges from 145 ms to 265 ms depending on browser, device, and probe configuration.
- Adversarial testing uses publicly available tools and techniques only. State-sponsored actors, novel zero-day evasion methods, or attacks targeting specific hardware configurations may achieve different evasion rates. This is an inherent limitation.
- Bayesian fusion assumes conditional independence between signal layers, an assumption that is frequently violated in practice. We mitigate this through weight capping (a maximum log-odds contribution per layer), layer decorrelation, and posterior calibration checks.
- The false positive rate (0.03%) is measured against a labeled legitimate-user corpus. In production, the effective FPR depends on policy configuration, threshold tuning, and whether shadow mode is enabled during initial deployment.
- All benchmark code is open-source and auditable at github.com/verifystack. We welcome independent reproduction and peer review.
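For readers reproducing the headline metrics, the relationship between the labeled corpus and the reported rates reduces to a confusion matrix. The counts below are illustrative, not the published results.

```typescript
// True positive rate and false positive rate from labeled evaluation counts.

interface Confusion {
  tp: number; // attacks correctly flagged
  fn: number; // attacks missed
  fp: number; // legitimate users incorrectly flagged
  tn: number; // legitimate users correctly passed
}

// TPR = TP / (TP + FN): detection rate over the labeled attack corpus.
const truePositiveRate = (c: Confusion): number => c.tp / (c.tp + c.fn);

// FPR = FP / (FP + TN): flag rate over the labeled legitimate-user corpus.
const falsePositiveRate = (c: Confusion): number => c.fp / (c.fp + c.tn);
```

Note that FPR is defined against the legitimate-user corpus only, which is why an FPR measured offline can differ from the production flag rate once policy thresholds and shadow mode enter the picture.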
Last updated: February 2026 | Benchmark version: 2.1.0 | Methodology revision: 4.2 | Dataset version: 3.0.1