Benchmarks
Transparent, reproducible performance comparisons
Recursive Text Splitting
Document chunking throughput with UTF-8 safe separators
| Input Size | Mean Time | Throughput | Notes |
|---|---|---|---|
| 16 KiB | 70.8 µs | 221 MiB/s | Reference (smallest input) |
| 128 KiB | 572 µs | 218 MiB/s | Near-linear scaling (~1.4% below 16 KiB) |
| 1 MiB | 4.97 ms | 201 MiB/s | ~9% below 16 KiB (large-input overhead) |
vs Python: ~2-4x faster
Typical Python recursive text splitters achieve roughly 50-100 MiB/s. Wesichain's Rust implementation peaks at 221 MiB/s in the current artifact snapshot, which is about 2.2x (221/100) to 4.4x (221/50) that range.
Methodology: Measured with Criterion on macOS. Chunk size: 1000 chars, overlap: 200 chars.
Run: cargo bench -p wesichain-retrieval --bench recursive_splitter
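As a cross-check, each throughput figure is the input size divided by the mean time. The minimal Rust sketch below recomputes them, assuming the table's sizes are binary units (1 KiB = 1024 bytes, 1 MiB = 1,048,576 bytes); the helper name throughput_mib_s is hypothetical and not part of Wesichain. Because the displayed times are rounded, the recomputed values match the table only to within about 1 MiB/s.

```rust
/// Bytes per MiB (binary megabyte).
const MIB: f64 = 1024.0 * 1024.0;

/// Hypothetical helper: throughput in MiB/s for `bytes` processed in `mean_secs` seconds.
fn throughput_mib_s(bytes: u64, mean_secs: f64) -> f64 {
    bytes as f64 / mean_secs / MIB
}

fn main() {
    // (label, input size in bytes, mean time in seconds), copied from the table,
    // assuming the sizes are binary units.
    let rows: [(&str, u64, f64); 3] = [
        ("16 KiB", 16 * 1024, 70.8e-6),
        ("128 KiB", 128 * 1024, 572e-6),
        ("1 MiB", 1024 * 1024, 4.97e-3),
    ];
    for (label, bytes, secs) in rows {
        // Small differences from the table come from rounding of the displayed times.
        println!("{label}: {:.1} MiB/s", throughput_mib_s(bytes, secs));
    }
}
```

For example, 1 MiB processed in 4.97 ms gives 1 / 0.00497 ≈ 201 MiB/s, matching the last row.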
Connector Payload Microbenchmarks
Local payload construction snapshots for Qdrant and Weaviate connectors
| Benchmark | Wesichain mean | Baseline mean | Speedup (baseline / Wesichain) |
|---|---|---|---|
| Qdrant payload construction | 0.801 ms | 0.998 ms | 1.25x faster |
| Weaviate payload construction | 1.099 ms | 1.450 ms | 1.32x faster |
Caveat: these snapshots are single local runs on synthetic datasets and measure local payload construction only; live network calls are not timed. Source files: wesichain/docs/benchmarks/data/qdrant-2026-02-16.json, wesichain/docs/benchmarks/data/weaviate-2026-02-16.json.
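The ratios in the connector table's last column follow from dividing the baseline mean by the Wesichain mean, so a value above 1.0 means Wesichain's mean time is lower. A minimal Rust sketch of that arithmetic using the table's values; the function name speedup is hypothetical:

```rust
/// Hypothetical helper: speedup = baseline mean / Wesichain mean.
/// Values above 1.0 mean the Wesichain run had the lower mean time.
fn speedup(wesichain_ms: f64, baseline_ms: f64) -> f64 {
    baseline_ms / wesichain_ms
}

fn main() {
    // Mean times in milliseconds, copied from the connector table.
    let qdrant = speedup(0.801, 0.998);
    let weaviate = speedup(1.099, 1.450);
    // prints "Qdrant: 1.25x, Weaviate: 1.32x"
    println!("Qdrant: {qdrant:.2}x, Weaviate: {weaviate:.2}x");
}
```

Equivalently, Wesichain's mean time is about 20% lower for Qdrant (0.801 / 0.998 ≈ 0.80) and about 24% lower for Weaviate (1.099 / 1.450 ≈ 0.76).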
Test Parameters
Text Splitter
- Default separators: ["\n\n", "\n", " ", ""]
- Chunk size: 1000 characters
- Overlap: 200 characters
- Character-based: lengths are counted in characters, not bytes, so splits are UTF-8 safe
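The parameters above can be illustrated with a minimal, std-only Rust sketch of a recursive character splitter: it tries the coarsest separator first, falls back to finer separators only for pieces longer than the chunk size, and then greedily merges pieces into chunks, carrying up to the overlap into the next chunk. This is an assumption about the general technique, not Wesichain's implementation or API; split_text, merge, and DEFAULT_SEPARATORS are hypothetical names. As a simplification, the separator at the boundary of an oversized piece that gets re-split is dropped.

```rust
use std::collections::VecDeque;

// Hypothetical sketch; not Wesichain's API. Separators are tried coarsest first.
const DEFAULT_SEPARATORS: [&str; 4] = ["\n\n", "\n", " ", ""];

/// Length in Unicode scalar values (chars), not bytes.
fn char_len(s: &str) -> usize {
    s.chars().count()
}

/// Recursively splits `text` into chunks of at most `chunk_size` chars.
fn split_text(text: &str, separators: &[&str], chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(chunk_size > 0 && overlap < chunk_size);
    // First separator present in the text; "" (or none left) means split per char.
    let idx = separators
        .iter()
        .position(|s| s.is_empty() || text.contains(*s))
        .unwrap_or(separators.len());
    let sep = separators.get(idx).copied().unwrap_or("");
    let finer = separators.get(idx + 1..).unwrap_or(&[]);

    // Splitting on chars (never raw bytes) keeps every piece valid UTF-8.
    let pieces: Vec<String> = if sep.is_empty() {
        text.chars().map(String::from).collect()
    } else {
        text.split(sep).filter(|p| !p.is_empty()).map(String::from).collect()
    };

    let mut chunks = Vec::new();
    let mut small: Vec<String> = Vec::new();
    for piece in pieces {
        if char_len(&piece) <= chunk_size {
            small.push(piece);
        } else {
            // Flush pending small pieces, then retry the oversized piece with finer separators.
            chunks.extend(merge(&small, sep, chunk_size, overlap));
            small.clear();
            chunks.extend(split_text(&piece, finer, chunk_size, overlap));
        }
    }
    chunks.extend(merge(&small, sep, chunk_size, overlap));
    chunks
}

/// Greedily joins pieces with `sep` into chunks of at most `chunk_size` chars,
/// keeping up to `overlap` chars of trailing pieces at the start of the next chunk.
fn merge(pieces: &[String], sep: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    let sep_len = char_len(sep);
    let mut chunks = Vec::new();
    let mut window: VecDeque<&str> = VecDeque::new();
    let mut total = 0; // chars in `window` once joined with `sep`
    for piece in pieces {
        let len = char_len(piece);
        if !window.is_empty() && total + sep_len + len > chunk_size {
            chunks.push(window.iter().copied().collect::<Vec<_>>().join(sep));
            // Drop leading pieces until the remainder fits the overlap and the new piece fits.
            while !window.is_empty() && (total > overlap || total + sep_len + len > chunk_size) {
                let dropped = char_len(window.pop_front().unwrap());
                total -= if window.is_empty() { dropped } else { dropped + sep_len };
            }
        }
        total += if window.is_empty() { len } else { len + sep_len };
        window.push_back(piece.as_str());
    }
    if !window.is_empty() {
        chunks.push(window.iter().copied().collect::<Vec<_>>().join(sep));
    }
    chunks
}

fn main() {
    // Hypothetical input: repeated multi-line paragraphs containing non-ASCII text.
    let paragraph = "Grüße aus Zürich. Ünïcode stays intact.\nSecond line of the paragraph.";
    let text = vec![paragraph; 50].join("\n\n");
    let chunks = split_text(&text, &DEFAULT_SEPARATORS, 1000, 200);
    let longest = chunks.iter().map(|c| char_len(c)).max().unwrap_or(0);
    println!("{} chunks, longest {} chars", chunks.len(), longest);
}
```

For example, split_text("aaa bbb ccc", &DEFAULT_SEPARATORS, 7, 3) returns ["aaa bbb", "bbb ccc"]: the second chunk repeats "bbb" because that piece fits within the 3-character overlap.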
Environment
- Platform: macOS (Darwin)
- Rust version: 1.92.0 (as recorded in the latest connector snapshots)
- Optimization: release build (cargo bench compiles with the bench profile, which inherits from release)
- CPU: Apple Silicon M-series
Reproducing These Results
All benchmarks are open source and reproducible. Run them yourself:
```shell
git clone https://github.com/wesichain/wesichain.git
cd wesichain
cargo bench -p wesichain-retrieval --bench recursive_splitter
cargo bench -p wesichain-qdrant --bench vs_langchain -- --sample-size 10
cargo bench -p wesichain-weaviate --bench vs_langchain -- --sample-size 10
```
Results are saved to target/criterion/ for detailed analysis, including confidence intervals.