Component Rendering Performance Analysis
Executive Summary
Template caching works as intended. The slowness we observed in Hybrid rendering is not parsing overhead, which the cache removes after the first render; it is the runtime cost of evaluating a flexible template AST.
Performance Hierarchy (With Caching)
Pure Rust PyO3: 0.3 μs ← Direct HTML generation
Python f-strings: 0.9 μs ← Compiled bytecode
Hybrid (cached): 14.5 μs ← AST evaluation + FFI
Hybrid (uncached): 193.0 μs ← Parse + AST evaluation
Template Caching Effectiveness
Simple Template
template = '<span class="badge bg-{{ variant }}">{{ text }}</span>'
| Render | Time | Speedup |
|---|---|---|
| 1st (cache miss) | 288 μs | - |
| 2nd (cache hit) | 3.6 μs | 80x faster |
| Average (cached) | 1.6 μs | 180x faster |
Conclusion: Parsing overhead eliminated by caching.
Complex Template (with loops)
template = '''
{% for item in items %}
<div>{{ item.name }}</div>
{% endfor %}
'''
| Render | Time | Speedup |
|---|---|---|
| 1st (cache miss) | 193 μs | - |
| 2nd (cache hit) | 23 μs | 8.4x faster |
| Average (cached) | 15 μs | 12.9x faster |
Conclusion: Parsing overhead (170μs, 88% of first render) eliminated by caching.
Overhead Breakdown
Where Does the 14μs Come From?
Python f-string: 0.9 μs ← Baseline
────────────────────────────────────
Hybrid Rust template (cached): 14.5 μs
Approximate breakdown (rounded estimates, summing to ~15 μs):
FFI boundary crossing: ~1 μs (Python → Rust → Python)
Context::from_dict(): ~2 μs (dict → Rust types)
Template AST evaluation: ~12 μs (loops, variables, filters)
────────────────────────────────────
Overhead vs. f-string: 13.6 μs (14.5 μs - 0.9 μs)
Why Is Template Evaluation Slower Than f-strings?
Python f-strings:
f'<span class="badge bg-{variant}">{text}</span>'
# Compiled to bytecode at parse time
# Runtime: Direct variable substitution
Rust Template (cached):
render_template('<span class="badge bg-{{ variant }}">{{ text }}</span>', context)
# Template AST cached
# Runtime: Walk AST, evaluate variables, build string
The difference:
- f-strings: Bytecode compiled, direct substitution
- Templates: AST walking, dynamic evaluation, flexible (filters, loops, etc.)
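The contrast between direct substitution and AST walking can be felt with a small experiment. This is a minimal sketch under stated assumptions: `NODES` is a hand-written stand-in for a cached template AST, `render_nodes` is a hypothetical interpreter, and everything runs in pure Python, so the ratio it prints says nothing precise about the Rust engine.

```python
import timeit

# Hand-written stand-in for what a cached, already-parsed template might hold.
NODES = [
    ("text", '<span class="badge bg-'),
    ("var", "variant"),
    ("text", '">'),
    ("var", "text"),
    ("text", "</span>"),
]


def render_fstring(variant, text):
    # The f-string is compiled to bytecode once; each call only formats values.
    return f'<span class="badge bg-{variant}">{text}</span>'


def render_nodes(nodes, context):
    # Every render dispatches on node kind and looks variables up by name.
    out = []
    for kind, value in nodes:
        if kind == "text":
            out.append(value)
        else:
            out.append(str(context[value]))
    return "".join(out)


ctx = {"variant": "primary", "text": "Hello"}
n = 100_000
t_f = timeit.timeit(lambda: render_fstring("primary", "Hello"), number=n)
t_n = timeit.timeit(lambda: render_nodes(NODES, ctx), number=n)
print(f"f-string: {t_f / n * 1e6:.2f} us, node walk: {t_n / n * 1e6:.2f} us")
```

Both functions return the same string; the node walker pays per-node dispatch and dictionary lookups on every call, which is the same kind of cost the cached template pays at render time.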
Performance Comparison by Use Case
Use Case 1: Simple Badge Component
# Pure Rust PyO3
RustBadge("Hello", "primary").render() # 0.3 μs
# Python f-string
f'<span class="badge bg-primary">Hello</span>' # 0.9 μs
# Hybrid (cached template_string)
render_template(
    '<span class="badge bg-{{ variant }}">{{ text }}</span>',
    {'variant': 'primary', 'text': 'Hello'}
) # 1.6 μs
Winner: Pure Rust (3x faster than Python, 5x faster than Hybrid)
Use Case 2: Complex List with Loop
items = [{'name': f'Item {i}'} for i in range(10)]
# Python f-string with loop
html = ['<div>']
for item in items:
    html.append(f'<div>{item["name"]}</div>')
html.append('</div>')
result = '\n'.join(html) # 0.9 μs
# Hybrid (cached template with loop)
render_template('''
<div>
{% for item in items %}
<div>{{ item.name }}</div>
{% endfor %}
</div>
''', {'items': items}) # 14.5 μs
Winner: Python f-string (16x faster!)
Why? Appending to a list and joining once runs as tight, optimized bytecode. The template loop has to walk the AST on every iteration.
Use Case 3: Large Dataset (100 items)
items = list(range(100))
# Python f-string
result = '<ul>\n' + '\n'.join([f'<li>{i}</li>' for i in items]) + '\n</ul>' # 11.6 μs
# Hybrid (cached template)
render_template('''
<ul>
{% for item in items %}
<li>{{ item }}</li>
{% endfor %}
</ul>
''', {'items': items}) # 190.8 μs
Winner: Python f-string (16x faster!)
Use Case 4: Template with Filters
from datetime import datetime
# Only Hybrid supports filters
render_template(
    '<div>{{ date|date:"Y-m-d" }}</div>',
    {'date': datetime.now()}
) # ~15-20 μs
# Python equivalent
f'<div>{datetime.now().strftime("%Y-%m-%d")}</div>' # ~1 μs
Winner: Python (when you can write the logic directly)
But: Templates provide filter reusability and designer-friendly syntax.
Throughput Analysis
| Method | Time/Render | Renders/Second |
|---|---|---|
| Pure Rust PyO3 | 0.3 μs | 3.3 million |
| Python f-string | 0.9 μs | 1.1 million |
| Hybrid (simple, cached) | 1.6 μs | 625,000 |
| Hybrid (complex, cached) | 14.5 μs | ~69,000 |
| Hybrid (complex, uncached) | 193 μs | ~5,200 |
All methods are imperceptibly fast for web applications.
A typical web request budget is 100-500ms. Even the "slowest" cached method (14.5μs) uses only 0.0145% of a 100ms budget.
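The throughput and budget figures follow directly from the per-render times. The helper names below (`renders_per_second`, `budget_share_percent`) are illustrative only; the input times are the ones reported in this document.

```python
# Per-render times in microseconds, taken from the table above.
TIMES_US = {
    "Pure Rust PyO3": 0.3,
    "Python f-string": 0.9,
    "Hybrid (simple, cached)": 1.6,
    "Hybrid (complex, cached)": 14.5,
    "Hybrid (complex, uncached)": 193.0,
}


def renders_per_second(time_us):
    """Single-threaded throughput implied by a per-render time."""
    return 1_000_000 / time_us


def budget_share_percent(time_us, budget_ms=100):
    """Share of a request budget (in percent) consumed by one render."""
    return time_us / (budget_ms * 1000) * 100


for name, t in TIMES_US.items():
    print(f"{name}: {renders_per_second(t):,.0f} renders/s, "
          f"{budget_share_percent(t):.4f}% of a 100 ms budget")
```

These are back-of-the-envelope numbers: they assume one render at a time on one core and ignore everything else a request does.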
When to Use Each Approach
Use Pure Rust PyO3 When:
✅ Component structure is fixed
✅ Need maximum throughput (millions of renders/sec)
✅ Building a library component (Badge, Button, Icon)
✅ Performance-critical path
Example: RustBadge("New", "danger").render() → 0.3μs
Use Python f-strings When:
✅ Component is application-specific
✅ Logic is simple and inline
✅ Need maximum flexibility
✅ Developers > Designers
Example: f'<div class="badge">{text}</div>' → 0.9μs
Use Hybrid (template_string) When:
✅ Need template reusability
✅ Want designer-friendly syntax
✅ Need filters (|date, |upper, etc.)
✅ Complex nested logic (loops in loops)
✅ Template inheritance ({% extends %})
Example: Components with template_string = '...' → 1.6-14.5μs
Optimization Strategies
1. Template Caching ✅ IMPLEMENTED
Status: Already working perfectly!
// crates/djust_live/src/lib.rs
static TEMPLATE_CACHE: Lazy<DashMap<String, Arc<Template>>> = Lazy::new(DashMap::new);
fn render_template(template_source: String, context: HashMap<String, Value>) -> PyResult<String> {
    // Look the template up by its source text; parse only on a miss.
    let template_arc = if let Some(cached) = TEMPLATE_CACHE.get(&template_source) {
        cached.clone() // ← Cache hit!
    } else {
        let template = Template::new(&template_source)?;
        let arc = Arc::new(template);
        TEMPLATE_CACHE.insert(template_source.clone(), arc.clone());
        arc
    };
    // ...
}
Impact: Eliminates 88% of first render time (170μs → 0μs parsing overhead)
2. Component-Level Rust Implementation ✅ IMPLEMENTED
Status: RustBadge, RustButton already use this.
// Pure Rust - no template parsing, no AST evaluation
pub fn render(&self) -> String {
    format!(r#"<span class="badge bg-{}">{}</span>"#, self.variant, self.text)
}
Impact: About 5x faster than the cached simple template (0.3μs vs 1.6μs) and 50x faster than the cached loop template (0.3μs vs 15μs)
3. Reduce Context Creation Overhead ⚠️ LIMITED GAINS
Current implementation already uses AHashMap (faster than HashMap).
Potential optimization: Reuse Context objects instead of creating new ones.
# Current (creates new context each time)
render_template(template, {'text': 'Hello', 'variant': 'primary'}) # ~2μs context creation
# Optimized (reuse context)
ctx = Context.new()
ctx.set('text', 'Hello')
ctx.set('variant', 'primary')
template.render(ctx) # ~0μs context creation
Impact: Could save ~2μs per render. But: Adds API complexity, not worth it for most use cases.
4. Template Compilation (JIT) ❌ VERY COMPLEX
Convert template AST → native code at runtime.
Example: Template with loop
{% for item in items %}
<div>{{ item }}</div>
{% endfor %}
Could compile to:
fn render_compiled(items: &[Value]) -> String {
    // The template above has no wrapper element, so start from an empty string.
    let mut html = String::new();
    for item in items {
        html.push_str("<div>");
        html.push_str(&item.to_string());
        html.push_str("</div>");
    }
    html
}
Impact: Could match Python f-string performance (~1μs).
Downside: Extremely complex, security concerns (code generation), maintenance burden.
Verdict: Not worth it for the use cases we're targeting.
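For readers who want a feel for what "compiling" a template involves, here is a toy Python analogue that generates the source of a render function and runs it through `exec`. It is a sketch under heavy assumptions, not the Rust design discussed above: `compile_template` is a hypothetical name, only `{{ name }}` placeholders are supported (no loops or filters), and no HTML escaping is performed.

```python
import re

_TOKEN = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def compile_template(source):
    """Generate and load a Python function equivalent to the template."""
    parts = _TOKEN.split(source)  # alternates: literal, name, literal, name, ...
    pieces = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            if part:
                pieces.append(repr(part))  # literal text, safely quoted
        else:
            pieces.append(f"str(ctx[{part!r}])")  # variable lookup
    body = " + ".join(pieces) or "''"
    code = f"def render(ctx):\n    return {body}\n"
    namespace = {}
    exec(code, namespace)  # code generation step: the source of the security concern
    return namespace["render"]


render_badge = compile_template(
    '<span class="badge bg-{{ variant }}">{{ text }}</span>'
)
print(render_badge({"variant": "primary", "text": "Hello"}))
```

Even this tiny version has to quote literals carefully and restrict variable names to keep the generated code safe, which hints at why a full compiler covering loops, filters, and inheritance is a large maintenance commitment.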
Recommendations
For djust Component Library
| Component Type | Recommended Approach | Reasoning |
|---|---|---|
| Library components (Badge, Button, Icon) | Pure Rust PyO3 | Fixed structure, maximum performance (0.3μs) |
| Layout components (Card, Container) | Hybrid template_string | Flexible slots, designer-friendly (5-10μs) |
| Complex interactive (DataTable, Autocomplete) | Python with helpers | Business logic complexity, 10-50μs acceptable |
For Application Developers
- Start with Python f-strings - simplest, fastest for simple cases
- Upgrade to Hybrid when you need filters, loops, or template inheritance
- Use Pure Rust for library components you install from djust
Performance Guidelines
- < 1μs: Pure Rust or Python f-strings
- 1-20μs: Hybrid templates (cached) - excellent for web apps
- 20-200μs: Hybrid templates (uncached) - fine for low-traffic routes
- > 200μs: Consider Python generators or pagination
Rule of thumb: If your component renders in under 100μs, performance is not a concern. Focus on maintainability and developer experience.
Conclusion
Key Findings
- ✅ Template caching works - it eliminates parsing, which accounts for 88% of first-render time on the loop template
- ✅ All three approaches are production-ready - even "slowest" is imperceptibly fast
- ✅ Python f-strings are surprisingly fast - often beat cached templates
- ✅ Pure Rust is king - but requires compilation and less flexibility
The Real Insight
The performance difference between methods (0.3μs vs 14μs) is academically interesting but practically irrelevant for web applications.
What matters:
- Developer experience
- Code maintainability
- Team expertise
- Design flexibility
Choose based on your needs, not micro-benchmarks. All approaches are blazing fast.
Final Recommendation
Use the Component base class with automatic performance waterfall:
class Badge(Component):
    _rust_impl_class = RustBadge # If available: 0.3μs
    template_string = '...' # Fallback: 1.6μs (cached)
    def _render_custom(self): # Last resort: 0.9μs
        return f'<span>...</span>'
This gives you:
- Best performance when Rust is available
- Good performance with cached templates
- Excellent fallback with Python
You get the best of all worlds without having to choose!
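One way such a waterfall could be dispatched is sketched below. This is an assumption-laden illustration, not the actual djust `Component` base class: the `render` method, the `props` attribute, `FakeRustBadge`, and the regex-based `render_template` stand-in are all hypothetical, and only the fallback order (Rust, then `template_string`, then `_render_custom`) is taken from the comments in the snippet above.

```python
import re

_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(source, context):
    """Stand-in for the Rust-backed renderer: substitutes {{ name }} only."""
    return _PLACEHOLDER.sub(lambda m: str(context[m.group(1)]), source)


class Component:
    _rust_impl_class = None  # set by subclasses when a Rust class exists
    template_string = None   # set by subclasses that ship a template

    def __init__(self, **props):
        self.props = props

    def render(self):
        if self._rust_impl_class is not None:  # 1. pure Rust path
            return self._rust_impl_class(**self.props).render()
        if self.template_string is not None:   # 2. cached template path
            return render_template(self.template_string, self.props)
        return self._render_custom()           # 3. plain Python path

    def _render_custom(self):
        raise NotImplementedError


class FakeRustBadge:
    """Pure-Python placeholder standing in for a compiled Rust class."""

    def __init__(self, text, variant):
        self.text, self.variant = text, variant

    def render(self):
        return f'<span class="badge bg-{self.variant}">{self.text}</span>'


class TemplateBadge(Component):
    template_string = '<span class="badge bg-{{ variant }}">{{ text }}</span>'


class RustBadgeComponent(TemplateBadge):
    _rust_impl_class = FakeRustBadge


class CustomBadge(Component):
    def _render_custom(self):
        return f'<span>{self.props["text"]}</span>'


print(RustBadgeComponent(text="Hello", variant="primary").render())
print(TemplateBadge(text="Hello", variant="primary").render())
print(CustomBadge(text="Hello", variant="primary").render())
```

Resolving the path on each `render` call keeps the sketch short; a real base class might resolve it once per class to avoid repeating the checks.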