# Tracks Feature Performance Optimization Options
## Current State Analysis
### Performance Characteristics
- **Time Complexity:** O(n log n) where n = number of GPS points
- **Memory Usage:** Loads entire dataset into memory (~200-400 bytes per point)
- **Processing Mode:** Single-threaded, sequential segmentation
- **Database Load:** Multiple PostGIS distance calculations per point pair
### Performance Estimates (Bulk Mode)
| Points | Processing Time | Memory Usage | Database Load |
|--------|----------------|--------------|---------------|
| 10K | 30-60 seconds | ~50 MB | Low |
| 100K | 5-15 minutes | ~200 MB | Medium |
| 1M+ | 30-90 minutes | 400+ MB | High |
### Current Bottlenecks
1. **Memory constraints** - Loading all points at once
2. **PostGIS distance calculations** - Sequential, not optimized
3. **Single-threaded processing** - No parallelization
4. **No progress indication** - Users can't track long-running operations
---
## Optimization Options
### Option 1: Enhanced Time-Based Batching
**Complexity:** Low | **Impact:** High | **Risk:** Low
#### Implementation
- Extend existing `:daily` mode with configurable batch sizes
- Add 1-point overlap between batches to maintain segmentation accuracy
- Implement batch-aware progress reporting
#### Benefits
- **Memory reduction:** 90%+ (from ~400 MB to ~40 MB for 1M points)
- **Better UX:** Progress indication and cancellation support
- **Incremental processing:** Can resume interrupted operations
- **Lower DB pressure:** Smaller query result sets
#### Changes Required
```ruby
# Enhanced generator with configurable batching
Tracks::Generator.new(
  user,
  mode: :batched,
  batch_size: 24.hours,
  enable_overlap: true
).call
```
#### Edge Cases to Handle
- Tracks spanning batch boundaries (solved with overlap; see the sketch below)
- Midnight-crossing tracks in daily mode
- Deduplication of overlapping segments
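To make the boundary handling concrete, here is a minimal, self-contained sketch in plain Ruby. The hashes stand in for `Point` records and `split_on_gaps` stands in for the existing segmentation logic; none of these names are current API.
```ruby
# Time-based batching with a 1-point overlap (illustrative only).
BATCH_SECONDS = 24 * 60 * 60

def each_batch_with_overlap(points, batch_seconds: BATCH_SECONDS)
  carry = nil
  points
    .chunk_while { |a, b| a[:timestamp] / batch_seconds == b[:timestamp] / batch_seconds }
    .each do |batch|
      # Prepend the last point of the previous window so a gap that spans the
      # boundary (e.g. a midnight-crossing track) is still seen by the segmenter.
      yield ([carry] + batch).compact
      carry = batch.last
    end
end

# Stand-in for the real segmentation: split on large time gaps.
def split_on_gaps(batch, max_gap_seconds: 2 * 60 * 60)
  batch.slice_when { |a, b| b[:timestamp] - a[:timestamp] > max_gap_seconds }.to_a
end

# One point per hour for 30 hours: the track crosses the batch boundary.
points = (0...30).map { |i| { id: i, timestamp: i * 3_600 } }
each_batch_with_overlap(points) do |batch|
  split_on_gaps(batch).each { |segment| p segment.map { |pt| pt[:id] } }
end
# Segments that begin with the carried point were already started in the
# previous batch and must be merged/deduplicated rather than created again.
```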
---
### Option 2: Spatial Indexing Optimization
**Complexity:** Medium | **Impact:** Medium | **Risk:** Low
#### Implementation
- Replace individual PostGIS calls with batch distance calculations
- Implement spatial clustering for nearby points before segmentation
- Use PostGIS window functions for distance calculations
#### Benefits
- **Faster distance calculations:** Batch operations vs individual queries
- **Reduced DB round-trips:** Single query for multiple distance calculations
- **Better index utilization:** Leverage existing spatial indexes
#### Changes Required
```sql
-- Batch distance calculation approach
WITH point_distances AS (
  SELECT
    id,
    timestamp,
    ST_Distance(
      lonlat::geography,
      LAG(lonlat::geography) OVER (ORDER BY timestamp)
    ) AS distance_to_previous
  FROM points
  WHERE user_id = ?
  ORDER BY timestamp
)
SELECT * FROM point_distances WHERE distance_to_previous > ?
```
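On the Ruby side, the per-point distances returned by that query can then be consumed in a single pass. A hedged sketch for a given `user`, assuming a `Point` model over the `points` table and an illustrative 500 m threshold:
```ruby
# Fetch distance-to-previous for every point in one query, then split into
# segments in Ruby without any further PostGIS round-trips.
points_with_distances = Point.find_by_sql([<<~SQL, user.id])
  SELECT
    id,
    timestamp,
    ST_Distance(
      lonlat::geography,
      LAG(lonlat::geography) OVER (ORDER BY timestamp)
    ) AS distance_to_previous
  FROM points
  WHERE user_id = ?
  ORDER BY timestamp
SQL

# A new segment starts wherever the jump from the previous point is too large.
segments = points_with_distances.slice_when do |_previous, current|
  current.distance_to_previous.to_f > 500
end.to_a
```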
---
### Option 3: Parallel Processing with Worker Pools
**Complexity:** High | **Impact:** High | **Risk:** Medium
#### Implementation
- Split large datasets into non-overlapping time ranges
- Process multiple batches in parallel using Sidekiq workers
- Implement coordination mechanism for dependent segments
#### Benefits
- **Faster processing:** Utilize multiple CPU cores
- **Scalable:** Performance scales with worker capacity
- **Background processing:** Non-blocking for users
#### Challenges
- **Complex coordination:** Managing dependencies between batches
- **Resource competition:** Multiple workers accessing the same user's data
- **Error handling:** Partial failure scenarios
#### Architecture
```ruby
# Parallel processing coordinator
class Tracks::ParallelGenerator
  def call
    time_ranges = split_into_parallel_ranges

    time_ranges.map do |range|
      Tracks::BatchProcessorJob.perform_later(user_id, range)
    end
  end
end
```
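The coordinator above delegates to a `split_into_parallel_ranges` helper that does not exist yet. One possible shape for it, assuming the coordinator holds a `user` reference and uses a 7-day range per worker:
```ruby
# Hypothetical helper for Tracks::ParallelGenerator (not existing API):
# cut the user's history into fixed-size, non-overlapping time ranges.
def split_into_parallel_ranges(range_size: 7.days)
  first_ts = user.points.minimum(:timestamp)
  last_ts  = user.points.maximum(:timestamp)
  return [] if first_ts.nil?

  ranges = []
  cursor = first_ts
  while cursor <= last_ts
    ranges << (cursor...(cursor + range_size))
    cursor += range_size
  end
  ranges
end
```
Each range is then handed to `Tracks::BatchProcessorJob`; a final reconciliation pass is still needed to merge tracks that were cut at a range boundary, which is the coordination challenge listed above.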
---
### Option 4: Incremental Algorithm Enhancement
**Complexity:** Medium | **Impact:** Medium | **Risk:** Medium
#### Implementation
- Enhance existing `:incremental` mode with smarter buffering
- Implement sliding window approach for active track detection
- Add automatic track finalization based on time gaps
#### Benefits
- **Real-time processing:** Process points as they arrive
- **Lower memory footprint:** Only active segments in memory
- **Better for live tracking:** Immediate track updates
#### Current Limitations
- Existing incremental mode processes untracked points only
- No automatic track finalization (a gap-based approach is sketched below)
- Limited to single active track per user
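A minimal sketch of gap-based finalization, assuming a `Track` model with a `finalized` flag and integer-epoch point timestamps (neither is confirmed current schema), scheduled to run periodically:
```ruby
# Hypothetical job: close any still-open track whose newest point is older
# than a configurable gap, so incremental mode never leaves tracks dangling.
class Tracks::FinalizeStaleJob < ApplicationJob
  queue_as :tracks

  FINALIZE_AFTER = 30.minutes

  def perform
    Track.where(finalized: false).find_each do |track|
      last_point_at = track.points.maximum(:timestamp)
      next if last_point_at.nil?
      next if last_point_at > FINALIZE_AFTER.ago.to_i # still receiving points

      track.update!(finalized: true)
    end
  end
end
```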
---
### Option 5: Database-Level Optimization
**Complexity:** Low-Medium | **Impact:** Medium | **Risk:** Low
#### Implementation
- Add composite indexes for common query patterns
- Implement materialized views for expensive calculations
- Use database-level segmentation logic
#### Benefits
- **Faster queries:** Better index utilization
- **Reduced Ruby processing:** Move logic to database
- **Consistent performance:** Database optimizations benefit all modes
#### Proposed Indexes
```sql
-- Optimized for bulk processing
CREATE INDEX CONCURRENTLY idx_points_user_timestamp_track
ON points(user_id, timestamp) WHERE track_id IS NULL;
-- Optimized for incremental processing
CREATE INDEX CONCURRENTLY idx_points_untracked_timestamp
ON points(timestamp) WHERE track_id IS NULL;
```
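If the indexes are added through a Rails migration rather than raw SQL, `CONCURRENTLY` requires disabling the DDL transaction. A sketch (migration class name and Rails version are illustrative):
```ruby
# Sketch of a migration for the two partial indexes above.
class AddTrackGenerationIndexesToPoints < ActiveRecord::Migration[7.1]
  # CREATE INDEX CONCURRENTLY cannot run inside a transaction.
  disable_ddl_transaction!

  def change
    add_index :points, [:user_id, :timestamp],
              where: "track_id IS NULL",
              name: "idx_points_user_timestamp_track",
              algorithm: :concurrently

    add_index :points, :timestamp,
              where: "track_id IS NULL",
              name: "idx_points_untracked_timestamp",
              algorithm: :concurrently
  end
end
```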
---
## Recommended Implementation Strategy
### Phase 1: Quick Wins (Week 1-2)
1. **Implement Enhanced Time-Based Batching** (Option 1)
   - Extend the existing daily mode with overlap
   - Add progress reporting
   - Make batch sizes configurable
### Phase 2: Database Optimization (Week 3)
2. **Add Database-Level Optimizations** (Option 5)
   - Create optimized indexes
   - Implement batch distance calculations
### Phase 3: Advanced Features (Week 4-6)
3. **Spatial Indexing Optimization** (Option 2)
   - Replace individual distance calculations
   - Implement spatial clustering
### Phase 4: Future Enhancements
4. **Parallel Processing** (Option 3) - Consider for v2
5. **Incremental Enhancement** (Option 4) - For real-time features
---
## Risk Assessment
### Low Risk
- **Time-based batching:** Builds on existing daily mode
- **Database indexes:** Standard optimization technique
- **Progress reporting:** UI enhancement only
### Medium Risk
- **Spatial optimization:** Requires careful testing of distance calculations
- **Incremental enhancement:** Changes to existing algorithm logic
### High Risk
- **Parallel processing:** Complex coordination, potential race conditions
- **Major algorithm changes:** Could introduce segmentation bugs
---
## Success Metrics
### Performance Targets
- **Memory usage:** < 100MB for datasets up to 1M points
- **Processing time:** < 10 minutes for 1M points
- **User experience:** Progress indication and cancellation
### Monitoring Points
- Database query performance
- Memory consumption during processing
- User-reported processing times
- Track generation accuracy (no regression)
---
## Next Steps
1. **Choose initial approach** based on urgency and resources
2. **Create feature branch** for selected optimization
3. **Implement comprehensive testing** including edge cases
4. **Monitor performance** in staging environment
5. **Gradual rollout** with feature flags