# Tracks Feature Performance Optimization Options

## Current State Analysis

### Performance Characteristics

- **Time Complexity:** O(n log n), where n = number of GPS points
- **Memory Usage:** Loads the entire dataset into memory (~200-400 bytes per point)
- **Processing Mode:** Single-threaded, sequential segmentation
- **Database Load:** Multiple PostGIS distance calculations per point pair

### Performance Estimates (Bulk Mode)

| Points | Processing Time | Memory Usage | Database Load |
|--------|-----------------|--------------|---------------|
| 10K    | 30-60 seconds   | ~50 MB       | Low           |
| 100K   | 5-15 minutes    | ~200 MB      | Medium        |
| 1M+    | 30-90 minutes   | 400+ MB      | High          |

### Current Bottlenecks

1. **Memory constraints** - Loading all points at once
2. **PostGIS distance calculations** - Sequential, not optimized
3. **Single-threaded processing** - No parallelization
4. **No progress indication** - Users can't track long-running operations

---

## Optimization Options

### Option 1: Enhanced Time-Based Batching

**Complexity:** Low | **Impact:** High | **Risk:** Low

#### Implementation

- Extend the existing `:daily` mode with configurable batch sizes
- Add a 1-point overlap between batches to maintain segmentation accuracy
- Implement batch-aware progress reporting

#### Benefits

- **Memory:** 90%+ reduction (from ~400 MB to ~40 MB for 1M points)
- **Better UX:** Progress indication and cancellation support
- **Incremental processing:** Can resume interrupted operations
- **Lower DB pressure:** Smaller query result sets

#### Changes Required

```ruby
# Enhanced generator with configurable batching
Tracks::Generator.new(
  user,
  mode: :batched,
  batch_size: 24.hours,
  enable_overlap: true
).call
```

#### Edge Cases to Handle

- Tracks spanning batch boundaries (solved with overlap)
- Midnight-crossing tracks in daily mode
- Deduplication of overlapping segments

---

### Option 2: Spatial Indexing Optimization

**Complexity:** Medium | **Impact:** Medium | **Risk:** Low

#### Implementation

- Replace individual PostGIS calls with batch distance calculations
- Implement spatial clustering for nearby points before segmentation
- Use PostGIS window functions for distance calculations

#### Benefits

- **Faster distance calculations:** Batch operations instead of individual queries
- **Reduced DB round-trips:** Single query for multiple distance calculations
- **Better index utilization:** Leverage existing spatial indexes

#### Changes Required

```sql
-- Batch distance calculation approach
WITH point_distances AS (
  SELECT
    id,
    timestamp,
    ST_Distance(
      lonlat::geography,
      LAG(lonlat::geography) OVER (ORDER BY timestamp)
    ) AS distance_to_previous
  FROM points
  WHERE user_id = ?
  ORDER BY timestamp
)
SELECT * FROM point_distances WHERE distance_to_previous > ?
```
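For reference, here is one way the Ruby side might consume that query in a single round-trip. This is a minimal sketch, assuming a `Point` ActiveRecord model over the `points` table; the `Tracks::BatchDistanceQuery` class, its method names, and the threshold parameter are illustrative, not existing API.

```ruby
# Sketch: fetch all segment boundaries in one query instead of issuing
# a per-pair ST_Distance call. Names here are illustrative.
class Tracks::BatchDistanceQuery
  DISTANCE_SQL = <<~SQL
    WITH point_distances AS (
      SELECT id,
             timestamp,
             ST_Distance(
               lonlat::geography,
               LAG(lonlat::geography) OVER (ORDER BY timestamp)
             ) AS distance_to_previous
      FROM points
      WHERE user_id = :user_id
    )
    SELECT id, timestamp, distance_to_previous
    FROM point_distances
    WHERE distance_to_previous > :threshold
    ORDER BY timestamp
  SQL

  def initialize(user, threshold_meters:)
    @user = user
    @threshold = threshold_meters
  end

  # Returns the points that start a new segment, i.e. those farther than
  # the threshold from their predecessor, in a single database round-trip.
  def segment_boundaries
    Point.find_by_sql([DISTANCE_SQL, { user_id: @user.id, threshold: @threshold }])
  end
end
```

Fetching all boundary points at once is what removes the sequential per-pair distance calls identified in the bottleneck list above.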
---

### Option 3: Parallel Processing with Worker Pools

**Complexity:** High | **Impact:** High | **Risk:** Medium

#### Implementation

- Split large datasets into non-overlapping time ranges
- Process multiple batches in parallel using Sidekiq workers
- Implement a coordination mechanism for dependent segments

#### Benefits

- **Faster processing:** Utilizes multiple CPU cores
- **Scalable:** Performance scales with worker capacity
- **Background processing:** Non-blocking for users

#### Challenges

- **Complex coordination:** Managing dependencies between batches
- **Resource competition:** Multiple workers accessing the same user's data
- **Error handling:** Partial failure scenarios

#### Architecture

```ruby
# Parallel processing coordinator: enqueues one batch job per time range
class Tracks::ParallelGenerator
  def initialize(user_id)
    @user_id = user_id
  end

  def call
    # split_into_parallel_ranges would divide the user's points into
    # non-overlapping time windows (implementation elided)
    time_ranges = split_into_parallel_ranges

    time_ranges.map do |range|
      Tracks::BatchProcessorJob.perform_later(@user_id, range)
    end
  end
end
```

---

### Option 4: Incremental Algorithm Enhancement

**Complexity:** Medium | **Impact:** Medium | **Risk:** Medium

#### Implementation

- Enhance the existing `:incremental` mode with smarter buffering
- Implement a sliding window for active track detection
- Add automatic track finalization based on time gaps (see the sketch after this section)

#### Benefits

- **Real-time processing:** Process points as they arrive
- **Lower memory footprint:** Only active segments stay in memory
- **Better for live tracking:** Immediate track updates

#### Current Limitations

- The existing incremental mode processes untracked points only
- No automatic track finalization
- Limited to a single active track per user
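The sliding-window idea can be made concrete with a small buffer that finalizes a track whenever the gap to the previous point exceeds a threshold. A minimal sketch, assuming points carry an epoch-seconds `timestamp`; the class, the `Tracks::CreateFromPoints` service, and the 30-minute threshold are all hypothetical.

```ruby
# Sketch only: class, service, and threshold names are illustrative,
# not the existing incremental-mode API.
class Tracks::IncrementalBuffer
  # Assumed gap threshold; tune to product requirements.
  FINALIZE_AFTER_SECONDS = 30.minutes.to_i

  def initialize(user)
    @user = user
    @active_points = [] # sliding window: only the active segment lives here
  end

  # Called for each arriving point; finalizes the current track when the
  # time gap to the previous point exceeds the threshold.
  def add(point)
    finalize_track! if gap_exceeded?(point)
    @active_points << point
  end

  private

  def gap_exceeded?(point)
    last = @active_points.last
    # Assumes `timestamp` is epoch seconds; adjust if it is a Time.
    last && point.timestamp - last.timestamp > FINALIZE_AFTER_SECONDS
  end

  def finalize_track!
    return if @active_points.empty?

    Tracks::CreateFromPoints.call(@user, @active_points) # hypothetical service
    @active_points = []
  end
end
```

Because the buffer is reset on finalization, memory stays proportional to the active segment rather than the full dataset, which is the main appeal of this option.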
---

### Option 5: Database-Level Optimization

**Complexity:** Low-Medium | **Impact:** Medium | **Risk:** Low

#### Implementation

- Add composite indexes for common query patterns
- Implement materialized views for expensive calculations
- Use database-level segmentation logic

#### Benefits

- **Faster queries:** Better index utilization
- **Reduced Ruby processing:** Move logic to the database
- **Consistent performance:** Database optimizations benefit all modes

#### Proposed Indexes

```sql
-- Optimized for bulk processing
CREATE INDEX CONCURRENTLY idx_points_user_timestamp_track
ON points(user_id, timestamp)
WHERE track_id IS NULL;

-- Optimized for incremental processing
CREATE INDEX CONCURRENTLY idx_points_untracked_timestamp
ON points(timestamp)
WHERE track_id IS NULL;
```

---

## Recommended Implementation Strategy

### Phase 1: Quick Wins (Weeks 1-2)

1. **Implement Enhanced Time-Based Batching** (Option 1)
   - Extend the existing daily mode with overlap
   - Add progress reporting
   - Make batch sizes configurable

### Phase 2: Database Optimization (Week 3)

2. **Add Database-Level Optimizations** (Option 5)
   - Create optimized indexes
   - Implement batch distance calculations

### Phase 3: Advanced Features (Weeks 4-6)

3. **Spatial Indexing Optimization** (Option 2)
   - Replace individual distance calculations
   - Implement spatial clustering

### Phase 4: Future Enhancements

4. **Parallel Processing** (Option 3) - Consider for v2
5. **Incremental Enhancement** (Option 4) - For real-time features

---

## Risk Assessment

### Low Risk

- **Time-based batching:** Builds on the existing daily mode
- **Database indexes:** Standard optimization technique
- **Progress reporting:** UI enhancement only

### Medium Risk

- **Spatial optimization:** Requires careful testing of distance calculations
- **Incremental enhancement:** Changes to existing algorithm logic

### High Risk

- **Parallel processing:** Complex coordination, potential race conditions
- **Major algorithm changes:** Could introduce segmentation bugs

---

## Success Metrics

### Performance Targets

- **Memory usage:** < 100 MB for datasets up to 1M points
- **Processing time:** < 10 minutes for 1M points
- **User experience:** Progress indication and cancellation

### Monitoring Points

- Database query performance
- Memory consumption during processing
- User-reported processing times
- Track generation accuracy (no regression)

---

## Next Steps

1. **Choose the initial approach** based on urgency and resources
2. **Create a feature branch** for the selected optimization
3. **Implement comprehensive testing**, including edge cases
4. **Monitor performance** in a staging environment
5. **Roll out gradually** behind feature flags (see the sketch below)
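For step 5, a flag-guarded entry point keeps the rollout reversible. A minimal sketch, assuming the Flipper gem is available; the `:batched_track_generation` flag and the router class are illustrative, not part of the existing codebase.

```ruby
# Sketch: route users onto the new batched path behind a feature flag.
# Assumes the Flipper gem; the flag name and this class are illustrative.
class Tracks::GeneratorRouter
  def self.call(user, **options)
    mode =
      if Flipper.enabled?(:batched_track_generation, user)
        :batched # new Option 1 path for flagged users
      else
        :daily   # existing behavior for everyone else
      end

    Tracks::Generator.new(user, mode: mode, **options).call
  end
end
```

Routing through a single entry point means the flag can be enabled per user or per percentage, and disabled instantly if track generation accuracy regresses.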