mirror of
https://github.com/Freika/dawarich.git
synced 2026-01-11 01:31:39 -05:00
235 lines
No EOL
7.1 KiB
Markdown
235 lines
No EOL
7.1 KiB
Markdown
# Tracks Feature Performance Optimization Options
|
|
|
|
## Current State Analysis
|
|
|
|
### Performance Characteristics
|
|
- **Time Complexity:** O(n log n) where n = number of GPS points
|
|
- **Memory Usage:** Loads entire dataset into memory (~200-400 bytes per point)
|
|
- **Processing Mode:** Single-threaded, sequential segmentation
|
|
- **Database Load:** Multiple PostGIS distance calculations per point pair
|
|
|
|
### Performance Estimates (Bulk Mode)
|
|
| Points | Processing Time | Memory Usage | Database Load |
|
|
|--------|----------------|--------------|---------------|
|
|
| 10K | 30-60 seconds | ~50 MB | Low |
|
|
| 100K | 5-15 minutes | ~200 MB | Medium |
|
|
| 1M+ | 30-90 minutes | 400+ MB | High |
|
|
|
|
### Current Bottlenecks
|
|
1. **Memory constraints** - Loading all points at once
|
|
2. **PostGIS distance calculations** - Sequential, not optimized
|
|
3. **Single-threaded processing** - No parallelization
|
|
4. **No progress indication** - Users can't track long-running operations
|
|
|
|
---
|
|
|
|
## Optimization Options
|
|
|
|
### Option 1: Enhanced Time-Based Batching
|
|
**Complexity:** Low | **Impact:** High | **Risk:** Low
|
|
|
|
#### Implementation
|
|
- Extend existing `:daily` mode with configurable batch sizes
|
|
- Add 1-point overlap between batches to maintain segmentation accuracy
|
|
- Implement batch-aware progress reporting
|
|
|
|
#### Benefits
|
|
- **Memory reduction:** 90%+ reduction (from 400MB to ~40MB for 1M points)
|
|
- **Better UX:** Progress indication and cancellation support
|
|
- **Incremental processing:** Can resume interrupted operations
|
|
- **Lower DB pressure:** Smaller query result sets
|
|
|
|
#### Changes Required
|
|
```ruby
|
|
# Enhanced generator with configurable batching
|
|
Tracks::Generator.new(
|
|
user,
|
|
mode: :batched,
|
|
batch_size: 24.hours,
|
|
enable_overlap: true
|
|
).call
|
|
```
|
|
|
|
#### Edge Cases to Handle
|
|
- Tracks spanning batch boundaries (solved with overlap)
|
|
- Midnight-crossing tracks in daily mode
|
|
- Deduplication of overlapping segments
|
|
|
|
---
|
|
|
|
### Option 2: Spatial Indexing Optimization
|
|
**Complexity:** Medium | **Impact:** Medium | **Risk:** Low
|
|
|
|
#### Implementation
|
|
- Replace individual PostGIS calls with batch distance calculations
|
|
- Implement spatial clustering for nearby points before segmentation
|
|
- Use PostGIS window functions for distance calculations
|
|
|
|
#### Benefits
|
|
- **Faster distance calculations:** Batch operations vs individual queries
|
|
- **Reduced DB round-trips:** Single query for multiple distance calculations
|
|
- **Better index utilization:** Leverage existing spatial indexes
|
|
|
|
#### Changes Required
|
|
```sql
|
|
-- Batch distance calculation approach
|
|
WITH point_distances AS (
|
|
SELECT
|
|
id,
|
|
timestamp,
|
|
ST_Distance(
|
|
lonlat::geography,
|
|
LAG(lonlat::geography) OVER (ORDER BY timestamp)
|
|
) as distance_to_previous
|
|
FROM points
|
|
WHERE user_id = ?
|
|
ORDER BY timestamp
|
|
)
|
|
SELECT * FROM point_distances WHERE distance_to_previous > ?
|
|
```
|
|
|
|
---
|
|
|
|
### Option 3: Parallel Processing with Worker Pools
|
|
**Complexity:** High | **Impact:** High | **Risk:** Medium
|
|
|
|
#### Implementation
|
|
- Split large datasets into non-overlapping time ranges
|
|
- Process multiple batches in parallel using Sidekiq workers
|
|
- Implement coordination mechanism for dependent segments
|
|
|
|
#### Benefits
|
|
- **Faster processing:** Utilize multiple CPU cores
|
|
- **Scalable:** Performance scales with worker capacity
|
|
- **Background processing:** Non-blocking for users
|
|
|
|
#### Challenges
|
|
- **Complex coordination:** Managing dependencies between batches
|
|
- **Resource competition:** Multiple workers accessing same user's data
|
|
- **Error handling:** Partial failure scenarios
|
|
|
|
#### Architecture
|
|
```ruby
|
|
# Parallel processing coordinator
|
|
class Tracks::ParallelGenerator
|
|
def call
|
|
time_ranges = split_into_parallel_ranges
|
|
|
|
time_ranges.map do |range|
|
|
Tracks::BatchProcessorJob.perform_later(user_id, range)
|
|
end
|
|
end
|
|
end
|
|
```
|
|
|
|
---
|
|
|
|
### Option 4: Incremental Algorithm Enhancement
|
|
**Complexity:** Medium | **Impact:** Medium | **Risk:** Medium
|
|
|
|
#### Implementation
|
|
- Enhance existing `:incremental` mode with smarter buffering
|
|
- Implement sliding window approach for active track detection
|
|
- Add automatic track finalization based on time gaps
|
|
|
|
#### Benefits
|
|
- **Real-time processing:** Process points as they arrive
|
|
- **Lower memory footprint:** Only active segments in memory
|
|
- **Better for live tracking:** Immediate track updates
|
|
|
|
#### Current Limitations
|
|
- Existing incremental mode processes untracked points only
|
|
- No automatic track finalization
|
|
- Limited to single active track per user
|
|
|
|
---
|
|
|
|
### Option 5: Database-Level Optimization
|
|
**Complexity:** Low-Medium | **Impact:** Medium | **Risk:** Low
|
|
|
|
#### Implementation
|
|
- Add composite indexes for common query patterns
|
|
- Implement materialized views for expensive calculations
|
|
- Use database-level segmentation logic
|
|
|
|
#### Benefits
|
|
- **Faster queries:** Better index utilization
|
|
- **Reduced Ruby processing:** Move logic to database
|
|
- **Consistent performance:** Database optimizations benefit all modes
|
|
|
|
#### Proposed Indexes
|
|
```sql
|
|
-- Optimized for bulk processing
|
|
CREATE INDEX CONCURRENTLY idx_points_user_timestamp_track
|
|
ON points(user_id, timestamp) WHERE track_id IS NULL;
|
|
|
|
-- Optimized for incremental processing
|
|
CREATE INDEX CONCURRENTLY idx_points_untracked_timestamp
|
|
ON points(timestamp) WHERE track_id IS NULL;
|
|
```
|
|
|
|
---
|
|
|
|
## Recommended Implementation Strategy
|
|
|
|
### Phase 1: Quick Wins (Week 1-2)
|
|
1. **Implement Enhanced Time-Based Batching** (Option 1)
|
|
- Extend existing daily mode with overlap
|
|
- Add progress reporting
|
|
- Configurable batch sizes
|
|
|
|
### Phase 2: Database Optimization (Week 3)
|
|
2. **Add Database-Level Optimizations** (Option 5)
|
|
- Create optimized indexes
|
|
- Implement batch distance calculations
|
|
|
|
### Phase 3: Advanced Features (Week 4-6)
|
|
3. **Spatial Indexing Optimization** (Option 2)
|
|
- Replace individual distance calculations
|
|
- Implement spatial clustering
|
|
|
|
### Phase 4: Future Enhancements
|
|
4. **Parallel Processing** (Option 3) - Consider for v2
|
|
5. **Incremental Enhancement** (Option 4) - For real-time features
|
|
|
|
---
|
|
|
|
## Risk Assessment
|
|
|
|
### Low Risk
|
|
- **Time-based batching:** Builds on existing daily mode
|
|
- **Database indexes:** Standard optimization technique
|
|
- **Progress reporting:** UI enhancement only
|
|
|
|
### Medium Risk
|
|
- **Spatial optimization:** Requires careful testing of distance calculations
|
|
- **Incremental enhancement:** Changes to existing algorithm logic
|
|
|
|
### High Risk
|
|
- **Parallel processing:** Complex coordination, potential race conditions
|
|
- **Major algorithm changes:** Could introduce segmentation bugs
|
|
|
|
---
|
|
|
|
## Success Metrics
|
|
|
|
### Performance Targets
|
|
- **Memory usage:** < 100MB for datasets up to 1M points
|
|
- **Processing time:** < 10 minutes for 1M points
|
|
- **User experience:** Progress indication and cancellation
|
|
|
|
### Monitoring Points
|
|
- Database query performance
|
|
- Memory consumption during processing
|
|
- User-reported processing times
|
|
- Track generation accuracy (no regression)
|
|
|
|
---
|
|
|
|
## Next Steps
|
|
|
|
1. **Choose initial approach** based on urgency and resources
|
|
2. **Create feature branch** for selected optimization
|
|
3. **Implement comprehensive testing** including edge cases
|
|
4. **Monitor performance** in staging environment
|
|
5. **Gradual rollout** with feature flags |