From c9861663c9ec1d7192aa8c324596ec940bb91ae1 Mon Sep 17 00:00:00 2001
From: Eugene Burmakin
Date: Sun, 5 Oct 2025 22:19:48 +0200
Subject: [PATCH] Remove a file related to import optimization plan.

---
 docs/import_optimization_plan.md | 36 --------------------------------
 1 file changed, 36 deletions(-)
 delete mode 100644 docs/import_optimization_plan.md

diff --git a/docs/import_optimization_plan.md b/docs/import_optimization_plan.md
deleted file mode 100644
index 1cc1a7c5..00000000
--- a/docs/import_optimization_plan.md
+++ /dev/null
@@ -1,36 +0,0 @@
-# Import Optimisation Plan
-
-## Goals
-- Prevent large imports from exhausting memory or hitting IO limits while reading export archives.
-- Maintain correctness and ordering guarantees for all imported entities.
-- Preserve observability and operability (clear errors and actionable logs).
-
-## Current Status
-- ✅ Replaced `File.read + JSON.parse` with streaming via `Oj::Parser(:saj).load`, so `data.json` is consumed in 16KB chunks instead of loading the whole file.
-- ✅ `Users::ImportData` now dispatches streamed payloads section-by-section, buffering `places` in in-memory batches and spilling `visits`/`points` to NDJSON for replay after dependencies are ready.
-- ✅ Points, places, and visits importers support incremental ingestion with a fixed batch size of 1,000 records and detailed progress logs.
-- ✅ Added targeted specs for the SAJ handler and streaming flow; addressed IO retry messaging.
-- ⚙️ Pending: archive-size guardrails, broader telemetry, and production rollout validation.
-
-## Remaining Pain Points
-- No preflight check yet for extreme `data.json` sizes or malformed streams.
-- Logging only (no metrics/dashboards) for monitoring batch throughput and failures.
-
-## Next Steps
-1. **Rollout & Hardening**
-   - Add size/structure validation before streaming (fail fast with an actionable error).
-   - Extend log coverage (import durations, batch counts) and document an operator playbook.
-   - Capture memory/runtime snapshots during large staged imports.
-2. **Post-Rollout Validation**
-   - Re-run the previously failing Sidekiq job (import 105) under the new pipeline.
-   - Monitor Sidekiq memory and throughput; tune batch size if needed.
-   - Gather feedback and decide on export format split or further streaming tweaks.
-
-## Validation Strategy
-- Automated: streaming parser specs, importer batch tests, service integration spec (already in place; expand as new safeguards land).
-- Manual: stage large imports, inspect Sidekiq logs/metrics once added, confirm notifications, stats, and files restored.
-
-## Open Questions
-- What thresholds should trigger preflight failures or warnings (file size, record counts)?
-- Do we need structured metrics beyond logs for long-running imports?
-- Should we pursue export format splitting or incremental resume once streaming rollout is stable?
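For reference, the streaming change the deleted plan's Current Status describes swaps a whole-file `File.read + JSON.parse` for Oj's SAJ (callback-based) parser. Below is a minimal sketch of that pattern, not the code this repo ships: the `SectionCounter` handler is hypothetical, and the callback names assume Oj's documented SAJ handler interface (`hash_start`/`hash_end`/`array_start`/`array_end`/`add_value`).

```ruby
require "oj"

# Hypothetical SAJ handler: tracks which top-level section ("places",
# "visits", "points", ...) is currently streaming and counts its records.
class SectionCounter
  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
    @section = nil
    @depth = 0 # counts enclosing hashes only
  end

  def hash_start(key)
    @depth += 1
    # A hash directly inside a top-level section array is one record.
    @counts[@section] += 1 if @depth == 2 && @section
  end

  def hash_end(key)
    @depth -= 1
  end

  def array_start(key)
    # Arrays keyed directly under the root hash name the section.
    @section = key if @depth == 1
  end

  def array_end(key)
    @section = nil if @depth == 1
  end

  def add_value(value, key); end # scalars are ignored in this sketch
end

handler = SectionCounter.new
parser = Oj::Parser.new(:saj)
parser.handler = handler
# Passing an IO lets Oj consume the file in small chunks rather than
# slurping the whole document into memory.
File.open("data.json") { |io| parser.load(io) }
handler.counts.each { |section, n| puts "#{section}: #{n} records" }
```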
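The plan's section-by-section dispatch buffers `places` in memory but spills `visits`/`points` to NDJSON for replay once their dependencies exist. A self-contained sketch of that spill-and-replay idea, using only the Ruby standard library; `NdjsonSpill` and its methods are illustrative names, not the service's actual API:

```ruby
require "json"
require "tempfile"

# Hypothetical spill buffer: streamed records are appended to a
# newline-delimited JSON (NDJSON) temp file, then replayed later.
class NdjsonSpill
  def initialize(name)
    @file = Tempfile.new([name, ".ndjson"])
  end

  def write(record)
    @file.puts(JSON.generate(record)) # one JSON document per line
  end

  def replay
    @file.rewind
    @file.each_line { |line| yield JSON.parse(line) }
  ensure
    @file.close! # close and delete the temp file
  end
end

spill = NdjsonSpill.new("visits")
spill.write({ "name" => "Home", "place_id" => 42 })
spill.replay { |visit| puts visit.inspect } # import each visit here
```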
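Incremental ingestion with the plan's fixed batch size of 1,000 records can be sketched as follows. `Point.insert_all` assumes an ActiveRecord model on Rails 6+, and `each_streamed_point` stands in for whichever streaming source feeds the importer; both are assumptions, not code from this repo.

```ruby
BATCH_SIZE = 1_000 # fixed batch size cited in the plan

# Accumulate streamed attribute hashes and flush every BATCH_SIZE
# records, so memory stays bounded regardless of data.json's size.
buffer = []

flush = lambda do
  next if buffer.empty?
  Point.insert_all(buffer) # bulk insert; skips per-record callbacks
  Rails.logger.info("imported batch of #{buffer.size} points")
  buffer.clear
end

each_streamed_point do |attrs| # hypothetical streaming source
  buffer << attrs
  flush.call if buffer.size >= BATCH_SIZE
end
flush.call # flush the final partial batch
```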
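The pending "fail fast" preflight from the plan's Next Steps might look roughly like the sketch below. The 10 GiB ceiling is an invented placeholder, since the plan itself leaves the real thresholds as an open question.

```ruby
# Hypothetical preflight guard: reject obviously oversized archives
# before any streaming begins. The limit here is a placeholder only.
MAX_ARCHIVE_BYTES = 10 * 1024**3 # 10 GiB

def preflight!(path)
  size = File.size(path)
  return if size <= MAX_ARCHIVE_BYTES

  raise ArgumentError,
        "data.json is #{size} bytes, over the #{MAX_ARCHIVE_BYTES}-byte " \
        "import limit; split the export before retrying"
end
```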