dawarich/app/services/points/raw_data/archiver.rb
Evgenii Burmakin c8242ce902
0.36.3 (#2013)
* fix: move foreman to global gems to fix startup crash (#1971)

* Update exporting code to stream points data to file in batches to red… (#1980)

* Update exporting code to stream points data to file in batches to reduce memory usage

* Update changelog

* Update changelog

* Feature/maplibre frontend (#1953)

* Add a plan to use MapLibre GL JS for the frontend map rendering, replacing Leaflet

* Implement phase 1

* Phases 1-3 + part of 4

* Fix e2e tests

* Phase 6

* Implement fog of war

* Phase 7

* Next step: fix specs, phase 7 done

* Use our own map tiles

* Extract v2 map logic to separate manager classes

* Update settings panel on v2 map

* Update v2 e2e tests structure

* Reimplement location search in maps v2

* Update speed routes

* Implement visits and places creation in v2

* Fix last failing test

* Implement visits merging

* Fix a routes e2e test and simplify the routes layer styling.

* Extract js to modules from maps_v2_controller.js

* Implement area creation

* Fix spec problem

* Fix some e2e tests

* Implement live mode in v2 map

* Update icons and panel

* Extract some styles

* Remove unused file

* Start adding dark theme to popups on MapLibre maps

* Make popups respect dark theme

* Move v2 maps to maplibre namespace

* Update v2 references to maplibre

* Put place, area and visit info into side panel

* Update API to use safe settings config method

* Fix specs

* Fix method name to config in SafeSettings and update usages accordingly

* Add missing public files

* Add handling for real time points

* Fix remembering enabled/disabled layers of the v2 map

* Fix lots of e2e tests

* Add settings to select map version

* Use maps/v2 as main path for MapLibre maps

* Update routing

* Update live mode

* Update maplibre controller

* Update changelog

* Remove some console.log statements

* Pull only necessary data for map v2 points

* Feature/raw data archive (#2009)

* 0.36.2 (#2007)

* fix: move foreman to global gems to fix startup crash (#1971)

* Update exporting code to stream points data to file in batches to red… (#1980)

* Update exporting code to stream points data to file in batches to reduce memory usage

* Update changelog

* Update changelog

* Feature/maplibre frontend (#1953)

* Add a plan to use MapLibre GL JS for the frontend map rendering, replacing Leaflet

* Implement phase 1

* Phases 1-3 + part of 4

* Fix e2e tests

* Phase 6

* Implement fog of war

* Phase 7

* Next step: fix specs, phase 7 done

* Use our own map tiles

* Extract v2 map logic to separate manager classes

* Update settings panel on v2 map

* Update v2 e2e tests structure

* Reimplement location search in maps v2

* Update speed routes

* Implement visits and places creation in v2

* Fix last failing test

* Implement visits merging

* Fix a routes e2e test and simplify the routes layer styling.

* Extract js to modules from maps_v2_controller.js

* Implement area creation

* Fix spec problem

* Fix some e2e tests

* Implement live mode in v2 map

* Update icons and panel

* Extract some styles

* Remove unused file

* Start adding dark theme to popups on MapLibre maps

* Make popups respect dark theme

* Move v2 maps to maplibre namespace

* Update v2 references to maplibre

* Put place, area and visit info into side panel

* Update API to use safe settings config method

* Fix specs

* Fix method name to config in SafeSettings and update usages accordingly

* Add missing public files

* Add handling for real time points

* Fix remembering enabled/disabled layers of the v2 map

* Fix lots of e2e tests

* Add settings to select map version

* Use maps/v2 as main path for MapLibre maps

* Update routing

* Update live mode

* Update maplibre controller

* Update changelog

* Remove some console.log statements

---------

Co-authored-by: Robin Tuszik <mail@robin.gg>

* Remove esbuild scripts from package.json

* Remove sideEffects field from package.json

* Raw data archivation

* Add tests

* Fix tests

* Fix tests

* Update ExceptionReporter

* Add schedule to run raw data archival job monthly

* Change file structure for raw data archival feature

* Update changelog and version for raw data archival feature

---------

Co-authored-by: Robin Tuszik <mail@robin.gg>

* Set raw_data to an empty hash instead of nil when archiving

* Fix storage configuration and file extraction

* Consider MIN_MINUTES_SPENT_IN_CITY during stats calculation (#2018)

* Consider MIN_MINUTES_SPENT_IN_CITY during stats calculation

* Remove raw data from visited cities api endpoint

* Use user timezone to show dates on maps (#2020)

* Fix/pre epoch time (#2019)

* Use user timezone to show dates on maps

* Limit timestamps to valid range to prevent database errors when users enter pre-epoch dates.

* Limit timestamps to valid range to prevent database errors when users enter pre-epoch dates.

* Fix tests failing due to new index on stats table

* Fix failing specs

* Update redis client configuration to support unix socket connection

* Update changelog

* Fix kml kmz import issues (#2023)

* Fix kml kmz import issues

* Refactor KML importer to improve readability and maintainability

* Implement moving points in map v2 and fix route rendering logic to ma… (#2027)

* Implement moving points in map v2 and fix route rendering logic to match map v1.

* Fix route spec

* fix(maplibre): update date format to ISO 8601 (#2029)

* Add verification step to raw data archival process (#2028)

* Add verification step to raw data archival process

* Add actual verification of raw data archives after creation, and only clear raw_data for verified archives.

* Fix failing specs

* Eliminate zip-bomb risk

* Fix potential memory leak in js

* Return .keep files

* Use Toast instead of alert for notifications

* Add help section to navbar dropdown

* Update changelog

* Remove raw_data_archival_job

* Ensure file is being closed properly after reading in Archivable concern

---------

Co-authored-by: Robin Tuszik <mail@robin.gg>
2025-12-14 12:05:59 +01:00

184 lines
6 KiB
Ruby

# frozen_string_literal: true
module Points
module RawData
class Archiver
SAFE_ARCHIVE_LAG = 2.months
def initialize
@stats = { processed: 0, archived: 0, failed: 0 }
end
def call
unless archival_enabled?
Rails.logger.info('Raw data archival disabled (ARCHIVE_RAW_DATA != "true")')
return @stats
end
Rails.logger.info('Starting points raw_data archival...')
archivable_months.each do |month_data|
process_month(month_data)
end
Rails.logger.info("Archival complete: #{@stats}")
@stats
end
def archive_specific_month(user_id, year, month)
month_data = {
'user_id' => user_id,
'year' => year,
'month' => month
}
process_month(month_data)
end
private
def archival_enabled?
ENV['ARCHIVE_RAW_DATA'] == 'true'
end
def archivable_months
# Only months 2+ months old with unarchived points
safe_cutoff = Date.current.beginning_of_month - SAFE_ARCHIVE_LAG
# Use raw SQL to avoid GROUP BY issues with ActiveRecord
# Use AT TIME ZONE 'UTC' to ensure consistent timezone handling
sql = <<-SQL.squish
SELECT user_id,
EXTRACT(YEAR FROM (to_timestamp(timestamp) AT TIME ZONE 'UTC'))::int as year,
EXTRACT(MONTH FROM (to_timestamp(timestamp) AT TIME ZONE 'UTC'))::int as month,
COUNT(*) as unarchived_count
FROM points
WHERE raw_data_archived = false
AND raw_data IS NOT NULL
AND raw_data != '{}'
AND to_timestamp(timestamp) < ?
GROUP BY user_id,
EXTRACT(YEAR FROM (to_timestamp(timestamp) AT TIME ZONE 'UTC')),
EXTRACT(MONTH FROM (to_timestamp(timestamp) AT TIME ZONE 'UTC'))
SQL
ActiveRecord::Base.connection.exec_query(
ActiveRecord::Base.sanitize_sql_array([sql, safe_cutoff])
)
end
def process_month(month_data)
user_id = month_data['user_id']
year = month_data['year']
month = month_data['month']
lock_key = "archive_points:#{user_id}:#{year}:#{month}"
# Advisory lock prevents duplicate processing
# Returns false if lock couldn't be acquired (already locked)
lock_acquired = ActiveRecord::Base.with_advisory_lock(lock_key, timeout_seconds: 0) do
archive_month(user_id, year, month)
@stats[:processed] += 1
true
end
Rails.logger.info("Skipping #{lock_key} - already locked") unless lock_acquired
rescue StandardError => e
ExceptionReporter.call(e, "Failed to archive points for user #{user_id}, #{year}-#{month}")
@stats[:failed] += 1
end
def archive_month(user_id, year, month)
points = find_archivable_points(user_id, year, month)
return if points.empty?
point_ids = points.pluck(:id)
log_archival_start(user_id, year, month, point_ids.count)
archive = create_archive_chunk(user_id, year, month, points, point_ids)
mark_points_as_archived(point_ids, archive.id)
update_stats(point_ids.count)
log_archival_success(archive)
end
def find_archivable_points(user_id, year, month)
timestamp_range = month_timestamp_range(year, month)
Point.where(user_id: user_id, raw_data_archived: false)
.where(timestamp: timestamp_range)
.where.not(raw_data: nil)
.where.not(raw_data: '{}')
end
def month_timestamp_range(year, month)
start_of_month = Time.utc(year, month, 1).to_i
end_of_month = (Time.utc(year, month, 1) + 1.month).to_i
start_of_month...end_of_month
end
def mark_points_as_archived(point_ids, archive_id)
Point.transaction do
Point.where(id: point_ids).update_all(
raw_data_archived: true,
raw_data_archive_id: archive_id
)
end
end
def update_stats(archived_count)
@stats[:archived] += archived_count
end
def log_archival_start(user_id, year, month, count)
Rails.logger.info("Archiving #{count} points for user #{user_id}, #{year}-#{format('%02d', month)}")
end
def log_archival_success(archive)
Rails.logger.info("✓ Archived chunk #{archive.chunk_number} (#{archive.size_mb} MB)")
end
def create_archive_chunk(user_id, year, month, points, point_ids)
# Determine chunk number (append-only)
chunk_number = Points::RawDataArchive
.where(user_id: user_id, year: year, month: month)
.maximum(:chunk_number).to_i + 1
# Compress points data
compressed_data = Points::RawData::ChunkCompressor.new(points).compress
# Create archive record
archive = Points::RawDataArchive.create!(
user_id: user_id,
year: year,
month: month,
chunk_number: chunk_number,
point_count: point_ids.count,
point_ids_checksum: calculate_checksum(point_ids),
archived_at: Time.current,
metadata: {
format_version: 1,
compression: 'gzip',
archived_by: 'Points::RawData::Archiver'
}
)
# Attach compressed file via ActiveStorage
# Uses directory structure: raw_data_archives/:user_id/:year/:month/:chunk.jsonl.gz
# The key parameter controls the actual storage path
archive.file.attach(
io: StringIO.new(compressed_data),
filename: "#{format('%03d', chunk_number)}.jsonl.gz",
content_type: 'application/gzip',
key: "raw_data_archives/#{user_id}/#{year}/#{format('%02d', month)}/#{format('%03d', chunk_number)}.jsonl.gz"
)
archive
end
def calculate_checksum(point_ids)
Digest::SHA256.hexdigest(point_ids.sort.join(','))
end
end
end
end