Feature/raw data archive (#2009)

* 0.36.2 (#2007)

* fix: move foreman to global gems to fix startup crash (#1971)

* Update exporting code to stream points data to file in batches to reduce memory usage (#1980)

* Update exporting code to stream points data to file in batches to reduce memory usage

* Update changelog

* Update changelog

* Feature/maplibre frontend (#1953)

* Add a plan to use MapLibre GL JS for the frontend map rendering, replacing Leaflet

* Implement phase 1

* Phases 1-3 + part of 4

* Fix e2e tests

* Phase 6

* Implement fog of war

* Phase 7

* Next step: fix specs, phase 7 done

* Use our own map tiles

* Extract v2 map logic to separate manager classes

* Update settings panel on v2 map

* Update v2 e2e tests structure

* Reimplement location search in maps v2

* Update speed routes

* Implement visits and places creation in v2

* Fix last failing test

* Implement visits merging

* Fix a routes e2e test and simplify the routes layer styling

* Extract js to modules from maps_v2_controller.js

* Implement area creation

* Fix spec problem

* Fix some e2e tests

* Implement live mode in v2 map

* Update icons and panel

* Extract some styles

* Remove unused file

* Start adding dark theme to popups on MapLibre maps

* Make popups respect dark theme

* Move v2 maps to maplibre namespace

* Update v2 references to maplibre

* Put place, area and visit info into side panel

* Update API to use safe settings config method

* Fix specs

* Fix method name to config in SafeSettings and update usages accordingly

* Add missing public files

* Add handling for real time points

* Fix remembering enabled/disabled layers of the v2 map

* Fix lots of e2e tests

* Add settings to select map version

* Use maps/v2 as main path for MapLibre maps

* Update routing

* Update live mode

* Update maplibre controller

* Update changelog

* Remove some console.log statements

---------

Co-authored-by: Robin Tuszik <mail@robin.gg>

* Remove esbuild scripts from package.json

* Remove sideEffects field from package.json

* Raw data archival

* Add tests

* Fix tests

* Fix tests

* Update ExceptionReporter

* Add schedule to run raw data archival job monthly

* Change file structure for raw data archival feature

* Update changelog and version for raw data archival feature

---------

Co-authored-by: Robin Tuszik <mail@robin.gg>
Evgenii Burmakin 2025-12-07 14:33:23 +01:00 committed by GitHub
parent 2debcd88fa
commit 9661e8e7f7
35 changed files with 1680 additions and 18 deletions

@@ -1 +1 @@
-0.36.2
+0.36.3

@@ -4,6 +4,12 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](http://keepachangelog.com/)
and this project adheres to [Semantic Versioning](http://semver.org/).
+# [0.36.3] - Unreleased
+
+## Added
+
+- Setting the `ARCHIVE_RAW_DATA` env var to `true` enables monthly raw data archiving for all users. The archiver looks for points older than 2 months whose `raw_data` column is not empty and writes their raw data into gzip-compressed JSONL archives, one or more chunks per month. After successful archiving, raw data is removed from the database to save space. The archival job runs monthly, on the 1st at 2:00 AM. The env var defaults to `false`.
+
# [0.36.2] - 2025-12-06

@@ -14,6 +14,7 @@ gem 'bootsnap', require: false
gem 'chartkick'
gem 'data_migrate'
gem 'devise'
+gem 'foreman'
gem 'geocoder', github: 'Freika/geocoder', branch: 'master'
gem 'gpx'
gem 'groupdate'
@@ -55,7 +56,7 @@ gem 'stimulus-rails'
gem 'tailwindcss-rails', '= 3.3.2'
gem 'turbo-rails', '>= 2.0.17'
gem 'tzinfo-data', platforms: %i[mingw mswin x64_mingw jruby]
-gem 'foreman'
+gem 'with_advisory_lock'
group :development, :test, :staging do
gem 'brakeman', require: false

@@ -620,6 +620,9 @@ GEM
base64
websocket-extensions (>= 0.1.0)
websocket-extensions (0.1.5)
+with_advisory_lock (7.0.2)
+  activerecord (>= 7.2)
+  zeitwerk (>= 2.7)
xpath (3.2.0)
nokogiri (~> 1.8)
zeitwerk (2.7.3)
@@ -703,6 +706,7 @@ DEPENDENCIES
turbo-rails (>= 2.0.17)
tzinfo-data
webmock
+with_advisory_lock
RUBY VERSION
ruby 3.4.6p54

@@ -4,6 +4,8 @@ class Family::Invitations::CleanupJob < ApplicationJob
queue_as :families
def perform
return unless DawarichSettings.family_feature_enabled?
+Rails.logger.info 'Starting family invitations cleanup'
expired_count = Family::Invitation.where(status: :pending)

@@ -0,0 +1,21 @@
# frozen_string_literal: true
module Points
module RawData
class ArchiveJob < ApplicationJob
queue_as :archival
def perform
return unless ENV['ARCHIVE_RAW_DATA'] == 'true'
stats = Points::RawData::Archiver.new.call
Rails.logger.info("Archive job complete: #{stats}")
rescue StandardError => e
ExceptionReporter.call(e, 'Points raw data archival job failed')
raise
end
end
end
end
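
For an ad-hoc run outside the monthly schedule, the job can be exercised from a Rails console. A minimal sketch, assuming a worker is processing the archival queue for the perform_later case:

# Hypothetical console session; the job is a no-op unless the env var is set.
ENV['ARCHIVE_RAW_DATA'] = 'true'

Points::RawData::ArchiveJob.perform_now   # runs inline and logs the archiver stats
Points::RawData::ArchiveJob.perform_later # enqueues onto the :archival queue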

@@ -0,0 +1,19 @@
# frozen_string_literal: true
module Points
module RawData
class ReArchiveMonthJob < ApplicationJob
queue_as :archival
def perform(user_id, year, month)
Rails.logger.info("Re-archiving #{user_id}/#{year}/#{month} (retrospective import)")
Points::RawData::Archiver.new.archive_specific_month(user_id, year, month)
rescue StandardError => e
ExceptionReporter.call(e, "Re-archival job failed for #{user_id}/#{year}/#{month}")
raise
end
end
end
end
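
When points are imported retrospectively into a month that was already archived, the append-only chunk design means re-archival simply writes the next chunk rather than rewriting existing ones. A sketch with hypothetical values:

# Re-archive June 2024 for user 123 (hypothetical IDs)
Points::RawData::ReArchiveMonthJob.perform_later(123, 2024, 6)

# Or synchronously, bypassing the queue:
Points::RawData::Archiver.new.archive_specific_month(123, 2024, 6)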

@@ -0,0 +1,81 @@
# frozen_string_literal: true
module Archivable
extend ActiveSupport::Concern
included do
belongs_to :raw_data_archive,
class_name: 'Points::RawDataArchive',
optional: true
scope :archived, -> { where(raw_data_archived: true) }
scope :not_archived, -> { where(raw_data_archived: false) }
scope :with_archived_raw_data, lambda {
includes(raw_data_archive: { file_attachment: :blob })
}
end
# Main method: Get raw_data with fallback to archive
# Use this instead of point.raw_data when you need archived data
def raw_data_with_archive
return raw_data if raw_data.present? || !raw_data_archived?
fetch_archived_raw_data
end
# Restore archived data back to database column
def restore_raw_data!(value)
update!(
raw_data: value,
raw_data_archived: false,
raw_data_archive_id: nil
)
end
private
def fetch_archived_raw_data
# Check temporary restore cache first (for migrations)
cached = check_temporary_restore_cache
return cached if cached
fetch_from_archive_file
rescue StandardError => e
handle_archive_fetch_error(e)
end
def check_temporary_restore_cache
return nil unless respond_to?(:timestamp)
recorded_time = Time.at(timestamp)
cache_key = "raw_data:temp:#{user_id}:#{recorded_time.year}:#{recorded_time.month}:#{id}"
Rails.cache.read(cache_key)
end
def fetch_from_archive_file
return {} unless raw_data_archive&.file&.attached?
# Download and search through JSONL
compressed_content = raw_data_archive.file.blob.download
io = StringIO.new(compressed_content)
gz = Zlib::GzipReader.new(io)
result = nil
gz.each_line do |line|
data = JSON.parse(line)
if data['id'] == id
result = data['raw_data']
break
end
end
gz.close
result || {}
end
def handle_archive_fetch_error(error)
ExceptionReporter.call(error, "Failed to fetch archived raw_data for Point ID #{id}")
{} # Graceful degradation
end
end
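
In calling code, the accessor is meant to replace direct reads of the column. A minimal sketch of the fallback chain, with a hypothetical point ID:

point = Point.find(42)      # hypothetical ID
point.raw_data              # => nil once the point's month has been archived
point.raw_data_with_archive # => the original payload, read back out of the gzipped JSONL chunk,
                            #    or {} if the point is missing from its archive (graceful degradation)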

@@ -3,6 +3,7 @@
class Point < ApplicationRecord
include Nearable
include Distanceable
+include Archivable
belongs_to :import, optional: true, counter_cache: true
belongs_to :visit, optional: true

@@ -0,0 +1,40 @@
# frozen_string_literal: true
module Points
class RawDataArchive < ApplicationRecord
self.table_name = 'points_raw_data_archives'
belongs_to :user
has_many :points, dependent: :nullify
has_one_attached :file
validates :year, :month, :chunk_number, :point_count, presence: true
validates :year, numericality: { greater_than: 1970, less_than: 2100 }
validates :month, numericality: { greater_than_or_equal_to: 1, less_than_or_equal_to: 12 }
validates :chunk_number, numericality: { greater_than: 0 }
validates :point_ids_checksum, presence: true
scope :for_month, lambda { |user_id, year, month|
where(user_id: user_id, year: year, month: month)
.order(:chunk_number)
}
scope :recent, -> { where('archived_at > ?', 30.days.ago) }
scope :old, -> { where('archived_at < ?', 1.year.ago) }
def month_display
Date.new(year, month, 1).strftime('%B %Y')
end
def filename
"raw_data_archives/#{user_id}/#{year}/#{format('%02d', month)}/#{format('%03d', chunk_number)}.jsonl.gz"
end
def size_mb
return 0 unless file.attached?
(file.blob.byte_size / 1024.0 / 1024.0).round(2)
end
end
end
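
A console sketch of inspecting a month's chunks through the model's scope and helpers, assuming `user` is a loaded User record with existing archives for June 2024:

Points::RawDataArchive.for_month(user.id, 2024, 6).each do |chunk|
  puts "#{chunk.month_display}, chunk #{chunk.chunk_number}: #{chunk.point_count} points, #{chunk.size_mb} MB"
end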

@@ -20,6 +20,7 @@ class User < ApplicationRecord # rubocop:disable Metrics/ClassLength
has_many :tags, dependent: :destroy
has_many :trips, dependent: :destroy
has_many :tracks, dependent: :destroy
+has_many :raw_data_archives, class_name: 'Points::RawDataArchive', dependent: :destroy
after_create :create_api_key
after_commit :activate, on: :create, if: -> { DawarichSettings.self_hosted? }

@@ -2,10 +2,14 @@
class ExceptionReporter
  def self.call(exception, human_message = 'Exception reported')
-    return unless DawarichSettings.self_hosted?
+    return if DawarichSettings.self_hosted?
-    Rails.logger.error "#{human_message}: #{exception.message}"
-    Sentry.capture_exception(exception)
+    if exception.is_a?(Exception)
+      Rails.logger.error "#{human_message}: #{exception.message}"
+      Sentry.capture_exception(exception)
+    else
+      Rails.logger.error "#{exception}: #{human_message}"
+      Sentry.capture_message("#{exception}: #{human_message}")
+    end
  end
end

@@ -0,0 +1,184 @@
# frozen_string_literal: true
module Points
module RawData
class Archiver
SAFE_ARCHIVE_LAG = 2.months
def initialize
@stats = { processed: 0, archived: 0, failed: 0 }
end
def call
unless archival_enabled?
Rails.logger.info('Raw data archival disabled (ARCHIVE_RAW_DATA != "true")')
return @stats
end
Rails.logger.info('Starting points raw_data archival...')
archivable_months.each do |month_data|
process_month(month_data)
end
Rails.logger.info("Archival complete: #{@stats}")
@stats
end
def archive_specific_month(user_id, year, month)
month_data = {
'user_id' => user_id,
'year' => year,
'month' => month
}
process_month(month_data)
end
private
def archival_enabled?
ENV['ARCHIVE_RAW_DATA'] == 'true'
end
def archivable_months
# Only months 2+ months old with unarchived points
safe_cutoff = Date.current.beginning_of_month - SAFE_ARCHIVE_LAG
# Use raw SQL to avoid GROUP BY issues with ActiveRecord
# Use AT TIME ZONE 'UTC' to ensure consistent timezone handling
sql = <<-SQL.squish
SELECT user_id,
EXTRACT(YEAR FROM (to_timestamp(timestamp) AT TIME ZONE 'UTC'))::int as year,
EXTRACT(MONTH FROM (to_timestamp(timestamp) AT TIME ZONE 'UTC'))::int as month,
COUNT(*) as unarchived_count
FROM points
WHERE raw_data_archived = false
AND raw_data IS NOT NULL
AND raw_data != '{}'
AND to_timestamp(timestamp) < ?
GROUP BY user_id,
EXTRACT(YEAR FROM (to_timestamp(timestamp) AT TIME ZONE 'UTC')),
EXTRACT(MONTH FROM (to_timestamp(timestamp) AT TIME ZONE 'UTC'))
SQL
ActiveRecord::Base.connection.exec_query(
ActiveRecord::Base.sanitize_sql_array([sql, safe_cutoff])
)
end
def process_month(month_data)
user_id = month_data['user_id']
year = month_data['year']
month = month_data['month']
lock_key = "archive_points:#{user_id}:#{year}:#{month}"
# Advisory lock prevents duplicate processing
# Returns false if lock couldn't be acquired (already locked)
lock_acquired = ActiveRecord::Base.with_advisory_lock(lock_key, timeout_seconds: 0) do
archive_month(user_id, year, month)
@stats[:processed] += 1
true
end
Rails.logger.info("Skipping #{lock_key} - already locked") unless lock_acquired
rescue StandardError => e
ExceptionReporter.call(e, "Failed to archive points for user #{user_id}, #{year}-#{month}")
@stats[:failed] += 1
end
def archive_month(user_id, year, month)
points = find_archivable_points(user_id, year, month)
return if points.empty?
point_ids = points.pluck(:id)
log_archival_start(user_id, year, month, point_ids.count)
archive = create_archive_chunk(user_id, year, month, points, point_ids)
mark_points_as_archived(point_ids, archive.id)
update_stats(point_ids.count)
log_archival_success(archive)
end
def find_archivable_points(user_id, year, month)
timestamp_range = month_timestamp_range(year, month)
Point.where(user_id: user_id, raw_data_archived: false)
.where(timestamp: timestamp_range)
.where.not(raw_data: nil)
end
def month_timestamp_range(year, month)
start_of_month = Time.utc(year, month, 1).to_i
end_of_month = (Time.utc(year, month, 1) + 1.month).to_i
start_of_month...end_of_month
end
def mark_points_as_archived(point_ids, archive_id)
Point.transaction do
Point.where(id: point_ids).update_all(
raw_data_archived: true,
raw_data_archive_id: archive_id,
raw_data: nil
)
end
end
def update_stats(archived_count)
@stats[:archived] += archived_count
end
def log_archival_start(user_id, year, month, count)
Rails.logger.info("Archiving #{count} points for user #{user_id}, #{year}-#{format('%02d', month)}")
end
def log_archival_success(archive)
Rails.logger.info("✓ Archived chunk #{archive.chunk_number} (#{archive.size_mb} MB)")
end
def create_archive_chunk(user_id, year, month, points, point_ids)
# Determine chunk number (append-only)
chunk_number = Points::RawDataArchive
.where(user_id: user_id, year: year, month: month)
.maximum(:chunk_number).to_i + 1
# Compress points data
compressed_data = Points::RawData::ChunkCompressor.new(points).compress
# Create archive record
archive = Points::RawDataArchive.create!(
user_id: user_id,
year: year,
month: month,
chunk_number: chunk_number,
point_count: point_ids.count,
point_ids_checksum: calculate_checksum(point_ids),
archived_at: Time.current,
metadata: {
format_version: 1,
compression: 'gzip',
archived_by: 'Points::RawData::Archiver'
}
)
# Attach compressed file via ActiveStorage
# Uses directory structure: raw_data_archives/:user_id/:year/:month/:chunk.jsonl.gz
# The key parameter controls the actual storage path
archive.file.attach(
io: StringIO.new(compressed_data),
filename: "#{format('%03d', chunk_number)}.jsonl.gz",
content_type: 'application/gzip',
key: "raw_data_archives/#{user_id}/#{year}/#{format('%02d', month)}/#{format('%03d', chunk_number)}.jsonl.gz"
)
archive
end
def calculate_checksum(point_ids)
Digest::SHA256.hexdigest(point_ids.sort.join(','))
end
end
end
end

@@ -0,0 +1,25 @@
# frozen_string_literal: true
module Points
module RawData
class ChunkCompressor
def initialize(points_relation)
@points = points_relation
end
def compress
io = StringIO.new
gz = Zlib::GzipWriter.new(io)
# Stream points to avoid memory issues with large months
@points.select(:id, :raw_data).find_each(batch_size: 1000) do |point|
# Write as JSONL (one JSON object per line)
gz.puts({ id: point.id, raw_data: point.raw_data }.to_json)
end
gz.close
io.string.force_encoding(Encoding::ASCII_8BIT) # Returns compressed bytes in binary encoding
end
end
end
end
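
The compressor's output can be round-tripped with plain Zlib to check the JSONL layout. A minimal sketch; the relation passed in is arbitrary:

require 'zlib'
require 'stringio'

compressed = Points::RawData::ChunkCompressor.new(Point.not_archived.limit(3)).compress

Zlib::GzipReader.new(StringIO.new(compressed)).each_line do |line|
  row = JSON.parse(line)
  puts "point #{row['id']}: #{row['raw_data'].inspect}"
end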

@@ -0,0 +1,105 @@
# frozen_string_literal: true
module Points
module RawData
class Restorer
def restore_to_database(user_id, year, month)
archives = Points::RawDataArchive.for_month(user_id, year, month)
raise "No archives found for user #{user_id}, #{year}-#{month}" if archives.empty?
Rails.logger.info("Restoring #{archives.count} archives to database...")
Point.transaction do
archives.each { restore_archive_to_db(_1) }
end
Rails.logger.info("✓ Restored #{archives.sum(:point_count)} points")
end
def restore_to_memory(user_id, year, month)
archives = Points::RawDataArchive.for_month(user_id, year, month)
raise "No archives found for user #{user_id}, #{year}-#{month}" if archives.empty?
Rails.logger.info("Loading #{archives.count} archives into cache...")
cache_key_prefix = "raw_data:temp:#{user_id}:#{year}:#{month}"
count = 0
archives.each do |archive|
count += restore_archive_to_cache(archive, cache_key_prefix)
end
Rails.logger.info("✓ Loaded #{count} points into cache (expires in 1 hour)")
end
def restore_all_for_user(user_id)
archives =
Points::RawDataArchive.where(user_id: user_id)
.select(:year, :month)
.distinct
.order(:year, :month)
Rails.logger.info("Restoring #{archives.count} months for user #{user_id}...")
archives.each do |archive|
restore_to_database(user_id, archive.year, archive.month)
end
Rails.logger.info('✓ Complete user restore finished')
end
private
def restore_archive_to_db(archive)
decompressed = download_and_decompress(archive)
decompressed.each_line do |line|
data = JSON.parse(line)
Point.where(id: data['id']).update_all(
raw_data: data['raw_data'],
raw_data_archived: false,
raw_data_archive_id: nil
)
end
end
def restore_archive_to_cache(archive, cache_key_prefix)
decompressed = download_and_decompress(archive)
count = 0
decompressed.each_line do |line|
data = JSON.parse(line)
Rails.cache.write(
"#{cache_key_prefix}:#{data['id']}",
data['raw_data'],
expires_in: 1.hour
)
count += 1
end
count
end
def download_and_decompress(archive)
# Download via ActiveStorage
compressed_content = archive.file.blob.download
# Decompress
io = StringIO.new(compressed_content)
gz = Zlib::GzipReader.new(io)
content = gz.read
gz.close
content
rescue StandardError => e
Rails.logger.error("Failed to download/decompress archive #{archive.id}: #{e.message}")
raise
end
end
end
end
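
The temporary restore pairs with the Archivable concern's private cache lookup: once a month is loaded, archived points read from the cache transparently instead of downloading blobs. A sketch, assuming `user` is a loaded User record with archives for June 2024:

Points::RawData::Restorer.new.restore_to_memory(user.id, 2024, 6)

point = Point.where(user_id: user.id).archived.first # a point from the restored month
point.raw_data_with_archive # served from the 1-hour cache entry, no blob download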

@@ -62,3 +62,6 @@ OIDC_AUTO_REGISTER = ENV.fetch('OIDC_AUTO_REGISTER', 'true') == 'true'
# Email/password registration setting (default: false for self-hosted, true for cloud)
ALLOW_EMAIL_PASSWORD_REGISTRATION = ENV.fetch('ALLOW_EMAIL_PASSWORD_REGISTRATION', 'false') == 'true'
+# Raw data archival setting
+ARCHIVE_RAW_DATA = ENV.fetch('ARCHIVE_RAW_DATA', 'false') == 'true'

@@ -49,5 +49,9 @@ class DawarichSettings
family: family_feature_enabled?
}
end
+def archive_raw_data_enabled?
+  @archive_raw_data_enabled ||= ARCHIVE_RAW_DATA
+end
end
end

@@ -49,3 +49,8 @@ nightly_family_invitations_cleanup_job:
cron: "30 2 * * *" # every day at 02:30
class: "Family::Invitations::CleanupJob"
queue: family
+raw_data_archival_job:
+  cron: "0 2 1 * *" # Monthly on the 1st at 2 AM
+  class: "Points::RawData::ArchiveJob"
+  queue: archival

@@ -16,3 +16,4 @@
- places
- app_version_checking
- cache
+- archival

@@ -0,0 +1,23 @@
# frozen_string_literal: true
class CreatePointsRawDataArchives < ActiveRecord::Migration[8.0]
def change
create_table :points_raw_data_archives do |t|
t.bigint :user_id, null: false
t.integer :year, null: false
t.integer :month, null: false
t.integer :chunk_number, null: false, default: 1
t.integer :point_count, null: false
t.string :point_ids_checksum, null: false
t.jsonb :metadata, default: {}, null: false
t.datetime :archived_at, null: false
t.timestamps
end
add_index :points_raw_data_archives, :user_id
add_index :points_raw_data_archives, [:user_id, :year, :month]
add_index :points_raw_data_archives, :archived_at
add_foreign_key :points_raw_data_archives, :users, validate: false
end
end

@@ -0,0 +1,22 @@
# frozen_string_literal: true
class AddArchivalColumnsToPoints < ActiveRecord::Migration[8.0]
disable_ddl_transaction!
def change
add_column :points, :raw_data_archived, :boolean, default: false, null: false
add_column :points, :raw_data_archive_id, :bigint, null: true
add_index :points, :raw_data_archived,
where: 'raw_data_archived = true',
name: 'index_points_on_archived_true',
algorithm: :concurrently
add_index :points, :raw_data_archive_id,
algorithm: :concurrently
add_foreign_key :points, :points_raw_data_archives,
column: :raw_data_archive_id,
on_delete: :nullify, # Don't delete points if archive deleted
validate: false
end
end

@@ -0,0 +1,8 @@
# frozen_string_literal: true
class ValidateArchivalForeignKeys < ActiveRecord::Migration[8.0]
def change
validate_foreign_key :points_raw_data_archives, :users
validate_foreign_key :points, :points_raw_data_archives
end
end

db/schema.rb

@@ -10,7 +10,7 @@
#
# It's strongly recommended that you check this file into your version control system.
-ActiveRecord::Schema[8.0].define(version: 2025_12_01_192510) do
+ActiveRecord::Schema[8.0].define(version: 2025_12_06_000004) do
# These are extensions that must be enabled in order to support this database
enable_extension "pg_catalog.plpgsql"
enable_extension "postgis"
@@ -224,6 +224,10 @@ ActiveRecord::Schema[8.0].define(version: 2025_12_01_192510) do
t.bigint "country_id"
t.bigint "track_id"
t.string "country_name"
+t.boolean "raw_data_archived", default: false, null: false
+t.bigint "raw_data_archive_id"
+t.integer "timestamp_year"
+t.integer "timestamp_month"
t.index ["altitude"], name: "index_points_on_altitude"
t.index ["battery"], name: "index_points_on_battery"
t.index ["battery_status"], name: "index_points_on_battery_status"
@@ -238,6 +242,8 @@ ActiveRecord::Schema[8.0].define(version: 2025_12_01_192510) do
t.index ["latitude", "longitude"], name: "index_points_on_latitude_and_longitude"
t.index ["lonlat", "timestamp", "user_id"], name: "index_points_on_lonlat_timestamp_user_id", unique: true
t.index ["lonlat"], name: "index_points_on_lonlat", using: :gist
t.index ["raw_data_archive_id"], name: "index_points_on_raw_data_archive_id"
t.index ["raw_data_archived"], name: "index_points_on_archived_true", where: "(raw_data_archived = true)"
t.index ["reverse_geocoded_at"], name: "index_points_on_reverse_geocoded_at"
t.index ["timestamp"], name: "index_points_on_timestamp"
t.index ["track_id"], name: "index_points_on_track_id"
@@ -245,10 +251,27 @@ ActiveRecord::Schema[8.0].define(version: 2025_12_01_192510) do
t.index ["user_id", "country_name"], name: "idx_points_user_country_name"
t.index ["user_id", "reverse_geocoded_at"], name: "index_points_on_user_id_and_reverse_geocoded_at", where: "(reverse_geocoded_at IS NOT NULL)"
t.index ["user_id", "timestamp", "track_id"], name: "idx_points_track_generation"
t.index ["user_id", "timestamp_year", "timestamp_month", "raw_data_archived"], name: "index_points_on_user_time_archived"
t.index ["user_id"], name: "index_points_on_user_id"
t.index ["visit_id"], name: "index_points_on_visit_id"
end
create_table "points_raw_data_archives", force: :cascade do |t|
t.bigint "user_id", null: false
t.integer "year", null: false
t.integer "month", null: false
t.integer "chunk_number", default: 1, null: false
t.integer "point_count", null: false
t.string "point_ids_checksum", null: false
t.jsonb "metadata", default: {}, null: false
t.datetime "archived_at", null: false
t.datetime "created_at", null: false
t.datetime "updated_at", null: false
t.index ["archived_at"], name: "index_points_raw_data_archives_on_archived_at"
t.index ["user_id", "year", "month"], name: "index_points_raw_data_archives_on_user_id_and_year_and_month"
t.index ["user_id"], name: "index_points_raw_data_archives_on_user_id"
end
create_table "stats", force: :cascade do |t|
t.integer "year", null: false
t.integer "month", null: false
@@ -384,8 +407,10 @@ ActiveRecord::Schema[8.0].define(version: 2025_12_01_192510) do
add_foreign_key "notifications", "users"
add_foreign_key "place_visits", "places"
add_foreign_key "place_visits", "visits"
add_foreign_key "points", "points_raw_data_archives", column: "raw_data_archive_id", on_delete: :nullify
add_foreign_key "points", "users"
add_foreign_key "points", "visits"
add_foreign_key "points_raw_data_archives", "users"
add_foreign_key "stats", "users"
add_foreign_key "taggings", "tags"
add_foreign_key "tags", "users"

@@ -0,0 +1,243 @@
# frozen_string_literal: true
namespace :points do
namespace :raw_data do
desc 'Restore raw_data from archive to database for a specific month'
task :restore, [:user_id, :year, :month] => :environment do |_t, args|
validate_args!(args)
user_id = args[:user_id].to_i
year = args[:year].to_i
month = args[:month].to_i
puts '━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━'
puts ' Restoring raw_data to DATABASE'
puts " User: #{user_id} | Month: #{year}-#{format('%02d', month)}"
puts '━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━'
puts ''
restorer = Points::RawData::Restorer.new
restorer.restore_to_database(user_id, year, month)
puts ''
puts '✓ Restoration complete!'
puts ''
puts "Points in #{year}-#{month} now have raw_data in database."
puts 'Run VACUUM ANALYZE points; to update statistics.'
end
desc 'Restore raw_data to memory/cache temporarily (for data migrations)'
task :restore_temporary, [:user_id, :year, :month] => :environment do |_t, args|
validate_args!(args)
user_id = args[:user_id].to_i
year = args[:year].to_i
month = args[:month].to_i
puts '━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━'
puts ' Loading raw_data into CACHE (temporary)'
puts " User: #{user_id} | Month: #{year}-#{format('%02d', month)}"
puts '━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━'
puts ''
puts 'Data will be available for 1 hour via Point.raw_data_with_archive accessor'
puts ''
restorer = Points::RawData::Restorer.new
restorer.restore_to_memory(user_id, year, month)
puts ''
puts '✓ Cache loaded successfully!'
puts ''
puts 'You can now run your data migration.'
puts 'Example:'
puts " rails runner \"Point.where(user_id: #{user_id}, timestamp_year: #{year}, timestamp_month: #{month}).find_each { |p| p.fix_coordinates_from_raw_data }\""
puts ''
puts 'Cache will expire in 1 hour automatically.'
end
desc 'Restore all archived raw_data for a user'
task :restore_all, [:user_id] => :environment do |_t, args|
raise 'Usage: rake points:raw_data:restore_all[user_id]' unless args[:user_id]
user_id = args[:user_id].to_i
user = User.find(user_id)
puts '━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━'
puts ' Restoring ALL archives for user'
puts " #{user.email} (ID: #{user_id})"
puts '━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━'
puts ''
archives = Points::RawDataArchive.where(user_id: user_id)
.select(:year, :month)
.distinct
.order(:year, :month)
puts "Found #{archives.count} months to restore"
puts ''
archives.each_with_index do |archive, idx|
puts "[#{idx + 1}/#{archives.count}] Restoring #{archive.year}-#{format('%02d', archive.month)}..."
restorer = Points::RawData::Restorer.new
restorer.restore_to_database(user_id, archive.year, archive.month)
end
puts ''
puts "✓ All archives restored for user #{user_id}!"
end
desc 'Show archive statistics'
task status: :environment do
puts '━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━'
puts ' Points raw_data Archive Statistics'
puts '━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━'
puts ''
total_archives = Points::RawDataArchive.count
total_points = Point.count
archived_points = Point.where(raw_data_archived: true).count
percentage = total_points.positive? ? (archived_points.to_f / total_points * 100).round(2) : 0
puts "Archives: #{total_archives}"
puts "Points archived: #{archived_points} / #{total_points} (#{percentage}%)"
puts ''
# Storage size via ActiveStorage
total_blob_size = ActiveStorage::Blob
.joins('INNER JOIN active_storage_attachments ON active_storage_attachments.blob_id = active_storage_blobs.id')
.where("active_storage_attachments.record_type = 'Points::RawDataArchive'")
.sum(:byte_size)
puts "Storage used: #{ActiveSupport::NumberHelper.number_to_human_size(total_blob_size)}"
puts ''
# Recent activity
recent = Points::RawDataArchive.where('archived_at > ?', 7.days.ago).count
puts "Archives created last 7 days: #{recent}"
puts ''
# Top users
puts 'Top 10 users by archive count:'
puts '─────────────────────────────────────────────────'
Points::RawDataArchive.group(:user_id)
.select('user_id, COUNT(*) as archive_count, SUM(point_count) as total_points')
.order('archive_count DESC')
.limit(10)
.each_with_index do |stat, idx|
user = User.find(stat.user_id)
puts "#{idx + 1}. #{user.email.ljust(30)} #{stat.archive_count.to_s.rjust(3)} archives, #{stat.total_points.to_s.rjust(8)} points"
end
puts ''
end
desc 'Verify archive integrity for a month'
task :verify, [:user_id, :year, :month] => :environment do |_t, args|
validate_args!(args)
user_id = args[:user_id].to_i
year = args[:year].to_i
month = args[:month].to_i
puts '━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━'
puts ' Verifying Archives'
puts " User: #{user_id} | Month: #{year}-#{format('%02d', month)}"
puts '━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━'
puts ''
archives = Points::RawDataArchive.for_month(user_id, year, month)
if archives.empty?
puts 'No archives found.'
exit
end
all_ok = true
archives.each do |archive|
print "Chunk #{archive.chunk_number}: "
# Check file attached
unless archive.file.attached?
puts '✗ ERROR - File not attached!'
all_ok = false
next
end
# Download and count
begin
compressed = archive.file.blob.download
io = StringIO.new(compressed)
gz = Zlib::GzipReader.new(io)
actual_count = 0
gz.each_line { actual_count += 1 }
gz.close
if actual_count == archive.point_count
puts "✓ OK (#{actual_count} points, #{archive.size_mb} MB)"
else
puts "✗ MISMATCH - Expected #{archive.point_count}, found #{actual_count}"
all_ok = false
end
rescue StandardError => e
puts "✗ ERROR - #{e.message}"
all_ok = false
end
end
puts ''
if all_ok
puts '✓ All archives verified successfully!'
else
puts '✗ Some archives have issues. Please investigate.'
end
end
desc 'Run initial archival for old data (safe to re-run)'
task initial_archive: :environment do
puts '━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━'
puts ' Initial Archival (2+ months old data)'
puts '━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━'
puts ''
puts 'This will archive points.raw_data for months 2+ months old.'
puts 'This is safe to run multiple times (idempotent).'
puts ''
print 'Continue? (y/N): '
response = $stdin.gets.chomp.downcase
unless response == 'y'
puts 'Cancelled.'
exit
end
puts ''
stats = Points::RawData::Archiver.new.call
puts ''
puts '━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━'
puts ' Archival Complete'
puts '━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━'
puts ''
puts "Months processed: #{stats[:processed]}"
puts "Points archived: #{stats[:archived]}"
puts "Failures: #{stats[:failed]}"
puts ''
next unless stats[:archived].positive? # use next: 'return' would raise LocalJumpError in a rake task block
puts 'Next steps:'
puts '1. Verify a sample: rake points:raw_data:verify[user_id,year,month]'
puts '2. Check stats: rake points:raw_data:status'
puts '3. (Optional) Reclaim space: VACUUM FULL points; (during maintenance)'
end
end
end
def validate_args!(args)
return if args[:user_id] && args[:year] && args[:month]
raise 'Usage: rake points:raw_data:TASK[user_id,year,month]'
end

@@ -15,13 +15,5 @@
"devDependencies": {
"@playwright/test": "^1.56.1",
"@types/node": "^24.0.13"
-},
-"scripts": {
-  "build": "esbuild app/javascript/*.* --bundle --splitting --format=esm --outdir=app/assets/builds",
-  "analyze": "esbuild app/javascript/*.* --bundle --metafile=meta.json --analyze"
-},
-"sideEffects": [
-  "*.css",
-  "maplibre-gl/dist/maplibre-gl.css"
-]
+}
}

@@ -0,0 +1,32 @@
# frozen_string_literal: true
FactoryBot.define do
factory :points_raw_data_archive, class: 'Points::RawDataArchive' do
user
year { 2024 }
month { 6 }
chunk_number { 1 }
point_count { 100 }
point_ids_checksum { Digest::SHA256.hexdigest('1,2,3') }
archived_at { Time.current }
metadata { { format_version: 1, compression: 'gzip' } }
after(:build) do |archive|
# Attach a test file
archive.file.attach(
io: StringIO.new(gzip_test_data),
filename: archive.filename,
content_type: 'application/gzip'
)
end
end
end
def gzip_test_data
io = StringIO.new
gz = Zlib::GzipWriter.new(io)
gz.puts({ id: 1, raw_data: { lon: 13.4, lat: 52.5 } }.to_json)
gz.puts({ id: 2, raw_data: { lon: 13.5, lat: 52.6 } }.to_json)
gz.close
io.string
end

File diff suppressed because one or more lines are too long

@@ -0,0 +1,46 @@
# frozen_string_literal: true
require 'rails_helper'
RSpec.describe Points::RawData::ArchiveJob, type: :job do
describe '#perform' do
let(:archiver) { instance_double(Points::RawData::Archiver) }
before do
# Enable archival for tests
allow(ENV).to receive(:[]).and_call_original
allow(ENV).to receive(:[]).with('ARCHIVE_RAW_DATA').and_return('true')
allow(Points::RawData::Archiver).to receive(:new).and_return(archiver)
allow(archiver).to receive(:call).and_return({ processed: 5, archived: 100, failed: 0 })
end
it 'calls the archiver service' do
expect(archiver).to receive(:call)
described_class.perform_now
end
context 'when archiver raises an error' do
let(:error) { StandardError.new('Archive failed') }
before do
allow(archiver).to receive(:call).and_raise(error)
end
it 're-raises the error' do
expect do
described_class.perform_now
end.to raise_error(StandardError, 'Archive failed')
end
it 'reports the error before re-raising' do
expect(ExceptionReporter).to receive(:call).with(error, 'Points raw data archival job failed')
expect do
described_class.perform_now
end.to raise_error(StandardError)
end
end
end
end

@@ -0,0 +1,34 @@
# frozen_string_literal: true
require 'rails_helper'
RSpec.describe Points::RawData::ReArchiveMonthJob, type: :job do
describe '#perform' do
let(:archiver) { instance_double(Points::RawData::Archiver) }
let(:user_id) { 123 }
let(:year) { 2024 }
let(:month) { 6 }
before do
allow(Points::RawData::Archiver).to receive(:new).and_return(archiver)
end
it 'calls archive_specific_month with correct parameters' do
expect(archiver).to receive(:archive_specific_month).with(user_id, year, month)
described_class.perform_now(user_id, year, month)
end
context 'when re-archival fails' do
before do
allow(archiver).to receive(:archive_specific_month).and_raise(StandardError, 'Re-archive failed')
end
it 're-raises the error' do
expect do
described_class.perform_now(user_id, year, month)
end.to raise_error(StandardError, 'Re-archive failed')
end
end
end
end

@@ -0,0 +1,116 @@
# frozen_string_literal: true
require 'rails_helper'
RSpec.describe Archivable, type: :model do
let(:user) { create(:user) }
let(:point) { create(:point, user: user, raw_data: { lon: 13.4, lat: 52.5 }) }
describe 'associations and scopes' do
it { expect(point).to belong_to(:raw_data_archive).optional }
describe 'scopes' do
let!(:archived_point) { create(:point, user: user, raw_data_archived: true) }
let!(:not_archived_point) { create(:point, user: user, raw_data_archived: false) }
it '.archived returns archived points' do
expect(Point.archived).to include(archived_point)
expect(Point.archived).not_to include(not_archived_point)
end
it '.not_archived returns non-archived points' do
expect(Point.not_archived).to include(not_archived_point)
expect(Point.not_archived).not_to include(archived_point)
end
end
end
describe '#raw_data_with_archive' do
context 'when raw_data is present in database' do
it 'returns raw_data from database' do
expect(point.raw_data_with_archive).to eq({ 'lon' => 13.4, 'lat' => 52.5 })
end
end
context 'when raw_data is archived' do
let(:archive) { create(:points_raw_data_archive, user: user) }
let(:archived_point) do
create(:point, user: user, raw_data: nil, raw_data_archived: true, raw_data_archive: archive)
end
before do
# Mock archive file content with this specific point
compressed_data = gzip_data([
{ id: archived_point.id, raw_data: { lon: 14.0, lat: 53.0 } }
])
allow(archive.file.blob).to receive(:download).and_return(compressed_data)
end
it 'fetches raw_data from archive' do
result = archived_point.raw_data_with_archive
expect(result).to eq({ 'id' => archived_point.id, 'raw_data' => { 'lon' => 14.0, 'lat' => 53.0 } }['raw_data'])
end
end
context 'when raw_data is archived but point not in archive' do
let(:archive) { create(:points_raw_data_archive, user: user) }
let(:archived_point) do
create(:point, user: user, raw_data: nil, raw_data_archived: true, raw_data_archive: archive)
end
before do
# Mock archive file with different point
compressed_data = gzip_data([
{ id: 999, raw_data: { lon: 14.0, lat: 53.0 } }
])
allow(archive.file.blob).to receive(:download).and_return(compressed_data)
end
it 'returns empty hash' do
expect(archived_point.raw_data_with_archive).to eq({})
end
end
end
describe '#restore_raw_data!' do
let(:archive) { create(:points_raw_data_archive, user: user) }
let(:archived_point) do
create(:point, user: user, raw_data: nil, raw_data_archived: true, raw_data_archive: archive)
end
it 'restores raw_data to database and clears archive flags' do
new_data = { lon: 15.0, lat: 54.0 }
archived_point.restore_raw_data!(new_data)
archived_point.reload
expect(archived_point.raw_data).to eq(new_data.stringify_keys)
expect(archived_point.raw_data_archived).to be false
expect(archived_point.raw_data_archive_id).to be_nil
end
end
describe 'temporary cache' do
let(:june_point) { create(:point, user: user, timestamp: Time.new(2024, 6, 15).to_i) }
it 'checks temporary restore cache with correct key format' do
cache_key = "raw_data:temp:#{user.id}:2024:6:#{june_point.id}"
cached_data = { lon: 16.0, lat: 55.0 }
Rails.cache.write(cache_key, cached_data, expires_in: 1.hour)
# Access through send since check_temporary_restore_cache is private
result = june_point.send(:check_temporary_restore_cache)
expect(result).to eq(cached_data)
end
end
def gzip_data(points_array)
io = StringIO.new
gz = Zlib::GzipWriter.new(io)
points_array.each do |point_data|
gz.puts(point_data.to_json)
end
gz.close
io.string
end
end

@@ -0,0 +1,86 @@
# frozen_string_literal: true
require 'rails_helper'
RSpec.describe Points::RawDataArchive, type: :model do
let(:user) { create(:user) }
subject(:archive) { build(:points_raw_data_archive, user: user) }
describe 'associations' do
it { is_expected.to belong_to(:user) }
it { is_expected.to have_many(:points).dependent(:nullify) }
end
describe 'validations' do
it { is_expected.to validate_presence_of(:year) }
it { is_expected.to validate_presence_of(:month) }
it { is_expected.to validate_presence_of(:chunk_number) }
it { is_expected.to validate_presence_of(:point_count) }
it { is_expected.to validate_presence_of(:point_ids_checksum) }
it { is_expected.to validate_numericality_of(:year).is_greater_than(1970).is_less_than(2100) }
it { is_expected.to validate_numericality_of(:month).is_greater_than_or_equal_to(1).is_less_than_or_equal_to(12) }
it { is_expected.to validate_numericality_of(:chunk_number).is_greater_than(0) }
end
describe 'scopes' do
let!(:recent_archive) { create(:points_raw_data_archive, user: user, year: 2024, month: 5, archived_at: 1.day.ago) }
let!(:old_archive) { create(:points_raw_data_archive, user: user, year: 2023, month: 5, archived_at: 2.years.ago) }
describe '.recent' do
it 'returns archives from last 30 days' do
expect(described_class.recent).to include(recent_archive)
expect(described_class.recent).not_to include(old_archive)
end
end
describe '.old' do
it 'returns archives older than 1 year' do
expect(described_class.old).to include(old_archive)
expect(described_class.old).not_to include(recent_archive)
end
end
describe '.for_month' do
let!(:june_archive) { create(:points_raw_data_archive, user: user, year: 2024, month: 6, chunk_number: 1) }
let!(:june_archive_2) { create(:points_raw_data_archive, user: user, year: 2024, month: 6, chunk_number: 2) }
let!(:july_archive) { create(:points_raw_data_archive, user: user, year: 2024, month: 7, chunk_number: 1) }
it 'returns archives for specific month ordered by chunk number' do
result = described_class.for_month(user.id, 2024, 6)
expect(result.map(&:chunk_number)).to eq([1, 2])
expect(result).to include(june_archive, june_archive_2)
expect(result).not_to include(july_archive)
end
end
end
describe '#month_display' do
it 'returns formatted month and year' do
archive = build(:points_raw_data_archive, year: 2024, month: 6)
expect(archive.month_display).to eq('June 2024')
end
end
describe '#filename' do
it 'generates correct filename with directory structure' do
archive = build(:points_raw_data_archive, user_id: 123, year: 2024, month: 6, chunk_number: 5)
expect(archive.filename).to eq('raw_data_archives/123/2024/06/005.jsonl.gz')
end
end
describe '#size_mb' do
it 'returns 0 when no file attached' do
archive = build(:points_raw_data_archive)
expect(archive.size_mb).to eq(0)
end
it 'returns size in MB when file is attached' do
archive = create(:points_raw_data_archive, user: user)
# Mock file with 2MB size
allow(archive.file.blob).to receive(:byte_size).and_return(2 * 1024 * 1024)
expect(archive.size_mb).to eq(2.0)
end
end
end

@@ -35,7 +35,11 @@ RSpec.describe PointSerializer do
'course_accuracy' => point.course_accuracy,
'external_track_id' => point.external_track_id,
'track_id' => point.track_id,
-'country_name' => point.read_attribute(:country_name)
+'country_name' => point.read_attribute(:country_name),
+'raw_data_archived' => point.raw_data_archived,
+'raw_data_archive_id' => point.raw_data_archive_id,
+'timestamp_year' => point.timestamp_year,
+'timestamp_month' => point.timestamp_month
}
end

@@ -0,0 +1,202 @@
# frozen_string_literal: true
require 'rails_helper'
RSpec.describe Points::RawData::Archiver do
let(:user) { create(:user) }
let(:archiver) { described_class.new }
before do
allow(PointsChannel).to receive(:broadcast_to)
end
describe '#call' do
context 'when archival is disabled' do
before do
allow(ENV).to receive(:[]).and_call_original
allow(ENV).to receive(:[]).with('ARCHIVE_RAW_DATA').and_return('false')
end
it 'returns early without processing' do
result = archiver.call
expect(result).to eq({ processed: 0, archived: 0, failed: 0 })
end
end
context 'when archival is enabled' do
before do
allow(ENV).to receive(:[]).and_call_original
allow(ENV).to receive(:[]).with('ARCHIVE_RAW_DATA').and_return('true')
end
let!(:old_points) do
# Create points 3 months ago (definitely older than 2 month lag)
old_date = 3.months.ago.beginning_of_month
create_list(:point, 5, user: user,
timestamp: old_date.to_i,
raw_data: { lon: 13.4, lat: 52.5 })
end
it 'archives old points' do
expect { archiver.call }.to change(Points::RawDataArchive, :count).by(1)
end
it 'marks points as archived' do
archiver.call
expect(Point.where(raw_data_archived: true).count).to eq(5)
end
it 'nullifies raw_data column' do
archiver.call
Point.where(user: user).find_each do |point|
expect(point.raw_data).to be_nil
end
end
it 'returns correct stats' do
result = archiver.call
expect(result[:processed]).to eq(1)
expect(result[:archived]).to eq(5)
expect(result[:failed]).to eq(0)
end
end
context 'with points from multiple months' do
before do
allow(ENV).to receive(:[]).and_call_original
allow(ENV).to receive(:[]).with('ARCHIVE_RAW_DATA').and_return('true')
end
let!(:june_points) do
june_date = 4.months.ago.beginning_of_month
create_list(:point, 3, user: user,
timestamp: june_date.to_i,
raw_data: { lon: 13.4, lat: 52.5 })
end
let!(:july_points) do
july_date = 3.months.ago.beginning_of_month
create_list(:point, 2, user: user,
timestamp: july_date.to_i,
raw_data: { lon: 14.0, lat: 53.0 })
end
it 'creates separate archives for each month' do
expect { archiver.call }.to change(Points::RawDataArchive, :count).by(2)
end
it 'archives all points' do
archiver.call
expect(Point.where(raw_data_archived: true).count).to eq(5)
end
end
end
describe '#archive_specific_month' do
before do
allow(ENV).to receive(:[]).and_call_original
allow(ENV).to receive(:[]).with('ARCHIVE_RAW_DATA').and_return('true')
end
let(:test_date) { 3.months.ago.beginning_of_month.utc }
let!(:june_points) do
create_list(:point, 3, user: user,
timestamp: test_date.to_i,
raw_data: { lon: 13.4, lat: 52.5 })
end
it 'archives specific month' do
expect do
archiver.archive_specific_month(user.id, test_date.year, test_date.month)
end.to change(Points::RawDataArchive, :count).by(1)
end
it 'creates archive with correct metadata' do
archiver.archive_specific_month(user.id, test_date.year, test_date.month)
archive = user.raw_data_archives.last
expect(archive.user_id).to eq(user.id)
expect(archive.year).to eq(test_date.year)
expect(archive.month).to eq(test_date.month)
expect(archive.point_count).to eq(3)
expect(archive.chunk_number).to eq(1)
end
it 'attaches compressed file' do
archiver.archive_specific_month(user.id, test_date.year, test_date.month)
archive = user.raw_data_archives.last
expect(archive.file).to be_attached
expect(archive.file.key).to match(%r{raw_data_archives/\d+/\d{4}/\d{2}/001\.jsonl\.gz})
end
end
describe 'append-only architecture' do
before do
allow(ENV).to receive(:[]).and_call_original
allow(ENV).to receive(:[]).with('ARCHIVE_RAW_DATA').and_return('true')
end
# Use UTC from the start to avoid timezone issues
let(:test_date_utc) { 3.months.ago.utc.beginning_of_month }
let!(:june_points_batch1) do
create_list(:point, 2, user: user,
timestamp: test_date_utc.to_i,
raw_data: { lon: 13.4, lat: 52.5 })
end
it 'creates additional chunks for same month' do
# First archival
archiver.archive_specific_month(user.id, test_date_utc.year, test_date_utc.month)
expect(Points::RawDataArchive.for_month(user.id, test_date_utc.year, test_date_utc.month).count).to eq(1)
expect(Points::RawDataArchive.last.chunk_number).to eq(1)
# Verify first batch is archived
june_points_batch1.each(&:reload)
expect(june_points_batch1.all?(&:raw_data_archived)).to be true
# Add more points for same month (retrospective import)
# Use unique timestamps to avoid uniqueness validation errors
mid_month = test_date_utc + 15.days
june_points_batch2 = [
create(:point, user: user, timestamp: mid_month.to_i, raw_data: { lon: 14.0, lat: 53.0 }),
create(:point, user: user, timestamp: (mid_month + 1.hour).to_i, raw_data: { lon: 14.0, lat: 53.0 })
]
# Verify second batch exists and is not archived
expect(june_points_batch2.all? { |p| !p.raw_data_archived }).to be true
# Second archival should create chunk 2
archiver.archive_specific_month(user.id, test_date_utc.year, test_date_utc.month)
expect(Points::RawDataArchive.for_month(user.id, test_date_utc.year, test_date_utc.month).count).to eq(2)
expect(Points::RawDataArchive.last.chunk_number).to eq(2)
end
end
describe 'advisory locking' do
before do
allow(ENV).to receive(:[]).and_call_original
allow(ENV).to receive(:[]).with('ARCHIVE_RAW_DATA').and_return('true')
end
let!(:june_points) do
old_date = 3.months.ago.beginning_of_month
create_list(:point, 2, user: user,
timestamp: old_date.to_i,
raw_data: { lon: 13.4, lat: 52.5 })
end
it 'prevents duplicate processing with advisory locks' do
# Simulate lock couldn't be acquired (returns nil/false)
allow(ActiveRecord::Base).to receive(:with_advisory_lock).and_return(false)
result = archiver.call
expect(result[:processed]).to eq(0)
expect(result[:failed]).to eq(0)
end
end
end

@@ -0,0 +1,94 @@
# frozen_string_literal: true
require 'rails_helper'
RSpec.describe Points::RawData::ChunkCompressor do
let(:user) { create(:user) }
before do
# Stub broadcasting to avoid ActionCable issues in tests
allow(PointsChannel).to receive(:broadcast_to)
end
let(:points) do
[
create(:point, user: user, raw_data: { lon: 13.4, lat: 52.5 }),
create(:point, user: user, raw_data: { lon: 13.5, lat: 52.6 }),
create(:point, user: user, raw_data: { lon: 13.6, lat: 52.7 })
]
end
let(:compressor) { described_class.new(Point.where(id: points.map(&:id))) }
describe '#compress' do
it 'returns compressed gzip data' do
result = compressor.compress
expect(result).to be_a(String)
expect(result.encoding.name).to eq('ASCII-8BIT')
end
it 'compresses points as JSONL format' do
compressed = compressor.compress
# Decompress and verify format
io = StringIO.new(compressed)
gz = Zlib::GzipReader.new(io)
lines = gz.readlines
gz.close
expect(lines.count).to eq(3)
# Each line should be valid JSON
lines.each_with_index do |line, index|
data = JSON.parse(line)
expect(data).to have_key('id')
expect(data).to have_key('raw_data')
expect(data['id']).to eq(points[index].id)
end
end
it 'includes point ID and raw_data in each line' do
compressed = compressor.compress
io = StringIO.new(compressed)
gz = Zlib::GzipReader.new(io)
first_line = gz.readline
gz.close
data = JSON.parse(first_line)
expect(data['id']).to eq(points.first.id)
expect(data['raw_data']).to eq({ 'lon' => 13.4, 'lat' => 52.5 })
end
it 'processes points in batches' do
# Create many points to test batch processing with unique timestamps
many_points = []
base_time = Time.new(2024, 6, 15).to_i
2500.times do |i|
many_points << create(:point, user: user, timestamp: base_time + i, raw_data: { lon: 13.4, lat: 52.5 })
end
large_compressor = described_class.new(Point.where(id: many_points.map(&:id)))
compressed = large_compressor.compress
io = StringIO.new(compressed)
gz = Zlib::GzipReader.new(io)
line_count = 0
gz.each_line { line_count += 1 }
gz.close
expect(line_count).to eq(2500)
end
it 'produces smaller compressed output than uncompressed' do
compressed = compressor.compress
# Decompress to get original size
io = StringIO.new(compressed)
gz = Zlib::GzipReader.new(io)
decompressed = gz.read
gz.close
# Compressed should be smaller
expect(compressed.bytesize).to be < decompressed.bytesize
end
end
end

@@ -0,0 +1,228 @@
# frozen_string_literal: true
require 'rails_helper'
RSpec.describe Points::RawData::Restorer do
let(:user) { create(:user) }
let(:restorer) { described_class.new }
before do
# Stub broadcasting to avoid ActionCable issues in tests
allow(PointsChannel).to receive(:broadcast_to)
end
describe '#restore_to_database' do
let!(:archived_points) do
create_list(:point, 3, user: user, timestamp: Time.new(2024, 6, 15).to_i,
raw_data: nil, raw_data_archived: true)
end
let(:archive) do
# Create archive with actual point data
compressed_data = gzip_points_data(archived_points.map do |p|
{ id: p.id, raw_data: { lon: 13.4, lat: 52.5 } }
end)
arc = build(:points_raw_data_archive, user: user, year: 2024, month: 6)
arc.file.attach(
io: StringIO.new(compressed_data),
filename: arc.filename,
content_type: 'application/gzip'
)
arc.save!
# Associate points with archive
archived_points.each { |p| p.update!(raw_data_archive: arc) }
arc
end
it 'restores raw_data to database' do
archive # Ensure archive is created before restore
restorer.restore_to_database(user.id, 2024, 6)
archived_points.each(&:reload)
archived_points.each do |point|
expect(point.raw_data).to eq({ 'lon' => 13.4, 'lat' => 52.5 })
end
end
it 'clears archive flags' do
archive # Ensure archive is created before restore
restorer.restore_to_database(user.id, 2024, 6)
archived_points.each(&:reload)
archived_points.each do |point|
expect(point.raw_data_archived).to be false
expect(point.raw_data_archive_id).to be_nil
end
end
it 'raises error when no archives found' do
expect do
restorer.restore_to_database(user.id, 2025, 12)
end.to raise_error(/No archives found/)
end
context 'with multiple chunks' do
let!(:more_points) do
create_list(:point, 2, user: user, timestamp: Time.new(2024, 6, 20).to_i,
raw_data: nil, raw_data_archived: true)
end
let!(:archive2) do
compressed_data = gzip_points_data(more_points.map do |p|
{ id: p.id, raw_data: { lon: 14.0, lat: 53.0 } }
end)
arc = build(:points_raw_data_archive, user: user, year: 2024, month: 6, chunk_number: 2)
arc.file.attach(
io: StringIO.new(compressed_data),
filename: arc.filename,
content_type: 'application/gzip'
)
arc.save!
more_points.each { |p| p.update!(raw_data_archive: arc) }
arc
end
it 'restores from all chunks' do
archive # Ensure first archive is created
archive2 # Ensure second archive is created
restorer.restore_to_database(user.id, 2024, 6)
(archived_points + more_points).each(&:reload)
expect(archived_points.first.raw_data).to eq({ 'lon' => 13.4, 'lat' => 52.5 })
expect(more_points.first.raw_data).to eq({ 'lon' => 14.0, 'lat' => 53.0 })
end
end
end
describe '#restore_to_memory' do
let!(:archived_points) do
create_list(:point, 2, user: user, timestamp: Time.new(2024, 6, 15).to_i,
raw_data: nil, raw_data_archived: true)
end
let(:archive) do
compressed_data = gzip_points_data(archived_points.map do |p|
{ id: p.id, raw_data: { lon: 13.4, lat: 52.5 } }
end)
arc = build(:points_raw_data_archive, user: user, year: 2024, month: 6)
arc.file.attach(
io: StringIO.new(compressed_data),
filename: arc.filename,
content_type: 'application/gzip'
)
arc.save!
archived_points.each { |p| p.update!(raw_data_archive: arc) }
arc
end
it 'loads data into cache' do
archive # Ensure archive is created before restore
restorer.restore_to_memory(user.id, 2024, 6)
archived_points.each do |point|
cache_key = "raw_data:temp:#{user.id}:2024:6:#{point.id}"
cached_value = Rails.cache.read(cache_key)
expect(cached_value).to eq({ 'lon' => 13.4, 'lat' => 52.5 })
end
end
it 'does not modify database' do
archive # Ensure archive is created before restore
restorer.restore_to_memory(user.id, 2024, 6)
archived_points.each(&:reload)
archived_points.each do |point|
expect(point.raw_data).to be_nil
expect(point.raw_data_archived).to be true
end
end
it 'sets cache expiration to 1 hour' do
archive # Ensure archive is created before restore
restorer.restore_to_memory(user.id, 2024, 6)
cache_key = "raw_data:temp:#{user.id}:2024:6:#{archived_points.first.id}"
# Cache should exist now
expect(Rails.cache.exist?(cache_key)).to be true
end
end
describe '#restore_all_for_user' do
let!(:june_points) do
create_list(:point, 2, user: user, timestamp: Time.new(2024, 6, 15).to_i,
raw_data: nil, raw_data_archived: true)
end
let!(:july_points) do
create_list(:point, 2, user: user, timestamp: Time.new(2024, 7, 15).to_i,
raw_data: nil, raw_data_archived: true)
end
let!(:june_archive) do
compressed_data = gzip_points_data(june_points.map { |p| { id: p.id, raw_data: { month: 'june' } } })
arc = build(:points_raw_data_archive, user: user, year: 2024, month: 6)
arc.file.attach(
io: StringIO.new(compressed_data),
filename: arc.filename,
content_type: 'application/gzip'
)
arc.save!
june_points.each { |p| p.update!(raw_data_archive: arc) }
arc
end
let!(:july_archive) do
compressed_data = gzip_points_data(july_points.map { |p| { id: p.id, raw_data: { month: 'july' } } })
arc = build(:points_raw_data_archive, user: user, year: 2024, month: 7)
arc.file.attach(
io: StringIO.new(compressed_data),
filename: arc.filename,
content_type: 'application/gzip'
)
arc.save!
july_points.each { |p| p.update!(raw_data_archive: arc) }
arc
end
it 'restores all months for user' do
restorer.restore_all_for_user(user.id)
june_points.each(&:reload)
july_points.each(&:reload)
expect(june_points.first.raw_data).to eq({ 'month' => 'june' })
expect(july_points.first.raw_data).to eq({ 'month' => 'july' })
end
it 'clears all archive flags' do
restorer.restore_all_for_user(user.id)
(june_points + july_points).each(&:reload)
expect(Point.where(user: user, raw_data_archived: true).count).to eq(0)
end
end
def gzip_points_data(points_array)
io = StringIO.new
gz = Zlib::GzipWriter.new(io)
points_array.each do |point_data|
gz.puts(point_data.to_json)
end
gz.close
io.string
end
end