mirror of
https://github.com/Freika/dawarich.git
synced 2026-01-10 17:21:38 -05:00
Feature/raw data archive (#2009)
* 0.36.2 (#2007)
* fix: move foreman to global gems to fix startup crash (#1971)
* Update exporting code to stream points data to file in batches to red… (#1980)
* Update exporting code to stream points data to file in batches to reduce memory usage
* Update changelog
* Update changelog
* Feature/maplibre frontend (#1953)
* Add a plan to use MapLibre GL JS for the frontend map rendering, replacing Leaflet
* Implement phase 1
* Phases 1-3 + part of 4
* Fix e2e tests
* Phase 6
* Implement fog of war
* Phase 7
* Next step: fix specs, phase 7 done
* Use our own map tiles
* Extract v2 map logic to separate manager classes
* Update settings panel on v2 map
* Update v2 e2e tests structure
* Reimplement location search in maps v2
* Update speed routes
* Implement visits and places creation in v2
* Fix last failing test
* Implement visits merging
* Fix a routes e2e test and simplify the routes layer styling.
* Extract js to modules from maps_v2_controller.js
* Implement area creation
* Fix spec problem
* Fix some e2e tests
* Implement live mode in v2 map
* Update icons and panel
* Extract some styles
* Remove unused file
* Start adding dark theme to popups on MapLibre maps
* Make popups respect dark theme
* Move v2 maps to maplibre namespace
* Update v2 references to maplibre
* Put place, area and visit info into side panel
* Update API to use safe settings config method
* Fix specs
* Fix method name to config in SafeSettings and update usages accordingly
* Add missing public files
* Add handling for real time points
* Fix remembering enabled/disabled layers of the v2 map
* Fix lots of e2e tests
* Add settings to select map version
* Use maps/v2 as main path for MapLibre maps
* Update routing
* Update live mode
* Update maplibre controller
* Update changelog
* Remove some console.log statements

---------

Co-authored-by: Robin Tuszik <mail@robin.gg>

* Remove esbuild scripts from package.json
* Remove sideEffects field from package.json
* Raw data archivation
* Add tests
* Fix tests
* Fix tests
* Update ExceptionReporter
* Add schedule to run raw data archival job monthly
* Change file structure for raw data archival feature
* Update changelog and version for raw data archival feature

---------

Co-authored-by: Robin Tuszik <mail@robin.gg>
This commit is contained in:
parent
2debcd88fa
commit
9661e8e7f7
35 changed files with 1680 additions and 18 deletions
@@ -1 +1 @@
-0.36.2
+0.36.3
@@ -4,6 +4,12 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](http://keepachangelog.com/)
 and this project adheres to [Semantic Versioning](http://semver.org/).

+# [0.36.3] - Unreleased
+
+## Added
+
+- Setting the `ARCHIVE_RAW_DATA` env var to `true` enables monthly raw data archiving for all users. The archiver looks for points older than 2 months whose `raw_data` column is not empty and creates a gzip archive of the raw data for each month. After successful archiving, the raw data is removed from the database to save space. The archival job runs monthly, on the 1st at 2:00 AM. The env var defaults to `false`.
+
 # [0.36.2] - 2025-12-06
Gemfile (3 changed lines)

@@ -14,6 +14,7 @@ gem 'bootsnap', require: false
 gem 'chartkick'
 gem 'data_migrate'
 gem 'devise'
+gem 'foreman'
 gem 'geocoder', github: 'Freika/geocoder', branch: 'master'
 gem 'gpx'
 gem 'groupdate'
@@ -55,7 +56,7 @@ gem 'stimulus-rails'
 gem 'tailwindcss-rails', '= 3.3.2'
 gem 'turbo-rails', '>= 2.0.17'
 gem 'tzinfo-data', platforms: %i[mingw mswin x64_mingw jruby]
-gem 'foreman'
+gem 'with_advisory_lock'

 group :development, :test, :staging do
   gem 'brakeman', require: false
@@ -620,6 +620,9 @@ GEM
       base64
       websocket-extensions (>= 0.1.0)
     websocket-extensions (0.1.5)
+    with_advisory_lock (7.0.2)
+      activerecord (>= 7.2)
+      zeitwerk (>= 2.7)
     xpath (3.2.0)
       nokogiri (~> 1.8)
     zeitwerk (2.7.3)
@@ -703,6 +706,7 @@ DEPENDENCIES
   turbo-rails (>= 2.0.17)
   tzinfo-data
   webmock
+  with_advisory_lock

 RUBY VERSION
   ruby 3.4.6p54
@@ -4,6 +4,8 @@ class Family::Invitations::CleanupJob < ApplicationJob
   queue_as :families

   def perform
+    return unless DawarichSettings.family_feature_enabled?
+
     Rails.logger.info 'Starting family invitations cleanup'

     expired_count = Family::Invitation.where(status: :pending)
app/jobs/points/raw_data/archive_job.rb (new file, 21 lines)

@@ -0,0 +1,21 @@
# frozen_string_literal: true

module Points
  module RawData
    class ArchiveJob < ApplicationJob
      queue_as :archival

      def perform
        return unless ENV['ARCHIVE_RAW_DATA'] == 'true'

        stats = Points::RawData::Archiver.new.call

        Rails.logger.info("Archive job complete: #{stats}")
      rescue StandardError => e
        ExceptionReporter.call(e, 'Points raw data archival job failed')

        raise
      end
    end
  end
end
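The job is a thin guard around the service object, so one-off runs are easy from a console. A minimal sketch, assuming `ARCHIVE_RAW_DATA=true` is set in the environment (otherwise `#perform` returns early); the stats values are illustrative:

# Enqueue onto the :archival queue, as the monthly cron does:
Points::RawData::ArchiveJob.perform_later

# Or run the underlying service inline and inspect its counters:
Points::RawData::Archiver.new.call
# => { processed: 2, archived: 1500, failed: 0 }  (illustrative)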
app/jobs/points/raw_data/re_archive_month_job.rb (new file, 19 lines)

@@ -0,0 +1,19 @@
# frozen_string_literal: true

module Points
  module RawData
    class ReArchiveMonthJob < ApplicationJob
      queue_as :archival

      def perform(user_id, year, month)
        Rails.logger.info("Re-archiving #{user_id}/#{year}/#{month} (retrospective import)")

        Points::RawData::Archiver.new.archive_specific_month(user_id, year, month)
      rescue StandardError => e
        ExceptionReporter.call(e, "Re-archival job failed for #{user_id}/#{year}/#{month}")

        raise
      end
    end
  end
end
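This job exists for retrospective imports: when new points land in a month that was already archived, re-running that month appends another chunk rather than rewriting the old one. A usage sketch (id and date illustrative):

# Re-archive June 2024 for user 123 after a late import:
Points::RawData::ReArchiveMonthJob.perform_later(123, 2024, 6)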
app/models/concerns/archivable.rb (new file, 81 lines)

@@ -0,0 +1,81 @@
# frozen_string_literal: true

module Archivable
  extend ActiveSupport::Concern

  included do
    belongs_to :raw_data_archive,
               class_name: 'Points::RawDataArchive',
               optional: true

    scope :archived, -> { where(raw_data_archived: true) }
    scope :not_archived, -> { where(raw_data_archived: false) }
    scope :with_archived_raw_data, lambda {
      includes(raw_data_archive: { file_attachment: :blob })
    }
  end

  # Main method: Get raw_data with fallback to archive
  # Use this instead of point.raw_data when you need archived data
  def raw_data_with_archive
    return raw_data if raw_data.present? || !raw_data_archived?

    fetch_archived_raw_data
  end

  # Restore archived data back to database column
  def restore_raw_data!(value)
    update!(
      raw_data: value,
      raw_data_archived: false,
      raw_data_archive_id: nil
    )
  end

  private

  def fetch_archived_raw_data
    # Check temporary restore cache first (for migrations)
    cached = check_temporary_restore_cache
    return cached if cached

    fetch_from_archive_file
  rescue StandardError => e
    handle_archive_fetch_error(e)
  end

  def check_temporary_restore_cache
    return nil unless respond_to?(:timestamp)

    recorded_time = Time.at(timestamp)
    cache_key = "raw_data:temp:#{user_id}:#{recorded_time.year}:#{recorded_time.month}:#{id}"
    Rails.cache.read(cache_key)
  end

  def fetch_from_archive_file
    return {} unless raw_data_archive&.file&.attached?

    # Download and search through JSONL
    compressed_content = raw_data_archive.file.blob.download
    io = StringIO.new(compressed_content)
    gz = Zlib::GzipReader.new(io)

    result = nil
    gz.each_line do |line|
      data = JSON.parse(line)
      if data['id'] == id
        result = data['raw_data']
        break
      end
    end

    gz.close
    result || {}
  end

  def handle_archive_fetch_error(error)
    ExceptionReporter.call(error, "Failed to fetch archived raw_data for Point ID #{id}")

    {} # Graceful degradation
  end
end
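The concern keeps reads transparent: callers use one accessor and get the database column when present, falling back to a streamed scan of the gzip chunk otherwise. A minimal sketch (record id and coordinates illustrative):

point = Point.find(42)
point.raw_data               # => nil once its month has been archived
point.raw_data_with_archive  # => { "lon" => 13.4, "lat" => 52.5 }, read back from the JSONL chunk

Note that the per-point fallback downloads and scans the whole chunk, so it suits occasional lookups; bulk access goes through Points::RawData::Restorer, shown further down in this diff.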
@@ -3,6 +3,7 @@
 class Point < ApplicationRecord
   include Nearable
   include Distanceable
+  include Archivable

   belongs_to :import, optional: true, counter_cache: true
   belongs_to :visit, optional: true
app/models/points/raw_data_archive.rb (new file, 40 lines)

@@ -0,0 +1,40 @@
# frozen_string_literal: true

module Points
  class RawDataArchive < ApplicationRecord
    self.table_name = 'points_raw_data_archives'

    belongs_to :user
    has_many :points, dependent: :nullify

    has_one_attached :file

    validates :year, :month, :chunk_number, :point_count, presence: true
    validates :year, numericality: { greater_than: 1970, less_than: 2100 }
    validates :month, numericality: { greater_than_or_equal_to: 1, less_than_or_equal_to: 12 }
    validates :chunk_number, numericality: { greater_than: 0 }
    validates :point_ids_checksum, presence: true

    scope :for_month, lambda { |user_id, year, month|
      where(user_id: user_id, year: year, month: month)
        .order(:chunk_number)
    }

    scope :recent, -> { where('archived_at > ?', 30.days.ago) }
    scope :old, -> { where('archived_at < ?', 1.year.ago) }

    def month_display
      Date.new(year, month, 1).strftime('%B %Y')
    end

    def filename
      "raw_data_archives/#{user_id}/#{year}/#{format('%02d', month)}/#{format('%03d', chunk_number)}.jsonl.gz"
    end

    def size_mb
      return 0 unless file.attached?

      (file.blob.byte_size / 1024.0 / 1024.0).round(2)
    end
  end
end
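Because a month can accumulate several append-only chunks, consumers always read through `for_month`, which orders by `chunk_number`. An illustrative query (values hypothetical):

archives = Points::RawDataArchive.for_month(user.id, 2024, 6)
archives.map(&:chunk_number)  # => [1, 2]
archives.sum(:point_count)    # => points archived for June 2024 across all chunks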
@@ -20,6 +20,7 @@ class User < ApplicationRecord # rubocop:disable Metrics/ClassLength
   has_many :tags, dependent: :destroy
   has_many :trips, dependent: :destroy
   has_many :tracks, dependent: :destroy
+  has_many :raw_data_archives, class_name: 'Points::RawDataArchive', dependent: :destroy

   after_create :create_api_key
   after_commit :activate, on: :create, if: -> { DawarichSettings.self_hosted? }
@@ -2,10 +2,14 @@

 class ExceptionReporter
   def self.call(exception, human_message = 'Exception reported')
-    return unless DawarichSettings.self_hosted?
+    return if DawarichSettings.self_hosted?

-    Rails.logger.error "#{human_message}: #{exception.message}"
-
-    Sentry.capture_exception(exception)
+    if exception.is_a?(Exception)
+      Rails.logger.error "#{human_message}: #{exception.message}"
+      Sentry.capture_exception(exception)
+    else
+      Rails.logger.error "#{exception}: #{human_message}"
+      Sentry.capture_message("#{exception}: #{human_message}")
+    end
   end
 end
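The reporter now also accepts a plain value: anything that is not an `Exception` is logged and sent via `Sentry.capture_message` instead. A short sketch of both call styles (messages illustrative):

ExceptionReporter.call(StandardError.new('boom'), 'Archival failed')   # logs + capture_exception
ExceptionReporter.call('unexpected archive state', 'Archival check')  # logs + capture_message

Both are no-ops on self-hosted installs because of the inverted `return if DawarichSettings.self_hosted?` guard.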
app/services/points/raw_data/archiver.rb (new file, 184 lines)

@@ -0,0 +1,184 @@
# frozen_string_literal: true

module Points
  module RawData
    class Archiver
      SAFE_ARCHIVE_LAG = 2.months

      def initialize
        @stats = { processed: 0, archived: 0, failed: 0 }
      end

      def call
        unless archival_enabled?
          Rails.logger.info('Raw data archival disabled (ARCHIVE_RAW_DATA != "true")')
          return @stats
        end

        Rails.logger.info('Starting points raw_data archival...')

        archivable_months.each do |month_data|
          process_month(month_data)
        end

        Rails.logger.info("Archival complete: #{@stats}")
        @stats
      end

      def archive_specific_month(user_id, year, month)
        month_data = {
          'user_id' => user_id,
          'year' => year,
          'month' => month
        }

        process_month(month_data)
      end

      private

      def archival_enabled?
        ENV['ARCHIVE_RAW_DATA'] == 'true'
      end

      def archivable_months
        # Only months 2+ months old with unarchived points
        safe_cutoff = Date.current.beginning_of_month - SAFE_ARCHIVE_LAG

        # Use raw SQL to avoid GROUP BY issues with ActiveRecord
        # Use AT TIME ZONE 'UTC' to ensure consistent timezone handling
        sql = <<-SQL.squish
          SELECT user_id,
                 EXTRACT(YEAR FROM (to_timestamp(timestamp) AT TIME ZONE 'UTC'))::int as year,
                 EXTRACT(MONTH FROM (to_timestamp(timestamp) AT TIME ZONE 'UTC'))::int as month,
                 COUNT(*) as unarchived_count
          FROM points
          WHERE raw_data_archived = false
            AND raw_data IS NOT NULL
            AND raw_data != '{}'
            AND to_timestamp(timestamp) < ?
          GROUP BY user_id,
                   EXTRACT(YEAR FROM (to_timestamp(timestamp) AT TIME ZONE 'UTC')),
                   EXTRACT(MONTH FROM (to_timestamp(timestamp) AT TIME ZONE 'UTC'))
        SQL

        ActiveRecord::Base.connection.exec_query(
          ActiveRecord::Base.sanitize_sql_array([sql, safe_cutoff])
        )
      end

      def process_month(month_data)
        user_id = month_data['user_id']
        year = month_data['year']
        month = month_data['month']

        lock_key = "archive_points:#{user_id}:#{year}:#{month}"

        # Advisory lock prevents duplicate processing
        # Returns false if lock couldn't be acquired (already locked)
        lock_acquired = ActiveRecord::Base.with_advisory_lock(lock_key, timeout_seconds: 0) do
          archive_month(user_id, year, month)
          @stats[:processed] += 1
          true
        end

        Rails.logger.info("Skipping #{lock_key} - already locked") unless lock_acquired
      rescue StandardError => e
        ExceptionReporter.call(e, "Failed to archive points for user #{user_id}, #{year}-#{month}")

        @stats[:failed] += 1
      end

      def archive_month(user_id, year, month)
        points = find_archivable_points(user_id, year, month)
        return if points.empty?

        point_ids = points.pluck(:id)
        log_archival_start(user_id, year, month, point_ids.count)

        archive = create_archive_chunk(user_id, year, month, points, point_ids)
        mark_points_as_archived(point_ids, archive.id)
        update_stats(point_ids.count)
        log_archival_success(archive)
      end

      def find_archivable_points(user_id, year, month)
        timestamp_range = month_timestamp_range(year, month)

        Point.where(user_id: user_id, raw_data_archived: false)
             .where(timestamp: timestamp_range)
             .where.not(raw_data: nil)
      end

      def month_timestamp_range(year, month)
        start_of_month = Time.utc(year, month, 1).to_i
        end_of_month = (Time.utc(year, month, 1) + 1.month).to_i
        start_of_month...end_of_month
      end

      def mark_points_as_archived(point_ids, archive_id)
        Point.transaction do
          Point.where(id: point_ids).update_all(
            raw_data_archived: true,
            raw_data_archive_id: archive_id,
            raw_data: nil
          )
        end
      end

      def update_stats(archived_count)
        @stats[:archived] += archived_count
      end

      def log_archival_start(user_id, year, month, count)
        Rails.logger.info("Archiving #{count} points for user #{user_id}, #{year}-#{format('%02d', month)}")
      end

      def log_archival_success(archive)
        Rails.logger.info("✓ Archived chunk #{archive.chunk_number} (#{archive.size_mb} MB)")
      end

      def create_archive_chunk(user_id, year, month, points, point_ids)
        # Determine chunk number (append-only)
        chunk_number = Points::RawDataArchive
                       .where(user_id: user_id, year: year, month: month)
                       .maximum(:chunk_number).to_i + 1

        # Compress points data
        compressed_data = Points::RawData::ChunkCompressor.new(points).compress

        # Create archive record
        archive = Points::RawDataArchive.create!(
          user_id: user_id,
          year: year,
          month: month,
          chunk_number: chunk_number,
          point_count: point_ids.count,
          point_ids_checksum: calculate_checksum(point_ids),
          archived_at: Time.current,
          metadata: {
            format_version: 1,
            compression: 'gzip',
            archived_by: 'Points::RawData::Archiver'
          }
        )

        # Attach compressed file via ActiveStorage
        # Uses directory structure: raw_data_archives/:user_id/:year/:month/:chunk.jsonl.gz
        # The key parameter controls the actual storage path
        archive.file.attach(
          io: StringIO.new(compressed_data),
          filename: "#{format('%03d', chunk_number)}.jsonl.gz",
          content_type: 'application/gzip',
          key: "raw_data_archives/#{user_id}/#{year}/#{format('%02d', month)}/#{format('%03d', chunk_number)}.jsonl.gz"
        )

        archive
      end

      def calculate_checksum(point_ids)
        Digest::SHA256.hexdigest(point_ids.sort.join(','))
      end
    end
  end
end
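A single month flows: advisory lock → select unarchived points for that month → gzip JSONL chunk → create the archive row with a SHA-256 checksum of the sorted point ids → null out `raw_data` in one `update_all`. A console sketch of manual runs (arguments and counts illustrative):

archiver = Points::RawData::Archiver.new

# Whole backlog, respecting the 2-month SAFE_ARCHIVE_LAG:
archiver.call  # => { processed: 3, archived: 4200, failed: 0 }  (illustrative)

# One user/month, e.g. after a retrospective import:
archiver.archive_specific_month(123, 2024, 6)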
app/services/points/raw_data/chunk_compressor.rb (new file, 25 lines)

@@ -0,0 +1,25 @@
# frozen_string_literal: true

module Points
  module RawData
    class ChunkCompressor
      def initialize(points_relation)
        @points = points_relation
      end

      def compress
        io = StringIO.new
        gz = Zlib::GzipWriter.new(io)

        # Stream points to avoid memory issues with large months
        @points.select(:id, :raw_data).find_each(batch_size: 1000) do |point|
          # Write as JSONL (one JSON object per line)
          gz.puts({ id: point.id, raw_data: point.raw_data }.to_json)
        end

        gz.close
        io.string.force_encoding(Encoding::ASCII_8BIT) # Returns compressed bytes in binary encoding
      end
    end
  end
end
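The JSONL-in-gzip layout keeps one point per line, so a chunk can be scanned without loading the whole file into a JSON parser. A stdlib-only round-trip sketch:

compressed = Points::RawData::ChunkCompressor.new(points).compress

# Read it back line by line, the same way Archivable#fetch_from_archive_file does:
Zlib::GzipReader.new(StringIO.new(compressed)).each_line do |line|
  row = JSON.parse(line)
  puts "#{row['id']}: #{row['raw_data'].inspect}"
end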
app/services/points/raw_data/restorer.rb (new file, 105 lines)

@@ -0,0 +1,105 @@
# frozen_string_literal: true

module Points
  module RawData
    class Restorer
      def restore_to_database(user_id, year, month)
        archives = Points::RawDataArchive.for_month(user_id, year, month)

        raise "No archives found for user #{user_id}, #{year}-#{month}" if archives.empty?

        Rails.logger.info("Restoring #{archives.count} archives to database...")

        Point.transaction do
          archives.each { restore_archive_to_db(_1) }
        end

        Rails.logger.info("✓ Restored #{archives.sum(:point_count)} points")
      end

      def restore_to_memory(user_id, year, month)
        archives = Points::RawDataArchive.for_month(user_id, year, month)

        raise "No archives found for user #{user_id}, #{year}-#{month}" if archives.empty?

        Rails.logger.info("Loading #{archives.count} archives into cache...")

        cache_key_prefix = "raw_data:temp:#{user_id}:#{year}:#{month}"
        count = 0

        archives.each do |archive|
          count += restore_archive_to_cache(archive, cache_key_prefix)
        end

        Rails.logger.info("✓ Loaded #{count} points into cache (expires in 1 hour)")
      end

      def restore_all_for_user(user_id)
        archives =
          Points::RawDataArchive.where(user_id: user_id)
                                .select(:year, :month)
                                .distinct
                                .order(:year, :month)

        Rails.logger.info("Restoring #{archives.count} months for user #{user_id}...")

        archives.each do |archive|
          restore_to_database(user_id, archive.year, archive.month)
        end

        Rails.logger.info('✓ Complete user restore finished')
      end

      private

      def restore_archive_to_db(archive)
        decompressed = download_and_decompress(archive)

        decompressed.each_line do |line|
          data = JSON.parse(line)

          Point.where(id: data['id']).update_all(
            raw_data: data['raw_data'],
            raw_data_archived: false,
            raw_data_archive_id: nil
          )
        end
      end

      def restore_archive_to_cache(archive, cache_key_prefix)
        decompressed = download_and_decompress(archive)
        count = 0

        decompressed.each_line do |line|
          data = JSON.parse(line)

          Rails.cache.write(
            "#{cache_key_prefix}:#{data['id']}",
            data['raw_data'],
            expires_in: 1.hour
          )

          count += 1
        end

        count
      end

      def download_and_decompress(archive)
        # Download via ActiveStorage
        compressed_content = archive.file.blob.download

        # Decompress
        io = StringIO.new(compressed_content)
        gz = Zlib::GzipReader.new(io)
        content = gz.read
        gz.close

        content
      rescue StandardError => e
        Rails.logger.error("Failed to download/decompress archive #{archive.id}: #{e.message}")
        raise
      end
    end
  end
end
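The two restore paths serve different needs: `restore_to_database` permanently undoes archival for a month, while `restore_to_memory` only primes the `raw_data:temp:*` cache keys that `Archivable#check_temporary_restore_cache` reads, and expires after an hour. A console sketch (ids illustrative):

restorer = Points::RawData::Restorer.new

restorer.restore_to_database(123, 2024, 6)  # writes raw_data back into points
restorer.restore_to_memory(123, 2024, 6)    # 1-hour cache window, for data migrations
restorer.restore_all_for_user(123)          # every archived month, back to the database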
@@ -62,3 +62,6 @@ OIDC_AUTO_REGISTER = ENV.fetch('OIDC_AUTO_REGISTER', 'true') == 'true'

 # Email/password registration setting (default: false for self-hosted, true for cloud)
 ALLOW_EMAIL_PASSWORD_REGISTRATION = ENV.fetch('ALLOW_EMAIL_PASSWORD_REGISTRATION', 'false') == 'true'
+
+# Raw data archival setting
+ARCHIVE_RAW_DATA = ENV.fetch('ARCHIVE_RAW_DATA', 'false') == 'true'
@@ -49,5 +49,9 @@ class DawarichSettings
         family: family_feature_enabled?
       }
     end
+
+    def archive_raw_data_enabled?
+      @archive_raw_data_enabled ||= ARCHIVE_RAW_DATA
+    end
   end
 end
@@ -49,3 +49,8 @@ nightly_family_invitations_cleanup_job:
   cron: "30 2 * * *" # every day at 02:30
   class: "Family::Invitations::CleanupJob"
   queue: family
+
+raw_data_archival_job:
+  cron: "0 2 1 * *" # Monthly on the 1st at 2 AM
+  class: "Points::RawData::ArchiveJob"
+  queue: archival
@@ -16,3 +16,4 @@
   - places
   - app_version_checking
   - cache
+  - archival
db/migrate/20251206000001_create_points_raw_data_archives.rb (new file, 23 lines)

@@ -0,0 +1,23 @@
# frozen_string_literal: true

class CreatePointsRawDataArchives < ActiveRecord::Migration[8.0]
  def change
    create_table :points_raw_data_archives do |t|
      t.bigint :user_id, null: false
      t.integer :year, null: false
      t.integer :month, null: false
      t.integer :chunk_number, null: false, default: 1
      t.integer :point_count, null: false
      t.string :point_ids_checksum, null: false
      t.jsonb :metadata, default: {}, null: false
      t.datetime :archived_at, null: false

      t.timestamps
    end

    add_index :points_raw_data_archives, :user_id
    add_index :points_raw_data_archives, [:user_id, :year, :month]
    add_index :points_raw_data_archives, :archived_at
    add_foreign_key :points_raw_data_archives, :users, validate: false
  end
end
db/migrate/20251206000002_add_archival_columns_to_points.rb (new file, 22 lines)

@@ -0,0 +1,22 @@
# frozen_string_literal: true

class AddArchivalColumnsToPoints < ActiveRecord::Migration[8.0]
  disable_ddl_transaction!

  def change
    add_column :points, :raw_data_archived, :boolean, default: false, null: false
    add_column :points, :raw_data_archive_id, :bigint, null: true

    add_index :points, :raw_data_archived,
              where: 'raw_data_archived = true',
              name: 'index_points_on_archived_true',
              algorithm: :concurrently
    add_index :points, :raw_data_archive_id,
              algorithm: :concurrently

    add_foreign_key :points, :points_raw_data_archives,
                    column: :raw_data_archive_id,
                    on_delete: :nullify, # Don't delete points if archive deleted
                    validate: false
  end
end
@@ -0,0 +1,8 @@
# frozen_string_literal: true

class ValidateArchivalForeignKeys < ActiveRecord::Migration[8.0]
  def change
    validate_foreign_key :points_raw_data_archives, :users
    validate_foreign_key :points, :points_raw_data_archives
  end
end
db/schema.rb (generated, 27 changed lines)

@@ -10,7 +10,7 @@
 #
 # It's strongly recommended that you check this file into your version control system.

-ActiveRecord::Schema[8.0].define(version: 2025_12_01_192510) do
+ActiveRecord::Schema[8.0].define(version: 2025_12_06_000004) do
   # These are extensions that must be enabled in order to support this database
   enable_extension "pg_catalog.plpgsql"
   enable_extension "postgis"
@@ -224,6 +224,10 @@ ActiveRecord::Schema[8.0].define(version: 2025_12_01_192510) do
     t.bigint "country_id"
     t.bigint "track_id"
     t.string "country_name"
+    t.boolean "raw_data_archived", default: false, null: false
+    t.bigint "raw_data_archive_id"
+    t.integer "timestamp_year"
+    t.integer "timestamp_month"
     t.index ["altitude"], name: "index_points_on_altitude"
     t.index ["battery"], name: "index_points_on_battery"
     t.index ["battery_status"], name: "index_points_on_battery_status"
@@ -238,6 +242,8 @@ ActiveRecord::Schema[8.0].define(version: 2025_12_01_192510) do
     t.index ["latitude", "longitude"], name: "index_points_on_latitude_and_longitude"
     t.index ["lonlat", "timestamp", "user_id"], name: "index_points_on_lonlat_timestamp_user_id", unique: true
     t.index ["lonlat"], name: "index_points_on_lonlat", using: :gist
+    t.index ["raw_data_archive_id"], name: "index_points_on_raw_data_archive_id"
+    t.index ["raw_data_archived"], name: "index_points_on_archived_true", where: "(raw_data_archived = true)"
     t.index ["reverse_geocoded_at"], name: "index_points_on_reverse_geocoded_at"
     t.index ["timestamp"], name: "index_points_on_timestamp"
     t.index ["track_id"], name: "index_points_on_track_id"
@@ -245,10 +251,27 @@ ActiveRecord::Schema[8.0].define(version: 2025_12_01_192510) do
     t.index ["user_id", "country_name"], name: "idx_points_user_country_name"
     t.index ["user_id", "reverse_geocoded_at"], name: "index_points_on_user_id_and_reverse_geocoded_at", where: "(reverse_geocoded_at IS NOT NULL)"
     t.index ["user_id", "timestamp", "track_id"], name: "idx_points_track_generation"
+    t.index ["user_id", "timestamp_year", "timestamp_month", "raw_data_archived"], name: "index_points_on_user_time_archived"
     t.index ["user_id"], name: "index_points_on_user_id"
     t.index ["visit_id"], name: "index_points_on_visit_id"
   end

+  create_table "points_raw_data_archives", force: :cascade do |t|
+    t.bigint "user_id", null: false
+    t.integer "year", null: false
+    t.integer "month", null: false
+    t.integer "chunk_number", default: 1, null: false
+    t.integer "point_count", null: false
+    t.string "point_ids_checksum", null: false
+    t.jsonb "metadata", default: {}, null: false
+    t.datetime "archived_at", null: false
+    t.datetime "created_at", null: false
+    t.datetime "updated_at", null: false
+    t.index ["archived_at"], name: "index_points_raw_data_archives_on_archived_at"
+    t.index ["user_id", "year", "month"], name: "index_points_raw_data_archives_on_user_id_and_year_and_month"
+    t.index ["user_id"], name: "index_points_raw_data_archives_on_user_id"
+  end
+
   create_table "stats", force: :cascade do |t|
     t.integer "year", null: false
     t.integer "month", null: false
@@ -384,8 +407,10 @@ ActiveRecord::Schema[8.0].define(version: 2025_12_01_192510) do
   add_foreign_key "notifications", "users"
   add_foreign_key "place_visits", "places"
   add_foreign_key "place_visits", "visits"
+  add_foreign_key "points", "points_raw_data_archives", column: "raw_data_archive_id", on_delete: :nullify
   add_foreign_key "points", "users"
   add_foreign_key "points", "visits"
+  add_foreign_key "points_raw_data_archives", "users"
   add_foreign_key "stats", "users"
   add_foreign_key "taggings", "tags"
   add_foreign_key "tags", "users"
lib/tasks/points_raw_data.rake (new file, 243 lines)

@@ -0,0 +1,243 @@
# frozen_string_literal: true

namespace :points do
  namespace :raw_data do
    desc 'Restore raw_data from archive to database for a specific month'
    task :restore, [:user_id, :year, :month] => :environment do |_t, args|
      validate_args!(args)

      user_id = args[:user_id].to_i
      year = args[:year].to_i
      month = args[:month].to_i

      puts '━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━'
      puts ' Restoring raw_data to DATABASE'
      puts " User: #{user_id} | Month: #{year}-#{format('%02d', month)}"
      puts '━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━'
      puts ''

      restorer = Points::RawData::Restorer.new
      restorer.restore_to_database(user_id, year, month)

      puts ''
      puts '✓ Restoration complete!'
      puts ''
      puts "Points in #{year}-#{month} now have raw_data in database."
      puts 'Run VACUUM ANALYZE points; to update statistics.'
    end

    desc 'Restore raw_data to memory/cache temporarily (for data migrations)'
    task :restore_temporary, [:user_id, :year, :month] => :environment do |_t, args|
      validate_args!(args)

      user_id = args[:user_id].to_i
      year = args[:year].to_i
      month = args[:month].to_i

      puts '━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━'
      puts ' Loading raw_data into CACHE (temporary)'
      puts " User: #{user_id} | Month: #{year}-#{format('%02d', month)}"
      puts '━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━'
      puts ''
      puts 'Data will be available for 1 hour via Point.raw_data_with_archive accessor'
      puts ''

      restorer = Points::RawData::Restorer.new
      restorer.restore_to_memory(user_id, year, month)

      puts ''
      puts '✓ Cache loaded successfully!'
      puts ''
      puts 'You can now run your data migration.'
      puts 'Example:'
      puts "  rails runner \"Point.where(user_id: #{user_id}, timestamp_year: #{year}, timestamp_month: #{month}).find_each { |p| p.fix_coordinates_from_raw_data }\""
      puts ''
      puts 'Cache will expire in 1 hour automatically.'
    end

    desc 'Restore all archived raw_data for a user'
    task :restore_all, [:user_id] => :environment do |_t, args|
      raise 'Usage: rake points:raw_data:restore_all[user_id]' unless args[:user_id]

      user_id = args[:user_id].to_i
      user = User.find(user_id)

      puts '━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━'
      puts ' Restoring ALL archives for user'
      puts " #{user.email} (ID: #{user_id})"
      puts '━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━'
      puts ''

      archives = Points::RawDataArchive.where(user_id: user_id)
                                       .select(:year, :month)
                                       .distinct
                                       .order(:year, :month)

      puts "Found #{archives.count} months to restore"
      puts ''

      archives.each_with_index do |archive, idx|
        puts "[#{idx + 1}/#{archives.count}] Restoring #{archive.year}-#{format('%02d', archive.month)}..."

        restorer = Points::RawData::Restorer.new
        restorer.restore_to_database(user_id, archive.year, archive.month)
      end

      puts ''
      puts "✓ All archives restored for user #{user_id}!"
    end

    desc 'Show archive statistics'
    task status: :environment do
      puts '━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━'
      puts ' Points raw_data Archive Statistics'
      puts '━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━'
      puts ''

      total_archives = Points::RawDataArchive.count
      total_points = Point.count
      archived_points = Point.where(raw_data_archived: true).count
      percentage = total_points.positive? ? (archived_points.to_f / total_points * 100).round(2) : 0

      puts "Archives: #{total_archives}"
      puts "Points archived: #{archived_points} / #{total_points} (#{percentage}%)"
      puts ''

      # Storage size via ActiveStorage
      total_blob_size = ActiveStorage::Blob
                        .joins('INNER JOIN active_storage_attachments ON active_storage_attachments.blob_id = active_storage_blobs.id')
                        .where("active_storage_attachments.record_type = 'Points::RawDataArchive'")
                        .sum(:byte_size)

      puts "Storage used: #{ActiveSupport::NumberHelper.number_to_human_size(total_blob_size)}"
      puts ''

      # Recent activity
      recent = Points::RawDataArchive.where('archived_at > ?', 7.days.ago).count
      puts "Archives created last 7 days: #{recent}"
      puts ''

      # Top users
      puts 'Top 10 users by archive count:'
      puts '─────────────────────────────────────────────────'

      Points::RawDataArchive.group(:user_id)
                            .select('user_id, COUNT(*) as archive_count, SUM(point_count) as total_points')
                            .order('archive_count DESC')
                            .limit(10)
                            .each_with_index do |stat, idx|
        user = User.find(stat.user_id)
        puts "#{idx + 1}. #{user.email.ljust(30)} #{stat.archive_count.to_s.rjust(3)} archives, #{stat.total_points.to_s.rjust(8)} points"
      end

      puts ''
    end

    desc 'Verify archive integrity for a month'
    task :verify, [:user_id, :year, :month] => :environment do |_t, args|
      validate_args!(args)

      user_id = args[:user_id].to_i
      year = args[:year].to_i
      month = args[:month].to_i

      puts '━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━'
      puts ' Verifying Archives'
      puts " User: #{user_id} | Month: #{year}-#{format('%02d', month)}"
      puts '━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━'
      puts ''

      archives = Points::RawDataArchive.for_month(user_id, year, month)

      if archives.empty?
        puts 'No archives found.'
        exit
      end

      all_ok = true

      archives.each do |archive|
        print "Chunk #{archive.chunk_number}: "

        # Check file attached
        unless archive.file.attached?
          puts '✗ ERROR - File not attached!'
          all_ok = false
          next
        end

        # Download and count
        begin
          compressed = archive.file.blob.download
          io = StringIO.new(compressed)
          gz = Zlib::GzipReader.new(io)

          actual_count = 0
          gz.each_line { actual_count += 1 }
          gz.close

          if actual_count == archive.point_count
            puts "✓ OK (#{actual_count} points, #{archive.size_mb} MB)"
          else
            puts "✗ MISMATCH - Expected #{archive.point_count}, found #{actual_count}"
            all_ok = false
          end
        rescue StandardError => e
          puts "✗ ERROR - #{e.message}"
          all_ok = false
        end
      end

      puts ''
      if all_ok
        puts '✓ All archives verified successfully!'
      else
        puts '✗ Some archives have issues. Please investigate.'
      end
    end

    desc 'Run initial archival for old data (safe to re-run)'
    task initial_archive: :environment do
      puts '━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━'
      puts ' Initial Archival (2+ months old data)'
      puts '━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━'
      puts ''
      puts 'This will archive points.raw_data for months 2+ months old.'
      puts 'This is safe to run multiple times (idempotent).'
      puts ''
      print 'Continue? (y/N): '

      response = $stdin.gets.chomp.downcase
      unless response == 'y'
        puts 'Cancelled.'
        exit
      end

      puts ''
      stats = Points::RawData::Archiver.new.call

      puts ''
      puts '━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━'
      puts ' Archival Complete'
      puts '━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━'
      puts ''
      puts "Months processed: #{stats[:processed]}"
      puts "Points archived: #{stats[:archived]}"
      puts "Failures: #{stats[:failed]}"
      puts ''

      return unless stats[:archived].positive?

      puts 'Next steps:'
      puts '1. Verify a sample: rake points:raw_data:verify[user_id,year,month]'
      puts '2. Check stats: rake points:raw_data:status'
      puts '3. (Optional) Reclaim space: VACUUM FULL points; (during maintenance)'
    end
  end
end

def validate_args!(args)
  return if args[:user_id] && args[:year] && args[:month]

  raise 'Usage: rake points:raw_data:TASK[user_id,year,month]'
end
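The tasks take positional rake arguments; from a shell that looks like `rake points:raw_data:verify[123,2024,6]`. They can also be driven from Ruby through Rake's own API, as in this sketch (arguments illustrative):

require 'rake'
Rails.application.load_tasks

Rake::Task['points:raw_data:status'].invoke
Rake::Task['points:raw_data:verify'].invoke('123', '2024', '6')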
package.json (10 changed lines)

@@ -15,13 +15,5 @@
   "devDependencies": {
     "@playwright/test": "^1.56.1",
     "@types/node": "^24.0.13"
-  },
-  "scripts": {
-    "build": "esbuild app/javascript/*.* --bundle --splitting --format=esm --outdir=app/assets/builds",
-    "analyze": "esbuild app/javascript/*.* --bundle --metafile=meta.json --analyze"
-  },
-  "sideEffects": [
-    "*.css",
-    "maplibre-gl/dist/maplibre-gl.css"
-  ]
+  }
 }
spec/factories/points_raw_data_archives.rb (new file, 32 lines)

@@ -0,0 +1,32 @@
# frozen_string_literal: true

FactoryBot.define do
  factory :points_raw_data_archive, class: 'Points::RawDataArchive' do
    user
    year { 2024 }
    month { 6 }
    chunk_number { 1 }
    point_count { 100 }
    point_ids_checksum { Digest::SHA256.hexdigest('1,2,3') }
    archived_at { Time.current }
    metadata { { format_version: 1, compression: 'gzip' } }

    after(:build) do |archive|
      # Attach a test file
      archive.file.attach(
        io: StringIO.new(gzip_test_data),
        filename: archive.filename,
        content_type: 'application/gzip'
      )
    end
  end
end

def gzip_test_data
  io = StringIO.new
  gz = Zlib::GzipWriter.new(io)
  gz.puts({ id: 1, raw_data: { lon: 13.4, lat: 52.5 } }.to_json)
  gz.puts({ id: 2, raw_data: { lon: 13.5, lat: 52.6 } }.to_json)
  gz.close
  io.string
end
File diff suppressed because one or more lines are too long
spec/jobs/points/raw_data/archive_job_spec.rb (new file, 46 lines)

@@ -0,0 +1,46 @@
# frozen_string_literal: true

require 'rails_helper'

RSpec.describe Points::RawData::ArchiveJob, type: :job do
  describe '#perform' do
    let(:archiver) { instance_double(Points::RawData::Archiver) }

    before do
      # Enable archival for tests
      allow(ENV).to receive(:[]).and_call_original
      allow(ENV).to receive(:[]).with('ARCHIVE_RAW_DATA').and_return('true')

      allow(Points::RawData::Archiver).to receive(:new).and_return(archiver)
      allow(archiver).to receive(:call).and_return({ processed: 5, archived: 100, failed: 0 })
    end

    it 'calls the archiver service' do
      expect(archiver).to receive(:call)

      described_class.perform_now
    end

    context 'when archiver raises an error' do
      let(:error) { StandardError.new('Archive failed') }

      before do
        allow(archiver).to receive(:call).and_raise(error)
      end

      it 're-raises the error' do
        expect do
          described_class.perform_now
        end.to raise_error(StandardError, 'Archive failed')
      end

      it 'reports the error before re-raising' do
        expect(ExceptionReporter).to receive(:call).with(error, 'Points raw data archival job failed')

        expect do
          described_class.perform_now
        end.to raise_error(StandardError)
      end
    end
  end
end
spec/jobs/points/raw_data/re_archive_month_job_spec.rb (new file, 34 lines)

@@ -0,0 +1,34 @@
# frozen_string_literal: true

require 'rails_helper'

RSpec.describe Points::RawData::ReArchiveMonthJob, type: :job do
  describe '#perform' do
    let(:archiver) { instance_double(Points::RawData::Archiver) }
    let(:user_id) { 123 }
    let(:year) { 2024 }
    let(:month) { 6 }

    before do
      allow(Points::RawData::Archiver).to receive(:new).and_return(archiver)
    end

    it 'calls archive_specific_month with correct parameters' do
      expect(archiver).to receive(:archive_specific_month).with(user_id, year, month)

      described_class.perform_now(user_id, year, month)
    end

    context 'when re-archival fails' do
      before do
        allow(archiver).to receive(:archive_specific_month).and_raise(StandardError, 'Re-archive failed')
      end

      it 're-raises the error' do
        expect do
          described_class.perform_now(user_id, year, month)
        end.to raise_error(StandardError, 'Re-archive failed')
      end
    end
  end
end
spec/models/concerns/archivable_spec.rb (new file, 116 lines)

@@ -0,0 +1,116 @@
# frozen_string_literal: true

require 'rails_helper'

RSpec.describe Archivable, type: :model do
  let(:user) { create(:user) }
  let(:point) { create(:point, user: user, raw_data: { lon: 13.4, lat: 52.5 }) }

  describe 'associations and scopes' do
    it { expect(point).to belong_to(:raw_data_archive).optional }

    describe 'scopes' do
      let!(:archived_point) { create(:point, user: user, raw_data_archived: true) }
      let!(:not_archived_point) { create(:point, user: user, raw_data_archived: false) }

      it '.archived returns archived points' do
        expect(Point.archived).to include(archived_point)
        expect(Point.archived).not_to include(not_archived_point)
      end

      it '.not_archived returns non-archived points' do
        expect(Point.not_archived).to include(not_archived_point)
        expect(Point.not_archived).not_to include(archived_point)
      end
    end
  end

  describe '#raw_data_with_archive' do
    context 'when raw_data is present in database' do
      it 'returns raw_data from database' do
        expect(point.raw_data_with_archive).to eq({ 'lon' => 13.4, 'lat' => 52.5 })
      end
    end

    context 'when raw_data is archived' do
      let(:archive) { create(:points_raw_data_archive, user: user) }
      let(:archived_point) do
        create(:point, user: user, raw_data: nil, raw_data_archived: true, raw_data_archive: archive)
      end

      before do
        # Mock archive file content with this specific point
        compressed_data = gzip_data([
          { id: archived_point.id, raw_data: { lon: 14.0, lat: 53.0 } }
        ])
        allow(archive.file.blob).to receive(:download).and_return(compressed_data)
      end

      it 'fetches raw_data from archive' do
        result = archived_point.raw_data_with_archive
        expect(result).to eq({ 'lon' => 14.0, 'lat' => 53.0 })
      end
    end

    context 'when raw_data is archived but point not in archive' do
      let(:archive) { create(:points_raw_data_archive, user: user) }
      let(:archived_point) do
        create(:point, user: user, raw_data: nil, raw_data_archived: true, raw_data_archive: archive)
      end

      before do
        # Mock archive file with different point
        compressed_data = gzip_data([
          { id: 999, raw_data: { lon: 14.0, lat: 53.0 } }
        ])
        allow(archive.file.blob).to receive(:download).and_return(compressed_data)
      end

      it 'returns empty hash' do
        expect(archived_point.raw_data_with_archive).to eq({})
      end
    end
  end

  describe '#restore_raw_data!' do
    let(:archive) { create(:points_raw_data_archive, user: user) }
    let(:archived_point) do
      create(:point, user: user, raw_data: nil, raw_data_archived: true, raw_data_archive: archive)
    end

    it 'restores raw_data to database and clears archive flags' do
      new_data = { lon: 15.0, lat: 54.0 }
      archived_point.restore_raw_data!(new_data)

      archived_point.reload
      expect(archived_point.raw_data).to eq(new_data.stringify_keys)
      expect(archived_point.raw_data_archived).to be false
      expect(archived_point.raw_data_archive_id).to be_nil
    end
  end

  describe 'temporary cache' do
    let(:june_point) { create(:point, user: user, timestamp: Time.new(2024, 6, 15).to_i) }

    it 'checks temporary restore cache with correct key format' do
      cache_key = "raw_data:temp:#{user.id}:2024:6:#{june_point.id}"
      cached_data = { lon: 16.0, lat: 55.0 }

      Rails.cache.write(cache_key, cached_data, expires_in: 1.hour)

      # Access through send since check_temporary_restore_cache is private
      result = june_point.send(:check_temporary_restore_cache)
      expect(result).to eq(cached_data)
    end
  end

  def gzip_data(points_array)
    io = StringIO.new
    gz = Zlib::GzipWriter.new(io)
    points_array.each do |point_data|
      gz.puts(point_data.to_json)
    end
    gz.close
    io.string
  end
end
spec/models/points/raw_data_archive_spec.rb (new file, 86 lines)

@@ -0,0 +1,86 @@
# frozen_string_literal: true

require 'rails_helper'

RSpec.describe Points::RawDataArchive, type: :model do
  let(:user) { create(:user) }
  subject(:archive) { build(:points_raw_data_archive, user: user) }

  describe 'associations' do
    it { is_expected.to belong_to(:user) }
    it { is_expected.to have_many(:points).dependent(:nullify) }
  end

  describe 'validations' do
    it { is_expected.to validate_presence_of(:year) }
    it { is_expected.to validate_presence_of(:month) }
    it { is_expected.to validate_presence_of(:chunk_number) }
    it { is_expected.to validate_presence_of(:point_count) }
    it { is_expected.to validate_presence_of(:point_ids_checksum) }

    it { is_expected.to validate_numericality_of(:year).is_greater_than(1970).is_less_than(2100) }
    it { is_expected.to validate_numericality_of(:month).is_greater_than_or_equal_to(1).is_less_than_or_equal_to(12) }
    it { is_expected.to validate_numericality_of(:chunk_number).is_greater_than(0) }
  end

  describe 'scopes' do
    let!(:recent_archive) { create(:points_raw_data_archive, user: user, year: 2024, month: 5, archived_at: 1.day.ago) }
    let!(:old_archive) { create(:points_raw_data_archive, user: user, year: 2023, month: 5, archived_at: 2.years.ago) }

    describe '.recent' do
      it 'returns archives from last 30 days' do
        expect(described_class.recent).to include(recent_archive)
        expect(described_class.recent).not_to include(old_archive)
      end
    end

    describe '.old' do
      it 'returns archives older than 1 year' do
        expect(described_class.old).to include(old_archive)
        expect(described_class.old).not_to include(recent_archive)
      end
    end

    describe '.for_month' do
      let!(:june_archive) { create(:points_raw_data_archive, user: user, year: 2024, month: 6, chunk_number: 1) }
      let!(:june_archive_2) { create(:points_raw_data_archive, user: user, year: 2024, month: 6, chunk_number: 2) }
      let!(:july_archive) { create(:points_raw_data_archive, user: user, year: 2024, month: 7, chunk_number: 1) }

      it 'returns archives for specific month ordered by chunk number' do
        result = described_class.for_month(user.id, 2024, 6)
        expect(result.map(&:chunk_number)).to eq([1, 2])
        expect(result).to include(june_archive, june_archive_2)
        expect(result).not_to include(july_archive)
      end
    end
  end

  describe '#month_display' do
    it 'returns formatted month and year' do
      archive = build(:points_raw_data_archive, year: 2024, month: 6)
      expect(archive.month_display).to eq('June 2024')
    end
  end

  describe '#filename' do
    it 'generates correct filename with directory structure' do
      archive = build(:points_raw_data_archive, user_id: 123, year: 2024, month: 6, chunk_number: 5)
      expect(archive.filename).to eq('raw_data_archives/123/2024/06/005.jsonl.gz')
    end
  end

  describe '#size_mb' do
    it 'returns 0 when no file attached' do
      archive = build(:points_raw_data_archive)
      expect(archive.size_mb).to eq(0)
    end

    it 'returns size in MB when file is attached' do
      archive = create(:points_raw_data_archive, user: user)
      # Mock file with 2MB size
      allow(archive.file.blob).to receive(:byte_size).and_return(2 * 1024 * 1024)
      expect(archive.size_mb).to eq(2.0)
    end
  end
end
@@ -35,7 +35,11 @@ RSpec.describe PointSerializer do
       'course_accuracy' => point.course_accuracy,
       'external_track_id' => point.external_track_id,
       'track_id' => point.track_id,
-      'country_name' => point.read_attribute(:country_name)
+      'country_name' => point.read_attribute(:country_name),
+      'raw_data_archived' => point.raw_data_archived,
+      'raw_data_archive_id' => point.raw_data_archive_id,
+      'timestamp_year' => point.timestamp_year,
+      'timestamp_month' => point.timestamp_month
     }
   end
202
spec/services/points/raw_data/archiver_spec.rb
Normal file
202
spec/services/points/raw_data/archiver_spec.rb
Normal file
|
|
@ -0,0 +1,202 @@
|
|||
# frozen_string_literal: true
|
||||
|
||||
require 'rails_helper'
|
||||
|
||||
RSpec.describe Points::RawData::Archiver do
|
||||
let(:user) { create(:user) }
|
||||
let(:archiver) { described_class.new }
|
||||
|
||||
before do
|
||||
allow(PointsChannel).to receive(:broadcast_to)
|
||||
end
|
||||
|
||||
describe '#call' do
|
||||
context 'when archival is disabled' do
|
||||
before do
|
||||
allow(ENV).to receive(:[]).and_call_original
|
||||
allow(ENV).to receive(:[]).with('ARCHIVE_RAW_DATA').and_return('false')
|
||||
end
|
||||
|
||||
it 'returns early without processing' do
|
||||
result = archiver.call
|
||||
|
||||
expect(result).to eq({ processed: 0, archived: 0, failed: 0 })
|
||||
end
|
||||
end
|
||||
|
||||
context 'when archival is enabled' do
|
||||
before do
|
||||
allow(ENV).to receive(:[]).and_call_original
|
||||
allow(ENV).to receive(:[]).with('ARCHIVE_RAW_DATA').and_return('true')
|
||||
end
|
||||
|
||||
let!(:old_points) do
|
||||
# Create points 3 months ago (definitely older than 2 month lag)
|
||||
old_date = 3.months.ago.beginning_of_month
|
||||
create_list(:point, 5, user: user,
|
||||
timestamp: old_date.to_i,
|
||||
raw_data: { lon: 13.4, lat: 52.5 })
|
||||
end
|
||||
|
||||
it 'archives old points' do
|
||||
expect { archiver.call }.to change(Points::RawDataArchive, :count).by(1)
|
||||
end
|
||||
|
||||
it 'marks points as archived' do
|
||||
archiver.call
|
||||
|
||||
expect(Point.where(raw_data_archived: true).count).to eq(5)
|
||||
end
|
||||
|
||||
it 'nullifies raw_data column' do
|
||||
archiver.call
|
||||
Point.where(user: user).find_each do |point|
|
||||
expect(point.raw_data).to be_nil
|
||||
end
|
||||
end
|
||||
|
||||
it 'returns correct stats' do
|
||||
result = archiver.call
|
||||
|
||||
        expect(result[:processed]).to eq(1)
        expect(result[:archived]).to eq(5)
        expect(result[:failed]).to eq(0)
      end
    end

    context 'with points from multiple months' do
      before do
        allow(ENV).to receive(:[]).and_call_original
        allow(ENV).to receive(:[]).with('ARCHIVE_RAW_DATA').and_return('true')
      end

      let!(:june_points) do
        june_date = 4.months.ago.beginning_of_month
        create_list(:point, 3, user: user,
                               timestamp: june_date.to_i,
                               raw_data: { lon: 13.4, lat: 52.5 })
      end

      let!(:july_points) do
        july_date = 3.months.ago.beginning_of_month
        create_list(:point, 2, user: user,
                               timestamp: july_date.to_i,
                               raw_data: { lon: 14.0, lat: 53.0 })
      end

      it 'creates separate archives for each month' do
        expect { archiver.call }.to change(Points::RawDataArchive, :count).by(2)
      end

      it 'archives all points' do
        archiver.call
        expect(Point.where(raw_data_archived: true).count).to eq(5)
      end
    end
  end

  describe '#archive_specific_month' do
    before do
      allow(ENV).to receive(:[]).and_call_original
      allow(ENV).to receive(:[]).with('ARCHIVE_RAW_DATA').and_return('true')
    end

    let(:test_date) { 3.months.ago.beginning_of_month.utc }
    let!(:june_points) do
      create_list(:point, 3, user: user,
                             timestamp: test_date.to_i,
                             raw_data: { lon: 13.4, lat: 52.5 })
    end

    it 'archives specific month' do
      expect do
        archiver.archive_specific_month(user.id, test_date.year, test_date.month)
      end.to change(Points::RawDataArchive, :count).by(1)
    end

    it 'creates archive with correct metadata' do
      archiver.archive_specific_month(user.id, test_date.year, test_date.month)

      archive = user.raw_data_archives.last

      expect(archive.user_id).to eq(user.id)
      expect(archive.year).to eq(test_date.year)
      expect(archive.month).to eq(test_date.month)
      expect(archive.point_count).to eq(3)
      expect(archive.chunk_number).to eq(1)
    end

    it 'attaches compressed file' do
      archiver.archive_specific_month(user.id, test_date.year, test_date.month)

      archive = user.raw_data_archives.last
      expect(archive.file).to be_attached
      expect(archive.file.key).to match(%r{raw_data_archives/\d+/\d{4}/\d{2}/001\.jsonl\.gz})
    end
  end

  describe 'append-only architecture' do
    before do
      allow(ENV).to receive(:[]).and_call_original
      allow(ENV).to receive(:[]).with('ARCHIVE_RAW_DATA').and_return('true')
    end

    # Use UTC from the start to avoid timezone issues
    let(:test_date_utc) { 3.months.ago.utc.beginning_of_month }
    let!(:june_points_batch1) do
      create_list(:point, 2, user: user,
                             timestamp: test_date_utc.to_i,
                             raw_data: { lon: 13.4, lat: 52.5 })
    end

    it 'creates additional chunks for same month' do
      # First archival
      archiver.archive_specific_month(user.id, test_date_utc.year, test_date_utc.month)
      expect(Points::RawDataArchive.for_month(user.id, test_date_utc.year, test_date_utc.month).count).to eq(1)
      expect(Points::RawDataArchive.last.chunk_number).to eq(1)

      # Verify first batch is archived
      june_points_batch1.each(&:reload)
      expect(june_points_batch1.all?(&:raw_data_archived)).to be true

      # Add more points for the same month (retrospective import).
      # Use unique timestamps to avoid uniqueness validation errors.
      mid_month = test_date_utc + 15.days
      june_points_batch2 = [
        create(:point, user: user, timestamp: mid_month.to_i, raw_data: { lon: 14.0, lat: 53.0 }),
        create(:point, user: user, timestamp: (mid_month + 1.hour).to_i, raw_data: { lon: 14.0, lat: 53.0 })
      ]

      # Verify second batch exists and is not archived
      expect(june_points_batch2.all? { |p| !p.raw_data_archived }).to be true

      # Second archival should create chunk 2
      archiver.archive_specific_month(user.id, test_date_utc.year, test_date_utc.month)
      expect(Points::RawDataArchive.for_month(user.id, test_date_utc.year, test_date_utc.month).count).to eq(2)
      expect(Points::RawDataArchive.last.chunk_number).to eq(2)
    end
  end

  describe 'advisory locking' do
    before do
      allow(ENV).to receive(:[]).and_call_original
      allow(ENV).to receive(:[]).with('ARCHIVE_RAW_DATA').and_return('true')
    end

    let!(:june_points) do
      old_date = 3.months.ago.beginning_of_month
      create_list(:point, 2, user: user,
                             timestamp: old_date.to_i,
                             raw_data: { lon: 13.4, lat: 52.5 })
    end

    it 'prevents duplicate processing with advisory locks' do
      # Simulate the lock not being acquired (with_advisory_lock returns false)
      allow(ActiveRecord::Base).to receive(:with_advisory_lock).and_return(false)

      result = archiver.call
      expect(result[:processed]).to eq(0)
      expect(result[:failed]).to eq(0)
    end
  end
end
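
The advisory-locking spec above only stubs ActiveRecord::Base.with_advisory_lock, so as orientation, here is a minimal sketch of the pattern it implies, using the with_advisory_lock gem's real API. The lock-name scheme, the zero timeout, and the archive_month helper are illustrative assumptions, not Dawarich's actual internals.

# Hedged sketch: serialize archival per user/month with a Postgres advisory lock.
# Everything named here except ActiveRecord::Base.with_advisory_lock is assumed.
def archive_with_lock(user_id, year, month)
  lock_name = "raw_data_archival:#{user_id}:#{year}:#{month}" # assumed naming scheme
  result = ActiveRecord::Base.with_advisory_lock(lock_name, timeout_seconds: 0) do
    archive_month(user_id, year, month) # assumed helper doing the actual work
  end
  # with_advisory_lock returns false when the lock is held elsewhere; that is
  # the case the spec stubs, expecting zero processed/failed counts.
  result == false ? { processed: 0, archived: 0, failed: 0 } : result
end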
94
spec/services/points/raw_data/chunk_compressor_spec.rb
Normal file
@@ -0,0 +1,94 @@
# frozen_string_literal: true

require 'rails_helper'

RSpec.describe Points::RawData::ChunkCompressor do
  let(:user) { create(:user) }

  before do
    # Stub broadcasting to avoid ActionCable issues in tests
    allow(PointsChannel).to receive(:broadcast_to)
  end

  let(:points) do
    [
      create(:point, user: user, raw_data: { lon: 13.4, lat: 52.5 }),
      create(:point, user: user, raw_data: { lon: 13.5, lat: 52.6 }),
      create(:point, user: user, raw_data: { lon: 13.6, lat: 52.7 })
    ]
  end
  let(:compressor) { described_class.new(Point.where(id: points.map(&:id))) }

  describe '#compress' do
    it 'returns compressed gzip data' do
      result = compressor.compress
      expect(result).to be_a(String)
      expect(result.encoding.name).to eq('ASCII-8BIT')
    end

    it 'compresses points as JSONL format' do
      compressed = compressor.compress

      # Decompress and verify format
      io = StringIO.new(compressed)
      gz = Zlib::GzipReader.new(io)
      lines = gz.readlines
      gz.close

      expect(lines.count).to eq(3)

      # Each line should be valid JSON
      lines.each_with_index do |line, index|
        data = JSON.parse(line)
        expect(data).to have_key('id')
        expect(data).to have_key('raw_data')
        expect(data['id']).to eq(points[index].id)
      end
    end

    it 'includes point ID and raw_data in each line' do
      compressed = compressor.compress

      io = StringIO.new(compressed)
      gz = Zlib::GzipReader.new(io)
      first_line = gz.readline
      gz.close

      data = JSON.parse(first_line)
      expect(data['id']).to eq(points.first.id)
      expect(data['raw_data']).to eq({ 'lon' => 13.4, 'lat' => 52.5 })
    end

    it 'processes points in batches' do
      # Create many points to test batch processing, with unique timestamps
      many_points = []
      base_time = Time.new(2024, 6, 15).to_i
      2500.times do |i|
        many_points << create(:point, user: user, timestamp: base_time + i, raw_data: { lon: 13.4, lat: 52.5 })
      end
      large_compressor = described_class.new(Point.where(id: many_points.map(&:id)))

      compressed = large_compressor.compress

      io = StringIO.new(compressed)
      gz = Zlib::GzipReader.new(io)
      line_count = 0
      gz.each_line { line_count += 1 }
      gz.close

      expect(line_count).to eq(2500)
    end

    it 'produces smaller compressed output than uncompressed' do
      compressed = compressor.compress

      # Decompress to get original size
      io = StringIO.new(compressed)
      gz = Zlib::GzipReader.new(io)
      decompressed = gz.read
      gz.close

      # Compressed should be smaller
      expect(compressed.bytesize).to be < decompressed.bytesize
    end
  end
end
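
For orientation, here is a minimal, self-contained sketch of the JSONL-in-gzip scheme these specs exercise, using only the Ruby standard library. The real ChunkCompressor additionally streams points from the database in batches; compress_points below is a hypothetical free function, not the service's actual interface.

require 'json'
require 'stringio'
require 'zlib'

# Compress an array of { id:, raw_data: } hashes into gzipped JSONL:
# one JSON object per line, gzip on top, yielding a binary (ASCII-8BIT) string.
def compress_points(points)
  io = StringIO.new
  gz = Zlib::GzipWriter.new(io)
  points.each { |point| gz.puts(point.to_json) }
  gz.close
  io.string
end

# Round trip, mirroring what the specs assert:
compressed = compress_points([{ id: 1, raw_data: { lon: 13.4, lat: 52.5 } }])
Zlib::GzipReader.new(StringIO.new(compressed)).each_line do |line|
  p JSON.parse(line) # => {"id"=>1, "raw_data"=>{"lon"=>13.4, "lat"=>52.5}}
end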
228
spec/services/points/raw_data/restorer_spec.rb
Normal file
@@ -0,0 +1,228 @@
# frozen_string_literal: true

require 'rails_helper'

RSpec.describe Points::RawData::Restorer do
  let(:user) { create(:user) }
  let(:restorer) { described_class.new }

  before do
    # Stub broadcasting to avoid ActionCable issues in tests
    allow(PointsChannel).to receive(:broadcast_to)
  end

  describe '#restore_to_database' do
    let!(:archived_points) do
      create_list(:point, 3, user: user, timestamp: Time.new(2024, 6, 15).to_i,
                             raw_data: nil, raw_data_archived: true)
    end

    let(:archive) do
      # Create archive with actual point data
      compressed_data = gzip_points_data(archived_points.map do |p|
        { id: p.id, raw_data: { lon: 13.4, lat: 52.5 } }
      end)

      arc = build(:points_raw_data_archive, user: user, year: 2024, month: 6)
      arc.file.attach(
        io: StringIO.new(compressed_data),
        filename: arc.filename,
        content_type: 'application/gzip'
      )
      arc.save!

      # Associate points with archive
      archived_points.each { |p| p.update!(raw_data_archive: arc) }

      arc
    end

    it 'restores raw_data to database' do
      archive # Ensure archive is created before restore
      restorer.restore_to_database(user.id, 2024, 6)

      archived_points.each(&:reload)
      archived_points.each do |point|
        expect(point.raw_data).to eq({ 'lon' => 13.4, 'lat' => 52.5 })
      end
    end

    it 'clears archive flags' do
      archive # Ensure archive is created before restore
      restorer.restore_to_database(user.id, 2024, 6)

      archived_points.each(&:reload)
      archived_points.each do |point|
        expect(point.raw_data_archived).to be false
        expect(point.raw_data_archive_id).to be_nil
      end
    end

    it 'raises error when no archives found' do
      expect do
        restorer.restore_to_database(user.id, 2025, 12)
      end.to raise_error(/No archives found/)
    end

    context 'with multiple chunks' do
      let!(:more_points) do
        create_list(:point, 2, user: user, timestamp: Time.new(2024, 6, 20).to_i,
                               raw_data: nil, raw_data_archived: true)
      end

      let!(:archive2) do
        compressed_data = gzip_points_data(more_points.map do |p|
          { id: p.id, raw_data: { lon: 14.0, lat: 53.0 } }
        end)

        arc = build(:points_raw_data_archive, user: user, year: 2024, month: 6, chunk_number: 2)
        arc.file.attach(
          io: StringIO.new(compressed_data),
          filename: arc.filename,
          content_type: 'application/gzip'
        )
        arc.save!

        more_points.each { |p| p.update!(raw_data_archive: arc) }

        arc
      end

      it 'restores from all chunks' do
        archive # Ensure first archive is created
        archive2 # Ensure second archive is created
        restorer.restore_to_database(user.id, 2024, 6)

        (archived_points + more_points).each(&:reload)
        expect(archived_points.first.raw_data).to eq({ 'lon' => 13.4, 'lat' => 52.5 })
        expect(more_points.first.raw_data).to eq({ 'lon' => 14.0, 'lat' => 53.0 })
      end
    end
  end

  describe '#restore_to_memory' do
    let!(:archived_points) do
      create_list(:point, 2, user: user, timestamp: Time.new(2024, 6, 15).to_i,
                             raw_data: nil, raw_data_archived: true)
    end

    let(:archive) do
      compressed_data = gzip_points_data(archived_points.map do |p|
        { id: p.id, raw_data: { lon: 13.4, lat: 52.5 } }
      end)

      arc = build(:points_raw_data_archive, user: user, year: 2024, month: 6)
      arc.file.attach(
        io: StringIO.new(compressed_data),
        filename: arc.filename,
        content_type: 'application/gzip'
      )
      arc.save!

      archived_points.each { |p| p.update!(raw_data_archive: arc) }

      arc
    end

    it 'loads data into cache' do
      archive # Ensure archive is created before restore
      restorer.restore_to_memory(user.id, 2024, 6)

      archived_points.each do |point|
        cache_key = "raw_data:temp:#{user.id}:2024:6:#{point.id}"
        cached_value = Rails.cache.read(cache_key)
        expect(cached_value).to eq({ 'lon' => 13.4, 'lat' => 52.5 })
      end
    end

    it 'does not modify database' do
      archive # Ensure archive is created before restore
      restorer.restore_to_memory(user.id, 2024, 6)

      archived_points.each(&:reload)
      archived_points.each do |point|
        expect(point.raw_data).to be_nil
        expect(point.raw_data_archived).to be true
      end
    end

    it 'sets cache expiration to 1 hour' do
      archive # Ensure archive is created before restore
      restorer.restore_to_memory(user.id, 2024, 6)

      cache_key = "raw_data:temp:#{user.id}:2024:6:#{archived_points.first.id}"

      # Cache should exist now
      expect(Rails.cache.exist?(cache_key)).to be true
    end
  end

  describe '#restore_all_for_user' do
    let!(:june_points) do
      create_list(:point, 2, user: user, timestamp: Time.new(2024, 6, 15).to_i,
                             raw_data: nil, raw_data_archived: true)
    end

    let!(:july_points) do
      create_list(:point, 2, user: user, timestamp: Time.new(2024, 7, 15).to_i,
                             raw_data: nil, raw_data_archived: true)
    end

    let!(:june_archive) do
      compressed_data = gzip_points_data(june_points.map { |p| { id: p.id, raw_data: { month: 'june' } } })

      arc = build(:points_raw_data_archive, user: user, year: 2024, month: 6)
      arc.file.attach(
        io: StringIO.new(compressed_data),
        filename: arc.filename,
        content_type: 'application/gzip'
      )
      arc.save!

      june_points.each { |p| p.update!(raw_data_archive: arc) }
      arc
    end

    let!(:july_archive) do
      compressed_data = gzip_points_data(july_points.map { |p| { id: p.id, raw_data: { month: 'july' } } })

      arc = build(:points_raw_data_archive, user: user, year: 2024, month: 7)
      arc.file.attach(
        io: StringIO.new(compressed_data),
        filename: arc.filename,
        content_type: 'application/gzip'
      )
      arc.save!

      july_points.each { |p| p.update!(raw_data_archive: arc) }
      arc
    end

    it 'restores all months for user' do
      restorer.restore_all_for_user(user.id)

      june_points.each(&:reload)
      july_points.each(&:reload)

      expect(june_points.first.raw_data).to eq({ 'month' => 'june' })
      expect(july_points.first.raw_data).to eq({ 'month' => 'july' })
    end

    it 'clears all archive flags' do
      restorer.restore_all_for_user(user.id)

      (june_points + july_points).each(&:reload)
      expect(Point.where(user: user, raw_data_archived: true).count).to eq(0)
    end
  end

  def gzip_points_data(points_array)
    io = StringIO.new
    gz = Zlib::GzipWriter.new(io)
    points_array.each do |point_data|
      gz.puts(point_data.to_json)
    end
    gz.close
    io.string
  end
end
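
The gzip_points_data helper above builds archives in exactly the shape the production code reads back: one JSON object per line with 'id' and 'raw_data' keys. Below is a minimal sketch of the reading half under that assumption; each_archived_row is a hypothetical name, and the commented usage only guesses at the Restorer's write-back, since the specs pin its observable effects rather than its internals.

require 'json'
require 'stringio'
require 'zlib'

# Stream one decoded (id, raw_data) pair per JSONL line out of a gzipped payload.
def each_archived_row(compressed)
  Zlib::GzipReader.new(StringIO.new(compressed)).each_line do |line|
    row = JSON.parse(line)
    yield row['id'], row['raw_data']
  end
end

# Hypothetical database-path usage (archive.file.download is Active Storage):
# each_archived_row(archive.file.download) do |id, raw_data|
#   Point.where(id: id).update_all(raw_data: raw_data, raw_data_archived: false,
#                                  raw_data_archive_id: nil)
# end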