dawarich/app/services/points/raw_data/verifier.rb
Evgenii Burmakin 6ed6a4fd89
0.37.1 (#2092)
* fix: move foreman to global gems to fix startup crash (#1971)

* Update exporting code to stream points data to file in batches to red… (#1980)

* Update exporting code to stream points data to file in batches to reduce memory usage

* Update changelog

* Update changelog

* Feature/maplibre frontend (#1953)

* Add a plan to use MapLibre GL JS for the frontend map rendering, replacing Leaflet

* Implement phase 1

* Phases 1-3 + part of 4

* Fix e2e tests

* Phase 6

* Implement fog of war

* Phase 7

* Next step: fix specs, phase 7 done

* Use our own map tiles

* Extract v2 map logic to separate manager classes

* Update settings panel on v2 map

* Update v2 e2e tests structure

* Reimplement location search in maps v2

* Update speed routes

* Implement visits and places creation in v2

* Fix last failing test

* Implement visits merging

* Fix a routes e2e test and simplify the routes layer styling.

* Extract js to modules from maps_v2_controller.js

* Implement area creation

* Fix spec problem

* Fix some e2e tests

* Implement live mode in v2 map

* Update icons and panel

* Extract some styles

* Remove unused file

* Start adding dark theme to popups on MapLibre maps

* Make popups respect dark theme

* Move v2 maps to maplibre namespace

* Update v2 references to maplibre

* Put place, area and visit info into side panel

* Update API to use safe settings config method

* Fix specs

* Fix method name to config in SafeSettings and update usages accordingly

* Add missing public files

* Add handling for real time points

* Fix remembering enabled/disabled layers of the v2 map

* Fix lots of e2e tests

* Add settings to select map version

* Use maps/v2 as main path for MapLibre maps

* Update routing

* Update live mode

* Update maplibre controller

* Update changelog

* Remove some console.log statements

* Pull only necessary data for map v2 points

* Feature/raw data archive (#2009)

* 0.36.2 (#2007)

* fix: move foreman to global gems to fix startup crash (#1971)

* Update exporting code to stream points data to file in batches to red… (#1980)

* Update exporting code to stream points data to file in batches to reduce memory usage

* Update changelog

* Update changelog

* Feature/maplibre frontend (#1953)

* Add a plan to use MapLibre GL JS for the frontend map rendering, replacing Leaflet

* Implement phase 1

* Phases 1-3 + part of 4

* Fix e2e tests

* Phase 6

* Implement fog of war

* Phase 7

* Next step: fix specs, phase 7 done

* Use our own map tiles

* Extract v2 map logic to separate manager classes

* Update settings panel on v2 map

* Update v2 e2e tests structure

* Reimplement location search in maps v2

* Update speed routes

* Implement visits and places creation in v2

* Fix last failing test

* Implement visits merging

* Fix a routes e2e test and simplify the routes layer styling.

* Extract js to modules from maps_v2_controller.js

* Implement area creation

* Fix spec problem

* Fix some e2e tests

* Implement live mode in v2 map

* Update icons and panel

* Extract some styles

* Remove unused file

* Start adding dark theme to popups on MapLibre maps

* Make popups respect dark theme

* Move v2 maps to maplibre namespace

* Update v2 references to maplibre

* Put place, area and visit info into side panel

* Update API to use safe settings config method

* Fix specs

* Fix method name to config in SafeSettings and update usages accordingly

* Add missing public files

* Add handling for real time points

* Fix remembering enabled/disabled layers of the v2 map

* Fix lots of e2e tests

* Add settings to select map version

* Use maps/v2 as main path for MapLibre maps

* Update routing

* Update live mode

* Update maplibre controller

* Update changelog

* Remove some console.log statements

---------

Co-authored-by: Robin Tuszik <mail@robin.gg>

* Remove esbuild scripts from package.json

* Remove sideEffects field from package.json

* Raw data archivation

* Add tests

* Fix tests

* Fix tests

* Update ExceptionReporter

* Add schedule to run raw data archival job monthly

* Change file structure for raw data archival feature

* Update changelog and version for raw data archival feature

---------

Co-authored-by: Robin Tuszik <mail@robin.gg>

* Set raw_data to an empty hash instead of nil when archiving

* Fix storage configuration and file extraction

* Consider MIN_MINUTES_SPENT_IN_CITY during stats calculation (#2018)

* Consider MIN_MINUTES_SPENT_IN_CITY during stats calculation

* Remove raw data from visited cities api endpoint

* Use user timezone to show dates on maps (#2020)

* Fix/pre epoch time (#2019)

* Use user timezone to show dates on maps

* Limit timestamps to valid range to prevent database errors when users enter pre-epoch dates.

* Limit timestamps to valid range to prevent database errors when users enter pre-epoch dates.

* Fix tests failing due to new index on stats table

* Fix failing specs

* Update redis client configuration to support unix socket connection

* Update changelog

* Fix kml kmz import issues (#2023)

* Fix kml kmz import issues

* Refactor KML importer to improve readability and maintainability

* Implement moving points in map v2 and fix route rendering logic to ma… (#2027)

* Implement moving points in map v2 and fix route rendering logic to match map v1.

* Fix route spec

* fix(maplibre): update date format to ISO 8601 (#2029)

* Add verification step to raw data archival process (#2028)

* Add verification step to raw data archival process

* Add actual verification of raw data archives after creation, and only clear raw_data for verified archives.

* Fix failing specs

* Eliminate zip-bomb risk

* Fix potential memory leak in js

* Return .keep files

* Use Toast instead of alert for notifications

* Add help section to navbar dropdown

* Update changelog

* Remove raw_data_archival_job

* Ensure file is being closed properly after reading in Archivable concern

* Add composite index to stats table if not exists

* Update changelog

* Update entrypoint to always sync static assets (not only new ones)

* Add family layer to MapLibre maps (#2055)

* Add family layer to MapLibre maps

* Update migration

* Don't show family toggle if feature is disabled

* Update changelog

* Return changelog

* Update changelog

* Update tailwind file

* Bump sentry-rails from 6.0.0 to 6.1.0 (#1945)

Bumps [sentry-rails](https://github.com/getsentry/sentry-ruby) from 6.0.0 to 6.1.0.
- [Release notes](https://github.com/getsentry/sentry-ruby/releases)
- [Changelog](https://github.com/getsentry/sentry-ruby/blob/master/CHANGELOG.md)
- [Commits](https://github.com/getsentry/sentry-ruby/compare/6.0.0...6.1.0)

---
updated-dependencies:
- dependency-name: sentry-rails
  dependency-version: 6.1.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump turbo-rails from 2.0.17 to 2.0.20 (#1944)

Bumps [turbo-rails](https://github.com/hotwired/turbo-rails) from 2.0.17 to 2.0.20.
- [Release notes](https://github.com/hotwired/turbo-rails/releases)
- [Commits](https://github.com/hotwired/turbo-rails/compare/v2.0.17...v2.0.20)

---
updated-dependencies:
- dependency-name: turbo-rails
  dependency-version: 2.0.20
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Evgenii Burmakin <Freika@users.noreply.github.com>

* Bump webmock from 3.25.1 to 3.26.1 (#1943)

Bumps [webmock](https://github.com/bblimke/webmock) from 3.25.1 to 3.26.1.
- [Release notes](https://github.com/bblimke/webmock/releases)
- [Changelog](https://github.com/bblimke/webmock/blob/master/CHANGELOG.md)
- [Commits](https://github.com/bblimke/webmock/compare/v3.25.1...v3.26.1)

---
updated-dependencies:
- dependency-name: webmock
  dependency-version: 3.26.1
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Evgenii Burmakin <Freika@users.noreply.github.com>

* Bump brakeman from 7.1.0 to 7.1.1 (#1942)

Bumps [brakeman](https://github.com/presidentbeef/brakeman) from 7.1.0 to 7.1.1.
- [Release notes](https://github.com/presidentbeef/brakeman/releases)
- [Changelog](https://github.com/presidentbeef/brakeman/blob/main/CHANGES.md)
- [Commits](https://github.com/presidentbeef/brakeman/compare/v7.1.0...v7.1.1)

---
updated-dependencies:
- dependency-name: brakeman
  dependency-version: 7.1.1
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump redis from 5.4.0 to 5.4.1 (#1941)

Bumps [redis](https://github.com/redis/redis-rb) from 5.4.0 to 5.4.1.
- [Changelog](https://github.com/redis/redis-rb/blob/master/CHANGELOG.md)
- [Commits](https://github.com/redis/redis-rb/compare/v5.4.0...v5.4.1)

---
updated-dependencies:
- dependency-name: redis
  dependency-version: 5.4.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Put import deletion into background job (#2045)

* Put import deletion into background job

* Update changelog

* fix null type error and update heatmap styling (#2037)

* fix: use constant weight for maplibre heatmap layer

* fix null type, update heatmap styling

* improve heatmap styling

* fix typo

* Fix stats calculation to recursively reduce H3 resolution when too ma… (#2065)

* Fix stats calculation to recursively reduce H3 resolution when too many hexagons are generated

* Update CHANGELOG.md

* Validate trip start and end dates (#2066)

* Validate trip start and end dates

* Update changelog

* Update migration to clean up duplicate stats before adding unique index

* Fix fog of war radius setting being ignored and applying settings causing errors (#2068)

* Update changelog

* Add Rack::Deflater middleware to config/application.rb to enable gzip compression for responses.

* Add composite index to points on user_id and timestamp

* Deduplicte points based on timestamp brought to unix time

* Fix/stats cache invalidation (#2072)

* Fix family layer toggle in Map v2 settings for non-selfhosted env

* Invalidate cache

* Remove comments

* Remove comment

* Add new indicies to improve performance and remove unused ones to opt… (#2078)

* Add new indicies to improve performance and remove unused ones to optimize database.

* Remove comments

* Update map search suggestions panel styling

* Add yearly digest (#2073)

* Add yearly digest

* Rename YearlyDigests to Users::Digests

* Minor changes

* Update yearly digest layout and styles

* Add flags and chart to email

* Update colors

* Fix layout of stats in yearly digest view

* Remove cron job for yearly digest scheduling

* Update CHANGELOG.md

* Update digest email setting handling

* Allow sharing digest for 1 week or 1 month

* Change Digests Distance to Bigint

* Fix settings page

* Update changelog

* Add RailsPulse (#2079)

* Add RailsPulse

* Add RailsPulse monitoring tool with basic HTTP authentication

* Bring points_count to integer

* Update migration and version

* Update rubocop issues

* Fix migrations and data verification to remove safety_assured blocks and handle missing points gracefully.

* Update version

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Robin Tuszik <mail@robin.gg>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-30 19:06:10 +01:00

199 lines
6.6 KiB
Ruby

# frozen_string_literal: true
module Points
module RawData
class Verifier
def initialize
@stats = { verified: 0, failed: 0 }
end
def call
Rails.logger.info('Starting raw_data archive verification...')
unverified_archives.find_each do |archive|
verify_archive(archive)
end
Rails.logger.info("Verification complete: #{@stats}")
@stats
end
def verify_specific_archive(archive_id)
archive = Points::RawDataArchive.find(archive_id)
verify_archive(archive)
end
def verify_month(user_id, year, month)
archives = Points::RawDataArchive.for_month(user_id, year, month)
.where(verified_at: nil)
Rails.logger.info("Verifying #{archives.count} archives for #{year}-#{format('%02d', month)}...")
archives.each { |archive| verify_archive(archive) }
end
private
def unverified_archives
Points::RawDataArchive.where(verified_at: nil)
end
def verify_archive(archive)
Rails.logger.info("Verifying archive #{archive.id} (#{archive.month_display}, chunk #{archive.chunk_number})...")
verification_result = perform_verification(archive)
if verification_result[:success]
archive.update!(verified_at: Time.current)
@stats[:verified] += 1
Rails.logger.info("✓ Archive #{archive.id} verified successfully")
else
@stats[:failed] += 1
Rails.logger.error("✗ Archive #{archive.id} verification failed: #{verification_result[:error]}")
ExceptionReporter.call(
StandardError.new(verification_result[:error]),
"Archive verification failed for archive #{archive.id}"
)
end
rescue StandardError => e
@stats[:failed] += 1
ExceptionReporter.call(e, "Failed to verify archive #{archive.id}")
Rails.logger.error("✗ Archive #{archive.id} verification error: #{e.message}")
end
def perform_verification(archive)
# 1. Verify file exists and is attached
unless archive.file.attached?
return { success: false, error: 'File not attached' }
end
# 2. Verify file can be downloaded
begin
compressed_content = archive.file.blob.download
rescue StandardError => e
return { success: false, error: "File download failed: #{e.message}" }
end
# 3. Verify file size is reasonable
if compressed_content.bytesize.zero?
return { success: false, error: 'File is empty' }
end
# 4. Verify MD5 checksum (if blob has checksum)
if archive.file.blob.checksum.present?
calculated_checksum = Digest::MD5.base64digest(compressed_content)
if calculated_checksum != archive.file.blob.checksum
return { success: false, error: 'MD5 checksum mismatch' }
end
end
# 5. Verify file can be decompressed and is valid JSONL, extract data
begin
archived_data = decompress_and_extract_data(compressed_content)
rescue StandardError => e
return { success: false, error: "Decompression/parsing failed: #{e.message}" }
end
point_ids = archived_data.keys
# 6. Verify point count matches
if point_ids.count != archive.point_count
return {
success: false,
error: "Point count mismatch: expected #{archive.point_count}, found #{point_ids.count}"
}
end
# 7. Verify point IDs checksum matches
calculated_checksum = calculate_checksum(point_ids)
if calculated_checksum != archive.point_ids_checksum
return { success: false, error: 'Point IDs checksum mismatch' }
end
# 8. Check which points still exist in database (informational only)
existing_count = Point.where(id: point_ids).count
if existing_count != point_ids.count
Rails.logger.info(
"Archive #{archive.id}: #{point_ids.count - existing_count} points no longer in database " \
"(#{existing_count}/#{point_ids.count} remaining). This is OK if user deleted their data."
)
end
# 9. Verify archived raw_data matches current database raw_data (only for existing points)
if existing_count.positive?
verification_result = verify_raw_data_matches(archived_data)
return verification_result unless verification_result[:success]
else
Rails.logger.info(
"Archive #{archive.id}: Skipping raw_data verification - no points remain in database"
)
end
{ success: true }
end
def decompress_and_extract_data(compressed_content)
io = StringIO.new(compressed_content)
gz = Zlib::GzipReader.new(io)
archived_data = {}
gz.each_line do |line|
data = JSON.parse(line)
archived_data[data['id']] = data['raw_data']
end
gz.close
archived_data
end
def verify_raw_data_matches(archived_data)
# For small archives, verify all points. For large archives, sample up to 100 points.
# Always verify all if 100 or fewer points for maximum accuracy
if archived_data.size <= 100
point_ids_to_check = archived_data.keys
else
point_ids_to_check = archived_data.keys.sample(100)
end
# Filter to only check points that still exist in the database
existing_point_ids = Point.where(id: point_ids_to_check).pluck(:id)
if existing_point_ids.empty?
# No points remain to verify, but that's OK
Rails.logger.info("No points remaining to verify raw_data matches")
return { success: true }
end
mismatches = []
Point.where(id: existing_point_ids).find_each do |point|
archived_raw_data = archived_data[point.id]
current_raw_data = point.raw_data
# Compare the raw_data (both should be hashes)
if archived_raw_data != current_raw_data
mismatches << {
point_id: point.id,
archived: archived_raw_data,
current: current_raw_data
}
end
end
if mismatches.any?
return {
success: false,
error: "Raw data mismatch detected in #{mismatches.count} point(s). " \
"First mismatch: Point #{mismatches.first[:point_id]}"
}
end
{ success: true }
end
def calculate_checksum(point_ids)
Digest::SHA256.hexdigest(point_ids.sort.join(','))
end
end
end
end