dawarich/app/services/users/export_data.rb
Evgenii Burmakin 8d2ade1bdc
0.37.0 (#2067)
* fix: move foreman to global gems to fix startup crash (#1971)

* Update exporting code to stream points data to file in batches to red… (#1980)

* Update exporting code to stream points data to file in batches to reduce memory usage

* Update changelog

* Update changelog

* Feature/maplibre frontend (#1953)

* Add a plan to use MapLibre GL JS for the frontend map rendering, replacing Leaflet

* Implement phase 1

* Phases 1-3 + part of 4

* Fix e2e tests

* Phase 6

* Implement fog of war

* Phase 7

* Next step: fix specs, phase 7 done

* Use our own map tiles

* Extract v2 map logic to separate manager classes

* Update settings panel on v2 map

* Update v2 e2e tests structure

* Reimplement location search in maps v2

* Update speed routes

* Implement visits and places creation in v2

* Fix last failing test

* Implement visits merging

* Fix a routes e2e test and simplify the routes layer styling.

* Extract js to modules from maps_v2_controller.js

* Implement area creation

* Fix spec problem

* Fix some e2e tests

* Implement live mode in v2 map

* Update icons and panel

* Extract some styles

* Remove unused file

* Start adding dark theme to popups on MapLibre maps

* Make popups respect dark theme

* Move v2 maps to maplibre namespace

* Update v2 references to maplibre

* Put place, area and visit info into side panel

* Update API to use safe settings config method

* Fix specs

* Fix method name to config in SafeSettings and update usages accordingly

* Add missing public files

* Add handling for real time points

* Fix remembering enabled/disabled layers of the v2 map

* Fix lots of e2e tests

* Add settings to select map version

* Use maps/v2 as main path for MapLibre maps

* Update routing

* Update live mode

* Update maplibre controller

* Update changelog

* Remove some console.log statements

* Pull only necessary data for map v2 points

* Feature/raw data archive (#2009)

* 0.36.2 (#2007)

* fix: move foreman to global gems to fix startup crash (#1971)

* Update exporting code to stream points data to file in batches to red… (#1980)

* Update exporting code to stream points data to file in batches to reduce memory usage

* Update changelog

* Update changelog

* Feature/maplibre frontend (#1953)

* Add a plan to use MapLibre GL JS for the frontend map rendering, replacing Leaflet

* Implement phase 1

* Phases 1-3 + part of 4

* Fix e2e tests

* Phase 6

* Implement fog of war

* Phase 7

* Next step: fix specs, phase 7 done

* Use our own map tiles

* Extract v2 map logic to separate manager classes

* Update settings panel on v2 map

* Update v2 e2e tests structure

* Reimplement location search in maps v2

* Update speed routes

* Implement visits and places creation in v2

* Fix last failing test

* Implement visits merging

* Fix a routes e2e test and simplify the routes layer styling.

* Extract js to modules from maps_v2_controller.js

* Implement area creation

* Fix spec problem

* Fix some e2e tests

* Implement live mode in v2 map

* Update icons and panel

* Extract some styles

* Remove unused file

* Start adding dark theme to popups on MapLibre maps

* Make popups respect dark theme

* Move v2 maps to maplibre namespace

* Update v2 references to maplibre

* Put place, area and visit info into side panel

* Update API to use safe settings config method

* Fix specs

* Fix method name to config in SafeSettings and update usages accordingly

* Add missing public files

* Add handling for real time points

* Fix remembering enabled/disabled layers of the v2 map

* Fix lots of e2e tests

* Add settings to select map version

* Use maps/v2 as main path for MapLibre maps

* Update routing

* Update live mode

* Update maplibre controller

* Update changelog

* Remove some console.log statements

---------

Co-authored-by: Robin Tuszik <mail@robin.gg>

* Remove esbuild scripts from package.json

* Remove sideEffects field from package.json

* Raw data archivation

* Add tests

* Fix tests

* Fix tests

* Update ExceptionReporter

* Add schedule to run raw data archival job monthly

* Change file structure for raw data archival feature

* Update changelog and version for raw data archival feature

---------

Co-authored-by: Robin Tuszik <mail@robin.gg>

* Set raw_data to an empty hash instead of nil when archiving

* Fix storage configuration and file extraction

* Consider MIN_MINUTES_SPENT_IN_CITY during stats calculation (#2018)

* Consider MIN_MINUTES_SPENT_IN_CITY during stats calculation

* Remove raw data from visited cities api endpoint

* Use user timezone to show dates on maps (#2020)

* Fix/pre epoch time (#2019)

* Use user timezone to show dates on maps

* Limit timestamps to valid range to prevent database errors when users enter pre-epoch dates.

* Limit timestamps to valid range to prevent database errors when users enter pre-epoch dates.

* Fix tests failing due to new index on stats table

* Fix failing specs

* Update redis client configuration to support unix socket connection

* Update changelog

* Fix kml kmz import issues (#2023)

* Fix kml kmz import issues

* Refactor KML importer to improve readability and maintainability

* Implement moving points in map v2 and fix route rendering logic to ma… (#2027)

* Implement moving points in map v2 and fix route rendering logic to match map v1.

* Fix route spec

* fix(maplibre): update date format to ISO 8601 (#2029)

* Add verification step to raw data archival process (#2028)

* Add verification step to raw data archival process

* Add actual verification of raw data archives after creation, and only clear raw_data for verified archives.

* Fix failing specs

* Eliminate zip-bomb risk

* Fix potential memory leak in js

* Return .keep files

* Use Toast instead of alert for notifications

* Add help section to navbar dropdown

* Update changelog

* Remove raw_data_archival_job

* Ensure file is being closed properly after reading in Archivable concern

* Add composite index to stats table if not exists

* Update changelog

* Update entrypoint to always sync static assets (not only new ones)

* Add family layer to MapLibre maps (#2055)

* Add family layer to MapLibre maps

* Update migration

* Don't show family toggle if feature is disabled

* Update changelog

* Return changelog

* Update changelog

* Update tailwind file

* Bump sentry-rails from 6.0.0 to 6.1.0 (#1945)

Bumps [sentry-rails](https://github.com/getsentry/sentry-ruby) from 6.0.0 to 6.1.0.
- [Release notes](https://github.com/getsentry/sentry-ruby/releases)
- [Changelog](https://github.com/getsentry/sentry-ruby/blob/master/CHANGELOG.md)
- [Commits](https://github.com/getsentry/sentry-ruby/compare/6.0.0...6.1.0)

---
updated-dependencies:
- dependency-name: sentry-rails
  dependency-version: 6.1.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump turbo-rails from 2.0.17 to 2.0.20 (#1944)

Bumps [turbo-rails](https://github.com/hotwired/turbo-rails) from 2.0.17 to 2.0.20.
- [Release notes](https://github.com/hotwired/turbo-rails/releases)
- [Commits](https://github.com/hotwired/turbo-rails/compare/v2.0.17...v2.0.20)

---
updated-dependencies:
- dependency-name: turbo-rails
  dependency-version: 2.0.20
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Evgenii Burmakin <Freika@users.noreply.github.com>

* Bump webmock from 3.25.1 to 3.26.1 (#1943)

Bumps [webmock](https://github.com/bblimke/webmock) from 3.25.1 to 3.26.1.
- [Release notes](https://github.com/bblimke/webmock/releases)
- [Changelog](https://github.com/bblimke/webmock/blob/master/CHANGELOG.md)
- [Commits](https://github.com/bblimke/webmock/compare/v3.25.1...v3.26.1)

---
updated-dependencies:
- dependency-name: webmock
  dependency-version: 3.26.1
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Evgenii Burmakin <Freika@users.noreply.github.com>

* Bump brakeman from 7.1.0 to 7.1.1 (#1942)

Bumps [brakeman](https://github.com/presidentbeef/brakeman) from 7.1.0 to 7.1.1.
- [Release notes](https://github.com/presidentbeef/brakeman/releases)
- [Changelog](https://github.com/presidentbeef/brakeman/blob/main/CHANGES.md)
- [Commits](https://github.com/presidentbeef/brakeman/compare/v7.1.0...v7.1.1)

---
updated-dependencies:
- dependency-name: brakeman
  dependency-version: 7.1.1
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump redis from 5.4.0 to 5.4.1 (#1941)

Bumps [redis](https://github.com/redis/redis-rb) from 5.4.0 to 5.4.1.
- [Changelog](https://github.com/redis/redis-rb/blob/master/CHANGELOG.md)
- [Commits](https://github.com/redis/redis-rb/compare/v5.4.0...v5.4.1)

---
updated-dependencies:
- dependency-name: redis
  dependency-version: 5.4.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Put import deletion into background job (#2045)

* Put import deletion into background job

* Update changelog

* fix null type error and update heatmap styling (#2037)

* fix: use constant weight for maplibre heatmap layer

* fix null type, update heatmap styling

* improve heatmap styling

* fix typo

* Fix stats calculation to recursively reduce H3 resolution when too ma… (#2065)

* Fix stats calculation to recursively reduce H3 resolution when too many hexagons are generated

* Update CHANGELOG.md

* Validate trip start and end dates (#2066)

* Validate trip start and end dates

* Update changelog

* Update migration to clean up duplicate stats before adding unique index

* Fix fog of war radius setting being ignored and applying settings causing errors (#2068)

* Update changelog

* Add Rack::Deflater middleware to config/application.rb to enable gzip compression for responses.

* Add composite index to points on user_id and timestamp

* Deduplicte points based on timestamp brought to unix time

* Fix/stats cache invalidation (#2072)

* Fix family layer toggle in Map v2 settings for non-selfhosted env

* Invalidate cache

* Remove comments

* Remove comment

* Add new indicies to improve performance and remove unused ones to opt… (#2078)

* Add new indicies to improve performance and remove unused ones to optimize database.

* Remove comments

* Update map search suggestions panel styling

* Add yearly digest (#2073)

* Add yearly digest

* Rename YearlyDigests to Users::Digests

* Minor changes

* Update yearly digest layout and styles

* Add flags and chart to email

* Update colors

* Fix layout of stats in yearly digest view

* Remove cron job for yearly digest scheduling

* Update CHANGELOG.md

* Update digest email setting handling

* Allow sharing digest for 1 week or 1 month

* Change Digests Distance to Bigint

* Fix settings page

* Update changelog

* Add RailsPulse (#2079)

* Add RailsPulse

* Add RailsPulse monitoring tool with basic HTTP authentication

* Bring points_count to integer

* Update migration and version

* Update rubocop issues

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Robin Tuszik <mail@robin.gg>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-30 17:33:56 +01:00

380 lines
12 KiB
Ruby

# frozen_string_literal: true
require 'zip'
# Users::ExportData - Exports complete user data with preserved relationships
#
# Output JSON Structure Example:
# {
# "counts": {
# "areas": 5,
# "imports": 12,
# "exports": 3,
# "trips": 8,
# "stats": 24,
# "notifications": 10,
# "points": 15000,
# "visits": 45,
# "places": 20
# },
# "settings": {
# "distance_unit": "km",
# "timezone": "UTC",
# "immich_url": "https://immich.example.com",
# // ... other user settings (exported via user.safe_settings.settings)
# },
# "areas": [
# {
# "name": "Home",
# "latitude": "40.7128",
# "longitude": "-74.0060",
# "radius": 100,
# "created_at": "2024-01-01T00:00:00Z",
# "updated_at": "2024-01-01T00:00:00Z"
# }
# ],
# "imports": [
# {
# "name": "2023_MARCH.json",
# "source": "google_semantic_history",
# "created_at": "2024-01-01T00:00:00Z",
# "updated_at": "2024-01-01T00:00:00Z",
# "raw_points": 15432,
# "doubles": 23,
# "processed": 15409,
# "points_count": 15409,
# "status": "completed",
# "file_name": "import_1_2023_MARCH.json",
# "original_filename": "2023_MARCH.json",
# "file_size": 2048576,
# "content_type": "application/json"
# // Note: file_error may be present if file download fails
# // Note: file_name and original_filename will be null if no file attached
# }
# ],
# "exports": [
# {
# "name": "export_2024-01-01_to_2024-01-31.json",
# "url": null,
# "status": "completed",
# "file_format": "json",
# "file_type": "points",
# "start_at": "2024-01-01T00:00:00Z",
# "end_at": "2024-01-31T23:59:59Z",
# "created_at": "2024-02-01T00:00:00Z",
# "updated_at": "2024-02-01T00:00:00Z",
# "file_name": "export_1_export_2024-01-01_to_2024-01-31.json",
# "original_filename": "export_2024-01-01_to_2024-01-31.json",
# "file_size": 1048576,
# "content_type": "application/json"
# // Note: file_error may be present if file download fails
# // Note: file_name and original_filename will be null if no file attached
# }
# ],
# "trips": [
# {
# "name": "Business Trip to NYC",
# "started_at": "2024-01-15T08:00:00Z",
# "ended_at": "2024-01-18T20:00:00Z",
# "distance": 1245,
# "path": null, // PostGIS LineString geometry
# "visited_countries": {"US": "United States", "CA": "Canada"},
# "created_at": "2024-01-19T00:00:00Z",
# "updated_at": "2024-01-19T00:00:00Z"
# }
# ],
# "stats": [
# {
# "year": 2024,
# "month": 1,
# "distance": 456, // Note: integer, not float
# "daily_distance": {"1": 15.2, "2": 23.5}, // jsonb object
# "toponyms": [
# {"country": "United States", "cities": [{"city": "New York"}]}
# ],
# "created_at": "2024-02-01T00:00:00Z",
# "updated_at": "2024-02-01T00:00:00Z"
# }
# ],
# "notifications": [
# {
# "kind": "info",
# "title": "Import completed",
# "content": "Your data import has been processed successfully",
# "read_at": "2024-01-01T12:30:00Z", // null if unread
# "created_at": "2024-01-01T12:00:00Z",
# "updated_at": "2024-01-01T12:30:00Z"
# }
# ],
# "points": [
# {
# "battery_status": "charging",
# "battery": 85,
# "timestamp": 1704067200,
# "altitude": 15.5,
# "velocity": 25.5,
# "accuracy": 5.0,
# "ping": "test-ping",
# "tracker_id": "tracker-123",
# "topic": "owntracks/user/device",
# "trigger": "manual_event",
# "bssid": "aa:bb:cc:dd:ee:ff",
# "ssid": "TestWiFi",
# "connection": "wifi",
# "vertical_accuracy": 3.0,
# "mode": 2,
# "inrids": ["region1", "region2"],
# "in_regions": ["home", "work"],
# "raw_data": {"test": "data"},
# "city": "New York",
# "country": "United States",
# "geodata": {"address": "123 Main St"},
# "reverse_geocoded_at": "2024-01-01T00:00:00Z",
# "course": 45.5,
# "course_accuracy": 2.5,
# "external_track_id": "ext-123",
# "lonlat": "POINT(-74.006 40.7128)",
# "longitude": -74.006,
# "latitude": 40.7128,
# "created_at": "2024-01-01T00:00:00Z",
# "updated_at": "2024-01-01T00:00:00Z",
# "import_reference": {
# "name": "2023_MARCH.json",
# "source": "google_semantic_history",
# "created_at": "2024-01-01T00:00:00Z"
# },
# "country_info": {
# "name": "United States",
# "iso_a2": "US",
# "iso_a3": "USA"
# },
# "visit_reference": {
# "name": "Work Visit",
# "started_at": "2024-01-01T08:00:00Z",
# "ended_at": "2024-01-01T17:00:00Z"
# }
# },
# {
# // Example of point without relationships (edge cases)
# "timestamp": 1704070800,
# "altitude": 10.0,
# "longitude": -73.9857,
# "latitude": 40.7484,
# "lonlat": "POINT(-73.9857 40.7484)",
# "created_at": "2024-01-01T00:05:00Z",
# "updated_at": "2024-01-01T00:05:00Z",
# "import_reference": null, // Orphaned point
# "country_info": null, // No country data
# "visit_reference": null // Not part of a visit
# // ... other point fields may be null
# }
# ],
# "visits": [
# {
# "area_id": 123,
# "started_at": "2024-01-01T08:00:00Z",
# "ended_at": "2024-01-01T17:00:00Z",
# "duration": 32400,
# "name": "Work Visit",
# "status": "suggested",
# "created_at": "2024-01-01T00:00:00Z",
# "updated_at": "2024-01-01T00:00:00Z",
# "place_reference": {
# "name": "Office Building",
# "latitude": "40.7589",
# "longitude": "-73.9851",
# "source": "manual"
# }
# },
# {
# // Example of visit without place
# "area_id": null,
# "started_at": "2024-01-02T10:00:00Z",
# "ended_at": "2024-01-02T12:00:00Z",
# "duration": 7200,
# "name": "Unknown Location",
# "status": "confirmed",
# "created_at": "2024-01-02T00:00:00Z",
# "updated_at": "2024-01-02T00:00:00Z",
# "place_reference": null // No associated place
# }
# ],
# "places": [
# {
# "name": "Office Building",
# "longitude": "-73.9851",
# "latitude": "40.7589",
# "city": "New York",
# "country": "United States",
# "source": "manual",
# "geodata": {"properties": {"name": "Office Building"}},
# "reverse_geocoded_at": "2024-01-01T00:00:00Z",
# "lonlat": "POINT(-73.9851 40.7589)",
# "created_at": "2024-01-01T00:00:00Z",
# "updated_at": "2024-01-01T00:00:00Z"
# }
# ]
# }
#
# Import Strategy Notes:
# 1. Countries: Look up by name/ISO codes, create if missing
# 2. Imports: Match by name + source + created_at, create new import records
# 3. Places: Match by name + coordinates, create if missing
# 4. Visits: Match by name + timestamps + place_reference, create if missing
# 5. Points: Import with reconstructed foreign keys from references
# 6. Files: Import files are available in the files/ directory with names from file_name fields
class Users::ExportData
def initialize(user)
@user = user
end
def export
timestamp = Time.current.strftime('%Y%m%d_%H%M%S')
@export_directory = Rails.root.join('tmp', "#{user.email.gsub(/[^0-9A-Za-z._-]/, '_')}_#{timestamp}")
@files_directory = @export_directory.join('files')
FileUtils.mkdir_p(@files_directory)
export_record = user.exports.create!(
name: "user_data_export_#{timestamp}.zip",
file_format: :archive,
file_type: :user_data,
status: :processing
)
begin
json_file_path = @export_directory.join('data.json')
# Stream JSON writing instead of building in memory
File.open(json_file_path, 'w') do |file|
file.write('{"counts":')
file.write(calculate_entity_counts.to_json)
file.write(',"settings":')
file.write(user.safe_settings.settings.to_json)
file.write(',"areas":')
file.write(Users::ExportData::Areas.new(user).call.to_json)
file.write(',"imports":')
file.write(Users::ExportData::Imports.new(user, @files_directory).call.to_json)
file.write(',"exports":')
file.write(Users::ExportData::Exports.new(user, @files_directory).call.to_json)
file.write(',"trips":')
file.write(Users::ExportData::Trips.new(user).call.to_json)
file.write(',"stats":')
file.write(Users::ExportData::Stats.new(user).call.to_json)
file.write(',"notifications":')
file.write(Users::ExportData::Notifications.new(user).call.to_json)
file.write(',"points":')
Users::ExportData::Points.new(user, file).call
file.write(',"visits":')
file.write(Users::ExportData::Visits.new(user).call.to_json)
file.write(',"places":')
file.write(Users::ExportData::Places.new(user).call.to_json)
file.write('}')
end
zip_file_path = @export_directory.join('export.zip')
create_zip_archive(@export_directory, zip_file_path)
export_record.file.attach(
io: File.open(zip_file_path),
filename: export_record.name,
content_type: 'application/zip'
)
export_record.update!(status: :completed)
create_success_notification
export_record
rescue StandardError => e
export_record.update!(status: :failed) if export_record
ExceptionReporter.call(e, 'Export failed')
raise e
ensure
cleanup_temporary_files(@export_directory) if @export_directory&.exist?
end
end
private
attr_reader :user, :export_directory, :files_directory
def calculate_entity_counts
Rails.logger.info 'Calculating entity counts for export'
counts = {
areas: user.areas.count,
imports: user.imports.count,
exports: user.exports.count,
trips: user.trips.count,
stats: user.stats.count,
notifications: user.notifications.count,
points: user.points_count.to_i,
visits: user.visits.count,
places: user.visited_places.count
}
Rails.logger.info "Entity counts: #{counts}"
counts
end
def create_zip_archive(export_directory, zip_file_path)
original_compression = Zip.default_compression
Zip.default_compression = Zip::Entry::DEFLATED
Zip::File.open(zip_file_path, create: true) do |zipfile|
Dir.glob(export_directory.join('**', '*')).each do |file|
next if File.directory?(file) || file == zip_file_path.to_s
relative_path = file.sub(%r{#{export_directory}/}, '')
zipfile.add(relative_path, file)
end
end
ensure
Zip.default_compression = original_compression if original_compression
end
def cleanup_temporary_files(export_directory)
return unless File.directory?(export_directory)
Rails.logger.info "Cleaning up temporary export directory: #{export_directory}"
FileUtils.rm_rf(export_directory)
rescue StandardError => e
ExceptionReporter.call(e, 'Failed to cleanup temporary files')
end
def create_success_notification
counts = calculate_entity_counts
summary = "#{counts[:points]} points, " \
"#{counts[:visits]} visits, " \
"#{counts[:places]} places, " \
"#{counts[:trips]} trips, " \
"#{counts[:areas]} areas, " \
"#{counts[:imports]} imports, " \
"#{counts[:exports]} exports, " \
"#{counts[:stats]} stats, " \
"#{counts[:notifications]} notifications"
::Notifications::Create.new(
user: user,
title: 'Export completed',
content: "Your data export has been processed successfully (#{summary}). You can download it from the exports page.",
kind: :info
).call
end
end