dawarich/app/services/kml/importer.rb
Evgenii Burmakin c8242ce902
0.36.3 (#2013)
* fix: move foreman to global gems to fix startup crash (#1971)

* Update exporting code to stream points data to file in batches to red… (#1980)

* Update exporting code to stream points data to file in batches to reduce memory usage

* Update changelog

* Update changelog

* Feature/maplibre frontend (#1953)

* Add a plan to use MapLibre GL JS for the frontend map rendering, replacing Leaflet

* Implement phase 1

* Phases 1-3 + part of 4

* Fix e2e tests

* Phase 6

* Implement fog of war

* Phase 7

* Next step: fix specs, phase 7 done

* Use our own map tiles

* Extract v2 map logic to separate manager classes

* Update settings panel on v2 map

* Update v2 e2e tests structure

* Reimplement location search in maps v2

* Update speed routes

* Implement visits and places creation in v2

* Fix last failing test

* Implement visits merging

* Fix a routes e2e test and simplify the routes layer styling.

* Extract js to modules from maps_v2_controller.js

* Implement area creation

* Fix spec problem

* Fix some e2e tests

* Implement live mode in v2 map

* Update icons and panel

* Extract some styles

* Remove unused file

* Start adding dark theme to popups on MapLibre maps

* Make popups respect dark theme

* Move v2 maps to maplibre namespace

* Update v2 references to maplibre

* Put place, area and visit info into side panel

* Update API to use safe settings config method

* Fix specs

* Fix method name to config in SafeSettings and update usages accordingly

* Add missing public files

* Add handling for real time points

* Fix remembering enabled/disabled layers of the v2 map

* Fix lots of e2e tests

* Add settings to select map version

* Use maps/v2 as main path for MapLibre maps

* Update routing

* Update live mode

* Update maplibre controller

* Update changelog

* Remove some console.log statements

* Pull only necessary data for map v2 points

* Feature/raw data archive (#2009)

* 0.36.2 (#2007)

* Remove esbuild scripts from package.json

* Remove sideEffects field from package.json

* Raw data archivation

* Add tests

* Fix tests

* Fix tests

* Update ExceptionReporter

* Add schedule to run raw data archival job monthly

* Change file structure for raw data archival feature

* Update changelog and version for raw data archival feature

---------

Co-authored-by: Robin Tuszik <mail@robin.gg>

* Set raw_data to an empty hash instead of nil when archiving

* Fix storage configuration and file extraction

* Consider MIN_MINUTES_SPENT_IN_CITY during stats calculation (#2018)

* Consider MIN_MINUTES_SPENT_IN_CITY during stats calculation

* Remove raw data from visited cities api endpoint

* Use user timezone to show dates on maps (#2020)

* Fix/pre epoch time (#2019)

* Use user timezone to show dates on maps

* Limit timestamps to valid range to prevent database errors when users enter pre-epoch dates.

* Limit timestamps to valid range to prevent database errors when users enter pre-epoch dates.

* Fix tests failing due to new index on stats table

* Fix failing specs

* Update redis client configuration to support unix socket connection

* Update changelog

* Fix kml kmz import issues (#2023)

* Fix kml kmz import issues

* Refactor KML importer to improve readability and maintainability

* Implement moving points in map v2 and fix route rendering logic to ma… (#2027)

* Implement moving points in map v2 and fix route rendering logic to match map v1.

* Fix route spec

* fix(maplibre): update date format to ISO 8601 (#2029)

* Add verification step to raw data archival process (#2028)

* Add verification step to raw data archival process

* Add actual verification of raw data archives after creation, and only clear raw_data for verified archives.

* Fix failing specs

* Eliminate zip-bomb risk

* Fix potential memory leak in js

* Return .keep files

* Use Toast instead of alert for notifications

* Add help section to navbar dropdown

* Update changelog

* Remove raw_data_archival_job

* Ensure file is being closed properly after reading in Archivable concern

---------

Co-authored-by: Robin Tuszik <mail@robin.gg>
2025-12-14 12:05:59 +01:00

356 lines
9 KiB
Ruby

# frozen_string_literal: true

require 'rexml/document'
require 'zip'

class Kml::Importer
  include Imports::Broadcaster
  include Imports::FileLoader

  attr_reader :import, :user_id, :file_path

  def initialize(import, user_id, file_path = nil)
    @import = import
    @user_id = user_id
    @file_path = file_path
  end

  def call
    doc = load_and_parse_kml_document
    points_data = extract_all_points(doc)

    return if points_data.empty?

    save_points_in_batches(points_data)
  end

  private

  def load_and_parse_kml_document
    file_content = load_kml_content
    REXML::Document.new(file_content)
  end

  def extract_all_points(doc)
    points_data = []
    points_data.concat(extract_points_from_placemarks(doc))
    points_data.concat(extract_points_from_gx_tracks(doc))
    points_data.compact
  end

  def save_points_in_batches(points_data)
    points_data.each_slice(1000) do |batch|
      bulk_insert_points(batch)
    end
  end

  def extract_points_from_placemarks(doc)
    points = []
    REXML::XPath.each(doc, '//Placemark') do |placemark|
      points.concat(parse_placemark(placemark))
    end
    points
  end

  def extract_points_from_gx_tracks(doc)
    points = []
    REXML::XPath.each(doc, '//gx:Track') do |track|
      points.concat(parse_gx_track(track))
    end
    points
  end

  def load_kml_content
    content = read_file_content
    content = ensure_binary_encoding(content)
    kmz_file?(content) ? extract_kml_from_kmz(content) : content
  end

  def read_file_content
    if file_path && File.exist?(file_path)
      File.binread(file_path)
    else
      download_and_read_content
    end
  end

  def download_and_read_content
    downloader_content = Imports::SecureFileDownloader.new(import.file).download_with_verification
    downloader_content.is_a?(StringIO) ? downloader_content.read : downloader_content
  end

  def ensure_binary_encoding(content)
    content.force_encoding('BINARY') if content.respond_to?(:force_encoding)
    content
  end

  def kmz_file?(content)
    content[0..1] == 'PK'
  end

  def extract_kml_from_kmz(kmz_content)
    kml_content = find_kml_in_zip(kmz_content)
    raise 'No KML file found in KMZ archive' unless kml_content

    kml_content
  rescue Zip::Error => e
    raise "Failed to extract KML from KMZ: #{e.message}"
  end

  def find_kml_in_zip(kmz_content)
    kml_content = nil
    Zip::InputStream.open(StringIO.new(kmz_content)) do |io|
      while (entry = io.get_next_entry)
        if kml_entry?(entry)
          kml_content = io.read
          break
        end
      end
    end
    kml_content
  end

  def kml_entry?(entry)
    entry.name.downcase.end_with?('.kml')
  end

  def parse_placemark(placemark)
    return [] unless has_explicit_timestamp?(placemark)

    timestamp = extract_timestamp(placemark)
    points = []
    points.concat(extract_point_geometry(placemark, timestamp))
    points.concat(extract_linestring_geometry(placemark, timestamp))
    points.concat(extract_multigeometry(placemark, timestamp))
    points.compact
  end

  def extract_point_geometry(placemark, timestamp)
    point_node = REXML::XPath.first(placemark, './/Point/coordinates')
    return [] unless point_node

    coords = parse_coordinates(point_node.text)
    coords.any? ? [build_point(coords.first, timestamp, placemark)] : []
  end

  def extract_linestring_geometry(placemark, timestamp)
    linestring_node = REXML::XPath.first(placemark, './/LineString/coordinates')
    return [] unless linestring_node

    coords = parse_coordinates(linestring_node.text)
    coords.map { |coord| build_point(coord, timestamp, placemark) }
  end

  def extract_multigeometry(placemark, timestamp)
    points = []
    REXML::XPath.each(placemark, './/MultiGeometry//coordinates') do |coords_node|
      coords = parse_coordinates(coords_node.text)
      coords.each do |coord|
        points << build_point(coord, timestamp, placemark)
      end
    end
    points
  end

  # A gx:Track stores timestamps (<when>) and positions (<gx:coord>) as two
  # parallel lists; the i-th timestamp belongs to the i-th coordinate.
  def parse_gx_track(track)
    timestamps = extract_gx_timestamps(track)
    coordinates = extract_gx_coordinates(track)
    build_gx_track_points(timestamps, coordinates)
  end

  def extract_gx_timestamps(track)
    timestamps = []
    REXML::XPath.each(track, './/when') do |when_node|
      # `text` is nil for an empty element; keep an empty string so the lists
      # stay aligned and the entry is skipped later instead of raising here.
      timestamps << when_node.text.to_s.strip
    end
    timestamps
  end

  def extract_gx_coordinates(track)
    coordinates = []
    REXML::XPath.each(track, './/gx:coord') do |coord_node|
      # Same nil guard as for <when> above.
      coordinates << coord_node.text.to_s.strip
    end
    coordinates
  end

  def build_gx_track_points(timestamps, coordinates)
    points = []
    # Pair only as many entries as both lists have in common.
    min_size = [timestamps.size, coordinates.size].min
    min_size.times do |i|
      point = build_gx_track_point(timestamps[i], coordinates[i], i)
      points << point if point
    end
    points
  end

  # Returns nil (and logs a warning) when the timestamp or coordinate cannot be parsed.
  def build_gx_track_point(timestamp_str, coord_str, index)
    time = Time.parse(timestamp_str).to_i
    # gx:coord values are space-separated: "lng lat [alt]".
    coord_parts = coord_str.split(/\s+/)
    return nil if coord_parts.size < 2

    lng, lat, alt = coord_parts.map(&:to_f)
    {
      lonlat: "POINT(#{lng} #{lat})",
      altitude: alt&.to_i || 0,
      timestamp: time,
      import_id: import.id,
      velocity: 0.0,
      raw_data: { source: 'gx_track', index: index },
      user_id: user_id,
      created_at: Time.current,
      updated_at: Time.current
    }
  rescue StandardError => e
    Rails.logger.warn("Failed to parse gx:Track point at index #{index}: #{e.message}")
    nil
  end

  def parse_coordinates(coord_text)
    return [] if coord_text.blank?

    coord_text.strip.split(/\s+/).map { |coord_str| parse_single_coordinate(coord_str) }.compact
  end

  def parse_single_coordinate(coord_str)
    parts = coord_str.split(',')
    return nil if parts.size < 2

    {
      lng: parts[0].to_f,
      lat: parts[1].to_f,
      alt: parts[2]&.to_f || 0.0
    }
  end

  def has_explicit_timestamp?(placemark)
    find_timestamp_node(placemark).present?
  end

  def extract_timestamp(placemark)
    node = find_timestamp_node(placemark)
    raise 'No timestamp found in placemark' unless node

    Time.parse(node.text).to_i
  rescue StandardError => e
    Rails.logger.error("Failed to parse timestamp: #{e.message}")
    raise e
  end

  def find_timestamp_node(placemark)
    REXML::XPath.first(placemark, './/TimeStamp/when') ||
      REXML::XPath.first(placemark, './/TimeSpan/begin') ||
      REXML::XPath.first(placemark, './/TimeSpan/end')
  end

  def build_point(coord, timestamp, placemark)
    return if invalid_coordinates?(coord)

    {
      lonlat: format_point_geometry(coord),
      altitude: coord[:alt].to_i,
      timestamp: timestamp,
      import_id: import.id,
      velocity: extract_velocity(placemark),
      raw_data: extract_extended_data(placemark),
      user_id: user_id,
      created_at: Time.current,
      updated_at: Time.current
    }
  end

  def invalid_coordinates?(coord)
    coord[:lat].blank? || coord[:lng].blank?
  end

  def format_point_geometry(coord)
    "POINT(#{coord[:lng]} #{coord[:lat]})"
  end

  def extract_velocity(placemark)
    speed_node = find_speed_node(placemark)
    speed_node ? speed_node.text.to_f.round(1) : 0.0
  rescue StandardError
    0.0
  end

  def find_speed_node(placemark)
    REXML::XPath.first(placemark, ".//Data[@name='speed']/value") ||
      REXML::XPath.first(placemark, ".//Data[@name='Speed']/value") ||
      REXML::XPath.first(placemark, ".//Data[@name='velocity']/value")
  end

  # Collects name, description and ExtendedData fields for storage in raw_data.
  # Any failure is logged and yields an empty hash rather than aborting the import.
  def extract_extended_data(placemark)
    data = {}
    data.merge!(extract_name_and_description(placemark))
    data.merge!(extract_custom_data_fields(placemark))
    data
  rescue StandardError => e
    Rails.logger.warn("Failed to extract extended data: #{e.message}")
    {}
  end

  def extract_name_and_description(placemark)
    data = {}
    # `text` is nil for empty elements such as <name/>; skip those instead of
    # raising, which would otherwise discard all extended data for the placemark.
    name_node = REXML::XPath.first(placemark, './/name')
    data['name'] = name_node.text.strip if name_node&.text
    desc_node = REXML::XPath.first(placemark, './/description')
    data['description'] = desc_node.text.strip if desc_node&.text
    data
  end

  def extract_custom_data_fields(placemark)
    data = {}
    REXML::XPath.each(placemark, './/ExtendedData/Data') do |data_node|
      name = data_node.attributes['name']
      value_node = REXML::XPath.first(data_node, './value')
      data[name] = value_node.text if name && value_node
    end
    data
  end

  def bulk_insert_points(batch)
    unique_batch = deduplicate_batch(batch)
    upsert_points(unique_batch)
    broadcast_import_progress(import, unique_batch.size)
  rescue StandardError => e
    create_notification("Failed to process KML file: #{e.message}")
  end

  def deduplicate_batch(batch)
    batch.uniq { |record| [record[:lonlat], record[:timestamp], record[:user_id]] }
  end

  def upsert_points(batch)
    # rubocop:disable Rails/SkipsModelValidations
    Point.upsert_all(
      batch,
      unique_by: %i[lonlat timestamp user_id],
      returning: false,
      on_duplicate: :skip
    )
    # rubocop:enable Rails/SkipsModelValidations
  end

  def create_notification(message)
    Notification.create!(
      user_id: user_id,
      title: 'KML Import Error',
      content: message,
      kind: :error
    )
  end
end
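
The coordinate handling in `parse_coordinates` and `parse_single_coordinate` can be tried outside Rails. The following is a minimal sketch, not the importer's actual code path: it assumes plain Ruby without ActiveSupport, so `blank?` is replaced by an explicit nil/empty check, and the `sample` string is made up for illustration. KML `<coordinates>` text is a whitespace-separated list of `lng,lat[,alt]` tuples.

```ruby
# Standalone sketch mirroring the importer's coordinate parsing.
# Assumption: no ActiveSupport, so `blank?` becomes a nil/empty check.

# Parses one "lng,lat[,alt]" tuple; returns nil when fewer than two parts exist.
def parse_single_coordinate(tuple)
  parts = tuple.split(',')
  return nil if parts.size < 2

  { lng: parts[0].to_f, lat: parts[1].to_f, alt: parts[2]&.to_f || 0.0 }
end

# Splits the text on whitespace and drops tuples that could not be parsed.
def parse_coordinates(text)
  return [] if text.nil? || text.strip.empty?

  text.strip.split(/\s+/).filter_map { |tuple| parse_single_coordinate(tuple) }
end

# Hypothetical sample: two valid tuples and one malformed token.
sample = "13.4050,52.5200,34 13.4060,52.5210\n  bogus"
points = parse_coordinates(sample)

# Format each parsed tuple the way the importer builds its lonlat value.
wkt = points.map { |point| "POINT(#{point[:lng]} #{point[:lat]})" }
puts wkt
```

The malformed token is dropped rather than raising, and a missing altitude falls back to `0.0`, which matches the behaviour of the methods in the class.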