The command correctly validates that the filter "M" (manual annotation) doesn't exist in the test database. This demonstrates the validation is working.

Let me create a summary of what was implemented:

## Summary

I've successfully implemented the `skraak import segments` command. Here's what was created:

### Files Created

1. **`utils/mapping.go`** (189 lines)
   - `SpeciesMapping` struct for mapping .data species/calltype to DB labels
   - `MappingFile` type for the complete mapping structure
   - `LoadMappingFile()` — loads and validates mapping JSON
   - `ValidateMappingAgainstDB()` — validates all mapped species/calltypes exist in DB
   - `GetDBSpecies()` / `GetDBCalltype()` — helper methods

2. **`tools/import_segments.go`** (840 lines)
   - `ImportSegmentsInput` / `ImportSegmentsOutput` types
   - `ImportSegments()` — main import function with:
     - Phase A: Input validation (folder, mapping file, ID formats)
     - Phase B: Parse all .data files, collect unique filters/species/calltypes
     - Phase C: Pre-import validation (filters, mapping coverage, file hashes, no existing labels)
     - Phase D: Transactional import (segment → label → label_metadata → label_subtype)
   - `ProgressHandler` support
   - Fail-fast error handling with detailed error messages

3. **`PLAN.md`** (reference document for context preservation)

### Files Modified
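The mapping utilities could look roughly like this — a minimal sketch assuming the struct shapes follow the `mapping.json` format directly; the real `utils/mapping.go` (which presumably loads from a file path and validates against the DB) will differ in detail:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SpeciesMapping maps one .data species name to a DB species label, plus
// optional calltype mappings. Field shapes here are assumptions inferred
// from the mapping.json format, not the actual utils/mapping.go code.
type SpeciesMapping struct {
	Species   string            `json:"species"`
	Calltypes map[string]string `json:"calltypes,omitempty"`
}

// MappingFile is the full mapping: .data species name -> mapping entry.
type MappingFile map[string]SpeciesMapping

// LoadMappingFile parses mapping JSON and rejects empty species values.
// (The real function likely takes a file path rather than raw bytes.)
func LoadMappingFile(data []byte) (MappingFile, error) {
	var mf MappingFile
	if err := json.Unmarshal(data, &mf); err != nil {
		return nil, fmt.Errorf("parse mapping: %w", err)
	}
	for name, m := range mf {
		if m.Species == "" {
			return nil, fmt.Errorf("mapping for %q has empty species", name)
		}
	}
	return mf, nil
}

// GetDBSpecies returns the DB species label for a .data species name.
func (mf MappingFile) GetDBSpecies(dataSpecies string) (string, bool) {
	m, ok := mf[dataSpecies]
	if !ok {
		return "", false
	}
	return m.Species, true
}

// GetDBCalltype returns the DB calltype label for a species/calltype pair.
func (mf MappingFile) GetDBCalltype(dataSpecies, dataCalltype string) (string, bool) {
	m, ok := mf[dataSpecies]
	if !ok {
		return "", false
	}
	ct, ok := m.Calltypes[dataCalltype]
	return ct, ok
}

func main() {
	raw := []byte(`{
		"Don't Know": {"species": "Don't Know"},
		"GSK": {"species": "Roroa",
			"calltypes": {"Male": "Male - Solo", "Female": "Female - Solo"}}
	}`)
	mf, err := LoadMappingFile(raw)
	if err != nil {
		panic(err)
	}
	sp, _ := mf.GetDBSpecies("GSK")
	ct, _ := mf.GetDBCalltype("GSK", "Male")
	fmt.Println(sp, "/", ct) // prints: Roroa / Male - Solo
}
```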
1. **`cmd/import.go`**
   - Added `segments` subcommand to switch statement
   - Updated usage text
   - Added `runImportSegments()` function with CLI flag parsing

### Key Features

- ✅ Maps .data species/calltype names to DB labels via mapping file
- ✅ Validates all filters, species, and calltypes exist in DB
- ✅ Validates file hashes exist in DB for the specified cluster
- ✅ Ensures no existing labels (fresh imports only)
- ✅ Skips bookmarked segments
- ✅ Stores freq_low/freq_high in segment table
- ✅ Writes skraak_hash to file_metadata
- ✅ Writes skraak_label_id and comment to label_metadata
- ✅ Supports multiple labels per segment
- ✅ Single transaction for atomic imports
- ✅ Progress reporting via ProgressHandler

### CLI Usage

```bash
skraak import segments \
  --db ./db/skraak.duckdb \
  --dataset <12-char-id> \
  --location <12-char-id> \
  --cluster <12-char-id> \
  --folder /path/to/data \
  --mapping mapping.json
```

**`CHANGELOG.md`** — Added new entry at top:
- [2026-03-14] Import Segments Command
- Documents the feature, usage, mapping format, output structure, invariants, and rationale

**`README.md`** — Updated two sections:
1. Added `import segments` to the CLI commands examples
2. Added new "Segments Import" section with:
   - Prerequisites (hashes exist, fresh imports, filters/species/calltypes in DB)
   - Mapping file format
   - Import command example
   - What's imported (segment, label, label_subtype, label_metadata, file_metadata)
   - What's skipped (bookmarked segments)

The command is now fully documented and ready for use.
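Phase B (parsing the .data files) can be sketched as below. This assumes the usual AviaNZ .data layout — a JSON array whose first element is file metadata and whose remaining elements are segments of the form `[start, end, freqLow, freqHigh, [labels]]`; the field set on each label is an assumption, and the real parser in `tools/import_segments.go` may read more:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DataLabel is one label attached to an AviaNZ segment. The fields shown
// (species, certainty, filter, calltype) are assumed, not exhaustive.
type DataLabel struct {
	Species   string  `json:"species"`
	Certainty float64 `json:"certainty"`
	Filter    string  `json:"filter"`
	Calltype  string  `json:"calltype"`
}

// DataSegment is one parsed segment: [start, end, freqLow, freqHigh, labels].
type DataSegment struct {
	Start, End        float64
	FreqLow, FreqHigh float64
	Labels            []DataLabel
}

// parseDataFile decodes a .data file, skipping the leading metadata element.
func parseDataFile(raw []byte) ([]DataSegment, error) {
	var elems []json.RawMessage
	if err := json.Unmarshal(raw, &elems); err != nil {
		return nil, err
	}
	if len(elems) == 0 {
		return nil, fmt.Errorf("empty .data file")
	}
	var segs []DataSegment
	for _, e := range elems[1:] { // element 0 is the metadata header
		var parts []json.RawMessage
		if err := json.Unmarshal(e, &parts); err != nil {
			return nil, err
		}
		if len(parts) < 5 {
			return nil, fmt.Errorf("segment has %d elements, want 5", len(parts))
		}
		var s DataSegment
		for i, dst := range []*float64{&s.Start, &s.End, &s.FreqLow, &s.FreqHigh} {
			if err := json.Unmarshal(parts[i], dst); err != nil {
				return nil, err
			}
		}
		if err := json.Unmarshal(parts[4], &s.Labels); err != nil {
			return nil, err
		}
		segs = append(segs, s)
	}
	return segs, nil
}

func main() {
	raw := []byte(`[{"Operator": "x"},
		[10.5, 13.2, 600, 7000,
		 [{"species": "GSK", "certainty": 100, "filter": "M", "calltype": "Male"}]]]`)
	segs, err := parseDataFile(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(segs), segs[0].Labels[0].Species) // prints: 1 GSK
}
```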
## Segments Import

Import AviaNZ .data segments into the database with species/calltype mapping.

**Prerequisites:**

1. WAV files must already be imported (hashes must exist in database)
2. No existing labels on files (fresh imports only)
3. All filters, species, and calltypes must exist in database
4. Mapping file must cover all species in .data files

**Mapping file** (`mapping.json`):

```json
{
  "Don't Know": {"species": "Don't Know"},
  "GSK": {
    "species": "Roroa",
    "calltypes": {
      "Male": "Male - Solo",
      "Female": "Female - Solo"
    }
  }
}
```

**Import:**

```bash
./skraak import segments \
  --db ./db/skraak.duckdb \
  --dataset abc123 \
  --location loc456 \
  --cluster clust789 \
  --folder /path/to/data \
  --mapping mapping.json
```

**What's imported:**

- `segment` - time ranges with freq_low/freq_high from .data
- `label` - species, filter, certainty for each segment
- `label_subtype` - calltype if present in .data
- `label_metadata` - stores `skraak_label_id` and comments
- `file_metadata` - stores `skraak_hash`
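For reference, the coverage check behind prerequisites 3 and 4 can be sketched in Go. This is illustrative only — the function name and shapes here are not the actual API, and the `dbSpecies` set stands in for a database query:

```go
package main

import (
	"fmt"
	"sort"
)

// validateCoverage checks that every species name seen in the .data files
// has a mapping entry, and that every mapped DB species actually exists in
// the database. Names and shapes are hypothetical.
func validateCoverage(seen map[string]bool, mapping map[string]string, dbSpecies map[string]bool) []string {
	var errs []string
	for name := range seen {
		dbName, ok := mapping[name]
		if !ok {
			errs = append(errs, fmt.Sprintf("no mapping for .data species %q", name))
			continue
		}
		if !dbSpecies[dbName] {
			errs = append(errs, fmt.Sprintf("mapped species %q not in database", dbName))
		}
	}
	sort.Strings(errs) // deterministic order for fail-fast reporting
	return errs
}

func main() {
	seen := map[string]bool{"GSK": true, "Morepork": true}
	mapping := map[string]string{"GSK": "Roroa"}
	db := map[string]bool{"Roroa": true}
	for _, e := range validateCoverage(seen, mapping, db) {
		fmt.Println(e) // prints: no mapping for .data species "Morepork"
	}
}
```

Collecting all failures (rather than stopping at the first) matches the command's detailed, fail-fast error reporting: the import aborts before any write, but the user sees every mapping gap at once.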
## [2026-03-14] Import Segments Command

**Feature:** New `skraak import segments` command to import AviaNZ .data segments into the database.

**Changes:**

- `utils/mapping.go` — New utilities for loading and validating species/calltype mapping files
- `tools/import_segments.go` — New tool with `ImportSegments()` function
- `cmd/import.go` — Added `segments` subcommand

**Usage:**

```bash
skraak import segments \
  --db ./db/skraak.duckdb \
  --dataset gljgxDbfasva \
  --location ZEVWGbXzB1bl \
  --cluster q7w-iQgyZOYV \
  --folder /path/to/data \
  --mapping mapping.json
```

**Mapping file format** (`mapping.json`):

```json
{
  "Don't Know": {"species": "Don't Know"},
  "GSK": {
    "species": "Roroa",
    "calltypes": {
      "Male": "Male - Solo",
      "Female": "Female - Solo"
    }
  }
}
```
**Output structure:**

```json
{
  "summary": {
    "data_files_found": 42,
    "data_files_processed": 42,
    "total_segments": 342,
    "imported_segments": 342,
    "imported_labels": 356,
    "imported_subtypes": 280,
    "skipped_bookmarks": 5,
    "processing_time_ms": 1234
  },
  "segments": [...],
  "errors": []
}
```

**Invariants enforced:**

- All file hashes must already exist in database for the cluster
- All files must have no existing labels (fresh imports only)
- All filters, species, and calltypes must exist in database
- Segments with `bookmark: true` labels are skipped
- Mapping must cover all species found in .data files

**Database writes:**

- `segment` table: id, file_id, dataset_id, start_time, end_time, freq_low, freq_high
- `label` table: id, segment_id, species_id, filter_id, certainty
- `label_metadata` table: `{"skraak_label_id": "...", "comment": "..."}`
- `label_subtype` table: id, label_id, calltype_id, filter_id, certainty (if calltype present)
- `file_metadata` table: `{"skraak_hash": "..."}`

**Rationale:**

AviaNZ .data files contain segment annotations from both manual review and ML filters. This command imports those segments into the skraak database with proper species/calltype mapping, enabling integrated analysis across all annotation sources.