# Plan: `skraak import segments` Command
## Overview
Import segments from AviaNZ `.data` files into the skraak database, applying a species/calltype mapping file.
## CLI Usage
```bash
skraak import segments \
--db ./db/skraak.duckdb \
--folder /path/to/folder \
--mapping /path/to/mapping_2026-03-14.json \
--dataset abc123 \
--location loc456 \
--cluster clust789
```
## Database Relationships (Critical for Insert Order)
```
dataset
└── location (FK: dataset_id)
└── cluster (FK: location_id, dataset_id)
└── file (FK: cluster_id, location_id)
└── file_dataset (FK: file_id, dataset_id)
└── file_metadata (FK: file_id)
└── segment (FK: file_id, dataset_id, file_dataset)
└── label (FK: segment_id, species_id, filter_id)
└── label_metadata (FK: label_id)
└── label_subtype (FK: label_id, calltype_id, filter_id)
species
└── call_type (FK: species_id)
```
## Insert Order (Per Segment)
1. **segment** → needs file_id (from DB by hash), dataset_id
2. **label** → needs segment_id (just inserted), species_id (from mapping), filter_id (from DB), certainty
3. **label_metadata** → needs label_id, stores `{"skraak_label_id": "...", "comment": "..."}`
4. **label_subtype** → needs label_id, calltype_id (from mapping), filter_id, certainty
## Files to Create/Modify
### 1. NEW: `utils/mapping.go`
Mapping file utilities:
- `SpeciesMapping` struct
- `LoadMappingFile(path string) (map[string]SpeciesMapping, error)`
- `ValidateMappingAgainstDB(db *sql.DB, mapping map[string]SpeciesMapping) error`
### 2. NEW: `tools/import_segments.go`
Core tool logic with types:
- `ImportSegmentsInput`
- `ImportSegmentsOutput`
- `ImportSummary`
- `SegmentImport`
- `LabelImport`
- `ImportError`
- `ImportSegments()` function
### 3. MODIFY: `cmd/import.go`
Add a `segments` subcommand to the switch statement and a `runImportSegments()` handler.
## Detailed Algorithm
### Phase A: Input Validation
1. Validate folder exists and contains .data files
2. Validate dataset_id, location_id, cluster_id format (12-char nanoid)
3. Validate cluster belongs to location, location belongs to dataset
4. Validate dataset type is 'structured'
5. Load mapping file and validate JSON schema
### Phase B: Pre-Import Validation (Fail-Fast)
1. Parse all .data files, collect:
- Unique filter names
- Unique species names (from labels)
- Unique calltype names per species
2. Validate all filters exist in DB:
```sql
SELECT id FROM filter WHERE name IN (?) AND active = true
```
3. Validate mapping covers all species found in .data files:
- For each species in .data: must exist as key in mapping
- mapping[species].species must exist in species table
4. Validate mapping covers all calltypes:
- For each (species, calltype) in .data:
- If mapping[species].calltypes exists: calltype must map to existing call_type.label
- Else: calltype must exist as-is in call_type table for that species
5. For each .data file:
- Find corresponding WAV file (same folder, strip .data extension)
- Hash the WAV file
- Verify hash exists in file table for this cluster
- Verify no existing labels for this file (fresh imports only)
6. Validate segment bounds:
- start_time < end_time
- end_time <= file.duration
### Phase C: Transactional Import
```sql
BEGIN TRANSACTION;
-- Pre-load all ID lookups into memory
-- filterIDMap: filter_name -> filter_id
-- speciesIDMap: species_label -> species_id
-- calltypeIDMap: (species_label, calltype_label) -> calltype_id
-- fileIDMap: xxh64_hash -> file_id
FOR each .data file:
wavPath = dataPath without .data extension
hash = ComputeXXH64(wavPath)
fileID = fileIDMap[hash]
-- Update file_metadata with skraak_hash
INSERT OR REPLACE INTO file_metadata (file_id, json, active)
VALUES (fileID, JSON_SET(COALESCE(json, '{}'), '$.skraak_hash', ?), true)
FOR each segment in .data:
IF segment has any label with bookmark=true: CONTINUE
segmentID = GenerateLongID()
-- Clamp freq_low/freq_high (default 0, sample_rate/2)
INSERT INTO segment (id, file_id, dataset_id, start_time, end_time,
freq_low, freq_high, created_at, last_modified, active)
VALUES (segmentID, fileID, datasetID, start, end, freqLow, freqHigh, now(), now(), true)
FOR each label in segment:
labelID = GenerateLongID()
dbSpecies = mapping[label.Species].species
speciesID = speciesIDMap[dbSpecies]
filterID = filterIDMap[label.Filter]
certainty = label.Certainty
INSERT INTO label (id, segment_id, species_id, filter_id, certainty,
created_at, last_modified, active)
VALUES (labelID, segmentID, speciesID, filterID, certainty, now(), now(), true)
-- Build label_metadata JSON
metadata := {"skraak_label_id": labelID}
IF label.Comment != "": metadata["comment"] = label.Comment
INSERT INTO label_metadata (label_id, json, created_at, last_modified, active)
VALUES (labelID, metadata, now(), now(), true)
IF label.CallType != "":
subtypeID = GenerateLongID()
dbCalltype = mapping[label.Species].calltypes[label.CallType]
OR label.CallType if no mapping entry
calltypeID = calltypeIDMap[(dbSpecies, dbCalltype)]
INSERT INTO label_subtype (id, label_id, calltype_id, filter_id, certainty,
created_at, last_modified, active)
VALUES (subtypeID, labelID, calltypeID, filterID, certainty, now(), now(), true)
COMMIT;
```
## Mapping File Format
```json
{
"Don't Know": {
"species": "Don't Know"
},
"GSK": {
"species": "Roroa",
"calltypes": {
"Male": "Male - Solo",
"Female": "Female - Solo"
}
}
}
```
## Output JSON
```json
{
"summary": {
"data_files_found": 42,
"data_files_processed": 42,
"total_segments": 342,
"imported_segments": 342,
"imported_labels": 356,
"imported_subtypes": 280,
"processing_time_ms": 1234
},
"segments": [
{
"segment_id": "abc123...",
"file_name": "recording.wav",
"start_time": 165.0,
"end_time": 185.0,
"labels": [
{
"label_id": "def456...",
"species": "Roroa",
"calltype": "Male - Solo",
"filter": "opensoundscape-kiwi-1.2",
"certainty": 70
}
]
}
],
"errors": []
}
```
## Invariants
| Invariant | How Enforced |
|-----------|--------------|
| Operates on 1 folder | CLI flag validation |
| All certainty scores transferred | Direct copy from .data to DB |
| All filters exist in DB | Phase B validation |
| Mapping is complete | Phase B validation |
| File hashes exist in DB | Phase B - lookup by xxh64_hash |
| No existing labels | Phase B - query before import |
| Species/calltypes exist | Phase B validation |
| Fail-fast on error | All validation before transaction |
| Single transaction | `db.BeginLoggedTx()` with rollback |
## Key Code Patterns to Reuse
- `utils.ParseDataFile()` - Parse .data files
- `utils.FindDataFiles()` - Find .data files in folder
- `utils.ComputeXXH64()` - Hash WAV files
- `utils.GenerateLongID()` - 21-char nanoid for segment/label IDs
- `utils.ValidateShortID()` - Validate 12-char IDs
- `db.BeginLoggedTx()` - Transaction with mutation logging
- `utils.CheckDuplicateHash()` - Check if hash exists (adapt for this use)
## Decisions Made
1. ✅ WAV files same folder as .data files
2. ✅ freq_low/freq_high stored in segment table (clamp to [0, sample_rate/2] if needed)
3. ✅ Multiple labels per segment → multiple label records
4. ✅ ProgressHandler pattern for progress reporting
5. ✅ Fail-fast + single transaction