# Changelog

All notable changes to the Skraak MCP Server are documented here.

## [2026-03-14] Remove import_ml_selections (Deprecated)

**Breaking Change:** Removed deprecated `import selections` CLI command and `import_ml_selections` MCP tool.

The `import segments` command is the replacement, offering:
- AviaNZ .data file import (industry standard)
- Species/calltype mapping file validation
- Transactional imports with proper error handling
- Simpler, more maintainable codebase

**Removed:**
- `tools/import_ml_selections.go` (1134 lines)
- `cmd/mcp.go` — `import_ml_selections` MCP tool registration
- `cmd/import.go` — `selections` CLI subcommand

**Changes:**
- `utils/mapping.go` — Exported `Placeholders()` function for reuse

## [2026-03-14] Import Segments - Fix Orphaned Segments

**Fix:** Segments with no valid labels are now deleted from the database.

When a segment's labels all fail validation (e.g., missing species in mapping), the segment
was previously left orphaned in the database with no labels. Now the segment is deleted within
the same transaction, maintaining data integrity.
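
The core of the fix can be sketched as a pure helper (the names below are illustrative, not the actual `tools/import_segments.go` internals): labels are filtered through validation, and an empty survivor set signals that the segment itself must be removed in the same transaction.

```go
package main

import "fmt"

// Label is a minimal stand-in for an imported AviaNZ label.
type Label struct {
	Species string
}

// filterValidLabels keeps only labels whose species passes validation.
// If none survive, the caller should delete the segment inside the same
// import transaction rather than leave it orphaned.
func filterValidLabels(labels []Label, valid func(Label) bool) (kept []Label, deleteSegment bool) {
	for _, l := range labels {
		if valid(l) {
			kept = append(kept, l)
		}
	}
	return kept, len(kept) == 0
}

func main() {
	mapping := map[string]bool{"Roroa": true}
	labels := []Label{{Species: "Unknown Bird"}}
	kept, del := filterValidLabels(labels, func(l Label) bool { return mapping[l.Species] })
	fmt.Println(len(kept), del) // 0 true: every label failed, so the segment is deleted
}
```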

**Changes:**
- `tools/import_segments.go` — Delete orphaned segments when all labels fail validation
- `utils/mapping_test.go` — Unit tests for mapping file loading and validation
- `tools/import_segments_test.go` — Unit tests for input validation and segment counting
- `utils/data_file_test.go` — Added tests for skraak_hash and skraak_label_id round-trip

## [2026-03-14] Import Segments Command

**Feature:** New `skraak import segments` command to import AviaNZ .data segments into the database.

**Changes:**
- `utils/mapping.go` — New utilities for loading and validating species/calltype mapping files
- `tools/import_segments.go` — New tool with `ImportSegments()` function
- `cmd/import.go` — Added `segments` subcommand

**Usage:**
```bash
skraak import segments \
  --db ./db/skraak.duckdb \
  --dataset gljgxDbfasva \
  --location ZEVWGbXzB1bl \
  --cluster q7w-iQgyZOYV \
  --folder /path/to/data \
  --mapping mapping.json
```

**Mapping file format** (`mapping.json`):
```json
{
  "Don't Know": {
    "species": "Don't Know"
  },
  "GSK": {
    "species": "Roroa",
    "calltypes": {
      "Male": "Male - Solo",
      "Female": "Female - Solo"
    }
  }
}
```

**Output structure:**
```json
{
  "summary": {
    "data_files_found": 42,
    "data_files_processed": 42,
    "total_segments": 342,
    "imported_segments": 342,
    "imported_labels": 356,
    "imported_subtypes": 280,
    "processing_time_ms": 1234
  },
  "segments": [...],
  "errors": []
}
```

**Invariants enforced:**
- All file hashes must already exist in database for the cluster
- All files must have no existing labels (fresh imports only)
- All filters, species, and calltypes must exist in database
- Segments with `bookmark: true` labels are skipped
- Mapping must cover all species found in .data files

**Database writes:**
- `segment` table: id, file_id, dataset_id, start_time, end_time, freq_low, freq_high
- `label` table: id, segment_id, species_id, filter_id, certainty
- `label_metadata` table: `{"comment": "..."}` (only if comment present)
- `label_subtype` table: id, label_id, calltype_id, filter_id, certainty (if calltype present)

**Data file updates:**
- `skraak_hash` written to metadata section (first element of .data array)
- `skraak_label_id` written to each label object

**Rationale:**
AviaNZ .data files contain segment annotations from both manual review and ML filters. This command imports those segments into the skraak database with proper species/calltype mapping, enabling integrated analysis across all annotation sources.

## [2026-03-13] Calls Summarise Command

**Feature:** New `skraak calls summarise` command to analyse .data files after classification.

**Changes:**
- `tools/calls_summarise.go` — New tool with `CallsSummarise()` function
- `cmd/calls.go` — Added `summarise` subcommand

**Usage:**
```bash
skraak calls summarise --folder ./recordings > summary.json
skraak calls summarise --folder ./recordings | jq 'del(.segments)'  # summary only
```

**Output structure:**
```json
{
  "segments": [...],
  "data_files_read": 27,
  "data_files_skipped": [],
  "total_segments": 47,
  "filters": {
    "opensoundscape-kiwi-1.2": {
      "segments": 20,
      "species": {"Kiwi": 15, "Don't Know": 5},
      "calltypes": {"Kiwi": {"Male": 10, "Duet": 5}}
    }
  },
  "review_status": {
    "unreviewed": 30,
    "confirmed": 10,
    "dont_know": 5,
    "with_calltype": 8,
    "with_comments": 3,
    "bookmarked": 2
  },
  "operators": ["Auto"],
  "reviewers": ["David", "None"]
}
```

**Review status definitions:**
- `unreviewed`: certainty < 100 (default from detection)
- `confirmed`: certainty = 100 (user pressed bind key)
- `dont_know`: certainty = 0

**Calltypes:** The `calltypes` map appears under a filter only when at least one species has calltypes set, and shows per-species calltype counts.

**Rationale:**
After running `skraak classify` on .data files, it's difficult to understand the state of classifications. This command provides a comprehensive summary with both detailed segments array and aggregated statistics.

## [2026-03-10] Spectrogram Sample Rate Limiting

**Feature:** Spectrograms now automatically downsample high sample rate audio to 16kHz.

**Changes:**
- `utils/spectrogram.go` — Added `DefaultMaxSampleRate = 16000` constant
- `utils/resample.go` — Added `ResampleRate()` function for sample rate conversion
- `tools/calls_show_images.go` — Downsample segments before spectrogram generation
- `tui/classify.go` — Downsample segments before spectrogram generation

**Rationale:**
- High sample rates (e.g., 250kHz bat detectors) produce very tall spectrograms
- Birds typically vocalise in the 0–8 kHz range; a 16 kHz sample rate (Nyquist = 8 kHz) is sufficient
- Audio playback unchanged — plays at original sample rate

**Behavior:**
| Original Rate | Spectrogram Rate | Playback Rate |
|---------------|------------------|---------------|
| 8000 Hz | 8000 Hz | 8000 Hz |
| 16000 Hz | 16000 Hz | 16000 Hz |
| 44100 Hz | 16000 Hz | 44100 Hz |
| 250000 Hz | 16000 Hz | 250000 Hz |
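
The rule in the table reduces to a small cap function (`spectrogramRate` is an illustrative name; the real code uses the `DefaultMaxSampleRate` constant from `utils/spectrogram.go` and resamples via `ResampleRate()`):

```go
package main

import "fmt"

// DefaultMaxSampleRate caps the rate used for spectrogram rendering;
// audio playback keeps the original rate.
const DefaultMaxSampleRate = 16000

// spectrogramRate returns the sample rate to use for spectrogram
// generation: the original rate if at or below the cap, else the cap.
func spectrogramRate(original int) int {
	if original > DefaultMaxSampleRate {
		return DefaultMaxSampleRate
	}
	return original
}

func main() {
	for _, r := range []int{8000, 16000, 44100, 250000} {
		fmt.Printf("original %d Hz -> spectrogram %d Hz\n", r, spectrogramRate(r))
	}
}
```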

## [2026-03-09] Case-Preserving WAV File Finding

**Fix:** WAV files with lowercase `.wav` extension now produce correct `.wav.data` files.

**Changes:**
- `tools/calls_from_preds.go` — Added `findWAVFile()` helper function
- `tools/calls_from_birda.go` — Updated to use `findWAVFile()`
- `tools/calls_from_raven.go` — Updated to use `findWAVFile()`

**Problem:** Previous code hardcoded `.WAV` extension, causing issues on case-sensitive filesystems:
- `abc.wav` would fail to be found
- Or produce `abc.WAV.data` instead of `abc.wav.data`

**Solution:** `findWAVFile(dir, baseName)` searches for:
1. `.WAV` (most common for main recordings)
2. `.wav` (common for clips)
3. `.Wav` (edge case)
4. Case-insensitive glob fallback

**Result:**
| WAV File | .data File |
|----------|------------|
| `abc.WAV` | `abc.WAV.data` |
| `abc.wav` | `abc.wav.data` |
| `abc.Wav` | `abc.Wav.data` |

## [2026-03-09] Bookmark Navigation in TUI

**New feature:** Bookmark segments for later review.

**Changes:**
- `utils/data_file.go` — Added `Bookmark bool` to Label struct
- `tools/calls_classify.go` — Added bookmark methods
- `tui/classify.go` — Added key handlers and display
- `tui/classify.go` — Header lines now wrap at 80 characters

**Format** (stored in label):
```json
[0, 3, 0, 16000, [{"species": "Kiwi", "certainty": 90, "filter": "BirdNET", "bookmark": true}]]
```

**Key bindings:**
| Key | Action |
|-----|--------|
| `Ctrl+D` | Toggle bookmark on current segment |
| `Ctrl+,` | Previous bookmark (wraps around) |
| `Ctrl+.` | Next bookmark (wraps around) |

**Behavior:**
- Bookmark lives on the filter-matching label
- `--filter BirdNET` shows bookmarks on BirdNET labels only
- No filter shows all bookmarks
- Wrap-around navigation with loop detection
- `[BOOKMARKED]` indicator shown in segment info

## [2026-03-09] Comment Dialog Editing in TUI

**Enhancement:** Full cursor editing support in the comment dialog.

**Changes:**
- `tui/classify.go` — Added cursor position tracking and navigation

**New features:**
| Key | Action |
|-----|--------|
| `←` / `→` | Move cursor left/right |
| `Space` | Insert space at cursor |
| `Backspace` | Delete character before cursor |
| `Delete` | Delete character at cursor |
| `Ctrl+A` | Move cursor to start |
| `Ctrl+E` | Move cursor to end |

**Fixed:**
- Space bar now works in comment dialog
- Backspace deletes at cursor position, not just at end

## [2026-03-09] New Commands: calls from-birda and calls from-raven

**New feature:** Import BirdNET and Raven annotation files to .data files.

**Added:**
- `tools/calls_from_birda.go` — BirdNET results file parser
- `tools/calls_from_raven.go` — Raven selections file parser
- `cmd/calls.go` — New subcommands `from-birda` and `from-raven`
- `tools/calls_from_birda_raven_test.go` — 10 test cases

**Commands:**
```bash
# BirdNET (filter always "BirdNET")
./skraak calls from-birda --folder /path/to/recordings
./skraak calls from-birda --file recording.BirdNET.results.csv [--delete]

# Raven (filter always "Raven")
./skraak calls from-raven --folder /path/to/recordings
./skraak calls from-raven --file recording.Table.1.selections.txt [--delete]
```

**File formats:**
- BirdNET: `*.BirdNET.results.csv` (CSV with BOM, columns: Start, End, Scientific name, Common name, Confidence, File)
- Raven: `*.selections.txt` (Tab-separated, columns: Begin Time, End Time, Low Freq, High Freq, Species)

**Behavior (same as from-preds):**
- Filter is always parsed from filename (no `--filter` option)
- No clobber: if filter already exists, error
- Merge: if different filter exists, append segments
- Confidence (BirdNET) converted from 0.0-1.0 to 0-100
- Frequency range preserved from Raven selections
- `--delete` option removes source files after successful import

**Tests:** 10 new tests covering:
- New .data file creation
- Same filter rejection (no clobber)
- Different filter merge
- Delete option
- Folder mode (BirdNET only)
- Multiple selections (Raven only)

## [2026-03-09] Safe .data File Writing in calls-from-preds

**Breaking change:** Filter must now be non-empty. Previously empty filter was allowed.

**Problem:** `calls-from-preds --write-dot-data` would silently clobber existing `.data` files, potentially destroying manual annotations.

**Solution:** Implemented safe write logic that protects existing data:

1. **No existing file** → Write new file (unchanged behavior)
2. **Existing file, same filter** → Error: "file already contains filter 'X' (refusing to clobber)"
3. **Existing file, different filter** → Merge segments (append new, sort by time)
4. **Existing file, parse error** → Error: "cannot parse existing file (refusing to clobber)"

**Changes:**
- `tools/calls_from_preds.go` — Added `writeDotDataFileSafe()` for safe write/merge logic
- `tools/calls_from_preds.go` — Added filter validation: empty filter now returns error
- `tools/calls_from_preds.go` — Filter defaults to CSV filename parsing if `--filter` not specified
- `tools/calls_from_preds.go` — Added `convertAviaNZSegment()` and `buildAviaNZMetaAndSegments()` helpers

**Filter logic:**
- If `--filter "name"` specified → use that filter
- If `--filter` not specified → parse from CSV filename (e.g., `predsST_opensoundscape-kiwi-1.2_2025-11-12.csv` → `opensoundscape-kiwi-1.2`)
- If filter is empty string → error
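
A sketch of the filename fallback, assuming the `<prefix>_<filter>_<date>.csv` shape shown above (`filterFromCSVName` is an illustrative name):

```go
package main

import (
	"fmt"
	"strings"
)

// filterFromCSVName extracts the filter from a predictions CSV name of
// the form <prefix>_<filter>_<date>.csv, returning "" when the name
// does not match. Underscores inside the filter itself are kept by
// joining everything between the first and last underscore.
func filterFromCSVName(name string) string {
	name = strings.TrimSuffix(name, ".csv")
	parts := strings.Split(name, "_")
	if len(parts) < 3 {
		return ""
	}
	return strings.Join(parts[1:len(parts)-1], "_")
}

func main() {
	fmt.Println(filterFromCSVName("predsST_opensoundscape-kiwi-1.2_2025-11-12.csv"))
	// opensoundscape-kiwi-1.2
}
```

A caller that gets `""` back treats it as the "filter is empty string" error case above.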

**Error handling:** First error stops batch processing (existing behavior preserved).

**Tests added:** `tools/calls_from_preds_test.go` with 7 test cases:
- Empty filter returns error
- New .data file created when none exists
- Existing file with same filter returns error (refuses to clobber)
- Existing file with different filter merges segments
- Existing file with parse error returns error (refuses to clobber)
- Explicit filter via `--filter` flag
- Non-parsable filename without filter returns error

## [2026-03-07] JSON Schema for AviaNZ .data Files

**New feature:** Added JSON Schema (Draft 2020-12) for validating AviaNZ .data annotation files.

**Added:**
- `db/avianz_data_schema.json` — Comprehensive schema for .data file format

**Schema coverage:**
- Root array with metadata object first, then segment arrays
- Meta object with `Operator`, `Reviewer`, `Duration` (optional, allows extra fields)
- Segment array: 5-element tuple `[starttime, endtime, freq_low, freq_high, labels]`
- Label object with required `species` and `certainty` (0-100)
- Optional fields: `filter`, `calltype`, `comment` (max 140 chars)
- Additional properties allowed on all objects (extensibility)
- Pattern constraint: `species` must not contain `>` separator

**Validation tests:**
- Missing required fields caught
- Certainty range (0-100) enforced
- Comment length (max 140) enforced
- Minimal valid files accepted

## [2026-03-07] Comment Feature in Classify TUI

**New feature:** Press spacebar in the classify TUI to add/edit comments on labels.

**Changes:**
- `utils/data_file.go` — Added `Comment` field to `Label` struct, parse/write handling
- `tools/calls_classify.go` — Added `SetComment()` and `GetCurrentComment()` methods, `Comment` field in `BindingResult`
- `tui/classify.go` — Added `commentMode`/`commentText` state, spacebar opens dialog, text input handling, dialog rendering

**AviaNZ spec compliance:** The spec allows "any additional attributes defined for this call" as key-value pairs. Comments are stored as `"comment": "text"` in the label object.

**Usage:**
- `[space]` — Open comment dialog (pre-fills existing comment)
- Type comment (max 140 chars, ASCII only)
- `[enter]` — Save comment
- `[esc]` — Cancel (discard changes)
- `[backspace]` — Delete last character
- `[ctrl+u]` — Clear all

**Help text:** `[esc]quit [,]prev [.]next [space]comment [enter]play [shift+enter]½speed`

## [2026-03-04] Half-Speed Audio Playback in Classify TUI

**New feature:** Press Shift+Enter in the classify TUI to play audio at half speed.

**Changes:**
- `utils/resample.go` — **NEW** Linear interpolation resampling for speed changes
- `utils/audio_player.go` — Added `PlayAtSpeed(samples, sampleRate, speed)` method
- `tools/calls_classify.go` — Added `PlaybackSpeed` field to `ClassifyState`
- `tui/classify.go` — Detect Shift+Enter modifier, display "▶ Playing 0.5x..." in status
- `tui/classify.go` — Changed quit key from `q` to `Escape` (frees `q` for bindings)

**Usage:** `[esc]quit  [enter]play  [shift+enter]½speed`

## [2026-03-04] Performance Optimizations for calls-from-preds

**Problem:** Processing 7617 WAV files took 16 minutes due to excessive I/O and sequential processing.

**Changes:**
- `utils/wav_metadata.go` — Added `ParseWAVHeaderMinimal()` that reads only 4KB instead of 200KB per file (50× less I/O). Added separate buffer pool for minimal headers.
- `tools/calls_from_preds.go` — Added parallel processing with 8 workers for .data file generation. Small batches (<10 files) use sequential processing to avoid goroutine overhead.
- `tools/calls_from_preds.go` — Added `ProgressHandler` callback type for progress reporting during long operations.
- `cmd/calls.go` — Added progress indicator showing "Processing WAV files: X/Y (Z%)" during .data file writing.

**Expected improvement:** ~8× faster on multi-core systems due to parallel processing + reduced I/O overhead.

## [2026-03-04] Add iTerm2 Inline Image Protocol Support

**New feature:** Added `--iterm` flag for terminals supporting the iTerm2 Inline Image Protocol (WezTerm, iTerm2, VS Code terminal).

- `utils/terminal_image.go` — Added `ProtocolITerm` enum value and `WriteITermImage()` using charm's `x/ansi/iterm2` package; PNG-encodes then base64-encodes for the iTerm2 escape sequence
- `tools/calls_show_images.go` — Added `ITerm` field to `CallsShowImagesInput`, checked before `Sixel` in protocol selection
- `tools/calls_classify.go` — Added `ITerm` field to `ClassifyConfig`
- `cmd/calls.go` — Added `--iterm` flag to `show-images` subcommand
- `cmd/calls_classify.go` — Added `--iterm` flag to `classify` subcommand
- `tui/classify.go` — Renamed `sixelImageCmd` to `inlineImageCmd` with protocol parameter; changed conditionals from `== ProtocolSixel` to `!= ProtocolKitty` so both sixel and iTerm2 use the same inline rendering path
- `utils/terminal_image_test.go` — Tests for `WriteITermImage`, `WriteImage` routing, and `ClearImages` no-op

## [2026-02-28] Fix Kitty Image Rendering at 448px in Classify TUI

**Bug fix:** The spectrogram display was upgraded from 224x224 to 448x448 pixels, but old image artifacts persisted between segment navigations at the larger size.

- `utils/kitty_image.go` — Chunked Kitty protocol transmission (4096-byte chunks) per spec; small images still sent as single payload
- `tui/classify.go` — Return `tea.ClearScreen` on navigation keys (`,`, `.`, bindings) to force full redraw and reliable image clearing
- `tui/classify.go` — `ResizeImage` call updated from 224x224 to 448x448
- `utils/kitty_image_test.go` — Tests for single-chunk, multi-chunk, and clear behavior

## [2026-02-28] Audio Playback in Classify TUI

**New feature:** Press Enter to play the current segment's audio during classification.

- Added `utils/audio_player.go` — wraps ebitengine/oto v3 for PCM playback
- Oto context created lazily on first play, reused across segments
- Converts `[]float64` samples → signed int16 LE for oto
- Playback stops automatically on navigation (`,`/`.`), binding keys, and quit
- "▶ Playing..." indicator shown in segment info line
- New dependency: `github.com/ebitengine/oto/v3` (requires `libasound2-dev` on Linux)

## [2026-02-22] New CLI Command: calls-from-preds

**New feature:** Extract clustered bird calls from ML predictions CSV files.

**Usage:**
```bash
./skraak calls-from-preds --csv predictions.csv > calls.json
```

**How it works:**
1. Reads prediction CSV (file, start_time, end_time, ebird_code columns with 1/0 values)
2. Auto-detects clip duration from first row
3. Groups detections by (file, ebird_code) and sorts by start_time
4. Clusters consecutive detections where gap ≤ 3 × clip_duration
5. Filters out single detections (configurable via constant)

**Constants (easily changeable):**
```go
CLUSTER_GAP_MULTIPLIER     = 3  // Gap threshold = 3 × clip_duration
MIN_DETECTIONS_PER_CLUSTER = 1  // Filter single detections
```
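
Steps 3–5 above can be sketched as follows. Measuring the gap between consecutive start times, and the strict `>` survival comparison, are assumptions consistent with "gap ≤ 3 × clip_duration" and "filter single detections":

```go
package main

import "fmt"

const (
	CLUSTER_GAP_MULTIPLIER     = 3 // gap threshold = 3 × clip duration
	MIN_DETECTIONS_PER_CLUSTER = 1 // clusters must exceed this count
)

// Detection is one positive prediction row (times in seconds).
type Detection struct{ Start, End float64 }

// clusterDetections takes detections already grouped by (file, species)
// and sorted by start time, merges runs whose start-to-start gap is
// within the threshold, then drops clusters that are too small.
func clusterDetections(dets []Detection, clipDuration float64) [][]Detection {
	var clusters [][]Detection
	var cur []Detection
	for i, d := range dets {
		if i > 0 && d.Start-dets[i-1].Start > CLUSTER_GAP_MULTIPLIER*clipDuration {
			clusters = append(clusters, cur)
			cur = nil
		}
		cur = append(cur, d)
	}
	if len(cur) > 0 {
		clusters = append(clusters, cur)
	}
	var kept [][]Detection
	for _, c := range clusters {
		if len(c) > MIN_DETECTIONS_PER_CLUSTER {
			kept = append(kept, c)
		}
	}
	return kept
}

func main() {
	dets := []Detection{{0, 4}, {4, 8}, {8, 12}, {60, 64}} // 4-second clips
	clusters := clusterDetections(dets, 4)
	fmt.Println(len(clusters)) // 1: the lone detection at 60s is filtered out
}
```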

**Performance:** 400k+ rows processed in ~0.67 seconds

**Output example:**
```json
{
  "calls": [
    {"file": "path.WAV", "start_time": 0, "end_time": 32, "ebird_code": "tomtit1", "detections": 11}
  ],
  "total_calls": 62593,
  "species_count": {"tomtit1": 12636, ...},
  "files_count": 14017
}
```

**Files:**
- `tools/calls_from_preds.go` — Core clustering logic
- `cmd/calls_from_preds.go` — CLI handler

---

## [2026-02-21] Remove import_audio_file MCP Tool

**Breaking change:** Removed `import_audio_file` MCP tool. Use CLI command `skraak import file` for single file imports.

**Rationale:** The MCP tool was redundant since:
1. Single file imports are better suited for CLI use (requires file path on local machine)
2. `import_audio_files` handles batch imports efficiently via MCP
3. Reduces MCP tool count from 11 to 10

**Changes:**
- **`cmd/mcp.go`** — Removed `import_audio_file` tool registration and adapter
- **`tools/import_file.go`** — Kept for CLI use only
- **`cmd/import.go`** — CLI command `skraak import file` unchanged

**Migration:** Use CLI command instead:
```bash
./skraak import file --db ./db/skraak.duckdb --dataset abc123 --location loc456 --cluster clust789 --path /path/to/file.wav
```

---

## [2026-02-21] Verb-First CLI Commands

**Breaking change:** Replaced resource-first CLI commands with natural language verb-first structure.

**Before:**
```bash
./skraak dataset create --name "Test"
./skraak location update --id abc123 --name "Updated"
```

**After:**
```bash
./skraak create dataset --name "Test"
./skraak update location --id abc123 --name "Updated"
```

**Changes:**
- **`main.go`** — Removed legacy `dataset`, `location`, `cluster`, `pattern` commands
- **`cmd/create.go`** — New verb-first create handler
- **`cmd/update.go`** — New verb-first update handler  
- **`cmd/dataset.go`, `cmd/location.go`, `cmd/cluster.go`, `cmd/pattern.go`** — Exported create/update functions
- **Shell scripts** — Updated `test_bulk_import.sh` and `test_event_log.sh` to use new syntax

**Benefits:**
- Natural language flow: "create dataset" vs "dataset create"
- Consistent with `skraak import file/folder/bulk` pattern
- More intuitive for users
- Maintains clean tool separation in `@tools/` directory

**Migration:** Legacy commands now return an "Unknown command" error, forcing adoption of the new syntax.

---

## [2026-02-21] Fix Event Log Pointer Serialization

**Bug fix:** Event log contained pointer addresses instead of values for nullable database fields (`*float64`, `*GainLevel`, etc.), causing replay failures.

**Root cause:** `marshalParam()` in `db/tx_logger.go` didn't handle pointer types for numeric values or named type aliases (like `db.GainLevel`). These fell through to `fmt.Sprintf("%v", pointer)` which printed memory addresses like `"0x38a7bfb12078"`.

**Example of corrupted data:**
```json
"parameters": ["file_id", "2025-05-18T18:30:00+13:00", "248AB50053AB1B4A", "0x38a7bfb12078", "0x38a7bfb12088", "0x38a7bfb12090"]
```
The last three values should have been `gain`, `battery_v`, `temp_c` but were pointer addresses.

**Fixed:**
- `db/tx_logger.go` — Added explicit cases for all pointer types (`*int`, `*int64`, `*float64`, `*bool`, etc.)
- `db/tx_logger.go` — Added reflection-based fallback in default case to handle pointer-to-named-type (e.g., `*GainLevel`)
- `cmd/replay.go` — Increased `bufio.Scanner` buffer from 64KB to 20MB to handle large event lines (17,000 files = ~16 MB JSON line)
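
The shape of the fix, in sketch form (trimmed to two typed cases plus the reflection fallback; the real `marshalParam()` covers all pointer types):

```go
package main

import (
	"fmt"
	"reflect"
)

// GainLevel is a named type, like db.GainLevel in the event log.
type GainLevel string

// marshalParam dereferences known pointer types explicitly, and uses a
// reflection fallback to unwrap pointer-to-named-type, so the value
// (not the memory address) reaches the event log.
func marshalParam(v interface{}) interface{} {
	switch p := v.(type) {
	case nil:
		return nil
	case *float64:
		if p == nil {
			return nil
		}
		return *p
	case *int64:
		if p == nil {
			return nil
		}
		return *p
	}
	rv := reflect.ValueOf(v)
	if rv.Kind() == reflect.Ptr {
		if rv.IsNil() {
			return nil
		}
		return rv.Elem().Interface() // e.g. *GainLevel -> GainLevel
	}
	return v
}

func main() {
	f := 3.7
	g := GainLevel("high")
	fmt.Println(marshalParam(&f), marshalParam(&g), marshalParam((*float64)(nil)))
	// 3.7 high <nil>
}
```

Without the reflection branch, a `*GainLevel` fell through to `fmt.Sprintf("%v", pointer)` and logged an address like `"0x38a7bfb12078"`.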

**Tests added:**
- `db/tx_logger_test.go` — Tests for `*int`, `*int64`, `*float64`, `*float32`, `*bool` with nil and value cases
- `db/tx_logger_test.go` — Tests for named type aliases and pointer-to-named-type

---

## [2026-02-19] Fix Update Commands - Preserve Unset Fields

**Bug fix:** Update commands were overwriting existing values with empty strings when optional flags weren't provided.

**Root cause:** CLI code set pointers to empty strings even when flags weren't provided, causing tools layer to interpret them as intentional empty values.
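
The corrected CLI pattern looks like this (illustrative types; the real input structs live in the tools layer):

```go
package main

import "fmt"

// UpdateDatasetInput mirrors the tools-layer convention: a nil pointer
// means "leave unchanged", a non-nil pointer means "set this value".
type UpdateDatasetInput struct {
	Name        *string
	Description *string
}

// optionalFlag is the fixed CLI pattern: only produce a pointer when
// the flag was actually given a non-empty value.
func optionalFlag(v string) *string {
	if v == "" {
		return nil
	}
	return &v
}

func main() {
	// User ran: update dataset --name "New Name" (no --description)
	in := UpdateDatasetInput{
		Name:        optionalFlag("New Name"),
		Description: optionalFlag(""),
	}
	fmt.Println(*in.Name, in.Description == nil) // New Name true
}
```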

**Fixed:**
- `cmd/dataset.go` — `runDatasetUpdate()` now only sets pointer fields when flags have non-empty values
- `cmd/location.go` — `runLocationUpdate()` now only sets pointer fields when flags have non-empty values
- `cmd/cluster.go` — Already correct (only sets fields when provided)
- `cmd/pattern.go` — Already correct (only sets fields when provided)

**Tests added:**
- `tools/update_test.go` — Unit tests verifying update preserves unset fields for all entity types

---

## [2026-02-19] Schema Simplification - Remove species_dataset and ebird_taxonomy_v2024

**Database schema changes:**
- Dropped `species_dataset` table — all species now available across all datasets
- Dropped `ebird_taxonomy_v2024` table — use `WHERE taxonomy_version = '2024'` on `ebird_taxonomy` instead

**Rationale:**
- Simplifies species management (no duplicate species names across datasets)
- Reduces schema complexity (one fewer join for species lookups)
- `ebird_taxonomy_v2024` was redundant; filtering `ebird_taxonomy` directly is sufficient

**Code changes:**
- `tools/export.go` — Simplified manifest: `species` and `call_type` now "copy" (full table)
- `tools/export.go` — Removed `buildDerivedTableCreate()`, `populateDerivedTable()`, simplified `buildReferencedQuery()`
- `tools/import_ml_selections.go` — Species lookup no longer joins `species_dataset`
- `resources/schema.go` — Removed tables from list
- `db/schema_test.go` — Removed obsolete test cases
- `prompts/examples.go` — Updated taxonomy schema description

**Export manifest changes:**
- `species_dataset` → removed (no longer exists)
- `ebird_taxonomy_v2024` → removed (no longer exists)
- `species` → changed from "referenced" to "copy"
- `call_type` → changed from "referenced" to "copy"
- `filter` → changed from "referenced" to "copy"
- All "referenced" and "derived" handling code removed

---

## [2026-02-19] Dataset Export for Collaboration and Testing

**New feature: Export a dataset with all related data to a new database**

**Purpose:** Enable dataset-level exports for collaboration (export, modify, replay changes), testing (small focused test DBs), and archival.

**Architecture:**
- Schema read from embedded `db/schema.sql` (DDL statements extracted dynamically)
- Table copy order computed from FK relationships using `duckdb_constraints()`
- ATTACH mechanism for efficient cross-database copying
- Declarative manifest defines table relationships

**Added:**
- `tools/export.go` — `ExportDataset()` with table manifest and copy logic
- `cmd/export.go` — `skraak export dataset` CLI command
- `db/schema.go` — Schema utilities: `ReadSchemaSQL()`, `ExtractDDLStatements()`, `GetFKOrder()`
- `shell_scripts/test_export.sh` — Integration test script

**Command:**
```bash
skraak export dataset --db skraak.duckdb --id abc123 --output export.duckdb
skraak export dataset --db skraak.duckdb --id abc123 --output export.duckdb --dry-run
skraak export dataset --db skraak.duckdb --id abc123 --output export.duckdb --force
```

**What's exported:**
- Dataset row and all owned data (locations, clusters, files, selections, labels)
- Reference tables copied in full (`ebird_taxonomy`, `species`, `call_type`, `cyclic_recording_pattern`, `filter`)
- Empty event log created for capturing changes

**Design decisions:**
- Schema from `schema.sql` ensures schema-resilience (new columns auto-included)
- FK order computed dynamically via `duckdb_constraints()` function
- Close source DB before output DB (DuckDB single-connection limit)
- `SELECT *` copies all columns without hard-coding

**Testing:**
- `db/schema_test.go` — Unit tests for DDL extraction and FK ordering
- Integration tests verify row counts match source
- Error handling tests for missing dataset, existing file

---

## [2026-02-18] Event Log for Database Mutation Replay

**New feature: SQL-level event logging for backup synchronization**

**Purpose:** Capture all mutating SQL operations (INSERT, UPDATE, DELETE) to enable replay on backup databases for synchronization.

**Architecture:**
- Transaction wrapper (`db.LoggedTx`) intercepts all mutations
- Logged only on successful commit (rollback discards recorded queries)
- Events written to JSONL file (`<database>.events.jsonl`)
- Prepared statements fully supported via `LoggedStmt` wrapper

**Added:**
- `db/tx_logger.go` — LoggedTx, LoggedStmt, TransactionEvent types
- `cmd/replay.go` — `skraak replay events` CLI command
- `shell_scripts/test_event_log.sh` — Integration test script

**Modified:**
- All CLI commands initialize event log with defer close
- All tools use `db.BeginLoggedTx()` instead of `database.BeginTx()`
- `utils/cluster_import.go` updated for batch imports

**Event format (JSONL):**
```json
{
  "id": "V1StGXR8_Z5jdHi6B-myT",
  "timestamp": "2026-02-18T14:30:22+13:00",
  "tool": "create_or_update_dataset",
  "queries": [
    {"sql": "INSERT INTO ...", "parameters": [...]}
  ],
  "success": true,
  "duration_ms": 45
}
```

**Replay command:**
```bash
skraak replay events --db backup.duckdb --log skraak.duckdb.events.jsonl
skraak replay events --db backup.duckdb --log events.jsonl --dry-run
skraak replay events --db backup.duckdb --log events.jsonl --last 10
```

**Key design decisions:**
- SQL-level (not tool-level) for complete fidelity including imports
- Tool name included for context/debugging
- Only successful transactions logged
- Failed events skipped during replay
- `--continue` flag to proceed past errors

**Testing:**
- `db/tx_logger_test.go` — 123 unit tests, 75.9% coverage
- Pure function tests (isMutation, marshalParam, JSON marshaling)
- Integration tests with real DuckDB and file system
- Race detector verified

---

## [2026-02-11] CLI Refactoring — Two-Layer Architecture

**Major refactoring: Separated core logic from MCP types, added CLI commands**

**Problem:** All tool functions were tightly coupled to MCP SDK types (`*mcp.CallToolRequest`, `*mcp.CallToolResult`). This meant functionality could only be invoked via MCP protocol — no CLI access for power users.

**Solution:** Two-layer architecture separating core logic from MCP adapters.

**Created:**
- `cmd/mcp.go` — MCP server setup + 10 thin adapter wrappers (~3 lines each)
- `cmd/import.go` — `skraak import bulk` CLI command with flag parsing
- `cmd/sql.go` — `skraak sql` CLI command for ad-hoc queries

**Modified (mechanical, all tools/):**
- Removed `*mcp.CallToolRequest` parameter (was never used — `req` always ignored)
- Removed `*mcp.CallToolResult` from returns (was always empty `&mcp.CallToolResult{}`)
- Removed `import "github.com/modelcontextprotocol/go-sdk/mcp"` from all tool files
- Updated test files (`integration_test.go`, `pattern_test.go`) to match new signatures
- Updated `main.go` to pure dispatcher: `mcp | import | sql`

**Architecture:**
```
main.go              → pure dispatcher
cmd/mcp.go           → MCP server + adapter wrappers (ONLY file importing mcp SDK)
cmd/import.go        → CLI: skraak import bulk --db ... --dataset ... --csv ... --log ...
cmd/sql.go           → CLI: skraak sql --db ... "SELECT ..."
tools/*.go           → core logic, NO mcp dependency (plain Go structs in/out)
utils/, db/, etc.    → unchanged
```

**Benefits:**
- CLI access for power users without MCP
- Token savings (CLI avoids MCP protocol overhead)
- Code sharing between CLI and MCP
- MCP SDK contained to one file
- All tests pass

---

## [2026-02-10] Bulk File Import Cluster Assignment Bug Fix

**Critical Bug Fix: Files now correctly distributed across multiple clusters for same location**

**Problem:** When the same location appeared multiple times in the CSV with different date ranges, all files ended up in the last cluster created instead of being distributed across their respective clusters.

**Root Cause:** The `clusterIDMap` used only `LocationID` as the key, causing each new cluster for the same location to overwrite the previous one in the map.

**Solution:** Changed map key from `LocationID` to composite key `LocationID|DateRange`.
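
The fix in sketch form (field and map names are illustrative):

```go
package main

import "fmt"

// clusterKey builds the composite map key: keying on both location and
// date range lets the same location appear in the CSV with several
// date ranges without one cluster overwriting another.
func clusterKey(locationID, dateRange string) string {
	return locationID + "|" + dateRange
}

func main() {
	clusterIDMap := map[string]string{}
	clusterIDMap[clusterKey("loc123", "2024-01")] = "clusterA"
	clusterIDMap[clusterKey("loc123", "2024-02")] = "clusterB"
	fmt.Println(len(clusterIDMap)) // 2: both clusters survive
}
```

With the old `LocationID`-only key, the second assignment would have replaced the first entry.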

**Modified:**
- `tools/bulk_file_import.go` (lines 125, 171-172, 183-184)

**Impact:**
- Data integrity restored
- Multiple date ranges per location now works correctly
- Simple 3-line fix, backwards compatible

---

## [2026-02-07] File Modification Time Fallback

**Enhancement: Added file modification time as third timestamp fallback**

**Problem:** Small clusters (1-2 files) failed variance-based filename disambiguation because the algorithm needs multiple samples to determine date format (YYYYMMDD vs YYMMDD vs DDMMYY).

**Timestamp Resolution Order:**
```
1. AudioMoth comment → timestamp
2. Filename parsing → timestamp
3. File modification time → timestamp (NEW!)
4. FAIL (skip file with error)
```

**Modified:**
- `utils/cluster_import.go` - Added FileModTime fallback in `batchProcessFiles()`

**Benefits:**
- Fewer failures in small clusters
- No performance impact
- Backwards compatible
- Simple 10-line change

---

## [2026-02-07] Cluster Import Logic Extraction

**Major refactoring: Extracted shared cluster import logic into utils module**

**Key Insight:** A cluster is the atomic unit of import (one SD card / one recording session / one folder).

**Created:**
- `utils/cluster_import.go` (553 lines) - Single source of truth for cluster imports
  - `ImportCluster()` - Main entry point
  - `scanClusterFiles()` - Recursive WAV file scanning
  - `batchProcessFiles()` - Batch processing with variance-based parsing
  - `insertClusterFiles()` - Transactional insertion

**Modified:**
- `tools/import_files.go` - 75% code reduction (650 lines → 161 lines)
- `tools/bulk_file_import.go` - Bug fixes:
  - **CRITICAL BUG FIXED:** Now inserts into `file_dataset` table (was missing!)
  - **CRITICAL BUG FIXED:** Now inserts into `moth_metadata` table (was missing!)

**Benefits:**
- Orphaned-file bug fixed (68,043 orphaned files found in the test database)
- ~500 lines of duplicated code eliminated
- Single source of truth for all import logic

---

## [2026-02-06] Tool Consolidation

**Consolidated 8 write/update tools → 4 create_or_update tools**

**Deleted:**
- 8 separate create/update tool files

**Added:**
- `tools/dataset.go` - `create_or_update_dataset`
- `tools/location.go` - `create_or_update_location`
- `tools/cluster.go` - `create_or_update_cluster`
- `tools/pattern.go` - `create_or_update_pattern`

**Design:**
- Omit `id` field → CREATE mode (generates nanoid)
- Provide `id` field → UPDATE mode (verifies exists)
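The convention can be sketched like so; `newNanoID` is a hypothetical stand-in for whatever ID generator the project actually uses:

```go
package main

import (
	"crypto/rand"
	"fmt"
)

const nanoidAlphabet = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-"

// newNanoID generates a 12-character random ID; illustrative only.
func newNanoID() string {
	b := make([]byte, 12)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	for i := range b {
		b[i] = nanoidAlphabet[int(b[i])%len(nanoidAlphabet)]
	}
	return string(b)
}

// resolveMode decides CREATE vs UPDATE from the optional id field,
// mirroring the create_or_update_* tool convention.
func resolveMode(id string) (mode, effectiveID string) {
	if id == "" {
		return "CREATE", newNanoID()
	}
	return "UPDATE", id // the real tools also verify the row exists
}

func main() {
	mode, id := resolveMode("")
	fmt.Println(mode, len(id)) // CREATE 12
	mode, id = resolveMode("q7w-iQgyZOYV")
	fmt.Println(mode, id) // UPDATE q7w-iQgyZOYV
}
```

Folding create and update into one tool per entity is what lets the four tools share a single validation path.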

**Benefits:**
- Tool count: 14 → 10
- ~31% less code (~320 lines removed)
- Shared validation logic

---

## [2026-02-06] Test Script Consolidation

**Rationalized and consolidated shell test scripts**

**Removed redundant scripts:**
- 6 incomplete/redundant test scripts

**Current test suite (8 scripts):**
1. `get_time.sh` - Time tool
2. `test_sql.sh` - SQL query tool
3. `test_tools.sh` - All create_or_update tools
4. `test_import_file.sh` - Single file import
5. `test_import_selections.sh` - ML selection import
6. `test_bulk_import.sh` - Bulk CSV import
7. `test_resources_prompts.sh` - Resources/prompts
8. `test_all_prompts.sh` - All 6 prompts

---

## [2026-02-06] Bulk File Import Tool

**New Feature:** CSV-based bulk import across multiple locations and clusters.

**Added:**
- `tools/bulk_file_import.go` - CSV-based bulk import (~500 lines)

**Features:**
- CSV-driven import for multiple locations
- Auto-cluster creation
- Progress logging to file
- Summary statistics

**CSV Format:**
```csv
location_name,location_id,directory_path,date_range,sample_rate,file_count
Site A,loc123456789,/path/to/recordings,2024-01,48000,150
```

---

## [2026-02-02] Single File Import Tool

**New Feature:** Import individual WAV files.

**Added:**
- `tools/import_file.go` - Single file import implementation (~300 lines)

**Features:**
- Import one WAV file at a time with detailed feedback
- Same processing pipeline as batch import
- Duplicate detection with `is_duplicate` flag
- Atomic operation (succeeds completely or fails)

---

## [2026-01-29] ML Selection Import Tool

**New Feature: Import ML-detected kiwi call selections from folder structure**

**Added:**
- `utils/selection_parser.go` - Selection parsing utilities
- `utils/selection_parser_test.go` - 34 test cases
- `tools/import_ml_selections.go` - MCP tool (~1050 lines)

**Features:**
- Folder structure: `Clips_{filter_name}_{date}/Species/CallType/*.wav+.png`
- Two-pass file matching (exact, then fuzzy)
- Comprehensive validation
- Transactional import
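Recovering species and call-type labels from that layout could be sketched as follows (a rough illustration, not the parser in `utils/selection_parser.go`; the species and call-type values are made up):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// selectionInfo holds labels recovered from a clip's relative path.
type selectionInfo struct {
	Species, CallType, Clip string
}

// parseSelectionPath interprets the
// Clips_{filter_name}_{date}/Species/CallType/file.wav layout.
func parseSelectionPath(rel string) (selectionInfo, error) {
	parts := strings.Split(filepath.ToSlash(rel), "/")
	if len(parts) != 4 || !strings.HasPrefix(parts[0], "Clips_") {
		return selectionInfo{}, fmt.Errorf("unexpected layout: %s", rel)
	}
	return selectionInfo{Species: parts[1], CallType: parts[2], Clip: parts[3]}, nil
}

func main() {
	info, err := parseSelectionPath("Clips_kiwi_20260129/SpeciesA/male/rec_001.wav")
	if err != nil {
		panic(err)
	}
	fmt.Println(info.Species, info.CallType) // SpeciesA male
}
```

The two-pass matching then links each parsed clip back to its source recording: first by exact filename, then fuzzily for near misses.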

---

## [2026-01-28] Comprehensive Go Unit Testing

**Added a unit test suite for the `utils` package**

**Added:**
- `utils/astronomical_test.go` - 11 test cases
- `utils/audiomoth_parser_test.go` - 36 test cases
- `utils/filename_parser_test.go` - 60 test cases
- `utils/wav_metadata_test.go` - 22 test cases
- `utils/xxh64_test.go` - 6 test cases

**Coverage:**
- 170+ tests total
- 91.5% code coverage

---

## [2026-01-26] Generic SQL Tool + Codebase Rationalization

**Major architectural change:** Replaced 6 specialized query tools with a generic SQL tool.

**Deleted:**
- 6 specialized query tools (datasets, locations, clusters, files)
- 2 obsolete test scripts

**Added:**
- `tools/sql.go` - Generic `execute_sql` tool (~200 lines)
- `shell_scripts/test_sql.sh` - Comprehensive SQL test suite

**Modified:**
- `prompts/examples.go` - Rewritten to teach SQL patterns

**Benefits:**
- Full SQL expressiveness (JOINs, aggregates, CTEs)
- Arbitrary queries instead of 6 fixed query shapes
- More aligned with MCP philosophy
- Smaller codebase (2 tools instead of 8)

**Security:**
- Database opened read-only
- Validation blocks write operations
- Parameterized queries prevent SQL injection
- Row limits prevent overwhelming responses
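A keyword-denylist guard of the kind described might look like this. It is deliberately coarse (it does not parse SQL), which is why the read-only database connection is the real guarantee; the keyword list is illustrative, not the tool's actual one:

```go
package main

import (
	"fmt"
	"strings"
)

// blockedKeywords is an illustrative denylist of write operations.
var blockedKeywords = []string{
	"INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "CREATE", "ATTACH", "COPY",
}

// validateReadOnly rejects statements containing write keywords.
// Note: a substring scan can false-positive on column names like
// "last_updated"; the read-only connection is the backstop.
func validateReadOnly(query string) error {
	upper := strings.ToUpper(query)
	for _, kw := range blockedKeywords {
		if strings.Contains(upper, kw) {
			return fmt.Errorf("write operation %q not allowed", kw)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateReadOnly("SELECT count(*) FROM file")) // <nil>
	fmt.Println(validateReadOnly("DROP TABLE file"))
}
```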

---

## [2026-01-26] Shell Scripts Organization

**Reorganized all shell scripts into `shell_scripts/` directory**

- Keeps project root clean
- All scripts updated with correct relative paths