# Skraak MCP Server

A production-ready Model Context Protocol (MCP) server implemented in Go that provides a generic SQL interface for an acoustic monitoring database, plus time utilities for AI assistants.

## Overview

This MCP server provides AI assistants with direct SQL query access to an acoustic monitoring database containing recordings, locations, species data, and classifications. Built using the official MCP Go SDK, it follows MCP's three-primitive architecture with an LLM-friendly design focused on providing schema context rather than rigid tool APIs.

## Features

### Tools (Model-Controlled)
- **execute_sql**: Execute arbitrary SQL SELECT queries against the database
  - Supports: SELECT, WITH (CTEs), parameterized queries (? placeholders)
  - Security: Read-only database access, forbidden keyword validation
  - Limits: Default 1000 rows (max 10000)
- **get_current_time**: Returns current system time in RFC3339 format with timezone and Unix timestamp

### Resources (Application-Driven)
- **schema://full**: Complete database schema (348 lines)
- **schema://table/{name}**: Individual table definitions
- Provides full context for LLM to construct any query

### Prompts (User-Controlled)
- 6 SQL workflow templates teaching query patterns
- Examples for JOINs, aggregates, filtering, hierarchical queries
- Help LLMs construct appropriate queries for common tasks

### Architecture Benefits
- **Generic SQL > Specialized Tools**: Infinite query flexibility vs rigid APIs
- **Schema-Driven**: LLMs construct queries from full database context
- **Read-Only Safety**: Database enforced read-only mode + validation layers
- **Full SQL Expressiveness**: JOINs, aggregates, CTEs, subqueries all available

### Utility Libraries
- **Filename parsing**: Smart date format detection (YYYYMMDD, YYMMDD, DDMMYY) with variance-based disambiguation
- **Timezone handling**: Fixed-offset strategy for DST transitions
- **AudioMoth parsing**: Extract metadata from AudioMoth WAV comments
- **WAV metadata**: Efficient header parsing for duration, sample rate, INFO chunks
- **Astronomical calculations**: Solar/civil night detection, moon phase
- **XXH64 hashing**: Fast file hashing for deduplication
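As a sketch of the variance-based disambiguation idea (assumed logic, not the library's actual API): across a batch of filenames from one deployment, the digit pair that changes most is likely the day, so the layout whose day field shows the most distinct values wins.

```go
package main

import "fmt"

// disambiguate guesses whether 6-digit date strings are YYMMDD or
// DDMMYY by counting distinct values in each layout's day field:
// across a recorder deployment the day varies far more than the year.
func disambiguate(dates []string) string {
	dayAsSuffix := map[string]bool{} // day = last two digits (YYMMDD)
	dayAsPrefix := map[string]bool{} // day = first two digits (DDMMYY)
	for _, d := range dates {
		if len(d) != 6 {
			continue
		}
		dayAsSuffix[d[4:6]] = true
		dayAsPrefix[d[0:2]] = true
	}
	if len(dayAsSuffix) >= len(dayAsPrefix) {
		return "YYMMDD"
	}
	return "DDMMYY"
}

func main() {
	// Three consecutive nights in January 2024: the trailing pair
	// varies (01, 02, 03) while the leading pair is constant (24).
	fmt.Println(disambiguate([]string{"240101", "240102", "240103"})) // YYMMDD
}
```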

### Testing
- **136 unit tests** with **91.5% code coverage**
- Comprehensive test suite ported from the TypeScript original
- End-to-end shell script tests for the MCP protocol
- Full MCP protocol compliance via stdio transport

### Extensibility
- Type-safe tool handlers with automatic JSON schema generation
- Extensible architecture for adding new tools

### Tool Count
- **14 total tools**: Read (2) + Write (8) + Import (4)

## Requirements

- Go 1.25.6 or later
- MCP-compatible client (Claude Desktop, etc.)

## Installation

```bash
# Clone or navigate to the project directory
cd /home/david/go/src/skraak_mcp

# Download dependencies
go mod download

# Build the server
go build -o skraak_mcp
```

## Usage

### Running the Server

The server communicates over stdio (standard input/output), as the MCP specification requires, and takes a DuckDB database path as its sole argument:

```bash
./skraak_mcp /path/to/database.duckdb
```

**Database Options:**
- **Production**: `./db/skraak.duckdb` - Real data, use for Claude Desktop
- **Testing**: `./db/test.duckdb` - Test data, use for development/testing

**⚠️ IMPORTANT**: Always use `test.duckdb` for testing and development to avoid corrupting production data!

Examples:
```bash
# Production use (Claude Desktop)
./skraak_mcp ./db/skraak.duckdb

# Testing (development, shell scripts)
./skraak_mcp ./db/test.duckdb
```

### Configuring with Claude Desktop

Add to your Claude Desktop MCP configuration file:

**macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
**Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
**Linux**: `~/.config/Claude/claude_desktop_config.json`

```json
{
  "mcpServers": {
    "skraak_mcp": {
      "command": "/home/david/go/src/skraak_mcp/skraak_mcp",
      "args": ["/path/to/database.duckdb"]
    }
  }
}
```

## Available Tools

### Query Tools

### execute_sql

Execute arbitrary SQL SELECT queries against the acoustic monitoring database.

**Input**:
```json
{
  "query": "SELECT id, name FROM dataset WHERE active = true",
  "parameters": [],
  "limit": 1000
}
```

**Parameters**:
- `query` (required): SQL query (must start with SELECT or WITH)
- `parameters` (optional): Array of values for ? placeholders
- `limit` (optional): Row limit (default 1000, max 10000)

**Output**:
```json
{
  "rows": [...],
  "row_count": 42,
  "columns": ["id", "name", "active"]
}
```

**Security**:
- Database is read-only (enforced by DuckDB)
- Forbidden keywords blocked: INSERT, UPDATE, DELETE, DROP, CREATE, ALTER
- Row limits prevent overwhelming responses
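The validation layer can be sketched roughly like this (an illustrative approximation, not the server's exact implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// validateQuery approximates the read-only guard described above: the
// query must start with SELECT or WITH, and no write keyword may
// appear. Note that this naive substring check would also reject a
// column named "updated_at"; a production validator tokenizes instead.
func validateQuery(q string) error {
	s := strings.ToUpper(strings.TrimSpace(q))
	if !strings.HasPrefix(s, "SELECT") && !strings.HasPrefix(s, "WITH") {
		return fmt.Errorf("query must start with SELECT or WITH")
	}
	for _, kw := range []string{"INSERT", "UPDATE", "DELETE", "DROP", "CREATE", "ALTER"} {
		if strings.Contains(s, kw) {
			return fmt.Errorf("forbidden keyword: %s", kw)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateQuery("SELECT * FROM dataset"))                 // <nil>
	fmt.Println(validateQuery("SELECT 1; DELETE FROM file"))            // forbidden keyword: DELETE
}
```

Even with validation like this, the read-only database connection remains the real safety boundary.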

**Example Queries**:
```sql
-- Basic query
SELECT * FROM dataset WHERE active = true

-- Parameterized query
SELECT * FROM location WHERE dataset_id = ? AND active = true

-- JOIN with aggregates
SELECT d.name, COUNT(l.id) as locations
FROM dataset d
LEFT JOIN location l ON d.id = l.dataset_id
WHERE d.active = true
GROUP BY d.name
```

### get_current_time

Returns the current system time with comprehensive timezone information.

**Input**: None

**Output**:
```json
{
  "time": "2024-01-25T10:30:45Z",
  "timezone": "UTC",
  "unix": 1706181045
}
```

**Fields**:
- `time`: Current system time in RFC3339 format (ISO 8601 compatible)
- `timezone`: System timezone identifier
- `unix`: Unix timestamp in seconds since epoch

### Import Tools

### import_audio_files

Batch import WAV files from a folder into the database.

**Input**:
```json
{
  "folder_path": "/absolute/path/to/recordings",
  "dataset_id": "abc123xyz789",
  "location_id": "def456uvw012",
  "cluster_id": "ghi789rst345",
  "recursive": true
}
```

**Parameters**:
- `folder_path` (required): Absolute path to folder containing WAV files
- `dataset_id` (required): Dataset ID (12 characters)
- `location_id` (required): Location ID (12 characters)
- `cluster_id` (required): Cluster ID (12 characters)
- `recursive` (optional): Scan subfolders recursively (default: true)

**Features**:
- Automatically parses AudioMoth and filename timestamps
- Calculates XXH64 hashes for deduplication
- Extracts WAV metadata (duration, sample rate, channels)
- Computes astronomical data (solar/civil night, moon phase)
- Skips duplicates (by hash)
- Imports in single transaction (all-or-nothing)

### import_file

Import a single WAV file into the database.

**Input**:
```json
{
  "file_path": "/absolute/path/to/recording.wav",
  "dataset_id": "abc123xyz789",
  "location_id": "def456uvw012",
  "cluster_id": "ghi789rst345"
}
```

**Output**:
```json
{
  "file_id": "nB3xK8pLm9qR5sT7uV2wX",
  "file_name": "recording.wav",
  "hash": "a1b2c3d4e5f6g7h8",
  "duration_seconds": 60.0,
  "sample_rate": 250000,
  "timestamp_local": "2024-01-15T20:30:00+13:00",
  "is_audiomoth": true,
  "is_duplicate": false,
  "processing_time": "250ms"
}
```

**Use Cases**:
- Import individual files with detailed feedback
- Programmatic import with known file paths
- Get immediate duplicate detection feedback

### import_ml_selections

Import ML-detected kiwi call selections from folder structure.

**Input**:
```json
{
  "folder_path": "/path/to/Clips_filter_date",
  "dataset_id": "abc123xyz789",
  "cluster_id": "def456uvw012"
}
```

**Features**:
- **Folder structure**: `Clips_{filter_name}_{date}/Species/CallType/*.wav+.png`
- **Filename parsing**: `{base}-{start}-{end}.wav` format
- **Two-pass file matching**: Exact match, then fuzzy date_time pattern match
- **Comprehensive validation**: Filter, species, call types, files, selection bounds
- **Transactional import**: All-or-nothing with error collection
- **Database relations**: selection → label (species) → label_subtype (call type)
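A sketch of parsing the `{base}-{start}-{end}.wav` convention (assumed semantics: start and end are numeric offsets, and the base may itself contain hyphens, so the last two hyphen-separated fields are taken as the bounds):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSelection splits "{base}-{start}-{end}.wav" into its parts.
// Because the base may contain hyphens, the last two hyphen-separated
// fields are treated as the start/end bounds.
func parseSelection(name string) (base string, start, end float64, err error) {
	stem := strings.TrimSuffix(name, ".wav")
	parts := strings.Split(stem, "-")
	if len(parts) < 3 {
		return "", 0, 0, fmt.Errorf("unexpected filename: %s", name)
	}
	if start, err = strconv.ParseFloat(parts[len(parts)-2], 64); err != nil {
		return "", 0, 0, err
	}
	if end, err = strconv.ParseFloat(parts[len(parts)-1], 64); err != nil {
		return "", 0, 0, err
	}
	base = strings.Join(parts[:len(parts)-2], "-")
	return base, start, end, nil
}

func main() {
	base, start, end, _ := parseSelection("20240115_203000-12.5-15.0.wav")
	fmt.Println(base, start, end) // 20240115_203000 12.5 15
}
```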

### bulk_file_import

Batch import WAV files across multiple locations/clusters using a CSV file.

**Input**:
```json
{
  "dataset_id": "abc123xyz789",
  "csv_path": "/absolute/path/to/import.csv",
  "log_file_path": "/absolute/path/to/progress.log"
}
```

**CSV Format**:
```csv
location_name,location_id,directory_path,date_range,sample_rate,file_count
Site A,loc123456789,/path/to/siteA,2024-01,48000,150
Site B,loc987654321,/path/to/siteB,2024-02,48000,200
```

**Required CSV Columns**:
- `location_name`: Human-readable location name
- `location_id`: 12-character location ID from database
- `directory_path`: Absolute path to folder containing WAV files
- `date_range`: Cluster name (typically date range like "2024-01" or "Jan-2024")
- `sample_rate`: Sample rate in Hz (e.g., 48000, 250000)
- `file_count`: Expected number of files (for validation)
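A CSV in this shape can be sketched from a directory tree like so (demo directories stand in for real recording folders, and `locXXXXXXXXX` is a placeholder: real `location_id` values must come from the database):

```shell
# Build demo site folders standing in for real recording directories.
mkdir -p demo_sites/SiteA demo_sites/SiteB
touch demo_sites/SiteA/a.wav demo_sites/SiteA/b.wav demo_sites/SiteB/c.wav

# Emit one CSV row per site; the location ID column is a placeholder.
for dir in demo_sites/*/; do
  name=$(basename "$dir")
  count=$(find "$dir" -name '*.wav' | wc -l | tr -d ' ')
  printf '%s,locXXXXXXXXX,%s,2024-01,48000,%s\n' "$name" "$dir" "$count"
done > import.csv

cat import.csv
```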

**Features**:
- **Auto-creates clusters**: Creates clusters if they don't exist for location/date_range
- **Progress logging**: Writes detailed logs to file for real-time monitoring (use `tail -f`)
- **Synchronous execution**: Processes locations sequentially, fail-fast on errors
- **Summary statistics**: Returns counts for clusters, files, duplicates, errors
- **Duplicate handling**: Skips files with duplicate hashes across all clusters

**Output**:
```json
{
  "total_locations": 10,
  "clusters_created": 5,
  "clusters_existing": 5,
  "total_files_scanned": 1500,
  "files_imported": 1200,
  "files_duplicate": 250,
  "files_error": 50,
  "processing_time": "5m30s",
  "errors": []
}
```

**Use Cases**:
- Bulk import across many locations at once
- Automated import pipelines with CSV generation
- Large-scale data migration
- Batch processing with progress monitoring

### Write Tools

### create_dataset

Create a new dataset.

**Input**:
```json
{
  "name": "My Dataset",
  "description": "Description of dataset",
  "type": "organise"
}
```

**Parameters**:
- `name` (required): Dataset name (max 255 characters)
- `description` (optional): Dataset description (max 255 characters)
- `type` (optional): Dataset type - "organise", "test", or "train" (default: "organise")

### create_location

Create a new location within a dataset.

**Input**:
```json
{
  "dataset_id": "abc123xyz789",
  "name": "Recording Site A",
  "latitude": -41.2865,
  "longitude": 174.7762,
  "timezone_id": "Pacific/Auckland",
  "description": "Forest recording site"
}
```

**Parameters**:
- `dataset_id` (required): Parent dataset ID (12 characters)
- `name` (required): Location name (max 140 characters)
- `latitude` (required): Latitude in decimal degrees (-90 to 90)
- `longitude` (required): Longitude in decimal degrees (-180 to 180)
- `timezone_id` (required): IANA timezone ID (e.g., "Pacific/Auckland")
- `description` (optional): Location description (max 255 characters)

### create_cluster

Create a new cluster within a location.

**Input**:
```json
{
  "dataset_id": "abc123xyz789",
  "location_id": "def456uvw012",
  "name": "2024-01",
  "sample_rate": 48000,
  "description": "January 2024 recordings",
  "cyclic_recording_pattern_id": "pat123456789"
}
```

**Parameters**:
- `dataset_id` (required): Parent dataset ID (12 characters)
- `location_id` (required): Parent location ID (12 characters)
- `name` (required): Cluster name (max 140 characters)
- `sample_rate` (required): Sample rate in Hz (must be positive)
- `description` (optional): Cluster description (max 255 characters)
- `cyclic_recording_pattern_id` (optional): Recording pattern ID (12 characters)

### create_cyclic_recording_pattern

Create a reusable recording pattern with record/sleep cycle.

**Input**:
```json
{
  "record_seconds": 60,
  "sleep_seconds": 540
}
```

**Parameters**:
- `record_seconds` (required): Number of seconds to record (must be positive)
- `sleep_seconds` (required): Number of seconds to sleep between recordings (must be positive)
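With the values in the example above, each cycle is 60 s of recording plus 540 s of sleep: a 10-minute cycle, giving 144 recordings per day:

```go
package main

import "fmt"

func main() {
	recordSeconds, sleepSeconds := 60, 540
	cycle := recordSeconds + sleepSeconds // 600 s = 10 minutes per cycle
	perDay := 86400 / cycle               // full cycles in a 24-hour day
	fmt.Println(cycle, perDay)            // 600 144
}
```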

**Best Practice**: Query existing patterns first with execute_sql to reuse matching patterns:
```sql
SELECT id FROM cyclic_recording_pattern
WHERE record_s = ? AND sleep_s = ? AND active = true
```

### update_dataset, update_location, update_cluster, update_pattern

Update metadata for existing entities. See tool descriptions for parameters.

## Development

### Project Structure

```
skraak_mcp/
├── go.mod                        # Go module definition
├── go.sum                        # Dependency checksums
├── main.go                       # Server entry point, tool registration
├── README.md                     # This file
├── CLAUDE.md                     # Development notes and best practices
├── db/                           # Database files
│   ├── db.go                     # Database connection (read-only)
│   ├── types.go                  # Type definitions
│   ├── schema.sql                # Database schema (348 lines)
│   ├── skraak.duckdb             # Production database ⚠️
│   └── test.duckdb               # Test database ✅
├── tools/                        # MCP tool implementations
│   ├── time.go                   # get_current_time
│   ├── sql.go                    # execute_sql
│   ├── import_files.go           # import_audio_files (batch WAV import)
│   ├── import_file.go            # import_file (single WAV file)
│   ├── import_ml_selections.go   # import_ml_selections (ML detections)
│   ├── bulk_file_import.go       # bulk_file_import (CSV-based bulk import)
│   ├── write_dataset.go          # create_dataset
│   ├── write_location.go         # create_location
│   ├── write_cluster.go          # create_cluster
│   ├── write_pattern.go          # create_cyclic_recording_pattern
│   ├── update_dataset.go         # update_dataset
│   ├── update_location.go        # update_location
│   ├── update_cluster.go         # update_cluster
│   └── update_pattern.go         # update_pattern
├── resources/                    # MCP resources
│   └── schema.go                 # Database schema resources
├── prompts/                      # MCP prompts
│   └── examples.go               # SQL workflow templates
├── utils/                        # Utility functions
│   ├── astronomical.go           # Solar/civil night, moon phase
│   ├── astronomical_test.go      # Tests (11 cases)
│   ├── audiomoth_parser.go       # AudioMoth WAV parsing
│   ├── audiomoth_parser_test.go  # Tests (36 cases)
│   ├── filename_parser.go        # Filename timestamp parsing
│   ├── filename_parser_test.go   # Tests (60 cases)
│   ├── wav_metadata.go           # WAV header parsing
│   ├── wav_metadata_test.go      # Tests (22 cases)
│   ├── xxh64.go                  # File hashing
│   └── xxh64_test.go             # Tests (6 cases)
└── shell_scripts/                # End-to-end test scripts
    ├── test_sql.sh               # SQL tool tests
    ├── test_resources_prompts.sh # Resources/prompts tests
    ├── test_all_prompts.sh       # All prompts tests
    └── get_time.sh               # Time tool test
```

### Database Schema

The acoustic monitoring database contains:

**Core Tables:**
- **dataset** - Recording projects (organise/test/train types)
- **location** - Recording sites with GPS coordinates (139 active locations)
- **cluster** - Grouped recordings at each location
- **file** - Individual audio files with metadata

**Annotation Tables:**
- **selection** - Time-frequency selections within files
- **label** - Classifications and annotations
- **kiwi_call**, **call**, **syllable** - Hierarchical call structure

**Taxonomy Tables:**
- **species**, **genus**, **family**, **order**, **class**, **phylum**, **kingdom**
- **species_group**, **family_group**, **order_group** - Grouping tables

**Key Fields:**
- Most tables have `active` boolean for soft deletes
- Timestamps use `timestamp_local` (with timezone) and may include `timestamp_utc`
- Files include astronomical data: `maybe_solar_night`, `maybe_civil_night`, `moon_phase`
- AudioMoth metadata: `recorder_id`, `gain`, `battery_v`, `temp_c`
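These fields combine directly in `execute_sql` queries. For example (column names as listed above; the convention that `moon_phase` near 0.5 means full moon follows SunCalc and is an assumption here):

```sql
-- Active files recorded during solar night near a full moon
SELECT id, timestamp_local, moon_phase
FROM file
WHERE active = true
  AND maybe_solar_night = true
  AND moon_phase BETWEEN 0.45 AND 0.55
ORDER BY timestamp_local
```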

Full schema available via `schema://full` resource.

### Adding New Tools

The server follows MCP best practices with type-safe tool handlers:

1. **Create tool file** in `tools/` package (e.g., `tools/analysis.go`)

2. **Define input/output structures** with jsonschema tags:
```go
type AnalysisInput struct {
    ClusterID string `json:"cluster_id" jsonschema:"Cluster to analyze"`
    Metric    string `json:"metric" jsonschema:"Metric to calculate"`
}

type AnalysisOutput struct {
    Value  float64 `json:"value" jsonschema:"Calculated metric value"`
    Unit   string  `json:"unit" jsonschema:"Unit of measurement"`
}
```

3. **Implement handler function**:
```go
func Analyze(ctx context.Context, req *mcp.CallToolRequest, input AnalysisInput) (
    *mcp.CallToolResult,
    AnalysisOutput,
    error,
) {
    // Implementation goes here; populate the typed output.
    output := AnalysisOutput{}
    return &mcp.CallToolResult{}, output, nil
}
```

4. **Register in main.go**:
```go
mcp.AddTool(server, &mcp.Tool{
    Name:        "analyze_cluster",
    Description: "Calculate metrics for a recording cluster",
}, tools.Analyze)
```

5. **Add tests** in `tools/analysis_test.go`

### Testing

#### Unit Tests

Comprehensive test suite with **91.5% code coverage**:

```bash
# Run all tests
go test ./...

# Run specific package tests
go test ./utils/

# Verbose output with test names
go test -v ./utils/

# Run specific test
go test -v ./utils/ -run TestParseFilenameTimestamps

# Coverage report
go test -cover ./utils/

# Generate HTML coverage report
go test -coverprofile=coverage.out ./utils/
go tool cover -html=coverage.out
```

**Test Coverage:**
- **136 unit tests** across 5 test files
- Filename parsing: Format detection, variance disambiguation, timezone handling
- AudioMoth: Comment parsing, all gain levels, temperature/battery
- WAV metadata: Duration, sample rate, INFO chunks
- Astronomical: Solar/civil night, moon phase calculations
- Edge cases: Invalid dates, leap years, DST transitions

#### End-to-End Tests

Shell scripts test the MCP protocol integration:

```bash
# Navigate to test scripts
cd shell_scripts

# ⚠️ ALWAYS use test.duckdb for testing!

# Test SQL tool (pipe to file to avoid token overflow)
./test_sql.sh ../db/test.duckdb > test.txt 2>&1
rg '"result":' test.txt | wc -l  # Count successful queries

# Test resources and prompts
./test_resources_prompts.sh ../db/test.duckdb > test_resources.txt 2>&1
jq '.' test_resources.txt

# Test all prompts
./test_all_prompts.sh ../db/test.duckdb > test_prompts.txt 2>&1

# Test time tool (no database needed)
./get_time.sh
```

**Important**: Always use `test.duckdb` for testing to avoid corrupting production data!

## Dependencies

### Core
- Go 1.25.6 or later
- [MCP Go SDK](https://github.com/modelcontextprotocol/go-sdk) v1.2.0+
- [DuckDB Go SDK](https://github.com/duckdb/duckdb-go) v2

### Utilities
- [SunCalc Go](https://github.com/sixdouglas/suncalc) - Astronomical calculations
- [XXHash](https://github.com/cespare/xxhash/v2) - Fast hashing algorithm

## Protocol Compliance

This server implements:
- MCP Protocol version: Latest
- Transport: stdio (JSON-RPC 2.0)
- Capabilities: Tools, Resources, Prompts

## License

MIT

## Contributing

Contributions welcome! Please ensure:
- Code follows Go best practices
- Tools include comprehensive descriptions
- JSON schema tags document all fields
- Error handling is robust

## Troubleshooting

**Server won't start**:
- Check Go version: `go version`
- Rebuild: `go build -o skraak_mcp`
- Check logs in stderr

**Tool not appearing in client**:
- Verify MCP configuration path
- Restart Claude Desktop
- Check server binary path is correct

**Time format issues**:
- Output uses RFC3339 (ISO 8601) format
- Timezone reflects system configuration
- Unix timestamp is in seconds (not milliseconds)