README.md

Skraak MCP Server

A production-ready Model Context Protocol (MCP) server implemented in Go that provides a generic SQL interface for an acoustic monitoring database, plus time utilities for AI assistants.

Overview

This MCP server provides AI assistants with direct SQL query access to an acoustic monitoring database containing recordings, locations, species data, and classifications. Built using the official MCP Go SDK, it follows MCP's three-primitive architecture with an LLM-friendly design focused on providing schema context rather than rigid tool APIs.

Features

Tools (Model-Controlled)

  • execute_sql: Execute arbitrary SQL SELECT queries against the database
    • Supports: SELECT, WITH (CTEs), parameterized queries (? placeholders)
    • Security: Read-only database access, forbidden keyword validation
    • Limits: Default 1000 rows (max 10000)
  • get_current_time: Returns current system time in RFC3339 format with timezone and Unix timestamp

Resources (Application-Driven)

  • schema://full: Complete database schema (348 lines)
  • schema://table/{name}: Individual table definitions
  • Provides full context for LLM to construct any query

Prompts (User-Controlled)

  • 6 SQL workflow templates teaching query patterns
  • Examples for JOINs, aggregates, filtering, hierarchical queries
  • Helps LLMs construct appropriate queries for common tasks

Architecture Benefits

  • Generic SQL > Specialized Tools: Full query flexibility instead of rigid, single-purpose APIs
  • Schema-Driven: LLMs construct queries from full database context
  • Read-Only Safety: Database enforced read-only mode + validation layers
  • Full SQL Expressiveness: JOINs, aggregates, CTEs, subqueries all available

Utility Libraries

  • Filename parsing: Smart date-format detection (YYYYMMDD, YYMMDD, DDMMYY) with variance-based disambiguation (see the sketch after this list)
  • Timezone handling: Fixed-offset strategy for DST transitions
  • AudioMoth parsing: Extract metadata from AudioMoth WAV comments
  • WAV metadata: Efficient header parsing for duration, sample rate, INFO chunks
  • Astronomical calculations: Solar/civil night detection, moon phase
  • XXH64 hashing: Fast file hashing for deduplication
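
The variance heuristic behind the date-format disambiguation: across a batch of filenames from one deployment, the year digits are nearly constant while the day digits spread out, so the 6-digit reading whose candidate year field has the lowest variance wins. A minimal sketch of the idea (hypothetical helper, not the library's actual API):

import "strconv"

// chooseFormat disambiguates 6-digit dates (YYMMDD vs DDMMYY): read
// the digits that would be the year under each hypothesis and keep
// the reading whose "year" varies least across the batch.
func chooseFormat(dates []string) string {
	variance := func(xs []float64) float64 {
		var mean, v float64
		for _, x := range xs {
			mean += x
		}
		mean /= float64(len(xs))
		for _, x := range xs {
			v += (x - mean) * (x - mean)
		}
		return v / float64(len(xs))
	}
	var asYYMMDD, asDDMMYY []float64
	for _, d := range dates { // each d is a 6-digit date string
		y1, _ := strconv.ParseFloat(d[:2], 64) // year if YYMMDD
		y2, _ := strconv.ParseFloat(d[4:], 64) // year if DDMMYY
		asYYMMDD = append(asYYMMDD, y1)
		asDDMMYY = append(asDDMMYY, y2)
	}
	if variance(asYYMMDD) <= variance(asDDMMYY) {
		return "YYMMDD"
	}
	return "DDMMYY"
}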

Testing

  • 136 unit tests with 91.5% code coverage
  • Comprehensive test suite ported from the TypeScript original
  • End-to-end shell script tests for the MCP protocol

Implementation

  • Full MCP protocol compliance via stdio transport
  • Type-safe tool handlers with automatic JSON schema generation
  • Extensible architecture for adding new tools

Tool Count

  • 14 total tools: Read (2) + Write (8) + Import (4)

Requirements

  • Go 1.25.6 or later
  • MCP-compatible client (Claude Desktop, etc.)

Installation

# Clone or navigate to the project directory
cd /home/david/go/src/skraak_mcp

# Download dependencies
go mod download

# Build the server
go build -o skraak_mcp

Usage

Running the Server

The server communicates over stdio (standard input/output), per the MCP specification, and requires a DuckDB database path as an argument:

./skraak_mcp /path/to/database.duckdb

Database Options:

  • Production: ./db/skraak.duckdb - Real data, use for Claude Desktop
  • Testing: ./db/test.duckdb - Test data, use for development/testing

⚠️ IMPORTANT: Always use test.duckdb for testing and development to avoid corrupting production data!

Examples:

# Production use (Claude Desktop)
./skraak_mcp ./db/skraak.duckdb

# Testing (development, shell scripts)
./skraak_mcp ./db/test.duckdb

Configuring with Claude Desktop

Add to your Claude Desktop MCP configuration file:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "skraak_mcp": {
      "command": "/home/david/go/src/skraak_mcp/skraak_mcp",
      "args": ["/path/to/database.duckdb"]
    }
  }
}

Available Tools

Query Tools

execute_sql

Execute arbitrary SQL SELECT queries against the acoustic monitoring database.

Input:

{
  "query": "SELECT id, name FROM dataset WHERE active = true",
  "parameters": [],
  "limit": 1000
}

Parameters:

  • query (required): SQL query (must start with SELECT or WITH)
  • parameters (optional): Array of values for ? placeholders
  • limit (optional): Row limit (default 1000, max 10000)

Output:

{
  "rows": [...],
  "row_count": 42,
  "columns": ["id", "name", "active"]
}

Security:

  • Database is read-only (enforced by DuckDB)
  • Forbidden keywords blocked: INSERT, UPDATE, DELETE, DROP, CREATE, ALTER
  • Row limits prevent overwhelming responses
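
A minimal sketch of this style of validation (illustrative, not the server's actual code):

import (
	"fmt"
	"strings"
)

var forbidden = []string{"INSERT", "UPDATE", "DELETE", "DROP", "CREATE", "ALTER"}

// validateQuery fails fast on anything that is not a SELECT/WITH
// statement or that mentions a forbidden keyword. The read-only DuckDB
// connection remains the real safety net; a production version would
// tokenize rather than substring-match to avoid false positives on
// identifiers like created_at.
func validateQuery(q string) error {
	upper := strings.ToUpper(strings.TrimSpace(q))
	if !strings.HasPrefix(upper, "SELECT") && !strings.HasPrefix(upper, "WITH") {
		return fmt.Errorf("query must start with SELECT or WITH")
	}
	for _, kw := range forbidden {
		if strings.Contains(upper, kw) {
			return fmt.Errorf("forbidden keyword: %s", kw)
		}
	}
	return nil
}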

Example Queries:

-- Basic query
SELECT * FROM dataset WHERE active = true

-- Parameterized query
SELECT * FROM location WHERE dataset_id = ? AND active = true

-- JOIN with aggregates
SELECT d.name, COUNT(l.id) as locations
FROM dataset d
LEFT JOIN location l ON d.id = l.dataset_id
WHERE d.active = true
GROUP BY d.name
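
To bind values to the ? placeholders, pass them in order through the parameters array:

{
  "query": "SELECT * FROM location WHERE dataset_id = ? AND active = true",
  "parameters": ["abc123xyz789"],
  "limit": 100
}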

get_current_time

Returns the current system time with its timezone identifier and Unix timestamp.

Input: None

Output:

{
  "time": "2024-01-25T10:30:45Z",
  "timezone": "UTC",
  "unix": 1706181045
}

Fields:

  • time: Current system time in RFC3339 format (ISO 8601 compatible)
  • timezone: System timezone identifier
  • unix: Unix timestamp in seconds since epoch

Import Tools

import_audio_files

Batch import WAV files from a folder into the database.

Input:

{
  "folder_path": "/absolute/path/to/recordings",
  "dataset_id": "abc123xyz789",
  "location_id": "def456uvw012",
  "cluster_id": "ghi789rst345",
  "recursive": true
}

Parameters:

  • folder_path (required): Absolute path to folder containing WAV files
  • dataset_id (required): Dataset ID (12 characters)
  • location_id (required): Location ID (12 characters)
  • cluster_id (required): Cluster ID (12 characters)
  • recursive (optional): Scan subfolders recursively (default: true)

Features:

  • Automatically parses AudioMoth and filename timestamps
  • Calculates XXH64 hashes for deduplication
  • Extracts WAV metadata (duration, sample rate, channels)
  • Computes astronomical data (solar/civil night, moon phase)
  • Skips duplicates (by hash)
  • Imports in single transaction (all-or-nothing)
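
A rough sketch of the per-file pipeline described above (helper names and the Location type are hypothetical, not the server's actual API):

import "database/sql"

// importBatch: hash first so duplicates are skipped cheaply, then
// parse metadata and insert; the transaction makes the batch atomic.
func importBatch(db *sql.DB, wavFiles []string, loc Location) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op once Commit succeeds

	for _, path := range wavFiles {
		hash, err := hashFileXXH64(path)
		if err != nil {
			return err // any failure aborts the whole batch
		}
		if isDuplicate(tx, hash) {
			continue // skip files already imported
		}
		meta, err := parseWAV(path) // duration, sample rate, AudioMoth comment
		if err != nil {
			return err
		}
		ts := resolveTimestamp(meta, path) // AudioMoth comment, else filename
		astro := computeAstronomy(ts, loc) // night flags, moon phase
		if err := insertFile(tx, path, hash, meta, ts, astro); err != nil {
			return err
		}
	}
	return tx.Commit() // all-or-nothing
}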

import_file

Import a single WAV file into the database.

Input:

{
  "file_path": "/absolute/path/to/recording.wav",
  "dataset_id": "abc123xyz789",
  "location_id": "def456uvw012",
  "cluster_id": "ghi789rst345"
}

Output:

{
  "file_id": "nB3xK8pLm9qR5sT7uV2wX",
  "file_name": "recording.wav",
  "hash": "a1b2c3d4e5f6g7h8",
  "duration_seconds": 60.0,
  "sample_rate": 250000,
  "timestamp_local": "2024-01-15T20:30:00+13:00",
  "is_audiomoth": true,
  "is_duplicate": false,
  "processing_time": "250ms"
}

Use Cases:

  • Import individual files with detailed feedback
  • Programmatic import with known file paths
  • Get immediate duplicate detection feedback

import_ml_selections

Import ML-detected kiwi call selections from a folder structure.

Input:

{
  "folder_path": "/path/to/Clips_filter_date",
  "dataset_id": "abc123xyz789",
  "cluster_id": "def456uvw012"
}

Features:

  • Folder structure: Clips_{filter_name}_{date}/Species/CallType/*.wav+.png
  • Filename parsing: {base}-{start}-{end}.wav format (see the sketch after this list)
  • Two-pass file matching: Exact match, then fuzzy date_time pattern match
  • Comprehensive validation: Filter, species, call types, files, selection bounds
  • Transactional import: All-or-nothing with error collection
  • Database relations: selection → label (species) → label_subtype (call type)
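
The {base}-{start}-{end}.wav convention can be unpacked with a simple pattern. A minimal sketch (hypothetical helper; assumes start/end are numeric offsets into the source recording):

import (
	"fmt"
	"regexp"
	"strconv"
)

// parseSelectionName splits "{base}-{start}-{end}.wav" into its parts.
// The greedy (.+) lets the base itself contain hyphens; only the last
// two hyphen-separated numeric fields are treated as selection bounds.
var selRe = regexp.MustCompile(`^(.+)-([\d.]+)-([\d.]+)\.wav$`)

func parseSelectionName(name string) (base string, start, end float64, err error) {
	m := selRe.FindStringSubmatch(name)
	if m == nil {
		return "", 0, 0, fmt.Errorf("unexpected selection filename: %s", name)
	}
	start, _ = strconv.ParseFloat(m[2], 64)
	end, _ = strconv.ParseFloat(m[3], 64)
	return m[1], start, end, nil
}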

bulk_file_import

Batch import WAV files across multiple locations/clusters using a CSV file.

Input:

{
  "dataset_id": "abc123xyz789",
  "csv_path": "/absolute/path/to/import.csv",
  "log_file_path": "/absolute/path/to/progress.log"
}

CSV Format:

location_name,location_id,directory_path,date_range,sample_rate,file_count
Site A,loc123456789,/path/to/siteA,2024-01,48000,150
Site B,loc987654321,/path/to/siteB,2024-02,48000,200

Required CSV Columns:

  • location_name: Human-readable location name
  • location_id: 12-character location ID from database
  • directory_path: Absolute path to folder containing WAV files
  • date_range: Cluster name (typically date range like "2024-01" or "Jan-2024")
  • sample_rate: Sample rate in Hz (e.g., 48000, 250000)
  • file_count: Expected number of files (for validation)

Features:

  • Auto-creates clusters: Creates clusters if they don't exist for location/date_range
  • Progress logging: Writes detailed logs to file for real-time monitoring (use tail -f)
  • Synchronous execution: Processes locations sequentially, fail-fast on errors
  • Summary statistics: Returns counts for clusters, files, duplicates, errors
  • Duplicate handling: Skips files with duplicate hashes across all clusters
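
Roughly, the sequential fail-fast loop with progress logging looks like this (sketch with hypothetical helpers and types, not the actual implementation):

import (
	"fmt"
	"log"
	"os"
)

// runBulkImport processes CSV rows one at a time, logging progress to
// a file so it can be followed with `tail -f`; the first error stops
// the whole run.
func runBulkImport(rows []csvRow, logPath string) error {
	f, err := os.Create(logPath)
	if err != nil {
		return err
	}
	defer f.Close()
	logger := log.New(f, "", log.LstdFlags)

	for i, row := range rows {
		logger.Printf("[%d/%d] %s: %s", i+1, len(rows), row.LocationName, row.DirectoryPath)
		cluster, created, err := findOrCreateCluster(row) // per location/date_range
		if err != nil {
			return fmt.Errorf("%s: %w", row.LocationName, err) // fail fast
		}
		if created {
			logger.Printf("  created cluster %q", row.DateRange)
		}
		stats, err := importFolder(row.DirectoryPath, cluster)
		if err != nil {
			return err
		}
		logger.Printf("  imported=%d duplicates=%d", stats.Imported, stats.Duplicates)
	}
	return nil
}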

Output:

{
  "total_locations": 10,
  "clusters_created": 5,
  "clusters_existing": 5,
  "total_files_scanned": 1500,
  "files_imported": 1200,
  "files_duplicate": 250,
  "files_error": 50,
  "processing_time": "5m30s",
  "errors": []
}

Use Cases:

  • Bulk import across many locations at once
  • Automated import pipelines with CSV generation
  • Large-scale data migration
  • Batch processing with progress monitoring

Write Tools

create_dataset

Create a new dataset.

Input:

{
  "name": "My Dataset",
  "description": "Description of dataset",
  "type": "organise"
}

Parameters:

  • name (required): Dataset name (max 255 characters)
  • description (optional): Dataset description (max 255 characters)
  • type (optional): Dataset type - "organise", "test", or "train" (default: "organise")

create_location

Create a new location within a dataset.

Input:

{
  "dataset_id": "abc123xyz789",
  "name": "Recording Site A",
  "latitude": -41.2865,
  "longitude": 174.7762,
  "timezone_id": "Pacific/Auckland",
  "description": "Forest recording site"
}

Parameters:

  • dataset_id (required): Parent dataset ID (12 characters)
  • name (required): Location name (max 140 characters)
  • latitude (required): Latitude in decimal degrees (-90 to 90)
  • longitude (required): Longitude in decimal degrees (-180 to 180)
  • timezone_id (required): IANA timezone ID (e.g., "Pacific/Auckland")
  • description (optional): Location description (max 255 characters)

create_cluster

Create a new cluster within a location.

Input:

{
  "dataset_id": "abc123xyz789",
  "location_id": "def456uvw012",
  "name": "2024-01",
  "sample_rate": 48000,
  "description": "January 2024 recordings",
  "cyclic_recording_pattern_id": "pat123456789"
}

Parameters:

  • dataset_id (required): Parent dataset ID (12 characters)
  • location_id (required): Parent location ID (12 characters)
  • name (required): Cluster name (max 140 characters)
  • sample_rate (required): Sample rate in Hz (must be positive)
  • description (optional): Cluster description (max 255 characters)
  • cyclic_recording_pattern_id (optional): Recording pattern ID (12 characters)

create_cyclic_recording_pattern

Create a reusable recording pattern with record/sleep cycle.

Input:

{
  "record_seconds": 60,
  "sleep_seconds": 540
}

Parameters:

  • record_seconds (required): Number of seconds to record (must be positive)
  • sleep_seconds (required): Number of seconds to sleep between recordings (must be positive)

Best Practice: Query existing patterns first with execute_sql to reuse matching patterns:

SELECT id FROM cyclic_recording_pattern
WHERE record_s = ? AND sleep_s = ? AND active = true
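
If a matching row comes back, reuse its id; otherwise create a new pattern. The lookup above, expressed as an execute_sql call (values taken from the example input):

{
  "query": "SELECT id FROM cyclic_recording_pattern WHERE record_s = ? AND sleep_s = ? AND active = true",
  "parameters": [60, 540]
}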

update_dataset, update_location, update_cluster, update_pattern

Update metadata for existing entities. See tool descriptions for parameters.

Development

Project Structure

skraak_mcp/
├── go.mod                        # Go module definition
├── go.sum                        # Dependency checksums
├── main.go                       # Server entry point, tool registration
├── README.md                     # This file
├── CLAUDE.md                     # Development notes and best practices
├── db/                           # Database package and data files
│   ├── db.go                     # Database connection (read-only)
│   ├── types.go                  # Type definitions
│   ├── schema.sql                # Database schema (348 lines)
│   ├── skraak.duckdb             # Production database ⚠️
│   └── test.duckdb               # Test database ✅
├── tools/                        # MCP tool implementations
│   ├── time.go                   # get_current_time
│   ├── sql.go                    # execute_sql
│   ├── import_files.go           # import_audio_files (batch WAV import)
│   ├── import_file.go            # import_file (single WAV file)
│   ├── import_ml_selections.go   # import_ml_selections (ML detections)
│   ├── bulk_file_import.go       # bulk_file_import (CSV-based bulk import)
│   ├── write_dataset.go          # create_dataset
│   ├── write_location.go         # create_location
│   ├── write_cluster.go          # create_cluster
│   ├── write_pattern.go          # create_cyclic_recording_pattern
│   ├── update_dataset.go         # update_dataset
│   ├── update_location.go        # update_location
│   ├── update_cluster.go         # update_cluster
│   └── update_pattern.go         # update_pattern
├── resources/                    # MCP resources
│   └── schema.go                 # Database schema resources
├── prompts/                      # MCP prompts
│   └── examples.go               # SQL workflow templates
├── utils/                        # Utility functions
│   ├── astronomical.go           # Solar/civil night, moon phase
│   ├── astronomical_test.go      # Tests (11 cases)
│   ├── audiomoth_parser.go       # AudioMoth WAV parsing
│   ├── audiomoth_parser_test.go  # Tests (36 cases)
│   ├── filename_parser.go        # Filename timestamp parsing
│   ├── filename_parser_test.go   # Tests (60 cases)
│   ├── wav_metadata.go           # WAV header parsing
│   ├── wav_metadata_test.go      # Tests (22 cases)
│   ├── xxh64.go                  # File hashing
│   └── xxh64_test.go             # Tests (6 cases)
└── shell_scripts/                # End-to-end test scripts
    ├── test_sql.sh               # SQL tool tests
    ├── test_resources_prompts.sh # Resources/prompts tests
    ├── test_all_prompts.sh       # All prompts tests
    └── get_time.sh               # Time tool test

Database Schema

The acoustic monitoring database contains:

Core Tables:

  • dataset - Recording projects (organise/test/train types)
  • location - Recording sites with GPS coordinates (139 active locations)
  • cluster - Grouped recordings at each location
  • file - Individual audio files with metadata

Annotation Tables:

  • selection - Time-frequency selections within files
  • label - Classifications and annotations
  • kiwi_call, call, syllable - Hierarchical call structure

Taxonomy Tables:

  • species, genus, family, order, class, phylum, kingdom
  • species_group, family_group, order_group - Grouping tables

Key Fields:

  • Most tables have active boolean for soft deletes
  • Timestamps use timestamp_local (with timezone) and may include timestamp_utc
  • Files include astronomical data: maybe_solar_night, maybe_civil_night, moon_phase
  • AudioMoth metadata: recorder_id, gain, battery_v, temp_c
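
These fields combine naturally in queries, for example (illustrative):

-- Active files recorded during civil night, most recent first
SELECT id, timestamp_local, moon_phase
FROM file
WHERE maybe_civil_night = true AND active = true
ORDER BY timestamp_local DESC
LIMIT 20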

Full schema available via schema://full resource.

Adding New Tools

The server follows MCP best practices with type-safe tool handlers:

  1. Create tool file in tools/ package (e.g., tools/analysis.go)

  2. Define input/output structures with jsonschema tags:

type AnalysisInput struct {
    ClusterID string `json:"cluster_id" jsonschema:"Cluster to analyze"`
    Metric    string `json:"metric" jsonschema:"Metric to calculate"`
}

type AnalysisOutput struct {
    Value  float64 `json:"value" jsonschema:"Calculated metric value"`
    Unit   string  `json:"unit" jsonschema:"Unit of measurement"`
}

  3. Implement handler function:

func Analyze(ctx context.Context, req *mcp.CallToolRequest, input AnalysisInput) (
    *mcp.CallToolResult,
    AnalysisOutput,
    error,
) {
    // Implementation
    return &mcp.CallToolResult{}, output, nil
}

  4. Register in main.go:

err := mcp.AddTool(
    server,
    "analyze_cluster",
    "Calculate metrics for a recording cluster",
    tools.Analyze,
)

  5. Add tests in tools/analysis_test.go

Testing

Unit Tests

Comprehensive test suite with 91.5% code coverage:

# Run all tests
go test ./...

# Run specific package tests
go test ./utils/

# Verbose output with test names
go test -v ./utils/

# Run specific test
go test -v ./utils/ -run TestParseFilenameTimestamps

# Coverage report
go test -cover ./utils/

# Generate HTML coverage report
go test -coverprofile=coverage.out ./utils/
go tool cover -html=coverage.out

Test Coverage:

  • 136 unit tests across 5 test files
  • Filename parsing: Format detection, variance disambiguation, timezone handling
  • AudioMoth: Comment parsing, all gain levels, temperature/battery
  • WAV metadata: Duration, sample rate, INFO chunks
  • Astronomical: Solar/civil night, moon phase calculations
  • Edge cases: Invalid dates, leap years, DST transitions

End-to-End Tests

Shell scripts test the MCP protocol integration:

# Navigate to test scripts
cd shell_scripts

# ⚠️ ALWAYS use test.duckdb for testing!

# Test SQL tool (pipe to file to avoid token overflow)
./test_sql.sh ../db/test.duckdb > test.txt 2>&1
rg '"result":' test.txt | wc -l  # Count successful queries

# Test resources and prompts
./test_resources_prompts.sh ../db/test.duckdb > test_resources.txt 2>&1
cat test_resources.txt | jq '.'

# Test all prompts
./test_all_prompts.sh ../db/test.duckdb > test_prompts.txt 2>&1

# Test time tool (no database needed)
./get_time.sh

Important: Always use test.duckdb for testing to avoid corrupting production data!

Dependencies

Core

  • Official MCP Go SDK - protocol implementation and stdio transport
  • DuckDB Go driver - read-only database access

Utilities

  • XXH64 hashing package - fast file hashing for deduplication

Protocol Compliance

This server implements:

  • MCP Protocol version: latest supported by the official MCP Go SDK
  • Transport: stdio (JSON-RPC 2.0)
  • Capabilities: Tools, Resources, Prompts

License

MIT

Contributing

Contributions welcome! Please ensure:

  • Code follows Go best practices
  • Tools include comprehensive descriptions
  • JSON schema tags document all fields
  • Error handling is robust

Troubleshooting

Server won't start:

  • Check Go version: go version
  • Rebuild: go build -o skraak_mcp
  • Check logs in stderr

Tool not appearing in client:

  • Verify MCP configuration path
  • Restart Claude Desktop
  • Check server binary path is correct

Time format issues:

  • Output uses RFC3339 (ISO 8601) format
  • Timezone reflects system configuration
  • Unix timestamp is in seconds (not milliseconds)
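
For reference, these values match what the Go standard library produces (illustrative):

package main

import (
	"fmt"
	"time"
)

func main() {
	now := time.Now()
	fmt.Println(now.Format(time.RFC3339)) // e.g. 2024-01-25T10:30:45+13:00
	fmt.Println(now.Unix())               // seconds since epoch, not milliseconds
}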