## Import Tools
### import_audio_files
Batch import WAV files from a folder into the database.
**Input**:
```json
{
"folder_path": "/absolute/path/to/recordings",
"dataset_id": "abc123xyz789",
"location_id": "def456uvw012",
"cluster_id": "ghi789rst345",
"recursive": true
}
```
**Parameters**:
- `folder_path` (required): Absolute path to folder containing WAV files
- `dataset_id` (required): Dataset ID (12 characters)
- `location_id` (required): Location ID (12 characters)
- `cluster_id` (required): Cluster ID (12 characters)
- `recursive` (optional): Scan subfolders recursively (default: true)
**Features**:
- Automatically parses timestamps from AudioMoth and other filename formats
- Calculates XXH64 hashes for deduplication
- Extracts WAV metadata (duration, sample rate, channels)
- Computes astronomical data (solar/civil night, moon phase)
- Skips duplicates (by hash)
- Imports in single transaction (all-or-nothing)
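As an illustration of the timestamp-parsing step, here is a minimal sketch that handles the standard AudioMoth filename convention (`YYYYMMDD_HHMMSS.WAV`). This is an assumption about the importer's behaviour, not its actual code; the real parser likely handles more formats.

```python
from datetime import datetime
from pathlib import Path

def parse_audiomoth_timestamp(filename):
    """Parse a timestamp from an AudioMoth-style filename.

    Recent AudioMoth firmware names files YYYYMMDD_HHMMSS.WAV.
    Returns None when the name does not follow that convention.
    (Sketch only; the importer's actual rules may differ.)
    """
    stem = Path(filename).stem
    try:
        return datetime.strptime(stem, "%Y%m%d_%H%M%S")
    except ValueError:
        return None
```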
### import_file
Import a single WAV file into the database.
**Input**:
```json
{
"file_path": "/absolute/path/to/recording.wav",
"dataset_id": "abc123xyz789",
"location_id": "def456uvw012",
"cluster_id": "ghi789rst345"
}
```
**Output**:
```json
{
"file_id": "nB3xK8pLm9qR5sT7uV2wX",
"file_name": "recording.wav",
"hash": "a1b2c3d4e5f60718",
"duration_seconds": 60.0,
"sample_rate": 250000,
"timestamp_local": "2024-01-15T20:30:00+13:00",
"is_audiomoth": true,
"is_duplicate": false,
"processing_time": "250ms"
}
```
**Use Cases**:
- Import individual files with detailed feedback
- Programmatic import with known file paths
- Get immediate duplicate detection feedback
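The skip-by-hash behaviour can be sketched as follows. The real importer computes XXH64 digests; SHA-256 stands in here only because `xxhash` is not in the Python standard library.

```python
import hashlib

def file_hash(path):
    """Hash a file's bytes in chunks. Stand-in for the importer's XXH64."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

def is_duplicate(path, seen_hashes):
    """Mimic the importer's duplicate detection: a file whose digest has
    already been seen is skipped rather than imported again."""
    digest = file_hash(path)
    if digest in seen_hashes:
        return True
    seen_hashes.add(digest)
    return False
```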
### import_ml_selections
Import ML-detected kiwi call selections from a structured clip folder.
**Input**:
```json
{
"folder_path": "/path/to/Clips_filter_date",
"dataset_id": "abc123xyz789",
"cluster_id": "def456uvw012"
}
```
**Features**:
- **Folder structure**: `Clips_{filter_name}_{date}/Species/CallType/*.wav+.png`
- **Filename parsing**: `{base}-{start}-{end}.wav` format
- **Two-pass file matching**: Exact match, then fuzzy date_time pattern match
- **Comprehensive validation**: Filter, species, call types, files, selection bounds
- **Transactional import**: All-or-nothing with error collection
- **Database relations**: selection → label (species) → label_subtype (call type)
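Assuming `start` and `end` in the `{base}-{start}-{end}.wav` pattern are numeric selection bounds (the exact semantics and units are not stated here), a minimal filename parser might look like:

```python
import re

# Hypothetical interpretation of {base}-{start}-{end}.wav, e.g.
# "20240115_203000-12.5-18.0.wav"; the importer's exact rules may differ.
SELECTION_RE = re.compile(
    r"^(?P<base>.+)-(?P<start>\d+(?:\.\d+)?)-(?P<end>\d+(?:\.\d+)?)\.wav$"
)

def parse_selection_name(filename):
    """Return (base, start, end) or None for a non-matching name."""
    m = SELECTION_RE.match(filename)
    if not m:
        return None
    start, end = float(m.group("start")), float(m.group("end"))
    if end <= start:
        return None  # selection bounds must be increasing
    return m.group("base"), start, end
```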
### bulk_file_import
Batch import WAV files across multiple locations/clusters using a CSV file.
**Input**:
```json
{
"dataset_id": "abc123xyz789",
"csv_path": "/absolute/path/to/import.csv",
"log_file_path": "/absolute/path/to/progress.log"
}
```
**CSV Format**:
```csv
location_name,location_id,directory_path,date_range,sample_rate,file_count
Site A,loc123456789,/path/to/siteA,2024-01,48000,150
Site B,loc987654321,/path/to/siteB,2024-02,48000,200
```
**Required CSV Columns**:
- `location_name`: Human-readable location name
- `location_id`: 12-character location ID from database
- `directory_path`: Absolute path to folder containing WAV files
- `date_range`: Cluster name (typically a date range like "2024-01" or "Jan-2024")
- `sample_rate`: Sample rate in Hz (e.g., 48000, 250000)
- `file_count`: Expected number of files (for validation)
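A pre-flight check of the CSV can catch problems before a long import run. This sketch validates the required columns and compares each directory's WAV count against `file_count`; it is illustrative only, since `bulk_file_import` performs its own validation.

```python
import csv
from pathlib import Path

REQUIRED = {"location_name", "location_id", "directory_path",
            "date_range", "sample_rate", "file_count"}

def validate_import_csv(csv_path):
    """Return a list of problems found in a bulk-import CSV (empty if OK)."""
    problems = []
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        missing = REQUIRED - set(reader.fieldnames or [])
        if missing:
            return [f"missing columns: {sorted(missing)}"]
        for row in reader:
            folder = Path(row["directory_path"])
            if not folder.is_dir():
                problems.append(f"{row['location_name']}: no such folder {folder}")
                continue
            # Count top-level .wav files, case-insensitively.
            found = sum(1 for p in folder.iterdir() if p.suffix.lower() == ".wav")
            if found != int(row["file_count"]):
                problems.append(
                    f"{row['location_name']}: expected "
                    f"{row['file_count']} files, found {found}")
    return problems
```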
**Features**:
- **Auto-creates clusters**: Creates clusters if they don't exist for location/date_range
- **Progress logging**: Writes detailed logs to file for real-time monitoring (use `tail -f`)
- **Synchronous execution**: Processes locations sequentially, failing fast on errors
- **Summary statistics**: Returns counts for clusters, files, duplicates, errors
- **Duplicate handling**: Skips files with duplicate hashes across all clusters
**Output**:
```json
{
"total_locations": 10,
"clusters_created": 5,
"clusters_existing": 5,
"total_files_scanned": 1500,
"files_imported": 1200,
"files_duplicate": 250,
"files_error": 50,
"processing_time": "5m30s",
"errors": []
}
```
**Use Cases**:
- Bulk import across many locations at once
- Automated import pipelines with CSV generation
- Large-scale data migration
- Batch processing with progress monitoring
## Write Tools
### create_dataset
Create a new dataset.
**Input**:
```json
{
"name": "My Dataset",
"description": "Description of dataset",
"type": "organise"
}
```
**Parameters**:
- `name` (required): Dataset name (max 255 characters)
- `description` (optional): Dataset description (max 255 characters)
- `type` (optional): Dataset type - "organise", "test", or "train" (default: "organise")
### create_location
Create a new location within a dataset.
**Input**:
```json
{
"dataset_id": "abc123xyz789",
"name": "Recording Site A",
"latitude": -41.2865,
"longitude": 174.7762,
"timezone_id": "Pacific/Auckland",
"description": "Forest recording site"
}
```
**Parameters**:
- `dataset_id` (required): Parent dataset ID (12 characters)
- `name` (required): Location name (max 140 characters)
- `latitude` (required): Latitude in decimal degrees (-90 to 90)
- `longitude` (required): Longitude in decimal degrees (-180 to 180)
- `timezone_id` (required): IANA timezone ID (e.g., "Pacific/Auckland")
- `description` (optional): Location description (max 255 characters)
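The documented constraints can be checked client-side before calling the tool. This sketch mirrors the latitude/longitude ranges above and uses the standard-library `zoneinfo` module (Python 3.9+) to verify the IANA timezone ID; it is a convenience, not the tool's own validation.

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

def validate_location(latitude, longitude, timezone_id):
    """Raise ValueError if the create_location inputs are out of range."""
    if not -90 <= latitude <= 90:
        raise ValueError(f"latitude out of range: {latitude}")
    if not -180 <= longitude <= 180:
        raise ValueError(f"longitude out of range: {longitude}")
    try:
        ZoneInfo(timezone_id)  # accepts IANA IDs like "Pacific/Auckland"
    except ZoneInfoNotFoundError:
        raise ValueError(f"unknown IANA timezone: {timezone_id}")
```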
### create_cluster
Create a new cluster within a location.
**Input**:
```json
{
"dataset_id": "abc123xyz789",
"location_id": "def456uvw012",
"name": "2024-01",
"sample_rate": 48000,
"description": "January 2024 recordings",
"cyclic_recording_pattern_id": "pat123456789"
}
```
**Parameters**:
- `dataset_id` (required): Parent dataset ID (12 characters)
- `location_id` (required): Parent location ID (12 characters)
- `name` (required): Cluster name (max 140 characters)
- `sample_rate` (required): Sample rate in Hz (must be positive)
- `description` (optional): Cluster description (max 255 characters)
- `cyclic_recording_pattern_id` (optional): Recording pattern ID (12 characters)
### create_cyclic_recording_pattern
Create a reusable recording pattern with record/sleep cycle.
**Input**:
```json
{
"record_seconds": 60,
"sleep_seconds": 540
}
```
**Parameters**:
- `record_seconds` (required): Number of seconds to record (must be positive)
- `sleep_seconds` (required): Number of seconds to sleep between recordings (must be positive)
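The example above (record 60 s, sleep 540 s) gives a 600-second cycle: a 10% duty cycle, 144 recordings per day, and 8,640 s (2.4 h) of audio per day. A small helper makes the arithmetic explicit:

```python
def duty_cycle_stats(record_seconds, sleep_seconds):
    """Derive daily schedule figures from a cyclic recording pattern."""
    cycle = record_seconds + sleep_seconds
    cycles_per_day = 86_400 // cycle            # whole cycles in 24 h
    recorded_per_day = cycles_per_day * record_seconds
    duty = record_seconds / cycle
    return cycles_per_day, recorded_per_day, duty
```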
**Best Practice**: Query existing patterns first with execute_sql to reuse matching patterns:
```sql
SELECT id FROM cyclic_recording_pattern
WHERE record_s = ? AND sleep_s = ? AND active = true
```
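The query-then-create flow can be sketched against a local SQLite handle. The schema here (integer primary key, `active` as 0/1) is an assumption for the sketch; in the actual workflow you would run the SELECT through `execute_sql` and fall back to `create_cyclic_recording_pattern` when no row matches.

```python
import sqlite3

def find_or_create_pattern(conn, record_s, sleep_s):
    """Reuse an existing active pattern if one matches; otherwise create it.
    Sketch against SQLite with an assumed schema, not the server's real API."""
    row = conn.execute(
        "SELECT id FROM cyclic_recording_pattern "
        "WHERE record_s = ? AND sleep_s = ? AND active = 1",
        (record_s, sleep_s),
    ).fetchone()
    if row:
        return row[0]
    cur = conn.execute(
        "INSERT INTO cyclic_recording_pattern (record_s, sleep_s, active) "
        "VALUES (?, ?, 1)",
        (record_s, sleep_s),
    )
    conn.commit()
    return cur.lastrowid
```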
### update_dataset, update_location, update_cluster, update_pattern
Update metadata for existing entities. See tool descriptions for parameters.