
# 📸 Image Storage System Guide

This guide covers the long-term image storage solution integrated into HomeBase. It provides reliable, organized storage for images pulled from web services with built-in corruption detection and ML-friendly tagging.

## Features

- **Automatic Image Fetching**: Pull images from web services at a configurable interval (originally every 2.5 minutes; sub-minute intervals are supported)
- **Corruption Detection**: SHA256 checksums verify data integrity
- **Tag-Based Organization**: Tag images for machine learning model training
- **Efficient Storage**: File-based storage with SQLite metadata database
- **RESTful API**: Complete API for image management and queries
- **Scalable**: Designed to handle thousands of images

## Architecture

```
┌─────────────────────────────────────────────────────┐
│           Image Fetcher Service                     │
│    (Runs at the configured interval, pulls images)  │
└────────┬────────────────────────────────────────────┘
         │
         ├─────┬──────────────┬──────────────┐
         │     │              │              │
         ▼     ▼              ▼              ▼
    Download  Hash        Store File    Insert Metadata
    Image     (SHA256)     (data/images) (SQLite)
         │
         └──────────────────────────────┘
                      │
            ┌─────────▼──────────┐
            │  REST API Endpoints │
            │  - List Images      │
            │  - Add Tags         │
            │  - Search/Filter    │
            │  - Verify Integrity │
            └────────────────────┘
```

## Quick Start

### 1. Install Dependencies

```bash
npm install
```

### 2. Initialize the System

```bash
node setup.js init
```

This will:
- Create the SQLite database
- Set up data directory structure
- Load configuration
- Test the system

### 3. Configure Image Sources

Edit `image-sources.json`:

```json
{
  "sources": [
    {
      "name": "Webcam Feed",
      "url": "https://your-service.com/image.jpg",
      "tags": ["webcam", "monitoring"],
      "enabled": true
    }
  ],
  "fetchInterval": 0.033
}
```

**Fetch Intervals** (`fetchInterval` is given in minutes; the decimal values below are approximate):
- `0.0167` = 1 second
- `0.033` = 2 seconds (recommended for fast updates)
- `0.05` = 3 seconds
- `2.5` = 2.5 minutes (original default)
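
Because `fetchInterval` is expressed in minutes, any scheduler built on JavaScript timers would have to convert it to milliseconds first. The sketch below shows that arithmetic; `intervalToMs` is a hypothetical helper name used for illustration, not a documented HomeBase function.

```javascript
// Hypothetical helper (not part of HomeBase's documented API):
// convert a fetchInterval given in minutes into the millisecond
// delay that setInterval/setTimeout expect.
function intervalToMs(fetchIntervalMinutes) {
  if (!Number.isFinite(fetchIntervalMinutes) || fetchIntervalMinutes <= 0) {
    throw new RangeError("fetchInterval must be a positive number of minutes");
  }
  // Round to whole milliseconds to absorb floating-point noise.
  return Math.round(fetchIntervalMinutes * 60 * 1000);
}

console.log(intervalToMs(0.033)); // 1980 (about 2 seconds)
console.log(intervalToMs(2.5));   // 150000 (2.5 minutes)
```

Note that `0.033` works out to 1.98 seconds rather than exactly 2, which is why the listed values are best read as approximations.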

### 4. Start the Server

```bash
npm start
```

The system will automatically start fetching images at the configured interval.

## Configuration

### image-sources.json

```json
{
  "sources": [
    {
      "name": "Example Source",
      "url": "https://example.com/image.jpg",
      "tags": ["tag1", "tag2"],
      "enabled": true
    }
  ],
  "fetchInterval": 0.033
}
```

**Options:**
- `fetchInterval`: **Minutes** between fetch cycles. Use decimals for sub-minute intervals:
  - `0.0167` = 1 second
  - `0.033` = 2 seconds
  - `0.05` = 3 seconds
  - `0.167` = 10 seconds
  - `1` = 1 minute
  - `2.5` = 2.5 minutes (original default)
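
As an illustration of how a file in this shape could be checked before use, here is a minimal sketch. The function name `parseSourcesConfig` and the specific validation rules are assumptions made for this example; the actual loader in HomeBase may behave differently.

```javascript
// Hypothetical loader for the image-sources.json shape shown above.
// The validation rules here are illustrative assumptions.
function parseSourcesConfig(jsonText) {
  const config = JSON.parse(jsonText);
  if (!Array.isArray(config.sources)) {
    throw new TypeError('"sources" must be an array');
  }
  if (typeof config.fetchInterval !== "number" || config.fetchInterval <= 0) {
    throw new RangeError('"fetchInterval" must be a positive number of minutes');
  }
  // Keep only sources that are enabled and have a syntactically valid URL.
  const enabledSources = config.sources.filter((source) => {
    if (source.enabled !== true) return false;
    try {
      new URL(source.url); // throws on malformed URLs
      return true;
    } catch {
      return false;
    }
  });
  return { enabledSources, fetchInterval: config.fetchInterval };
}

const exampleConfig = JSON.stringify({
  sources: [
    { name: "Webcam Feed", url: "https://example.com/image.jpg", tags: ["webcam"], enabled: true },
    { name: "Paused Feed", url: "https://example.com/other.jpg", tags: [], enabled: false },
  ],
  fetchInterval: 0.033,
});

// Logs the names of the enabled sources only.
console.log(parseSourcesConfig(exampleConfig).enabledSources.map((s) => s.name));
```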

## API Endpoints

### List Images

```bash
GET /api/images?page=1&pageSize=50&sort=fetched_at&order=DESC
GET /api/images?tag=webcam              # Filter by tag
GET /api/images?sourceUrl=https://...  # Filter by source
```

**Response:**
```json
{
  "success": true,
  "images": [
    {
      "id": 1,
      "filename": "image_1234567890_abc123.jpg",
      "source_url": "https://...",
      "filesize": 102400,
      "file_hash": "sha256hash",
      "fetched_at": "2023-01-01T12:00:00Z",
      "tags": ["webcam", "monitoring"]
    }
  ],
  "pagination": {
    "page": 1,
    "pageSize": 50,
    "total": 240,
    "pages": 5
  }
}
```
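
For client code, the query string and the `pages` value in the response can be derived as in the sketch below. The helper names `buildListUrl` and `pageCount` are hypothetical, and the base URL `http://localhost:3001` is taken from the curl examples in this guide.

```javascript
// Hypothetical client-side helpers for the List Images endpoint.
// The default base URL is an assumption based on the curl examples in this guide.
function buildListUrl(options = {}, base = "http://localhost:3001") {
  const url = new URL("/api/images", base);
  for (const [key, value] of Object.entries(options)) {
    // Skip unset options so they do not appear as "undefined" in the URL.
    if (value !== undefined && value !== null) {
      url.searchParams.set(key, String(value));
    }
  }
  return url.toString();
}

// Number of pages implied by the pagination block: ceil(total / pageSize).
function pageCount(total, pageSize) {
  return Math.ceil(total / pageSize);
}

console.log(buildListUrl({ tag: "webcam", page: 2, pageSize: 50 }));
// http://localhost:3001/api/images?tag=webcam&page=2&pageSize=50
console.log(pageCount(240, 50)); // 5
```

`URLSearchParams` takes care of percent-encoding, which matters when filtering by `sourceUrl`.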

### Get Image Details

```bash
GET /api/images/{id}
```

### Download Image

```bash
GET /api/images/{id}/download
```

### Fetch New Image

```bash
POST /api/images
Content-Type: application/json

{
  "source_url": "https://example.com/image.jpg",
  "tags": ["tag1", "tag2"]
}
```

### Add Tags to Image

```bash
POST /api/images/{id}/tags
Content-Type: application/json

{
  "tags": ["newtag1", "newtag2"]
}
```

### List All Tags

```bash
GET /api/tags
```

**Response:**
```json
{
  "success": true,
  "tags": ["webcam", "monitoring", "test", "example"]
}
```

### Storage Statistics

```bash
GET /api/stats
```

**Response:**
```json
{
  "success": true,
  "stats": {
    "imageCount": 240,
    "totalSize": 24576000,
    "totalSizeGB": "0.02",
    "fileCount": 240
  }
}
```
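
The sample numbers above are consistent with 240 images of 102400 bytes each. The sketch below shows one way such figures could be derived from image records. `summarizeStorage` is a hypothetical name, and treating a gigabyte as 1024³ bytes is an assumption; the guide does not say which definition the server uses.

```javascript
// Assumption: binary gigabytes (1024^3 bytes). The server may use 10^9 instead.
const BYTES_PER_GB = 1024 ** 3;

// Hypothetical helper: derive stats fields from a list of image records
// that each carry a `filesize` in bytes.
function summarizeStorage(images) {
  const totalSize = images.reduce((sum, image) => sum + image.filesize, 0);
  return {
    imageCount: images.length,
    totalSize,
    totalSizeGB: (totalSize / BYTES_PER_GB).toFixed(2), // string, as in the response
  };
}

// 240 images of 102400 bytes each, matching the sample response.
const sampleImages = Array.from({ length: 240 }, () => ({ filesize: 102400 }));
console.log(summarizeStorage(sampleImages));
```

The `fileCount` field is omitted here because it presumably counts files on disk rather than database rows.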

### Verify Image Integrity

```bash
POST /api/verify
```

This checks all images for corruption using their stored checksums.

### Cleanup Old Images

```bash
POST /api/cleanup
Content-Type: application/json

{
  "daysOld": 30
}
```
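
To make the meaning of `daysOld` concrete, the sketch below computes a cutoff timestamp and selects the records older than it. It assumes that age is measured from `fetched_at` and that the comparison is strict; both are guesses about the endpoint's behavior, and the function names are hypothetical.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: the instant before which an image counts as "old".
function cleanupCutoff(daysOld, now = new Date()) {
  if (!Number.isInteger(daysOld) || daysOld < 0) {
    throw new RangeError("daysOld must be a non-negative integer");
  }
  return new Date(now.getTime() - daysOld * MS_PER_DAY);
}

// Assumption: age is measured from the image's fetched_at timestamp.
function selectExpired(images, daysOld, now = new Date()) {
  const cutoff = cleanupCutoff(daysOld, now);
  return images.filter((image) => new Date(image.fetched_at) < cutoff);
}

const now = new Date("2023-02-15T00:00:00Z");
const images = [
  { id: 1, fetched_at: "2023-01-01T12:00:00Z" }, // older than 30 days
  { id: 2, fetched_at: "2023-02-01T12:00:00Z" }, // newer than 30 days
];
console.log(selectExpired(images, 30, now).map((image) => image.id)); // [ 1 ]
```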

### Delete Image

```bash
DELETE /api/images/{id}
```

## Corruption Detection

The system uses SHA256 checksums to detect corruption:

1. **Storage**: When an image is saved, its SHA256 hash is calculated and stored
2. **Verification**: The `/api/verify` endpoint re-hashes all files and compares with stored hashes
3. **Marking**: Corrupted images are flagged in the database (`is_corrupted`) and excluded from regular queries; list them with `?corrupted=true`
4. **Recovery**: Corrupted files can be re-fetched using the source URL
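
The hash-and-compare steps above can be sketched with Node's built-in `crypto` module. The function names `sha256Hex` and `isCorrupted` are hypothetical; this illustrates the technique and is not the code in `lib/storage.js`.

```javascript
const crypto = require("node:crypto");

// Compute the SHA256 digest of a buffer as a lowercase hex string.
function sha256Hex(buffer) {
  return crypto.createHash("sha256").update(buffer).digest("hex");
}

// Hypothetical check: a file is considered corrupted when its current
// hash no longer matches the hash stored at save time.
function isCorrupted(fileBuffer, storedHash) {
  return sha256Hex(fileBuffer) !== storedHash;
}

const original = Buffer.from("example image bytes");
const storedHash = sha256Hex(original); // step 1: hash at storage time

const damaged = Buffer.from(original);  // copy, then simulate bit rot
damaged[0] ^= 0xff;

console.log(isCorrupted(original, storedHash)); // false
console.log(isCorrupted(damaged, storedHash));  // true
```

For large files, streaming the file through `hash.update()` in chunks avoids loading the whole image into memory.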

### Manual Verification

```bash
curl -X POST http://localhost:3001/api/verify
```

## Tagging for ML Training

Tags are essential for organizing training datasets:

```bash
# Add images with training tags
POST /api/images
{
  "source_url": "https://...",
  "tags": ["dataset_v1", "labeled", "weather-sunny"]
}

# Query all images with specific tag
GET /api/images?tag=weather-sunny

# List all tags in use
GET /api/tags
```

## Database Schema

### Images Table

| Column | Type | Description |
|--------|------|-------------|
| id | INTEGER | Primary key |
| filename | TEXT | Unique filename |
| source_url | TEXT | Original image URL |
| file_path | TEXT | Local file path |
| filesize | INTEGER | File size in bytes |
| file_hash | TEXT | SHA256 hash |
| mime_type | TEXT | Content type |
| fetched_at | DATETIME | When image was fetched |
| is_corrupted | BOOLEAN | Corruption flag |

### Tags Table

| Column | Type | Description |
|--------|------|-------------|
| id | INTEGER | Primary key |
| image_id | INTEGER | Foreign key to images |
| tag | TEXT | Tag text |
| created_at | DATETIME | When tag was added |
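
Since tags live in their own table, producing the `tags` array seen in API responses means grouping tag rows by `image_id`. The sketch below does that in plain JavaScript on already-fetched rows; `attachTags` is a hypothetical name, and the real implementation may do this grouping in SQL instead.

```javascript
// Hypothetical helper: combine rows from the images and tags tables
// into the response shape shown under "List Images".
function attachTags(imageRows, tagRows) {
  const tagsByImage = new Map();
  for (const { image_id, tag } of tagRows) {
    if (!tagsByImage.has(image_id)) tagsByImage.set(image_id, []);
    tagsByImage.get(image_id).push(tag);
  }
  // Images without any tag rows get an empty array.
  return imageRows.map((image) => ({
    ...image,
    tags: tagsByImage.get(image.id) ?? [],
  }));
}

const imageRows = [
  { id: 1, filename: "image_1234567890_abc123.jpg" },
  { id: 2, filename: "image_1234567891_def456.jpg" },
];
const tagRows = [
  { id: 1, image_id: 1, tag: "webcam" },
  { id: 2, image_id: 1, tag: "monitoring" },
];
console.log(attachTags(imageRows, tagRows));
```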

## File Structure

```
homebase/
├── server.js              # Main server
├── package.json           # Dependencies
├── setup.js              # Setup script
├── image-sources.json    # Configuration
├── lib/
│   ├── database.js       # SQLite operations
│   ├── storage.js        # File storage operations
│   └── image-fetcher.js  # Image fetching service
├── routes/
│   └── images.js         # API routes
└── data/                 # Created at runtime
    ├── homebase.db       # SQLite database
    └── images/           # Stored image files
```

## Maintenance

### Manual Setup

```bash
# Initialize system
node setup.js init

# Test fetch
node setup.js test https://example.com/image.jpg tag1,tag2

# View configuration
node setup.js config
```

### Regular Tasks

```bash
# Daily: Verify integrity
curl -X POST http://localhost:3001/api/verify

# Weekly: Cleanup old images (30+ days)
curl -X POST http://localhost:3001/api/cleanup -H "Content-Type: application/json" -d '{"daysOld": 30}'

# Monitor storage
curl http://localhost:3001/api/stats
```

## Performance Tips

1. **Pagination**: Use `pageSize=50` or less for large datasets
2. **Tagging**: Use consistent tag names for easier filtering
3. **Cleanup**: Run cleanup weekly to manage storage
4. **Verification**: Run verification on a regular schedule (the maintenance tasks above use daily) to detect issues early
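
One way to keep tag names consistent is to normalize them on the client before sending them to the API. The rules in this sketch (trim, lowercase, hyphens for whitespace, no duplicates) are an illustrative convention, not something HomeBase is documented to enforce, and `normalizeTags` is a hypothetical name.

```javascript
// Hypothetical client-side normalization applied before POSTing tags.
function normalizeTags(tags) {
  const seen = new Set();
  for (const raw of tags) {
    const tag = String(raw).trim().toLowerCase().replace(/\s+/g, "-");
    if (tag !== "") seen.add(tag); // a Set drops duplicates, keeping first-seen order
  }
  return [...seen];
}

console.log(normalizeTags([" Weather Sunny", "weather-sunny", "Webcam"]));
// [ 'weather-sunny', 'webcam' ]
```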

## Troubleshooting

### No images fetching?

1. Check `image-sources.json` and make sure the source has `"enabled": true`
2. Verify that the source URL is accessible
3. Check server logs for errors
4. Test fetch: `node setup.js test https://url.jpg`

### High storage usage?

```bash
# Check statistics
curl http://localhost:3001/api/stats

# Cleanup images older than 7 days
curl -X POST http://localhost:3001/api/cleanup \
  -H "Content-Type: application/json" \
  -d '{"daysOld": 7}'
```

### Corrupted files detected?

```bash
# Verify all images
curl -X POST http://localhost:3001/api/verify

# Get list of corrupted images
curl 'http://localhost:3001/api/images?corrupted=true'

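# Re-fetch if the source is still available (reuse the image's original source_url)
curl -X POST http://localhost:3001/api/images \
  -H "Content-Type: application/json" \
  -d '{"source_url": "https://example.com/image.jpg"}'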
```

## Advanced Usage

### Export Dataset for ML Training

```bash
# Get images with a specific tag (one page of up to 999; page through results for larger sets)
curl 'http://localhost:3001/api/images?tag=weather-sunny&pageSize=999' \
  -o dataset.json
```
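
When a tag covers more images than fit on one page, the export has to walk the pagination. The sketch below collects every page using the `pagination.pages` field from the List Images response. `collectAllImages` and `fetchTagPage` are hypothetical names, the base URL comes from the curl examples in this guide, and the built-in `fetch` requires Node 18 or later.

```javascript
// Collect the images from every page. `fetchPage(page)` is any async
// function that resolves to the List Images response shape.
async function collectAllImages(fetchPage) {
  const all = [];
  let page = 1;
  let pages = 1;
  do {
    const body = await fetchPage(page);
    all.push(...body.images);
    pages = body.pagination.pages; // total page count reported by the server
    page += 1;
  } while (page <= pages);
  return all;
}

// Hypothetical fetchPage factory backed by the built-in fetch (Node 18+).
// The base URL is an assumption taken from this guide's curl examples.
const fetchTagPage = (tag, pageSize = 50) => async (page) => {
  const url =
    "http://localhost:3001/api/images" +
    `?tag=${encodeURIComponent(tag)}&page=${page}&pageSize=${pageSize}`;
  const response = await fetch(url);
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  return response.json();
};

// Usage (requires a running server):
// collectAllImages(fetchTagPage("weather-sunny")).then((images) => console.log(images.length));
```

Passing the page fetcher in as a function keeps the loop testable without a live server.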

### Monitor Fetch Status

```bash
curl http://localhost:3001/api/fetcher/status
```

### Batch Operations

```bash
# Add tags to multiple images (via API loop)
for image_id in 1 2 3 4 5; do
  curl -X POST http://localhost:3001/api/images/$image_id/tags \
    -H "Content-Type: application/json" \
    -d '{"tags": ["batch-import"]}'
done
```

## Security Considerations

1. **Database Access**: SQLite database is file-based; protect with filesystem permissions
2. **Image Storage**: Protect `data/images` directory from unauthorized access
3. **API Security**: Consider adding authentication for production use
4. **File Validation**: The system validates MIME types and file sizes
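
As a rough picture of what MIME type and file size validation can look like, consider the sketch below. The allowed types, the 10 MiB limit, and the name `validateDownload` are all assumptions for illustration; this guide does not state which values HomeBase actually enforces.

```javascript
// Assumed allow-list and size limit; the real values are not documented here.
const ALLOWED_MIME_TYPES = new Set(["image/jpeg", "image/png", "image/gif", "image/webp"]);
const MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024; // 10 MiB, hypothetical

// Hypothetical check run on a downloaded file before it is stored.
function validateDownload(mimeType, sizeBytes) {
  // Drop parameters such as "; charset=binary" and normalize case.
  const type = String(mimeType).split(";")[0].trim().toLowerCase();
  if (!ALLOWED_MIME_TYPES.has(type)) {
    return { ok: false, reason: `unsupported type: ${type}` };
  }
  if (!Number.isInteger(sizeBytes) || sizeBytes <= 0) {
    return { ok: false, reason: "empty or invalid size" };
  }
  if (sizeBytes > MAX_FILE_SIZE_BYTES) {
    return { ok: false, reason: "file too large" };
  }
  return { ok: true };
}

console.log(validateDownload("image/jpeg", 102400)); // { ok: true }
console.log(validateDownload("text/html", 2048).ok); // false
```

Checking the `Content-Type` header alone trusts the remote server, so a stricter variant would also inspect the file's leading bytes.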

## Performance Metrics

- **Fetch Time**: ~1-5 seconds per image (network dependent)
- **Database Queries**: <100ms for typical queries
- **Verification**: ~50ms per image
- **Storage**: ~1KB overhead per image in database

---

For more information, check the API documentation or run `node setup.js help`.