HomeBase/IMAGE_STORAGE_GUIDE.md
Spencer 9c72b00b1b
feat: Implement image fetching and storage system
- Add image-fetcher module for downloading and saving images from various sources.
- Create storage module for managing image files, including downloading, verifying integrity, and cleaning up orphaned files.
- Develop gallery HTML page for displaying images with sorting and filtering options.
- Set up RESTful API routes for image management, including fetching, adding tags, and deleting images.
- Introduce setup script for initializing the database and configuring image sources.
- Implement a batch process for verifying image integrity and cleaning up old images.
- Add setup batch script for easy installation and configuration of the image storage system.
2026-02-12 13:13:36 -05:00

📸 Image Storage System Guide

This guide covers the long-term image storage solution integrated into HomeBase. It provides reliable, organized storage for images pulled from web services with built-in corruption detection and ML-friendly tagging.

Features

  • Automatic Image Fetching: Pull images from web services at a configurable interval (2.5 minutes was the original default; sub-minute intervals are supported)
  • Corruption Detection: SHA256 checksums verify data integrity
  • Tag-Based Organization: Tag images for machine learning model training
  • Efficient Storage: File-based storage with SQLite metadata database
  • RESTful API: Complete API for image management and queries
  • Scalable: Designed to handle thousands of images

Architecture

┌─────────────────────────────────────────────────────┐
│           Image Fetcher Service                     │
│    (Runs at the configured interval, pulls images)  │
└────────┬────────────────────────────────────────────┘
         │
         ├─────┬──────────────┬──────────────┐
         │     │              │              │
         ▼     ▼              ▼              ▼
    Download  Hash        Store File    Insert Metadata
    Image     (SHA256)     (data/images) (SQLite)
         │                              
         └──────────────────────────────┘
                      │
            ┌─────────▼──────────┐
            │  REST API Endpoints │
            │  - List Images      │
            │  - Add Tags         │
            │  - Search/Filter    │
            │  - Verify Integrity │
            └────────────────────┘

Quick Start

1. Install Dependencies

npm install

2. Initialize the System

node setup.js init

This will:

  • Create the SQLite database
  • Set up data directory structure
  • Load configuration
  • Test the system

3. Configure Image Sources

Edit image-sources.json:

{
  "sources": [
    {
      "name": "Webcam Feed",
      "url": "https://your-service.com/image.jpg",
      "tags": ["webcam", "monitoring"],
      "enabled": true
    }
  ],
  "fetchInterval": 0.033
}

Fetch Intervals (fetchInterval is expressed in minutes; edit it to customize):

  • 0.0167 = 1 second
  • 0.033 = 2 seconds (recommended for fast updates)
  • 0.05 = 3 seconds
  • 2.5 = 2.5 minutes (original default)
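
To get a feel for what a sub-minute interval means for storage, the arithmetic can be sketched as below. `estimateDailyLoad` is a hypothetical helper, not part of HomeBase. It assumes a single enabled source, one image saved per fetch cycle, and an average size of 102400 bytes (the sample filesize used elsewhere in this guide); real numbers will differ.

```javascript
// Hypothetical helper (not part of HomeBase): rough daily volume for ONE source.
// fetchIntervalMinutes uses the guide's unit: minutes between fetch cycles.
function estimateDailyLoad(fetchIntervalMinutes, avgImageBytes) {
  const minutesPerDay = 24 * 60;
  // Math.round absorbs floating-point noise from decimal intervals such as 0.05.
  const imagesPerDay = Math.round(minutesPerDay / fetchIntervalMinutes);
  const bytesPerDay = imagesPerDay * avgImageBytes;
  return { imagesPerDay, bytesPerDay };
}

// Assumed average size: 102400 bytes, taken from the sample API response.
const everyThreeSeconds = estimateDailyLoad(0.05, 102400);
const originalDefault = estimateDailyLoad(2.5, 102400);

console.log(everyThreeSeconds.imagesPerDay); // 28800
console.log(originalDefault.imagesPerDay);   // 576
```

Under these assumptions a 3-second interval stores 50 times as many images per day as the 2.5-minute default, so the cleanup schedule may need to be tightened accordingly.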

4. Start the Server

npm start

The system will automatically start fetching images at the configured interval.

Configuration

image-sources.json

{
  "sources": [
    {
      "name": "Example Source",
      "url": "https://example.com/image.jpg",
      "tags": ["tag1", "tag2"],
      "enabled": true
    }
  ],
  "fetchInterval": 0.033
}

Options:

  • fetchInterval: Minutes between fetch cycles. Use decimals for sub-minute intervals:
    • 0.0167 = 1 second
    • 0.033 = 2 seconds
    • 0.05 = 3 seconds
    • 0.167 = 10 seconds
    • 1 = 1 minute
    • 2.5 = 2.5 minutes (original default)
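
Because fetchInterval is in minutes, the decimal values above are simply seconds divided by 60. A minimal sketch of the conversion follows; both function names are hypothetical and are not taken from the HomeBase source.

```javascript
// Hypothetical helpers (names are assumptions, not HomeBase APIs).

// Convert a desired interval in seconds to the fetchInterval value (minutes).
function secondsToFetchInterval(seconds) {
  return seconds / 60;
}

// Convert a fetchInterval value (minutes) to milliseconds, e.g. for a timer.
function fetchIntervalToMs(minutes) {
  return Math.round(minutes * 60 * 1000);
}

console.log(secondsToFetchInterval(10)); // value to put in image-sources.json for 10 seconds
console.log(fetchIntervalToMs(2.5));     // 150000
console.log(fetchIntervalToMs(0.033));   // 1980
```

Note that the rounded values in the list are approximations: 0.033 minutes is 1.98 seconds rather than exactly 2.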

API Endpoints

List Images

GET /api/images?page=1&pageSize=50&sort=fetched_at&order=DESC
GET /api/images?tag=webcam              # Filter by tag
GET /api/images?sourceUrl=https://...  # Filter by source

Response:

{
  "success": true,
  "images": [
    {
      "id": 1,
      "filename": "image_1234567890_abc123.jpg",
      "source_url": "https://...",
      "filesize": 102400,
      "file_hash": "sha256hash",
      "fetched_at": "2023-01-01T12:00:00Z",
      "tags": ["webcam", "monitoring"]
    }
  ],
  "pagination": {
    "page": 1,
    "pageSize": 50,
    "total": 240,
    "pages": 5
  }
}
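
The query parameters and pagination fields above can be exercised with a small client-side sketch. `buildListUrl` and `pageCount` are hypothetical helpers, and the assumption that pages equals total divided by pageSize, rounded up, is inferred from the sample response (240 images, 50 per page, 5 pages) rather than confirmed by the server code.

```javascript
// Hypothetical client helpers (not part of HomeBase).

// Build a List Images URL; values are URL-encoded by URLSearchParams.
function buildListUrl(baseUrl, options = {}) {
  const url = new URL('/api/images', baseUrl);
  for (const [key, value] of Object.entries(options)) {
    if (value !== undefined && value !== null) {
      url.searchParams.set(key, String(value));
    }
  }
  return url.toString();
}

// Assumed relationship between "total", "pageSize" and "pages".
function pageCount(total, pageSize) {
  return Math.ceil(total / pageSize);
}

console.log(buildListUrl('http://localhost:3001', { page: 2, pageSize: 50, tag: 'webcam' }));
// http://localhost:3001/api/images?page=2&pageSize=50&tag=webcam
console.log(pageCount(240, 50)); // 5
```

Building the URL this way matters most for the sourceUrl filter, whose value contains characters such as `:` and `/` that need encoding.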

Get Image Details

GET /api/images/{id}

Download Image

GET /api/images/{id}/download

Fetch New Image

POST /api/images
Content-Type: application/json

{
  "source_url": "https://example.com/image.jpg",
  "tags": ["tag1", "tag2"]
}

Add Tags to Image

POST /api/images/{id}/tags
Content-Type: application/json

{
  "tags": ["newtag1", "newtag2"]
}

List All Tags

GET /api/tags

Response:

{
  "success": true,
  "tags": ["webcam", "monitoring", "test", "example"]
}

Storage Statistics

GET /api/stats

Response:

{
  "success": true,
  "stats": {
    "imageCount": 240,
    "totalSize": 24576000,
    "totalSizeGB": "0.02",
    "fileCount": 240
  }
}
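
The totalSizeGB string in the sample is consistent with dividing totalSize by 1024³ and keeping two decimals. The sketch below reproduces that; the binary divisor and the function name `bytesToGB` are assumptions, since the guide does not say how the server computes the value.

```javascript
// Hypothetical helper: format a byte count the way the sample response appears to.
// Assumes binary gigabytes (1 GB = 1024^3 bytes); the server may differ.
function bytesToGB(totalSizeBytes) {
  return (totalSizeBytes / 1024 ** 3).toFixed(2);
}

console.log(bytesToGB(24576000)); // "0.02"
```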

Verify Image Integrity

POST /api/verify

This checks all images for corruption using their stored checksums.

Cleanup Old Images

POST /api/cleanup
Content-Type: application/json

{
  "daysOld": 30
}
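
One plausible reading of daysOld is "delete images whose fetched_at is earlier than now minus daysOld days". The sketch below illustrates that reading only; `cleanupCutoff` and `isExpired` are hypothetical names, and the actual server-side rule may differ.

```javascript
// Hypothetical sketch of how daysOld could translate into a cutoff timestamp.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function cleanupCutoff(daysOld, now = new Date()) {
  return new Date(now.getTime() - daysOld * MS_PER_DAY);
}

// An image is treated as expired if it was fetched before the cutoff.
function isExpired(image, daysOld, now = new Date()) {
  return new Date(image.fetched_at) < cleanupCutoff(daysOld, now);
}

const now = new Date('2023-02-15T00:00:00Z');
const image = { id: 1, fetched_at: '2023-01-01T12:00:00Z' };

console.log(cleanupCutoff(30, now).toISOString()); // 2023-01-16T00:00:00.000Z
console.log(isExpired(image, 30, now));            // true
console.log(isExpired(image, 60, now));            // false
```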

Delete Image

DELETE /api/images/{id}

Corruption Detection

The system uses SHA256 checksums to detect corruption:

  1. Storage: When an image is saved, its SHA256 hash is calculated and stored
  2. Verification: The /api/verify endpoint re-hashes all files and compares with stored hashes
  3. Marking: Corrupted images are marked in the database and excluded from default queries (list them with the corrupted=true filter)
  4. Recovery: Corrupted files can be re-fetched using the source URL
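
The hash-and-compare steps above can be sketched with Node's built-in crypto module. This is a minimal illustration, not the code in lib/storage.js; the function names `sha256Hex` and `isCorrupted` are assumptions.

```javascript
const crypto = require('crypto');

// Compute the SHA256 hex digest of a file's contents (passed as a Buffer).
function sha256Hex(buffer) {
  return crypto.createHash('sha256').update(buffer).digest('hex');
}

// A file counts as corrupted when its current hash no longer matches the stored one.
function isCorrupted(buffer, storedHash) {
  return sha256Hex(buffer) !== storedHash;
}

// Step 1: hash at save time (in-memory bytes stand in for an image file).
const original = Buffer.from('abc');
const storedHash = sha256Hex(original);

// Step 2: re-hash later and compare.
console.log(isCorrupted(original, storedHash));           // false
console.log(isCorrupted(Buffer.from('abd'), storedHash)); // true
```

In a real verification pass the buffer would come from reading the file at file_path, and a mismatch would set is_corrupted on the row.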

Manual Verification

curl -X POST http://localhost:3001/api/verify

Tagging for ML Training

Tags are essential for organizing training datasets:

# Add images with training tags
POST /api/images
{
  "source_url": "https://...",
  "tags": ["dataset_v1", "labeled", "weather-sunny"]
}

# Query all images with specific tag
GET /api/images?tag=weather-sunny

# Get tag statistics
GET /api/tags
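
When assembling a training set it can help to see how many images carry each tag. The sketch below counts tags over an array shaped like the "images" field of the List Images response; `countByTag` is a hypothetical client-side helper, and the sample data is made up for illustration.

```javascript
// Hypothetical helper: count how many images carry each tag.
function countByTag(images) {
  const counts = {};
  for (const image of images) {
    for (const tag of image.tags || []) {
      counts[tag] = (counts[tag] || 0) + 1;
    }
  }
  return counts;
}

// Illustrative data in the shape of the List Images response.
const sampleImages = [
  { id: 1, tags: ['dataset_v1', 'labeled', 'weather-sunny'] },
  { id: 2, tags: ['dataset_v1', 'weather-sunny'] },
  { id: 3, tags: ['dataset_v1'] },
];

console.log(countByTag(sampleImages));
```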

Database Schema

Images Table

| Column       | Type     | Description            |
|--------------|----------|------------------------|
| id           | INTEGER  | Primary key            |
| filename     | TEXT     | Unique filename        |
| source_url   | TEXT     | Original image URL     |
| file_path    | TEXT     | Local file path        |
| filesize     | INTEGER  | File size in bytes     |
| file_hash    | TEXT     | SHA256 hash            |
| mime_type    | TEXT     | Content type           |
| fetched_at   | DATETIME | When image was fetched |
| is_corrupted | BOOLEAN  | Corruption flag        |

Tags Table

| Column     | Type     | Description           |
|------------|----------|-----------------------|
| id         | INTEGER  | Primary key           |
| image_id   | INTEGER  | Foreign key to images |
| tag        | TEXT     | Tag text              |
| created_at | DATETIME | When tag was added    |
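
The two tables form a one-to-many relationship: each image row can have several tag rows, linked by image_id. The sketch below shows, in plain JavaScript with made-up rows, how such rows could be combined into the "tags" array seen in API responses. `attachTags` is a hypothetical function; the real implementation in lib/database.js may well do this in SQL instead.

```javascript
// Illustrative rows shaped like the Images and Tags tables (sample data only).
const imageRows = [
  { id: 1, filename: 'image_1.jpg', is_corrupted: 0 },
  { id: 2, filename: 'image_2.jpg', is_corrupted: 0 },
];
const tagRows = [
  { id: 10, image_id: 1, tag: 'webcam' },
  { id: 11, image_id: 1, tag: 'monitoring' },
  { id: 12, image_id: 2, tag: 'webcam' },
];

// Group tag rows by image_id, then attach each group to its image.
function attachTags(images, tags) {
  const tagsByImage = new Map();
  for (const row of tags) {
    if (!tagsByImage.has(row.image_id)) tagsByImage.set(row.image_id, []);
    tagsByImage.get(row.image_id).push(row.tag);
  }
  return images.map((image) => ({ ...image, tags: tagsByImage.get(image.id) || [] }));
}

const joined = attachTags(imageRows, tagRows);
console.log(joined[0].tags); // [ 'webcam', 'monitoring' ]
```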

File Structure

homebase/
├── server.js              # Main server
├── package.json           # Dependencies
├── setup.js              # Setup script
├── image-sources.json    # Configuration
├── lib/
│   ├── database.js       # SQLite operations
│   ├── storage.js        # File storage operations
│   └── image-fetcher.js  # Image fetching service
├── routes/
│   └── images.js         # API routes
└── data/                 # Created at runtime
    ├── homebase.db       # SQLite database
    └── images/           # Stored image files

Maintenance

Manual Setup

# Initialize system
node setup.js init

# Test fetch
node setup.js test https://example.com/image.jpg tag1,tag2

# View configuration
node setup.js config

Regular Tasks

# Daily: Verify integrity
curl -X POST http://localhost:3001/api/verify

# Weekly: Cleanup old images (30+ days)
curl -X POST http://localhost:3001/api/cleanup -H "Content-Type: application/json" -d '{"daysOld": 30}'

# Monitor storage
curl http://localhost:3001/api/stats

Performance Tips

  1. Pagination: Use pageSize=50 or less for large datasets
  2. Tagging: Use consistent tag names for easier filtering
  3. Cleanup: Run cleanup weekly to manage storage
  4. Verification: Run verification on a regular schedule (daily, as listed under Regular Tasks) to detect issues early

Troubleshooting

No images fetching?

  1. Check image-sources.json - ensure enabled: true
  2. Verify URL is accessible
  3. Check server logs for errors
  4. Test fetch: node setup.js test https://url.jpg
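
The first two checks can be automated with a quick sanity pass over the parsed image-sources.json. This is a sketch under the assumption that the file has exactly the shape shown in the Configuration section; `findSourceProblems` is a hypothetical helper and does not check whether the URL is reachable.

```javascript
// Hypothetical helper: report obvious problems in a parsed image-sources.json object.
function findSourceProblems(config) {
  const problems = [];
  const sources = Array.isArray(config.sources) ? config.sources : [];
  if (sources.length === 0) problems.push('no sources defined');

  sources.forEach((source, index) => {
    const label = source.name || `source #${index}`;
    if (source.enabled !== true) problems.push(`${label}: not enabled`);
    try {
      // new URL() throws on malformed or missing URLs.
      const { protocol } = new URL(source.url);
      if (protocol !== 'http:' && protocol !== 'https:') {
        problems.push(`${label}: unsupported protocol ${protocol}`);
      }
    } catch {
      problems.push(`${label}: invalid url`);
    }
  });

  if (!(typeof config.fetchInterval === 'number' && config.fetchInterval > 0)) {
    problems.push('fetchInterval must be a positive number');
  }
  return problems;
}

const exampleConfig = {
  sources: [
    { name: 'Webcam Feed', url: 'https://your-service.com/image.jpg', tags: ['webcam'], enabled: true },
    { name: 'Disabled Feed', url: 'not a url', tags: [], enabled: false },
  ],
  fetchInterval: 0.033,
};

console.log(findSourceProblems(exampleConfig));
```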

High storage usage?

# Check statistics
curl http://localhost:3001/api/stats

# Cleanup images older than 7 days
curl -X POST http://localhost:3001/api/cleanup \
  -H "Content-Type: application/json" \
  -d '{"daysOld": 7}'

Corrupted files detected?

# Verify all images
curl -X POST http://localhost:3001/api/verify

# Get list of corrupted images
curl 'http://localhost:3001/api/images?corrupted=true'

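# Re-fetch if the source is still available (replace the URL with the image's original source_url)
curl -X POST http://localhost:3001/api/images \
  -H "Content-Type: application/json" \
  -d '{"source_url": "https://example.com/image.jpg"}'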

Advanced Usage

Export Dataset for ML Training

# Get all images with specific tag
curl 'http://localhost:3001/api/images?tag=weather-sunny&pageSize=999' \
  -o dataset.json
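
The single request above relies on a large pageSize, which sits uneasily with the advice under Performance Tips to keep pageSize at 50 or less. An alternative is to walk the pages using the "pagination" object from the List Images response. The sketch below assumes that response shape and Node 18+ (for the global fetch); `fetchAllImages` is a hypothetical helper, and the fetch function is injectable so the loop can be tried without a running server.

```javascript
// Hypothetical helper: collect every image for a tag by following pagination.
// fetchFn defaults to the global fetch (Node 18+); pass a stub to test offline.
async function fetchAllImages(baseUrl, tag, fetchFn = fetch, pageSize = 50) {
  const all = [];
  let page = 1;
  let pages = 1;
  do {
    const url = `${baseUrl}/api/images?tag=${encodeURIComponent(tag)}&page=${page}&pageSize=${pageSize}`;
    const response = await fetchFn(url);
    const body = await response.json();
    if (!body.success) throw new Error(`request failed for page ${page}`);
    all.push(...body.images);
    pages = body.pagination.pages; // assumed field, as in the sample response
    page += 1;
  } while (page <= pages);
  return all;
}

// Usage against a running server (not executed here):
// fetchAllImages('http://localhost:3001', 'weather-sunny')
//   .then((images) => console.log(`exported ${images.length} images`));
```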

Monitor Fetch Status

curl http://localhost:3001/api/fetcher/status

Batch Operations

# Add tags to multiple images (via API loop)
for image_id in 1 2 3 4 5; do
  curl -X POST http://localhost:3001/api/images/$image_id/tags \
    -H "Content-Type: application/json" \
    -d '{"tags": ["batch-import"]}'
done

Security Considerations

  1. Database Access: SQLite database is file-based; protect with filesystem permissions
  2. Image Storage: Protect data/images directory from unauthorized access
  3. API Security: Consider adding authentication for production use
  4. File Validation: System validates MIME types and file sizes

Performance Metrics

  • Fetch Time: ~1-5 seconds per image (network dependent)
  • Database Queries: <100ms for typical queries
  • Verification: ~50ms per image
  • Storage: ~1KB overhead per image in database

For more information, check the API documentation or run node setup.js help.