# 📸 Image Storage System Guide

This guide covers the long-term image storage solution integrated into HomeBase. It provides reliable, organized storage for images pulled from web services, with built-in corruption detection and ML-friendly tagging.

## Features

- **Automatic Image Fetching**: Pull images from web services on a configurable interval (originally every 2-3 minutes; sub-minute intervals are supported)
- **Corruption Detection**: SHA256 checksums verify data integrity
- **Tag-Based Organization**: Tag images for machine learning model training
- **Efficient Storage**: File-based storage with a SQLite metadata database
- **RESTful API**: Complete API for image management and queries
- **Scalable**: Designed to handle thousands of images

## Architecture

```
┌─────────────────────────────────────────────────────┐
│                Image Fetcher Service                │
│   (Runs every 2-3 minutes, pull from web service)   │
└────────┬────────────────────────────────────────────┘
         │
         ├─────┬──────────────┬──────────────┐
         │     │              │              │
         ▼     ▼              ▼              ▼
     Download Hash       Store File   Insert Metadata
      Image (SHA256)    (data/images)    (SQLite)
         │
         └──────────────────────────────┘
                   │
         ┌─────────▼──────────┐
         │ REST API Endpoints │
         │ - List Images      │
         │ - Add Tags         │
         │ - Search/Filter    │
         │ - Verify Integrity │
         └────────────────────┘
```

## Quick Start

### 1. Install Dependencies

```bash
npm install
```

### 2. Initialize the System

```bash
node setup.js init
```

This will:

- Create the SQLite database
- Set up the data directory structure
- Load the configuration
- Test the system

### 3. Configure Image Sources

Edit `image-sources.json`:

```json
{
  "sources": [
    {
      "name": "Webcam Feed",
      "url": "https://your-service.com/image.jpg",
      "tags": ["webcam", "monitoring"],
      "enabled": true
    }
  ],
  "fetchInterval": 0.033
}
```

**Fetch Intervals** (edit `fetchInterval` to customize; values are in minutes):

- `0.0167` = 1 second
- `0.033` = 2 seconds (recommended for fast updates)
- `0.05` = 3 seconds
- `2.5` = 2.5 minutes (original default)

### 4. Start the Server

```bash
npm start
```

The system will automatically start fetching images at the configured interval.

## Configuration

### image-sources.json

```json
{
  "sources": [
    {
      "name": "Example Source",
      "url": "https://example.com/image.jpg",
      "tags": ["tag1", "tag2"],
      "enabled": true
    }
  ],
  "fetchInterval": 0.033
}
```

**Options:**

- `fetchInterval`: **Minutes** between fetch cycles. Use decimals for sub-minute intervals:
  - `0.0167` = 1 second
  - `0.033` = 2 seconds
  - `0.05` = 3 seconds
  - `0.167` = 10 seconds
  - `1` = 1 minute
  - `2.5` = 2.5 minutes (original default)

## API Endpoints

### List Images

```bash
GET /api/images?page=1&pageSize=50&sort=fetched_at&order=DESC
GET /api/images?tag=webcam                 # Filter by tag
GET /api/images?sourceUrl=https://...      # Filter by source
```

**Response:**

```json
{
  "success": true,
  "images": [
    {
      "id": 1,
      "filename": "image_1234567890_abc123.jpg",
      "source_url": "https://...",
      "filesize": 102400,
      "file_hash": "sha256hash",
      "fetched_at": "2023-01-01T12:00:00Z",
      "tags": ["webcam", "monitoring"]
    }
  ],
  "pagination": {
    "page": 1,
    "pageSize": 50,
    "total": 240,
    "pages": 5
  }
}
```

### Get Image Details

```bash
GET /api/images/{id}
```

### Download Image

```bash
GET /api/images/{id}/download
```

### Fetch New Image

```bash
POST /api/images
Content-Type: application/json

{
  "source_url": "https://example.com/image.jpg",
  "tags": ["tag1", "tag2"]
}
```

### Add Tags to Image

```bash
POST /api/images/{id}/tags
Content-Type: application/json

{
  "tags": ["newtag1", "newtag2"]
}
```

### List All Tags

```bash
GET /api/tags
```

**Response:**

```json
{
  "success": true,
  "tags": ["webcam", "monitoring", "test", "example"]
}
```

### Storage Statistics

```bash
GET /api/stats
```

**Response:**

```json
{
  "success": true,
  "stats": {
    "imageCount": 240,
    "totalSize": 24576000,
    "totalSizeGB": "0.02",
    "fileCount": 240
  }
}
```

### Verify Image Integrity

```bash
POST /api/verify
```

This checks all images for corruption using their stored checksums.
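To illustrate what a checksum check of this kind involves, here is a minimal Node.js sketch. It is not the HomeBase implementation: `sha256Hex` and `verifyImages` are hypothetical helper names, and the record fields (`id`, `file_path`, `file_hash`) are assumed to mirror the columns of the images table. Treating an unreadable file as corrupted is also an assumption of this sketch.

```javascript
const crypto = require('crypto');
const fs = require('fs');

// Return the SHA256 digest of a Buffer as a lowercase hex string.
function sha256Hex(buffer) {
  return crypto.createHash('sha256').update(buffer).digest('hex');
}

// Re-hash each file on disk and compare the result with the stored hash.
// `records` is a hypothetical array of { id, file_path, file_hash } rows.
function verifyImages(records) {
  return records.map((record) => {
    let actualHash = null;
    try {
      actualHash = sha256Hex(fs.readFileSync(record.file_path));
    } catch (err) {
      // Assumption: a missing or unreadable file counts as corrupted.
    }
    return { id: record.id, is_corrupted: actualHash !== record.file_hash };
  });
}
```

Calling `verifyImages(rows)` with rows read from the metadata database would return one `{ id, is_corrupted }` entry per image, which could then be used to update the corruption flag. For very large files, a streaming hash (`fs.createReadStream` piped into the hash object) would avoid loading each image into memory at once.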
### Cleanup Old Images

```bash
POST /api/cleanup
Content-Type: application/json

{
  "daysOld": 30
}
```

### Delete Image

```bash
DELETE /api/images/{id}
```

## Corruption Detection

The system uses SHA256 checksums to detect corruption:

1. **Storage**: When an image is saved, its SHA256 hash is calculated and stored
2. **Verification**: The `/api/verify` endpoint re-hashes all files and compares the results with the stored hashes
3. **Marking**: Corrupted images are marked in the database and excluded from regular queries
4. **Recovery**: Corrupted files can be re-fetched using the source URL

### Manual Verification

```bash
curl -X POST http://localhost:3001/api/verify
```

## Tagging for ML Training

Tags are essential for organizing training datasets:

```bash
# Add images with training tags
POST /api/images
{
  "source_url": "https://...",
  "tags": ["dataset_v1", "labeled", "weather-sunny"]
}

# Query all images with specific tag
GET /api/images?tag=weather-sunny

# Get tag statistics
GET /api/tags
```

## Database Schema

### Images Table

| Column | Type | Description |
|--------|------|-------------|
| id | INTEGER | Primary key |
| filename | TEXT | Unique filename |
| source_url | TEXT | Original image URL |
| file_path | TEXT | Local file path |
| filesize | INTEGER | File size in bytes |
| file_hash | TEXT | SHA256 hash |
| mime_type | TEXT | Content type |
| fetched_at | DATETIME | When image was fetched |
| is_corrupted | BOOLEAN | Corruption flag |

### Tags Table

| Column | Type | Description |
|--------|------|-------------|
| id | INTEGER | Primary key |
| image_id | INTEGER | Foreign key to images |
| tag | TEXT | Tag text |
| created_at | DATETIME | When tag was added |

## File Structure

```
homebase/
├── server.js              # Main server
├── package.json           # Dependencies
├── setup.js               # Setup script
├── image-sources.json     # Configuration
├── lib/
│   ├── database.js        # SQLite operations
│   ├── storage.js         # File storage operations
│   └── image-fetcher.js   # Image fetching service
├── routes/
│   └── images.js          # API routes
└── data/                  # Created at runtime
    ├── homebase.db        # SQLite database
    └── images/            # Stored image files
```

## Maintenance

### Manual Setup

```bash
# Initialize system
node setup.js init

# Test fetch
node setup.js test https://example.com/image.jpg tag1,tag2

# View configuration
node setup.js config
```

### Regular Tasks

```bash
# Daily: Verify integrity
curl -X POST http://localhost:3001/api/verify

# Weekly: Cleanup old images (30+ days)
curl -X POST http://localhost:3001/api/cleanup \
  -H "Content-Type: application/json" \
  -d '{"daysOld": 30}'

# Monitor storage
curl http://localhost:3001/api/stats
```

## Performance Tips

1. **Pagination**: Use `pageSize=50` or less for large datasets
2. **Tagging**: Use consistent tag names for easier filtering
3. **Cleanup**: Run cleanup weekly to manage storage
4. **Verification**: Run verification regularly (at least monthly; the Regular Tasks above run it daily) to detect issues early

## Troubleshooting

### No images fetching?

1. Check `image-sources.json` and ensure the source has `enabled: true`
2. Verify that the URL is accessible
3. Check the server logs for errors
4. Test fetch: `node setup.js test https://url.jpg`

### High storage usage?

```bash
# Check statistics
curl http://localhost:3001/api/stats

# Cleanup images older than 7 days
curl -X POST http://localhost:3001/api/cleanup \
  -H "Content-Type: application/json" \
  -d '{"daysOld": 7}'
```

### Corrupted files detected?

```bash
# Verify all images
curl -X POST http://localhost:3001/api/verify

# Get list of corrupted images
curl 'http://localhost:3001/api/images?corrupted=true'

# Re-fetch if the source is still available (use the image's original source_url)
curl -X POST http://localhost:3001/api/images \
  -H "Content-Type: application/json" \
  -d '{"source_url": "https://example.com/image.jpg"}'
```

## Advanced Usage

### Export Dataset for ML Training

```bash
# Get all images with specific tag
curl 'http://localhost:3001/api/images?tag=weather-sunny&pageSize=999' \
  -o dataset.json
```

### Monitor Fetch Status

```bash
curl http://localhost:3001/api/fetcher/status
```

### Batch Operations

```bash
# Add tags to multiple images (via API loop)
for image_id in 1 2 3 4 5; do
  curl -X POST "http://localhost:3001/api/images/$image_id/tags" \
    -H "Content-Type: application/json" \
    -d '{"tags": ["batch-import"]}'
done
```

## Security Considerations

1. **Database Access**: The SQLite database is file-based; protect it with filesystem permissions
2. **Image Storage**: Protect the `data/images` directory from unauthorized access
3. **API Security**: Consider adding authentication for production use
4. **File Validation**: The system validates MIME types and file sizes

## Performance Metrics

- **Fetch Time**: ~1-5 seconds per image (network dependent)
- **Database Queries**: <100ms for typical queries
- **Verification**: ~50ms per image
- **Storage**: ~1KB overhead per image in database

---

For more information, check the API documentation or run `node setup.js help`.
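
As a follow-up to the Export Dataset example under Advanced Usage: rather than requesting one very large page (`pageSize=999`), an export can walk the paginated endpoint in smaller pages. The sketch below only builds the page URLs. `buildExportUrls` and its `baseUrl` argument are hypothetical names introduced for illustration, while `tag`, `page`, and `pageSize` are the query parameters documented under List Images. The `total` value is assumed to come from the `pagination.total` field of a first response.

```javascript
// Build the list of page URLs needed to export every image with a given tag.
// Hypothetical helper: only the query parameter names come from the API docs.
function buildExportUrls(baseUrl, tag, total, pageSize = 50) {
  const pages = Math.ceil(total / pageSize);
  const urls = [];
  for (let page = 1; page <= pages; page += 1) {
    const url = new URL('/api/images', baseUrl);
    url.searchParams.set('tag', tag);
    url.searchParams.set('page', String(page));
    url.searchParams.set('pageSize', String(pageSize));
    urls.push(url.toString());
  }
  return urls;
}

// 240 matching images at 50 per page need 5 requests.
const urls = buildExportUrls('http://localhost:3001', 'weather-sunny', 240);
console.log(urls.length); // 5
console.log(urls[0]); // http://localhost:3001/api/images?tag=weather-sunny&page=1&pageSize=50
```

Each URL can then be fetched in turn, for example with `curl`, and the `images` arrays from the responses concatenated into one dataset file.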