HomeBase/IMAGE_STORAGE_GUIDE.md
Spencer 9c72b00b1b
feat: Implement image fetching and storage system
- Add image-fetcher module for downloading and saving images from various sources.
- Create storage module for managing image files, including downloading, verifying integrity, and cleaning up orphaned files.
- Develop gallery HTML page for displaying images with sorting and filtering options.
- Set up RESTful API routes for image management, including fetching, adding tags, and deleting images.
- Introduce setup script for initializing the database and configuring image sources.
- Implement a batch process for verifying image integrity and cleaning up old images.
- Add setup batch script for easy installation and configuration of the image storage system.
2026-02-12 13:13:36 -05:00


# 📸 Image Storage System Guide
This guide covers the long-term image storage solution integrated into HomeBase. It provides reliable, organized storage for images pulled from web services with built-in corruption detection and ML-friendly tagging.
## Features
- **Automatic Image Fetching**: Pull images from web services at a configurable interval (originally every 2-3 minutes; intervals of a few seconds are also supported)
- **Corruption Detection**: SHA256 checksums verify data integrity
- **Tag-Based Organization**: Tag images for machine learning model training
- **Efficient Storage**: File-based storage with SQLite metadata database
- **RESTful API**: Complete API for image management and queries
- **Scalable**: Designed to handle thousands of images
## Architecture
```
┌─────────────────────────────────────────────────────┐
│                Image Fetcher Service                │
│  (Runs every fetchInterval, pulls from web service) │
└────────┬────────────────────────────────────────────┘
         ├─────┬──────────────┬──────────────┐
         │     │              │              │
         ▼     ▼              ▼              ▼
     Download Hash       Store File   Insert Metadata
      Image (SHA256)    (data/images)    (SQLite)
         └──────────────────────────────┘
              ┌─────────▼──────────┐
              │ REST API Endpoints │
              │ - List Images      │
              │ - Add Tags         │
              │ - Search/Filter    │
              │ - Verify Integrity │
              └────────────────────┘
```
## Quick Start
### 1. Install Dependencies
```bash
npm install
```
### 2. Initialize the System
```bash
node setup.js init
```
This will:
- Create the SQLite database
- Set up data directory structure
- Load configuration
- Test the system
### 3. Configure Image Sources
Edit `image-sources.json`:
```json
{
"sources": [
{
"name": "Webcam Feed",
"url": "https://your-service.com/image.jpg",
"tags": ["webcam", "monitoring"],
"enabled": true
}
],
"fetchInterval": 0.033
}
```
**Fetch Intervals** (`fetchInterval` is given in minutes; use decimals for sub-minute intervals):
- `0.0167` ≈ 1 second
- `0.033` ≈ 2 seconds (recommended for fast updates)
- `0.05` = 3 seconds
- `2.5` = 2.5 minutes (original default)
### 4. Start the Server
```bash
npm start
```
The system will automatically start fetching images at the configured interval.
## Configuration
### image-sources.json
```json
{
"sources": [
{
"name": "Example Source",
"url": "https://example.com/image.jpg",
"tags": ["tag1", "tag2"],
"enabled": true
}
],
"fetchInterval": 0.033
}
```
**Options:**
- `fetchInterval`: **Minutes** between fetch cycles. Use decimals for sub-minute intervals:
  - `0.0167` = 1 second
  - `0.033` = 2 seconds
  - `0.05` = 3 seconds
  - `0.167` = 10 seconds
  - `1` = 1 minute
  - `2.5` = 2.5 minutes (original default)
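
Because `fetchInterval` is expressed in minutes, a scheduler has to convert it before handing it to a timer. The following is a minimal sketch of that conversion, together with filtering out disabled sources. The helper names `intervalMs` and `parseConfig` are hypothetical and are not taken from the HomeBase code.

```javascript
// Sketch only: intervalMs and parseConfig are hypothetical helper names.
const MS_PER_MINUTE = 60 * 1000;

// Convert a fetchInterval given in minutes to whole milliseconds.
function intervalMs(fetchIntervalMinutes) {
  if (!Number.isFinite(fetchIntervalMinutes) || fetchIntervalMinutes <= 0) {
    throw new RangeError('fetchInterval must be a positive number of minutes');
  }
  return Math.round(fetchIntervalMinutes * MS_PER_MINUTE);
}

// Parse the JSON text of image-sources.json and keep only enabled sources.
function parseConfig(jsonText) {
  const config = JSON.parse(jsonText);
  const sources = (config.sources || []).filter((s) => s.enabled === true);
  return { sources, intervalMs: intervalMs(config.fetchInterval) };
}

const example = parseConfig(JSON.stringify({
  sources: [
    { name: 'Webcam Feed', url: 'https://your-service.com/image.jpg', tags: ['webcam'], enabled: true },
    { name: 'Disabled Feed', url: 'https://example.com/image.jpg', tags: [], enabled: false },
  ],
  fetchInterval: 0.033,
}));
console.log(example.sources.length, example.intervalMs); // 1 1980
```

Note that the decimal values are rounded: `0.033` minutes is 1.98 seconds, and `0.0167` minutes is about 1.002 seconds.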
## API Endpoints
### List Images
```bash
GET /api/images?page=1&pageSize=50&sort=fetched_at&order=DESC
GET /api/images?tag=webcam # Filter by tag
GET /api/images?sourceUrl=https://... # Filter by source
```
**Response:**
```json
{
"success": true,
"images": [
{
"id": 1,
"filename": "image_1234567890_abc123.jpg",
"source_url": "https://...",
"filesize": 102400,
"file_hash": "sha256hash",
"fetched_at": "2023-01-01T12:00:00Z",
"tags": ["webcam", "monitoring"]
}
],
"pagination": {
"page": 1,
"pageSize": 50,
"total": 240,
"pages": 5
}
}
```
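
As an illustration of how a client might assemble these query strings, here is a small sketch using the standard `URL` class. The helper names are hypothetical, the base URL follows the `curl` examples in this guide, and the assumption that `pages` is `total / pageSize` rounded up is inferred from the sample response (240 images, 50 per page, 5 pages) rather than confirmed from the server code.

```javascript
// Hypothetical client-side helpers; not part of the HomeBase API itself.
function buildListUrl(baseUrl, params) {
  const url = new URL('/api/images', baseUrl);
  for (const [key, value] of Object.entries(params)) {
    // Skip unset filters so they do not appear as empty query parameters.
    if (value !== undefined && value !== null) {
      url.searchParams.set(key, String(value));
    }
  }
  return url.toString();
}

// Assumed relationship between total, pageSize, and pages.
function pageCount(total, pageSize) {
  return Math.ceil(total / pageSize);
}

console.log(buildListUrl('http://localhost:3001', { page: 1, pageSize: 50, tag: 'webcam' }));
// http://localhost:3001/api/images?page=1&pageSize=50&tag=webcam
console.log(pageCount(240, 50)); // 5
```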
### Get Image Details
```bash
GET /api/images/{id}
```
### Download Image
```bash
GET /api/images/{id}/download
```
### Fetch New Image
```bash
POST /api/images
Content-Type: application/json
{
"source_url": "https://example.com/image.jpg",
"tags": ["tag1", "tag2"]
}
```
### Add Tags to Image
```bash
POST /api/images/{id}/tags
Content-Type: application/json
{
"tags": ["newtag1", "newtag2"]
}
```
### List All Tags
```bash
GET /api/tags
```
**Response:**
```json
{
"success": true,
"tags": ["webcam", "monitoring", "test", "example"]
}
```
### Storage Statistics
```bash
GET /api/stats
```
**Response:**
```json
{
"success": true,
"stats": {
"imageCount": 240,
"totalSize": 24576000,
"totalSizeGB": "0.02",
"fileCount": 240
}
}
```
### Verify Image Integrity
```bash
POST /api/verify
```
This checks all images for corruption using their stored checksums.
### Cleanup Old Images
```bash
POST /api/cleanup
Content-Type: application/json
{
"daysOld": 30
}
```
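
The guide does not spell out how `daysOld` is applied. A plausible reading is that any image whose `fetched_at` is earlier than "now minus `daysOld` days" is removed. The sketch below shows that interpretation; the function names are hypothetical and the actual cleanup logic may differ.

```javascript
// Sketch of one possible cleanup rule; assumes fetched_at is an ISO 8601 timestamp.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Compute the cutoff instant: images fetched before it count as old.
function cleanupCutoff(daysOld, now = new Date()) {
  return new Date(now.getTime() - daysOld * MS_PER_DAY);
}

// Return the image records that would be removed for a given daysOld.
function selectExpired(images, daysOld, now = new Date()) {
  const cutoff = cleanupCutoff(daysOld, now);
  return images.filter((img) => new Date(img.fetched_at) < cutoff);
}

const now = new Date('2023-02-01T00:00:00Z');
const images = [
  { id: 1, fetched_at: '2023-01-01T12:00:00Z' },
  { id: 2, fetched_at: '2023-01-15T12:00:00Z' },
];
console.log(selectExpired(images, 30, now).map((img) => img.id)); // [ 1 ]
```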
### Delete Image
```bash
DELETE /api/images/{id}
```
## Corruption Detection
The system uses SHA256 checksums to detect corruption:
1. **Storage**: When an image is saved, its SHA256 hash is calculated and stored
2. **Verification**: The `/api/verify` endpoint re-hashes all files and compares with stored hashes
3. **Marking**: Corrupted images are marked in the database (`is_corrupted`) and excluded from normal queries
4. **Recovery**: Corrupted files can be re-fetched using the source URL
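
The store-then-verify steps above can be sketched with Node's built-in `crypto` module. This is an illustration of the technique only: `sha256Hex` and `isCorrupted` are hypothetical names, and the real `storage.js` may hash files as streams rather than as in-memory buffers.

```javascript
const crypto = require('node:crypto');

// Compute the SHA256 digest of a buffer as a lowercase hex string.
function sha256Hex(buffer) {
  return crypto.createHash('sha256').update(buffer).digest('hex');
}

// A file counts as corrupted when its current hash no longer matches the stored one.
function isCorrupted(buffer, storedHash) {
  return sha256Hex(buffer) !== storedHash;
}

// Storage step: hash the bytes when the image is first saved.
const original = Buffer.from('abc');
const storedHash = sha256Hex(original);

// Verification step: re-hash and compare.
console.log(isCorrupted(original, storedHash));           // false
console.log(isCorrupted(Buffer.from('abd'), storedHash)); // true
```

A single flipped byte changes the digest, which is why a stored hash is enough to detect silent corruption, although it cannot repair the file.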
### Manual Verification
```bash
curl -X POST http://localhost:3001/api/verify
```
## Tagging for ML Training
Tags are essential for organizing training datasets:
```bash
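# Add an image with training tags
curl -X POST http://localhost:3001/api/images \
  -H "Content-Type: application/json" \
  -d '{"source_url": "https://...", "tags": ["dataset_v1", "labeled", "weather-sunny"]}'
# Query all images with a specific tag
curl 'http://localhost:3001/api/images?tag=weather-sunny'
# List all known tags
curl http://localhost:3001/api/tags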
```
## Database Schema
### Images Table
| Column | Type | Description |
|--------|------|-------------|
| id | INTEGER | Primary key |
| filename | TEXT | Unique filename |
| source_url | TEXT | Original image URL |
| file_path | TEXT | Local file path |
| filesize | INTEGER | File size in bytes |
| file_hash | TEXT | SHA256 hash |
| mime_type | TEXT | Content type |
| fetched_at | DATETIME | When image was fetched |
| is_corrupted | BOOLEAN | Corruption flag |
### Tags Table
| Column | Type | Description |
|--------|------|-------------|
| id | INTEGER | Primary key |
| image_id | INTEGER | Foreign key to images |
| tag | TEXT | Tag text |
| created_at | DATETIME | When tag was added |
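
For readers who prefer DDL to tables, the two tables could be created roughly as follows. This is an assumed reconstruction kept as a JavaScript string so it can be passed to whichever SQLite binding the project uses. The constraints, defaults, and the example query are guesses and are not copied from `lib/database.js`.

```javascript
// Assumed DDL reconstructed from the schema tables; constraints and defaults are guesses.
const SCHEMA_SQL = `
CREATE TABLE IF NOT EXISTS images (
  id           INTEGER PRIMARY KEY,
  filename     TEXT NOT NULL UNIQUE,
  source_url   TEXT,
  file_path    TEXT,
  filesize     INTEGER,
  file_hash    TEXT,
  mime_type    TEXT,
  fetched_at   DATETIME DEFAULT CURRENT_TIMESTAMP,
  is_corrupted BOOLEAN DEFAULT 0
);
CREATE TABLE IF NOT EXISTS tags (
  id         INTEGER PRIMARY KEY,
  image_id   INTEGER NOT NULL REFERENCES images(id),
  tag        TEXT NOT NULL,
  created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
`;

// Hypothetical query for filtering by tag while skipping corrupted rows.
const IMAGES_BY_TAG_SQL = `
SELECT i.*
FROM images AS i
JOIN tags AS t ON t.image_id = i.id
WHERE t.tag = ? AND i.is_corrupted = 0
ORDER BY i.fetched_at DESC;
`;

console.log(SCHEMA_SQL.trim().split('\n')[0]); // CREATE TABLE IF NOT EXISTS images (
```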
## File Structure
```
homebase/
├── server.js # Main server
├── package.json # Dependencies
├── setup.js # Setup script
├── image-sources.json # Configuration
├── lib/
│ ├── database.js # SQLite operations
│ ├── storage.js # File storage operations
│ └── image-fetcher.js # Image fetching service
├── routes/
│ └── images.js # API routes
└── data/ # Created at runtime
├── homebase.db # SQLite database
└── images/ # Stored image files
```
## Maintenance
### Manual Setup
```bash
# Initialize system
node setup.js init
# Test fetch
node setup.js test https://example.com/image.jpg tag1,tag2
# View configuration
node setup.js config
```
### Regular Tasks
```bash
# Daily: Verify integrity
curl -X POST http://localhost:3001/api/verify
# Weekly: Cleanup old images (30+ days)
curl -X POST http://localhost:3001/api/cleanup -H "Content-Type: application/json" -d '{"daysOld": 30}'
# Monitor storage
curl http://localhost:3001/api/stats
```
## Performance Tips
1. **Pagination**: Use `pageSize=50` or less for large datasets
2. **Tagging**: Use consistent tag names for easier filtering
3. **Cleanup**: Run cleanup weekly to manage storage
4. **Verification**: Run verification on a regular schedule (the Regular Tasks above suggest daily) to detect issues early
## Troubleshooting
### No images fetching?
1. Check `image-sources.json` and ensure the source has `"enabled": true`
2. Verify URL is accessible
3. Check server logs for errors
4. Test fetch: `node setup.js test https://url.jpg`
### High storage usage?
```bash
# Check statistics
curl http://localhost:3001/api/stats
# Cleanup images older than 7 days
curl -X POST http://localhost:3001/api/cleanup \
-H "Content-Type: application/json" \
-d '{"daysOld": 7}'
```
### Corrupted files detected?
```bash
# Verify all images
curl -X POST http://localhost:3001/api/verify
# Get list of corrupted images
curl 'http://localhost:3001/api/images?corrupted=true'
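# Re-fetch if the source is still available (reuse the image's original source_url)
curl -X POST http://localhost:3001/api/images \
  -H "Content-Type: application/json" \
  -d '{"source_url": "https://..."}'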
```
## Advanced Usage
### Export Dataset for ML Training
```bash
# Get images with a specific tag in one request (sets with more than 999 images need paging)
curl 'http://localhost:3001/api/images?tag=weather-sunny&pageSize=999' \
-o dataset.json
```
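
For datasets larger than a single page, an export script can walk the `pagination.pages` field instead of relying on a large `pageSize`. The sketch below assumes the list response shape shown under List Images. The function names are hypothetical, and `httpPageLoader` relies on the global `fetch` available in Node 18 and later.

```javascript
// Collect every image record by following pagination.pages.
// fetchPage is injected so the loop can be exercised without a running server.
async function fetchAllImages(fetchPage) {
  const all = [];
  let page = 1;
  let pages = 1;
  do {
    const body = await fetchPage(page);
    if (!body.success) throw new Error(`request for page ${page} failed`);
    all.push(...body.images);
    pages = body.pagination.pages;
    page += 1;
  } while (page <= pages);
  return all;
}

// Example page loader that calls the list endpoint over HTTP.
function httpPageLoader(baseUrl, tag, pageSize = 50) {
  return async (page) => {
    const url = new URL('/api/images', baseUrl);
    url.searchParams.set('tag', tag);
    url.searchParams.set('page', String(page));
    url.searchParams.set('pageSize', String(pageSize));
    const res = await fetch(url);
    return res.json();
  };
}
```

With a server running locally, `fetchAllImages(httpPageLoader('http://localhost:3001', 'weather-sunny'))` resolves to the combined list, which can then be written out with `JSON.stringify`.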
### Monitor Fetch Status
```bash
curl http://localhost:3001/api/fetcher/status
```
### Batch Operations
```bash
# Add tags to multiple images (via API loop)
for image_id in 1 2 3 4 5; do
curl -X POST http://localhost:3001/api/images/$image_id/tags \
-H "Content-Type: application/json" \
-d '{"tags": ["batch-import"]}'
done
```
## Security Considerations
1. **Database Access**: SQLite database is file-based; protect with filesystem permissions
2. **Image Storage**: Protect `data/images` directory from unauthorized access
3. **API Security**: Consider adding authentication for production use
4. **File Validation**: System validates MIME types and file sizes
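
The guide does not list which MIME types or size limits are enforced. As a rough sketch of what such validation can look like, the example below checks a response's `Content-Type` and byte length. The allow-list, the 20 MiB limit, and the function name are all assumptions made for illustration.

```javascript
// Assumed policy values; the real system's allow-list and limit are not documented here.
const ALLOWED_MIME_TYPES = new Set(['image/jpeg', 'image/png', 'image/gif', 'image/webp']);
const MAX_BYTES = 20 * 1024 * 1024;

// Validate a downloaded file from its Content-Type header and size in bytes.
function validateDownload(contentType, byteLength) {
  // Drop parameters such as "; charset=binary" and normalize case.
  const mime = String(contentType || '').split(';')[0].trim().toLowerCase();
  if (!ALLOWED_MIME_TYPES.has(mime)) {
    return { ok: false, reason: `unsupported MIME type: ${mime || 'none'}` };
  }
  if (byteLength <= 0) {
    return { ok: false, reason: 'empty file' };
  }
  if (byteLength > MAX_BYTES) {
    return { ok: false, reason: 'file too large' };
  }
  return { ok: true, mime };
}

console.log(validateDownload('image/jpeg', 102400)); // { ok: true, mime: 'image/jpeg' }
console.log(validateDownload('text/html', 512).ok);  // false
```

A header check alone trusts the remote server, so a stricter variant would also inspect the file's leading bytes.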
## Performance Metrics
- **Fetch Time**: ~1-5 seconds per image (network dependent)
- **Database Queries**: <100ms for typical queries
- **Verification**: ~50ms per image
- **Storage**: ~1KB overhead per image in database
---
For more information, check the API documentation or run `node setup.js help`.