# 📸 Image Storage System Guide

This guide covers the long-term image storage solution integrated into HomeBase. It provides reliable, organized storage for images pulled from web services, with built-in corruption detection and ML-friendly tagging.

## Features

- **Automatic Image Fetching**: Pull images from web services on a configurable interval (the original default is every 2.5 minutes)
- **Corruption Detection**: SHA256 checksums verify data integrity
- **Tag-Based Organization**: Tag images for machine learning model training
- **Efficient Storage**: File-based storage with SQLite metadata database
- **RESTful API**: Complete API for image management and queries
- **Scalable**: Designed to handle thousands of images

## Architecture

```
┌─────────────────────────────────────────────────────┐
│                Image Fetcher Service                │
│    (Runs on a timer, pulls from the web service)    │
└────────┬────────────────────────────────────────────┘
         │
         ├─────┬──────────────┬──────────────┐
         │     │              │              │
         ▼     ▼              ▼              ▼
     Download Hash       Store File   Insert Metadata
      Image (SHA256)    (data/images)    (SQLite)
         │
         └──────────────────────────────┘
                        │
              ┌─────────▼──────────┐
              │ REST API Endpoints │
              │ - List Images      │
              │ - Add Tags         │
              │ - Search/Filter    │
              │ - Verify Integrity │
              └────────────────────┘
```

## Quick Start

### 1. Install Dependencies

```bash
npm install
```

### 2. Initialize the System

```bash
node setup.js init
```

This will:
- Create the SQLite database
- Set up the data directory structure
- Load configuration
- Test the system

### 3. Configure Image Sources

Edit `image-sources.json`:

```json
{
  "sources": [
    {
      "name": "Webcam Feed",
      "url": "https://your-service.com/image.jpg",
      "tags": ["webcam", "monitoring"],
      "enabled": true
    }
  ],
  "fetchInterval": 0.033
}
```

**Fetch Intervals** (`fetchInterval` is given in minutes; edit it to customize):
- `0.0167` = 1 second
- `0.033` = 2 seconds (recommended for fast updates)
- `0.05` = 3 seconds
- `2.5` = 2.5 minutes (original default)

### 4. Start the Server

```bash
npm start
```

The system will automatically start fetching images at the configured interval.

## Configuration

### image-sources.json

```json
{
  "sources": [
    {
      "name": "Example Source",
      "url": "https://example.com/image.jpg",
      "tags": ["tag1", "tag2"],
      "enabled": true
    }
  ],
  "fetchInterval": 0.033
}
```

**Options:**
- `fetchInterval`: **Minutes** between fetch cycles. Use decimals for sub-minute intervals:
  - `0.0167` = 1 second
  - `0.033` = 2 seconds
  - `0.05` = 3 seconds
  - `0.167` = 10 seconds
  - `1` = 1 minute
  - `2.5` = 2.5 minutes (original default)

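Because `fetchInterval` is expressed in minutes, it helps to convert in both directions when picking a value. The sketch below shows the arithmetic; the helper names `fetchIntervalToMs` and `secondsToFetchInterval` are hypothetical and are not taken from the HomeBase source.

```javascript
// Hypothetical helpers for illustration; not part of the HomeBase code base.
const MS_PER_MINUTE = 60 * 1000;

// Convert a fetchInterval value (minutes, possibly fractional) to milliseconds,
// the unit a timer such as setInterval expects.
function fetchIntervalToMs(minutes) {
  if (typeof minutes !== 'number' || !Number.isFinite(minutes) || minutes <= 0) {
    throw new RangeError('fetchInterval must be a positive number of minutes');
  }
  return Math.round(minutes * MS_PER_MINUTE);
}

// Convert a desired period in seconds to a fetchInterval value (minutes).
function secondsToFetchInterval(seconds) {
  return seconds / 60;
}

console.log(fetchIntervalToMs(2.5));                // 150000
console.log(secondsToFetchInterval(10).toFixed(3)); // "0.167"
```

Note that the rounded values in the list are approximations: `0.033` minutes works out to 1980 ms, slightly under 2 seconds.
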
## API Endpoints

### List Images

```bash
GET /api/images?page=1&pageSize=50&sort=fetched_at&order=DESC
GET /api/images?tag=webcam                    # Filter by tag
GET /api/images?sourceUrl=https://...         # Filter by source
```

**Response:**
```json
{
  "success": true,
  "images": [
    {
      "id": 1,
      "filename": "image_1234567890_abc123.jpg",
      "source_url": "https://...",
      "filesize": 102400,
      "file_hash": "sha256hash",
      "fetched_at": "2023-01-01T12:00:00Z",
      "tags": ["webcam", "monitoring"]
    }
  ],
  "pagination": {
    "page": 1,
    "pageSize": 50,
    "total": 240,
    "pages": 5
  }
}
```

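As a rough illustration of how a client might assemble these query strings and interpret the `pagination` object, here is a minimal sketch. `buildListUrl` and `pageCount` are hypothetical names, the base URL is the one used by the `curl` examples in this guide, and rounding the page count up is an assumption that happens to match the sample response (240 images at 50 per page gives 5 pages).

```javascript
// Hypothetical client-side helpers; not part of the HomeBase code base.
const BASE_URL = 'http://localhost:3001';

// Build a List Images URL from the documented query parameters
// (page, pageSize, sort, order, tag, sourceUrl). Values are URL-encoded.
function buildListUrl(params = {}) {
  const url = new URL('/api/images', BASE_URL);
  for (const [key, value] of Object.entries(params)) {
    if (value !== undefined && value !== null) {
      url.searchParams.set(key, String(value));
    }
  }
  return url.toString();
}

// Number of pages for a given total; assumes the server rounds up.
function pageCount(total, pageSize) {
  return Math.ceil(total / pageSize);
}

console.log(buildListUrl({ tag: 'webcam', page: 2, pageSize: 50 }));
console.log(pageCount(240, 50)); // 5
```
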
### Get Image Details

```bash
GET /api/images/{id}
```

### Download Image

```bash
GET /api/images/{id}/download
```

### Fetch New Image

```bash
POST /api/images
Content-Type: application/json

{
  "source_url": "https://example.com/image.jpg",
  "tags": ["tag1", "tag2"]
}
```

### Add Tags to Image

```bash
POST /api/images/{id}/tags
Content-Type: application/json

{
  "tags": ["newtag1", "newtag2"]
}
```

### List All Tags

```bash
GET /api/tags
```

**Response:**
```json
{
  "success": true,
  "tags": ["webcam", "monitoring", "test", "example"]
}
```

### Storage Statistics

```bash
GET /api/stats
```

**Response:**
```json
{
  "success": true,
  "stats": {
    "imageCount": 240,
    "totalSize": 24576000,
    "totalSizeGB": "0.02",
    "fileCount": 240
  }
}
```

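The relationship between `totalSize` (bytes) and `totalSizeGB` (a two-decimal string) can be sketched as follows. This is only an illustration: `formatSizeGB` is a hypothetical name, and the guide does not say whether the server divides by 1024³ or 10⁹. The sketch assumes 1024³; for the sample value both conventions round to `"0.02"`.

```javascript
// Assumption: "GB" means 1024^3 bytes. The guide does not state which unit the server uses.
const BYTES_PER_GB = 1024 ** 3;

// Format a byte count as a gigabyte string with two decimals,
// mirroring the shape of the totalSizeGB field in the sample response.
function formatSizeGB(totalBytes) {
  return (totalBytes / BYTES_PER_GB).toFixed(2);
}

console.log(formatSizeGB(24576000)); // "0.02"
```
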
### Verify Image Integrity

```bash
POST /api/verify
```

This checks all images for corruption using their stored checksums.

### Cleanup Old Images

```bash
POST /api/cleanup
Content-Type: application/json

{
  "daysOld": 30
}
```

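One plausible reading of `daysOld` is "remove images whose `fetched_at` is more than this many days in the past". The sketch below shows that cutoff calculation under that assumption; `cleanupCutoff` and `isOlderThan` are hypothetical names, and whether the server treats the boundary as inclusive is not documented.

```javascript
// Hypothetical helpers illustrating an age-based cleanup rule; not the HomeBase implementation.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Timestamp before which an image counts as "old" for a given daysOld value.
function cleanupCutoff(daysOld, now = new Date()) {
  return new Date(now.getTime() - daysOld * MS_PER_DAY);
}

// True when fetchedAt (ISO 8601 string or Date) is strictly earlier than the cutoff.
function isOlderThan(fetchedAt, daysOld, now = new Date()) {
  return new Date(fetchedAt).getTime() < cleanupCutoff(daysOld, now).getTime();
}

const now = new Date('2023-01-31T12:00:00Z');
console.log(cleanupCutoff(30, now).toISOString());          // 2023-01-01T12:00:00.000Z
console.log(isOlderThan('2022-12-25T08:00:00Z', 30, now));  // true
```
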
### Delete Image

```bash
DELETE /api/images/{id}
```

## Corruption Detection

The system uses SHA256 checksums to detect corruption:

1. **Storage**: When an image is saved, its SHA256 hash is calculated and stored
2. **Verification**: The `/api/verify` endpoint re-hashes all files and compares the results with the stored hashes
3. **Marking**: Corrupted images are marked in the database and excluded from queries
4. **Recovery**: Corrupted files can be re-fetched using the source URL

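The hash-and-compare steps above can be sketched with Node's built-in `crypto` module. This is a minimal illustration rather than the code in `lib/storage.js`; `sha256Hex` and `isCorrupted` are hypothetical names, and a real implementation would read the bytes from the stored file (for example with `fs.readFileSync`) instead of an in-memory buffer.

```javascript
const crypto = require('node:crypto');

// Compute the SHA-256 digest of a Buffer (or string) as lowercase hex.
function sha256Hex(data) {
  return crypto.createHash('sha256').update(data).digest('hex');
}

// A file counts as corrupted when its current hash no longer matches
// the hash recorded at storage time.
function isCorrupted(fileBytes, storedHash) {
  return sha256Hex(fileBytes) !== storedHash.toLowerCase();
}

const original = Buffer.from('abc');
const storedHash = sha256Hex(original);                   // recorded when the image is saved
console.log(isCorrupted(original, storedHash));           // false
console.log(isCorrupted(Buffer.from('abd'), storedHash)); // true
```
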
### Manual Verification

```bash
curl -X POST http://localhost:3001/api/verify
```

## Tagging for ML Training

Tags are essential for organizing training datasets:

```bash
# Add images with training tags
curl -X POST http://localhost:3001/api/images \
  -H "Content-Type: application/json" \
  -d '{"source_url": "https://...", "tags": ["dataset_v1", "labeled", "weather-sunny"]}'

# Query all images with a specific tag
curl 'http://localhost:3001/api/images?tag=weather-sunny'

# List all tags in use
curl http://localhost:3001/api/tags
```

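Once images have been listed, a training script typically needs them grouped by label. The sketch below groups the `images` array from a List Images response by tag; `groupByTag` is a hypothetical helper, and it relies only on the `filename` and `tags` fields shown in the sample response.

```javascript
// Hypothetical helper: build a tag -> filenames index from a List Images response.
function groupByTag(images) {
  const groups = new Map();
  for (const image of images) {
    // Images without a tags array are simply skipped.
    for (const tag of image.tags ?? []) {
      if (!groups.has(tag)) groups.set(tag, []);
      groups.get(tag).push(image.filename);
    }
  }
  return groups;
}

const sample = [
  { filename: 'a.jpg', tags: ['dataset_v1', 'weather-sunny'] },
  { filename: 'b.jpg', tags: ['dataset_v1'] },
];
console.log(groupByTag(sample).get('dataset_v1')); // [ 'a.jpg', 'b.jpg' ]
```
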
## Database Schema

### Images Table

| Column | Type | Description |
|--------|------|-------------|
| id | INTEGER | Primary key |
| filename | TEXT | Unique filename |
| source_url | TEXT | Original image URL |
| file_path | TEXT | Local file path |
| filesize | INTEGER | File size in bytes |
| file_hash | TEXT | SHA256 hash |
| mime_type | TEXT | Content type |
| fetched_at | DATETIME | When image was fetched |
| is_corrupted | BOOLEAN | Corruption flag |

### Tags Table

| Column | Type | Description |
|--------|------|-------------|
| id | INTEGER | Primary key |
| image_id | INTEGER | Foreign key to images |
| tag | TEXT | Tag text |
| created_at | DATETIME | When tag was added |

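For readers who want to see the two tables as DDL, the sketch below expresses them as SQLite statements held in a JavaScript string. Column names and types come from the tables above; every constraint and default (`NOT NULL`, `UNIQUE`, `DEFAULT`, the `REFERENCES` clause) is an assumption, and the actual statements in `lib/database.js` may differ.

```javascript
// Hypothetical DDL derived from the schema tables; constraints and defaults are assumptions.
const SCHEMA_SQL = `
CREATE TABLE IF NOT EXISTS images (
  id           INTEGER PRIMARY KEY,
  filename     TEXT NOT NULL UNIQUE,
  source_url   TEXT,
  file_path    TEXT,
  filesize     INTEGER,
  file_hash    TEXT,
  mime_type    TEXT,
  fetched_at   DATETIME DEFAULT CURRENT_TIMESTAMP,
  is_corrupted BOOLEAN DEFAULT 0
);

CREATE TABLE IF NOT EXISTS tags (
  id         INTEGER PRIMARY KEY,
  image_id   INTEGER NOT NULL REFERENCES images(id),
  tag        TEXT NOT NULL,
  created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
`;

// Split into individual statements, e.g. to run them one at a time with a SQLite client.
const statements = SCHEMA_SQL.split(';').map((s) => s.trim()).filter(Boolean);
console.log(statements.length); // 2
```
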
## File Structure

```
homebase/
├── server.js              # Main server
├── package.json           # Dependencies
├── setup.js               # Setup script
├── image-sources.json     # Configuration
├── lib/
│   ├── database.js        # SQLite operations
│   ├── storage.js         # File storage operations
│   └── image-fetcher.js   # Image fetching service
├── routes/
│   └── images.js          # API routes
└── data/                  # Created at runtime
    ├── homebase.db        # SQLite database
    └── images/            # Stored image files
```

## Maintenance

### Manual Setup

```bash
# Initialize system
node setup.js init

# Test fetch
node setup.js test https://example.com/image.jpg tag1,tag2

# View configuration
node setup.js config
```

### Regular Tasks

```bash
# Daily: Verify integrity
curl -X POST http://localhost:3001/api/verify

# Weekly: Cleanup old images (30+ days)
curl -X POST http://localhost:3001/api/cleanup -H "Content-Type: application/json" -d '{"daysOld": 30}'

# Monitor storage
curl http://localhost:3001/api/stats
```

## Performance Tips

1. **Pagination**: Use `pageSize=50` or smaller for large datasets
2. **Tagging**: Use consistent tag names for easier filtering
3. **Cleanup**: Run cleanup weekly to manage storage
4. **Verification**: Run verification on a regular schedule (daily, as under Regular Tasks, or at least monthly) to detect issues early

## Troubleshooting

### No images fetching?

1. Check `image-sources.json` and ensure the source has `"enabled": true`
2. Verify the URL is accessible
3. Check server logs for errors
4. Test fetch: `node setup.js test https://url.jpg`

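The first two checks can be automated with a small script that inspects the parsed contents of `image-sources.json`. This is a sketch under assumptions: `findSourceProblems` is a hypothetical helper, not part of `setup.js`, and it only checks the fields shown in the configuration examples (`name`, `url`, `enabled`) without contacting the URL.

```javascript
// Hypothetical config check; pass it the parsed contents of image-sources.json.
function findSourceProblems(config) {
  const problems = [];
  const sources = Array.isArray(config.sources) ? config.sources : [];
  if (sources.length === 0) problems.push('no sources defined');

  sources.forEach((source, index) => {
    const label = source.name || `source #${index + 1}`;
    if (source.enabled !== true) problems.push(`${label}: not enabled`);
    try {
      // new URL() throws on malformed or missing URLs.
      const { protocol } = new URL(source.url);
      if (protocol !== 'http:' && protocol !== 'https:') {
        problems.push(`${label}: unsupported protocol ${protocol}`);
      }
    } catch {
      problems.push(`${label}: invalid url`);
    }
  });
  return problems;
}

console.log(findSourceProblems({
  sources: [{ name: 'Webcam Feed', url: 'https://your-service.com/image.jpg', enabled: false }],
}));
// [ 'Webcam Feed: not enabled' ]
```
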
### High storage usage?

```bash
# Check statistics
curl http://localhost:3001/api/stats

# Cleanup images older than 7 days
curl -X POST http://localhost:3001/api/cleanup \
  -H "Content-Type: application/json" \
  -d '{"daysOld": 7}'
```

### Corrupted files detected?

```bash
# Verify all images
curl -X POST http://localhost:3001/api/verify

# Get list of corrupted images
curl 'http://localhost:3001/api/images?corrupted=true'

# Re-fetch if the source is still available (reuse the image's original source_url)
curl -X POST http://localhost:3001/api/images \
  -H "Content-Type: application/json" \
  -d '{"source_url": "https://example.com/image.jpg"}'
```

## Advanced Usage

### Export Dataset for ML Training

```bash
# Get all images with specific tag
curl 'http://localhost:3001/api/images?tag=weather-sunny&pageSize=999' \
  -o dataset.json
```

### Monitor Fetch Status

```bash
curl http://localhost:3001/api/fetcher/status
```

### Batch Operations

```bash
# Add tags to multiple images (via API loop)
for image_id in 1 2 3 4 5; do
  curl -X POST http://localhost:3001/api/images/$image_id/tags \
    -H "Content-Type: application/json" \
    -d '{"tags": ["batch-import"]}'
done
```

## Security Considerations

1. **Database Access**: The SQLite database is file-based; protect it with filesystem permissions
2. **Image Storage**: Protect the `data/images` directory from unauthorized access
3. **API Security**: Consider adding authentication for production use
4. **File Validation**: The system validates MIME types and file sizes

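To make the file-validation point concrete, here is a minimal sketch of a MIME type and size check. The accepted types and the 10 MiB cap are assumptions chosen for illustration (the guide does not list the real limits), and `validateImage` is a hypothetical name rather than the function used in `lib/storage.js`.

```javascript
// Assumed limits for illustration only; the actual accepted types and size cap are not documented.
const ALLOWED_MIME_TYPES = new Set(['image/jpeg', 'image/png', 'image/gif', 'image/webp']);
const MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024; // 10 MiB

// Return { ok: true } for acceptable downloads, or { ok: false, reason } otherwise.
function validateImage({ mimeType, size }) {
  if (!ALLOWED_MIME_TYPES.has(mimeType)) {
    return { ok: false, reason: `unsupported MIME type: ${mimeType}` };
  }
  if (!Number.isInteger(size) || size <= 0) {
    return { ok: false, reason: 'empty or invalid file size' };
  }
  if (size > MAX_FILE_SIZE_BYTES) {
    return { ok: false, reason: `file too large: ${size} bytes` };
  }
  return { ok: true };
}

console.log(validateImage({ mimeType: 'image/jpeg', size: 102400 })); // { ok: true }
console.log(validateImage({ mimeType: 'text/html', size: 2048 }).ok); // false
```
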
## Performance Metrics

- **Fetch Time**: ~1-5 seconds per image (network dependent)
- **Database Queries**: <100ms for typical queries
- **Verification**: ~50ms per image
- **Storage**: ~1KB overhead per image in database

---

For more information, check the API documentation or run `node setup.js help`.