feat: add storage and training modules for snore detection

- Implemented `storage.py` for managing metadata storage, including sample addition, retrieval, and review state management.
- Created `training.py` for training a local model using Random Forest, including functions for training and predicting samples.
- Developed a web interface in `app.js` for capturing audio samples, managing labels, and training the model.
- Added HTML structure in `index.html` for the SnoreStopper control room with sections for sample capture, overnight gathering, training, and status display.
- Styled the application with `styles.css` to enhance user experience and interface aesthetics.
2026-03-12 13:35:17 -04:00
commit 28012e70e0
21 changed files with 2680 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,102 @@
+# SnoreStopper v2
+
+SnoreStopper is a self-hosted Python + web UI project for collecting night audio samples, labeling snore events, and training a local classifier.
+
+This starter build gives you:
+- Audio input device discovery from the server machine
+- On-demand audio sample capture to compressed FLAC files
+- Spectrogram generation for each sample
+- Browser-based labeling workflow (`snore`, `not_snore`, `unclear`)
+- Local model training with scikit-learn
+- Per-sample detection using the trained model
+- Overnight sample gathering with a configurable clip interval
+- Snore watch proposal queue: predictions must pass thumbs up/down review before they are used for training
+
+## Project Layout
+
+```text
+SnoreStopper_v2/
+  data/
+    raw/                # Recorded FLAC clips
+    spectrograms/       # PNG spectrogram images
+    models/             # Trained model artifacts
+    meta/               # Metadata JSON (samples, labels)
+  src/snorestopper/
+    main.py             # FastAPI app and endpoints
+    audio.py            # Device listing and recording
+    features.py         # Audio I/O, spectrogram generation, feature extraction
+    training.py         # Local model training/prediction
+    storage.py          # Metadata persistence
+    schemas.py          # API request/response models
+  web/
+    index.html
+    styles.css
+    app.js
+  requirements.txt
+  pyproject.toml
+```
+
+## Quick Start (Windows PowerShell)
+
+1. Create and activate a virtual environment.
+```powershell
+python -m venv .venv
+.\.venv\Scripts\Activate.ps1
+```
+
+2. Install dependencies.
+```powershell
+python -m pip install --upgrade pip
+pip install -r requirements.txt
+```
+
+3. Start the local server.
+```powershell
+uvicorn --app-dir src snorestopper.main:app --reload --host 127.0.0.1 --port 8000
+```
+
+4. Open the app.
+- Browser: `http://127.0.0.1:8000`
+- API docs: `http://127.0.0.1:8000/docs`
+
+## Typical Workflow
+
+1. Choose an input device and record short clips over several nights.
+2. Review each clip and spectrogram.
+3. Label clips in the UI.
+4. Train the model locally from your labeled dataset.
+5. Start an overnight run with `auto_watch` enabled to capture clips and queue predictions.
+6. Approve or invert watch proposals with thumbs up/down in the review queue.
+7. Retrain the model from approved labels.
+
+## Overnight + Snore Watch Flow
+
+- `Overnight Gatherer` captures clips on a fixed interval for N hours.
+- If `auto_watch` is enabled, each clip is scored by the local model.
+- Predictions are stored as pending proposals and are **not** used for training yet.
+- You decide per clip:
+  - `Thumbs Up` -> the proposal becomes an approved training label
+  - `Thumbs Down` -> the proposal is inverted (`snore <-> not_snore`) and then approved for training
+- Manual labels still work and can override watch proposals.
+
+## Notes
+
+- This is intentionally self-hosted and local-first: all recorded data, labels, and model artifacts stay on your machine.
+- The current model is a baseline (RandomForest + handcrafted spectral features) so you can get to a working loop quickly.
+- Recording quality and label quality are the main drivers of model performance.
+
+## Environment Variables (Optional)
+
+- `SNORESTOPPER_ROOT`: Overrides the project root directory
+- `SNORESTOPPER_SAMPLE_RATE`: Default `16000`
+- `SNORESTOPPER_CHANNELS`: Default `1`
+- `SNORESTOPPER_MIN_DURATION`: Default `2`
+- `SNORESTOPPER_MAX_DURATION`: Default `90`
+- `SNORESTOPPER_MODEL_FILE`: Default `snore_classifier.joblib`
+
+## Next Build Targets
+
+- Scheduled overnight capture jobs
+- Better event segmentation and confidence thresholds
+- Hardware trigger module for anti-snoring actions
+- User profiles and per-user local models
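
Note on the review gating described in the README's "Overnight + Snore Watch Flow" section: the rule that a watch prediction stays out of training until it gets a thumbs up or thumbs down can be sketched roughly as below. This is a minimal illustration, not the code in `storage.py` or `training.py`. The `Sample` record, its field names, the state strings, and the helpers `propose`, `review`, `manual_label`, and `training_set` are all hypothetical. It also assumes that a thumbs down on a proposal other than `snore` or `not_snore` is rejected, which the README does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Inversion is only defined for the two binary labels named in the README.
INVERTED = {"snore": "not_snore", "not_snore": "snore"}


@dataclass
class Sample:
    """Hypothetical metadata record; the real schema may differ."""
    sample_id: str
    proposed_label: Optional[str] = None  # written by the watch model
    approved_label: Optional[str] = None  # the only label training may use
    review_state: str = "none"            # "none", "pending", or "approved"


def propose(sample: Sample, predicted_label: str) -> Sample:
    """Store a model prediction as a pending proposal (not yet trainable)."""
    sample.proposed_label = predicted_label
    sample.review_state = "pending"
    return sample


def review(sample: Sample, thumbs_up: bool) -> Sample:
    """Thumbs up approves the proposal; thumbs down inverts it, then approves."""
    if sample.review_state != "pending" or sample.proposed_label is None:
        raise ValueError("no pending proposal to review")
    if thumbs_up:
        label = sample.proposed_label
    elif sample.proposed_label in INVERTED:
        label = INVERTED[sample.proposed_label]
    else:
        # Assumption: labels without an opposite cannot be inverted.
        raise ValueError("cannot invert label: " + sample.proposed_label)
    sample.approved_label = label
    sample.review_state = "approved"
    return sample


def manual_label(sample: Sample, label: str) -> Sample:
    """A manual label overrides any proposal and is approved immediately."""
    sample.approved_label = label
    sample.review_state = "approved"
    return sample


def training_set(samples: List[Sample]) -> List[Tuple[str, str]]:
    """Return (sample_id, label) pairs for approved samples only."""
    return [
        (s.sample_id, s.approved_label)
        for s in samples
        if s.review_state == "approved" and s.approved_label is not None
    ]
```

In this sketch a pending proposal never appears in `training_set`, so retraining after an overnight run only sees clips that a person has confirmed, inverted, or labeled by hand.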