Commit c6c1587

mo khan <mo@mokhan.ca>
2026-01-29 04:15:07
chore: add more user stories
1 parent 9f492a4
.elelem/backlog/002-hardware-detection.md
@@ -0,0 +1,44 @@
+As a `new user`, I `want elelem to detect my hardware capabilities`, so that `it can recommend an appropriate model for my system`.
+
+# SYNOPSIS
+
+Detect GPU/CPU capabilities to determine what models can run locally.
+
+# DESCRIPTION
+
+When elelem starts with no configuration, it should be able to detect:
+
+1. **GPU presence and type**:
+   - NVIDIA GPU with CUDA support (check nvidia-smi or similar)
+   - AMD GPU with ROCm support
+   - No discrete GPU (CPU-only fallback)
+
+2. **Available VRAM/RAM**:
+   - GPU memory available for model loading
+   - System RAM as fallback for CPU inference
+
+3. **Model recommendations**:
+   - Map hardware capabilities to appropriate model sizes
+   - Example: 8GB VRAM → 7B parameter model, 4GB VRAM → 3B model, CPU-only → small model
+
+This information will be used by the local provider to:
+- Select the default model automatically
+- Warn users if their hardware may struggle with a requested model
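
The detection and mapping steps above could be sketched roughly as follows. This is a hypothetical illustration, not the final design: the tool names (`nvidia-smi`, `rocm-smi`) come from this story, but the module name, method names, and VRAM thresholds are placeholder assumptions.

```ruby
# Hypothetical sketch of hardware detection and model-size mapping.
module HardwareDetection
  module_function

  # Returns :nvidia, :amd, or :cpu depending on which vendor tool responds.
  # Kernel#system returns nil when the command is not installed, so missing
  # tools fall through gracefully.
  def gpu_vendor
    return :nvidia if system("nvidia-smi", out: File::NULL, err: File::NULL)
    return :amd    if system("rocm-smi",   out: File::NULL, err: File::NULL)
    :cpu
  end

  # Map available memory (GiB) to a rough parameter count, per the examples
  # above; the exact thresholds are placeholder assumptions.
  def recommended_model_size(memory_gib)
    if    memory_gib >= 8 then "7B"
    elsif memory_gib >= 4 then "3B"
    else                       "small"
    end
  end
end
```

A real implementation would also need to parse memory figures out of each tool's output; that parsing is deliberately omitted here.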
+
+# SEE ALSO
+
+* [ ] lib/elelem/system_prompt.rb (platform detection)
+* [ ] Story 001 (spike findings will inform implementation)
+
+# Tasks
+
+* [ ] TBD (to be filled in during design mode)
+
+# Acceptance Criteria
+
+* [ ] Correctly detects NVIDIA GPU presence on Linux
+* [ ] Correctly detects AMD GPU presence on Linux
+* [ ] Correctly detects available VRAM when GPU present
+* [ ] Correctly detects available system RAM
+* [ ] Returns a capability summary that can be used for model selection
+* [ ] Works gracefully when detection tools (nvidia-smi, rocm-smi) are not installed
.elelem/backlog/003-model-download.md
@@ -0,0 +1,44 @@
+As a `new user`, I `want elelem to automatically download the recommended model`, so that `I can start using it immediately without manual setup`.
+
+# SYNOPSIS
+
+Download LLM models from Hugging Face with progress indication.
+
+# DESCRIPTION
+
+When the local provider is used and the required model is not present locally:
+
+1. **Model selection**:
+   - Use hardware detection (Story 002) to pick an appropriate default model
+   - Support a curated list of known-good coding models (e.g., CodeLlama, DeepSeek Coder, Qwen Coder)
+
+2. **Download process**:
+   - Download from Hugging Face Hub (GGUF format preferred for llama.cpp)
+   - Show download progress (stream CLI output or use Terminal#waiting)
+   - Store in `~/.cache/elelem/models/` or similar standard location
+
+3. **Model management**:
+   - Check if model already exists before downloading
+   - Handle interrupted downloads gracefully (resume or restart)
+
+The approach (HF CLI vs direct download) will be determined by Story 001 spike.
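
Pending the Story 001 decision, a direct-download variant of the process above could look like this. Everything here is an assumption to make the shape concrete: the cache path follows this story, but the function name and the use of `Net::HTTP` with a `Range` header for resume are illustrative only.

```ruby
require "net/http"
require "uri"
require "fileutils"

# Hypothetical sketch: resumable download into the model cache.
def download_model(url, cache_dir: File.expand_path("~/.cache/elelem/models"))
  FileUtils.mkdir_p(cache_dir)
  dest = File.join(cache_dir, File.basename(URI(url).path))
  return dest if File.exist?(dest) # already downloaded; skip

  partial = "#{dest}.partial"
  offset = File.exist?(partial) ? File.size(partial) : 0

  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    request = Net::HTTP::Get.new(uri)
    # Ask the server to resume; a real implementation should verify it
    # answers 206 Partial Content before appending.
    request["Range"] = "bytes=#{offset}-" if offset.positive?
    http.request(request) do |response|
      File.open(partial, "ab") do |file|
        response.read_body do |chunk|
          file.write(chunk)
          print "." # crude progress indication; Terminal#waiting would replace this
        end
      end
    end
  end
  File.rename(partial, dest)
  dest
end
```

If the spike lands on the Hugging Face CLI instead, this collapses to shelling out and streaming its output through the terminal.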
+
+# SEE ALSO
+
+* [ ] Story 001 (determines download approach)
+* [ ] Story 002 (provides hardware info for model selection)
+* [ ] lib/elelem/terminal.rb (progress indication)
+* [ ] ~/.cache/elelem/models/ (storage location)
+
+# Tasks
+
+* [ ] TBD (to be filled in during design mode)
+
+# Acceptance Criteria
+
+* [ ] Model downloads successfully from Hugging Face
+* [ ] User sees progress indication during download
+* [ ] Downloaded model is stored in consistent location
+* [ ] Subsequent runs do not re-download existing model
+* [ ] Graceful error handling if download fails (network error, disk full, etc.)
+* [ ] At least one good default coding model is identified and tested
.elelem/backlog/004-local-inference-provider.md
@@ -0,0 +1,53 @@
+As a `user`, I `want to run LLM inference locally without external servers`, so that `I can use elelem without API keys, Ollama, or network connectivity`.
+
+# SYNOPSIS
+
+Implement a local inference provider that loads and runs models directly in-process.
+
+# DESCRIPTION
+
+Create a new provider in `lib/elelem/net/` that:
+
+1. **Loads models locally**:
+   - Use the approach determined by Story 001 (llama.cpp bindings or CLI)
+   - Load GGUF model files from `~/.cache/elelem/models/`
+   - Support GPU acceleration (CUDA, ROCm) when available
+   - Fall back to CPU inference when no GPU present
+
+2. **Implements the provider interface**:
+   - Match the interface of existing providers (ollama.rb, openai.rb, claude.rb)
+   - Support streaming responses
+   - Handle the conversation history format
+
+3. **Performance considerations**:
+   - Model loading may take a few seconds, so show appropriate feedback
+   - Keep the model loaded in memory for subsequent prompts (don't reload it per request)
+   - Handle memory limits gracefully
+
+4. **Configuration**:
+   - Configurable via `.elelem.yml` similar to other providers
+   - Support specifying custom model path
+   - Support model selection override
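
The load-once/stream semantics described above could be sketched like this. The class and method names are assumptions (the real interface should mirror `ollama.rb` and friends), and `backend` stands in for whichever llama.cpp binding Story 001 selects, injected so the model loads once and stays resident.

```ruby
# Hypothetical sketch of the local provider's loading and streaming shape.
class LocalProvider
  def initialize(model_path:, backend:)
    @model_path = model_path
    @backend = backend # e.g. llama.cpp bindings; injected for testability
    @model = nil
  end

  # Load once and memoize, so subsequent prompts reuse the in-memory model.
  def model
    @model ||= @backend.load(@model_path)
  end

  # Yield streamed chunks to the caller as the backend produces them.
  def chat(messages, &block)
    model.generate(messages, &block)
  end
end
```

Because the backend is injected, the memoization and streaming behavior can be exercised with a stub, without a multi-gigabyte model on disk.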
+
+# SEE ALSO
+
+* [ ] Story 001 (determines implementation approach)
+* [ ] Story 003 (provides downloaded models)
+* [ ] lib/elelem/net/ollama.rb (provider interface reference)
+* [ ] lib/elelem/net/openai.rb (provider interface reference)
+* [ ] lib/elelem/net/claude.rb (provider interface reference)
+
+# Tasks
+
+* [ ] TBD (to be filled in during design mode)
+
+# Acceptance Criteria
+
+* [ ] Provider loads model from local disk
+* [ ] Provider generates streaming responses
+* [ ] Provider works with GPU acceleration on CUDA
+* [ ] Provider works with GPU acceleration on ROCm
+* [ ] Provider falls back to CPU when no GPU available
+* [ ] Provider integrates with existing elelem conversation flow
+* [ ] Tool calling works with local models (if model supports it)
+* [ ] Works fully offline once model is downloaded
.elelem/backlog/005-default-provider-selection.md
@@ -0,0 +1,49 @@
+As a `new user`, I `want elelem to use local inference by default`, so that `I can start using it immediately without any configuration`.
+
+# SYNOPSIS
+
+Make the local provider the default when no configuration exists.
+
+# DESCRIPTION
+
+Update elelem's provider selection logic so that:
+
+1. **First-run experience**:
+   - When no `.elelem.yml` exists and no environment variables are set
+   - Automatically select the local provider
+   - Trigger model download if needed (Story 003)
+   - Start the normal prompt interface, with no wizard or extra questions
+
+2. **Provider priority** (when no explicit config):
+   1. Local provider (new default)
+   2. Ollama (if running and accessible)
+   3. OpenAI (if OPENAI_API_KEY set)
+   4. Claude (if ANTHROPIC_API_KEY set)
+
+3. **Explicit configuration**:
+   - Users can still configure any provider in `.elelem.yml`
+   - Explicit config always takes precedence
+   - Document how to switch providers
+
+4. **Seamless transition**:
+   - Existing users with configuration are not affected
+   - Only new users (no config) get the new default behavior
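
The selection rules above reduce to a small decision function. This is a sketch, not elelem's actual API: the function name, the availability flags, and the config shape are all assumptions; only the priority order and the "explicit config wins" rule come from this story.

```ruby
# Hypothetical sketch of the provider priority described above.
def default_provider(config:, env:, local_available: true, ollama_running: false)
  return config[:provider] if config[:provider] # explicit config always wins

  return :local  if local_available           # 1. new default
  return :ollama if ollama_running            # 2. running Ollama instance
  return :openai if env["OPENAI_API_KEY"]     # 3. cloud fallback
  return :claude if env["ANTHROPIC_API_KEY"]  # 4. cloud fallback

  nil # nothing usable; caller should surface a setup error
end
```

Existing users are covered by the first line: any `provider:` entry loaded from `.elelem.yml` short-circuits the new default entirely.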
+
+# SEE ALSO
+
+* [ ] Story 004 (local provider implementation)
+* [ ] lib/elelem/agent.rb (provider selection logic)
+* [ ] Configuration loading code
+
+# Tasks
+
+* [ ] TBD (to be filled in during design mode)
+
+# Acceptance Criteria
+
+* [ ] New user with no config starts elelem and can chat immediately
+* [ ] Local provider is used by default (not Ollama or cloud providers)
+* [ ] Model downloads automatically on first run if not present
+* [ ] Existing users with `.elelem.yml` are not affected
+* [ ] Users with API keys in environment can still use cloud providers
+* [ ] Clear documentation on how to configure different providers