Commit 47fca2d

mo khan <mo@mokhan.ca>
2026-01-29 18:59:28
chore: consolidate stories and add one for ADR process
1 parent f69417a
.elelem/backlog/002-hardware-detection.md → .elelem/backlog/archive/002-hardware-detection.md
File renamed without changes
.elelem/backlog/003-model-download.md → .elelem/backlog/archive/003-model-download.md
File renamed without changes
.elelem/backlog/004-local-inference-provider.md → .elelem/backlog/archive/004-local-inference-provider.md
File renamed without changes
.elelem/backlog/005-default-provider-selection.md → .elelem/backlog/archive/005-default-provider-selection.md
File renamed without changes
.elelem/backlog/010-adr-support-in-design-mode.md
@@ -0,0 +1,73 @@
+As a `developer`, I `want design mode to support creating Architecture Decision Records`, so that `architectural decisions are documented consistently and repeatably`.
+
+# SYNOPSIS
+
+Add ADR creation capability to design mode with a standard template.
+
+# DESCRIPTION
+
+When architectural decisions emerge during design sessions, the agent should be able to create ADRs using a consistent template. ADRs are stored in `doc/adr/` and follow a numbered naming convention.
+
+The design prompt should be updated to:
+1. Explain when to create ADRs (significant architectural decisions)
+2. Provide the ADR template
+3. Allow writing to `doc/adr/` directory
+
+# SEE ALSO
+
+* [ ] lib/elelem/prompts/design.erb - Design mode prompt
+* [ ] doc/adr/ - ADR storage location (to be created)
+
+# Tasks
+
+* [ ] TBD (filled in design mode)
+
+# Acceptance Criteria
+
+* [ ] Design mode prompt includes ADR template
+* [ ] Design mode can write files to `doc/adr/`
+* [ ] ADRs follow naming convention: `ADR-NNNN-short-name.md`
+* [ ] Template includes: Date, Status, Context, Decision, Consequences
+* [ ] Agent understands when to propose creating an ADR
+
+# ADR Template
+
+```markdown
+# ADR-NNNN: Title
+
+**Date:** YYYY-MM-DD
+**Status:** Proposed | Accepted | Deprecated | Superseded by ADR-XXXX
+
+## Context
+
+What is the issue that we're seeing that is motivating this decision or change?
+
+## Decision
+
+What is the change that we're proposing and/or doing?
+
+## Consequences
+
+**Positive:**
+- Benefit 1
+- Benefit 2
+
+**Negative:**
+- Tradeoff 1
+- Tradeoff 2
+```
+
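The `ADR-NNNN-short-name.md` convention implies design mode must find the next free number before writing a record. A minimal Ruby sketch (the helper name is hypothetical, not part of the repo):

```ruby
require "tmpdir"

# Hypothetical helper: compute the next ADR file name by scanning an
# ADR directory for the highest existing ADR-NNNN prefix.
def next_adr_path(dir, short_name)
  numbers = Dir.glob(File.join(dir, "ADR-*.md")).filter_map do |path|
    File.basename(path)[/\AADR-(\d{4})/, 1]&.to_i
  end
  File.join(dir, format("ADR-%04d-%s.md", (numbers.max || 0) + 1, short_name))
end

Dir.mktmpdir do |dir|
  # On an empty directory this yields "ADR-0001-example-decision.md"
  puts File.basename(next_adr_path(dir, "example-decision"))
end
```

Gaps in the numbering are tolerated: only the maximum existing number matters, so deleting an old ADR never causes a collision.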
+# Guidance for Design Mode
+
+Include in the prompt:
+
+> **When to create an ADR:**
+> - Choosing between multiple valid approaches
+> - Adopting a new technology or pattern
+> - Changing an existing architectural decision
+> - Decisions that affect multiple components
+>
+> **When NOT to create an ADR:**
+> - Implementation details within a single file
+> - Bug fixes
+> - Routine refactoring
.elelem/backlog/011-local-inference-implementation.md
@@ -0,0 +1,94 @@
+As a `new user`, I `want elelem to run locally without external servers or API keys`, so that `I can start using it immediately with zero configuration`.
+
+# SYNOPSIS
+
+Implement complete local inference: hardware detection, model download, local provider, and default selection.
+
+# DESCRIPTION
+
+This story implements the full local inference capability, consolidating the work from stories 002-005 (see ADR-0001). The spike (story 001) should be completed first to inform implementation decisions.
+
+## 1. Hardware Detection
+
+Detect GPU/CPU capabilities to determine what models can run locally:
+
+- **GPU presence and type**: NVIDIA (CUDA), AMD (ROCm), or CPU-only
+- **Available VRAM/RAM**: GPU memory and system RAM
+- **Model recommendations**: Map hardware to appropriate model sizes
+  - 8GB+ VRAM → 7B-parameter model
+  - 4GB VRAM → 3B model
+  - CPU-only → small model (1-3B)
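
The detection steps above could be sketched in Ruby; the probe commands (`nvidia-smi`, `rocm-smi`), `/proc/meminfo` parsing, and the VRAM thresholds are illustrative assumptions, not the shipped implementation:

```ruby
module HardwareProbe
  module_function

  # GPU kind, probed by whether the vendor CLI is runnable.
  # Kernel#system returns nil when the command does not exist.
  def gpu_kind
    return :cuda if system("nvidia-smi", out: File::NULL, err: File::NULL)
    return :rocm if system("rocm-smi", out: File::NULL, err: File::NULL)
    :cpu
  end

  # System RAM in GiB, read from /proc/meminfo on Linux; nil elsewhere,
  # so callers degrade gracefully when detection is unavailable.
  def system_ram_gb
    kb = File.read("/proc/meminfo")[/MemTotal:\s+(\d+) kB/, 1]
    kb && kb.to_i / (1024.0 * 1024)
  rescue Errno::ENOENT
    nil
  end

  # Map available memory (GiB) to a model size, per the table above.
  def recommended_model(vram_gb)
    return "1-3B" if vram_gb.nil? # CPU-only or unknown hardware
    if vram_gb >= 8 then "7B"
    elsif vram_gb >= 4 then "3B"
    else "1-3B"
    end
  end
end
```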
+
+## 2. Model Download
+
+Download LLM models from Hugging Face with progress indication:
+
+- Use hardware detection to pick an appropriate default model
+- Support a curated list of coding models (CodeLlama, DeepSeek Coder, Qwen Coder)
+- Download GGUF format from Hugging Face Hub
+- Store in `~/.cache/elelem/models/`
+- Show progress, handle interrupted downloads
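
A resumable download with progress output might look like the sketch below. The `.part` file and `Range` header are one way to handle interrupted downloads; real code would also need to cope with servers that ignore `Range` requests, and would pick the URL from the curated model list:

```ruby
require "net/http"
require "fileutils"

# Illustrative sketch: download `url` to `dest`, resuming a partial
# `.part` file if one exists, printing percentage progress.
def download_model(url, dest)
  FileUtils.mkdir_p(File.dirname(dest))
  part = "#{dest}.part"
  offset = File.exist?(part) ? File.size(part) : 0

  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    request = Net::HTTP::Get.new(uri)
    request["Range"] = "bytes=#{offset}-" if offset.positive? # resume
    http.request(request) do |response|
      raise "download failed: #{response.code}" unless response.is_a?(Net::HTTPSuccess)
      total = offset + response["Content-Length"].to_i
      File.open(part, "ab") do |io|
        response.read_body do |chunk| # stream to disk, never buffer whole file
          io.write(chunk)
          print "\rdownloading: #{(io.size * 100) / total}%" if total.positive?
        end
      end
    end
  end
  puts
  FileUtils.mv(part, dest) # rename only once the download completed
end
```

Renaming the `.part` file last means a model file at the final path is always complete, so subsequent runs can skip the download with a simple existence check.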
+
+## 3. Local Inference Provider
+
+Create `lib/elelem/net/local.rb` provider:
+
+- Load GGUF models using approach from spike (llama.cpp bindings or CLI)
+- Support GPU acceleration (CUDA, ROCm) with CPU fallback
+- Implement same interface as existing providers (streaming, conversation history)
+- Keep model loaded in memory between prompts
+- Configurable via `.elelem.yml`
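
A shape sketch of the provider: the real interface should mirror `lib/elelem/net/ollama.rb` and `openai.rb`, so the `#chat` signature and the injectable backend here are assumptions for illustration only. The backend (llama.cpp bindings or a CLI wrapper) is constructed once and reused, which is what keeps the model loaded between prompts:

```ruby
class LocalProvider
  def initialize(model_path:, backend:)
    @model_path = model_path
    @backend = backend # loaded once; model stays in memory between prompts
    @history = []      # conversation history survives across calls
  end

  # Streams tokens to the caller's block and records the full reply.
  def chat(prompt, &on_token)
    @history << { role: "user", content: prompt }
    reply = +""
    @backend.generate(@model_path, @history) do |token|
      reply << token
      on_token&.call(token)
    end
    @history << { role: "assistant", content: reply }
    reply
  end
end
```

Injecting the backend also makes the provider testable offline with a fake that yields canned tokens.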
+
+## 4. Default Provider Selection
+
+Make local provider the default for new users:
+
+- When no config exists and no API keys set, use local provider
+- Trigger model download if needed
+- Provider priority (when no explicit config):
+  1. Local provider (new default)
+  2. Ollama (if running)
+  3. Cloud providers (if API keys set)
+- Existing users with config are not affected
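
The priority order above reduces to a short first-match chain. The predicate inputs (`local_available`, `ollama_running`, `api_keys`) are placeholders, not elelem's real configuration API:

```ruby
# Sketch of default provider selection per the list above.
def pick_provider(config: nil, local_available: true, ollama_running: false, api_keys: [])
  return config.fetch(:provider) if config # explicit .elelem.yml always wins
  return :local if local_available         # new default; may trigger a model download
  return :ollama if ollama_running
  return :cloud if api_keys.any?
  :local # nothing detected: still default to local and download a model
end
```

Checking explicit config first is what guarantees the last acceptance criterion: existing users with `.elelem.yml` never see a behavior change.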
+
+# SEE ALSO
+
+* [ ] .elelem/backlog/001-local-inference-spike.md - Complete spike first
+* [ ] doc/adr/ADR-0001-consolidate-local-inference-stories.md - Decision record
+* [ ] lib/elelem/net/ollama.rb - Provider interface reference
+* [ ] lib/elelem/net/openai.rb - Provider interface reference
+* [ ] lib/elelem/system_prompt.rb - Platform detection patterns
+
+# Tasks
+
+* [ ] TBD (filled in design mode, after spike completes)
+
+# Acceptance Criteria
+
+## Hardware Detection
+* [ ] Correctly detects NVIDIA GPU presence on Linux
+* [ ] Correctly detects AMD GPU presence on Linux
+* [ ] Correctly detects available VRAM when GPU present
+* [ ] Correctly detects available system RAM
+* [ ] Works gracefully when detection tools are not installed
+
+## Model Download
+* [ ] Model downloads successfully from Hugging Face
+* [ ] User sees progress indication during download
+* [ ] Downloaded model is stored in consistent location
+* [ ] Subsequent runs do not re-download existing model
+* [ ] Graceful error handling if download fails
+
+## Local Provider
+* [ ] Provider loads model from local disk
+* [ ] Provider generates streaming responses
+* [ ] Provider works with GPU acceleration (CUDA and ROCm)
+* [ ] Provider falls back to CPU when no GPU available
+* [ ] Provider integrates with existing conversation flow
+* [ ] Works fully offline once model is downloaded
+
+## Default Selection
+* [ ] New user with no config starts elelem and can chat immediately
+* [ ] Local provider is used by default
+* [ ] Model downloads automatically on first run if not present
+* [ ] Existing users with `.elelem.yml` are not affected