Commit 0c52a1f
Changed files (1)
cmd/maildir/DESIGN.md
@@ -0,0 +1,376 @@
+# Maildir MCP Server Design
+
+## Overview
+
+The Maildir MCP Server provides secure access to email archives stored in maildir format. It enables AI assistants to search, analyze, and extract insights from email data while maintaining strict privacy controls and security boundaries.
+
+## Problem Statement
+
+Email archives contain valuable personal and professional context that could enhance AI assistant capabilities:
+
+- **Communication Patterns**: Understanding relationships and interaction frequency
+- **Context Retrieval**: Finding relevant past conversations for current tasks
+- **Contact Management**: Extracting and organizing contact information
+- **Content Analysis**: Analyzing communication styles, topics, and sentiment
+- **Timeline Reconstruction**: Understanding project history through email threads
+
+However, email data is highly sensitive and requires:
+- Strong privacy controls and access restrictions
+- Efficient parsing of various email formats (MIME, HTML, plain text)
+- Respect for email threading and conversation structure
+- Preservation of metadata while content is filtered
+
+## Architecture
+
+### Maildir Format Support
+
+The server will support the standard maildir format:
+
+```
+/path/to/maildir/
+├── INBOX/
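+│ ├── cur/ # Messages a mail client has already seen (read state is tracked by the S flag)
+│ ├── new/ # Newly delivered messages not yet seen by a mail client
+│ └── tmp/ # Messages still being delivered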
+├── Sent/
+├── Drafts/
+├── Trash/
+└── [Custom Folders]/
+```
+
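+**Filename Format**: `{timestamp}.{unique_id}.{hostname}:2,{flags}`
+- The unique part is typically built from the process ID and a delivery counter; some servers append extra fields such as `,S={size}` before `:2,`
+- Flags (stored in ASCII order): `D` (Draft), `F` (Flagged), `P` (Passed, i.e. forwarded), `R` (Replied), `S` (Seen), `T` (Trashed)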
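
The flag suffix can be decoded with a few lines of Go. The following is a minimal sketch, not the server's actual code: the `Flags` type and `ParseFlags` function are hypothetical names, and the sketch also recognizes the `P` (Passed) flag from the maildir specification. Unknown letters are ignored so that server-specific extensions do not cause failures.

```go
package main

import (
	"fmt"
	"strings"
)

// Flags mirrors the standard maildir info flags that follow ":2," in a filename.
type Flags struct {
	Draft, Flagged, Passed, Replied, Seen, Trashed bool
}

// ParseFlags reads the flag letters from a maildir filename.
// ok is false when the name has no ":2," section, which is usual in new/.
func ParseFlags(name string) (f Flags, ok bool) {
	i := strings.LastIndex(name, ":2,")
	if i < 0 {
		return f, false
	}
	for _, c := range name[i+len(":2,"):] {
		switch c {
		case 'D':
			f.Draft = true
		case 'F':
			f.Flagged = true
		case 'P':
			f.Passed = true
		case 'R':
			f.Replied = true
		case 'S':
			f.Seen = true
		case 'T':
			f.Trashed = true
		}
	}
	return f, true
}

func main() {
	// Hypothetical filename used only for illustration.
	f, ok := ParseFlags("1700000000.M1P2.mailhost:2,RS")
	fmt.Println(ok, f.Seen, f.Replied, f.Flagged)
}
```

A message counts as unread when it either sits in `new/` or lacks the `S` flag in `cur/`, so an unread count would likely need both checks.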
+
+### Core Components
+
+#### 1. Maildir Scanner
+- Recursively scan maildir directory structure
+- Index folder hierarchy and message counts
+- Track maildir state and changes
+- Support for both plain Maildir and Maildir++ layouts (Maildir++ stores subfolders as dot-prefixed directories such as `.Sent` inside the root)
+
+#### 2. Email Parser
+- Parse RFC 5322 email messages (RFC 5322 obsoletes RFC 2822)
+- Extract headers (From, To, Subject, Date, Message-ID, etc.)
+- Handle MIME multipart messages
+- Extract plain text and HTML content
+- Preserve thread relationships (In-Reply-To, References)
+- Support various character encodings
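
As a rough illustration of the header extraction listed above, the sketch below uses Go's standard `net/mail` package. The `Summary` type, the `Summarize` function, and the sample message are assumptions made for this example; MIME body handling and encoded-word decoding are left out.

```go
package main

import (
	"fmt"
	"net/mail"
	"strings"
)

// sampleMessage is a made-up message used only for this example.
const sampleMessage = "From: Alice <alice@example.com>\n" +
	"Subject: Re: Budget\n" +
	"Message-ID: <3@example.com>\n" +
	"In-Reply-To: <2@example.com>\n" +
	"References: <1@example.com> <2@example.com>\n" +
	"\n" +
	"Looks fine to me.\n"

// Summary holds the headers needed for listing and threading.
type Summary struct {
	From       string
	Subject    string
	MessageID  string
	InReplyTo  string
	References []string
}

// Summarize parses the header block of a raw message.
func Summarize(raw string) (Summary, error) {
	msg, err := mail.ReadMessage(strings.NewReader(raw))
	if err != nil {
		return Summary{}, err
	}
	s := Summary{
		Subject:    msg.Header.Get("Subject"),
		MessageID:  msg.Header.Get("Message-ID"),
		InReplyTo:  msg.Header.Get("In-Reply-To"),
		References: strings.Fields(msg.Header.Get("References")),
	}
	// A malformed From header is kept raw rather than failing the whole message.
	if addr, err := mail.ParseAddress(msg.Header.Get("From")); err == nil {
		s.From = addr.Address
	} else {
		s.From = msg.Header.Get("From")
	}
	return s, nil
}

func main() {
	s, err := Summarize(sampleMessage)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%s | %s | %d references\n", s.From, s.Subject, len(s.References))
}
```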
+
+#### 3. Content Processor
+- Convert HTML to markdown for AI consumption
+- Extract and clean plain text content
+- Parse email signatures and quotes
+- Identify forwarded messages and replies
+- Extract attachment metadata (without content for security)
+
+#### 4. Search Engine
+- Full-text search across message content
+- Metadata filtering (date ranges, senders, folders)
+- Thread-aware search results
+- Fuzzy matching for contact names and subjects
+- Boolean search operators
+
+#### 5. Privacy Filter
+- Configurable PII detection and masking
+- Exclude sensitive folders (e.g., banking, legal)
+- Content sanitization options
+- Whitelist/blacklist for contact domains
+
+## MCP Tools
+
+### 1. `maildir_scan_folders`
+**Description**: Scan and list available maildir folders with message counts.
+
+**Input Schema**:
+```json
+{
+ "type": "object",
+ "properties": {
+ "maildir_path": {
+ "type": "string",
+ "description": "Path to the maildir root directory"
+ },
+ "include_counts": {
+ "type": "boolean",
+ "default": true,
+ "description": "Include message counts for each folder"
+ }
+ },
+ "required": ["maildir_path"]
+}
+```
+
+**Output**: List of folders with metadata (path, message count, unread count)
+
+### 2. `maildir_list_messages`
+**Description**: List messages in a folder with pagination and filtering.
+
+**Input Schema**:
+```json
+{
+ "type": "object",
+ "properties": {
+ "maildir_path": {"type": "string"},
+ "folder": {"type": "string", "default": "INBOX"},
+ "limit": {"type": "integer", "default": 50, "maximum": 200},
+ "offset": {"type": "integer", "default": 0},
+ "date_from": {"type": "string", "format": "date"},
+ "date_to": {"type": "string", "format": "date"},
+ "sender": {"type": "string"},
+ "subject_contains": {"type": "string"},
+ "unread_only": {"type": "boolean", "default": false}
+ },
+ "required": ["maildir_path"]
+}
+```
+
+**Output**: Paginated list of message headers and metadata
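
The `limit` and `offset` parameters map naturally onto slice bounds. The sketch below is one possible interpretation: the constants come from the schema above, while `PageBounds` and the choice to treat a non-positive `limit` as the default are assumptions.

```go
package main

import "fmt"

const (
	DefaultLimit = 50  // schema default for "limit"
	MaxLimit     = 200 // schema maximum for "limit"
)

// PageBounds clamps offset and limit and returns the half-open
// range [start, end) into a result list of the given total length.
func PageBounds(total, offset, limit int) (start, end int) {
	if limit <= 0 {
		limit = DefaultLimit
	}
	if limit > MaxLimit {
		limit = MaxLimit
	}
	if offset < 0 {
		offset = 0
	}
	if offset > total {
		offset = total
	}
	end = offset + limit
	if end > total {
		end = total
	}
	return offset, end
}

func main() {
	start, end := PageBounds(120, 100, 50)
	fmt.Println(start, end)
}
```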
+
+### 3. `maildir_read_message`
+**Description**: Read full message content with optional content filtering.
+
+**Input Schema**:
+```json
+{
+ "type": "object",
+ "properties": {
+ "maildir_path": {"type": "string"},
+ "message_id": {"type": "string"},
+ "include_html": {"type": "boolean", "default": false},
+ "include_headers": {"type": "boolean", "default": true},
+ "sanitize_content": {"type": "boolean", "default": true}
+ },
+ "required": ["maildir_path", "message_id"]
+}
+```
+
+**Output**: Full message with headers, content, and metadata
+
+### 4. `maildir_search_messages`
+**Description**: Full-text search across email content with advanced filtering.
+
+**Input Schema**:
+```json
+{
+ "type": "object",
+ "properties": {
+ "maildir_path": {"type": "string"},
+ "query": {"type": "string"},
+ "folders": {"type": "array", "items": {"type": "string"}},
+ "date_from": {"type": "string", "format": "date"},
+ "date_to": {"type": "string", "format": "date"},
+ "senders": {"type": "array", "items": {"type": "string"}},
+ "limit": {"type": "integer", "default": 50, "maximum": 200},
+ "sort_by": {"type": "string", "enum": ["date", "relevance"], "default": "relevance"}
+ },
+ "required": ["maildir_path", "query"]
+}
+```
+
+**Output**: Ranked search results with snippets and relevance scores
+
+### 5. `maildir_get_thread`
+**Description**: Retrieve complete email thread/conversation.
+
+**Input Schema**:
+```json
+{
+ "type": "object",
+ "properties": {
+ "maildir_path": {"type": "string"},
+ "message_id": {"type": "string"},
+ "max_depth": {"type": "integer", "default": 50}
+ },
+ "required": ["maildir_path", "message_id"]
+}
+```
+
+**Output**: Thread structure with all related messages in chronological order
+
+### 6. `maildir_analyze_contacts`
+**Description**: Extract and analyze contact information and communication patterns.
+
+**Input Schema**:
+```json
+{
+ "type": "object",
+ "properties": {
+ "maildir_path": {"type": "string"},
+ "date_from": {"type": "string", "format": "date"},
+ "date_to": {"type": "string", "format": "date"},
+ "min_messages": {"type": "integer", "default": 2},
+ "include_frequency": {"type": "boolean", "default": true}
+ },
+ "required": ["maildir_path"]
+}
+```
+
+**Output**: Contact list with email frequency, last contact date, and relationship strength
+
+### 7. `maildir_get_statistics`
+**Description**: Generate email usage statistics and insights.
+
+**Input Schema**:
+```json
+{
+ "type": "object",
+ "properties": {
+ "maildir_path": {"type": "string"},
+ "period": {"type": "string", "enum": ["week", "month", "year"], "default": "month"},
+ "include_charts": {"type": "boolean", "default": false}
+ },
+ "required": ["maildir_path"]
+}
+```
+
+**Output**: Statistics on email volume, top contacts, response times, etc.
+
+## Security & Privacy
+
+### Access Control
+- Restrict access to explicitly authorized maildir paths
+- Validate all path operations to prevent directory traversal
+- Support for read-only access mode
+- Configurable folder exclusions
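
A path check along these lines could guard every tool call. This is a sketch under stated assumptions: `IsAllowed` is a hypothetical helper, the roots correspond to the `allowed_paths` setting, and symlinks are not resolved here (a real deployment would probably also call `filepath.EvalSymlinks` before comparing).

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// IsAllowed reports whether target, once cleaned and made absolute,
// lies inside (or equals) one of the allowed root directories.
func IsAllowed(allowedRoots []string, target string) bool {
	abs, err := filepath.Abs(target)
	if err != nil {
		return false
	}
	for _, root := range allowedRoots {
		rootAbs, err := filepath.Abs(root)
		if err != nil {
			continue
		}
		rel, err := filepath.Rel(rootAbs, abs)
		if err != nil {
			continue
		}
		// A relative path that climbs out of the root starts with "..".
		if rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
			return true
		}
	}
	return false
}

func main() {
	roots := []string{"/home/user/.local/share/mail"}
	fmt.Println(IsAllowed(roots, "/home/user/.local/share/mail/INBOX/cur"))
	fmt.Println(IsAllowed(roots, "/home/user/.local/share/mail/../../.ssh/id_rsa"))
}
```

Comparing with `filepath.Rel` rather than a plain string prefix avoids accepting sibling directories such as `/home/user/.local/share/mailbox`.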
+
+### Content Filtering
+- Optional PII detection and masking (phone numbers, SSNs, etc.)
+- Email address anonymization options
+- Subject line sanitization
+- Attachment content exclusion (metadata only)
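
Pattern-based masking is one straightforward way to implement these options. The sketch below covers only US-style SSNs and email addresses; the patterns, the placeholder strings, and the `MaskPII` name are illustrative assumptions, and regular expressions alone will miss many real-world PII formats.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// ssnRe matches US Social Security numbers written as 123-45-6789.
	ssnRe = regexp.MustCompile(`\b\d{3}-\d{2}-\d{4}\b`)
	// emailRe is a deliberately loose email pattern, not a full address parser.
	emailRe = regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`)
)

// MaskPII replaces SSNs, and optionally email addresses, with placeholders.
func MaskPII(s string, maskEmails bool) string {
	s = ssnRe.ReplaceAllString(s, "[SSN]")
	if maskEmails {
		s = emailRe.ReplaceAllString(s, "[EMAIL]")
	}
	return s
}

func main() {
	fmt.Println(MaskPII("SSN 123-45-6789, mail bob@example.com", true))
}
```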
+
+### Configuration
+- User-defined sensitivity levels
+- Whitelist/blacklist for contact domains
+- Excluded folder patterns
+- Content filtering rules
+
+### Example Security Config
+```json
+{
+ "allowed_paths": ["/home/user/.local/share/mail"],
+ "excluded_folders": ["Banking", "Legal", "Medical"],
+ "pii_masking": true,
+ "contact_anonymization": false,
+ "max_content_length": 10000,
+ "excluded_extensions": [".exe", ".zip", ".pdf"]
+}
+```
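
The example config could be loaded into a typed struct as sketched below. The field names follow the JSON keys above; the `SecurityConfig` and `LoadConfig` names, the rejection of unknown keys, and the requirement that `allowed_paths` be non-empty are assumptions chosen to fail closed.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// SecurityConfig mirrors the keys of the example security config.
type SecurityConfig struct {
	AllowedPaths         []string `json:"allowed_paths"`
	ExcludedFolders      []string `json:"excluded_folders"`
	PIIMasking           bool     `json:"pii_masking"`
	ContactAnonymization bool     `json:"contact_anonymization"`
	MaxContentLength     int      `json:"max_content_length"`
	ExcludedExtensions   []string `json:"excluded_extensions"`
}

// LoadConfig decodes and validates a security config document.
func LoadConfig(data []byte) (SecurityConfig, error) {
	var cfg SecurityConfig
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields() // reject misspelled keys instead of ignoring them
	if err := dec.Decode(&cfg); err != nil {
		return SecurityConfig{}, fmt.Errorf("parse security config: %w", err)
	}
	if len(cfg.AllowedPaths) == 0 {
		return SecurityConfig{}, errors.New("security config: allowed_paths must not be empty")
	}
	return cfg, nil
}

func main() {
	raw := []byte(`{"allowed_paths": ["/home/user/.local/share/mail"], "pii_masking": true}`)
	cfg, err := LoadConfig(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(cfg.AllowedPaths, cfg.PIIMasking)
}
```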
+
+## Implementation Details
+
+### File System Operations
+- Efficient directory traversal with caching
+- Watch for maildir changes (new messages)
+- Handle corrupted or malformed email files gracefully
+- Support for compressed maildir archives
+
+### Email Parsing
+- Use Go's standard-library `net/mail` package for header and address parsing
+- Use the standard-library `mime` and `mime/multipart` packages for multipart messages
+- Handle various character encodings (UTF-8, Latin-1, etc.)
+- Extract metadata while preserving original structure
+
+### Search Implementation
+- In-memory inverted index for fast text search
+- Bloom filters for efficient negative lookups
+- Fuzzy string matching for contact names
+- Regular expression support for advanced queries
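
To make the inverted-index idea concrete, here is a minimal in-memory sketch. The `Index` type, the integer document IDs, and the AND semantics for multi-word queries are assumptions for illustration; ranking, Bloom filters, and fuzzy matching are omitted.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// Index maps each lowercase term to the set of document IDs containing it.
type Index struct {
	postings map[string]map[int]struct{}
}

func NewIndex() *Index {
	return &Index{postings: make(map[string]map[int]struct{})}
}

// tokenize lowercases the text and splits it on non-alphanumeric runes.
func tokenize(s string) []string {
	return strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// Add indexes the text of one document under the given ID.
func (ix *Index) Add(docID int, text string) {
	for _, term := range tokenize(text) {
		if ix.postings[term] == nil {
			ix.postings[term] = make(map[int]struct{})
		}
		ix.postings[term][docID] = struct{}{}
	}
}

// Search returns the sorted IDs of documents containing every query term.
func (ix *Index) Search(query string) []int {
	terms := tokenize(query)
	if len(terms) == 0 {
		return nil
	}
	var out []int
	for id := range ix.postings[terms[0]] {
		match := true
		for _, t := range terms[1:] {
			if _, ok := ix.postings[t][id]; !ok {
				match = false
				break
			}
		}
		if match {
			out = append(out, id)
		}
	}
	sort.Ints(out)
	return out
}

func main() {
	ix := NewIndex()
	ix.Add(1, "Quarterly budget review")
	ix.Add(2, "Budget meeting moved")
	ix.Add(3, "Lunch?")
	fmt.Println(ix.Search("budget"), ix.Search("budget meeting"))
}
```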
+
+### Threading Algorithm
+- Parse References and In-Reply-To headers
+- Subject line normalization (Re:, Fwd: removal)
+- Handle broken threading gracefully
+- Support for multiple threading strategies
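
Two building blocks of header-based threading are sketched below: stripping reply and forward prefixes from subjects, and choosing a parent ID. The helper names are hypothetical. Taking the last `References` entry as the parent, with `In-Reply-To` as a fallback, follows the header semantics in RFC 5322, but it is only one of the strategies this design might adopt.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// prefixRe matches any run of leading "Re:", "Fw:" or "Fwd:" markers.
var prefixRe = regexp.MustCompile(`(?i)^\s*(?:(?:re|fwd?)\s*:\s*)+`)

// NormalizeSubject removes reply/forward prefixes so that subjects
// from the same conversation compare equal.
func NormalizeSubject(s string) string {
	return strings.TrimSpace(prefixRe.ReplaceAllString(s, ""))
}

// ParentID picks the most likely parent Message-ID: the last entry
// in References, or the first ID in In-Reply-To when References is empty.
func ParentID(references, inReplyTo string) string {
	if refs := strings.Fields(references); len(refs) > 0 {
		return refs[len(refs)-1]
	}
	if ids := strings.Fields(inReplyTo); len(ids) > 0 {
		return ids[0]
	}
	return "" // no threading headers: treat as a thread root
}

func main() {
	fmt.Println(NormalizeSubject("Re: FWD: re: Budget 2024"))
	fmt.Println(ParentID("<a@example.com> <b@example.com>", "<c@example.com>"))
}
```

When both headers are missing, the normalized subject can serve as a fallback grouping key, at the cost of occasionally merging unrelated conversations.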
+
+## Performance Considerations
+
+### Caching Strategy
+- Cache folder structure and message counts
+- Index commonly accessed messages
+- Lazy loading of message content
+- TTL-based cache invalidation
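
A TTL cache for folder message counts might look like the sketch below. `CountCache` and its methods are hypothetical names, and the clock is injected as a function returning Unix seconds purely so that expiry can be tested without sleeping.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type entry struct {
	count   int
	expires int64 // Unix seconds after which the entry is stale
}

// CountCache caches per-folder message counts for a fixed TTL.
type CountCache struct {
	mu    sync.Mutex
	ttl   int64
	now   func() int64
	items map[string]entry
}

// NewCountCache builds a cache whose entries live for ttlSeconds.
func NewCountCache(ttlSeconds int64, now func() int64) *CountCache {
	return &CountCache{ttl: ttlSeconds, now: now, items: make(map[string]entry)}
}

// Set stores a count and stamps it with an expiry time.
func (c *CountCache) Set(folder string, count int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[folder] = entry{count: count, expires: c.now() + c.ttl}
}

// Get returns the cached count, or ok=false if it is missing or expired.
func (c *CountCache) Get(folder string) (count int, ok bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, found := c.items[folder]
	if !found || c.now() >= e.expires {
		delete(c.items, folder) // drop stale entries lazily
		return 0, false
	}
	return e.count, true
}

func main() {
	cache := NewCountCache(30, func() int64 { return time.Now().Unix() })
	cache.Set("INBOX", 1204)
	n, ok := cache.Get("INBOX")
	fmt.Println(n, ok)
}
```

On a cache miss the caller would rescan the folder and call `Set` again; a file watcher could also evict entries early when new messages arrive.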
+
+### Memory Management
+- Stream large messages to avoid memory issues
+- Pagination for large result sets
+- Configurable limits on search result size
+- Efficient string operations for content processing
+
+### Scalability
+- Support for maildir archives with millions of messages
+- Incremental indexing for new messages
+- Background processing for expensive operations
+- Rate limiting for resource-intensive queries
+
+## Error Handling
+
+### Graceful Degradation
+- Continue processing despite corrupted messages
+- Handle permission errors gracefully
+- Provide meaningful error messages for invalid queries
+- Fallback options for unsupported email formats
+
+### Logging & Monitoring
+- Structured logging for all operations
+- Performance metrics collection
+- Error rate tracking
+- Privacy-safe audit logging
+
+## Testing Strategy
+
+### Unit Tests
+- Email parsing with various MIME types
+- Maildir scanning with different folder structures
+- Search functionality with edge cases
+- Security validation for path traversal attempts
+
+### Integration Tests
+- Real maildir processing with sample data
+- Performance testing with large archives
+- Security testing with malicious inputs
+- Cross-platform compatibility testing
+
+### Test Data
+- Synthetic email corpus for testing
+- Various maildir layouts and formats
+- Corrupted email samples
+- Edge cases (empty folders, special characters)
+
+## Future Enhancements
+
+### Advanced Features
+- Email sentiment analysis
+- Automatic categorization and tagging
+- Smart contact grouping
+- Email scheduling analysis
+- Conversation summarization
+
+### Integration Options
+- Export to various formats (JSON, CSV, mbox)
+- Integration with external search engines
+- Contact synchronization with address books
+- Calendar event extraction from emails
+
+### Machine Learning
+- Spam/ham classification
+- Important message detection
+- Automatic reply suggestions
+- Writing style analysis
+
+## Compliance & Legal
+
+### Data Protection
+- GDPR compliance for EU users
+- Data retention policies
+- Right to be forgotten implementation
+- Consent management for contact analysis
+
+### Export/Import
+- Standard mailbox format support (mbox, EML)
+- Backup and restore functionality
+- Cross-platform migration tools
+- Format conversion utilities
+
+This design provides a comprehensive, secure, and privacy-conscious approach to email analysis while maintaining the flexibility needed for AI assistant integration.