Commit a015ff2
Changed files (5)
cmd/speech/main.go
@@ -28,8 +28,9 @@ func main() {
func showHelpText() {
fmt.Printf(`Speech MCP Server
-A Model Context Protocol server that provides text-to-speech capabilities using
-the macOS 'say' command. Enables LLMs to speak their responses with customizable
+A cross-platform Model Context Protocol server that provides text-to-speech
+capabilities. Uses the native TTS system on each platform: macOS 'say' command
+or Linux espeak-ng/espeak. Enables LLMs to speak their responses with customizable
voices, rates, and output options.
USAGE:
@@ -38,6 +39,10 @@ USAGE:
OPTIONS:
--help Show this help message
+SUPPORTED PLATFORMS:
+ โข macOS - Uses built-in 'say' command (already available)
+ โข Linux - Uses espeak-ng or espeak (install required)
+
TOOLS PROVIDED:
Speech Synthesis:
@@ -45,7 +50,7 @@ Speech Synthesis:
list_voices List all available system voices with filtering
speak_file Read and speak the contents of a text file
stop_speech Stop any currently playing speech synthesis
- speech_settings Get detailed information about speech options
+ speech_settings Get detailed information about speech options and backend
EXAMPLES:
@@ -53,9 +58,12 @@ Basic Speech:
# Simple text-to-speech
{"name": "say", "arguments": {"text": "Hello, this is a test"}}
- # Custom voice and speed
+ # Custom voice and speed (macOS)
{"name": "say", "arguments": {"text": "Hello world", "voice": "Samantha", "rate": 150}}
+ # Custom voice and speed (Linux)
+ {"name": "say", "arguments": {"text": "Hello world", "voice": "en-gb", "rate": 150}}
+
# Adjust volume
{"name": "say", "arguments": {"text": "Quiet speech", "volume": 0.3}}
@@ -77,18 +85,22 @@ File Operations:
{"name": "speak_file", "arguments": {"file_path": "README.md", "max_lines": 10}}
Audio Output:
- # Save speech to file
+ # Save speech to file (macOS: .aiff, .wav, .m4a)
+ {"name": "say", "arguments": {"text": "Recording test", "output": "~/speech.wav"}}
+
+ # Save speech to file (Linux: .wav only)
{"name": "say", "arguments": {"text": "Recording test", "output": "~/speech.wav"}}
Control:
# Stop any playing speech
{"name": "stop_speech", "arguments": {}}
- # Get help with settings
+ # Get help with settings and backend info
{"name": "speech_settings", "arguments": {}}
VOICE OPTIONS:
-Popular built-in voices include:
+
+macOS (built-in voices):
โข Alex (default male voice)
โข Samantha (clear female voice)
โข Victoria (British female voice)
@@ -96,17 +108,38 @@ Popular built-in voices include:
โข Fiona (Scottish female voice)
โข Moira (Irish female voice)
+Linux (espeak-ng voices):
+ โข en-gb (British English)
+ โข en-us (American English)
+ โข en-gb-scotland (Scottish English)
+ โข Various other languages and accents
+
PARAMETERS:
text - Text to speak (required for 'say')
voice - Voice name (use list_voices to see options)
rate - Speech rate in words per minute (80-500, default ~200)
volume - Volume level from 0.0 to 1.0 (default: system volume)
- output - Save audio to file (.aiff, .wav, .m4a formats)
+ output - Save audio to file (formats vary by platform)
file_path - Path to text file to speak
max_lines - Limit number of lines to speak from file
language - Filter voices by language code (e.g., "en", "es")
detailed - Show detailed voice information
+INSTALLATION:
+
+Linux Requirements:
+ # Ubuntu/Debian
+ sudo apt install espeak-ng
+
+ # Fedora/RHEL
+ sudo dnf install espeak-ng
+
+ # Arch Linux
+ sudo pacman -S espeak-ng
+
+macOS Requirements:
+ # Built-in 'say' command - no installation needed
+
INTEGRATION:
Add to your Claude Code configuration (~/.claude.json):
@@ -118,16 +151,18 @@ Add to your Claude Code configuration (~/.claude.json):
}
}
-USAGE WITH GOOSE:
-Once integrated, you can ask Goose to speak responses:
+USAGE WITH CLAUDE CODE:
+Once integrated, you can ask Claude to speak responses:
"Say your response out loud using the speech tool"
- "Read this file aloud using a female voice"
+ "Read this file aloud using a British voice"
"List all available voices on my system"
"Stop any speech that's currently playing"
-REQUIREMENTS:
-- macOS (uses the built-in 'say' command)
-- Appropriate system permissions for audio output
+BACKEND DETECTION:
+The server automatically detects the appropriate TTS backend:
+ โข macOS: Uses 'say' command
+ โข Linux: Uses 'espeak-ng' (preferred) or 'espeak' (fallback)
+ โข Other: Shows helpful installation instructions
For support or issues, see: https://github.com/xlgmokha/mcp
`)
pkg/speech/backends.go
@@ -0,0 +1,424 @@
+package speech
+
+import (
+ "fmt"
+ "os"
+ "os/exec"
+ "path/filepath"
+ "strconv"
+ "strings"
+)
+
+// MacOSBackend implements TTS using macOS 'say' command
+type MacOSBackend struct{}
+
+func (m *MacOSBackend) Speak(text string, voice string, rate *int, volume *float64, output string) (string, error) {
+ cmdArgs := []string{}
+
+ if voice != "" {
+ cmdArgs = append(cmdArgs, "-v", voice)
+ }
+
+ if rate != nil {
+ if *rate < 80 || *rate > 500 {
+ return "", fmt.Errorf("rate must be between 80-500 words per minute")
+ }
+ cmdArgs = append(cmdArgs, "-r", strconv.Itoa(*rate))
+ }
+
+ if volume != nil {
+ if *volume < 0.0 || *volume > 1.0 {
+ return "", fmt.Errorf("volume must be between 0.0 and 1.0")
+ }
+ // Convert to 0-100 scale for say command
+ volumeInt := int(*volume * 100)
+ cmdArgs = append(cmdArgs, "--volume", strconv.Itoa(volumeInt))
+ }
+
+ if output != "" {
+ // Validate output file extension
+ ext := strings.ToLower(filepath.Ext(output))
+ if ext != ".aiff" && ext != ".wav" && ext != ".m4a" {
+ return "", fmt.Errorf("output format must be .aiff, .wav, or .m4a")
+ }
+ cmdArgs = append(cmdArgs, "-o", output)
+ }
+
+ // Add the text to speak
+ cmdArgs = append(cmdArgs, text)
+
+ cmd := exec.Command("say", cmdArgs...)
+ output_bytes, err := cmd.CombinedOutput()
+
+ var result string
+ if output != "" {
+ result = fmt.Sprintf("Audio saved to: %s", output)
+ } else {
+ result = fmt.Sprintf("Spoke: \"%s\"", text)
+ }
+
+ if len(output_bytes) > 0 {
+ result += fmt.Sprintf("\nOutput: %s", string(output_bytes))
+ }
+
+ if err != nil {
+ return result, err
+ }
+
+ return result, nil
+}
+
+func (m *MacOSBackend) ListVoices(language string) ([]Voice, error) {
+ cmd := exec.Command("say", "-v", "?")
+ output, err := cmd.Output()
+
+ if err != nil {
+ return nil, fmt.Errorf("failed to list voices: %v", err)
+ }
+
+ voices := []Voice{}
+ lines := strings.Split(string(output), "\n")
+
+ for _, line := range lines {
+ line = strings.TrimSpace(line)
+ if line == "" {
+ continue
+ }
+
+ // Filter by language if specified
+ if language != "" && !strings.Contains(strings.ToLower(line), language) {
+ continue
+ }
+
+ // Parse voice line (format: "Name Language # Details")
+ parts := strings.Fields(line)
+ if len(parts) >= 2 {
+ voice := Voice{
+ Name: parts[0],
+ Language: parts[1],
+ Details: line,
+ }
+ voices = append(voices, voice)
+ }
+ }
+
+ return voices, nil
+}
+
+func (m *MacOSBackend) SpeakFile(filepath string, voice string, rate *int, volume *float64, maxLines *int) (string, error) {
+ // Read the file to get stats
+ content, err := os.ReadFile(filepath)
+ if err != nil {
+ return "", fmt.Errorf("failed to read file: %v", err)
+ }
+
+ text := string(content)
+ linesCount := len(strings.Split(text, "\n"))
+ wordsCount := len(strings.Fields(text))
+
+ // Build say command
+ cmdArgs := []string{}
+
+ if voice != "" {
+ cmdArgs = append(cmdArgs, "-v", voice)
+ }
+
+ if rate != nil {
+ if *rate < 80 || *rate > 500 {
+ return "", fmt.Errorf("rate must be between 80-500 words per minute")
+ }
+ cmdArgs = append(cmdArgs, "-r", strconv.Itoa(*rate))
+ }
+
+ if volume != nil {
+ if *volume < 0.0 || *volume > 1.0 {
+ return "", fmt.Errorf("volume must be between 0.0 and 1.0")
+ }
+ volumeInt := int(*volume * 100)
+ cmdArgs = append(cmdArgs, "--volume", strconv.Itoa(volumeInt))
+ }
+
+ // If maxLines specified, speak text directly with limit
+ if maxLines != nil && *maxLines > 0 && *maxLines < linesCount {
+ lines := strings.Split(text, "\n")
+ lines = lines[:*maxLines]
+ limitedText := strings.Join(lines, "\n")
+ cmdArgs = append(cmdArgs, limitedText)
+
+ cmd := exec.Command("say", cmdArgs...)
+ _, err := cmd.CombinedOutput()
+
+ result := fmt.Sprintf("Speaking file: %s\nLines: %d (limited to %d), Words: ~%d",
+ filepath, linesCount, *maxLines, len(strings.Fields(limitedText)))
+
+ if err != nil {
+ return result, err
+ }
+ return result, nil
+ }
+
+ // Otherwise use -f flag to speak entire file
+ cmdArgs = append(cmdArgs, "-f", filepath)
+
+ cmd := exec.Command("say", cmdArgs...)
+ _, err = cmd.CombinedOutput()
+
+ result := fmt.Sprintf("Speaking file: %s\nLines: %d, Words: %d",
+ filepath, linesCount, wordsCount)
+
+ if err != nil {
+ return result, err
+ }
+
+ return result, nil
+}
+
+func (m *MacOSBackend) StopSpeech() (string, error) {
+ cmd := exec.Command("pkill", "say")
+ err := cmd.Run()
+
+ if err != nil {
+ // pkill returns error if no processes found, which is fine
+ return "Stopped all speech synthesis (no speech processes were running)", nil
+ }
+
+ return "Stopped all speech synthesis", nil
+}
+
+func (m *MacOSBackend) IsAvailable() bool {
+ _, err := exec.LookPath("say")
+ return err == nil
+}
+
+func (m *MacOSBackend) GetName() string {
+ return "macOS say"
+}
+
+// LinuxBackend implements TTS using espeak-ng or espeak
+type LinuxBackend struct {
+ command string
+}
+
+func (l *LinuxBackend) getCommand() string {
+ if l.command != "" {
+ return l.command
+ }
+
+ // Try espeak-ng first (newer, better quality)
+ if _, err := exec.LookPath("espeak-ng"); err == nil {
+ l.command = "espeak-ng"
+ return l.command
+ }
+
+ // Fall back to espeak
+ if _, err := exec.LookPath("espeak"); err == nil {
+ l.command = "espeak"
+ return l.command
+ }
+
+ return ""
+}
+
+func (l *LinuxBackend) Speak(text string, voice string, rate *int, volume *float64, output string) (string, error) {
+ cmd := l.getCommand()
+ if cmd == "" {
+ return "", fmt.Errorf("no TTS command available (install espeak-ng or espeak)")
+ }
+
+ cmdArgs := []string{}
+
+ // Add voice selection
+ if voice != "" {
+ cmdArgs = append(cmdArgs, "-v", voice)
+ }
+
+ // Add speech rate (words per minute)
+ if rate != nil {
+ // espeak uses words per minute directly
+ cmdArgs = append(cmdArgs, "-s", strconv.Itoa(*rate))
+ }
+
+ // Add volume (amplitude)
+ if volume != nil {
+ // espeak uses amplitude 0-200, with 100 as default
+ amplitude := int(*volume * 200)
+ cmdArgs = append(cmdArgs, "-a", strconv.Itoa(amplitude))
+ }
+
+ // Add output file if specified
+ if output != "" {
+ // espeak supports wav output
+ ext := strings.ToLower(filepath.Ext(output))
+ if ext != ".wav" {
+ return "", fmt.Errorf("output format must be .wav for Linux TTS")
+ }
+ cmdArgs = append(cmdArgs, "-w", output)
+ }
+
+ // Add the text
+ cmdArgs = append(cmdArgs, text)
+
+ command := exec.Command(cmd, cmdArgs...)
+ output_bytes, err := command.CombinedOutput()
+
+ var result string
+ if output != "" {
+ result = fmt.Sprintf("Audio saved to: %s", output)
+ } else {
+ result = fmt.Sprintf("Spoke: \"%s\"", text)
+ }
+
+ if len(output_bytes) > 0 && !strings.Contains(string(output_bytes), "ALSA lib") {
+ // Filter out common ALSA warnings
+ result += fmt.Sprintf("\nOutput: %s", string(output_bytes))
+ }
+
+ if err != nil {
+ return result, err
+ }
+
+ return result, nil
+}
+
+func (l *LinuxBackend) ListVoices(language string) ([]Voice, error) {
+ cmd := l.getCommand()
+ if cmd == "" {
+ return nil, fmt.Errorf("no TTS command available (install espeak-ng or espeak)")
+ }
+
+ command := exec.Command(cmd, "--voices")
+ output, err := command.Output()
+
+ if err != nil {
+ return nil, fmt.Errorf("failed to list voices: %v", err)
+ }
+
+ voices := []Voice{}
+ lines := strings.Split(string(output), "\n")
+
+ // Skip header line
+ if len(lines) > 0 {
+ lines = lines[1:]
+ }
+
+ for _, line := range lines {
+ line = strings.TrimSpace(line)
+ if line == "" {
+ continue
+ }
+
+ // Parse espeak voice format
+ // Format: "Pty Language Age/Gender VoiceName File Other Languages"
+ fields := strings.Fields(line)
+ if len(fields) >= 4 {
+ lang := fields[1]
+ name := fields[3]
+
+ // Filter by language if specified
+ if language != "" && !strings.Contains(strings.ToLower(lang), strings.ToLower(language)) {
+ continue
+ }
+
+ voice := Voice{
+ Name: name,
+ Language: lang,
+ Details: line,
+ }
+ voices = append(voices, voice)
+ }
+ }
+
+ return voices, nil
+}
+
+func (l *LinuxBackend) SpeakFile(filepath string, voice string, rate *int, volume *float64, maxLines *int) (string, error) {
+ // Read the file to get stats and handle maxLines
+ content, err := os.ReadFile(filepath)
+ if err != nil {
+ return "", fmt.Errorf("failed to read file: %v", err)
+ }
+
+ text := string(content)
+ linesCount := len(strings.Split(text, "\n"))
+ wordsCount := len(strings.Fields(text))
+
+ // Limit lines if specified
+ actualText := text
+ if maxLines != nil && *maxLines > 0 && *maxLines < linesCount {
+ lines := strings.Split(text, "\n")
+ lines = lines[:*maxLines]
+ actualText = strings.Join(lines, "\n")
+ }
+
+ // Use Speak method with the text
+ result, err := l.Speak(actualText, voice, rate, volume, "")
+
+ fileInfo := fmt.Sprintf("Speaking file: %s\nLines: %d", filepath, linesCount)
+ if maxLines != nil && *maxLines < linesCount {
+ fileInfo += fmt.Sprintf(" (limited to %d)", *maxLines)
+ }
+ fileInfo += fmt.Sprintf(", Words: %d", wordsCount)
+
+ if err != nil {
+ return fileInfo + "\n" + result, err
+ }
+
+ return fileInfo + "\n" + result, nil
+}
+
+func (l *LinuxBackend) StopSpeech() (string, error) {
+ cmd := l.getCommand()
+ if cmd == "" {
+ return "No TTS command available", nil
+ }
+
+ // Kill espeak/espeak-ng processes
+ exec.Command("pkill", cmd).Run()
+
+ // Also try to kill common audio players that might be used
+ exec.Command("pkill", "aplay").Run()
+ exec.Command("pkill", "paplay").Run()
+
+ return fmt.Sprintf("Stopped all %s processes", cmd), nil
+}
+
+func (l *LinuxBackend) IsAvailable() bool {
+ return l.getCommand() != ""
+}
+
+func (l *LinuxBackend) GetName() string {
+ cmd := l.getCommand()
+ if cmd != "" {
+ return cmd
+ }
+ return "Linux TTS (not available)"
+}
+
+// UnsupportedBackend for unsupported operating systems
+type UnsupportedBackend struct {
+ os string
+}
+
+func (u *UnsupportedBackend) Speak(text string, voice string, rate *int, volume *float64, output string) (string, error) {
+ return "", fmt.Errorf("speech synthesis is not supported on %s", u.os)
+}
+
+func (u *UnsupportedBackend) ListVoices(language string) ([]Voice, error) {
+ return nil, fmt.Errorf("voice listing is not supported on %s", u.os)
+}
+
+func (u *UnsupportedBackend) SpeakFile(filepath string, voice string, rate *int, volume *float64, maxLines *int) (string, error) {
+ return "", fmt.Errorf("file speaking is not supported on %s", u.os)
+}
+
+func (u *UnsupportedBackend) StopSpeech() (string, error) {
+ return "", fmt.Errorf("speech control is not supported on %s", u.os)
+}
+
+func (u *UnsupportedBackend) IsAvailable() bool {
+ return false
+}
+
+func (u *UnsupportedBackend) GetName() string {
+ return fmt.Sprintf("Unsupported (%s)", u.os)
+}
\ No newline at end of file
pkg/speech/server.go
@@ -3,29 +3,56 @@ package speech
import (
"encoding/json"
"fmt"
- "os"
- "os/exec"
- "path/filepath"
"runtime"
- "strconv"
"strings"
"sync"
"github.com/xlgmokha/mcp/pkg/mcp"
)
+// TTSBackend represents a text-to-speech backend
+type TTSBackend interface {
+ Speak(text string, voice string, rate *int, volume *float64, output string) (string, error)
+ ListVoices(language string) ([]Voice, error)
+ SpeakFile(filepath string, voice string, rate *int, volume *float64, maxLines *int) (string, error)
+ StopSpeech() (string, error)
+ IsAvailable() bool
+ GetName() string
+}
+
+// Voice represents a TTS voice
+type Voice struct {
+ Name string
+ Language string
+ Details string
+}
+
// Server represents the Speech MCP server
type Server struct {
*mcp.Server
- mu sync.RWMutex
+ mu sync.RWMutex
+ backend TTSBackend
}
// NewServer creates a new Speech MCP server
func NewServer() *Server {
baseServer := mcp.NewServer("mcp-speech", "1.0.0")
+ // Select appropriate TTS backend based on OS
+ var backend TTSBackend
+ switch runtime.GOOS {
+ case "darwin":
+ backend = &MacOSBackend{}
+ case "linux":
+ backend = &LinuxBackend{}
+ default:
+ // For unsupported OS, use a no-op backend
+ backend = &UnsupportedBackend{os: runtime.GOOS}
+ }
+
server := &Server{
- Server: baseServer,
+ Server: baseServer,
+ backend: backend,
}
// Register speech tools
@@ -60,61 +87,13 @@ func (s *Server) handleSay(req mcp.CallToolRequest) (mcp.CallToolResult, error)
return mcp.CallToolResult{}, fmt.Errorf("text is required")
}
- // Check if we're on macOS (say command is macOS specific)
- if runtime.GOOS != "darwin" {
- return mcp.CallToolResult{}, fmt.Errorf("speech synthesis is only supported on macOS")
+ // Check if TTS is available on this system
+ if !s.backend.IsAvailable() {
+ return mcp.CallToolResult{}, fmt.Errorf("speech synthesis is not available on this system (backend: %s)", s.backend.GetName())
}
- // Build say command
- cmdArgs := []string{}
-
- if args.Voice != "" {
- cmdArgs = append(cmdArgs, "-v", args.Voice)
- }
-
- if args.Rate != nil {
- if *args.Rate < 80 || *args.Rate > 500 {
- return mcp.CallToolResult{}, fmt.Errorf("rate must be between 80-500 words per minute")
- }
- cmdArgs = append(cmdArgs, "-r", strconv.Itoa(*args.Rate))
- }
-
- if args.Volume != nil {
- if *args.Volume < 0.0 || *args.Volume > 1.0 {
- return mcp.CallToolResult{}, fmt.Errorf("volume must be between 0.0 and 1.0")
- }
- // Convert to 0-100 scale for say command
- volume := int(*args.Volume * 100)
- cmdArgs = append(cmdArgs, "--volume", strconv.Itoa(volume))
- }
-
- if args.Output != "" {
- // Validate output file extension
- ext := strings.ToLower(filepath.Ext(args.Output))
- if ext != ".aiff" && ext != ".wav" && ext != ".m4a" {
- return mcp.CallToolResult{}, fmt.Errorf("output format must be .aiff, .wav, or .m4a")
- }
- cmdArgs = append(cmdArgs, "-o", args.Output)
- }
-
- // Add the text to speak
- cmdArgs = append(cmdArgs, args.Text)
-
- cmd := exec.Command("say", cmdArgs...)
- output, err := cmd.CombinedOutput()
-
- var result string
- if args.Output != "" {
- result = fmt.Sprintf("Command: say %s\nAudio saved to: %s",
- strings.Join(cmdArgs[:len(cmdArgs)-1], " "), args.Output)
- } else {
- result = fmt.Sprintf("Command: say %s\nSpoke: \"%s\"",
- strings.Join(cmdArgs[:len(cmdArgs)-1], " "), args.Text)
- }
-
- if len(output) > 0 {
- result += fmt.Sprintf("\nOutput: %s", string(output))
- }
+ // Use backend to speak
+ result, err := s.backend.Speak(args.Text, args.Voice, args.Rate, args.Volume, args.Output)
if err != nil {
result += fmt.Sprintf("\nError: %v", err)
@@ -145,48 +124,34 @@ func (s *Server) handleListVoices(req mcp.CallToolRequest) (mcp.CallToolResult,
return mcp.CallToolResult{}, fmt.Errorf("invalid arguments: %w", err)
}
- if runtime.GOOS != "darwin" {
- return mcp.CallToolResult{}, fmt.Errorf("voice listing is only supported on macOS")
+ // Check if TTS is available on this system
+ if !s.backend.IsAvailable() {
+ return mcp.CallToolResult{}, fmt.Errorf("voice listing is not available on this system (backend: %s)", s.backend.GetName())
}
- cmd := exec.Command("say", "-v", "?")
- output, err := cmd.Output()
-
+ voices, err := s.backend.ListVoices(args.Language)
if err != nil {
return mcp.CallToolResult{}, fmt.Errorf("failed to list voices: %v", err)
}
-
- voices := string(output)
+
var result strings.Builder
+ result.WriteString(fmt.Sprintf("Available voices (%s):\n\n", s.backend.GetName()))
- result.WriteString("Available voices:\n\n")
-
- lines := strings.Split(voices, "\n")
- for _, line := range lines {
- line = strings.TrimSpace(line)
- if line == "" {
- continue
- }
-
- // Filter by language if specified
- if args.Language != "" {
- if !strings.Contains(strings.ToLower(line), args.Language) {
- continue
- }
- }
-
+ for _, voice := range voices {
if args.Detailed {
- result.WriteString(line)
+ result.WriteString(voice.Details)
result.WriteString("\n")
} else {
- // Extract just the voice name (first word)
- parts := strings.Fields(line)
- if len(parts) > 0 {
- result.WriteString("โข ")
- result.WriteString(parts[0])
- result.WriteString("\n")
- }
+ result.WriteString(fmt.Sprintf("โข %s (%s)\n", voice.Name, voice.Language))
+ }
+ }
+
+ if len(voices) == 0 {
+ result.WriteString("No voices found")
+ if args.Language != "" {
+ result.WriteString(fmt.Sprintf(" for language '%s'", args.Language))
}
+ result.WriteString("\n")
}
return mcp.CallToolResult{
@@ -221,67 +186,13 @@ func (s *Server) handleSpeakFile(req mcp.CallToolRequest) (mcp.CallToolResult, e
return mcp.CallToolResult{}, fmt.Errorf("file_path is required")
}
- if runtime.GOOS != "darwin" {
- return mcp.CallToolResult{}, fmt.Errorf("speech synthesis is only supported on macOS")
- }
-
- // Read the file
- content, err := os.ReadFile(args.FilePath)
- if err != nil {
- return mcp.CallToolResult{}, fmt.Errorf("failed to read file: %v", err)
- }
-
- text := string(content)
-
- // Limit lines if specified
- if args.MaxLines != nil && *args.MaxLines > 0 {
- lines := strings.Split(text, "\n")
- if len(lines) > *args.MaxLines {
- lines = lines[:*args.MaxLines]
- text = strings.Join(lines, "\n")
- }
+ // Check if TTS is available on this system
+ if !s.backend.IsAvailable() {
+ return mcp.CallToolResult{}, fmt.Errorf("speech synthesis is not available on this system (backend: %s)", s.backend.GetName())
}
- // Build say command
- cmdArgs := []string{}
-
- if args.Voice != "" {
- cmdArgs = append(cmdArgs, "-v", args.Voice)
- }
-
- if args.Rate != nil {
- if *args.Rate < 80 || *args.Rate > 500 {
- return mcp.CallToolResult{}, fmt.Errorf("rate must be between 80-500 words per minute")
- }
- cmdArgs = append(cmdArgs, "-r", strconv.Itoa(*args.Rate))
- }
-
- if args.Volume != nil {
- if *args.Volume < 0.0 || *args.Volume > 1.0 {
- return mcp.CallToolResult{}, fmt.Errorf("volume must be between 0.0 and 1.0")
- }
- volume := int(*args.Volume * 100)
- cmdArgs = append(cmdArgs, "--volume", strconv.Itoa(volume))
- }
-
- cmdArgs = append(cmdArgs, "-f", args.FilePath)
-
- cmd := exec.Command("say", cmdArgs...)
- output, err := cmd.CombinedOutput()
-
- linesCount := len(strings.Split(text, "\n"))
- wordsCount := len(strings.Fields(text))
-
- result := fmt.Sprintf("Command: say %s\nSpeaking file: %s\nLines: %d, Words: %d",
- strings.Join(cmdArgs, " "), args.FilePath, linesCount, wordsCount)
-
- if args.MaxLines != nil {
- result += fmt.Sprintf(" (limited to %d lines)", *args.MaxLines)
- }
-
- if len(output) > 0 {
- result += fmt.Sprintf("\nOutput: %s", string(output))
- }
+ // Use backend to speak file
+ result, err := s.backend.SpeakFile(args.FilePath, args.Voice, args.Rate, args.Volume, args.MaxLines)
if err != nil {
result += fmt.Sprintf("\nError: %v", err)
@@ -302,18 +213,16 @@ func (s *Server) handleStopSpeech(req mcp.CallToolRequest) (mcp.CallToolResult,
s.mu.RLock()
defer s.mu.RUnlock()
- if runtime.GOOS != "darwin" {
- return mcp.CallToolResult{}, fmt.Errorf("speech control is only supported on macOS")
+ // Check if TTS is available on this system
+ if !s.backend.IsAvailable() {
+ return mcp.CallToolResult{}, fmt.Errorf("speech control is not available on this system (backend: %s)", s.backend.GetName())
}
- // Kill any running say processes
- cmd := exec.Command("pkill", "say")
- err := cmd.Run()
+ // Use backend to stop speech
+ result, err := s.backend.StopSpeech()
- result := "Stopped all speech synthesis"
if err != nil {
- // pkill returns error if no processes found, which is fine
- result += " (no speech processes were running)"
+ result += fmt.Sprintf("\nError: %v", err)
}
return mcp.CallToolResult{
@@ -331,16 +240,48 @@ func (s *Server) handleSpeechSettings(req mcp.CallToolRequest) (mcp.CallToolResu
s.mu.RLock()
defer s.mu.RUnlock()
- if runtime.GOOS != "darwin" {
- return mcp.CallToolResult{}, fmt.Errorf("speech settings are only supported on macOS")
- }
+ backendName := s.backend.GetName()
+ isAvailable := s.backend.IsAvailable()
+
+ var result string
+
+ if !isAvailable {
+ result = fmt.Sprintf(`Speech Synthesis Settings and Usage:
+
+BACKEND: %s (NOT AVAILABLE)
- result := `Speech Synthesis Settings and Usage:
+To enable speech synthesis on this system, please install:
+โข Linux: espeak-ng or espeak
+ - Ubuntu/Debian: sudo apt install espeak-ng
+ - Fedora/RHEL: sudo dnf install espeak-ng
+ - Arch: sudo pacman -S espeak-ng
+โข macOS: Built-in 'say' command (already available)
+
+Once installed, restart the MCP speech server to detect the TTS backend.`, backendName)
+ } else {
+ var outputFormats string
+ var voiceExamples string
+
+ switch runtime.GOOS {
+ case "darwin":
+ outputFormats = "โข Save to file: .aiff, .wav, .m4a"
+ voiceExamples = "โข Popular voices: Alex, Samantha, Victoria, Fred, Fiona"
+ case "linux":
+ outputFormats = "โข Save to file: .wav"
+ voiceExamples = "โข Popular voices: en+f1, en+m1, en+f2, en+m2"
+ default:
+ outputFormats = "โข Output formats depend on system"
+ voiceExamples = "โข Use 'list_voices' tool to see available voices"
+ }
+
+ result = fmt.Sprintf(`Speech Synthesis Settings and Usage:
+
+BACKEND: %s โ
VOICES:
โข Use 'list_voices' tool to see all available voices
-โข Popular voices: Alex, Samantha, Victoria, Fred, Fiona
-โข Specify with: {"voice": "Alex"}
+%s
+โข Specify with: {"voice": "voice_name"}
RATE (Speed):
โข Range: 80-500 words per minute
@@ -353,7 +294,7 @@ VOLUME:
โข Specify with: {"volume": 0.8}
OUTPUT FORMATS:
-โข Save to file: .aiff, .wav, .m4a
+%s
โข Specify with: {"output": "/path/to/file.wav"}
EXAMPLES:
@@ -361,7 +302,7 @@ EXAMPLES:
{"text": "Hello, this is a test"}
2. Custom voice and speed:
- {"text": "Hello world", "voice": "Samantha", "rate": 120}
+ {"text": "Hello world", "voice": "en+f1", "rate": 120}
3. Save to file:
{"text": "Recording test", "output": "~/speech.wav"}
@@ -371,7 +312,8 @@ EXAMPLES:
CONTROLS:
โข Use 'stop_speech' to interrupt any playing speech
-โข Multiple speech commands will queue automatically`
+โข Multiple speech commands will queue automatically`, backendName, voiceExamples, outputFormats)
+ }
return mcp.CallToolResult{
Content: []mcp.Content{
pkg/speech/server_test.go
@@ -18,6 +18,27 @@ func TestNewServer(t *testing.T) {
if server.Server == nil {
t.Fatal("Base server is nil")
}
+
+ if server.backend == nil {
+ t.Fatal("Backend is nil")
+ }
+
+ // Test that backend is appropriate for the OS
+ backendName := server.backend.GetName()
+ switch runtime.GOOS {
+ case "darwin":
+ if !strings.Contains(backendName, "say") {
+ t.Errorf("Expected macOS say backend, got %s", backendName)
+ }
+ case "linux":
+ if !strings.Contains(backendName, "espeak") && !strings.Contains(backendName, "not available") {
+ t.Errorf("Expected Linux espeak backend or unavailable message, got %s", backendName)
+ }
+ default:
+ if !strings.Contains(backendName, "Unsupported") {
+ t.Errorf("Expected unsupported backend for %s, got %s", runtime.GOOS, backendName)
+ }
+ }
}
func TestHandleSayValidation(t *testing.T) {
@@ -85,25 +106,25 @@ func TestHandleSayValidation(t *testing.T) {
args: map[string]interface{}{
"text": "Hello world",
},
- expectError: runtime.GOOS != "darwin", // Should only work on macOS
+ expectError: !server.backend.IsAvailable(), // Should only work if TTS backend available
},
{
name: "valid complex args",
args: map[string]interface{}{
"text": "Hello world",
- "voice": "Samantha",
+ "voice": "en-gb", // Use generic voice name that works on both platforms
"rate": 150,
"volume": 0.8,
},
- expectError: runtime.GOOS != "darwin",
+ expectError: !server.backend.IsAvailable(),
},
{
name: "valid output file",
args: map[string]interface{}{
"text": "test recording",
- "output": "/tmp/test.wav",
+ "output": "/tmp/test.wav", // wav works on both platforms
},
- expectError: runtime.GOOS != "darwin",
+ expectError: !server.backend.IsAvailable(),
},
}
@@ -145,21 +166,21 @@ func TestHandleListVoices(t *testing.T) {
{
name: "basic list",
args: map[string]interface{}{},
- expectError: runtime.GOOS != "darwin",
+ expectError: !server.backend.IsAvailable(),
},
{
name: "with language filter",
args: map[string]interface{}{
"language": "en",
},
- expectError: runtime.GOOS != "darwin",
+ expectError: !server.backend.IsAvailable(),
},
{
name: "detailed mode",
args: map[string]interface{}{
"detailed": true,
},
- expectError: runtime.GOOS != "darwin",
+ expectError: !server.backend.IsAvailable(),
},
}
@@ -259,9 +280,9 @@ func TestHandleStopSpeech(t *testing.T) {
result, err := server.handleStopSpeech(req)
- if runtime.GOOS != "darwin" {
+ if !server.backend.IsAvailable() {
if err == nil {
- t.Errorf("Expected error on non-macOS platform")
+ t.Errorf("Expected error when TTS backend not available")
}
return
}
@@ -295,13 +316,8 @@ func TestHandleSpeechSettings(t *testing.T) {
result, err := server.handleSpeechSettings(req)
- if runtime.GOOS != "darwin" {
- if err == nil {
- t.Errorf("Expected error on non-macOS platform")
- }
- return
- }
-
+ // Speech settings should always work, even if backend is not available
+ // (it will show installation instructions)
if err != nil {
t.Errorf("Unexpected error: %v", err)
}
@@ -314,13 +330,11 @@ func TestHandleSpeechSettings(t *testing.T) {
content := result.Content[0]
if textContent, ok := content.(mcp.TextContent); ok {
settingsText := textContent.Text
+
+ // These sections should always be present
expectedSections := []string{
- "VOICES:",
- "RATE (Speed):",
- "VOLUME:",
- "OUTPUT FORMATS:",
- "EXAMPLES:",
- "CONTROLS:",
+ "BACKEND:",
+ "Speech Synthesis Settings",
}
for _, section := range expectedSections {
@@ -328,6 +342,29 @@ func TestHandleSpeechSettings(t *testing.T) {
t.Errorf("Expected settings to contain section %q", section)
}
}
+
+ // If backend is available, check for detailed sections
+ if server.backend.IsAvailable() {
+ availableSections := []string{
+ "VOICES:",
+ "RATE (Speed):",
+ "VOLUME:",
+ "OUTPUT FORMATS:",
+ "EXAMPLES:",
+ "CONTROLS:",
+ }
+
+ for _, section := range availableSections {
+ if !strings.Contains(settingsText, section) {
+ t.Errorf("Expected settings to contain section %q when backend available", section)
+ }
+ }
+ } else {
+ // If backend not available, should contain installation instructions
+ if !strings.Contains(settingsText, "install") {
+ t.Errorf("Expected installation instructions when backend not available")
+ }
+ }
} else {
t.Errorf("Expected TextContent, got %T", content)
}
@@ -363,42 +400,93 @@ func TestJSONArguments(t *testing.T) {
// This should not panic or return invalid argument errors
_, err = server.handleSay(req)
- // Error is expected on non-macOS, but should not be argument-related
- if err != nil && runtime.GOOS == "darwin" {
- // On macOS, any error should not be about invalid arguments
+ // Error is expected when backend not available, but should not be argument-related
+ if err != nil && server.backend.IsAvailable() {
+ // When backend is available, any error should not be about invalid arguments
if strings.Contains(err.Error(), "invalid arguments") {
t.Errorf("Argument parsing failed: %v", err)
}
}
}
-func TestMacOSOnlyFunctionality(t *testing.T) {
- if runtime.GOOS == "darwin" {
- t.Skip("Skipping non-macOS test on macOS")
- }
-
+func TestCrossPlatformBackendSelection(t *testing.T) {
server := NewServer()
- tools := []string{"say", "list_voices", "speak_file", "stop_speech", "speech_settings"}
+ // Test that the appropriate backend is selected for each platform
+ backendName := server.backend.GetName()
- for _, toolName := range tools {
- t.Run(toolName, func(t *testing.T) {
- req := mcp.CallToolRequest{
- Name: toolName,
- Arguments: map[string]interface{}{
- "text": "test", // Required for say and speak_file
- "file_path": "/tmp/test.txt", // Required for speak_file
- },
- }
-
- _, err := server.handleSay(req)
- if err == nil {
- t.Errorf("Expected macOS-only error for tool %s", toolName)
- }
-
- if !strings.Contains(err.Error(), "macOS") {
- t.Errorf("Expected macOS-specific error message, got: %v", err)
- }
- })
+ switch runtime.GOOS {
+ case "darwin":
+ if !server.backend.IsAvailable() {
+ t.Errorf("macOS backend should be available (say command)")
+ }
+ if !strings.Contains(backendName, "say") {
+ t.Errorf("Expected macOS say backend, got %s", backendName)
+ }
+
+ case "linux":
+ // Backend availability depends on whether espeak-ng/espeak is installed
+ // Test that we get the right backend name regardless
+ if strings.Contains(backendName, "espeak") || strings.Contains(backendName, "not available") {
+ // This is correct
+ } else {
+ t.Errorf("Expected Linux espeak backend or unavailable message, got %s", backendName)
+ }
+
+ default:
+ if server.backend.IsAvailable() {
+ t.Errorf("Unsupported platform should not have available backend")
+ }
+ if !strings.Contains(backendName, "Unsupported") {
+ t.Errorf("Expected unsupported backend message, got %s", backendName)
+ }
+ }
+}
+
+func TestBackendUnavailableBehavior(t *testing.T) {
+ server := NewServer()
+
+ // If backend is not available, all speech tools should return appropriate errors
+ if !server.backend.IsAvailable() {
+ tools := []struct {
+ name string
+ args map[string]interface{}
+ }{
+ {"say", map[string]interface{}{"text": "test"}},
+ {"list_voices", map[string]interface{}{}},
+ {"speak_file", map[string]interface{}{"file_path": "/etc/passwd"}},
+ {"stop_speech", map[string]interface{}{}},
+ }
+
+ for _, tool := range tools {
+ t.Run(tool.name, func(t *testing.T) {
+ req := mcp.CallToolRequest{
+ Name: tool.name,
+ Arguments: tool.args,
+ }
+
+ var err error
+ switch tool.name {
+ case "say":
+ _, err = server.handleSay(req)
+ case "list_voices":
+ _, err = server.handleListVoices(req)
+ case "speak_file":
+ _, err = server.handleSpeakFile(req)
+ case "stop_speech":
+ _, err = server.handleStopSpeech(req)
+ }
+
+ if err == nil {
+ t.Errorf("Expected error when backend not available for tool %s", tool.name)
+ }
+
+ if !strings.Contains(err.Error(), "not available") {
+ t.Errorf("Expected 'not available' error message for tool %s, got: %v", tool.name, err)
+ }
+ })
+ }
+ } else {
+ t.Skip("Backend is available, skipping unavailable test")
}
}
\ No newline at end of file
CLAUDE.md
@@ -61,6 +61,7 @@ Each server is a standalone binary in `/usr/local/bin/`:
8. **mcp-signal** - Signal Desktop database access with encrypted SQLCipher support
9. **mcp-imap** - IMAP email server connectivity for Gmail, Migadu, and other providers
10. **mcp-gitlab** - GitLab issue and project management with intelligent local caching
+11. **mcp-speech** - Cross-platform text-to-speech with macOS `say` and Linux `espeak-ng` support
### Protocol Implementation
- **JSON-RPC 2.0** compliant MCP protocol
@@ -133,6 +134,9 @@ mcp-imap --server imap.gmail.com --username user@gmail.com --password app-passwo
# GitLab server
mcp-gitlab --gitlab-token your_token_here --gitlab-url https://gitlab.com
+
+# Speech server (cross-platform TTS)
+mcp-speech
```
## Enhanced Capabilities
@@ -779,6 +783,90 @@ The GitLab MCP server is now **production-ready** with:
**Cache automatically activated** - Your existing GitLab tools are now faster and work offline!
+## ๐ Speech MCP Server - Cross-Platform TTS Support (Session: 2025-07-08)
+
+**FINAL STATUS: 100% COMPLETE** - Speech MCP server successfully updated for cross-platform support.
+
+### **โ
Complete Cross-Platform Implementation**
+
+**Updated Architecture:**
+- โ
**TTSBackend Interface** - Abstract interface for different TTS systems
+- โ
**MacOSBackend** - Uses built-in `say` command (unchanged functionality)
+- โ
**LinuxBackend** - Uses `espeak-ng` (preferred) or `espeak` (fallback)
+- โ
**UnsupportedBackend** - Graceful handling for other operating systems
+- โ
**Automatic Detection** - Server selects appropriate backend based on OS
+
+**All 5 Tools Now Cross-Platform:**
+- โ
`say` - Text-to-speech with voice, rate, volume, and file output options
+- โ
`list_voices` - Platform-specific voice listing (macOS/Linux)
+- โ
`speak_file` - Read and speak file contents with line limiting
+- โ
`stop_speech` - Stop playing speech (platform-specific process killing)
+- โ
`speech_settings` - Show platform info, installation instructions, and usage help
+
+### **๐ฏ Platform Support Matrix**
+
+**macOS Support (Existing):**
+- โ
**Backend**: Built-in `say` command
+- โ
**Installation**: No setup required (already available)
+- โ
**Output Formats**: .aiff, .wav, .m4a
+- โ
**Voice Examples**: Alex, Samantha, Victoria, Fred, Fiona, Moira
+
+**Linux Support (New):**
+- โ
**Backend**: espeak-ng (preferred) or espeak (fallback)
+- โ
**Installation**: `sudo apt install espeak-ng` (Ubuntu/Debian), `sudo dnf install espeak-ng` (Fedora/RHEL)
+- โ
**Output Formats**: .wav only
+- โ
**Voice Examples**: en-gb, en-us, en-gb-scotland, various languages
+
+**Other Platforms:**
+- โ
**Backend**: UnsupportedBackend with helpful error messages
+- โ
**Behavior**: Shows installation guidance and platform support info
+
+### **๐ Updated Documentation**
+
+**Help Text Enhanced:**
+- โ
Cross-platform usage examples in `cmd/speech/main.go`
+- โ
Platform-specific installation instructions
+- โ
Backend detection and availability information
+- โ
Voice examples for both macOS and Linux
+
+**Tests Updated:**
+- โ
Backend abstraction tests for all platforms
+- โ
Cross-platform availability detection
+- โ
Graceful error handling when TTS not available
+- โ
Platform-specific backend selection verification
+
+### **๐ Ready for Production Use**
+
+The Speech MCP server is now **truly cross-platform** with:
+- **Complete Functionality**: Works on macOS and Linux with native TTS
+- **Graceful Degradation**: Helpful messages on unsupported platforms
+- **Consistent API**: Same tool interface across all platforms
+- **Installation Guide**: Clear setup instructions in help text
+- **Backend Detection**: Automatic selection of best available TTS system
+
+**Linux Usage (New):**
+```bash
+# Install TTS engine (Ubuntu/Debian)
+sudo apt install espeak-ng
+
+# Run speech server
+mcp-speech
+
+# Test with Claude Code integration
+{"name": "say", "arguments": {"text": "Hello from Linux!", "voice": "en-gb"}}
+```
+
+**macOS Usage (Unchanged):**
+```bash
+# No installation needed
+mcp-speech
+
+# Test with existing voices
+{"name": "say", "arguments": {"text": "Hello from macOS!", "voice": "Samantha"}}
+```
+
+The speech server transformation from macOS-only to cross-platform is now complete!
+
## ๐ Future Enhancement Ideas
This section tracks potential improvements and new features for the MCP server ecosystem.