macOS · Cursor AI · Voice-first

pseudo-jarvis

Real-time voice dictation that types directly into Cursor Agent — the first step toward a personal AI you can simply talk to.

The goal is an Iron Man–style JARVIS: speak naturally, and your AI assistant understands intent, confirms when needed, and acts. Today this project wires voice → text → Cursor; tomorrow it layers agents on top of that stream.
macOS only · tkinter GUI + .app · Cursor Agent · Linux / Windows: planned

Not the same as Cursor's microphone

Cursor has a built-in voice button in Agent chat (hold Ctrl+M / Cmd+M, or click the mic). It records a clip, runs batch speech-to-text, and drops the transcript into the input field when you confirm. pseudo-jarvis is a different pipeline — built for hands-free, continuous dictation into Cursor with voice commands and an agent rule that confirms intent before acting.

Cursor mic (built-in)

  • Push-to-talk or hold shortcut — one recording, then transcribe
  • Transcript appears only inside Cursor's chat input
  • You still click Send (or type) to submit the message
  • No custom voice commands or pause punctuation
  • Uses Cursor's own STT stack (quality varies by release / window)

pseudo-jarvis (this project)

  • Continuous dictation — text types at the cursor as you speak
  • Simulates keystrokes (needs macOS Accessibility)
  • Say send after a pause → Enter + @voice-input-confirmation.mds
  • freeze / resume and pause > 2 s → . + Shift+Enter
  • Agent rule restates your intent and waits for "Confirm." before implementing

| | Cursor microphone | pseudo-jarvis |
|---|---|---|
| When to use | Quick one-shot prompts; you review text before sending | Long dictation sessions; JARVIS-style talk → confirm → act |
| Recording model | Record clip → batch STT → paste into input | Always listening; chunks transcribed in a priority queue |
| Submit message | Manual Send button or Enter | Voice command send (Enter + rule mention) |
| Agent safety | Whatever you typed is sent as-is | @voice-input-confirmation.mds restates intent; waits for Confirm. |
| Setup | Microphone permission for Cursor only | Separate app; Microphone + Accessibility; ADD each project once |

Use Cursor's mic when you want a short prompt inside chat with no extra tooling. Use pseudo-jarvis when you want to dictate continuously, break sentences on long pauses, submit with your voice, and have the agent confirm before it edits code or runs commands.

Architecture & tech stack

A macOS tkinter app captures speech, transcribes via Google Speech Recognition, and simulates keystrokes into Cursor Agent. Run from source (python gui_app.py) or install pseudo-jarvis.app built with PyInstaller.

App shell (active)

  • gui_app.py — tkinter GUI
  • app/session.py — session runner
  • build_app.sh → .app bundle

Voice engine (active)

  • VoiceToText in app/voice_to_text.py
  • Priority queue for send / resume
  • Stop button → stop() (no stray typing)
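
One way a priority queue could let send / resume jump ahead of ordinary dictation is sketched below. This is a minimal illustration, not the actual VoiceToText code: the priority constants and the helpers `enqueue` and `drain_in_order` are hypothetical names, and the sketch assumes each item's priority is already known when it is queued.

```python
import itertools
import queue

# Hypothetical priority levels: lower numbers are served first.
PRIORITY_COMMAND = 0   # e.g. "send" or "resume"
PRIORITY_SPEECH = 1    # ordinary dictation

_counter = itertools.count()  # tie-breaker keeps FIFO order within a priority


def enqueue(q: queue.PriorityQueue, priority: int, payload: str) -> None:
    """Queue a payload; the counter means payloads are never compared."""
    q.put((priority, next(_counter), payload))


def drain_in_order(q: queue.PriorityQueue) -> list[str]:
    """Pop everything currently queued, highest priority first."""
    out = []
    while True:
        try:
            _, _, payload = q.get_nowait()
        except queue.Empty:
            return out
        out.append(payload)


work = queue.PriorityQueue()
enqueue(work, PRIORITY_SPEECH, "hello world")
enqueue(work, PRIORITY_SPEECH, "second phrase")
enqueue(work, PRIORITY_COMMAND, "send")
ordered = drain_in_order(work)  # the command comes out before both phrases
```

The sequence counter matters: without it, two items with equal priority would fall back to comparing payloads, which breaks FIFO order and fails outright for payloads that do not support comparison.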

Audio & speech (active)

  • PortAudio (Homebrew)
  • PyAudio + SpeechRecognition
  • Google Speech API (online)

Cursor integration (active)

  • pyautogui + pynput (click, paste, type)
  • Paste @voice-input-confirmation.mds
  • Agent rule confirms intent

Project registry (active)

  • app/project_registry.py
  • ADD copies voice rule into projects
  • setup-variables/subscribed-projects.txt

AI agents (planned)

Task runners, automations, and multi-step reasoning built on the voice transcript pipeline.

High-level data flow

flowchart LR
  subgraph Input
    MIC[Microphone]
    BTN[GUI Start / Stop]
  end
  subgraph pseudo_jarvis
    GUI[gui_app.py]
    SESS[app/session.py]
    VTT[VoiceToText]
    QUEUE[Priority queue]
    SR[Google Speech API]
  end
  subgraph Output
    CUR[Cursor Agent typing box]
  end
  MIC --> VTT
  BTN --> GUI
  GUI --> SESS --> VTT
  VTT --> QUEUE --> SR
  SR --> VTT
  VTT -->|pyautogui / paste| CUR
  BTN -->|Stop| VTT
        

Component map

flowchart TB
  subgraph Startup
    A[Mic dropdown in GUI] --> B[Start button]
    B --> C[wait for click in Cursor]
    C --> D[type rule mention]
    D --> E[listen_and_transcribe]
  end
  subgraph Dictation
    E --> F[_on_audio callback]
    F --> G[_recognition_worker]
    G --> H{command?}
    H -->|speech| I[_type_recognized_text]
    H -->|send| J[Enter + rule mention]
    H -->|freeze| K[halt typing]
    H -->|resume| L[click + rule mention]
  end
  I --> CUR[(Cursor focus)]
  J --> CUR
  L --> CUR
        

App UI (gui_app.py)

The pseudo-jarvis window is a single tkinter app. Everything below runs on macOS only; grant the Microphone and Accessibility permissions to pseudo-jarvis (or to Terminal when running from source).

  • Subscribed projects: Collapsible list of project roots from setup-variables/subscribed-projects.txt. Expand to see which Cursor repos already have the voice rule installed.
  • Add project to pseudo-jarvis: Project root text field · Browse… folder picker · ADD button. Copies voice-input-confirmation.mds into .cursor/rules/, registers the path, and adds the rule file to that project's .gitignore.
  • Voice session · Cursor Agent: Microphone dropdown (from list_input_devices) · Start begins a session on a background thread · Stop calls VoiceToText.stop() and re-enables the mic picker.
  • Session log: Scrollable pane. All session print() output is redirected here via QueueLogWriter: calibration, listening hints, freeze/resume messages, and errors.
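
Redirecting print() into a GUI pane is commonly done with a small file-like object that pushes text onto a thread-safe queue. The sketch below shows that pattern; the real QueueLogWriter is not reproduced here, so the class body is an assumed shape rather than the project's code.

```python
import contextlib
import queue


class QueueLogWriter:
    """File-like object that forwards written text to a thread-safe queue."""

    def __init__(self, log_queue: queue.Queue) -> None:
        self._queue = log_queue

    def write(self, text: str) -> int:
        if text:
            self._queue.put(text)
        return len(text)

    def flush(self) -> None:
        # Nothing is buffered, so there is nothing to flush.
        pass


log_queue: queue.Queue = queue.Queue()
with contextlib.redirect_stdout(QueueLogWriter(log_queue)):
    print("Calibrating microphone...")

# A GUI thread would poll the queue (for example with tkinter's after())
# and append each piece to the scrollable pane; here we just collect it.
captured = []
while not log_queue.empty():
    captured.append(log_queue.get_nowait())
log_text = "".join(captured)
```

Going through a queue keeps worker threads from touching tkinter widgets directly; only the GUI thread reads the queue and updates the pane.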

First launch

A dialog explains that you must ADD each Cursor project root before dictating into it.

Start

Disables Start and mic picker, enables Stop, spawns run_session(device_index) on a worker thread, and binds the live VoiceToText instance for Stop.

Stop

Sets the stop flag, drains the recognition queue, and joins worker threads so no keystrokes fire after the session ends.
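
The three steps (set the flag, drain the queue, join the worker) can be sketched with threading.Event. `StoppableWorker` is a hypothetical stand-in for the recognition worker inside VoiceToText, and appending to a list stands in for simulated keystrokes.

```python
import queue
import threading


class StoppableWorker:
    """Hypothetical stand-in for the recognition worker inside VoiceToText."""

    def __init__(self) -> None:
        self._stop = threading.Event()
        self._queue: queue.Queue = queue.Queue()
        self.typed: list[str] = []
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, text: str) -> None:
        self._queue.put(text)

    def _run(self) -> None:
        while not self._stop.is_set():
            try:
                text = self._queue.get(timeout=0.05)
            except queue.Empty:
                continue
            if self._stop.is_set():
                break  # stop arrived while waiting: drop the item
            self.typed.append(text)  # real code would simulate keystrokes

    def stop(self) -> None:
        self._stop.set()                 # 1. set the stop flag
        while True:                      # 2. drain pending work
            try:
                self._queue.get_nowait()
            except queue.Empty:
                break
        self._thread.join(timeout=1.0)   # 3. wait for the worker to exit


worker = StoppableWorker()
worker.stop()
worker.submit("late phrase")  # arrives after stop, so it is never typed
```

Polling the queue with a short timeout lets the worker notice the stop flag quickly even when no audio is arriving.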

Distribution

./build_app.sh produces dist/pseudo-jarvis.app. The bundled app stores config under ~/Library/Application Support/pseudo-jarvis/ when not run from the repo.
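
PyInstaller sets `sys.frozen` on bundled executables, which is a common way to choose between the repo layout and a per-user config folder. The sketch below assumes that approach; the helper name `config_dir` and the choice of `setup-variables/` as the in-repo location are illustrative guesses, not confirmed details of the project.

```python
import sys
from pathlib import Path
from typing import Optional

APP_NAME = "pseudo-jarvis"


def config_dir(repo_root: Path, frozen: Optional[bool] = None) -> Path:
    """Return the folder holding config files (hypothetical helper).

    When `frozen` is None, detect a PyInstaller bundle via sys.frozen.
    """
    if frozen is None:
        frozen = bool(getattr(sys, "frozen", False))
    if frozen:
        # Bundled .app: use the per-user Application Support folder.
        return Path.home() / "Library" / "Application Support" / APP_NAME
    # Source checkout: assumed to keep config next to the code.
    return repo_root / "setup-variables"


source_dir = config_dir(Path("/repo"), frozen=False)
bundle_dir = config_dir(Path("/repo"), frozen=True)
```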

GUI → session → Cursor

sequenceDiagram
  actor User
  participant GUI as gui_app.py
  participant Sess as app/session.py
  participant VTT as VoiceToText
  participant Cursor as Cursor Agent
  User->>GUI: choose mic, click Start
  GUI->>Sess: run_session on background thread
  Sess->>User: click typing box
  User->>Cursor: click
  Sess->>Cursor: paste @voice-input-confirmation.mds
  Sess->>VTT: listen_and_transcribe
  GUI->>GUI: session log shows hints
  User->>Cursor: speak
  VTT->>Cursor: type at cursor
  User->>GUI: click Stop
  GUI->>VTT: stop
        

ADD project flow

flowchart TD
  A[Browse or paste project root] --> B[Click ADD]
  B --> C[Copy voice-input-confirmation.mds]
  C --> D[Append path to subscribed-projects.txt]
  D --> E[Add rule path to project .gitignore]
  E --> F[Show in Subscribed projects list]
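
The ADD steps in the flowchart could look roughly like the sketch below. It is not the code in app/project_registry.py: the function name `add_project`, its parameters, and the duplicate checks are assumptions made for illustration.

```python
import shutil
import tempfile
from pathlib import Path

RULE_NAME = "voice-input-confirmation.mds"


def add_project(project_root: Path, rule_source: Path, registry: Path) -> None:
    """Copy the rule, register the project, and gitignore the rule file."""
    rules_dir = project_root / ".cursor" / "rules"
    rules_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(rule_source, rules_dir / RULE_NAME)

    # Register the project root once.
    registry.parent.mkdir(parents=True, exist_ok=True)
    known = registry.read_text().splitlines() if registry.exists() else []
    if str(project_root) not in known:
        with registry.open("a") as fh:
            fh.write(f"{project_root}\n")

    # Keep the copied rule out of the project's version control.
    gitignore = project_root / ".gitignore"
    entry = f".cursor/rules/{RULE_NAME}"
    existing = gitignore.read_text() if gitignore.exists() else ""
    if entry not in existing.splitlines():
        prefix = "" if existing == "" or existing.endswith("\n") else "\n"
        with gitignore.open("a") as fh:
            fh.write(f"{prefix}{entry}\n")


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    rule = base / RULE_NAME
    rule.write_text("rule body\n")
    project = base / "my-project"
    project.mkdir()
    registry_file = base / "setup-variables" / "subscribed-projects.txt"
    add_project(project, rule, registry_file)
    add_project(project, rule, registry_file)  # second ADD changes nothing
    registered = registry_file.read_text().splitlines()
    ignored = (project / ".gitignore").read_text().splitlines()
    rule_copied = (project / ".cursor" / "rules" / RULE_NAME).exists()
```

Making the operation idempotent means clicking ADD twice on the same project leaves one registry line and one .gitignore entry.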
        

Command & feature workflows

Voice commands are matched as whole phrases after a pause. Each flow is shown below. Voice signal reliability is still being tuned.
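
Whole-phrase matching can be sketched as follows, assuming each transcript handed to the matcher is one chunk delimited by pauses. The helper `match_command` and its normalization rules (case folding, trimming punctuation) are hypothetical.

```python
import string
from typing import Optional

COMMANDS = {"send", "freeze", "resume"}


def match_command(transcript: str) -> Optional[str]:
    """Return the command name if the whole phrase is a command, else None."""
    normalized = transcript.strip().lower().strip(string.punctuation + " ")
    return normalized if normalized in COMMANDS else None


as_command = match_command("Send")             # whole phrase is a command
as_speech = match_command("send the email")    # command word inside a sentence
with_period = match_command(" freeze. ")       # punctuation is trimmed
```

Matching the whole phrase rather than searching for a keyword is what lets you dictate a sentence containing "send" without submitting the message.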

Session startup

sequenceDiagram
  actor User
  participant GUI as pseudo-jarvis GUI
  participant App as VoiceToText
  participant Cursor as Cursor Agent
  User->>GUI: open app, pick mic, Start
  GUI->>User: click typing box
  User->>Cursor: click
  App->>Cursor: paste @voice-input-confirmation.mds
  App->>App: start listening
  User->>Cursor: speak — text appears at cursor
          

Continuous dictation

flowchart LR
  A[Speech chunk] --> B[Google transcribe]
  B --> C{voice command?}
  C -->|no| D[Type at cursor]
  C -->|yes| E[Run command handler]
  D --> F[Short pause joins with space]
  F --> G[Keep listening]
          

Pause > 2 seconds

flowchart TD
  A[Silence exceeds 2 s] --> B[Type period and space]
  B --> C[Shift+Enter new line in Cursor]
  C --> D[Next phrase types without leading space]
          
| Trigger | Action | Typed |
|---|---|---|
| Pause > 2 s (once per pause) | Sentence break | . + Shift+Enter |
| Pause ≤ 2 s | Same thought | space + next words |
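
The pause rule can be modelled as a small state holder that emits the sentence break at most once per pause. `PauseJoiner` and the `NEWLINE` token are hypothetical; the sketch types a bare period before the new line, so adjust `SENTENCE_END` if a trailing space is wanted.

```python
PAUSE_THRESHOLD_S = 2.0
SENTENCE_END = "."
NEWLINE = "<shift+enter>"  # placeholder token for the Shift+Enter keystroke


class PauseJoiner:
    """Decides what to type between phrases based on silence length."""

    def __init__(self) -> None:
        self.out: list[str] = []
        self._started = False   # nothing typed yet
        self._broke = False     # sentence break already typed for this pause

    def on_silence(self, seconds: float) -> None:
        """Called as silence accumulates; breaks the sentence once per pause."""
        if self._started and not self._broke and seconds > PAUSE_THRESHOLD_S:
            self.out += [SENTENCE_END, NEWLINE]
            self._broke = True

    def on_phrase(self, text: str) -> None:
        """Called with each recognized phrase."""
        if self._started and not self._broke:
            self.out.append(" ")  # short pause: join with a single space
        self.out.append(text)     # after a break: no leading space
        self._started = True
        self._broke = False


joiner = PauseJoiner()
joiner.on_phrase("hello")
joiner.on_silence(0.5)
joiner.on_phrase("world")
joiner.on_silence(2.5)
joiner.on_silence(3.5)   # same pause, so no second period
joiner.on_phrase("next")
```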

send — submit message

sequenceDiagram
  actor User
  participant App as VoiceToText
  participant Cursor as Cursor Agent
  User->>App: say "send" after pause
  App->>Cursor: Enter
  Note over App: wait 0.5 s
  App->>Cursor: paste @voice-input-confirmation.mds
  Note over Cursor: Agent asks for confirm.
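
The send sequence is short enough to sketch with injected callables, which keeps it testable without touching the keyboard. The function `send_message` and its parameters are hypothetical; in a real session `press_enter` might wrap something like `pyautogui.press("enter")`.

```python
import time
from typing import Callable

RULE_MENTION = "@voice-input-confirmation.mds"


def send_message(
    press_enter: Callable[[], None],
    paste: Callable[[str], None],
    delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Submit the current message, then queue up the rule mention."""
    press_enter()          # submit what was dictated
    sleep(delay_s)         # give the chat input time to clear
    paste(RULE_MENTION)    # start the next message with the rule mention


# Dry run with recording stubs instead of real keystrokes.
events: list[str] = []
send_message(
    press_enter=lambda: events.append("enter"),
    paste=lambda text: events.append(f"paste:{text}"),
    sleep=lambda s: events.append(f"sleep:{s}"),
)
```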
          

freeze — pause dictation

flowchart TD
  A[Say freeze after pause] --> B[Dictation halted]
  B --> C[Mic still listens]
  C --> D[Only resume accepted]
  B --> E[No text typed at cursor]
          

resume — continue after freeze

sequenceDiagram
  actor User
  participant App as VoiceToText
  participant Cursor as Cursor Agent
  User->>App: say "resume" after pause
  App->>User: click typing box
  User->>Cursor: click
  App->>Cursor: paste @voice-input-confirmation.mds
  App->>App: enable dictation
  User->>Cursor: speak again
          

Stop session — GUI Stop button

flowchart TD
  A[Click Stop in pseudo-jarvis] --> B[VoiceToText.stop sets flag]
  B --> C[Mic listener stops]
  C --> D[Queue drained]
  D --> E[No further typing or paste]
  E --> F[Session log: stopped message]
  F --> G[Start re-enabled, mic picker unlocked]
          
| Command / control | When | Result |
|---|---|---|
| send | After pause | Enter → wait 0.5 s → rule mention → agent confirmation |
| freeze | After pause | Stop typing; mic listens for resume |
| resume | After freeze | Click → rule mention → dictate again |
| Stop button | Any time during session | End session; zero keystrokes after stop |
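
The freeze / resume behaviour amounts to a two-state machine, sketched below. `DictationState` is a hypothetical name, and treating "resume" as ordinary text when dictation is not frozen is an assumption.

```python
class DictationState:
    """Tracks whether recognized speech should be typed."""

    def __init__(self) -> None:
        self.frozen = False

    def handle(self, phrase: str) -> str:
        """Return the action for one whole-phrase transcript."""
        word = phrase.strip().lower()
        if self.frozen:
            if word == "resume":
                self.frozen = False
                return "resume"
            return "ignore"   # only resume is accepted while frozen
        if word == "freeze":
            self.frozen = True
            return "freeze"
        if word == "send":
            return "send"
        return "type"         # ordinary dictation


state = DictationState()
phrases = ["hello", "freeze", "send", "some words", "resume", "send"]
actions = [state.handle(p) for p in phrases]
```

While frozen, even "send" is ignored, so nothing reaches Cursor until you say "resume".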