● macOS · Cursor AI · Voice-first

pseudo-jarvis

Real-time voice dictation that types directly into Cursor Agent — the first step toward a personal AI you can simply talk to.

The goal is an Iron Man–style JARVIS: speak naturally, and your AI assistant understands intent, confirms when needed, and acts. Today this project wires voice → text → Cursor; tomorrow it layers agents on top of that stream.


macOS only · tkinter GUI + .app · Cursor Agent · Linux / Windows (planned)

Work in progress. Detection of the voice commands (`send`, `freeze`, `resume`) is still being tuned: capture timing, latency after you stop speaking, and background-noise handling may vary between sessions.

02 — Why pseudo-jarvis?

Not the same as Cursor's microphone

Cursor has a built-in voice button in Agent chat (hold Ctrl+M / Cmd+M, or click the mic). It records a clip, runs batch speech-to-text, and drops the transcript into the input field when you confirm. pseudo-jarvis is a different pipeline — built for hands-free, continuous dictation into Cursor with voice commands and an agent rule that confirms intent before acting.

Cursor mic (built-in)

Push-to-talk or hold shortcut — one recording, then transcribe
Transcript appears only inside Cursor's chat input
You still click Send (or type) to submit the message
No custom voice commands or pause punctuation
Uses Cursor's own STT stack (quality varies by release / window)

pseudo-jarvis (this project)

Continuous dictation — text types at the cursor as you speak
Simulates keystrokes (needs macOS Accessibility)
Say send after a pause → Enter + @voice-input-confirmation.mds
Say freeze / resume to halt and continue dictation
Pause > 2 s → . + Shift+Enter (sentence break)
Agent rule asks you to reply Confirm. before it implements anything

	Cursor microphone	pseudo-jarvis
When to use	Quick one-shot prompts; you review text before sending	Long dictation sessions; JARVIS-style talk → confirm → act
Recording model	Record clip → batch STT → paste into input	Always listening; chunks transcribed in a priority queue
Submit message	Manual Send button or Enter	Voice command `send` (Enter + rule mention)
Agent safety	Whatever you typed is sent as-is	`@voice-input-confirmation.mds` restates intent; waits for `Confirm.`
Setup	Microphone permission for Cursor only	Separate app; Microphone + Accessibility; ADD each project once

Use Cursor's mic when you want a short prompt inside chat with no extra tooling. Use pseudo-jarvis when you want to dictate continuously, break sentences on long pauses, submit with your voice, and have the agent confirm before it edits code or runs commands.

03 — System design

Architecture & tech stack

A macOS tkinter app captures speech, transcribes via Google Speech Recognition, and simulates keystrokes into Cursor Agent. Run from source (python gui_app.py) or install pseudo-jarvis.app built with PyInstaller.

App shell (active)

gui_app.py — tkinter GUI
app/session.py — session runner
build_app.sh → .app bundle

Voice engine (active)

VoiceToText in app/voice_to_text.py
Priority queue for send / resume
Stop button → stop() (no stray typing)
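The Voice engine card mentions a priority queue so that `send` and `resume` are not stuck behind ordinary dictation. Exactly what gets the higher priority inside app/voice_to_text.py is not spelled out here, so the sketch below simply assumes each job is tagged with a priority number before it is queued. `RecognitionQueue`, `COMMAND_PRIORITY`, and `SPEECH_PRIORITY` are hypothetical names, not the project's code.

```python
import itertools
import queue
from dataclasses import dataclass, field
from typing import Any, Optional

# Hypothetical priority levels: a lower number is served first.
COMMAND_PRIORITY = 0   # e.g. work that must handle "send" / "resume" promptly
SPEECH_PRIORITY = 1    # ordinary dictation chunks


@dataclass(order=True)
class Job:
    priority: int
    seq: int                            # insertion counter: FIFO within one priority
    payload: Any = field(compare=False)  # never compared, so any object works


class RecognitionQueue:
    """PriorityQueue wrapper: urgent jobs jump ahead, equal priorities stay FIFO."""

    def __init__(self) -> None:
        self._q: "queue.PriorityQueue[Job]" = queue.PriorityQueue()
        self._counter = itertools.count()

    def put(self, payload: Any, priority: int = SPEECH_PRIORITY) -> None:
        self._q.put(Job(priority, next(self._counter), payload))

    def get(self, timeout: Optional[float] = None) -> Any:
        return self._q.get(timeout=timeout).payload

    def empty(self) -> bool:
        return self._q.empty()


q = RecognitionQueue()
q.put("chunk-1")
q.put("chunk-2")
q.put("resume", priority=COMMAND_PRIORITY)
print([q.get() for _ in range(3)])  # ['resume', 'chunk-1', 'chunk-2']
```

The sequence counter matters: without it, two jobs with the same priority would fall back to comparing payloads, which is both unordered and may raise for audio objects.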

Audio & speech (active)

PortAudio (Homebrew)
PyAudio + SpeechRecognition
Google Speech API (online)

Cursor integration (active)

pyautogui + pynput (click, paste, type)
Paste @voice-input-confirmation.mds
Agent rule confirms intent

Project registry (active)

app/project_registry.py
ADD copies voice rule into projects
setup-variables/subscribed-projects.txt

AI agents (planned)

Task runners, automations, and multi-step reasoning built on the voice transcript pipeline.

High-level data flow

flowchart LR
  subgraph Input
    MIC[Microphone]
    BTN[GUI Start / Stop]
  end
  subgraph pseudo_jarvis
    GUI[gui_app.py]
    SESS[app/session.py]
    VTT[VoiceToText]
    QUEUE[Priority queue]
    SR[Google Speech API]
  end
  subgraph Output
    CUR[Cursor Agent typing box]
  end
  MIC --> VTT
  BTN --> GUI
  GUI --> SESS --> VTT
  VTT --> QUEUE --> SR
  SR --> VTT
  VTT -->|pyautogui / paste| CUR
  BTN -->|Stop| VTT

Component map

flowchart TB
  subgraph Startup
    A[Mic dropdown in GUI] --> B[Start button]
    B --> C[wait for click in Cursor]
    C --> D[type rule mention]
    D --> E[listen_and_transcribe]
  end
  subgraph Dictation
    E --> F[_on_audio callback]
    F --> G[_recognition_worker]
    G --> H{command?}
    H -->|speech| I[_type_recognized_text]
    H -->|send| J[Enter + rule mention]
    H -->|freeze| K[halt typing]
    H -->|resume| L[click + rule mention]
  end
  I --> CUR[(Cursor focus)]
  J --> CUR
  L --> CUR
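The Dictation branch of the component map behaves like a small state machine: each transcript is either typed, or handled as `send`, `freeze`, or `resume`, and while frozen only `resume` is honored. A minimal sketch of that dispatch, assuming transcripts arrive as plain strings and representing keystrokes as action tuples instead of real pyautogui calls. `Dispatcher` and the action names are hypothetical stand-ins for the logic around `_recognition_worker` and `_type_recognized_text`.

```python
from typing import List, Tuple

RULE_MENTION = "@voice-input-confirmation.mds"
Action = Tuple[str, object]  # e.g. ("type", "hello"), ("press", "enter"), ("sleep", 0.5)


class Dispatcher:
    """Hypothetical stand-in for the command branch of the recognition worker."""

    def __init__(self) -> None:
        self.frozen = False

    def handle(self, transcript: str) -> List[Action]:
        phrase = transcript.strip().lower()
        if self.frozen:
            # Frozen: the mic still listens, but only "resume" is accepted.
            if phrase == "resume":
                self.frozen = False
                return [("wait_for_click", None), ("paste", RULE_MENTION)]
            return []
        if phrase == "send":
            return [("press", "enter"), ("sleep", 0.5), ("paste", RULE_MENTION)]
        if phrase == "freeze":
            self.frozen = True
            return []
        if phrase == "resume":
            return []  # assumption: "resume" outside a freeze is ignored
        return [("type", transcript.strip())]


d = Dispatcher()
for heard in ["refactor the parser", "freeze", "ignored while frozen", "resume", "send"]:
    print(heard, "->", d.handle(heard))
```

Keeping the dispatch free of side effects makes the freeze/resume rules easy to test; a separate layer would turn the returned actions into real clicks and keystrokes.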

04 — Application window

App UI (`gui_app.py`)

The pseudo-jarvis window is a single tkinter app. Everything below runs on macOS only; grant the Microphone and Accessibility permissions to pseudo-jarvis (or to Terminal when running from source).

Subscribed projects: a collapsible list of project roots from setup-variables/subscribed-projects.txt. Expand it to see which Cursor repos already have the voice rule installed.

Add project to pseudo-jarvis: Project root text field · Browse… folder picker · ADD button. ADD copies voice-input-confirmation.mds into .cursor/rules/, registers the path, and adds the rule file to that project's .gitignore.

Voice session · Cursor Agent: Microphone dropdown (populated from list_input_devices) · Start begins a session on a background thread · Stop calls VoiceToText.stop() and re-enables the mic picker.

Session log: a scrollable pane. All session print() output is redirected here via QueueLogWriter, including calibration, listening hints, freeze/resume messages, and errors.
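The Session log works by redirecting `print()` from the session thread into a queue that the GUI drains; tkinter is generally not safe to call from other threads, so a queue is the usual hand-off. A minimal sketch of such a writer, assuming a shared `queue.Queue`. This `QueueLogWriter` and the `drain` helper are illustrations, not the project's actual class.

```python
import contextlib
import io
import queue


class QueueLogWriter(io.TextIOBase):
    """File-like object: every write() lands in a queue the GUI can poll."""

    def __init__(self, log_queue: "queue.Queue[str]") -> None:
        super().__init__()
        self._queue = log_queue

    def writable(self) -> bool:
        return True

    def write(self, text: str) -> int:
        if text:
            self._queue.put(text)
        return len(text)


def drain(log_queue: "queue.Queue[str]") -> str:
    """Collect all pending text without blocking (what a GUI poll would do)."""
    chunks = []
    while True:
        try:
            chunks.append(log_queue.get_nowait())
        except queue.Empty:
            return "".join(chunks)


log_q: "queue.Queue[str]" = queue.Queue()
with contextlib.redirect_stdout(QueueLogWriter(log_q)):
    print("Calibrating microphone...")
print(repr(drain(log_q)))  # 'Calibrating microphone...\n'
```

In a tkinter window, `drain()` would typically run from a `root.after(...)` callback that inserts the returned text into the scrollable pane and reschedules itself.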

First launch

A dialog explains that you must ADD each Cursor project root before dictating into it.

Start

Disables Start and the mic picker, enables Stop, spawns run_session(device_index) on a worker thread, and keeps a reference to the live VoiceToText instance so Stop can reach it.

Stop

Sets the stop flag, drains the recognition queue, and joins worker threads so no keystrokes fire after the session ends.
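Start and Stop together follow a standard thread-shutdown pattern: a shared stop flag, a queue that is emptied so no pending transcript gets typed, and a join so the worker has exited before the GUI re-enables its controls. A minimal sketch using `threading.Event`, assuming a single worker thread. `Session` and the `typed` list (standing in for real keystrokes) are hypothetical.

```python
import queue
import threading
from typing import List, Optional


class Session:
    def __init__(self) -> None:
        self.stop_event = threading.Event()
        self.jobs: "queue.Queue[str]" = queue.Queue()
        self.typed: List[str] = []  # stands in for real keystrokes
        self._worker: Optional[threading.Thread] = None

    def start(self) -> None:
        # Start: run the recognition loop on a background thread.
        self._worker = threading.Thread(target=self._run_worker, daemon=True)
        self._worker.start()

    def _run_worker(self) -> None:
        while not self.stop_event.is_set():
            try:
                text = self.jobs.get(timeout=0.1)  # wake up regularly to check the flag
            except queue.Empty:
                continue
            # Re-check right before "typing" so a job dequeued just before Stop is dropped.
            if self.stop_event.is_set():
                break
            self.typed.append(text)

    def stop(self) -> None:
        self.stop_event.set()           # 1. set the stop flag
        while True:                     # 2. drain pending work
            try:
                self.jobs.get_nowait()
            except queue.Empty:
                break
        if self._worker is not None:    # 3. wait for the worker to exit
            self._worker.join(timeout=2.0)
```

The short `get` timeout is the design choice that lets the worker notice the flag within about 0.1 s even when nobody is speaking.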

Distribution

./build_app.sh produces dist/pseudo-jarvis.app. The bundled app stores config under ~/Library/Application Support/pseudo-jarvis/ when not run from the repo.
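Choosing the config location at runtime needs a way to tell a bundled app from a source checkout. PyInstaller sets a `sys.frozen` attribute inside bundles, and the sketch below assumes the project uses that check; the `config_dir` helper and its parameters are hypothetical.

```python
import sys
from pathlib import Path
from typing import Optional

APP_NAME = "pseudo-jarvis"


def config_dir(repo_root: Path,
               frozen: Optional[bool] = None,
               home: Optional[Path] = None) -> Path:
    """Return where files such as subscribed-projects.txt should live."""
    if frozen is None:
        # PyInstaller sets sys.frozen = True inside a bundled app.
        frozen = getattr(sys, "frozen", False)
    if not frozen:
        # Running from source: keep config inside the repo.
        return repo_root / "setup-variables"
    base = home if home is not None else Path.home()
    return base / "Library" / "Application Support" / APP_NAME


print(config_dir(Path("/src/pseudo-jarvis"), frozen=False))
# /src/pseudo-jarvis/setup-variables
print(config_dir(Path("/src/pseudo-jarvis"), frozen=True, home=Path("/Users/me")))
# /Users/me/Library/Application Support/pseudo-jarvis
```

The optional `frozen` and `home` parameters exist only so the function can be exercised without building a bundle.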

GUI → session → Cursor

sequenceDiagram
  actor User
  participant GUI as gui_app.py
  participant Sess as app/session.py
  participant VTT as VoiceToText
  participant Cursor as Cursor Agent
  User->>GUI: choose mic, click Start
  GUI->>Sess: run_session on background thread
  Sess->>User: click typing box
  User->>Cursor: click
  Sess->>Cursor: paste @voice-input-confirmation.mds
  Sess->>VTT: listen_and_transcribe
  GUI->>GUI: session log shows hints
  User->>Cursor: speak
  VTT->>Cursor: type at cursor
  User->>GUI: click Stop
  GUI->>VTT: stop

ADD project flow

flowchart TD
  A[Browse or paste project root] --> B[Click ADD]
  B --> C[Copy voice-input-confirmation.mds]
  C --> D[Append path to subscribed-projects.txt]
  D --> E[Add rule path to project .gitignore]
  E --> F[Show in Subscribed projects list]
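The ADD flow boils down to three file operations, and running ADD twice should not duplicate entries. A minimal sketch using `pathlib` and `shutil`, assuming the rule template and registry paths are passed in and that the .gitignore entry is the rule's relative path. `add_project` and `_append_line_once` are hypothetical names.

```python
import shutil
from pathlib import Path

RULE_NAME = "voice-input-confirmation.mds"


def _append_line_once(path: Path, line: str) -> None:
    """Append `line` to a text file unless an identical line is already present."""
    text = path.read_text() if path.exists() else ""
    if line in text.splitlines():
        return
    if text and not text.endswith("\n"):
        text += "\n"
    path.write_text(text + line + "\n")


def add_project(project_root: Path, rule_template: Path, registry: Path) -> None:
    project_root = project_root.resolve()
    # 1. Copy the voice rule into <project>/.cursor/rules/
    rules_dir = project_root / ".cursor" / "rules"
    rules_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(rule_template, rules_dir / RULE_NAME)
    # 2. Register the project root (one path per line)
    registry.parent.mkdir(parents=True, exist_ok=True)
    _append_line_once(registry, str(project_root))
    # 3. Keep the copied rule out of the project's git history
    _append_line_once(project_root / ".gitignore", f".cursor/rules/{RULE_NAME}")
```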

05 — How it works

Command & feature workflows

Voice commands are matched as whole phrases after a pause. Voice-command reliability is still being tuned; see the work-in-progress note at the top.
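Whole-phrase matching means a command fires only when the entire transcript is the command word, so dictating "send the email" types text instead of submitting. A minimal sketch, assuming the recognizer may return varying case and stray punctuation; `match_command` is hypothetical.

```python
import string
from typing import Optional

COMMANDS = {"send", "freeze", "resume"}


def match_command(transcript: str) -> Optional[str]:
    """Return the command if the whole transcript is one command word, else None."""
    phrase = transcript.strip().lower().strip(string.punctuation + " ")
    return phrase if phrase in COMMANDS else None


heard = ["send", "Send.", "  Freeze ", "send the email", "resume please"]
print([match_command(h) for h in heard])
# ['send', 'send', 'freeze', None, None]
```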

Session startup

sequenceDiagram
  actor User
  participant GUI as pseudo-jarvis GUI
  participant App as VoiceToText
  participant Cursor as Cursor Agent
  User->>GUI: open app, pick mic, Start
  GUI->>User: click typing box
  User->>Cursor: click
  App->>Cursor: paste @voice-input-confirmation.mds
  App->>App: start listening
  User->>Cursor: speak — text appears at cursor

Continuous dictation

flowchart LR
  A[Speech chunk] --> B[Google transcribe]
  B --> C{voice command?}
  C -->|no| D[Type at cursor]
  C -->|yes| E[Run command handler]
  D --> F[Short pause joins with space]
  F --> G[Keep listening]

Pause > 2 seconds

flowchart TD
  A[Silence exceeds 2 s] --> B[Type period and space]
  B --> C[Shift+Enter new line in Cursor]
  C --> D[Next phrase types without leading space]

Trigger	Action	Typed
Pause > 2 s (once per pause)	Sentence break	`.` + Shift+Enter
Pause ≤ 2 s	Same thought	space + next words
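The two rows above reduce to one decision per recognized phrase, based on the silence before it. A minimal sketch, assuming the gap is measured from the end of the previous phrase and that keystrokes are returned as tokens rather than sent. `PAUSE_THRESHOLD_S`, `keystrokes_for`, and `render` are hypothetical names.

```python
from typing import List, Optional, Tuple

PAUSE_THRESHOLD_S = 2.0   # "Pause > 2 s" from the table above
Token = Tuple[str, str]   # ("type", text) or ("key", combo)


def keystrokes_for(phrase: str, gap_s: Optional[float]) -> List[Token]:
    """Tokens for `phrase`, given the silence since the previous phrase.

    gap_s is None for the first phrase of a session.
    """
    if gap_s is None:
        return [("type", phrase)]
    if gap_s > PAUSE_THRESHOLD_S:
        # Sentence break: period, soft newline in Cursor, no leading space.
        return [("type", "."), ("key", "shift+enter"), ("type", phrase)]
    # Same thought: join with a single space.
    return [("type", " " + phrase)]


def render(events: List[Tuple[str, Optional[float]]]) -> str:
    """Simulate the chat box contents; shift+enter is shown as a newline."""
    out = []
    for phrase, gap in events:
        for kind, value in keystrokes_for(phrase, gap):
            out.append(value if kind == "type" else "\n")
    return "".join(out)


print(render([("fix the bug", None), ("in the parser", 0.8), ("then run tests", 3.1)]))
# fix the bug in the parser.
# then run tests
```

A gap of exactly 2.0 s counts as the same thought, matching the "Pause ≤ 2 s" row.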

send — submit message

sequenceDiagram
  actor User
  participant App as VoiceToText
  participant Cursor as Cursor Agent
  User->>App: say "send" after pause
  App->>Cursor: Enter
  Note over App: wait 0.5 s
  App->>Cursor: paste @voice-input-confirmation.mds
  Note over Cursor: Agent asks for confirm.
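The `send` sequence is Enter, a 0.5 s wait, then pasting the rule mention so the next message already carries the confirmation rule. A minimal sketch, assuming a small keyboard interface so the sequence can be tested without touching the real UI. With pyautogui the real calls would likely be `pyautogui.press("enter")`, then copying the text to the clipboard (for example with macOS `pbcopy`) and `pyautogui.hotkey("command", "v")`. `Keyboard`, `RecordingKeyboard`, and `handle_send` are hypothetical.

```python
import time
from typing import Callable, List, Protocol

RULE_MENTION = "@voice-input-confirmation.mds"
SEND_DELAY_S = 0.5  # wait between Enter and the rule mention, per the diagram


class Keyboard(Protocol):
    def press(self, key: str) -> None: ...
    def paste(self, text: str) -> None: ...


class RecordingKeyboard:
    """Test double that records calls instead of sending keystrokes."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def press(self, key: str) -> None:
        self.calls.append(f"press:{key}")

    def paste(self, text: str) -> None:
        self.calls.append(f"paste:{text}")


def handle_send(kb: Keyboard, sleep: Callable[[float], None] = time.sleep) -> None:
    kb.press("enter")        # submit the dictated message
    sleep(SEND_DELAY_S)      # presumably lets Cursor clear the input box first
    kb.paste(RULE_MENTION)   # pre-fill the next message with the rule mention


kb = RecordingKeyboard()
handle_send(kb, sleep=lambda s: None)
print(kb.calls)  # ['press:enter', 'paste:@voice-input-confirmation.mds']
```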

freeze — pause dictation

flowchart TD
  A[Say freeze after pause] --> B[Dictation halted]
  B --> C[Mic still listens]
  C --> D[Only resume accepted]
  B --> E[No text typed at cursor]

resume — continue after freeze

sequenceDiagram
  actor User
  participant App as VoiceToText
  participant Cursor as Cursor Agent
  User->>App: say "resume" after pause
  App->>User: click typing box
  User->>Cursor: click
  App->>Cursor: paste @voice-input-confirmation.mds
  App->>App: enable dictation
  User->>Cursor: speak again

Stop session — GUI Stop button

flowchart TD
  A[Click Stop in pseudo-jarvis] --> B[VoiceToText.stop sets flag]
  B --> C[Mic listener stops]
  C --> D[Queue drained]
  D --> E[No further typing or paste]
  E --> F[Session log: stopped message]
  F --> G[Start re-enabled, mic picker unlocked]

Command / control	When	Result
`send`	After pause	Enter → wait 0.5 s → rule mention → agent confirmation
`freeze`	After pause	Stop typing; mic listens for resume
`resume`	After freeze	Click → rule mention → dictate again
`Stop` button	Any time during session	End session; zero keystrokes after stop
