TranscribeAudio
What is it?
The TranscribeAudio node uses OpenAI's models to convert audio into text. It supports several transcription models and can run either with a model selected directly on the node or with an existing agent connected to it.
When would I use it?
Use this node when you want to:
- Convert speech to text
- Transcribe audio recordings
- Extract text from audio files
- Process voice input into written form
- Create transcripts of audio content
How to use it
Basic Setup
- Add the TranscribeAudio node to your workflow
- Connect an audio source to the "audio" input
- Choose a transcription model or connect an agent
- Run the node to generate the transcription
Parameters
- agent: An optional existing agent configuration to use for transcription
- model: The transcription model to use (defaults to "gpt-4o-mini-transcribe")
- audio: The audio file to transcribe (required)
- output: The transcribed text, available after the node runs
Outputs
- output: The transcribed text from the audio
- agent: The agent object used for transcription, which can be connected to other nodes
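For readers who want to see what the model and audio parameters correspond to outside the node editor, here is a minimal sketch of a direct call to OpenAI's transcription endpoint with the openai Python package. It assumes the node ultimately performs a request of this kind, which this page does not confirm. The function name transcribe_audio and its client argument are hypothetical and exist only for illustration.

```python
from pathlib import Path

# Default model named in the Parameters list above.
DEFAULT_MODEL = "gpt-4o-mini-transcribe"


def transcribe_audio(audio_path, model=DEFAULT_MODEL, client=None):
    """Return the transcript of the audio file at audio_path.

    `client` is a hypothetical injection point so the function can be
    exercised without network access. When it is omitted, a real OpenAI
    client is created, which reads OPENAI_API_KEY from the environment.
    """
    if client is None:
        from openai import OpenAI  # requires the `openai` package

        client = OpenAI()

    # The endpoint expects an open binary file handle.
    with Path(audio_path).open("rb") as audio_file:
        result = client.audio.transcriptions.create(model=model, file=audio_file)

    # The response object exposes the transcript as `.text`.
    return result.text
```

Calling transcribe_audio("recording.wav") would return the same kind of text the node exposes on its "output" parameter. The file name here is only a placeholder.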
Example
A complete workflow for recording and transcribing audio:
- Add a Microphone node to capture audio
- Connect the Microphone's "audio" output to the TranscribeAudio node's "audio" input
- Select your preferred transcription model
- Run the workflow
- The transcribed text will be available in the "output" parameter
Important Notes
- The node requires a valid OpenAI API key set up in your environment as OPENAI_API_KEY
- Available models include:
- gpt-4o-mini-transcribe
- gpt-4o-transcribe
- whisper-1
- You can provide your own agent configuration for more customized behavior
- The quality of transcription depends on the audio quality and the selected model
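The first two notes can be checked before a workflow runs. The sketch below is a hypothetical pre-flight helper, not part of the node: check_transcription_setup and AVAILABLE_MODELS are illustrative names, and the model list is copied from the notes above.

```python
import os

# Model names copied from the list above.
AVAILABLE_MODELS = ("gpt-4o-mini-transcribe", "gpt-4o-transcribe", "whisper-1")


def check_transcription_setup(model, env=None):
    """Return a list of human-readable problems; an empty list means OK.

    `env` defaults to the process environment and can be replaced with a
    plain dict for testing.
    """
    env = os.environ if env is None else env
    problems = []

    # An unset or empty variable both count as a missing key.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")

    if model not in AVAILABLE_MODELS:
        expected = ", ".join(AVAILABLE_MODELS)
        problems.append(f"unknown model {model!r}; expected one of {expected}")

    return problems
```

Returning a list rather than raising on the first failure lets a caller report a missing key and a mistyped model name together.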
Common Issues
- Missing API Key: Ensure your OpenAI API key is set as the OPENAI_API_KEY environment variable
- No Audio Provided: Make sure you've connected a valid audio source to the "audio" input
- Poor Transcription Quality: Try using a different model or improving the audio quality
- Processing Errors: Very long or low-quality audio files may fail to process or produce less accurate transcriptions; try a shorter or cleaner recording
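One possible mitigation for very long recordings, not described on this page, is to split the file into shorter segments and transcribe each one separately. The sketch below uses Python's standard wave module and assumes the recording is an uncompressed WAV file. The function name split_wav, the 60-second default, and the output naming scheme are all illustrative choices.

```python
import wave


def split_wav(path, seconds_per_chunk=60):
    """Split a WAV file into consecutive segments and return their paths.

    Segments are written next to the source file as
    <path>.part000.wav, <path>.part001.wav, and so on.
    """
    chunk_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(src.getframerate() * seconds_per_chunk)
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break  # reached the end of the recording
            chunk_path = f"{path}.part{index:03d}.wav"
            with wave.open(chunk_path, "wb") as dst:
                # Copy channels, sample width and rate from the source;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            chunk_paths.append(chunk_path)
            index += 1
    return chunk_paths
```

Cutting at fixed intervals can split a word across two segments, so the joined transcripts may need a light manual check at the boundaries.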