To set up your development environment for using the OpenAI API, please refer to
Lesson 1: Introduction to Multimodal AI. This lesson covers installing necessary libraries and configuring your environment.
You also need to install additional libraries for this project. Add the following code to your notebook:
# Install additional dependencies for this lesson
!pip install librosa
The librosa library is for handling audio files.
Like previous lessons, you need to authenticate your API requests, and the code for that is already included in the Jupyter notebook for this lesson:
# Load the OpenAI library
from openai import OpenAI
# Set up relevant environment variables
# Make sure OPENAI_API_KEY=... exists in .env
from dotenv import load_dotenv
load_dotenv()
# Create the OpenAI connection object
client = OpenAI()
OpenAI’s Whisper model is a powerful tool for speech recognition. First, you need to prepare the audio files. You can either record audio directly using your computer’s microphone or download free sample audio files from Pixabay.
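If you’d rather record your own clip, the following optional sketch captures a few seconds from the default microphone. It assumes the third-party sounddevice and soundfile packages, which aren’t part of this lesson’s setup, and the output path is just a placeholder. This lesson sticks with a downloaded sample, though.
# Optional: record your own audio instead of downloading a sample
# Assumes `pip install sounddevice soundfile` (not covered in this lesson)
import sounddevice as sd
import soundfile as sf

duration = 5          # seconds to record
sample_rate = 44100   # samples per second

# Record from the default microphone (mono) and wait until it finishes
recording = sd.rec(int(duration * sample_rate), samplerate=sample_rate, channels=1)
sd.wait()

# Save the clip so it can be sent to the Whisper API later (placeholder path)
sf.write("audio/my-recording.wav", recording, sample_rate)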
Add the following code to download and load an audio file using the librosa library:
# Download and load an audio file using librosa
# Import libraries
import requests
import io
import librosa
from IPython.display import Audio, display
# URL of the sample audio file
speech_download_link = "https://cdn.pixabay.com/download/audio/2022/03/10/audio_a8e603753c.mp3?filename=self-destruct-sequence-31505.mp3"
# Local path where the audio file will be saved
save_path = "audio/self-destruct-sequence.mp3"
# Download the audio file
response = requests.get(speech_download_link)
if response.status_code == 200:
    audio_data = io.BytesIO(response.content)
    # Save the audio file locally
    with open(save_path, 'wb') as file:
        file.write(response.content)
    # Load the audio file using librosa
    y, sr = librosa.load(audio_data)
    # Display the audio file so it can be played
    audio = Audio(data=y, rate=sr, autoplay=True)
    display(audio)
Finally, you create an audio player using the loaded audio data and display it, allowing you to play the audio directly in Jupyter Lab.
Next, extract the logic to play the audio file into a separate function, because you’ll use it multiple times:
# Function to play the audio file
def play_speech(file_path):
    # Load the audio file using librosa
    y, sr = librosa.load(file_path)
    # Create an Audio object for playback
    audio = Audio(data=y, rate=sr, autoplay=True)
    # Display the audio player
    display(audio)
Now, it’s time to transcribe the audio file using the Whisper model. Add the following code to your Jupyter Lab:
# Transcribe the audio file using the Whisper model
with open(save_path, "rb") as audio_file:
    # Transcribe the audio file using the Whisper model
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="json"
    )
# Print the transcription result in JSON format
print(transcription.json())
# Print only the transcribed text
print(transcription.text)
You can also get a more detailed transcription with timestamps for each word:
# Retrieve the detailed information with timestamps
with open(save_path, "rb") as audio_file:
    # Transcribe the audio file with word-level timestamps
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json",
        timestamp_granularities=["word"]
    )
Then, you can look at the verbose JSON result.
# Print the detailed information for each word timestamp
import json
json_result = transcription.json()
print(json_result)
json_object = json.loads(json_result)
print(json_object["text"])
# Print the detailed information for words
# Print the detailed information for each word
print(transcription.words)
# Print the detailed information for the first two words
print(transcription.words[0])
print(transcription.words[1])
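Each entry in transcription.words pairs a word with its start and end time in seconds, so you can loop over the whole list. This short sketch assumes each item exposes word, start, and end attributes, matching the entries printed above.
# Print every word together with its start/end timestamps (in seconds)
for word in transcription.words:
    print(f"{word.word}: {word.start:.2f}s - {word.end:.2f}s")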
# Retrieve the detailed information with segment-level timestamps
with open(save_path, "rb") as audio_file:
    # Transcribe the audio file with segment-level timestamps
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json",
        timestamp_granularities=["segment"]
    )
To check the detailed information for the first two segments, use the following code:
# Print the detailed information for the first two segments
print(transcription.segments[0])
print(transcription.segments[1])
Now, load and play another audio file:
# Load & play kodeco-speech.mp3 audio file
# Path to another audio file
ai_programming_audio_path = "audio/kodeco-speech.mp3"
# Play the audio file
play_speech(ai_programming_audio_path)
# Transcribe the audio file with `text` response format
with open(ai_programming_audio_path, "rb") as audio_file:
    # Transcribe the audio file to text
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="text"
    )
# Print the transcribed text
print(transcription)
Notice that the transcription is not perfect. Kodeco and RayWenderlich are misspelled. You can guide the transcription process with the prompt parameter to improve accuracy.
# Transcribe the audio file with a prompt to improve accuracy
with open(ai_programming_audio_path, "rb") as audio_file:
    # Transcribe the audio file with a prompt to improve accuracy
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="text",
        prompt="Kodeco,RayWenderlich"
    )
# Print the transcribed text
print(transcription)
Now, the transcription should be more accurate. The prompt parameter helps guide the transcription, making it particularly useful for correcting specific words or continuing a previous segment. In this case, the prompt ensures that names like Kodeco and RayWenderlich are transcribed correctly.
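To illustrate the continuation use case, here’s a hedged sketch that transcribes a recording split into two chunks and feeds the first transcript in as the prompt for the second. The part-1.mp3 and part-2.mp3 file names are hypothetical placeholders, not part of the course materials.
# Hypothetical example: keep context across a recording split into two chunks
with open("audio/part-1.mp3", "rb") as audio_file:
    first_part = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="text"
    )

with open("audio/part-2.mp3", "rb") as audio_file:
    # Pass the earlier transcript as the prompt so spelling and context carry over
    second_part = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="text",
        prompt=first_part
    )

# Combine both transcripts
print(first_part + second_part)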
# Load & play japanese-speech.mp3 audio file
# The speech in Japanese: いらっしゃいませ。ラーメン屋へようこそ。何をご注文なさいますか?
# Path to the Japanese audio file
japanese_audio_path = "audio/japanese-speech.mp3"
# Play the Japanese audio file
play_speech(japanese_audio_path)
# Translate the Japanese audio to English text
with open(japanese_audio_path, "rb") as audio_file:
    # Translate the Japanese audio to English text
    translation = client.audio.translations.create(
        model="whisper-1",
        file=audio_file,
        response_format="text"
    )
# Print the translated text
print(translation)
The translated text should be: “Welcome. Welcome to the ramen shop. What would you like to order?” The Whisper model can translate audio in any supported language into English text, making it a versatile tool for multilingual apps.
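If you want the original Japanese text rather than an English translation, you can still call the transcriptions endpoint on the same file. This is just a sketch contrasting the two endpoints; the optional language parameter is only a hint that the audio is Japanese.
# Transcribe (not translate) the same audio to get the Japanese text
with open(japanese_audio_path, "rb") as audio_file:
    japanese_transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="text",
        language="ja"  # optional hint about the spoken language
    )

# Print the Japanese transcription
print(japanese_transcription)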
To create synthesized speech, you can use the client.audio.speech.with_streaming_response.create method with the relevant parameters, as shown below:
# Generate speech from text using OpenAI's TTS model
# Path to save the synthesized speech
speech_file_path = "audio/learn-ai.mp3"
# Generate speech from text using OpenAI's TTS model
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="alloy",
    input="Would you like to learn AI programming? We have many AI programming courses that you can choose."
) as response:
    # Save the synthesized speech to the specified path
    response.stream_to_file(speech_file_path)
The model parameter is set to tts-1, specifying the text-to-speech model to be used. This model is optimized for speed. You can use another model, tts-1-hd, if you care more about the quality. The voice parameter is set to alloy, which determines the voice characteristics, such as tone and accent. You have other choices, like echo, fable, onyx, nova, and shimmer. Finally, the input parameter contains the text that you want to convert to speech: “Would you like to learn AI programming? We have many AI programming courses that you can choose.”
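If you’d like to hear how the other voices compare, an optional sketch like the one below generates the same sentence once per voice using the same streaming call; the output file names are just illustrative.
# Optional: generate one sample file per available voice for comparison
for voice_name in ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]:
    with client.audio.speech.with_streaming_response.create(
        model="tts-1",
        voice=voice_name,
        input="Would you like to learn AI programming?"
    ) as response:
        # Illustrative file name, one per voice
        response.stream_to_file(f"audio/voice-sample-{voice_name}.mp3")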
Now, play the synthesized speech:
# Play the synthesized speech
play_speech(speech_file_path)
Nice! You’ve created synthesized speech.
If you don’t want to use the context manager, you can use the client.audio.speech.create method to create synthesized speech. Generate speech again. This time, you experiment with another voice and speed:
# Generate speech with a different voice and slower speed
response = client.audio.speech.create(
    model="tts-1",
    voice="echo",
    speed=0.6,
    input="Would you like to learn AI programming? We have many AI programming courses that you can choose."
)
# Save the synthesized speech to the specified path
response.stream_to_file(speech_file_path)
# Play the synthesized speech
play_speech(speech_file_path)
Notice that the voice is now echo, which has a different tone than alloy. Also, the speed is set to 0.6, making the speech slower. If you want to make the speech faster, you can set the speed to a value greater than 1.
However, if you use the client.audio.speech.create method, you’ll get this warning:
DeprecationWarning: Due to a bug, this method doesn't actually stream the response content, `.with_streaming_response.method()` should be used instead
  response.stream_to_file(speech_file_path)