To set up your development environment for using the OpenAI API, please refer to
Lesson 1: Introduction to Multimodal AI. This lesson covers installing necessary libraries and configuring your environment.
You also need to install an additional library for this project. Add the following code to your notebook:
# Install additional dependencies for this lesson
!pip install librosa
The librosa library is for handling audio files.
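If librosa is new to you, here’s a minimal sketch of what it provides: it loads an audio file into a NumPy array of samples along with the sample rate. The file path below is just a placeholder for any audio file you have on disk, not part of this lesson’s materials.
# Minimal librosa sketch (placeholder file path)
import librosa
y, sr = librosa.load("audio/example.mp3")  # samples as a NumPy array, plus the sample rate
print(f"Sample rate: {sr} Hz")
print(f"Duration: {librosa.get_duration(y=y, sr=sr):.2f} seconds")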
Like previous lessons, you need to authenticate your API requests, and the code for this is already included in the Starter notebook for this lesson:
# Load the OpenAI library
from openai import OpenAI
# Set up relevant environment variables
# Make sure OPENAI_API_KEY=... exists in .env
from dotenv import load_dotenv
load_dotenv()
# Create the OpenAI connection object
client = OpenAI()
OpenAI’s Whisper model is a powerful tool for speech recognition. First, you need to prepare the audio files. You can either record audio directly using your computer’s microphone or download free sample audio files from Pixabay.
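If you’d rather record your own clip than download one, a sketch along these lines would work. Note that the sounddevice and soundfile packages aren’t installed as part of this lesson’s setup, so treat this as an optional aside; this lesson uses a downloaded sample instead, as shown next.
# Hypothetical sketch: record 5 seconds from your microphone
# (requires: !pip install sounddevice soundfile — not part of this lesson's setup)
import sounddevice as sd
import soundfile as sf
duration = 5          # seconds to record
sample_rate = 44100   # CD-quality sample rate
recording = sd.rec(int(duration * sample_rate), samplerate=sample_rate, channels=1)
sd.wait()  # block until the recording finishes
sf.write("audio/my-recording.wav", recording, sample_rate)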
Add the following code to download and load an audio file using the librosa library:
# Download and load an audio file using librosa
# Import libraries
import requests
import io
import librosa
from IPython.display import Audio, display
# URL of the sample audio file
speech_download_link = "https://cdn.pixabay.com/download/audio/2022/03/10/
audio_a8e603753c.mp3?filename=self-destruct-sequence-31505.mp3"
# Local path where the audio file will be saved
save_path = "audio/self-destruct-sequence.mp3"
# Download the audio file
response = requests.get(speech_download_link)
if response.status_code == 200:
    audio_data = io.BytesIO(response.content)
    # Save the audio file locally
    with open(save_path, 'wb') as file:
        file.write(response.content)
    # Load the audio file using librosa
    y, sr = librosa.load(audio_data)
    # Display the audio file so it can be played
    audio = Audio(data=y, rate=sr, autoplay=True)
    display(audio)
response = requests.get(speech_download_link)
if response.status_code == 200:
    audio_data = io.BytesIO(response.content)
You send a GET request to the URL and check if the download was successful (status code 200). If successful, you store the audio data in a byte stream.
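If you’d prefer the download to fail loudly instead of silently skipping the rest of the cell, a small variation like this (same variables as above) raises an exception on any error status:
# Variation (not in the original notebook): raise on failed downloads
response = requests.get(speech_download_link)
response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
audio_data = io.BytesIO(response.content)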
Save the Audio File Locally:
with open(save_path, 'wb') as file:
    file.write(response.content)
This step writes the downloaded audio data to a file on your local system.
Finally, you create an audio player using the loaded audio data and display it, allowing you to play the audio directly in Jupyter Lab.
Next, extract the logic to play the audio file into a separate function because you’ll use it multiple times:
# Function to play the audio file
def play_speech(file_path):
    # Load the audio file using librosa
    y, sr = librosa.load(file_path)
    # Create an Audio object for playback
    audio = Audio(data=y, rate=sr, autoplay=True)
    # Display the audio player
    display(audio)
Now, it’s time to transcribe the audio file using the Whisper model. Add the following code to your Jupyter Lab:
# Transcribe the audio file using the Whisper model
with open(save_path, "rb") as audio_file:
    # Transcribe the audio file using the Whisper model
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="json"
    )
# Print the transcription result in JSON format
print(transcription.json())
# Print only the transcribed text
print(transcription.text)
You can also get a more detailed transcription with time stamps for each word:
# Retrieve the detailed information with timestamps
with open(save_path, "rb") as audio_file:
    # Transcribe the audio file with word-level timestamps
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json",
        timestamp_granularities=["word"]
    )
Then, you can look at the verbose JSON result.
# Print the detailed information for each word timestamp
import json
json_result = transcription.json()
print(json_result)
json_object = json.loads(json_result)
print(json_object["text"])
# Print the detailed information for words
# Print the detailed information for each word
print(transcription.words)
# Print the detailed information for the first two words
print(transcription.words[0])
print(transcription.words[1])
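Each entry in transcription.words pairs a word with its start and end time. A small sketch like this, assuming the word, start, and end fields shown in the printout above, turns the list into a readable timeline:
# Sketch: print each word with its start and end time (fields assumed from the printout above)
for word_info in transcription.words:
    print(f"{word_info.start:6.2f}s - {word_info.end:6.2f}s  {word_info.word}")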
You can also obtain segment-level time stamps for the transcription. Pass the segment value to the timestamp_granularities parameter:
# Retrieve the detailed information with segment-level timestamps
with open(save_path, "rb") as audio_file:
    # Transcribe the audio file with segment-level timestamps
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json",
        timestamp_granularities=["segment"]
    )
# Print the detailed information for the first two segments
print(transcription.segments[0])
print(transcription.segments[1])
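Segments group the transcription into longer, sentence-like chunks. A sketch like the following, assuming the start, end, and text fields visible in the printout above, prints a simple timestamped transcript:
# Sketch: print one timestamped line per segment (fields assumed from the printout above)
for segment in transcription.segments:
    print(f"[{segment.start:6.2f}s -> {segment.end:6.2f}s] {segment.text}")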
Now, load and play another audio file:
# Load & play kodeco-speech.mp3 audio file
# Path to another audio file
ai_programming_audio_path = "audio/kodeco-speech.mp3"
# Play the audio file
play_speech(ai_programming_audio_path)
You would hear Kodeco and RayWenderlich being mentioned. Next, transcribe the speech again. This time, use the text response format, which is simpler than the JSON response format. The returned result is just the transcription text.
# Transcribe the audio file with `text` response format
with open(ai_programming_audio_path, "rb") as audio_file:
    # Transcribe the audio file to text
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="text"
    )
# Print the transcribed text
print(transcription)
Notice that the transcription isn’t perfect: Kodeco and RayWenderlich are misspelled. You can guide the transcription process with the prompt parameter to improve accuracy.
# Transcribe the audio file with a prompt to improve accuracy
with open(ai_programming_audio_path, "rb") as audio_file:
    # Transcribe the audio file with a prompt to improve accuracy
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="text",
        prompt="Kodeco,RayWenderlich"
    )
# Print the transcribed text
print(transcription)
Now, the transcription should be more accurate. The prompt parameter helps guide the transcription, making it particularly useful for correcting specific words or continuing a previous segment. In this case, the prompt ensures that names like Kodeco and RayWenderlich are transcribed correctly.
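To see how the prompt can carry context across a recording split into chunks, a hypothetical sketch might pass the first chunk’s transcript as the prompt for the second. The part1.mp3 and part2.mp3 files here are placeholders, not part of this lesson’s materials.
# Hypothetical sketch: continue a previous segment by passing its transcript as the prompt
with open("audio/part1.mp3", "rb") as audio_file:
    first_part = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="text"
    )
with open("audio/part2.mp3", "rb") as audio_file:
    second_part = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="text",
        prompt=first_part  # carry names, spelling, and punctuation across the boundary
    )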
Other than transcription, you can also translate the audio file directly to English. Currently, only English is supported.
First, listen to the Japanese audio file:
# Load & play japanese-speech.mp3 audio file
# The speech in Japanese: いらっしゃいませ。ラーメン屋へようこそ。何をご注文なさいますか?
# Path to the Japanese audio file
japanese_audio_path = "audio/japanese-speech.mp3"
# Play the Japanese audio file
play_speech(japanese_audio_path)
To translate, use the client.audio.translations.create method. The model, file, and response_format parameters work the same as in the client.audio.transcriptions.create method. Add the following code to your Jupyter Lab:
# Translate the Japanese audio to English text
with open(japanese_audio_path, "rb") as audio_file:
    # Translate the Japanese audio to English text
    translation = client.audio.translations.create(
        model="whisper-1",
        file=audio_file,
        response_format="text"
    )
# Print the translated text
print(translation)
The translated text should be: “Welcome. Welcome to the ramen shop. What would you like to order?”. The Whisper model can translate audio in any supported language into English text, making it a versatile tool for multilingual apps.
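If you’re curious which language Whisper detected before translating, one option (a small sketch reusing the Japanese clip above) is to transcribe with the verbose_json response format, whose result includes a language field:
# Sketch: check the detected language via the verbose_json transcription result
with open(japanese_audio_path, "rb") as audio_file:
    detected = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json"
    )
print(detected.language)  # expected to report Japanese for this clip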
To create synthesized speech, you can use the client.audio.speech.with_streaming_response.create method with the context manager, as shown below:
# Generate speech from text using OpenAI's TTS model
# Path to save the synthesized speech
speech_file_path = "audio/learn-ai.mp3"
# Generate speech from text using OpenAI's TTS model
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="alloy",
    input="Would you like to learn AI programming? We have many AI programming courses that you can choose."
) as response:
    # Save the synthesized speech to the specified path
    response.stream_to_file(speech_file_path)
The model parameter is set to tts-1, specifying the text-to-speech model to be used. This model is optimized for speed. You can use another model, tts-1-hd, if you care more about the quality. The voice parameter is set to alloy, which determines the voice characteristics such as tone and accent. You have other choices, like echo, fable, onyx, nova, and shimmer. Finally, the input parameter contains the text that you want to convert to speech: “Would you like to learn AI programming? We have many AI programming courses that you can choose.”
Now, play the synthesized speech:
# Play the synthesized speech
play_speech(speech_file_path)
Nice! You’ve created synthesized speech.
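If you want to hear how the voices differ, a quick sketch like this synthesizes the same sentence once per voice. The output file names are placeholders; everything else mirrors the call above.
# Sketch: synthesize the same sentence with each voice for comparison (placeholder file names)
voices = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]
for voice in voices:
    with client.audio.speech.with_streaming_response.create(
        model="tts-1",
        voice=voice,
        input="Would you like to learn AI programming?"
    ) as response:
        response.stream_to_file(f"audio/sample-{voice}.mp3")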
If you don’t want to use the context manager, you can use the client.audio.speech.create method to create synthesized speech. Generate speech again. This time, you experiment with another voice and speed:
# Generate speech with a different voice and slower speed
response = client.audio.speech.create(
    model="tts-1",
    voice="echo",
    speed=0.6,
    input="Would you like to learn AI programming? We have many AI programming courses that you can choose."
)
# Save the synthesized speech to the specified path
response.stream_to_file(speech_file_path)
# Play the synthesized speech
play_speech(speech_file_path)
Notice that the voice is now echo, which has a different tone than alloy. Also, the speed is set to 0.6, making the speech slower. If you want to make the speech faster, you can set the speed to a value greater than 1.
However, if you use the client.audio.speech.create method, you’ll get this warning:
DeprecationWarning: Due to a bug, this method doesn't actually stream the
response content, `.with_streaming_response.method()` should be used instead
  response.stream_to_file(speech_file_path)
Therefore, it’s better to use the client.audio.speech.with_streaming_response.create method with the context manager to avoid this warning.
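For example, a faster variant of the previous speech using the recommended streaming method could look like this; the 1.5 speed and the output file name are just example values:
# Sketch: faster speech with the recommended streaming method (example speed and file name)
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="echo",
    speed=1.5,
    input="Would you like to learn AI programming? We have many AI programming courses that you can choose."
) as response:
    response.stream_to_file("audio/learn-ai-fast.mp3")
# Play the faster version
play_speech("audio/learn-ai-fast.mp3")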