Now, you want to combine speech recognition and synthesis to create a simple language tutor app. This app will process recorded speech, check whether the grammar is correct, and provide feedback using synthesized speech.
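The snippets in this demo reuse the OpenAI client and the play_speech helper you set up in the previous demo. If you're starting from a fresh notebook, a minimal client setup might look like the following sketch, assuming the openai package is installed and your API key is stored in the OPENAI_API_KEY environment variable:

# Minimal setup assumed by the snippets below
from openai import OpenAI

# The client reads the OPENAI_API_KEY environment variable by default
client = OpenAI()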
Define a function to transcribe the recorded speech using the Whisper model:
# Define a function to transcribe the recorded speech
def transcript_speech(speech_filename="my_speech.m4a"):
    # Open the audio file and transcribe it using the Whisper model
    with open(speech_filename, "rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            response_format="json",
            language="en"
        )
    # Return the transcribed text
    return transcription.text
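If you want to try the transcription step on its own, a quick call might look like this. The filename is just the default placeholder, and the printed text will depend on your recording:

# Quick test of the transcription helper; output varies with your recording
text = transcript_speech("my_speech.m4a")
print(text)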
Next, define a function to check the grammar of the transcribed text using OpenAI's GPT model:
# Check the grammar of the transcribed text
def check_grammar(english_text):
    # Use GPT to check and correct the grammar of the input text
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are an English grammar expert."},
            {"role": "user", "content": f"Fix the grammar: {english_text}"}
        ]
    )
    # Extract and return the corrected grammar message
    message = response.choices[0].message.content
    return message
In this function, you use the GPT model to check and correct the grammar of the input text. The client.chat.completions.create method sends the input text to the GPT model along with a prompt that instructs the model to act as an English grammar expert. The response from GPT contains the corrected text, which is then extracted and returned by the function.
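You can also exercise check_grammar on its own before wiring it into the app. The sample sentence and the output shown here are only illustrative; the model's exact wording may differ between runs:

# Illustrative call; the model's exact response may vary
print(check_grammar("She don't like apples."))
# Possible output: "She doesn't like apples."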
After that, define a function to generate spoken feedback using the text-to-speech capability:
# Provide spoken feedback using TTS
def tell_feedback(grammar_feedback, speech_file_path="feedback_speech.mp3"):
    # Generate speech from the grammar feedback using TTS
    response = client.audio.speech.create(
        model="tts-1",
        voice="alloy",
        input=grammar_feedback
    )
    # Save the synthesized speech to the specified path
    response.stream_to_file(speech_file_path)
    # Play the synthesized speech
    play_speech(speech_file_path)
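One caveat: recent versions of the openai Python library deprecate calling stream_to_file on the plain response in favor of the streaming-response API. If you see a deprecation warning, this variant writes the file the recommended way and is a drop-in replacement for the two lines above:

# Streaming variant for newer openai-python versions
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="alloy",
    input=grammar_feedback
) as response:
    response.stream_to_file(speech_file_path)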
Finally, put everything together in a function that handles the entire process, from analyzing audio to providing spoken feedback:
# Implement the grammar feedback application
def grammar_feedback_app(speech_filename):
    # Transcribe the recorded speech
    transcription = transcript_speech(speech_filename)
    print(transcription)
    # Check and correct the grammar of the transcription
    feedback = check_grammar(transcription)
    print(feedback)
    # Provide spoken feedback using TTS
    tell_feedback(feedback)
Check and Correct the Grammar: The transcribed text is passed to the check_grammar function to check and correct its grammar.
Provide Spoken Feedback: The corrected text is then passed to the tell_feedback function to create and play a spoken version of the feedback using text-to-speech.
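In a real tutor app, you'd also want this pipeline to fail gracefully when the audio file is missing or an API call errors out. Here's one way you might wrap it; safe_grammar_feedback_app is a name of my own choosing, and openai.APIError is the SDK's base error type:

# Hypothetical wrapper that surfaces common failure modes instead of crashing
import openai

def safe_grammar_feedback_app(speech_filename):
    try:
        grammar_feedback_app(speech_filename)
    except FileNotFoundError:
        print(f"Audio file not found: {speech_filename}")
    except openai.APIError as error:
        print(f"OpenAI API call failed: {error}")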
To test the grammar feedback app, you have to give the grammatically incorrect audio file to the app. To create an audio file for speech input, you can use the Voice Recorder app on Windows, QuickTime on macOS, or a similar recording app on Linux. You can refer back to the instructions section on how to do this if you need help.
Once recorded, place the audio file in the audio folder and update the wrong_grammar_audio variable accordingly. Alternatively, you can use a provided audio sample containing a grammatically incorrect sentence, "My sister don't like to eat at night", for testing purposes.
# Set the audio file. Use the audio sample or record the
# audio yourself and place the file here.
wrong_grammar_audio = "audio/grammar-wrong.mp3"
You can play it first to confirm that the audio file has a grammatically incorrect sentence.
# Play the grammatically wrong audio file
play_speech(wrong_grammar_audio)
Run the application and get the grammar feedback:
# Run the grammar feedback application
grammar_feedback_app(wrong_grammar_audio)
You've now seen how to use Whisper for speech recognition and TTS for speech synthesis in an app. Move on to the next segment for this lesson's conclusion.
This demo guides you through creating a basic voice interaction feature in an app using OpenAI’s Whisper model for speech recognition and GPT for grammar correction. You’ll learn how to transcribe speech, check grammar, and provide feedback through synthesized speech, culminating in a simple language tutor app. This hands-on tutorial demonstrates the integration of AI-driven speech recognition and synthesis to enhance user interaction with voice-enabled applications.