In this demo, you’ll create a multimodal language tutor app using Gradio. The app simulates conversational scenarios so users can practice their English skills interactively: it displays images, plays audio prompts, and lets users respond with recorded speech. It then updates the conversation, generates new images, and provides audio feedback based on the user’s input.
# Build the multimodal language tutor app using Gradio
# Initial seed prompt for generating the initial situational context
seed_prompt = "cafe near beach"  # or "comics exhibition",
# "meeting parents-in-law for the first time", etc.
# Generate an initial situational description based on the seed prompt
initial_situation = generate_situational_prompt(seed_prompt)
# Generate an initial image based on the initial situational description
img = generate_situation_image(initial_situation)
# Flags to manage the state of the app
first_time = True
combined_history = ""
Define a helper function to extract the first and last segments of the conversation history. This ensures that the prompt for DALL-E stays within the model’s maximum character limit. Add the function to the code in the next code cell:
# Function to extract the first and last segments of the conversation
# history
# This is to ensure that the prompt for DALL-E does not exceed the
# maximum character limit of 4000 characters
def extract_first_last(text):
    elements = [elem.strip() for elem in text.split('====')
                if elem.strip()]
    if len(elements) >= 2:
        return elements[0] + elements[-1]
    elif len(elements) == 1:
        return elements[0]
    else:
        return ""
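As a quick sanity check, here’s how extract_first_last behaves on a sample history (the function is repeated here so the snippet runs on its own). Note that the first and last segments are concatenated directly, with no separator between them:

```python
# Standalone copy of extract_first_last for demonstration purposes.
def extract_first_last(text):
    elements = [elem.strip() for elem in text.split('====')
                if elem.strip()]
    if len(elements) >= 2:
        return elements[0] + elements[-1]
    elif len(elements) == 1:
        return elements[0]
    else:
        return ""

history = "scene setup\n====\nround one\n====\nround two"
# Only the first and last segments survive, keeping the prompt short
# no matter how long the conversation grows.
print(extract_first_last(history))  # scene setupround two
```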
Define the main function, conversation_generation, to handle the conversation logic. This function will transcribe the user’s speech, update the conversation history, generate a new conversation response, and update the image and audio outputs. Add the function to the code in the next code cell:
# Main function to handle the conversation generation logic
def conversation_generation(audio_path):
    global combined_history
    global first_time
    # Transcribe the user's speech from the provided audio file path
    transcripted_text = transcript_speech(audio_path)
    # Create the conversation history based on whether this is the
    # first interaction or not
    if first_time:
        history = creating_conversation_history(initial_situation,
                                                transcripted_text)
        first_time = False
    else:
        history = creating_conversation_history(combined_history,
                                                transcripted_text)
    # Generate a new conversation based on the updated history
    conversation = generate_conversation_from_history(history)
    # Update the combined history with the new conversation
    combined_history = history + "\n====\n" + conversation
    # Extract a suitable prompt for DALL-E by combining the first
    # and last parts of the conversation history
    dalle_prompt = extract_first_last(combined_history)
    # Generate a new image based on the extracted prompt
    img = generate_situation_image(dalle_prompt)
    # Generate speech for the new conversation and save it to an
    # audio file
    output_audio_file = "speak_speech.mp3"
    speak_prompt(conversation, False, output_audio_file)
    # Return the updated image, conversation text, and audio file
    # path
    return img, conversation, output_audio_file
This function, conversation_generation, manages the conversation logic for the app. It starts by transcribing the user’s speech from the provided audio file path. Based on whether it’s the first interaction, it builds the conversation history from either the initial situation or the accumulated history. It then generates a new conversation response from the updated history and appends it to the combined history. The function extracts a suitable prompt for generating a new image based on the conversation history, generates the image, and creates speech for the new response, saving it to an audio file. Finally, it returns the updated image, conversation text, and audio file path.
Now, it’s time to proceed to this lesson’s conclusion.
This content was released on Nov 14 2024. The official support period is 6 months from this date.