In this demo, you’ll create a multimodal language tutor app using Gradio. The app will simulate conversational scenarios, allowing users to practice their English skills interactively. The app will display images, play audio prompts, and let users respond via recorded speech. It will then update the conversation, generate new images, and provide audio feedback based on the user’s input.
Start by defining the seed prompt for the initial situational context. Generate the initial situational description and corresponding image using the generate_situational_prompt function from the previous demo. Remember that you can run the Jupyter Lab code here in the same Jupyter Lab file you worked on in the last demo.
# Build the multimodal language tutor app using Gradio
# Initial seed prompt for generating the initial situational context
seed_prompt = "cafe near beach" # or "comics exhibition",
"meeting parents-in-law for the first time", etc
# Generate an initial situational description based on the seed prompt
initial_situation = generate_situational_prompt(seed_prompt)
# Generate an initial image based on the initial situational description
img = generate_situation_image(initial_situation)
# Flags to manage the state of the app
first_time = True
combined_history = ""
# Function to extract the first and last segments of the conversation
# history
# This is to ensure that the prompt for DALL-E does not exceed the
# maximum character limit of 4000 characters
def extract_first_last(text):
    elements = [elem.strip() for elem in text.split('====')
                if elem.strip()]
    if len(elements) >= 2:
        return elements[0] + elements[-1]
    elif len(elements) == 1:
        return elements[0]
    else:
        return ""
Define the main function conversation_generation to handle the conversation logic. This function will transcribe the user's speech, update the conversation history, generate a new conversation response, and update the visual and audio outputs. Add the function to the code cell:
# Main function to handle the conversation generation logic
def conversation_generation(audio_path):
    global combined_history
    global first_time

    # Transcribe the user's speech from the provided audio file path
    transcripted_text = transcript_speech(audio_path)

    # Create conversation history based on whether it is the first
    # interaction or not
    if first_time:
        history = creating_conversation_history(initial_situation,
                                                transcripted_text)
        first_time = False
    else:
        history = creating_conversation_history(combined_history,
                                                transcripted_text)

    # Generate a new conversation based on the updated history
    conversation = generate_conversation_from_history(history)

    # Update the combined history with the new conversation
    combined_history = history + "\n====\n" + conversation

    # Extract a suitable prompt for DALL-E by combining the first
    # and last parts of the conversation history
    dalle_prompt = extract_first_last(combined_history)

    # Generate a new image based on the extracted DALL-E prompt
    img = generate_situation_image(dalle_prompt)

    # Generate speech for the new conversation and save it to an
    # audio file
    output_audio_file = "speak_speech.mp3"
    speak_prompt(conversation, False, output_audio_file)

    # Return the updated image, conversation text, and audio file path
    return img, conversation, output_audio_file
This function, conversation_generation, handles the conversation logic for the app. It starts by transcribing the user's speech from the provided audio file path. Based on whether it's the first interaction, it creates the conversation history accordingly. It then generates a new conversation response using the updated history and updates the combined history. The function extracts a suitable prompt for generating a new image based on the conversation history, generates the image, and produces speech for the new conversation, saving it to an audio file. Finally, it returns the updated image, conversation text, and audio file path.
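If you'd like to see how this function plugs into the user interface from the previous demo, here's a minimal sketch of the Gradio wiring. The component choices and labels are assumptions based on standard Gradio usage, not the exact code from this course:

import gradio as gr

# A minimal, assumed wiring of conversation_generation into Gradio.
# The microphone input records speech and passes its file path to the
# function; the three outputs match the function's return values.
demo = gr.Interface(
    fn=conversation_generation,
    inputs=gr.Audio(sources=["microphone"], type="filepath"),
    outputs=[
        gr.Image(label="Situation"),
        gr.Textbox(label="Conversation"),
        gr.Audio(label="Tutor response"),
    ],
)

demo.launch()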
Now, it's time to proceed to this lesson's conclusion.