Now, you want to combine speech recognition and synthesis to create a simple language tutor app. This app will process recorded speech, check whether the grammar is correct, and provide feedback using synthesized speech.
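The snippets below assume an OpenAI client object named `client` is already configured, as in the earlier demos. If you're starting fresh, a minimal setup sketch (assuming the `openai` Python package is installed and your API key is in the `OPENAI_API_KEY` environment variable) looks like this:

```python
# Minimal client setup sketch: assumes the `openai` package is installed
# and the OPENAI_API_KEY environment variable holds your API key.
from openai import OpenAI

# The client reads OPENAI_API_KEY from the environment by default.
client = OpenAI()
```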
Define a function to transcribe the recorded speech using the Whisper model:
# Define a function to transcribe the recorded speech
def transcript_speech(speech_filename="my_speech.m4a"):
    # Open the audio file and transcribe using the Whisper model
    with open(speech_filename, "rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            response_format="json",
            language="en"
        )
    # Return the transcribed text
    return transcription.text
Next, define a function to check the grammar of the transcribed text using OpenAI’s GPT model:
# Check the grammar of the transcribed text
def check_grammar(english_text):
    # Use GPT to check and correct the grammar of the input text
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are an English grammar expert."},
            {"role": "user", "content": f"Fix the grammar: {english_text}"}
        ]
    )
    # Extract and return the corrected grammar message
    message = response.choices[0].message.content
    return message
After that, define a function to generate spoken feedback using the text-to-speech capability:
# Provide spoken feedback using TTS
def tell_feedback(grammar_feedback, speech_file_path="feedback_speech.mp3"):
    # Generate speech from the grammar feedback using TTS
    response = client.audio.speech.create(
        model="tts-1",
        voice="alloy",
        input=grammar_feedback
    )
    # Save the synthesized speech to the specified path
    response.stream_to_file(speech_file_path)
    # Play the synthesized speech
    play_speech(speech_file_path)
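The `play_speech` helper comes from the earlier demo in this lesson. If you don't have it handy, here's one possible sketch that shells out to a platform audio player. The player names (`afplay`, `ffplay`) are assumptions about what's installed, not part of the course code:

```python
import subprocess
import sys

def play_speech(speech_file_path):
    """Play an audio file with a platform-specific player (sketch)."""
    if sys.platform == "darwin":
        # macOS ships with the afplay command-line player
        subprocess.run(["afplay", speech_file_path], check=True)
    elif sys.platform.startswith("linux"):
        # ffplay (from ffmpeg) handles MP3; assumes ffmpeg is installed
        subprocess.run(
            ["ffplay", "-nodisp", "-autoexit", speech_file_path], check=True
        )
    elif sys.platform == "win32":
        import os
        os.startfile(speech_file_path)  # Opens with the default audio app
```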
Finally, put everything together in a function that handles the entire process from recording audio to providing spoken feedback:
# Implement the grammar feedback application
def grammar_feedback_app(speech_filename):
    # Transcribe the recorded speech
    transcription = transcript_speech(speech_filename)
    print(transcription)
    # Check and correct the grammar of the transcription
    feedback = check_grammar(transcription)
    print(feedback)
    # Provide spoken feedback using TTS
    tell_feedback(feedback)
In this function, you:
Transcribe the Recorded Speech: The transcript_speech function is called with speech_filename to transcribe the speech from the audio file.
Check and Correct the Grammar: The transcribed text is passed to the check_grammar function to check and correct its grammar.
Provide Spoken Feedback: The corrected text is then passed to the tell_feedback function to create and play a spoken version of the feedback using text-to-speech.
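To see the data flow of those three steps without making API calls, you can dry-run the same pipeline with stub functions standing in for the OpenAI-backed helpers. All names here are illustrative, not part of the app:

```python
# Stand-ins for transcript_speech and check_grammar, so the pipeline
# shape can be exercised without an API key or an audio file.
def fake_transcribe(speech_filename):
    return "She go to school every day."

def fake_check_grammar(text):
    # A real call would send the text to GPT; here we hard-code the fix.
    return text.replace("She go ", "She goes ")

def dry_run_app(speech_filename):
    transcription = fake_transcribe(speech_filename)  # step 1: transcribe
    feedback = fake_check_grammar(transcription)      # step 2: correct grammar
    return transcription, feedback                    # step 3 would speak it

transcription, feedback = dry_run_app("my_speech.m4a")
print(feedback)  # She goes to school every day.
```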
To test the grammar feedback app, you have to give the grammatically incorrect audio file to the app. To create an audio file for speech input, you can use the Voice Recorder app on Windows, VoiceMemo on macOS, or a general recording app on Linux. You can refer back to the instructions section on how to do that if you need help.
Once recorded, place the audio file in the audio folder and update the wrong_grammar_audio variable accordingly. Alternatively, you can use a provided audio sample containing a grammatically incorrect sentence ("My father don't like to eat at night") for testing purposes.
# Set the audio file. Use the audio sample or record the
# audio yourself and place the file here.
wrong_grammar_audio = "audio/grammar-wrong.mp3"
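If the path doesn't match where you saved your recording, the app will fail when opening the file, so a quick existence check can save a confusing stack trace. A small sketch:

```python
from pathlib import Path

wrong_grammar_audio = "audio/grammar-wrong.mp3"

# Warn early if the file isn't where we expect it.
if not Path(wrong_grammar_audio).exists():
    print(f"Audio file not found: {wrong_grammar_audio}; record or download it first.")
```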
You can play it first to confirm that this audio file has a grammatically incorrect sentence.
# Play the grammatically wrong audio file
play_speech(wrong_grammar_audio)
Run the application and get the grammar feedback:
# Run the grammar feedback application
grammar_feedback_app(wrong_grammar_audio)
You’ve now seen how to use Whisper for speech recognition and synthesis in an app. Move on to the next section for this lesson’s conclusion.
This content was released on Nov 14 2024. The official support period is six months from this date.
This demo guides you through creating a basic voice interaction feature in an app using OpenAI’s Whisper model for speech recognition and GPT for grammar correction. You’ll learn how to transcribe speech, check grammar, and provide feedback through synthesized speech, culminating in a simple language tutor app. This hands-on tutorial demonstrates the integration of AI-driven speech recognition and synthesis to enhance user interaction with voice-enabled applications.