Have you checked out the Gemini Live API? It’s a total game-changer for building real-time, interactive experiences on Android. Forget managing a whole backend just to stream audio or video to an LLM: Gemini Live makes it effortless.
Imagine building an app where the user can talk to a chatbot and get instant responses, just like a real conversation. That’s what the Live API enables.
What Makes an App ‘Interactive’?
When we talk about an interactive app in this context, especially with the Gemini Live API, we’re talking about an application that doesn’t just listen and reply — it actually acts on the user’s instructions. It goes beyond a simple question-and-answer chatbot.
Think of it this way:
Standard Chatbot App: You say, “What’s the weather like?” The model figures out the answer and replies. That’s a back-and-forth conversation.
Interactive App (with Function Calling): You say, “Please add coffee to my shopping list.”
The model doesn’t just say, “Okay, I’ve added coffee.”
It recognizes that “add to shopping list” is an action this app can perform.
It issues a function call that triggers the addListItem function in the actual Android code.
The app’s internal state (the shopping list) actually changes.
Then, the model gets confirmation and tells you: “Done. I’ve added coffee to your shopping list.”
The key is that the user’s voice prompt is translated directly into app-logic execution. The app is no longer just a passive interface; it’s an agent that can manipulate its own data and features based on a natural language command. It creates a seamless, hands-free experience where the AI is integrated directly into the core functionality of the app — that’s what makes it truly ‘interactive’ in the most powerful sense.
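To make that concrete, here’s a hypothetical sketch of the shopping-list example above; the class-level list and the addListItem function are illustrative only, not part of this chapter’s sample project.

// Hypothetical sketch of the shopping-list example above.
// The model never edits the list itself; it emits a function call
// that the app maps to real code.
val shoppingList = mutableListOf<String>()

fun addListItem(item: String) {
    shoppingList.add(item) // The app's internal state actually changes.
}

// Conceptually: the model emits addListItem(item = "coffee"),
// the app runs addListItem("coffee") and reports success,
// and the model replies: "Done. I've added coffee to your shopping list."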
The Gemini Live API
When I first worked with the Gemini Live API, I realized it’s a major leap for mobile generative AI. Instead of the old request–response model, it now supports real-time, two-way streaming. That means the client and model can send and receive data simultaneously — creating a live conversation rather than a sequence of turns.
It provides a low-latency stream for both the audio you send to the model (your speech or requests) and the audio/text the model sends back (its response). Since you’ve used Firebase AI Logic in earlier chapters, you can access this directly from your Android app, with no need for a custom server. It’s essentially a bidirectional, real-time audio channel connecting straight to a Gemini model. You also get:
Voice Activity Detection (VAD) to detect pauses in speech
Concurrent streaming in both directions
Note: Firebase AI Logic support for the Gemini Live API is currently in preview, meaning that behavior may change in future releases.
Hands-On Gemini Live
Let’s extend the Firebase AI Logic app from the previous chapter with Gemini Live bidirectional streaming.
Download the starter project, and open it in Android Studio.
Project Setup and Dependencies
First things first, ensure you’re targeting Android API level 23+ and the app is connected to Firebase.
Open the app-level build.gradle file, and add the Firebase AI Logic dependency at the end of the dependencies block.
// Firebase AI Logic: Gemini Live Dependency
var firebaseAiLogicVersion = "17.6.0"
implementation "com.google.firebase:firebase-ai:$firebaseAiLogicVersion"
Since you’ll be interacting with Gemini Live through audio, add this permission in AndroidManifest.xml:
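<uses-permission android:name="android.permission.RECORD_AUDIO" />

RECORD_AUDIO is the same permission the session functions later in this chapter require through their @RequiresPermission annotations.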
Build and run the app, and select an item from the list. On the item-detail screen, you’ll see this status at the bottom of the screen:
"UNKNOWN: Gemini Live Not Initialized"
Before Initialization
Model Initialization and Configuration
The first step in using Gemini Live is initializing the backend service and creating a LiveGenerativeModel instance. The Live API configuration is handled through the liveGenerationConfig object, which determines the model’s behavior and the nature of the streaming output.
To handle initialization cleanly, the best practice is to create a manager/handler class. The starter project already has this for your convenience.
Open the LiveModelManager file in the starter project. This class handles interacting with the Gemini Live model and exposes UI states. The following variables are declared inside the class:
// The core Gemini Live model instance.
private lateinit var liveModel: LiveGenerativeModel
// Mutable state flow holding the current state of the live session.
private val _liveSessionState = MutableStateFlow<LiveSessionState>(LiveSessionState.Unknown())
val liveSessionState = _liveSessionState.asStateFlow()
Now, add the initializeGeminiLive() function.
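As a minimal sketch, the function might look like the code below. It assumes the Firebase AI Logic Kotlin API (Firebase.ai(), liveModel(), liveGenerationConfig); the model name, the AUDIO response modality, and the error message are assumptions, so match them to the starter project.

// Sketch: create the Live model through Firebase AI Logic and mark it ready.
// The model name and config values are assumptions for illustration.
fun initializeGeminiLive() {
    try {
        val config = liveGenerationConfig {
            // Ask the model to reply with spoken audio.
            responseModality = ResponseModality.AUDIO
        }
        liveModel = Firebase.ai(backend = GenerativeBackend.googleAI())
            .liveModel(
                modelName = "gemini-2.0-flash-live-preview-04-09", // assumed model name
                generationConfig = config
            )
        _liveSessionState.value = LiveSessionState.Ready()
    } catch (e: Exception) {
        _liveSessionState.value =
            LiveSessionState.Error(message = e.localizedMessage ?: "Initialization failed")
    }
}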
The LiveSessionState is a sealed interface, containing data classes that describe the current state of the Gemini Live session. The LiveSessionState is declared in the same package and defined as below:
sealed interface LiveSessionState {
    data class Unknown(val message: String = "UNKNOWN: Gemini Live Not Initialized") : LiveSessionState
    data class Ready(val message: String = "READY: Ask Gemini Live") : LiveSessionState
    data class Running(val message: String = "RUNNING: Gemini Live Speaking...") : LiveSessionState
    data class Error(val message: String = "ERROR: Failed to initiate Gemini Live") : LiveSessionState
}
This is used to reflect updates of the session on the screen.
The UI interactions are handled by the ViewModel class, which holds a LiveModelManager instance.
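A minimal sketch of that wiring could look like this; the ViewModel’s actual name and construction in the starter project may differ.

// Sketch: a ViewModel owning the LiveModelManager and re-exposing its state.
// The class name and constructor are assumptions.
class CatsViewModel(
    private val liveModelManager: LiveModelManager = LiveModelManager()
) : ViewModel() {

    // The UI collects this to show the READY / RUNNING / ERROR messages.
    val liveSessionState: StateFlow<LiveSessionState> = liveModelManager.liveSessionState
}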
The Live API is bidirectional, and you’ll use it for:
Sending voice commands.
Generating speech from text.
Transcribing audio to text.
Eventually, you will also be able to send images and a live video stream to the model. In the sample app, you’ll implement text -> speech in this flow:
List of Cat Breeds > Select a breed > Gemini Live switched on
To do so, add the startSessionFromText() function in the LiveModelManager:
@RequiresPermission(Manifest.permission.RECORD_AUDIO)
fun startSessionFromText(catBreed: String) {
    val text = "Tell me about $catBreed cats in maximum 80 words."
    coroutineScope.launch(Dispatchers.IO) {
        try {
            // Start the conversation
            session = liveModel.connect()
            session?.send(text)
            session?.startAudioConversation()
            // Update State
            _liveSessionState.value = LiveSessionState.Running()
        } catch (e: Exception) {
            _liveSessionState.value =
                LiveSessionState.Error(message = e.localizedMessage ?: "Unknown error")
        }
    }
}
This function accepts a single string argument, the catBreed you have selected, and then executes as described below:
val text = ...: It constructs the text prompt using the input catBreed. For example, if the selected breed is passed to the function, the text will be “Tell me about <breed> cats in maximum 80 words.”
session = liveModel.connect(): It establishes a persistent WebSocket connection to the Gemini model to start a new session. This LiveSession object allows for real-time, low-latency streaming of input and output.
session?.startAudioConversation(): This is a key step that begins the audio part of the conversation. Following connection establishment, the live audio interaction is initiated by calling it. This command signals to the model that the client is ready to begin streaming microphone data.
_liveSessionState.value = LiveSessionState.Running(): This updates the live session’s state to “Running” as soon as the session successfully starts. This value is exposed as a StateFlow (backed by MutableStateFlow) so that the UI layer can observe it and react in real time.
You learned how to start a session, but you also need to know how to stop the session. The session should be explicitly closed when the microphone is deactivated or when the user navigates away from the screen. Even when you start a new session, the right approach is to stop any ongoing session before starting a new one.
Stopping a session is simple. You can do that by adding a stopSession() function to LiveModelManager.
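A minimal sketch of stopSession() is shown below; the exact cleanup calls (stopAudioConversation(), close()) are assumptions based on the LiveSession API, so mirror what the starter project does.

// Sketch: end the audio conversation, release the session, and reset the state.
fun stopSession() {
    coroutineScope.launch(Dispatchers.IO) {
        try {
            session?.stopAudioConversation()
            session?.close()
        } catch (e: Exception) {
            Log.e(TAG, "Error stopping Gemini Live session", e)
        } finally {
            session = null
            _liveSessionState.value = LiveSessionState.Ready()
        }
    }
}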
In the Ready state, the app will start a new session.
Once the session is Running, it will stop the session if you tap the same button.
As the ViewModel is responsible for handling user interactions, update the askAbout() function in the ViewModel to implement the session start/stop toggle:
@RequiresPermission(Manifest.permission.RECORD_AUDIO)
fun askAbout(catBreed: String) {
    when (val state = liveSessionState.value) {
        is LiveSessionState.Ready -> {
            liveModelManager.startSessionFromText(catBreed)
        }
        is LiveSessionState.Running -> {
            liveModelManager.stopSession()
        }
        else -> {
            Log.d(TAG, "Live session state: $state")
        }
    }
}
The above function starts or stops a live audio session to ask Gemini Live about a specific cat breed. The catBreed parameter is the name of the cat breed to ask about. This is used in the initial prompt when starting a new session.
If the state is Ready, it starts a new live session.
If the state is Running, it stops the currently active session.
In other states (like Error or Unknown), it logs the current state.
Ready to try this out? Build and run the app now. Navigate to the detail screen by selecting a cat breed, and then tap the button at the bottom of the screen.
You’ll see the state change to RUNNING, and Gemini Live will start talking about the cat breed you selected!
Model Talking
Function Calling: Making Gemini Your App’s Agent
Now you know how to turn your app into a voice assistant using the Gemini Live API. The next big step is Function Calling - the superpower that lets the model actually interact with the logic and functionality of an Android app. It’s what makes the voice assistant an agent for your app.
Function calling allows the model to determine when an action is necessary to satisfy a user request. The implementation follows a standard multi-step process.
Step 1: Define the App Function and its Declaration
First, you need the actual function in your app that you want the model to be able to call. In the sample app, you may want the user to ask for pictures of a specific cat breed - which means opening a Google Image search.
fun showPicture(catBreed: String) {
    coroutineScope.launch(Dispatchers.Default) {
        val query = Uri.encode("$catBreed cat pictures")
        val url = "https://www.google.com/search?q=$query&tbm=isch"
        val intent = Intent(Intent.ACTION_VIEW)
        intent.data = Uri.parse(url)
        intent.addFlags(Intent.FLAG_ACTIVITY_NEW_TASK)
        try {
            context.startActivity(intent)
        } catch (e: Exception) {
            Log.e(TAG, "Error opening Google Images", e)
        }
    }
}
Dafp, tcooha o FaccfaetRiznaxafaap no jajfceko ysub kuphraus cu rni Fohero widak. Qpay ax miho bjoiless uw UFI bikahohha bew wto jeruq. Od kiirz e sibu, u ngoic-Otnyujw mahnredweeh (wrofx et ydasoen jet gpu xeqew wi ilkokgpevz thuy vo ege ab), urq yve vomeigiw yewolocahd.
// The FunctionDeclaration for the model
val showPictureFunctionDeclaration = FunctionDeclaration(
    name = "showPicture",
    description = "Function to show picture of cat breed",
    parameters = mapOf(
        "catBreed" to Schema.string(
            description = "A short string describing the cat breed to show picture"
        )
    )
)
Step 2: Pass the Tool to the LiveModel
The Gemini model needs to know what tools (functions) it has available before the conversation even starts. You need to package the FunctionDeclaration into a Tool object and pass it to the liveModel initialization.
So, define functionHandlerTool below showPictureFunctionDeclaration as follows:
// Packaging the declaration into a Tool
val functionHandlerTool = Tool.functionDeclarations(listOf(showPictureFunctionDeclaration))
Now, the model knows that if a user asks something like “Can you show me pictures of this cat breed?”, it has a tool named showPicture that can handle that request.
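To register the tool, pass it when creating the Live model. As a sketch, re-using the assumed initializeGeminiLive() setup from earlier; the tools parameter name should match your firebase-ai version:

// Sketch: hand the tool to the model at creation time.
liveModel = Firebase.ai(backend = GenerativeBackend.googleAI())
    .liveModel(
        modelName = "gemini-2.0-flash-live-preview-04-09", // assumed model name
        generationConfig = config,
        tools = listOf(functionHandlerTool)
    )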
Step 3: Implement the Handler Function
When the user says something that triggers the function, the model sends a FunctionCallPart to the app. You need a special function, a handler, to intercept this call, execute the app logic, and send the result back to the model.
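Here’s a minimal sketch of such a handler for the showPicture tool. It assumes the firebase-ai Kotlin types, where the handler receives a FunctionCallPart and returns a FunctionResponsePart, and it uses kotlinx.serialization.json for the argument and response payloads; the exact response fields are assumptions.

// Sketch: route the model's function call to real app code and report back.
// Requires kotlinx.serialization.json (buildJsonObject, jsonPrimitive, put).
fun functionCallHandler(functionCall: FunctionCallPart): FunctionResponsePart {
    return when (functionCall.name) {
        "showPicture" -> {
            // Read the argument the model extracted from the user's request.
            val catBreed = functionCall.args["catBreed"]?.jsonPrimitive?.content.orEmpty()
            showPicture(catBreed)
            FunctionResponsePart(
                functionCall.name,
                buildJsonObject { put("result", "Showing pictures of $catBreed") } // assumed payload
            )
        }
        else -> FunctionResponsePart(
            functionCall.name,
            buildJsonObject { put("error", "Unknown function: ${functionCall.name}") }
        )
    }
}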
Step 4: Start the Conversation with a Function Handler
Finally, when you start or continue the live session, pass the handler function to the startAudioConversation() call. This tells the LiveSession which function to invoke when the model decides to use a tool.
Oypevo zre tqowqHarxoonXyahHewg() xilhxoaq an rifyesf:
// Start the conversation
session = liveModel.connect()
session?.send(text)
session?.startAudioConversation(::functionCallHandler) // Pass the handler here!
To wrap this up, what you’ve done with the Gemini Live API and Function Calling isn’t just an evolutionary step; it’s a massive leap forward in how we build mobile AI experiences.
The chapter started with the core idea: using the Gemini Live API to achieve low-latency, real-time voice streaming without needing a complex backend. This alone moves us beyond the clunky “wait-and-reply” experience of older chatbots. Talking to a “live” model makes genuinely real-time interaction possible!
But Function Calling is where the true power of an interactive app shines. You’ve turned Gemini into a genuine agent for your app just by adding a few things:
Defining a function.
Declaring it as a tool to be used with the model.
Implementing a simple function call handler.
Now, when a user says, “Show me pictures of a Maine Coon cat,” it’s not just a conversation; it’s a powerful command that triggers native Android code, opening the search Intent directly!
This ability to merge natural language with your app’s specific logic is what truly unlocks next-generation, hands-free voice control on Android. It’s time to stop building apps that just talk - and start building apps that act.