Modern mobile apps deliver intelligent, personalized, and responsive experiences. For years, this intelligence was powered by large-scale cloud servers; increasingly, it can now run on the user’s hardware itself. This new paradigm, known as on-device AI, involves deploying and executing ML and generative AI models directly on a device like a smartphone or tablet, instead of relying on remote servers for inference. The choice between on-device and cloud-based AI is crucial for developers, as it impacts performance, privacy, and the overall user experience.
ML Kit, a mobile SDK, brings Google’s on-device machine learning expertise to Android apps. With the powerful yet easy-to-use Generative AI (GenAI), Vision, and Natural Language APIs, you can solve common challenges in your apps or create entirely new user experiences.
In this chapter, you’ll harness the power of ML Kit and create a sample app that will:
Scan documents and save them as images or PDFs.
Extract text from the saved documents and share it online.
Let’s get started with ML Kit!
ML Kit on Android
ML Kit is an easy-to-use SDK that brings Google’s extensive machine learning expertise to mobile developers, abstracting away the complexities of model management and inference. Its simple, high-level APIs require minimal expertise in data science or model training, and they cover Generative AI, Vision, and Natural Language capabilities for common use cases.
ML Kit APIs run on-device and are optimized for fast, real-time use cases where you want to process text, images, or live camera streams. Based on their underlying ML models, the APIs are categorized as follows:
GenAI APIs
Text Summarization: Summarize articles or chat conversations into a concise bulleted list.
Proofreading: Polish short content by refining grammar and correcting spelling errors.
Rewriting: Rephrase short messages in various tones or styles.
Translation: Dynamically translate text between more than 50 languages, even when the device is offline. (See the sketch after this list.)
Smart Reply: Automatically generate contextually relevant and concise replies to messages.
The ML Kit GenAI APIs are built upon AICore, an Android system service that facilitates the on-device execution of foundation models, ensuring these features process data locally and maintain a high level of privacy.
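For instance, the Translation API mentioned above takes only a few lines to use. Here’s a minimal sketch, assuming the com.google.mlkit:translate dependency is added and hard-coding an English-to-Spanish pair; the input string and log tags are illustrative:

import android.util.Log
import com.google.mlkit.nl.translate.TranslateLanguage
import com.google.mlkit.nl.translate.Translation
import com.google.mlkit.nl.translate.TranslatorOptions

fun translateGreeting() {
    // Configure an on-device translator for a fixed language pair.
    val options = TranslatorOptions.Builder()
        .setSourceLanguage(TranslateLanguage.ENGLISH)
        .setTargetLanguage(TranslateLanguage.SPANISH)
        .build()
    val translator = Translation.getClient(options)

    // Download the language model if needed, then translate fully offline.
    translator.downloadModelIfNeeded()
        .onSuccessTask { translator.translate("Hello, world!") }
        .addOnSuccessListener { translated -> Log.d("Translate", translated) }
        .addOnFailureListener { e -> Log.e("Translate", "Translation failed", e) }
}

In a real app, you’d scope the translator to a lifecycle and call close() on it when you’re done, since each downloaded language model consumes storage and memory.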
Creating a Document Scanner using ML Kit
You’ll learn how easily you can create a custom Document Scanner by relying on the Document Scanner module from ML Kit. The Document Scanner API provides your app with a complete scanning flow: a ready-made camera UI, automatic edge detection and cropping, and results delivered as images or PDFs.
Add the latest version of the Document Scanner at the bottom of the dependencies block. Then, click ‘Sync Project with Gradle Files’ to download and sync the dependency with your app:
var documentScannerVersion = "16.0.0-beta1"
implementation "com.google.android.gms:play-services-mlkit-document-scanner:$documentScannerVersion"
Preparing the Scanner
You’re now ready to use the dependency and its helper classes. Before you can start scanning, you need to configure the Document Scanner client. To do so, open MainViewModel.kt and update the prepareScanner() function as follows:
fun prepareScanner(): GmsDocumentScanner {
    val options = GmsDocumentScannerOptions.Builder()
        .setPageLimit(3)
        .setResultFormats(RESULT_FORMAT_JPEG)
        .setScannerMode(SCANNER_MODE_FULL)
        .build()
    return GmsDocumentScanning.getClient(options)
}
The options variable allows you to configure the Document Scanner client. These are the configurations you declared here:
The setPageLimit() function limits the number of pages to be scanned in a session.
setResultFormats() sets the output format of results. Using RESULT_FORMAT_JPEG as the parameter will keep your output as images; you could use RESULT_FORMAT_PDF if you need your documents converted to PDF.
Setting SCANNER_MODE_FULL in the setScannerMode() function enables all available features of the Document Scanner. If you want to restrict certain advanced capabilities, such as image filters, you can use SCANNER_MODE_BASE instead to disable the image editing capabilities.
Once the configuration options are prepared, the prepareScanner() function returns a GmsDocumentScanner instance, which you’ll be using in the next steps.
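For example, if you instead wanted up to five pages, a PDF alongside the page images, and none of the editing extras, the options might look like this sketch; the values are illustrative:

// Sketch: an alternative configuration that also requests PDF output
// and disables the image-editing extras via the base scanner mode.
val pdfOptions = GmsDocumentScannerOptions.Builder()
    .setPageLimit(5)
    .setResultFormats(RESULT_FORMAT_JPEG, RESULT_FORMAT_PDF)
    .setScannerMode(SCANNER_MODE_BASE)
    .build()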
Creating the Scanner Launcher
Next, open MainActivity and add the following code snippet above the onCreate() method:
val scannerLauncher = registerForActivityResult(
    ActivityResultContracts.StartIntentSenderForResult()
) { result ->
    if (result.resultCode == RESULT_OK) {
        val scanResult = GmsDocumentScanningResult.fromActivityResultIntent(result.data)
        viewModel.extractPages(scanResult = scanResult)
    }
}
This is the activity launcher for the Document Scanner. It starts the Document Scanner intent and waits for the result.
You’re parsing the scanned data through GmsDocumentScanningResult.fromActivityResultIntent(result.data) and assigning it to scanResult if the scan completed successfully. The next step is to iterate through all the pages of the scanResult object.
You’re not ready to launch the app just yet; you still need to implement the extractPages() function to extract data from each page. You’ll do that in the next section.
Handling the Result
Go back to MainViewModel.kt. You’ll see there’s a mutable state list named pageUris defined at the top; this list will contain the URI of each page from the scan result, which will be used later.
Kig ezmape bcu uvjtublZumuc() sefjjaiv il kerfepg:
fun extractPages(scanResult: GmsDocumentScanningResult?) {
    viewModelScope.launch(Dispatchers.IO) {
        scanResult?.pages?.let { pages ->
            pageUris.clear()
            for (page in pages) {
                pageUris.add(page.imageUri)
            }
        }
    }
}
It’s a simple function that operates as follows:
It takes scanResult as an input parameter and checks whether it contains one or more pages.
If new pages are detected, it clears the existing pageUris list before adding the new results.
It then iterates through the pages list. Since you set RESULT_FORMAT_JPEG as the output format earlier, each page will contain an imageUri. The function adds the imageUri of each page to the pageUris list.
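As an aside, if you had also requested RESULT_FORMAT_PDF, the same result object would carry a PDF part. A small sketch of reading it; the log tag is illustrative:

// Sketch: the pdf property is only non-null when RESULT_FORMAT_PDF
// was among the requested result formats.
fun logPdfResult(scanResult: GmsDocumentScanningResult?) {
    scanResult?.pdf?.let { pdf ->
        Log.d("Scanner", "Scanned PDF with ${pdf.pageCount} page(s) at ${pdf.uri}")
    }
}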
Scanning Documents
You’re all set to launch the Document Scanner at this point and use the resultant data. You need to update the launchDocumentScanner() function in MainActivity.kt as shown below:
private fun launchDocumentScanner() {
    viewModel
        .prepareScanner()
        .getStartScanIntent(this@MainActivity)
        .addOnSuccessListener { intentSender ->
            val scannerIntent = IntentSenderRequest.Builder(intentSender).build()
            scannerLauncher.launch(scannerIntent)
        }
}
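Because getStartScanIntent() returns a Task, it can also fail, for example on a device where the Play services scanner module isn’t installed yet. Here’s a sketch of the same function with an error path added; the Toast message is illustrative:

private fun launchDocumentScanner() {
    viewModel
        .prepareScanner()
        .getStartScanIntent(this@MainActivity)
        .addOnSuccessListener { intentSender ->
            scannerLauncher.launch(IntentSenderRequest.Builder(intentSender).build())
        }
        .addOnFailureListener { e ->
            // Surface the failure instead of doing nothing.
            Toast.makeText(this, "Scanner unavailable: ${e.message}", Toast.LENGTH_SHORT).show()
        }
}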
Note: To compile the app properly, make sure you’ve added all the required imports after your code changes in both MainActivity.kt and MainViewModel.kt files.
Finally, build and run the app. Tap the Scan icon at the bottom of the screen and your app will invoke a fully functional Document Scanner!
Scanning Documents
Extracting Text using Text Recognizer
ML Kit made it easy to turn your app into a Document Scanner, but what if you also want to extract text (OCR) from your scanned documents? This part of the chapter will teach you how to do exactly that!
The Text Recognition API can recognize text in various character sets, in real time, on a wide range of devices. Key capabilities of this API include:
Recognizing text in Chinese, Devanagari, Japanese, Korean, and Latin scripts.
Add the latest version of the Text Recognition dependency to your app:
var textRecognitionVersion = "16.0.1"
implementation "com.google.mlkit:text-recognition:$textRecognitionVersion"
Then, sync the dependency with your app.
Recognizing Text from Image
Remember saving your scanned pages as images? That’ll come in handy now. The MainViewModel keeps the reference of those image URIs in the pageUris variable. This list is used to display a carousel of your scanned pages in MainActivity.kt, which looks like this:
Image Carousel
The “Extract Text” button is intended to start the Text Recognition process on each page and share the extracted text once the operation is complete.
To implement this, open MainViewModel.kt and update the getTextFromImage() function as follows:
fun getTextFromImage(image: Uri, onCompleted: (String?) -> Unit) {
    viewModelScope.launch(Dispatchers.IO) {
        val inputImage = InputImage.fromFilePath(application, image)
        TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)
            .process(inputImage)
            .addOnSuccessListener { visionText ->
                val resultText = visionText.text
                onCompleted(resultText)
            }
            .addOnFailureListener { e ->
                onCompleted(null)
            }
    }
}
This function performs all the heavy lifting of Text Recognition. Here’s how it works:
It takes the image Uri from the input parameters and converts it into an InputImage from the file path.
Next, the TextRecognition client processes the InputImage. It attempts to detect and read the formations of Blocks and Elements:
A Block is a contiguous set of text lines, such as a paragraph or column. Each Line is a contiguous set of words on the same axis, and each Element is a contiguous set of alphanumeric characters (a word, in most Latin-script languages) on the same axis. Elements can be broken down further into individual Symbols, which is useful when you need character-level detail.
Text Structure
If text recognition is successful, visionText.text returns the recognized text block as a single String, separated by new lines, through the onCompleted() callback. You can use visionText.textBlocks instead, if you prefer a two-dimensional collection of strings to iterate through each line.
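If you go the visionText.textBlocks route, here’s a short sketch of walking the hierarchy to collect one string per recognized line; the helper name is hypothetical:

import com.google.mlkit.vision.text.Text

// Sketch: flatten the Block -> Line hierarchy into per-line strings,
// instead of using the single concatenated visionText.text value.
fun linesFrom(visionText: Text): List<String> =
    visionText.textBlocks.flatMap { block ->
        block.lines.map { line -> line.text }
    }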
If the TextRecognition client fails to detect any text from the given image, it returns null in the onCompleted() callback.
At this point, viewModel.getTextFromImage(uri) will deliver the extracted text from the image through its onCompleted callback. You might want to use or share this text within your app. To do that, open MainActivity.kt and add the following function:
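What follows is a minimal sketch of such a function, assuming the extracted text is handed to a standard ACTION_SEND chooser; your actual implementation may differ:

private fun shareTextFromImage(uri: Uri) {
    viewModel.getTextFromImage(uri) { extractedText ->
        // The callback arrives from a background coroutine, so hop back
        // to the main thread before touching the UI.
        runOnUiThread {
            if (extractedText.isNullOrBlank()) {
                Toast.makeText(this, "No text found", Toast.LENGTH_SHORT).show()
            } else {
                val sendIntent = Intent(Intent.ACTION_SEND).apply {
                    type = "text/plain"
                    putExtra(Intent.EXTRA_TEXT, extractedText)
                }
                startActivity(Intent.createChooser(sendIntent, "Share extracted text"))
            }
        }
    }
}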
With this change, shareTextFromImage() will be invoked for each PageItem, using the URI from that page to extract text from the image and share it as needed.
Congratulations! You’ve now turned your app into a full-fledged Document Scanner and OCR tool using ML Kit. Feel free to explore other APIs from ML Kit to realize your next innovative idea!
On-device AI is handy for offline processing, but there are some trade-offs you need to consider when using it. In the next section, you’ll learn about those.
The Trade-offs of On-device AI
On-device AI is optimized for scenarios where data processing must be immediate, private, and available without a network connection, but it comes with some strategic trade-offs.
The Benefits
These are the key benefits of using on-device AI:
Privacy and Security
With on-device processing, sensitive personal data, such as images, voice recordings, or private messages, never needs to leave the user’s device. This significantly reduces the risk of data breaches or model theft, and it simplifies compliance with stringent data protection regulations like GDPR.
Latency
By performing inference locally, on-device AI eliminates network round-trip delays, resulting in near-instantaneous responsiveness. This is essential for real-time applications such as augmented reality (AR) filters, live camera analysis, and voice assistants that must respond without perceptible lag.
Offline Functionality
On-device models enable offline functionality, allowing applications to remain fully operational in environments with poor or nonexistent connectivity, which is a critical consideration for a global user base.
Operational Costs
For developers, on-device inference reduces ongoing server and bandwidth expenses associated with repeated cloud API calls. Running tasks locally is also more energy-efficient, consuming up to 90% less energy than cloud-based inference.
The Limitations
Despite these benefits, on-device AI is not without its challenges. The primary limitations are:
Computational Constraints
Even though modern mobile device hardware is becoming increasingly powerful, it cannot match the scale of a cloud data center. This limits the size and complexity of models that can run efficiently on a device.
Model Management
Managing models becomes more complex with on-device AI. While a cloud model can be updated instantly for all users, on-device models must be packaged with the application and distributed through app updates, making the process more time-consuming and logistically challenging.
Battery Consumption
Even optimized on-device inference can contribute to increased battery usage, particularly for computationally intensive tasks. Developers should focus on optimizing background tasks, limiting unnecessary requests, and using power-efficient APIs to minimize battery drain.
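One practical pattern is to gate deferrable inference behind charging and idle constraints, so heavy work never competes with interactive use. Here’s a sketch using WorkManager, where ModelInferenceWorker is a hypothetical Worker that runs your batched on-device inference:

import android.content.Context
import androidx.work.Constraints
import androidx.work.OneTimeWorkRequestBuilder
import androidx.work.WorkManager
import androidx.work.Worker
import androidx.work.WorkerParameters

// Hypothetical worker that performs batched on-device inference.
class ModelInferenceWorker(context: Context, params: WorkerParameters) :
    Worker(context, params) {
    override fun doWork(): Result {
        // Run the deferred inference here.
        return Result.success()
    }
}

fun scheduleBatchInference(context: Context) {
    // Only run when the device is charging and idle.
    val constraints = Constraints.Builder()
        .setRequiresCharging(true)
        .setRequiresDeviceIdle(true)
        .build()

    val request = OneTimeWorkRequestBuilder<ModelInferenceWorker>()
        .setConstraints(constraints)
        .build()

    WorkManager.getInstance(context).enqueue(request)
}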
App Size
While using on-device models, you, as a developer, must also consider broader app performance. Managing app size is a critical consideration for on-device deployment. Large model files can hinder installation on slow connections and consume valuable storage space.
Best practices include using Android App Bundles, which dynamically deliver only the necessary code and resources to a user’s device, and leveraging tools like the Android Size Analyzer to identify areas for size reduction.
Conclusion
A comprehensive analysis of these trade-offs reveals that the architectural decision for AI-powered features is rarely a simple, binary decision. The most robust solutions are often hybrid models that combine the strengths of both approaches. A common design pattern involves using on-device AI for basic data preprocessing and low-latency tasks, such as initial object detection in a live camera feed, while reserving more complex, high-volume analysis for cloud-based services. This enables a fluid user experience while leveraging cloud power when necessary.
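Here’s a sketch of that hybrid pattern, using ML Kit’s on-device object detector as the cheap first pass; uploadForCloudAnalysis() stands in for a hypothetical backend call:

import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.objects.DetectedObject
import com.google.mlkit.vision.objects.ObjectDetection
import com.google.mlkit.vision.objects.defaults.ObjectDetectorOptions

// Cheap on-device detection gates the expensive cloud round-trip.
val detector = ObjectDetection.getClient(
    ObjectDetectorOptions.Builder()
        .setDetectorMode(ObjectDetectorOptions.STREAM_MODE)
        .build()
)

fun analyzeFrame(frame: InputImage) {
    detector.process(frame)
        .addOnSuccessListener { detectedObjects ->
            // Only pay for the network call when something is on screen.
            if (detectedObjects.isNotEmpty()) {
                uploadForCloudAnalysis(frame, detectedObjects)
            }
        }
}

// Hypothetical: ship the interesting frame to a cloud service for
// the heavier, high-volume analysis.
fun uploadForCloudAnalysis(frame: InputImage, objects: List<DetectedObject>) {
    // Network call omitted in this sketch.
}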
A closer look at the on-device AI landscape shows a strong emphasis on privacy and security. This makes it not merely a technical advantage but a core strategic necessity. With growing public awareness and increasing regulatory pressure, a product’s ability to keep user data private is becoming a key market differentiator.
The design of ML Kit, with its “no training needed” philosophy and simple APIs, represents a deliberate strategy to democratize AI development. By providing turn-key solutions, Google is lowering the barrier to entry, allowing developers to integrate sophisticated AI-powered features without the need for specialized data science expertise or the infrastructure required for training custom models. This shifts the focus from the complexity of model development to the creative application of pre-trained intelligence.
For developers, this means the future of AI on Android is becoming both more powerful and more accessible. High-level abstractions like ML Kit will continue to democratize intelligence, while the underlying system services handle model management, conversion, and remote execution across Android’s broader ecosystem. The new era of mobile computing is not just about smarter applications, but about a fundamentally smarter operating system that provides a reliable and unified platform for the next generation of intelligent, private, and context-aware user experiences.