ML Kit was introduced to bridge the worlds of Android and machine learning. In the previous chapter, you worked with on-device ML using ML Kit – you built your custom Document Scanner, recognized text within images, and shared the results effortlessly with a few lines of code! It feels powerful, right? ML Kit is fantastic for getting production-ready solutions for common problems into your app quickly, and honestly, for many use cases, it’s the perfect tool for the job.
But sometimes, you need more than what ML Kit currently offers.
Maybe you want to build a cool LLM-based chat that works offline. Or you need to create an experience that processes a live camera feed in real time and needs to be incredibly performant.
That’s the moment you graduate from ML Kit to MediaPipe.
MediaPipe: A Complete Toolset for Custom Machine Learning Solutions
ML Kit gives you a set of specialized tools, whereas MediaPipe gives you the entire workshop! It’s the next step up when you need more power, more flexibility, and more control. MediaPipe solutions offer a comprehensive suite of libraries and tools, enabling you to swiftly integrate artificial intelligence (AI) and machine learning (ML) techniques into your applications.
MediaPipe provides two main resources to empower your intelligent apps:
MediaPipe Tasks: Cross-platform APIs and libraries that make it easy to deploy and integrate ML solutions into your applications.
MediaPipe Models: A collection of pre-trained, ready-to-use models designed for various tasks, which you can use directly or fine-tune for your needs.
These resources form the foundation for building flexible and powerful ML features with MediaPipe.
The tools below enable you to use these Tasks and Models for your custom ML solutions:
MediaPipe Model Maker: This is your entry point into the world of custom models. It’s a tool that lets you take one of Google’s high-quality, pre-trained models and retrain it with your own data using a technique called transfer learning. You don’t need to be an ML expert; you just need a good dataset.
The output of Model Maker is a TensorFlow Lite .tflite file, which you’ll need to convert into a MediaPipe-specific .task file. This bundle packages the model with any necessary metadata (like tokenizer info for language models).
You’ll integrate this custom .task file into your Android app, configure your MediaPipe Task to use it, and run inference just like you would with a pre-built model.
Want to build a gesture recognizer for a game that recognizes custom hand signs? Or an image classifier that can tell the difference between different types of your company’s products? Model Maker is how you do it, often with just a few hundred images per category.
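To make that concrete, here’s what loading a custom bundle into a MediaPipe Task looks like on Android. This is a minimal sketch based on the MediaPipe Tasks vision API; the asset name custom_gestures.task is a hypothetical stand-in for your own Model Maker output:

import com.google.mediapipe.tasks.core.BaseOptions
import com.google.mediapipe.tasks.vision.core.RunningMode
import com.google.mediapipe.tasks.vision.gesturerecognizer.GestureRecognizer

// Point the task at your retrained bundle instead of Google's stock model.
val options = GestureRecognizer.GestureRecognizerOptions.builder()
  .setBaseOptions(
    BaseOptions.builder()
      .setModelAssetPath("custom_gestures.task") // hypothetical asset name
      .build()
  )
  .setRunningMode(RunningMode.IMAGE) // classify single images
  .build()
val recognizer = GestureRecognizer.createFromOptions(context, options)

Swapping models is just a matter of changing the asset path; the task’s API surface stays the same.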
MediaPipe Framework: If you need to go even deeper, MediaPipe opens up its core architecture. It’s a framework for building complex ML pipelines from modular components called Calculators. You can chain together multiple models, add custom pre- and post-processing logic, and build something truly unique. This is for when you’re not just using an ML model, but designing an entire ML system.
Let’s break down why you’d switch to MediaPipe instead of using ML Kit.
When “Good Enough” Isn’t Custom Enough
ML Kit is excellent for common tasks because it uses models trained on general data. But what if your app needs something more specific, more granular? This is MediaPipe’s killer feature: customization.
ML Kit lets you use a custom TensorFlow Lite model, but MediaPipe is designed from the ground up to make training, customizing, and deploying these models a core part of the workflow.
When Every Millisecond Counts: Real-Time Performance
ML Kit’s on-device models are optimized for mobile, but MediaPipe is in a league of its own when it comes to processing live and streaming media. Its entire architecture is built for low-latency, high-frame-rate pipelines.
MediaPipe achieves this through end-to-end hardware acceleration, making efficient use of the device’s GPU to handle the heavy lifting of both ML inference and video processing. When you’re processing a continuous video stream, this level of performance is the difference between a choppy, delayed experience and a fluid, interactive one.
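To see what that looks like in code, here’s a sketch of a vision task configured for a live camera feed with GPU acceleration, based on the MediaPipe Tasks API. The model file name is a placeholder, and context is any Android Context:

import com.google.mediapipe.tasks.core.BaseOptions
import com.google.mediapipe.tasks.core.Delegate
import com.google.mediapipe.tasks.vision.core.RunningMode
import com.google.mediapipe.tasks.vision.imageclassifier.ImageClassifier

val options = ImageClassifier.ImageClassifierOptions.builder()
  .setBaseOptions(
    BaseOptions.builder()
      .setModelAssetPath("classifier.tflite") // placeholder model file
      .setDelegate(Delegate.GPU)              // offload inference to the GPU
      .build()
  )
  .setRunningMode(RunningMode.LIVE_STREAM)    // per-frame results via a listener
  .setResultListener { result, inputImage ->
    // Called on every processed camera frame with the latest results.
  }
  .build()
val classifier = ImageClassifier.createFromOptions(context, options)
// For each camera frame: classifier.classifyAsync(mpImage, frameTimestampMs)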
When Your App Lives Beyond Android
This is a big reason. ML Kit is fantastic for native mobile development on Android and iOS. But what happens when your team wants to launch a web version of your app?
MediaPipe is a cross-platform framework. You can build your ML pipeline once and deploy it everywhere: Android, iOS, web, desktop, and even IoT devices. The APIs are designed to be consistent across platforms, meaning you can reuse a lot of your logic and don’t have to start from scratch for each new platform you support.
This write once, deploy anywhere capability is a massive advantage for teams that need to maintain a consistent user experience across different ecosystems.
When You Want to Live on the Cutting Edge
As MediaPipe is a more flexible and open framework, it’s often the place where you’ll first see support for more advanced and experimental on-device tasks, especially in the realm of generative AI.
While ML Kit is now rolling out its on-device GenAI APIs powered by Gemini Nano, MediaPipe often provides a more direct and customizable path for developers who want to experiment with a wider variety of open models and build more complex generative features.
Building Your First On-device LLM App
Remember the “Cat Breeds” app you built in chapter 2? What if you could chat with a veterinary specialist and ask about cats? Cool, right? Let’s build that with an on-device LLM using MediaPipe!
Adding the LLM Inference API
The LLM Inference API enables Android apps to run large language models (LLMs) entirely on-device. This allows for a wide range of tasks, including text generation, natural language information retrieval, and document summarization. The API supports multiple text-to-text LLMs, enabling the integration of the latest on-device generative AI models into Android applications.
The LLM Inference API offers several key features (you’ll find a minimal usage sketch right after this list):
Text-to-text generation: Produce coherent text responses from a given input prompt, enabling features like chat, summarization, and question answering.
Model flexibility: Choose from multiple LLMs to best suit your application’s needs. You can also fine-tune models with your own data or apply custom weights for specialized tasks.
LoRA integration: Enhance and personalize LLM performance using LoRA adapters, either by training on your own datasets or leveraging open-source LoRA models. (Note: LoRA is not supported for models converted via the AI Edge Torch Generative API.)
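Before building the full app, it helps to see the smallest possible use of this API. Here’s a minimal, blocking sketch based on the tasks-genai library; the model path and token budget are placeholders, not the chapter’s exact configuration:

import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Configure the engine with a model file already present on the device.
val options = LlmInference.LlmInferenceOptions.builder()
  .setModelPath("/data/local/tmp/llm/model.task") // placeholder path
  .setMaxTokens(1024) // combined budget for prompt + response
  .build()
val llmInference = LlmInference.createFromOptions(context, options)

// Synchronous call: returns the complete response in one go.
val response = llmInference.generateResponse("Why do cats purr?")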
Build and run the app, and check the LogCat console in the IDE. You’ll notice a “Model Not Found” exception.
Model Not Found
This occurs because the app is trying to initialize a model that isn’t available yet. You need to ensure the model is downloaded and stored at the correct path so it can be initialized properly.
Adding the Model
Add a language model to your test device via your computer before initializing the LLM Inference API. Run the following command in your terminal to check which devices are connected to your machine:
$ adb devices
You will see a list of all connected and recognized devices similar to the following output:
List of devices attached
ZRF198804FEBBD device
emulator-5554 device
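To copy the model over, you’ll run two commands against the device you want to target. This is a sketch, assuming the /data/local/tmp/llm/ folder that Google’s LLM Inference documentation uses; whatever folder you pick must match the path the app expects (LLM_MODEL.path in the starter project):

$ adb -s <device_id> shell mkdir -p /data/local/tmp/llm/
$ adb -s <device_id> push <model_download_path> /data/local/tmp/llm/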
The first line creates a folder on the connected device, associated with the device_id, to store the model.
The second line pushes the model from <model_download_path>, located on your computer, to the given folder on your test device.
Note: The <model_download_path> is variable. If you’re on Mac OS, it may look like: /Users//Downloads/TinyLlama-1.1B-Chat-v1.0_multi-prefill-seq_q8_ekv1280.task
Once these commands are successfully executed, you’ll see a confirmation in the terminal similar to the following:
Go to the InferenceManager class in the com.kodeco.android.aam.llm package in the starter project. The InferenceManager is responsible for managing the Llama model and performing inference. It handles loading the model, creating an inference session, and generating responses based on user prompts. To do so, InferenceManager relies on two key objects:
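Here’s a sketch of those two objects and the helpers that create them, using the types from the tasks-genai library. The option values are illustrative; the starter project may configure them differently:

// The engine: loads the model file and owns the heavyweight resources.
private lateinit var llmInference: LlmInference

// The session: holds one conversation's context window on top of the engine.
private lateinit var llmInferenceSession: LlmInferenceSession

private fun createEngine(context: Context) {
  val options = LlmInference.LlmInferenceOptions.builder()
    .setModelPath(LLM_MODEL.path)  // model file pushed via adb earlier
    .setMaxTokens(MAX_TOKENS)      // total budget for prompt + response
    .build()
  llmInference = LlmInference.createFromOptions(context, options)
}

private fun createSession() {
  val sessionOptions = LlmInferenceSession.LlmInferenceSessionOptions.builder()
    .setTemperature(0.8f) // illustrative sampling settings
    .setTopK(40)
    .build()
  llmInferenceSession =
    LlmInferenceSession.createFromOptions(llmInference, sessionOptions)
}

Both are created once in the init block below.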
init {
  // Fail fast if the model file hasn't been pushed to the device yet.
  if (!modelExists(context)) {
    throw IllegalArgumentException("Model not found at path: ${LLM_MODEL.path}")
  }
  // Spin up the LLM engine, then open a chat session on top of it.
  createEngine(context)
  createSession()
}
Do not forget to add necessary imports for all these changes.
Build and run the app. You should be able to compile without any error and see the Chat screen, but that’s not functional yet!
Chat Screen
Streaming Responses
To make the Chat screen functional, you need to pass the user’s input prompt to the llmInferenceSession. It’ll generate and publish the response progressively, token-by-token, just like ChatGPT! To achieve this, you’ll need to attach a ProgressListener to it.
Add a generateResponseAsync() function to the InferenceManager class, as follows:
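A sketch of that function, consistent with the session API (ProgressListener comes from the tasks-genai library, ListenableFuture from Guava):

fun generateResponseAsync(
  prompt: String,
  progressListener: ProgressListener<String>
): ListenableFuture<String> {
  // Append the user's prompt to the session's running context window.
  llmInferenceSession.addQueryChunk(prompt)
  // Start asynchronous generation; partial results stream to the listener.
  return llmInferenceSession.generateResponseAsync(progressListener)
}

The code that calls it does the following: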
Uses inferenceManager to generate a response asynchronously, passing the prompt from the user.
Updates the UI once a partial result becomes available. The UI gradually renders words until the process is done.
If there’s any error, it’ll surface the error to the UI layer (see the sketch after this list).
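Here’s a sketch of such a call site in the ViewModel; appendToLastMessage() and markResponseComplete() are hypothetical helpers, not names from the starter project:

fun sendMessage(prompt: String) {
  val listener = ProgressListener<String> { partialResult, done ->
    // Stream each new chunk of text into the message being displayed.
    appendToLastMessage(partialResult)
    // Once generation finishes, unlock the input field again.
    if (done) markResponseComplete()
  }
  inferenceManager.generateResponseAsync(prompt, listener)
}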
Include any required imports for the above changes. You’re now ready to test the LLM - build and run, and then ask about cats! Try limiting the word count to keep the context window small with queries like this:
“Tell me about different types of cats in 100 words”
Generating Response
Estimating Remaining Tokens
You must have noticed the “0 tokens remaining” message on the chat screen right after sending your first query to the LLM. Even though your prompt capped the output at 100 words, the token count in the UI doesn’t yet reflect that.
That happens because you haven’t implemented token estimation for the current chat session. That’s what you’ll do in this section.
fun estimateTokensRemaining(contextWindow: String): Int {
  // No conversation yet: signal "unknown" with -1.
  if (contextWindow.isEmpty()) return -1
  // Ask the session how many tokens the transcript occupies so far.
  val sizeOfAllMessages = llmInferenceSession.sizeInTokens(contextWindow)
  val remainingTokens = MAX_TOKENS - sizeOfAllMessages
  // Never report a negative number of remaining tokens.
  return max(0, remainingTokens)
}
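Here, MAX_TOKENS is the same limit the engine was configured with, and sizeInTokens() asks the session to tokenize the transcript for an accurate count. A hypothetical call site (chatTranscript is illustrative):

val tokensLeft = inferenceManager.estimateTokensRemaining(chatTranscript)
// Drive the "N tokens remaining" label in the UI from tokensLeft.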
Now, build and run the app. Enter the same prompt in the chat: “Tell me about different types of cats in 100 words.” You’ll notice that the token count updates dynamically with each interaction.
Estimating Tokens
Resetting the Session
Did you try tapping the Reset button? If so, you may have noticed it’s not functional yet. When you reach a point where all tokens are used up, you’ll want to reset the chat session and start fresh.
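With the session API, the usual approach is to close the exhausted session and open a fresh one against the same engine. A minimal sketch, assuming the createSession() helper from earlier:

fun resetSession() {
  // Drop the old context window entirely...
  llmInferenceSession.close()
  // ...and start a brand-new conversation on the same engine.
  createSession()
}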