While working with the models in this lesson, you’ve likely noticed that they can be quite large. Even so, they’re tiny compared to some of the largest models, such as Stable Diffusion, which can run as large as 8 GB, and the recent Llama models, which can reach tens of gigabytes.
These large sizes can be a poor fit for mobile devices, where storage and RAM are at a premium. For many apps incorporating local ML models, the model will make up most of the app’s size, increasing the download size. Putting off the download until later only pushes the problem into the future without solving it.
Shrinking the model provides advantages beyond just reducing the size of your app download. A smaller model can also run faster because less data needs to move between the device’s memory and CPU.
The first approach to addressing this problem is to reduce the model size during training. You’ll see that many models come trained with different numbers of parameters. The Meta Llama 3 model, for example, comes in versions with eight billion and 70 billion parameters.
The ResNet101 model you worked with earlier in the lesson is about 117 MB at full size, with each weight stored as a Float16, which takes two bytes. Effectively reducing the model size requires balancing the smaller size against the model’s performance and quality of results.
Reduction Techniques
There are three primary techniques used in Core ML Tools to reduce model size. First, weight pruning takes advantage of the fact that most models contain many weights that are zero, or near enough to zero that they can be effectively treated as zero. If you store only the non-zero values, you save two bytes for each value you drop. For the ResNet101 model, that can save about half the size. You can tune the amount of compression by adjusting the threshold below which a weight is treated as zero.
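To make this concrete, here’s a minimal pure-Python sketch of magnitude-based pruning. The threshold, example weights, and (index, value) sparse storage are illustrative assumptions, not how Core ML Tools represents pruned weights internally:

```python
# Sketch of magnitude-based weight pruning: any weight whose magnitude
# falls below a chosen threshold is treated as zero.
def prune(weights, threshold):
    """Return a copy of weights with near-zero values set to 0.0."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]

weights = [0.91, -0.002, 0.0, 0.4, 0.001, -0.73, 0.003, 0.0]
pruned = prune(weights, threshold=0.01)
print(pruned)  # [0.91, 0.0, 0.0, 0.4, 0.0, -0.73, 0.0, 0.0]

# A sparse representation then stores only the non-zero weights,
# as (index, value) pairs -- the zeros cost nothing.
sparse = [(i, w) for i, w in enumerate(pruned) if w != 0.0]
print(len(sparse), "of", len(weights), "weights kept")  # 3 of 8 weights kept
```

Raising the threshold zeroes out more weights, which shrinks the sparse representation further at the cost of accuracy.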
The second technique is quantization. This technique reduces the precision from a Float16 to a smaller data type, usually Int8. An Int8 stores values between -128 and 127. This will save half the size of the original model.
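As a rough sketch of the arithmetic involved, here’s symmetric linear quantization with a single per-tensor scale. The helper names are made up for illustration; this is not the Core ML Tools API:

```python
# Sketch of symmetric linear quantization: map float weights onto
# integers in [-127, 127] using one scale factor for the whole tensor.
def quantize(weights):
    scale = max(abs(w) for w in weights) / 127.0  # largest magnitude -> 127
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    # Decompression is a single multiply per weight.
    return [q * scale for q in quantized]

weights = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize(weights)
print(q)  # [50, -127, 2, 100]

restored = dequantize(q, scale)  # close to the originals, within one scale step
```

Each weight now needs one byte instead of two, at the cost of a small rounding error bounded by half the scale.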
The third technique differs from pruning and replaces each weight value with an index to a lookup table. This is known as palettization, which works by clustering weights with similar values into a single value and storing that value in the lookup table. You then replace each weight with the index value. The amount of compression depends on the number of values in the lookup table. For some models, you may need only as few as four index values, resulting in a compression of 8x. Many newer models also support using different lookup tables for different model parts.
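Here’s a small illustration of the lookup idea, using a hypothetical fixed four-entry palette. In practice, the palette is derived from the model’s own weights (for example, by k-means clustering) rather than chosen by hand:

```python
# Sketch of palettization: a 4-entry palette means each weight is stored
# as a 2-bit index instead of a 16-bit float -- an 8x reduction.
palette = [-0.75, -0.25, 0.25, 0.75]  # hypothetical lookup table

def palettize(weights, palette):
    # Replace each weight with the index of its nearest palette entry.
    return [min(range(len(palette)), key=lambda i: abs(palette[i] - w))
            for w in weights]

weights = [0.8, -0.7, 0.3, 0.2, -0.3, 0.7]
indices = palettize(weights, palette)
print(indices)  # [3, 0, 2, 2, 1, 3]

# Decompression is just a table lookup per index.
restored = [palette[i] for i in indices]
```

More palette entries mean less compression but a closer match to the original weights.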
This compression can be done either after the training, as you’ll do in this lesson, or during training. Doing compression during training usually lets you get the same accuracy at a higher compression rate, at the cost of adding complexity and time to the training process.
Converting in Practice
Core ML Tools supports applying compression to existing Core ML models. Unfortunately, as with many things related to Core ML Tools, it’s a bit complicated: one set of packages works on the older .mlmodel files, and a separate set works on the newer .mlpackage files. In this section, you’ll work a bit with the latter.
Open your Python development environment as usual and then start Python. Now enter the following code one line at a time:
import coremltools as ct
import coremltools.optimize as cto
This will save your model to the disk with a different name. If you look at the two files, you’ll notice the new file is half the size of the previous one. You can see that converting from a 16-bit value to an eight-bit value should reduce the size by half.
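For reference, a minimal sketch of quantizing an existing model package with the Core ML Tools optimize APIs might look like the following. It assumes coremltools 7 or later and a saved model package named Model.mlpackage (a placeholder name):

```python
import coremltools as ct
import coremltools.optimize as cto

# Load an existing Core ML model package (placeholder filename).
model = ct.models.MLModel("Model.mlpackage")

# Configure symmetric linear quantization of the weights to Int8.
config = cto.coreml.OptimizationConfig(
    global_config=cto.coreml.OpLinearQuantizerConfig(mode="linear_symmetric")
)

# Produce the compressed copy and save it under a different name.
compressed = cto.coreml.linear_quantize_weights(model, config)
compressed.save("ModelInt8.mlpackage")
```

This is a sketch under the assumptions above, not the lesson’s exact code; running it requires a Mac with a real .mlpackage on disk.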
Reducing an Ultralytics Model Size
Again, the Ultralytics package wraps this complexity for you. Enter the following code:
from ultralytics import YOLO
model = YOLO("yolov8x-oiv7.pt")
model.export(format="coreml", nms=True, int8=True)
This differs from your earlier export by adding the int8=True parameter, which activates Int8 quantization. This will take a few minutes to run, but when it completes, you’ll have a file that’s roughly half the size of the original file.
How does this compression and optimization affect the speed and accuracy of the models? You’ll explore that in the next lesson as you integrate these models into an iOS app.
This content was released on Oct 7 2025. The official support period is 6 months from this date.