TensorFlow Lite currently supports optimization via quantization, pruning, and clustering. These techniques are part of the TensorFlow Model Optimization Toolkit, which provides resources for model optimization techniques that are compatible with TensorFlow Lite.

Quantization works by reducing the precision of the numbers used to represent a model's parameters, which by default are 32-bit floating-point numbers. This results in a smaller model size and faster computation. Several types of quantization are available in TensorFlow Lite. A decision tree can help you select the quantization scheme you might want to use for your model, based simply on the expected model size and accuracy; latency and accuracy results are published for both post-training quantization and quantization-aware training.

Generally, hardware accelerators require models to be quantized in a specific way. See each hardware accelerator's documentation to learn more about their requirements.

Optimizations can potentially result in changes in model accuracy, which must be considered during the application development process. The accuracy changes depend on the individual model being optimized, and are difficult to predict ahead of time. Generally, models that are optimized for size or latency will lose a small amount of accuracy. Depending on your application, this may or may not impact your users' experience. In rare cases, certain models may gain some accuracy as a result of the optimization process.
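To make the precision-reduction idea behind quantization concrete, here is a minimal, self-contained sketch of affine quantization from float32 to 8-bit integers. This is an illustration only, not TensorFlow Lite's actual implementation; the function names and the choice of unsigned 8-bit values are this sketch's own.

```python
import numpy as np

def quantize(weights, num_bits=8):
    """Map float values onto an unsigned integer grid (affine quantization)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = (w_max - w_min) / (qmax - qmin) or 1.0  # guard constant tensors
    zero_point = int(round(qmin - w_min / scale))
    q = np.clip(np.round(weights / scale) + zero_point, qmin, qmax)
    return q.astype(np.uint8), scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float values from the quantized representation."""
    return (q.astype(np.float32) - zero_point) * scale

w = np.array([-1.5, -0.2, 0.0, 0.7, 1.5], dtype=np.float32)
q, scale, zp = quantize(w)
w_approx = dequantize(q, scale, zp)
# Each value is recovered to within one quantization step (`scale`).
```

Storing `q` takes one byte per weight instead of four, at the cost of a bounded rounding error per value; this is the size/accuracy trade-off discussed above.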
Latency is the amount of time it takes to run a single inference with a given model. Some forms of optimization can reduce the amount of computation required to run inference using a model, resulting in lower latency. Lower latency can also have an impact on power consumption. Currently, quantization can be used to reduce latency by simplifying the calculations that occur during inference, potentially at the expense of some accuracy.

Some hardware accelerators, such as the Edge TPU, can run inference extremely fast with models that have been correctly optimized.
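Single-inference latency as defined above can be measured with simple wall-clock timing. Below is a framework-agnostic sketch; the warm-up and run counts are arbitrary choices of this example, and the workload at the bottom is a stand-in for a real model invocation.

```python
import time

def measure_latency(invoke, warmup=5, runs=50):
    """Mean wall-clock time of one call to `invoke`, a zero-argument callable."""
    for _ in range(warmup):
        invoke()  # warm-up excludes one-time costs (allocation, caching)
    start = time.perf_counter()
    for _ in range(runs):
        invoke()
    return (time.perf_counter() - start) / runs

# Stand-in workload; in practice this would wrap the model's inference call.
latency_s = measure_latency(lambda: sum(i * i for i in range(10_000)))
```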
Smaller models can also translate to better performance and stability. Quantization can reduce the size of a model, potentially at the expense of some accuracy. Pruning and clustering can reduce the size of a model for download by making it more easily compressible.
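The compressibility effect mentioned above is easy to demonstrate: zeroing out small-magnitude weights (a crude stand-in for real pruning) makes the serialized tensor compress much better, even though its uncompressed size is unchanged. The threshold and tensor size below are arbitrary.

```python
import gzip
import numpy as np

rng = np.random.default_rng(0)
dense = rng.standard_normal(100_000).astype(np.float32)

# Crude magnitude "pruning": zero the weights with the smallest magnitudes.
pruned = np.where(np.abs(dense) < 0.8, np.float32(0.0), dense)

dense_gz = len(gzip.compress(dense.tobytes()))
pruned_gz = len(gzip.compress(pruned.tobytes()))
# The pruned tensor compresses far better: runs of zero bytes are cheap to encode.
```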
For example, an Android app using a smaller model will take up less storage space on a user's device.
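As a sketch of how post-training quantization is applied in practice, the snippet below converts a small throwaway Keras model twice: once as plain float, and once with `tf.lite.Optimize.DEFAULT`, which applies dynamic-range quantization to the weights. The model architecture here is an arbitrary example, and API details may vary across TensorFlow versions.

```python
import tensorflow as tf

# An arbitrary small model standing in for a real, trained one.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(64,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),
])

# Baseline conversion: weights stay as 32-bit floats.
float_tflite = tf.lite.TFLiteConverter.from_keras_model(model).convert()

# Post-training dynamic-range quantization: weights stored as 8-bit integers.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quant_tflite = converter.convert()
```

Both conversions yield a flatbuffer (`bytes`); the quantized one is substantially smaller here because the Dense-layer weights dominate the file size.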