🚀 Real-Time Video Processing with Flutter, Native C++, and TensorFlow Lite 🚀

Ever tried squeezing smooth real-time video analysis into a mobile app without cooking your phone to a crisp 🔥? You’re not alone! Let’s dive into our adventure of building a blazing-fast, efficient, and accurate real-time video processing pipeline for Android (and iOS) using Flutter, Kotlin, C++, and TensorFlow Lite!


🌟 Context: Why Did We Even Do This?

We wanted a buttery-smooth app experience where AI models detect humans in real-time video, without delays, lag, or noticeable battery drain. The problem? Camera frames arrive in a planar YUV format (YUV_420_888 on Android), while ML models expect RGB float32 input tensors. That means massive per-frame conversions at 30 FPS.
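
To put rough numbers on it, assuming a 1080p stream: a single 1920 × 1080 frame is 1920 × 1080 × 3 ≈ 6.2 MB as packed RGB bytes, and four times that (≈ 24.9 MB) once expanded to float32. At 30 FPS, that is hundreds of megabytes per second flowing through the conversion path, so every copy and allocation matters.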


The Pain Points We Faced

  • YUV → RGB Conversion: Default Java-based approaches killed our performance.
  • Memory Churn: Continuous allocations led to garbage collection pauses.
  • Scaling & Rotation: Efficiently handling multiple video resolutions and orientations.
  • Latency & Overhead: Every millisecond counts in real-time ML inference!

🧙‍♂️ Our Magical Solution

We turned to native C++ and Google’s ultra-efficient libyuv library for image processing. Libyuv gave us lightning-fast conversions and scaling, reducing memory usage and CPU cycles significantly. We combined this with carefully crafted Kotlin code that kept the camera preview running smoothly—zero flicker ✨.


Why Not Just Use Existing Packages?

Before diving headfirst into the native route, we did look at Ultralytics’ official YOLO Flutter package. While it’s great for quickly demoing YOLO performance, we found it quite limited for more nuanced, production-ready use cases. Specifically:

  • 🚫 Limited Camera Control: You can’t easily manage camera states like recording, orientation, or resolutions dynamically based on model detections.
  • 🚫 Inflexible Workflow: Primarily focused on “plug-and-play” detection without granular control.
  • ⏰ Development Time: Yes, Ultralytics was up and running in a couple of hours, while our custom approach took about 3 days to iron out perfectly.

But the payoff? Total control. We can make strategic, real-time decisions about the camera state and processing pipeline based directly on detection results—a necessity for production-grade apps.

We genuinely hope Ultralytics continues developing their Flutter package, offering more robust camera controls, flexibility, and practicality for production use cases beyond mere demos. 🙏


🏋️ The Power of libyuv and C++

Here’s a peek at our blazing-fast native implementation:

🎯 Optimized YUV → RGB Conversion (in C++)

// Assumes libyuv is vendored into the NDK build. Buffers like yScaled,
// uScaled, vScaled, and rgbBuffer are reused std::vector<uint8_t>s sized
// for the target resolution, and dstFloat points into the TensorFlow Lite
// input tensor.
#include "libyuv.h"

// Efficiently scale YUV directly in native memory!
libyuv::I420Scale(
  srcY, yStride, srcU, uStride, srcV, vStride,
  srcWidth, srcHeight,
  yScaled.data(), scaledWidth,
  uScaled.data(), scaledWidth / 2,
  vScaled.data(), scaledWidth / 2,
  scaledWidth, scaledHeight,
  libyuv::kFilterBilinear
);

// Quick YUV → RGB conversion. Gotcha: libyuv's "RGB24" writes bytes in
// B, G, R order in memory; if your model expects R, G, B, use
// libyuv::I420ToRAW instead, or swap channels in the loop below.
libyuv::I420ToRGB24(
  yScaled.data(), scaledWidth,
  uScaled.data(), scaledWidth / 2,
  vScaled.data(), scaledWidth / 2,
  rgbBuffer.data(), scaledWidth * 3,
  scaledWidth, scaledHeight
);

// Normalize RGB → float32 directly into the TensorFlow Lite input buffer
const size_t pixelCount = static_cast<size_t>(scaledWidth) * scaledHeight;
for (size_t i = 0; i < pixelCount; ++i) {
  dstFloat[i * 3 + 0] = rgbBuffer[i * 3 + 0] / 255.0f;
  dstFloat[i * 3 + 1] = rgbBuffer[i * 3 + 1] / 255.0f;
  dstFloat[i * 3 + 2] = rgbBuffer[i * 3 + 2] / 255.0f;
}

✅ Why so fast?

  • Native buffers: Reused memory, no per-frame allocation overhead.
  • Thread-local storage: Zero contention between frames (see the sketch below).
  • Libyuv’s optimized SIMD instructions: Unmatched performance compared to Java or naive C++ implementations.
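
Here’s a minimal sketch of how those two ideas combine (the helper name ensureBuffers and the sizing are ours for illustration): each worker thread grows its scratch buffers once, then reuses them for every subsequent frame.

#include <cstdint>
#include <vector>

// One set of scratch buffers per thread: no locks, no sharing, and no
// per-frame heap churn once the buffers reach their working size.
thread_local std::vector<uint8_t> yScaled, uScaled, vScaled, rgbBuffer;

// Hypothetical helper: grow (never shrink) the scratch buffers for the
// target resolution. std::vector keeps its capacity across frames, so
// after the first frame these resize calls allocate nothing.
void ensureBuffers(int scaledWidth, int scaledHeight) {
  const size_t lumaSize = static_cast<size_t>(scaledWidth) * scaledHeight;
  yScaled.resize(lumaSize);
  uScaled.resize(lumaSize / 4);   // U and V planes are quarter-size in I420
  vScaled.resize(lumaSize / 4);
  rgbBuffer.resize(lumaSize * 3); // 3 bytes per pixel for packed RGB
}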

📱 Flutter + Kotlin = Seamless Experience

We bridged native code to Flutter via Kotlin, effortlessly managing camera previews and real-time inference, achieving buttery-smooth transitions even while toggling between preview and recording modes.
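
For flavor, here’s a sketch of what the native side of that bridge can look like. The package and class names (com.example.videopipe.FrameProcessor) and the parameter list are hypothetical; the real JNI symbol must match your Kotlin external fun declaration exactly.

#include <jni.h>
#include <cstdint>

// Hypothetical JNI entry point matching a Kotlin declaration like:
//   external fun processFrame(y: ByteBuffer, u: ByteBuffer, v: ByteBuffer,
//                             yStride: Int, uvStride: Int,
//                             width: Int, height: Int)
extern "C" JNIEXPORT void JNICALL
Java_com_example_videopipe_FrameProcessor_processFrame(
    JNIEnv* env, jobject /* this */,
    jobject yBuf, jobject uBuf, jobject vBuf,
    jint yStride, jint uvStride, jint width, jint height) {
  // Direct ByteBuffers from the camera map straight to native memory,
  // so there are no copies on the way in.
  auto* srcY = static_cast<uint8_t*>(env->GetDirectBufferAddress(yBuf));
  auto* srcU = static_cast<uint8_t*>(env->GetDirectBufferAddress(uBuf));
  auto* srcV = static_cast<uint8_t*>(env->GetDirectBufferAddress(vBuf));

  // ... scale, convert, and normalize as shown above ...
}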


🚧 Handling Multiple Orientations

We handled rotations automatically with minimal overhead:

val totalRotation = (sensorOrientation - displayRotation + 360) % 360

  • Calculated the target orientation with simple modular arithmetic.
  • Adjusted frame rotations directly in native C++ when necessary (see the sketch below).
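
When a rotation is actually needed, libyuv can rotate the I420 planes in native memory too. A minimal sketch (yRot, uRot, and vRot are illustrative destination buffers in the style of the earlier snippets):

// Rotate the I420 planes into separate destination buffers; map
// totalRotation to libyuv::kRotate90 / kRotate180 / kRotate270.
libyuv::I420Rotate(
  srcY, yStride, srcU, uvStride, srcV, uvStride,
  yRot.data(), height,      // after a 90-degree turn, the new stride is the old height
  uRot.data(), height / 2,
  vRot.data(), height / 2,
  width, height,
  libyuv::kRotate90
);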

🤖 YOLO Models with TensorFlow Lite GPU Delegates

Using TensorFlow Lite and GPU delegates, we maintained a solid 30+ FPS inference 🚀:

import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.gpu.GpuDelegate

// Create the GPU delegate once and reuse it across frames; close() both
// the interpreter and the delegate on teardown to free GPU resources.
val gpuDelegate = GpuDelegate(GpuDelegate.Options().setQuantizedModelsAllowed(false))
val interpreter = Interpreter(modelBuffer, Interpreter.Options().addDelegate(gpuDelegate))

✅ Benefits:

  • Leverages GPU power for blazing-fast inference.
  • Reduced battery drain compared to CPU-only models.

🎉 Novelty & What Sets Us Apart

  • Uninterrupted Recording & Preview: Seamless switching between camera states.
  • Native Efficiency: Minimal memory footprint; zero per-frame garbage collection.
  • Cross-platform Potential: Same C++ code ready for iOS integration via Flutter FFI (sketched below).
  • Highly Scalable: Easy adaptation to different models and frame sizes.
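
One reason the C++ layer ports cleanly: exposing it behind a plain C ABI lets dart:ffi bind the very same symbols on iOS. A rough sketch with entirely hypothetical vp_-prefixed names:

#include <cstdint>

// Hypothetical C ABI for the pipeline: plain C symbols are what dart:ffi
// (and Swift, for that matter) can look up and call directly.
extern "C" {

// Opaque handle owning the reusable buffers and the TFLite interpreter.
typedef struct VideoPipeline VideoPipeline;

VideoPipeline* vp_create(int width, int height);

// Scale + convert + normalize one I420 frame into the model input.
void vp_process_frame(VideoPipeline* p,
                      const uint8_t* y, int y_stride,
                      const uint8_t* u, const uint8_t* v, int uv_stride);

void vp_destroy(VideoPipeline* p);

}  // extern "C"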

🎩 How We Made This Magic Happen (In a Nutshell)

  1. 🔥 Used libyuv for optimal native YUV-RGB conversion.
  2. 🛠️ Built reusable buffers in C++ to avoid constant reallocations.
  3. ⚙️ Managed threading smartly via Kotlin.
  4. 🧪 Integrated smoothly with Flutter’s plugin system for cross-platform goodness.
  5. 🚀 Leveraged TensorFlow Lite GPU delegates for unbeatable inference speed.

📽️ Demo

(Embedded 17-second demo video.)

Final Thoughts

Real-time, mobile-friendly AI doesn’t have to be a headache. While the official YOLO Flutter package from Ultralytics offers rapid setup, our custom approach provides the flexibility, control, and efficiency crucial for production-grade applications.

We’re excited to see the official packages evolve, but until then—if you need fine-grained control and top-notch performance—rolling your own native solution is absolutely worth the effort!


📚 Libraries & Tools We Loved:

  • libyuv: Google’s SIMD-optimized library for YUV scaling, conversion, and rotation.
  • TensorFlow Lite (+ GPU delegate): On-device inference at 30+ FPS.
  • Flutter & Kotlin: Cross-platform UI plus the native camera glue.

Happy coding! 🚀