Scaling AI Operations


Architecting a high-efficiency internal annotation ecosystem to train advanced Object Detection and Vision-Language Models for complex UI pattern recognition.
* Disclaimer *
Due to the confidentiality of this project, the extent of work presented on this page has been limited in accordance with a non-disclosure agreement. All information in this case study reflects my contributions and does not necessarily reflect the views of the organization.




M Y   R O L E

User Experience Designer
AI Trainer

T E A M

AI Data Labelling Team, Machine Learning Engineers

S C O P E

Internal Labelling Tool deployment, AI training strategy refinement, and labeller workflow management.

T I M E

2025-2026












O V E R V I E W


Mobbin is a curated, web-based library of real-world mobile and web app screenshots designed for UI/UX designers to find design inspiration, analyze user flows, and identify industry best practices. To process, categorize, and tag thousands of complex mobile and web interfaces at scale, the platform relies heavily on advanced AI models. However, an AI is only as intelligent as the data it is trained on.

This project focused on the complete operational and experiential overhaul of Mobbin’s internal AI Labelling Tool and the strategic management of the human labelling team. By serving as the critical bridge between human annotators and AI engineers, I redesigned the internal tooling ecosystem to drastically improve the speed, accuracy, and contextual depth of our machine learning training pipeline.



O B J E C T I V E


To optimize the internal AI training pipeline by resolving severe operational friction within the human labelling team, overhauling the new labelling tool interface for maximum annotation throughput, and enhancing the contextual accuracy of the underlying AI models (transitioning from basic Object Detection to Vision-Language Models).








C H A L L E N G E


The human-in-the-loop (HITL) training ecosystem was fracturing under the weight of scaling operations, manifesting in three critical areas.

1. Operational Burnout & Labeller Friction

The human annotation team was experiencing severe operational bottlenecks and plummeting morale. This was driven by an opaque quota tracking system, confusing and often punitive rejection metrics, and deeply fragmented, hard-to-access documentation.

2. Technical Interface Bottlenecks

The legacy labelling interface was clunky and plagued with technical bugs. The UI actively slowed down the intricate work of drawing bounding boxes and applying metadata to complex screen architectures.


3. AI Model Accuracy Limits

The existing Object Detection Model (ODM) had hit a ceiling regarding contextual nuances. It consistently failed to differentiate visually similar but functionally distinct elements—for example, struggling to accurately distinguish a promotional "Banner" from a system "Badge" without understanding the surrounding context.








T H E   A P P R O A C H

To build a smarter AI, we first had to build a flawless operational environment for the humans training it. I structured this transformation into three strategic phases.

1. Establishing the Baseline

Operational & UX Auditing

Before touching the interface, I conducted deep-dive qualitative and quantitative research directly with the remote labelling team to uncover the root causes of their operational slowdowns.

Workflow Deconstruction

I shadowed annotators to map their exact end-to-end journey. I uncovered that labellers were spending excessive time context-switching away from the tool to reference massive, disconnected documentation sheets whenever they encountered ambiguous UI patterns.

Quota & Rejection Analysis

I audited the backend operational metrics, identifying that the current system for tracking daily quotas and managing task rejections was fundamentally broken. It lacked transparency, leading to extreme frustration and a high rate of defensive, low-quality labelling just to meet targets.



2. Architecting the Solution

Redesigning the Labelling Tool

I completely overhauled the AI Labelling Tool to prioritize high-speed, high-accuracy human annotations while removing systemic operational blockers.

In-Context Documentation Integration

I designed a dynamic, intelligent workspace that embedded contextual guidelines directly into the new labelling tool interface. When a labeller hovered over or selected a complex tag (like "Banner"), the tool instantly surfaced visual examples and exact definitions, entirely eliminating the need for context-switching and drastically reducing human error.

Transparent Feedback Loops

I redesigned the quota tracking and QA rejection mechanisms. I shifted the system from a punitive model to an educational one—providing labellers with transparent dashboards, clear pathways to dispute or learn from rejections, and realistic, easily trackable daily quotas.

3. Enhancing AI Context

The VLM Integration

With the human operational pipeline stabilized, I shifted focus to the technical output, acting as the strategic liaison between the labelling operations and the AI engineering team.

Bridging Human and Machine Context

Recognizing the inherent limitations of standard Object Detection Models (ODM) in parsing complex UI, I spearheaded the operational pivot to support Vision-Language Models (VLM).

Redefining Annotation Taxonomy

I worked closely with engineers to adjust exactly how the human team fed data into the system. By refining the annotation taxonomy to capture deeper contextual cues (e.g., text hierarchy and spatial relationships), we enabled the new VLM to accurately comprehend and categorize highly nuanced UI components that the old model consistently failed to process.
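To illustrate the kind of taxonomy refinement described above, here is a minimal sketch of an enriched annotation record. All field names and values are illustrative assumptions for this case study, not Mobbin's actual schema: the point is simply that a legacy object-detection label carried only a tag and a bounding box, while a VLM-ready record also captures text hierarchy, parent context, and on-screen text.

```python
from dataclasses import dataclass, asdict
from typing import Optional, Tuple

@dataclass
class UIAnnotation:
    """Hypothetical VLM-ready annotation record (field names are assumptions)."""
    tag: str                           # component label, e.g. "Banner" or "Badge"
    bbox: Tuple[int, int, int, int]    # (x, y, width, height) in pixels
    text_hierarchy: str = "body"       # contextual cue: "heading", "body", "caption"
    parent_tag: Optional[str] = None   # enclosing component, a spatial-relationship cue
    on_screen_text: str = ""           # raw text, giving the model language context

    def to_record(self) -> dict:
        """Serialize to the dict/JSON shape a training pipeline might ingest."""
        return asdict(self)

# A legacy ODM-style label would stop at `tag` + `bbox`; the extra context
# fields are what let a vision-language model tell a promotional Banner
# apart from a system Badge.
banner = UIAnnotation(
    tag="Banner",
    bbox=(0, 64, 390, 96),
    text_hierarchy="heading",
    parent_tag="Home Screen",
    on_screen_text="50% off your first order",
)
record = banner.to_record()
```

The richer record costs annotators a few extra seconds per element, which is why the in-context documentation and transparent quota work described earlier had to land first.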








I M P A C T   &   O U T C O M E S


1. Maximized Annotation Throughput & Quality

By resolving critical interface bugs and embedding contextual documentation directly into the labelling tool, we drastically reduced the average annotation time per screen. This highly optimized B2E (business-to-employee) interface minimized human error, directly resulting in a substantial increase in the volume of high-fidelity training data fed into the AI pipeline.

2. Operationalized the Human-in-the-Loop Pipeline

We transformed a frustrated, bottlenecked labelling team into a highly efficient, motivated operational unit. By fixing the quota and rejection systems, we eliminated systemic operational friction, significantly reduced labeller churn, and established a scalable, transparent workflow that will govern all future AI training initiatives.

3. Elevated AI Model Accuracy

The strategic shift to support Vision-Language Models (VLM) alongside traditional Object Detection was a major technical win. Fueled by the higher-quality, context-rich human annotations generated in the labelling tool, the AI's ability to accurately categorize complex, context-dependent UI components, such as differentiating banners from badges, improved markedly, directly enhancing the core value proposition of the entire platform.






©2026 Edwina Huiru Zhao