By James Aspinwall, co-written by Alfred Pennyworth (my trusted AI) — March 7, 2026, 07:57
Lablup makes AI infrastructure accessible. Its platform, Backend.AI, was the first NVIDIA DGX-Ready software in Asia Pacific: a container-based MLOps platform with patented fractional GPU virtualization that lets organizations slice a single GPU into multiple isolated workloads without code changes. Founded in 2015 in Seoul by a team of physicists and computer scientists, the company now serves 100+ organizations across telecommunications, finance, healthcare, manufacturing, government, and education, including Samsung Electronics, Samsung Medical Center, KT Cloud, LG Electronics, Bank of Korea, and Hyundai Mobis.
In January 2026, Lablup was selected as part of the only all-startup consortium for Korea’s Sovereign AI Foundation Model project, deploying Backend.AI on SK Telecom’s “Haein” cluster of 504 NVIDIA B200 GPUs. There it cut downtime by 47% and completed a 102-billion-parameter model’s 20-trillion-token pre-training in 66 days, 40% faster than projected. At CES 2026, Lablup unveiled Backend.AI:GO, a desktop application that runs small language models offline on consumer PCs.
This is a company building the GPU infrastructure layer for the AI economy. WorkingAgents operates at the operational layer above it. The two connect in ways that matter.
What Lablup Does
Lablup builds the platform that sits between raw GPU hardware and the people who need to use it. You have GPUs — a DGX cluster, a rack of H100s, a mix of consumer RTX cards and datacenter accelerators — and you need multiple teams, projects, and workloads sharing them efficiently. Backend.AI manages that.
The Core Technology: fGPU
This is Lablup’s differentiator. fGPU (fractional GPU virtualization) is a patented CUDA virtualization layer that partitions GPUs at the container level. An H100 with 80GB of VRAM becomes ten 8GB virtual GPUs, each isolated and allocated to different users or workloads.
Key details:
- Container-level partitioning — No code changes required. Applications use CUDA normally; fGPU intercepts and regulates at the API level
- Patented in Korea, US, and Japan
- CUDA 8-12 support — Works across generations
- Consumer to datacenter — RTX desktop GPUs through DGX systems
- First released September 2018 — “global-first fractional GPU”
- First commercial inference October 2023 — “world-first fractional GPU-based commercial AI inference”
This solves a real problem. A researcher training a small model does not need an entire H100. Without fractional GPU, that H100 sits mostly idle during their session. With fGPU, five researchers share it simultaneously, each getting the memory and compute they need, isolated from each other.
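The arithmetic behind that sharing is easy to sketch. The snippet below is a purely illustrative model of fractional slicing, not Lablup's fGPU implementation or API; the helper names are invented for this example:

```python
# Hypothetical sketch of fractional GPU slicing -- NOT Lablup's fGPU API.
# An 80 GB device is divided into fixed-size virtual GPUs; each request
# greedily claims enough slices to cover its memory need.

def slice_gpu(total_mem_gb: int, slice_gb: int) -> list[int]:
    """Partition a physical GPU into equal virtual-GPU memory slices."""
    return [slice_gb] * (total_mem_gb // slice_gb)

def allocate(slices: list[int], request_gb: int) -> list[int]:
    """Claim slices until the request is satisfied, or fail."""
    claimed, acc = [], 0
    for s in slices:
        if acc >= request_gb:
            break
        claimed.append(s)
        acc += s
    if acc < request_gb:
        raise RuntimeError("insufficient GPU capacity")
    return claimed

vgpus = slice_gpu(80, 8)    # an 80 GB H100 as ten 8 GB virtual GPUs
job = allocate(vgpus, 16)   # a 16 GB job claims two slices
```

The point of the sketch is the shape of the problem: once a device is expressed as fungible slices, five researchers can each hold a few slices of the same card instead of one researcher idling a whole H100.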
The Platform
Backend.AI — The core infrastructure platform. Container-based session management with pluggable accelerator support:
| Component | Function |
|---|---|
| Manager | Central orchestrator — session scheduling, REST/GraphQL API routing |
| Agent | Per-node controller — container lifecycles, resource monitoring |
| Storage Proxy | Unified virtual folder interface with real-time metrics |
| Web Server | SPA hosting and session management |
| App Proxy | Service routing for in-container apps (Jupyter, VSCode, terminals) |
Sokovan Scheduler — Multi-tier orchestration with NUMA-aware resource mapping. Extended Dominant Resource Fairness (DRF) algorithms distribute compute across mixed GPU clusters. Supports 11+ accelerator vendors: NVIDIA CUDA, AMD ROCm, Intel Gaudi, Google TPU, Graphcore IPU, FuriosaAI NPU, Rebellions, Tenstorrent, HyperAccel.
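Dominant Resource Fairness is worth a quick illustration. The toy scheduler below is the textbook DRF loop, not Sokovan's extended implementation: each round it grants one task bundle to the user whose dominant share (their largest fraction of any single resource) is smallest:

```python
# Minimal Dominant Resource Fairness (DRF) sketch -- illustrative only,
# not Sokovan's implementation. `capacity` is the cluster total,
# `demands` is each user's per-task resource bundle.

def drf(capacity: dict, demands: dict, rounds: int) -> dict:
    alloc = {u: {r: 0.0 for r in capacity} for u in demands}
    for _ in range(rounds):
        def dominant(u):
            # A user's dominant share: largest fraction of any resource.
            return max(alloc[u][r] / capacity[r] for r in capacity)
        user = min(demands, key=dominant)
        need = demands[user]
        used = {r: sum(alloc[u][r] for u in alloc) for r in capacity}
        # Stop once the chosen user's bundle no longer fits.
        if any(used[r] + need[r] > capacity[r] for r in capacity):
            break
        for r in capacity:
            alloc[user][r] += need[r]
    return alloc

# Classic example: 9 CPUs and 18 GPUs; user A wants <1 CPU, 4 GPU>
# per task, user B wants <3 CPU, 1 GPU> per task.
out = drf({"cpu": 9, "gpu": 18},
          {"A": {"cpu": 1, "gpu": 4}, "B": {"cpu": 3, "gpu": 1}}, 20)
```

With these numbers the loop converges to A holding 3 tasks and B holding 2, equalizing their dominant shares at two thirds, which is exactly the fairness property that makes DRF attractive for mixed GPU clusters.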
Backend.AI FastTrack 3 — Enterprise MLOps pipeline platform. DAG-based workflows for data preprocessing, model training, validation, deployment, and monitoring. Drag-and-drop GUI plus CLI. Airflow integration for existing users.
PALI Service Suite — Model inference launcher supporting Llama, Gemma, Qwen with card-based UI. Includes PALANG for fine-tuning and chatbot comparison.
Backend.AI Continuum — Distributed fault tolerance. Seamlessly bridges cloud and on-premises environments, automatically switching to local resources during cloud outages.
Backend.AI:GO — Desktop application running small language models offline on consumer PCs. Document analysis, image understanding, code review — all without an internet connection.
Deployment Options
| Mode | Description |
|---|---|
| On-Premises | Full platform on your hardware, including air-gapped environments |
| Public Cloud | AWS, Azure, GCP, Naver Cloud |
| Hybrid | Cloud + on-prem with automatic failover (Continuum) |
| Air-gapped | Offline installer packages, private package repository (Reservoir) |
Accelerator Support
Backend.AI is hardware-agnostic in a way most competitors are not:
| Vendor | Accelerator |
|---|---|
| NVIDIA | CUDA (desktop RTX through DGX) |
| AMD | ROCm |
| Intel | Gaudi |
| Google | TPU |
| Graphcore | IPU |
| FuriosaAI | NPU |
| Rebellions | NPU |
| Tenstorrent | NPU |
| HyperAccel | NPU |
Eleven accelerator vendors on one platform. Most MLOps competitors support NVIDIA only.
The Numbers
| Lablup | Value |
|---|---|
| Founded | April 2015 |
| Headquarters | Seoul, South Korea (Gangnam-gu) |
| US Office | San Jose, California |
| Total funding | $9.64M |
| Series A | $7.89M (April 2023) |
| Lead investors | K2 Investment Partners, LB Investment |
| Revenue (2024) | $1.5M |
| Profitable since | 2020 |
| Organizations served | 100+ |
| Team size | ~40 |
| GitHub stars | 623 |
| GPU patents | Korea, US, Japan |
| Accelerator vendors | 11+ |
| DGX-Ready | First in APAC (August 2021) |
| Sovereign AI cluster | 504 NVIDIA B200 GPUs |
| Model trained | 102B parameters, 20T tokens, 66 days |
The Sovereign AI Story
In January 2026, Lablup joined the Upstage-led consortium — the only all-startup team — selected for Phase 1 of Korea’s Sovereign AI Foundation Model project. The consortium includes Upstage (lead, model development), Lablup (GPU infrastructure), Flitto (data preprocessing), and Nota (model compression). Academic partners span Sogang University, KAIST, Stanford, and NYU.
Lablup deployed Backend.AI on SK Telecom’s “Haein” GPU cluster:
- 504 NVIDIA B200 GPUs — 480 for pre-training, 24 backup
- 47% reduction in downtime through automated failure detection and recovery
- 20 trillion tokens pre-trained in 66 days (projected: 120 days — 40% faster)
- 102B-parameter Mixture of Experts model (12B activated during inference)
- Open license nearly equivalent to Apache for commercial use
This is Backend.AI operating at national scale on current-generation hardware, managing the most demanding workloads in AI — large-scale pre-training — and delivering ahead of schedule.
Competitive Position
Backend.AI’s direct competitors include Run:ai (acquired by NVIDIA in 2024), Determined AI (acquired by HPE), ClearML, and Domino Data Lab. Two of the four have been acquired, which narrows the independent options.
Lablup’s differentiators:
- Only APAC-native DGX-Ready Software — First-mover in Korean, Japanese, and Southeast Asian enterprise markets
- Patented fractional GPU — Container-level virtualization, not hardware-dependent partitioning (MIG/MPS). Works on consumer GPUs, not just datacenter SKUs
- 11+ accelerator vendors — Broadest hardware support. Critical for organizations with mixed GPU fleets or planning multi-vendor strategies
- Air-gapped deployment — Offline installer packages for classified and regulated environments
- Open-source core — LGPL-3.0 server, MIT clients. Enterprise features layered on top
- Profitable since 2020 — Sustainable business at 40 people, not a cash-burning scale-up
Why This Matters for WorkingAgents
Lablup manages GPU infrastructure. WorkingAgents manages operational workflows. Lablup decides which GPU runs which workload. WorkingAgents decides which task runs when, who gets notified, and what happens if something fails.
The gap between “your model is trained” and “the right person knows, the next step is scheduled, and the audit trail is complete” — that gap is where WorkingAgents lives.
The Synergy Map
1. Operational Orchestration for ML Pipelines
Backend.AI FastTrack 3 manages the computational side of ML pipelines — DAGs for preprocessing, training, validation, deployment. But ML pipelines generate operational needs that go beyond compute orchestration:
- Training completes → Who gets notified? WorkingAgents’ Pushover integration alerts the ML engineer on their phone
- Validation fails → Create a task. WorkingAgents’ task manager assigns the investigation, sets a deadline, tracks resolution
- Model deployed → Schedule a performance review in 7 days. WorkingAgents’ alarm system fires and triggers an evaluation workflow
- GPU quota exceeded → Escalate. WorkingAgents creates an escalation chain: notify the team lead → if no response in 2 hours, notify the manager → log the entire chain
Backend.AI handles: “run this training job on these GPUs with this data.” WorkingAgents handles: “when it finishes, tell these people, schedule these follow-ups, and escalate if nobody responds.”
FastTrack manages the ML lifecycle. WorkingAgents manages the human lifecycle around it.
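The division of labor above amounts to event routing. The sketch below shows the shape of it with invented event names and handlers; neither the events nor the `OpsRouter` class are real Backend.AI or WorkingAgents APIs:

```python
# Hypothetical event router mapping pipeline events to operational
# actions. Event names and handlers are illustrative only.

from dataclasses import dataclass, field

@dataclass
class OpsRouter:
    handlers: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def on(self, event: str, action):
        """Register an operational action for a pipeline event."""
        self.handlers.setdefault(event, []).append(action)

    def emit(self, event: str, payload: dict):
        """Fire every registered action and keep an audit record."""
        for action in self.handlers.get(event, []):
            self.log.append(action(payload))

router = OpsRouter()
router.on("training.done", lambda p: f"notify {p['owner']} via Pushover")
router.on("validation.failed", lambda p: f"open task for {p['owner']}")
router.emit("training.done", {"owner": "ml-eng"})
```

The audit log accumulating alongside the handlers is the key design point: every operational consequence of a compute event leaves a record.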
2. GPU Resource Management + Scheduling
Backend.AI’s Sokovan scheduler allocates GPUs in real time based on demand. But GPU resource management has temporal dimensions that a real-time scheduler does not handle:
- Scheduled training windows — “Run this training job every night at 2 AM when the cluster is least loaded.” WorkingAgents’ alarm system schedules the trigger. Backend.AI’s scheduler allocates the GPUs.
- Recurring evaluations — “Re-evaluate model performance against production data every Monday.” WorkingAgents schedules. Backend.AI executes.
- Resource recovery follow-ups — Backend.AI automatically reclaims idle GPUs (0% utilization, low CPU, no network). WorkingAgents can schedule a check: “If this user’s session was reclaimed 3 times this week, notify the admin to discuss resource allocation.”
- Quota management — WorkingAgents tracks GPU usage per team per month in its per-user databases. Alarm fires when a team approaches their quota limit.
Backend.AI schedules compute. WorkingAgents schedules everything else.
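The "2 AM nightly window" case reduces to simple next-occurrence arithmetic. This is a generic sketch of that calculation, not WorkingAgents' alarm implementation (which adds persistence and crash recovery on top):

```python
# Sketch of a recurring trigger: compute the next daily firing time for
# a nightly training window. Illustrative scheduling arithmetic only.

from datetime import datetime, timedelta

def next_fire(now: datetime, hour: int = 2) -> datetime:
    """Next daily occurrence of `hour`:00, strictly after `now`."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate

# At 09:30 the next 2 AM window is tomorrow; at 01:00 it is today.
t = next_fire(datetime(2026, 3, 7, 9, 30))
```

In the combined stack, this timestamp is what the alarm system persists; when it fires, the actual GPU placement is still entirely Sokovan's decision.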
3. Air-Gapped and Sovereign Deployment Alignment
Both platforms deploy in environments where no data leaves the perimeter:
| Requirement | Backend.AI | WorkingAgents |
|---|---|---|
| Air-gapped deployment | Offline installer packages, Reservoir | Elixir/BEAM, zero cloud dependencies |
| On-premises | Full platform on customer hardware | Per-user SQLite on local storage |
| Data isolation | Container-level isolation, fGPU | Per-user databases, encrypted keys |
| No external calls | Private package repository | No external API requirements |
| Classified environments | DGX-Ready for government systems | Access control with audit trails |
Lablup is already in Korea’s Sovereign AI project, deploying on classified infrastructure with 504 B200 GPUs. WorkingAgents was designed for the same constraints — per-user data isolation, access control at every tool call, crash-recoverable scheduling, all running on Elixir/BEAM with zero cloud dependencies.
A sovereign AI deployment needs both: Backend.AI to manage the GPU infrastructure, WorkingAgents to manage the operational workflows — training schedules, access permissions, escalation chains, audit trails — all inside the air gap.
4. Multi-Team Access Control
Backend.AI manages compute access — which users get which GPUs, how much memory, for how long. WorkingAgents manages operational access — which users can trigger which tools, who can see which data, who can escalate to whom.
In a multi-team AI research environment:
- Backend.AI controls: Team A gets 4 H100s for training. Team B gets 2 A100s for inference. The intern gets 8GB of a shared T4.
- WorkingAgents controls: Team A’s lead can schedule overnight training runs. Team B can query model performance dashboards. The intern can view task boards but cannot trigger deployments.
Both platforms enforce permissions, but at different layers. Backend.AI gates compute resources. WorkingAgents gates operational actions. Combined, every resource and every action is permission-gated.
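A minimal model of the two layers makes the separation concrete. Role names, quota strings, and action names below are hypothetical, not either product's real ACL schema:

```python
# Illustrative two-layer access model -- hypothetical roles and actions.

# Layer 1 (compute, Backend.AI analog): who gets which resources.
COMPUTE_QUOTA = {"team-a-lead": "4x H100", "intern": "8 GB T4 slice"}

# Layer 2 (operational, WorkingAgents analog): who may do what.
OPS_PERMS = {
    "team-a-lead": {"schedule_training", "view_tasks"},
    "intern": {"view_tasks"},
}

def can(user: str, action: str) -> bool:
    """Gate an operational action independently of compute quota."""
    return action in OPS_PERMS.get(user, set())

allowed = can("team-a-lead", "schedule_training")
denied = can("intern", "trigger_deployment")
```

Note that the intern still holds a compute quota; holding GPU access says nothing about operational rights, which is why the two gates are checked separately.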
5. Model Deployment Lifecycle Orchestration
Backend.AI FastTrack handles the technical deployment — containerize, serve, monitor. WorkingAgents handles the business process around deployment:
```
Model training completes on Backend.AI
  → WorkingAgents creates review task for ML lead
  → If not reviewed in 24 hours → Pushover notification
  → ML lead approves → WorkingAgents triggers deployment via Backend.AI API
  → Alarm schedules performance check in 48 hours
  → Performance check runs → if degradation detected
      → WorkingAgents creates rollback task
      → Escalates to team lead
      → Logs entire chain with timestamps
```
Every step is persistent. Every alarm survives restarts. Every escalation has an audit trail. This is the operational governance that turns model deployment from a technical event into a managed business process.
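That flow is a small state machine. The sketch below mirrors the narrative with invented state and event names; it is not a real WorkingAgents API:

```python
# Hypothetical state machine for the deployment lifecycle above.
# States, events, and transitions are illustrative only.

TRANSITIONS = {
    ("trained", "approve"): "deployed",
    ("trained", "timeout"): "escalated",      # 24 h with no review
    ("escalated", "approve"): "deployed",
    ("deployed", "degraded"): "rollback_task",
}

def step(state: str, event: str) -> str:
    """Advance the lifecycle; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)

# Walk one unhappy path: review times out, lead approves late,
# then degradation triggers a rollback task. Each hop is logged.
trail, state = [], "trained"
for event in ("timeout", "approve", "degraded"):
    state = step(state, event)
    trail.append(state)
```

Modeling the lifecycle as explicit transitions is what makes the audit trail cheap: the trail is just the sequence of states visited, with timestamps attached.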
6. CRM and Contact Management for AI Teams
Lablup serves 100+ organizations. Each organization has ML engineers, data scientists, team leads, administrators, and executives who interact with the GPU platform. WorkingAgents’ NIS (CRM) module manages these relationships:
- Contact tracking — Who are the key stakeholders at each customer? When was the last interaction?
- Follow-up scheduling — “Check in with Samsung Medical Center about their GPU utilization in 2 weeks”
- Pipeline management — Track prospects from initial demo to deployment to expansion
- Interaction logging — Record every meeting, email, and support call with timestamps
For Lablup’s sales and customer success teams, WorkingAgents provides the operational CRM that keeps customer relationships active and accountable — with per-user data isolation ensuring each team member’s contacts and notes are private.
7. Backend.AI:GO + WorkingAgents: The Edge AI Stack
Backend.AI:GO runs small language models offline on consumer PCs. WorkingAgents runs on Elixir/BEAM anywhere. Together, they create an edge AI stack:
- GO provides the inference — document analysis, code review, image understanding, all offline
- WorkingAgents provides the orchestration — schedule recurring analysis tasks, track results, send notifications when anomalies are detected, maintain audit logs
- Both work offline — No cloud dependency for either component
For regulated industries where data cannot leave the premises — healthcare facilities, financial institutions, government agencies — this combined edge stack provides AI capabilities with full operational control.
8. NVIDIA Ecosystem Connection
Both companies operate deep in the NVIDIA ecosystem:
- Lablup: DGX-Ready Software, GTC multi-year sponsor (Silver Sponsor at GTC 2025, Booth #547), Inception Program member, B200 cluster deployments
- WorkingAgents: Designed for deployment alongside NVIDIA infrastructure, compatible with NVIDIA NIM microservices, positioned for sovereign AI deployments on NVIDIA hardware
Lablup and WorkingAgents at GTC represent complementary layers: Lablup manages NVIDIA’s GPUs, WorkingAgents manages the operational workflows that surround GPU workloads. A joint demo — “fractional GPU training with automated scheduling, notifications, and escalation” — speaks directly to GTC’s enterprise audience.
The Partnership Path
Phase 1: API Integration
Connect WorkingAgents to Backend.AI’s REST/GraphQL API. WorkingAgents’ alarm system triggers training jobs, monitors completion, and manages follow-up workflows. Backend.AI events feed into WorkingAgents’ task manager for human-in-the-loop tracking.
Phase 2: Operational Layer for FastTrack
Position WorkingAgents as the business process layer around FastTrack 3’s ML pipelines. Training completion → notification → review → deployment approval → monitoring — all orchestrated by WorkingAgents with audit trails and access control.
Phase 3: Sovereign AI Reference Stack
Combine Backend.AI’s air-gapped GPU management with WorkingAgents’ air-gapped operational orchestration for Korea’s sovereign AI infrastructure. Both platforms deploy on-premises with zero cloud dependencies. Co-publish the reference architecture for government and regulated industry deployments.
Phase 4: APAC Market Entry
Lablup has deep enterprise relationships across Korea, Japan, and Southeast Asia — Samsung, KT, LG, Hyundai, Bank of Korea. WorkingAgents enters these markets as the operational companion to Backend.AI deployments. Same customers, different layer, one integrated stack.
The Bottom Line
Lablup built the GPU management layer — fractional virtualization that turns expensive hardware into shared, efficient infrastructure. WorkingAgents built the operational management layer — scheduling, task tracking, notifications, and escalation that turns compute events into managed business processes.
Backend.AI tells you which GPU is running which workload. WorkingAgents tells you who needs to know, what should happen next, and what to do when something goes wrong. Backend.AI’s fGPU slices a GPU into ten workloads. WorkingAgents’ alarm system ensures all ten workloads have follow-up schedules, completion notifications, and escalation chains.
Both platforms deploy on-premises, both work air-gapped, both enforce access control, both maintain audit trails. The convergence is not just technical — it is philosophical. Lablup’s motto is “Make AI Accessible.” WorkingAgents’ design principle is “Make AI Operational.” Accessible compute plus operational orchestration equals AI that works in production, not just in notebooks.
They built the GPU layer. We built the workflow layer. Both run inside the same locked room.
Sources: