The Unix Moment for the Humanoid Robotics Industry

Speech

A Note Up Front

This week, at China Mobile's inaugural Embodied AI Partner Conference, I gave a 10-minute keynote.

It was not a very long speech, but for BridgeDP, it was a weighty moment — the first time we fully and publicly explained one thing we had finally figured out over the past three years.

We had discussed this internally for a long time, and also hesitated for a long time about whether to say it this early. In the end, we decided to speak up — because we are increasingly convinced that this is not just BridgeDP's issue; it is the most important structural problem the entire humanoid robotics industry is now facing.

I want to expand on what I said there and set it down here. If you heard those 10 minutes on site, this is the "extended version"; if you did not, this version is more complete.

Here is the main text.

16 Companies, One Thing

Over the past 18 months, inside the company, we have witnessed something the outside world has barely noticed, but we think is highly significant —

Sixteen humanoid robotics companies built their motion control capability, from zero to one, here with us.

16 companies, 16 separate product lines, 16 different teams.

Among them are humanoid divisions of listed companies, star startups that have completed several funding rounds, projects incubated by research institutions, and seasoned integrators entering bipedal systems from other form factors. Their resources, starting points, and technical styles are all very different.

But we saw one thing — their algorithm engineers were doing almost exactly the same work.

The problems they brought to our engineering team were the same kind of problems. The error logs they posted were structurally very similar. And the way they solved problems eventually converged on nearly the same few paths.

In concrete terms, they were all solving these four things:

Hardware abstraction: how to unify bodies with different joints, motors, sensors, and control frequencies into one interface the upper-layer algorithm can call
Motion control: how to deploy reinforcement-learning policies onto real robots without problems
Simulation alignment: how to make models that perform well in simulation also perform well on real hardware
Data feedback: how to feed data from robots running in the real world back into the next training cycle so capability keeps evolving

These four things are being done by almost every humanoid robotics company.

In each company's engineering team, 60% to 70% of the manpower is being spent solving problems almost identical to those of everyone else.

This made us stop and think for a very long time inside the company.

What we were thinking was not "this is a business opportunity" — that kind of thinking is too short-term. What we were thinking was:

Why is this happening? Why are 16 completely independent companies doing the exact same thing? Has this happened before in history? If it has, how did it end?

Two Echoes from History

In fact, this pattern is a familiar one.

Over the past half-century in the computing industry, almost the exact same storyline has happened at least twice.

More than half a century ago: before Unix

In the 1960s and 1970s, the mainframe era. IBM wrote its own operating system, DEC wrote its own operating system, Burroughs wrote its own operating system, Honeywell wrote its own operating system. Every computer company was writing a system for its own hardware — an OS that could only run on its own hardware.

To switch to another company's machine, an application developer had to relearn almost everything from scratch: system calls, file formats, development tools. The computing industry's capability was locked inside each company's own "vertical stack."

Then, in 1969, two engineers at Bell Labs, Ken Thompson and Dennis Ritchie, began writing something they called Unix. Unix's design philosophy was very simple: build an operating system that was cross-hardware, portable, modifiable by anyone, and extensible by anyone.

Unix was not one company's "product"; it was a redefinition of "what an operating system should be." It turned "fragments" into "infrastructure."

The story after Unix, we all know today: Linux, BSD, macOS, Android, iOS — nearly every operating system running in the world today is a direct or indirect descendant of Unix philosophy.

The real takeoff of the computing industry did not start with the transistor. It started with Unix.

Nearly 20 years ago: before Android

At the beginning of the 21st century, on the eve of the smartphone era. Nokia had Symbian, BlackBerry had BlackBerry OS, Motorola had its own firmware, and every phone maker was writing a system for its own hardware — a firmware that could only run on its own hardware.

For app developers, building an app that could run on multiple phones was almost impossible — you had to adapt separately for each phone maker's firmware. The entire mobile app ecosystem was locked inside each phone maker's "vertical stack."

In 2008, Android was released. Android is not a phone. It is an operating system that any phone maker can adopt, and every maker that adopts it connects to the app ecosystem through the same interface.

The story after Android is also familiar to us: the entire mobile app ecosystem took 5 years to walk the road that the PC app ecosystem took 20 years to walk.

The same structure, twice

These two "operating system moments" had an astonishingly similar structure:

A new computing paradigm was emerging (mainframes / smartphones)
Every hardware company was rebuilding the same underlying software
The industry's real bottleneck was not "the algorithms are not good enough" or "the hardware is not strong enough", but the lack of a widely recognized infrastructure layer that allows capabilities to be reused across hardware
The moment that layer appeared, the entire industry took off at a speed far beyond what came before

After looking at this for a long time, our judgment is: today's humanoid robotics industry is in exactly the same structural position as those two moments.

What Is Missing Is Not Smarter Algorithms

What does this mean? Our judgment is this —

What the humanoid robotics era lacks is not smarter algorithms; what it lacks is more general-purpose robotics software.

I know this may sound a bit counterintuitive.

Over the past two years, the industry's attention has been almost entirely focused on "algorithms" — stronger reinforcement learning, larger VLA models, smarter end-to-end policies. Every new paper, every new demo, has pulled the industry's gaze toward "the smarter the algorithm, the better."

We do not deny the importance of algorithms becoming smarter. But what we want to say is —

If the only thing blocking the humanoid robotics industry today were "the algorithms are not smart enough," then this problem would naturally be solved within 12 to 24 months — because algorithmic progress is in the fastest tier of today's AI industry.

But if you really look at the engineering hours these 16 companies spend every day, you will find that they spend far more time on "making a smart algorithm run stably, safely, and repeatably on a real robot" than on "making the algorithm smarter."

That is why I say: what is missing is not smarter algorithms; what is missing is software that is more general-purpose.

A smart algorithm, if it can only run on one specific body, in one specific scenario, for one specific stretch of time — its value to the entire industry is limited.

Only when a smart algorithm can be deployed across embodiments, executed stably at millisecond scale, and continuously aligned with the real world does it truly become an "industrial capability."

And to make that happen, this industry needs an "operating system."

Today, we believe —

"The Unix moment for the humanoid robotics industry is happening now."

That is the line I said in the speech, and the core of this whole article.

What It Is Not

So, what exactly is the operating system of the humanoid robotics era?

This is a harder question than "what it is not." Because today the industry already has a few familiar names, and it is easy to mistakenly think that "that thing is the operating system."

So I want to first say three things about what it is not.

It is not ROS

ROS is a very good tool. Many engineers inside our company grew up in the ROS world, and we fully respect it.

But ROS is not the operating system for the humanoid robotics era.

In essence, ROS is an inter-module communication framework. It lets different modules in a robot system — perception, planning, control, and more — be assembled together. It originated from exploration in the research community around 2007, and its core abstraction is a publish-subscribe message passing mechanism based on Node/Topic/Service.

It does this very well. But it was never designed for large-scale production deployment — it does not solve cross-hardware abstraction, it does not solve millisecond-level real-time control, and it does not solve cross-embodiment capability accumulation.

In our era, ROS is more like the early Unix-era "pipe" — a valuable communication mechanism, but not the operating system itself.
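To make the "communication mechanism" point concrete, here is a toy, in-process sketch of the publish-subscribe idea described above. It is not the ROS API: the `Bus` class, the topic names, and the message shapes are all hypothetical, chosen only to show how modules can be wired together through topics without knowing about each other.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class Bus:
    """Toy message bus (hypothetical, not ROS): topics map to callbacks."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: Any) -> None:
        # Deliver synchronously to every subscriber of this topic.
        for callback in list(self._subscribers.get(topic, [])):
            callback(message)


bus = Bus()
commands: List[float] = []


def planner(obstacle_distance_m: float) -> None:
    # Planning module: slow down when an obstacle is closer than 1 m.
    speed = 0.2 if obstacle_distance_m < 1.0 else 1.0
    bus.publish("cmd_speed", speed)


# The "controller" here simply records the commands it receives.
bus.subscribe("obstacle_distance", planner)
bus.subscribe("cmd_speed", commands.append)

# The "perception" module publishes two readings.
bus.publish("obstacle_distance", 3.0)
bus.publish("obstacle_distance", 0.5)
```

Notice what the sketch does not contain: nothing in it knows what hardware it is running on, nothing bounds how long a callback may take, and nothing learns from what happened. Those are exactly the gaps named above.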

It is not NVIDIA Isaac

Isaac is also a very good product. In simulation, training, and synthetic data, it provides the industry with very important infrastructure.

But Isaac is not the operating system for the humanoid robotics era.

Isaac is a training-time platform — it lets a robot learn to do something in a simulated world. Its capability boundary stops at "training" and "simulation."

It does not solve one thing — how a real robot runs every millisecond, every day, every year in a real factory, a real home, or on real roads.

There is a distinction in the industry that has been seriously underestimated — training-time and runtime are two completely different things.

Getting a robot to "learn" an action, and getting a robot to walk stably, move safely, and keep doing it continuously in the real physical world — these are two different engineering problems that require two different system capabilities.

Isaac does the "training-time" side very solidly. But on the "runtime" side, the industry today has no widely recognized answer.

It is not a VLA model

VLA (Vision-Language-Action) models are one of the most watched directions in the humanoid robotics industry from 2024 to 2026. Physical Intelligence, Google DeepMind, Galbot, AgiBot, and many other excellent teams are working on this.

A VLA model is not the operating system for the humanoid robotics era.

A VLA model is doing something very important — it solves the question of "what should the robot do?" Given a language instruction and a visual input, the VLA model outputs a high-level action intent.

This is the robot's cognitive brain. It is extremely important. But it does not solve another class of problems —

Once that high-level intent is given, how do you turn it into millisecond-stable joint torque output? How do you ensure it does not fall, hurt people, or damage itself during execution? How do you keep it stable 24/7? How do you keep it working as hardware wears, sensors drift, and environments change?

This class of problems is not solved by VLA models, and VLA models are not meant to solve it — because they deal with the "what to do" layer.

To turn a brain that "knows what to do" into a robot that can truly "do it stably, safely, and continuously," a whole system capability is needed in between. Today, that system capability does not yet have a universally accepted name in the industry.

So What Is It?

After talking about the three things it is not, let us talk about what it is.

We call it — Runtime Robot OS, a runtime robot operating system.

The keyword in this name is "Runtime."

It is the counterpart to "training-time." Training-time is about "can the robot learn it," while runtime is about "can the robot run in the real world." These two things are equally important, but they are two different problems that require two different systems.

Runtime Robot OS must have three things at the same time:

Multi-embodiment hardware abstraction (Multi-Embodiment Abstraction)

It must let upper-layer policies, models, and applications remain unaware of differences in the underlying hardware — whether bipedal, quadrupedal, or wheeled-bipedal, whether the joints are direct-drive or gear-reduced, and no matter how the sensor stack is configured, the upper layer should access them through one unified interface.

Historically, operating systems have already done this twice:

PC operating systems let applications ignore whether the CPU is Intel or AMD, or which GPU is installed
Mobile operating systems let apps ignore which phone maker's hardware they are running on

Runtime Robot OS must do it a third time — let robot applications ignore which embodiment they are on.
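As a rough illustration of what "one unified interface" could look like, here is a minimal sketch. Every name in it (`Embodiment`, `JointState`, `SimulatedBody`, `hold_position`) is an assumption made for this example, not an actual BridgeDP interface, and the "bodies" are stand-ins that only store the last command.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class JointState:
    positions: List[float]   # rad
    velocities: List[float]  # rad/s


class Embodiment(ABC):
    """Hypothetical unified interface that upper-layer code programs against."""

    @property
    @abstractmethod
    def num_joints(self) -> int: ...

    @abstractmethod
    def read_state(self) -> JointState: ...

    @abstractmethod
    def write_torques(self, torques: List[float]) -> None: ...


class SimulatedBody(Embodiment):
    """Stand-in for a real driver: stores the last command instead of moving motors."""

    def __init__(self, name: str, joints: int) -> None:
        self.name = name
        self._joints = joints
        self.last_torques = [0.0] * joints

    @property
    def num_joints(self) -> int:
        return self._joints

    def read_state(self) -> JointState:
        return JointState([0.0] * self._joints, [0.0] * self._joints)

    def write_torques(self, torques: List[float]) -> None:
        if len(torques) != self._joints:
            raise ValueError("torque vector length must match joint count")
        self.last_torques = list(torques)


def hold_position(body: Embodiment, kp: float = 20.0, kd: float = 1.0) -> None:
    """Embodiment-agnostic routine: PD control toward the zero pose."""
    state = body.read_state()
    torques = [-kp * q - kd * v for q, v in zip(state.positions, state.velocities)]
    body.write_torques(torques)


# The same upper-layer routine runs on two bodies with different joint counts.
biped = SimulatedBody("biped", joints=12)
quadruped = SimulatedBody("quadruped", joints=16)
for body in (biped, quadruped):
    hold_position(body)
```

The point of the sketch is the call site: `hold_position` never asks which body it is driving. A real abstraction would also have to cover control frequency, sensor layout, and actuator limits, which this example leaves out.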

Real-time safe execution at millisecond scale (Real-time Safe Execution)

It must guarantee at every millisecond (not every second, not every 100 milliseconds) that the robot's joint torque output, force-control constraints, and safety boundaries stay stable, predictable, and under control.

This is the biggest difference between robots and "ordinary software" — if ordinary software stutters, the user can just restart it; if a robot stutters, it may hurt people, itself, or the environment.

Millisecond-level real-time safety is the hard baseline of Runtime Robot OS.
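To show the shape of such a guarantee, here is a small sketch of a per-tick safety filter. It assumes two simple rules that are my own illustration, not a description of any shipping system: torque commands are rate-limited and clamped, and a command that arrives after its deadline is discarded in favor of ramping toward zero torque. Python itself is not a real-time runtime, so this only shows the logic; the names `SafetyLimits`, `safe_torques`, and `control_tick` are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SafetyLimits:
    max_torque: float       # N*m, absolute bound per joint
    max_torque_step: float  # N*m, largest allowed change per control tick


def safe_torques(requested: List[float], previous: List[float],
                 limits: SafetyLimits) -> List[float]:
    """Rate-limit each joint's torque change, then clamp to the absolute bound."""
    out = []
    for req, prev in zip(requested, previous):
        step = max(-limits.max_torque_step, min(limits.max_torque_step, req - prev))
        value = prev + step
        out.append(max(-limits.max_torque, min(limits.max_torque, value)))
    return out


def control_tick(policy: Callable[[], List[float]], previous: List[float],
                 limits: SafetyLimits, deadline_s: float,
                 clock: Callable[[], float] = time.perf_counter) -> List[float]:
    """Run one control tick; a late policy result is treated as unusable."""
    start = clock()
    requested = policy()
    if clock() - start > deadline_s:
        # Deadline missed: drop the stale command and ramp toward zero torque.
        requested = [0.0] * len(previous)
    return safe_torques(requested, previous, limits)


limits = SafetyLimits(max_torque=10.0, max_torque_step=2.0)
command = control_tick(lambda: [5.0, 5.0], previous=[4.0, 4.0],
                       limits=limits, deadline_s=0.001)
```

The design choice worth noticing is that the safety filter sits after the policy and does not trust it: whatever the upper layer asks for, the output stays inside the limits, and lateness is handled as a fault rather than ignored.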

Continual real-world alignment (Continual Real-world Alignment)

A robot is not a device that is flashed once and never changes. It runs in the real world, its sensors drift, its hardware wears down, its environment changes, and its tasks evolve.

Runtime Robot OS must have the ability to capture the operating experience of every robot in the real world, store it, feed it back, train the next generation of capability from it, and safely push it back to all robots.

This is the real "data flywheel" of the robot era. It is not the same thing as the "reinforcement learning from human feedback" in today's large-model industry: the latter operates on text; the former operates on physical experience.
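The "capture, store, feed back, train, push back safely" loop can be sketched in a few lines. This is a deliberately simplified illustration: training is left out entirely, "experience" is reduced to a success flag per episode, and the promotion rule (a candidate policy must have enough real-world runs and must not do worse than the baseline) is an assumption of this example. The names `Episode`, `Fleet`, and `promote_if_better` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Episode:
    robot_id: str
    policy_version: int
    success: bool


@dataclass
class Fleet:
    deployed: Dict[str, int]  # robot_id -> policy version currently running
    log: List[Episode] = field(default_factory=list)

    def record(self, robot_id: str, success: bool) -> None:
        """Capture one real-world episode under the robot's current policy."""
        self.log.append(Episode(robot_id, self.deployed[robot_id], success))

    def run_count(self, version: int) -> int:
        return sum(1 for e in self.log if e.policy_version == version)

    def success_rate(self, version: int) -> float:
        runs = [e.success for e in self.log if e.policy_version == version]
        return sum(runs) / len(runs) if runs else 0.0

    def promote_if_better(self, candidate: int, baseline: int, min_runs: int) -> bool:
        """Push the candidate to every robot only if real-world data supports it."""
        if self.run_count(candidate) < min_runs:
            return False
        if self.success_rate(candidate) < self.success_rate(baseline):
            return False
        for robot_id in self.deployed:
            self.deployed[robot_id] = candidate
        return True


# Robot "a" is a canary running candidate version 2; the rest run version 1.
fleet = Fleet(deployed={"a": 2, "b": 1, "c": 1})
for i in range(10):
    fleet.record("a", success=(i != 0))  # 9 of 10 succeed
    fleet.record("b", success=(i >= 2))  # 8 of 10 succeed
promoted = fleet.promote_if_better(candidate=2, baseline=1, min_runs=10)
```

The gate is the important part: new capability reaches the whole fleet only after it has been measured on real robots, which is what "safely push it back" has to mean in a physical system.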

Existing systems do not do all three of these at once.

ROS does part of the communication abstraction, but it has no cross-embodiment capability, no real-time safety, and no learning loop.

Isaac does part of the training-time capability, but it is not on the runtime side.

VLA does part of the cognitive abstraction, but it does not solve execution and continual learning.

Only when one system simultaneously does all three does it qualify as the "operating system of the robotics era."

Why We Believe This Is Feasible

At this point, someone may ask —

"What you are describing sounds very important, but can it really be built? Or is this just a PPT concept?"

That is a very reasonable question. I want to answer it seriously.

Three years ago, when BridgeDP was founded, we made a decision that carries a deep meaning for us today — we would not build full robots, not build large models, and not build applications; we would focus only on the "motion control" layer.

That decision did not look very smart in 2023 — at that time, building full robots was sexy, building large models was hot, and building applications had customers. Focusing only on the "middle layer" was hard to explain externally and also brought a lot of internal pressure.

But we made that decision based on one judgment:

The robotics industry will eventually become layered. Whoever can turn motion capability from "project delivery" into "infrastructure" will control one of the key entry points into the AI-ification of the physical world.

Three years later, we can now present some facts that can be verified:

26 humanoid robotics companies are using our motion control capability
More than 50 legged robots with clearly different structures — bipedal, quadrupedal, wheeled-bipedal, different joint layouts, different drive methods — are running on the same motion learning and control system
The engineering cycle from connecting a new robot's hardware to getting the first usable gait has been compressed from the original "project-scale man-months" to a matter of weeks
In essence, instead of building a separate system for every new embodiment, we use "one system + a toolchain that automatically adapts to new embodiments"

I am not saying these numbers to talk about BridgeDP. I am saying them to make one point —

We are not describing a future story. We are using engineering facts to prove the existence of this category.

This layer is feasible. It can be abstracted. It can be reused across embodiments. And 26 companies have already voted for it with their actions.

By this point, this is no longer a PPT concept; it is a fact repeatedly validated by engineering practice, although today it is still concentrated on the one pillar of "motion capability."

The full form of Runtime Robot OS goes far beyond motion capability alone. But if one pillar has been proven feasible by engineering, it means the overall feasibility of this system has its first foundation stone.

On-Device, Edge, Cloud — Three Layers of the Full Form

At this point, I want to push one step further and paint a bigger picture.

We have built the on-device OS kernel. But the full form of Runtime Robot OS goes far beyond the device side.

It will have three layers —

On-device

The millisecond-level real-time control layer. It lets every robot walk safely and move stably in the physical world.

This is the minimum threshold for a robot to "stay alive." It is also the core of what BridgeDP is doing today. This layer must run on the robot itself and cannot depend on the network — because network latency will directly turn into a robot fall.

Edge

The scene-level skill orchestration layer. It lets multiple robots in a campus, on a production line, or in a home work together.

This layer does not require millisecond-level real-time, but it does require second-level coordination. Task allocation, space sharing, and capability complementarity among multiple robots — this does not happen on the device side (which has no global view), and it does not happen in the cloud (where latency is too high). It happens at the edge.

Cloud

The cross-embodiment capability accumulation and continual learning layer. It turns the experience of every robot into the capability of every robot.

This is the real "data flywheel": if a robot in one home learns how to open a refrigerator, that capability can, subject to safety, privacy, and ownership constraints, become a shared capability for all robots of the same model.

All three layers are indispensable. Without the device layer, robots cannot survive in the physical world; without the edge layer, robots cannot collaborate in group scenarios; without the cloud layer, robots do not have true evolutionary capability.
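The division of labor among the three layers can be read as a placement rule: a task goes to the nearest layer that can see the data it needs, and that layer's latency must fit inside the task's deadline. Here is a minimal sketch of that rule. The round-trip figures and the scope names ("robot", "site", "fleet") are assumptions picked for illustration, not measurements.

```python
from enum import Enum


class Layer(Enum):
    DEVICE = "device"
    EDGE = "edge"
    CLOUD = "cloud"


# Assumed worst-case round-trip delays in milliseconds; illustrative values only.
ASSUMED_ROUND_TRIP_MS = {Layer.DEVICE: 0.0, Layer.EDGE: 20.0, Layer.CLOUD: 200.0}

# Nearest layer that can see the data a task needs (hypothetical scope names).
LAYER_FOR_SCOPE = {"robot": Layer.DEVICE, "site": Layer.EDGE, "fleet": Layer.CLOUD}


def place(task: str, deadline_ms: float, scope: str) -> Layer:
    """Pick the layer for a task, or fail if its deadline cannot be met there."""
    layer = LAYER_FOR_SCOPE[scope]
    if ASSUMED_ROUND_TRIP_MS[layer] >= deadline_ms:
        raise ValueError(
            f"{task}: needs {scope}-wide data but a {deadline_ms} ms deadline "
            f"cannot be met at the {layer.value} layer"
        )
    return layer


placements = {
    "joint torque control": place("joint torque control", 1.0, "robot"),
    "multi-robot task allocation": place("multi-robot task allocation", 1000.0, "site"),
    "cross-embodiment training": place("cross-embodiment training", 60000.0, "fleet"),
}
```

Under these assumed numbers, a task that needs a site-wide view but has a 1 ms deadline has no valid home, which is the same reason the torque loop has to stay on the robot.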

Building these three layers in full is the complete form of Runtime Robot OS. This is not a one- or two-year matter; it is a decade-scale engineering effort.

New Infrastructure — OS + Compute Network

History offers a few precedents for this.

In the PC era, the new infrastructure was called "operating systems + the Internet backbone" — Windows / Linux paired with TCP/IP and fiber networks, allowing applications to flow globally.

In the mobile era, the new infrastructure was called "Android + 4G/5G" — one unified app stack paired with a wireless network covering billions of people, making the mobile Internet possible.

In the robotics era, it will be called —

"Runtime Robot OS + compute network."

This is the new infrastructure of the robotics era.

Its two components are both indispensable:

Runtime Robot OS solves "how a robot runs stably, safely, and continuously"
Compute network solves "how the compute in the device, edge, and cloud layers for millions or tens of millions of robots can be uniformly scheduled, orchestrated, and served"

No single company can do both of these on its own. An OS company cannot possibly build a nationwide compute network by itself; nor can a network and compute infrastructure provider start from scratch and build a runtime operating system for the robotics era.

It requires — deep collaboration between OS providers and network / compute infrastructure providers.

This is happening now. We stood on that stage because we believe this is no longer just a vision; it is a real industry process, being driven forward by multiple parties.

Closing Note: This Is an Invitation

I want to close with a passage from the end of the speech.

Every time a computing paradigm shifts, history leaves behind a Unix moment — someone steps forward and turns fragments into infrastructure.

Someone said this in the mainframe era, and Unix came to be. Someone said this in the smartphone era, and Android came to be. Today, in the humanoid robotics era, it is our generation's turn to say it.

This is not one company's matter. Not one robot maker's matter. Not one operator's matter. Not one chipmaker's matter. Not any single company's matter.

This is the beginning of an era.

Over the next few months, we will present our thinking, our products, and our engineering practice to the industry one item at a time. We will invite peers in the industry to discuss, debate, and help refine it until it is closer to right.

But more important than all of that is —

Everything we write today is not a conclusion; it is an invitation.

We look forward to defining the shape this era's operating system should take together with every engineer, researcher, founder, investor, and policymaker who cares about the robotics industry.

If you also believe that "fragments should become infrastructure" —

this is the invitation.

Shang Yangxing

Founder & CEO, BridgeDP

May 17, 2026 · Shenzhen