π0.5: A VLA with open-world generalization
I'm curious where the bottleneck is.
Is it a throughput constraint from too much data coming off the environment sensors?
Or is it processing that data?
One way of doing that is to write code with no bugs or unpredictable behaviour, a nigh-impossible feat - especially once you've got ML models in the mix.
Another option is to put a guard cage around your robot so nobody can enter pan-throwing distance without deactivating the robot first. But obviously that's not practical in a home environment.
Another option is just to go slowly all the time. The pan won't fly very far if the robot only moves 6 inches per second.
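A minimal sketch of what that third option can look like in software, assuming a hypothetical driver that accepts joint velocities and exposes the arm's Jacobian (all names here are made up, not any real robot's API): clamp every command so the end effector never exceeds a fixed speed.

```python
import numpy as np

MAX_EE_SPEED = 0.15  # m/s at the end effector, roughly the "6 inches per second" above

def clamp_command(joint_velocities, jacobian):
    """Scale commanded joint velocities so end-effector speed stays under the cap."""
    ee_twist = jacobian @ joint_velocities   # Cartesian velocity of the tool
    speed = np.linalg.norm(ee_twist[:3])     # translational speed only
    if speed > MAX_EE_SPEED:
        joint_velocities = joint_velocities * (MAX_EE_SPEED / speed)
    return joint_velocities
```

The nice property of a scale-down like this is that it sits outside the learned policy, so it bounds the damage even when the model misbehaves.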
Some combination of distillation, new architectures, and faster compute can eventually attack these problems. Historically, once something in tech has been shown to be possible, speed has almost always become a non-issue in the years that follow.
For now, even getting a robot to understand what to do in the physical world is a major leap from where we were before.
It's slow because the original teleop is slow, and controllers learned through imitation are always a bit slower still.
Source: I work on this (not at PI).
π0 uses ARX robot arms, which weigh 3-4 kg each. They can easily break things or harm people if you allow them to move fast.
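As a rough illustration of why speed is the variable that matters, here is the kinetic energy at two speeds; the 3.5 kg effective moving mass and both speeds are assumptions for the example, not ARX specs.

```python
# Back-of-the-envelope kinetic energy, E = 1/2 * m * v^2, for an assumed
# 3.5 kg of effective moving mass (arm link plus payload).
m = 3.5
for v in (0.15, 1.5):   # m/s: "6 inches per second" vs. a quick human-speed reach
    print(f"{v:.2f} m/s -> {0.5 * m * v**2:.2f} J")
# 0.15 m/s -> 0.04 J   (a harmless nudge)
# 1.50 m/s -> 3.94 J   (two orders of magnitude more energy to dump into whatever it hits)
```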
In order to complete a task, the robot needs to capture images from its cameras along with its current joint angles, and send them to the model together with the task text to get the joint-angle changes to apply. Depending on the model being used, we may get just one set of joint-angle deltas or a series of them. Once the joint angles are updated, we check whether the task is complete (this signal can come from the model too), and we run this loop until it is.
Combine this with the motion planning that has to happen to make sure the joint angles we get are safe and don't collide with the surroundings, and the result is overall slowness.
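To make that concrete, here is a minimal sketch of the loop described above, with hypothetical stand-ins for the robot, cameras, policy model, and planner; this is not openpi's actual API.

```python
import numpy as np

def run_task(robot, cameras, model, planner, task_text, max_steps=500):
    """Sketch of the perceive -> infer -> plan -> act loop.
    All objects passed in (robot, cameras, model, planner) are assumed interfaces."""
    for _ in range(max_steps):
        # 1. Capture the current observation.
        images = [cam.read() for cam in cameras]
        joints = robot.get_joint_angles()

        # 2. Query the policy: images + joint state + task text in,
        #    one or more joint-angle deltas (an "action chunk") out.
        deltas, done = model.predict(images=images, joints=joints, prompt=task_text)

        # 3. Check each step against the planner before executing it,
        #    so the commanded joints stay collision-free.
        for delta in np.atleast_2d(deltas):
            target = joints + delta
            if not planner.is_safe(target):
                break                          # replan on the next loop iteration
            robot.move_to_joint_angles(target)
            joints = target

        # 4. Stop once the model (or an external checker) says the task is complete.
        if done:
            return True
    return False
```

Every pass through the loop pays for camera capture, model inference, and collision checking before any motion happens, which is where much of the slowness comes from.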
By the way, they’ve open-sourced their π0 model (code and model weights). More information can be found here: https://github.com/Physical-Intelligence/openpi
Doing your demo is significantly easier if you've already programmed/trained the robot to recognize the specific objects it has to interact with, even if those items are in different locations.
I think the future is in "softer" types of robots that can sense when their fingers are pushing against a cabinet door (i.e., meeting resistance) and adjust accordingly. A quick Google search turns up this example (an animated render), which is closer to what I imagine the ultimate solution will look like: https://compliance-robotics.com/compliance-industry/
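For a sense of what "sense resistance and adjust" can mean at the simplest level, here is a toy guarded-move sketch with hypothetical robot methods; real compliant hardware or impedance control does this far better and at much higher rates inside the joint controller.

```python
# Toy "guarded move": creep forward and back off as soon as the wrist force sensor
# reports resistance. read_wrist_force and move_relative are assumed stand-ins.
FORCE_LIMIT = 5.0   # newtons; an assumed safe contact force for nudging a cabinet door

def guarded_push(robot, direction, step=0.005, max_distance=0.20):
    """Advance in 5 mm steps along `direction` (a unit vector) until contact is felt."""
    travelled = 0.0
    while travelled < max_distance:
        if robot.read_wrist_force() > FORCE_LIMIT:
            robot.move_relative(-step * direction)   # ease off instead of pushing through
            return "contact"
        robot.move_relative(step * direction)
        travelled += step
    return "no_contact"
```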
Human flesh is way too squishy for us to allow hard tools to interface with it, unless the human is in control. The difference between a blunt weapon and the robot from TFA is that the latter is very slow and on wheels.