Core AI Systems in Humanoid Robotics
Analysis of Sensor Data
A humanoid robot receives information about its surroundings, and about its own state at a given time and location, through its sensors. This information is used to decide on the proper course of action and then to execute those decisions. Sensor inputs are often fused, i.e. analyzed as combined inputs, in order to obtain better overall information and improve accuracy. Important aspects of sensor data analysis include time dependence, noise filtering, extraction of useful data, and classification of data. See our post here for more detail on this topic.
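To make the fusion and noise-filtering ideas above concrete, here is a minimal sketch of a complementary filter, a common way to combine two noisy sensor streams, a gyroscope (accurate short term) and an accelerometer (accurate long term), into one tilt-angle estimate. The blending weight `ALPHA`, the sample rate, and the synthetic readings are all invented for illustration.

```python
# Complementary filter sketch: blend an integrated gyro rate with a
# noisy accelerometer angle. ALPHA and DT are hypothetical tuning values.
ALPHA = 0.98  # weight given to the gyro-integrated estimate
DT = 0.01     # sample period in seconds (assumed 100 Hz)

def complementary_filter(angle_prev, gyro_rate, accel_angle):
    """One filter step: trust the gyro short term, the accelerometer long term."""
    return ALPHA * (angle_prev + gyro_rate * DT) + (1.0 - ALPHA) * accel_angle

angle = 0.0
# Synthetic stream: steady 10 deg/s rotation, accelerometer angle with +/-2 deg noise.
samples = [(10.0, 0.1 * i + ((-1) ** i) * 2.0) for i in range(100)]
for gyro_rate, accel_angle in samples:
    angle = complementary_filter(angle, gyro_rate, accel_angle)
```

Each step keeps 98% of the smooth gyro prediction and mixes in 2% of the noisy but drift-free accelerometer reading, which suppresses both the gyro's drift and the accelerometer's jitter.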
Cognitive AI
Cognitive AI encompasses perception, reasoning, learning, decision making and planning, which are all key aspects of human intelligence. Without cognitive AI, a robot is merely an automated, programmed machine, unable to respond to dynamically changing environments with adequate responses. Below we briefly explain the components of cognitive AI.

In recent years, there have been great advances in AI, mainly through LLMs (Large Language Models), which can understand and respond like humans in many ways and have already boosted productivity in many fields. However, these are still not cognitive AI or AGI. Cognitive AI is often used to mean one step further: AI that can think like a human, also referred to as AGI (Artificial General Intelligence), meaning AI that can practically think and reason at least at a human level in every way possible (if any aspect is missing, it would not be AGI). As of the writing of this article, June 2025, many experts believe that AGI will be achieved in the not so distant future; based on the observations of the author, expert estimates range anywhere from 2 to 15 years.

We humans live in a physical world, therefore in all aspects of AI listed below, mastering the physical world is an inherently important component. This is still not fully understood, but great advancements are already being made here as well. Many firms have developed robot simulation systems, which simulate the physical world and allow a robot to be tested entirely virtually, which is far faster and cheaper than physical testing.
Perception:
A humanoid robot must perceive its surroundings and feed this information to the learning and decision making aspects of cognitive AI. The key process here is perception through sensors such as visual, audio, tactile and ultrasonic inputs, and the analysis of their data. Audio perception includes recognizing and understanding speech, i.e. natural language processing.
Learning:
AI learning (as far as robots are concerned) may be grouped into types of machine learning such as learning from human demonstration, continuous learning from user feedback, trial and error learning (also called reinforcement learning), and adaptive learning, which is the modification of actions in response to external changes in the context. For more on AI learning, see our post here.
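As a toy illustration of the trial-and-error (reinforcement) learning mentioned above, here is a minimal tabular Q-learning loop: one state, two candidate actions, and an invented reward function that favors one of them. The hyperparameters and reward values are assumptions chosen for the sketch, not values from any real robot.

```python
import random

# Tabular Q-learning sketch: the agent learns by trial and error which of
# two actions earns more reward. All rewards and hyperparameters are invented.
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # learning rate, discount, exploration
q = {0: [0.0, 0.0]}  # one state (0), two actions

def reward(action):
    return 1.0 if action == 1 else 0.0  # action 1 is the "good" action

random.seed(0)  # fixed seed so the run is reproducible
for _ in range(200):
    # Epsilon-greedy: explore occasionally, otherwise take the best-known action.
    if random.random() < EPSILON:
        a = random.randrange(2)
    else:
        a = max((0, 1), key=lambda i: q[0][i])
    r = reward(a)
    # Q-update: nudge the estimate toward reward + discounted future value.
    q[0][a] += ALPHA * (r + GAMMA * max(q[0]) - q[0][a])
```

After training, the learned value of action 1 exceeds that of action 0, so the greedy policy picks the rewarding action, which is the essence of learning from trial and error.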
Reasoning and Decision Making:
To make decisions, an AI system uses various methods of reasoning, such as deductive, inductive and probabilistic reasoning. Decisions are based mainly on goals and environmental context.
Memory usage can mainly be classified into long term memory and contextual memory. Long term memory stores the knowledge learned from previous input or training. Contextual memory is more dynamic and serves the present moment: the robot keeps track of changing conditions and ongoing tasks while it continues decision making and planning.
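The two memory types described above can be sketched in a few lines: a persistent dictionary for long term knowledge and a bounded buffer that only keeps the most recent observations for context. The class and method names here are invented for illustration.

```python
from collections import deque

# Toy memory model: long_term persists indefinitely; context is a bounded
# buffer where old observations fall off automatically. Names are hypothetical.
class RobotMemory:
    def __init__(self, context_size=5):
        self.long_term = {}                        # learned knowledge, persists
        self.context = deque(maxlen=context_size)  # recent observations only

    def learn(self, key, fact):
        self.long_term[key] = fact

    def observe(self, event):
        self.context.append(event)  # oldest event is dropped when full

mem = RobotMemory(context_size=3)
mem.learn("kitchen_location", (4.0, 2.5))
for t in range(6):
    mem.observe(f"obstacle_at_t{t}")
print(list(mem.context))  # → ['obstacle_at_t3', 'obstacle_at_t4', 'obstacle_at_t5']
```

The learned fact survives indefinitely, while only the three most recent observations remain in context, mirroring the long-term/contextual split described above.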
Safety protocols and ethical concerns are also integrated into decisions.
For more on reasoning, decision making and planning of AI, see our post here.
Planning:
Task planning means making plans to reach goals and breaking complex tasks down into simpler ones.
Planning also means changing course of action if something unexpected happens. Deciding, fully autonomously, how to break a complex task down into simpler ones is one of the biggest challenges here.
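The breakdown of a complex task into simpler ones can be sketched as a recursive expansion against a task library. The library below, and the tea-making tasks in it, are invented purely for illustration; in a real robot this decomposition would itself have to be produced autonomously, which is the hard part noted above.

```python
# Hierarchical task decomposition sketch: a composite task is expanded,
# depth-first, into primitive actions. The task library is invented.
TASK_LIBRARY = {
    "make_tea": ["boil_water", "prepare_cup"],
    "boil_water": ["fill_kettle", "turn_on_kettle"],
    "prepare_cup": ["get_cup", "add_teabag"],
}

def expand(task):
    """Recursively break a task into primitives (tasks with no sub-steps)."""
    if task not in TASK_LIBRARY:
        return [task]  # primitive action, executable directly
    steps = []
    for sub in TASK_LIBRARY[task]:
        steps.extend(expand(sub))
    return steps

print(expand("make_tea"))
# → ['fill_kettle', 'turn_on_kettle', 'get_cup', 'add_teabag']
```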
Computer Vision
This means recognizing faces, objects, motions and surroundings, and the relative position of everything in the robot's environment. Computer vision also encompasses mapping the surroundings with Lidar, where applicable.
Vision in humanoid robots is made possible by visual sensors, and algorithms that enable perception.
First a scene is captured by a camera in the form of pixels. This raw input is preprocessed through algorithms to reduce noise in the data. Through object detection algorithms such as YOLO and DETR (both of which are open source), objects are not only recognized but also located with respect to their surroundings. In addition to object recognition, vision involves tasks such as depth estimation, understanding boundaries, scene understanding, facial recognition, and pixel level analysis such as segmentation, to distinguish objects and people.
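One step detectors such as YOLO perform after inference is post-processing of their raw predictions: discarding low-confidence boxes and applying non-maximum suppression (NMS) so each object is reported only once. Here is a hedged, self-contained sketch of that stage; the box coordinates and scores are invented sample data, not real detector output.

```python
# Detection post-processing sketch: confidence filtering + non-maximum
# suppression. Boxes are (x1, y1, x2, y2, score); sample data is invented.
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(detections, conf_thresh=0.5, iou_thresh=0.5):
    dets = [d for d in detections if d[4] >= conf_thresh]  # drop weak boxes
    dets.sort(key=lambda d: d[4], reverse=True)            # strongest first
    kept = []
    for d in dets:
        # Keep a box only if it does not heavily overlap one already kept.
        if all(iou(d[:4], k[:4]) < iou_thresh for k in kept):
            kept.append(d)
    return kept

raw = [
    (10, 10, 50, 50, 0.9),     # strong detection
    (12, 12, 52, 52, 0.8),     # overlapping duplicate, suppressed
    (100, 100, 140, 140, 0.7), # separate object, kept
    (0, 0, 5, 5, 0.2),         # below confidence threshold, dropped
]
print(nms(raw))  # two boxes survive: the 0.9 and the 0.7 detection
```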
Navigation and Path Planning
This is planning the overall path the robot needs to take in order to get from point A to point B in its surroundings. It involves recognizing and avoiding obstacles while planning the most efficient route to the target destination, making real time decisions along the way.
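A minimal version of this idea is grid-based path planning with breadth-first search, which finds a shortest obstacle-free route on a map. The small grid below is invented (1 marks an obstacle); real planners work on maps built from sensor data and typically use more sophisticated algorithms such as A*.

```python
from collections import deque

# Breadth-first search path planner on a toy occupancy grid (1 = obstacle).
GRID = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]

def plan_path(start, goal):
    """Return a shortest list of (row, col) cells from start to goal, or None."""
    queue = deque([start])
    came_from = {start: None}  # also serves as the visited set
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:       # walk back to the start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < len(GRID) and 0 <= nc < len(GRID[0])
                    and GRID[nr][nc] == 0 and (nr, nc) not in came_from):
                came_from[(nr, nc)] = cell
                queue.append((nr, nc))
    return None  # goal unreachable

print(plan_path((0, 0), (3, 3)))
# → [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 3), (3, 3)]
```

BFS guarantees the shortest route in steps; replanning when an obstacle appears amounts to updating the grid and searching again.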
Motion Planning
Motion planning is managing the balance, stability, and force/torque/velocity control of joints and actuators in order to manage the robot's own body and limb positions, achieving motions such as walking, running, grasping, and adjusting torso position.
While moving, a change of position must be performed safely, efficiently and collision free, while at the same time accounting for changes in the environment. Kinematics concepts are applied to compute the motor movements required to reach the target body state. There are many variables and a complex level of physics and math involved in motion planning: torque and speed capabilities, joint limits, center of mass, balance, inertia and momentum, and real time sensor feedback must all be considered. The robot's overall balance and gait are constantly checked during bipedal motion using principles such as the zero moment point, capture point, and inverted pendulum models.
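As a small taste of the kinematics mentioned above, here is the forward kinematics of a planar two-link chain (e.g. thigh and shin): given two joint angles, compute where the end of the limb lands. The link lengths are invented example values; real humanoid limbs have many more degrees of freedom and are solved in 3D.

```python
import math

# Planar 2-link forward kinematics sketch. Link lengths are assumed values.
L1, L2 = 0.4, 0.4  # thigh and shin lengths in meters (hypothetical)

def forward_kinematics(theta1, theta2):
    """End point (x, y) of the chain; angles in radians, theta2 relative to link 1."""
    x1 = L1 * math.cos(theta1)            # knee position
    y1 = L1 * math.sin(theta1)
    x2 = x1 + L2 * math.cos(theta1 + theta2)  # foot position
    y2 = y1 + L2 * math.sin(theta1 + theta2)
    return x2, y2

# Fully extended along the x-axis: both angles zero puts the foot at (0.8, 0).
x, y = forward_kinematics(0.0, 0.0)
print(round(x, 3), round(y, 3))  # → 0.8 0.0
```

The inverse problem, finding joint angles that place the foot at a desired point, is what the controller must actually solve at every step, subject to the joint limits and balance constraints listed above.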
Interaction with Humans and Consciousness
A humanoid robot must have an acceptable level of interaction skills with its environment and with humans, as the ultimate goal of building a humanoid robot is for it to exist near humans, performing tasks previously done by humans.
The robot must be able to detect emotions from humans by analyzing body language, facial expressions, and tone of voice. Similarly, the robot can convey a sense of emotion through its own body, face, gestures and voice. This is purely for human convenience, of course, as real emotional awareness or self awareness has not been achieved so far, and it is very much debated whether it ever will be.

In the author's opinion, AI can mimic self awareness and emotions so well that it will become impossible to distinguish the real thing from AI, but in the end, no matter how advanced, it is merely a set of constructed hardware and algorithms. Think of it this way: our consciousness certainly does not come from our arms or legs, or even our heart. By such materialistic thinking, the only remaining candidate is our brain. But this logic has a flaw: science will eventually be able to construct a brain that is a thousand times better than a human brain in every way. Will that brain, a thousand times better than our biological brain, have consciousness? Of course not, because, again, it is ultimately a set of hardware and algorithms. And if even such a brain does not have consciousness, then our consciousness must not stem from our brain, which leaves the "soul" as the only remaining choice for its source. One might argue here that we cannot yet even adequately define what consciousness means, but that would be a way of intentionally avoiding accepting that consciousness is real, in order to avoid admitting that its source must be something not of material origin. Consciousness exists; you are consciously reading this article at this moment. "I think, therefore I am."
By: A. Tuter
——————————————————
Terms of use:
Copying or republishing of any type of content from this website is not allowed without written permission from us. We make dated records of our posts, keep originals of our images.
The content in this website may contain errors, inaccuracies or be incomplete. User assumes all liability as a result of usage of content in our site. Also see our Terms page for more details.