The distinction between intuition and reason is important for robotic natural language understanding systems, because most natural language expressions that people utter are intuitive and can confound robots. To remove this human-robot cognitive divide, robots must be given the capability to perform systematic bi-directional translation between mental image description language (Lmd) expressions yielded by human cognition and those yielded by robotic cognition. Mental image directed semantic theory has been applied to several versions of the intelligent system IMAGES, in which robot manipulation is to be realized as a cross-media operation via Lmd. Semantic understanding of a human verbal suggestion makes the robot abstractly aware of which matters and attributes involved in its sensations should be attended to, while pragmatic understanding of the suggestion provides the robot with a concrete idea of the real matters, with real attribute values, that are significant for its action.