Yes. That’s exactly how we got the first image-generating AIs - people took a huge number of pictures and described in detail what’s in them. That’s how an AI knows how to generate “a cat in a space suit standing on a moon” - there were a lot of pictures labeled “cat”, “space suit”, “standing”, “moon” etc., and the AI distilled what the images matching each description have in common.
And there are plenty of use cases for having a description of what’s in an image. For example, searching through images based on what’s in them.
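To make the search idea concrete, here is a minimal sketch in Python. The file names, the captions, and the `search_images` helper are all made up for illustration, and real image search would likely be smarter than plain word matching:

```python
# Hypothetical captions: file names and descriptions are invented for this example.
captions = {
    "img_001.jpg": "a cat in a space suit standing on the moon",
    "img_002.jpg": "a dog running on a beach",
    "img_003.jpg": "a cat sleeping on a sofa",
}


def search_images(query, captions):
    """Return the sorted file names whose caption contains every word of the query."""
    words = query.lower().split()
    return sorted(
        name
        for name, text in captions.items()
        if all(word in text.lower().split() for word in words)
    )


print(search_images("cat", captions))       # both cat pictures
print(search_images("cat moon", captions))  # only the space-suit cat
```

Once every picture has a text description, finding pictures is just a text search over those descriptions.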
Yep, he was full of absurd dissonance like that.