Helper functions for processing and integrating vision-language information with Qwen-VL series models.