zjr2000
Master student at SUSTech, ShenZhen, China. My research focuses on Computer Vision, specifically exploring the intersection of vision and language learning.
Repositories
Select a repository to view its commits, contributors, and more.Awesome-Multimodal-Chatbot
Awesome Multimodal Assistant is a curated list of multimodal chatbots/conversational assistants that utilize various modes of interaction, such as text, speech, images, and videos, to provide a seamless and versatile user experience.
LLMVA-GEBC
Winner solution to Generic Event Boundary Captioning task in LOVEU Challenge (CVPR 2023 workshop)
GVL
Official implementation for paper Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos
REVERIE
[ECCV2024] Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models
Untrimmed-Video-Feature-Extractor
A simple and effective feature extractor for untrimmed videos
Context-GEBC
Second-place solution to Generic Event Boundary Captioning task in LOVEU Challenge (CVPR 2022 workshop)