
Awesome-Multimodal-Chatbot

Awesome Multimodal Assistant is a curated list of multimodal chatbots and conversational assistants that combine multiple modes of interaction, such as text, speech, images, and video, to provide a seamless and versatile user experience. These assistants are designed to help users with tasks ranging from simple information retrieval to complex multimedia reasoning.

Multimodal Instruction Tuning

  • MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning

    arXiv 2022/12 [paper]

  • GPT-4

    arXiv 2023/03 [paper] [blog]

  • Visual Instruction Tuning

    arXiv 2023/04 [paper] [code] [project page] [demo]

  • MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models

    arXiv 2023/04 [paper] [code] [project page] [demo]

  • mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality

    arXiv 2023/04 [paper] [code] [demo]

  • LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model

    arXiv 2023/04 [paper] [code] [demo]

  • Video-LLaMA: An Instruction-Finetuned Visual Language Model for Video Understanding

    [code]

  • LMEye: An Interactive Perception Network for Large Language Models

    arXiv 2023/05 [paper] [code]

  • MultiModal-GPT: A Vision and Language Model for Dialogue with Humans

    arXiv 2023/05 [paper] [code] [demo]

  • X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages

    arXiv 2023/05 [paper] [code] [project page]

  • Otter: A Multi-Modal Model with In-Context Instruction Tuning

    arXiv 2023/05 [paper] [code] [demo]

  • InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning

    arXiv 2023/05 [paper] [code]

  • InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language

    arXiv 2023/05 [paper] [code] [demo]

  • VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

    arXiv 2023/05 [paper] [code]

  • Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models

    arXiv 2023/05 [paper] [code] [project page]

  • EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought

    arXiv 2023/05 [paper] [code] [project page]

  • DetGPT: Detect What You Need via Reasoning

    arXiv 2023/05 [paper] [code] [project page]

  • PathAsst: Redefining Pathology through Generative Foundation AI Assistant for Pathology

    arXiv 2023/05 [paper] [code]

  • ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst

    arXiv 2023/05 [paper] [code] [project page]

  • Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models

    arXiv 2023/06 [paper] [code]

  • LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark

    arXiv 2023/06 [paper]

  • Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation

    arXiv 2023/06 [paper] [project page]

  • Valley: Video Assistant with Large Language Model Enhanced Ability

    arXiv 2023/06 [paper] [code]

LLM-Based Modularized Frameworks

  • Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models

    arXiv 2023/03 [paper] [code] [demo]

  • ViperGPT: Visual Inference via Python Execution for Reasoning

    arXiv 2023/03 [paper] [code] [project page]

  • TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs

    arXiv 2023/03 [paper] [code]

  • ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions

    arXiv 2023/03 [paper] [code]

  • MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action

    arXiv 2023/03 [paper] [code] [project page] [demo]

  • HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face

    arXiv 2023/03 [paper] [code] [demo]

  • VLog: Video as a Long Document

    [code] [demo]

  • Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions

    arXiv 2023/04 [paper] [code]

  • ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System

    arXiv 2023/04 [paper] [project page]

  • VideoChat: Chat-Centric Video Understanding

    arXiv 2023/05 [paper] [code] [demo]