
visual-question-answering


Commits

List of commits on branch master.
Verified
8e387cef1bb999cf54871630efb82cc432fa9a10

Delete .DS_Store

mmayank26saxena committed 5 years ago
Unverified
b9010f3c02e314fb4785a7e491ff546ce4600ff0

adding code

committed 5 years ago

README


Visual Question Answering

Built four different neural network models for visual question answering using TensorFlow 2.0. The models were trained on MS COCO images together with the questions and answers from the VQA v2.0 dataset.

YouTube Demo

URL: https://www.youtube.com/watch?v=5wNP7VoB4tM

Dataset

We used the VQA v2 dataset to train the models.
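As a rough illustration of how training examples can be assembled from this dataset, the sketch below joins a questions file with an annotations file on `question_id`. It assumes the publicly documented VQA v2 JSON layout (a `questions` list and an `annotations` list with a `multiple_choice_answer` field). The function name `pair_questions_with_answers` and the two toy records are hypothetical and are not taken from this repository.

```python
import json
from typing import List, Tuple

# Toy stand-ins for the VQA v2 questions and annotations files.
# The records are invented for illustration; only the key names
# follow the published VQA v2 JSON layout.
SAMPLE_QUESTIONS = json.dumps({
    "questions": [
        {"image_id": 42, "question_id": 42000, "question": "What color is the cat?"},
        {"image_id": 42, "question_id": 42001, "question": "Is the cat sleeping?"},
    ]
})
SAMPLE_ANNOTATIONS = json.dumps({
    "annotations": [
        {"image_id": 42, "question_id": 42000, "multiple_choice_answer": "black"},
        {"image_id": 42, "question_id": 42001, "multiple_choice_answer": "yes"},
    ]
})


def pair_questions_with_answers(
    questions_json: str, annotations_json: str
) -> List[Tuple[int, str, str]]:
    """Return (image_id, question, answer) triples joined on question_id."""
    questions = json.loads(questions_json)["questions"]
    annotations = json.loads(annotations_json)["annotations"]

    # Index the most common answer for each question by its id.
    answer_by_qid = {
        a["question_id"]: a["multiple_choice_answer"] for a in annotations
    }

    # Keep only questions that have an annotation.
    return [
        (q["image_id"], q["question"], answer_by_qid[q["question_id"]])
        for q in questions
        if q["question_id"] in answer_by_qid
    ]


pairs = pair_questions_with_answers(SAMPLE_QUESTIONS, SAMPLE_ANNOTATIONS)
```

With the real files, the two JSON strings would be read from disk instead, and each `image_id` would be mapped to the matching MS COCO image.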

Models

We implemented and compared four different models:

  • Model 1: Append Image as Word
  • Model 2: Prepend Image as Word
  • Model 3: Question through LSTM with image
  • Model 4: Attention Based Model
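The "image as word" idea behind Models 1 and 2 can be sketched in plain Python. This is only one possible reading of the model names, assuming the image feature vector is linearly projected to the word-embedding size and then added to the question sequence as one extra token. The helpers `project_image` and `image_as_word`, and all the numbers, are hypothetical and do not come from the repository. In TensorFlow the same step would typically be a dense layer followed by `tf.concat` along the time axis.

```python
from typing import List, Sequence

Vector = List[float]


def project_image(
    image_features: Sequence[float], weights: Sequence[Sequence[float]]
) -> Vector:
    """Linearly map image features into the word-embedding space.

    `weights` has one row per embedding dimension and one column per
    image feature (an assumed layout, chosen for readability).
    """
    return [sum(w * x for w, x in zip(row, image_features)) for row in weights]


def image_as_word(
    word_embeddings: Sequence[Vector], image_token: Vector, position: str = "append"
) -> List[Vector]:
    """Insert the projected image as one extra token in the question sequence."""
    if position == "append":   # Model 1 reading: image comes after the question
        return list(word_embeddings) + [image_token]
    if position == "prepend":  # Model 2 reading: image comes before the question
        return [image_token] + list(word_embeddings)
    raise ValueError("position must be 'append' or 'prepend'")


# Toy values: a two-word question with 2-d embeddings and a 3-d image feature.
question = [[1.0, 0.0], [0.0, 1.0]]
image_features = [1.0, 2.0, 3.0]
projection = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]

image_token = project_image(image_features, projection)
appended = image_as_word(question, image_token, "append")
prepended = image_as_word(question, image_token, "prepend")
```

Either sequence would then be fed to a recurrent encoder. The only difference between the two variants is whether the encoder sees the image before or after the question words.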

Accuracy

  • Trained the above models on 30K examples, starting with 30 epochs.

| Model | Train Accuracy | Train Loss | Test Accuracy | Test Loss |
|---|---|---|---|---|
| Model 1 | 19.47 % | 8.10 % | 19.43 % | 8.09 % |
| Model 2 | 19.40 % | 8.11 % | 19.43 % | 8.09 % |
| Model 3 | 18.31 % | 8.11 % | 18.35 % | 8.11 % |
| Model 4 | 22.49 % | 4.07 % | 24.57 % | 4.09 % |

Sample Predictions

[Image: sample predictions]