visual-question-answering

Commits

List of commits on branch master.
  • cf144e2532589b9472edb5f5e384a8e1251716ce: Delete .DS_Store (mmayank26saxena, committed 5 years ago, verified)
  • 7feb7924b814afc291794533a71765a68626f6cf: Update README.md (mmayank26saxena, committed 5 years ago, verified)
  • a31cc29343e5dc3cf8cdbeb42f1b475b5abf1272: Update README.md (mmayank26saxena, committed 5 years ago, verified)
  • e7c0b288665ab06c346823a423481d2bffe58bdb: Delete .DS_Store (mmayank26saxena, committed 5 years ago, verified)
  • 509135bd630cd15eb248e97e2914cbf7c91759e7: changed extension (committed 5 years ago, unverified)
  • ea77a9d5785e09489d76b48078605b7fd45207a6: restructuring files (committed 5 years ago, unverified)

README

Visual Question Answering

Built four neural network models for visual question answering using TensorFlow 2.0. Trained the models on MS COCO images together with the VQA 2.0 dataset.

YouTube Demo

URL: https://www.youtube.com/watch?v=5wNP7VoB4tM

Dataset

The models are trained on the VQA v2 dataset.
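
As a rough illustration of how VQA v2 data can be turned into training examples, the sketch below joins questions to their answers by `question_id`. The field names (`questions`, `annotations`, `question_id`, `image_id`, `multiple_choice_answer`) follow the published VQA v2 JSON layout. The function name and the tiny in-memory sample are made up for this example, and this is not necessarily how the repository prepares its data.

```python
def pair_questions_with_answers(questions, annotations):
    """Join VQA-style questions to their answers by question_id.

    Both arguments are dicts shaped like the parsed VQA v2 JSON files.
    Returns a list of (image_id, question, answer) tuples.
    """
    # Map each question_id to its single most common human answer.
    answers = {
        a["question_id"]: a["multiple_choice_answer"]
        for a in annotations["annotations"]
    }
    # Keep only the questions that have a matching annotation.
    return [
        (q["image_id"], q["question"], answers[q["question_id"]])
        for q in questions["questions"]
        if q["question_id"] in answers
    ]


# Hypothetical sample records, used here only to exercise the function.
questions = {
    "questions": [
        {"image_id": 42, "question_id": 42000, "question": "What color is the bus?"},
        {"image_id": 42, "question_id": 42001, "question": "Is it raining?"},
    ]
}
annotations = {
    "annotations": [
        {"image_id": 42, "question_id": 42000, "multiple_choice_answer": "red"},
    ]
}

examples = pair_questions_with_answers(questions, annotations)
print(examples)
```

In practice the two dicts would come from `json.load` on the downloaded question and annotation files, and `image_id` would be used to look up the matching MS COCO image.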

Models

Implemented and compared four models:

  • Model 1: Append Image as Word
  • Model 2: Prepend Image as Word
  • Model 3: Question through LSTM with image
  • Model 4: Attention Based Model
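
The names of Models 1 and 2 suggest treating the image as one extra "word" in the question sequence. The sketch below shows one plausible reading of that idea: project an image feature vector into the word-embedding space, then place it at the end (append) or start (prepend) of the question's embeddings before a sequence model such as an LSTM consumes them. It uses NumPy rather than TensorFlow to keep the shapes easy to follow. The function name, dimensions, and random inputs are assumptions for illustration, not the repository's actual code.

```python
import numpy as np


def image_as_word(question_emb, image_feat, proj, position="append"):
    """Insert a projected image vector into a question's embedding sequence.

    question_emb: (seq_len, embed_dim) word embeddings for one question
    image_feat:   (feat_dim,) image feature vector, e.g. from a pretrained CNN
    proj:         (feat_dim, embed_dim) projection into the word-embedding space
    """
    # Project the image so it has the same width as a word embedding.
    image_word = (image_feat @ proj)[np.newaxis, :]  # shape (1, embed_dim)
    if position == "append":
        return np.concatenate([question_emb, image_word], axis=0)
    if position == "prepend":
        return np.concatenate([image_word, question_emb], axis=0)
    raise ValueError(f"unknown position: {position!r}")


# Hypothetical sizes: 5 question tokens, 8-dim embeddings, 16-dim image features.
rng = np.random.default_rng(0)
question_emb = rng.normal(size=(5, 8))
image_feat = rng.normal(size=16)
proj = rng.normal(size=(16, 8))

appended = image_as_word(question_emb, image_feat, proj, "append")
prepended = image_as_word(question_emb, image_feat, proj, "prepend")
print(appended.shape, prepended.shape)  # (6, 8) (6, 8)
```

Either way the sequence grows by one step. The only difference is whether the sequence model sees the image before or after reading the question.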

Accuracy

  • Trained each of the above models on 30K examples, starting with 30 epochs.

           Train Accuracy   Train Loss   Test Accuracy   Test Loss
Model 1    19.47 %          8.10 %       19.43 %         8.09 %
Model 2    19.40 %          8.11 %       19.43 %         8.09 %
Model 3    18.31 %          8.11 %       18.35 %         8.11 %
Model 4    22.49 %          4.07 %       24.57 %         4.09 %

Sample Predictions

[Image: sample predictions]