
show-attend-and-tell

Public repository: 3 stars, 1 fork, 0 issues

Commits

List of commits on branch master:

0f1b8bc3ddae7b47fa852a035e154ff2bae5caf5 (Verified): Update README.md - Wentong-DST committed 7 years ago
2c835bad350930f3827a4fb2514b517839489f69 (Verified): Update prepro.py - Wentong-DST committed 7 years ago
07eb1485d1dee8ca54895d8ef902f6a9f17480ec (Verified): Update train.py - Wentong-DST committed 7 years ago
a2d33d432c4e7d399c56a235c4bfea97a1a4f4f0 (Verified): Update README.md - Wentong-DST committed 7 years ago
1b8ca3a0025f5c46b5153cc872dea9b198f276b6 (Verified): Update README.md - yunjey committed 7 years ago
1372f044a65b994df0e89af897db60e336354a1d (Verified): Merge pull request #42 from rubenvereecken/tf1.2.0 - yunjey committed 7 years ago

README


Show, Attend and Tell

Update (December 2, 2016): TensorFlow implementation of Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, which introduces an attention-based image caption generator. The model shifts its attention to the relevant part of the image as it generates each word.
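For reference, the core mechanism is soft attention over a grid of convolutional features: at each decoding step the LSTM hidden state scores every spatial location, the scores are normalized with a softmax, and the context vector fed to the decoder is the attention-weighted sum of the features. A minimal NumPy sketch of one such step (dimensions and variable names are illustrative, not the repository's code):

import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def soft_attention_step(features, h, W_f, W_h, w_att):
    # features: (L, D) conv features, e.g. L = 196 locations (14x14 grid), D = 512 channels
    # h:        (H,)   current LSTM hidden state
    # Returns the context vector (D,) and the attention weights alpha (L,).
    scores = np.tanh(features.dot(W_f) + h.dot(W_h)).dot(w_att)  # one score per location, shape (L,)
    alpha = softmax(scores)                                       # attention distribution over locations
    context = alpha.dot(features)                                 # (D,) weighted sum of the features
    return context, alpha

# Toy example with made-up dimensions.
rng = np.random.RandomState(0)
features = rng.randn(196, 512)
h = rng.randn(1024)
W_f, W_h, w_att = rng.randn(512, 512), rng.randn(1024, 512), rng.randn(512)
context, alpha = soft_attention_step(features, h, W_f, W_h, w_att)
print(context.shape, alpha.shape, alpha.sum())  # (512,), (196,), sums to ~1.0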

References

This is based on Yunjey's show-attend-and-tell repository.


Getting Started

Prerequisites

First, clone this repository and coco-caption (which provides pycocoevalcap) into the same directory.

$ git clone https://github.com/yunjey/show-attend-and-tell-tensorflow.git
$ git clone https://github.com/tylin/coco-caption.git

To evaluate on a different dataset, replace captions_val2014.json in coco-caption/annotations/ and captions_val2014_fakecap_results.json in coco-caption/results/ accordingly.

Preparation

This code is written in Python 2.7 and requires TensorFlow 1.2. In addition, you need to install a few more packages to process the MSCOCO data set. A script is provided to download the MSCOCO image dataset and the VGGNet19 model.

Run the commands below; the images will be downloaded into the image/ directory (ensure that train2014/ and val2014/ exist) and the VGGNet19 model (imagenet-vgg-verydeep-19.mat) into the data/ directory.

In addition, ensure that the caption files captions_train2014.json and captions_val2014.json are stored in the data/annotations/ folder.

$ cd show-attend-and-tell-tensorflow
$ pip install -r requirements.txt
$ chmod +x ./download.sh
$ ./download.sh
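If the script finishes without errors, the layout described above should be in place. As an optional sanity check, a short script like the following (not part of the repository; the paths are taken from the text above) reports which expected files and folders are present:

import os

# Expected layout after running download.sh, per the description above.
expected = [
    'image/train2014',
    'image/val2014',
    'data/imagenet-vgg-verydeep-19.mat',
    'data/annotations/captions_train2014.json',
    'data/annotations/captions_val2014.json',
]

for path in expected:
    print('%-50s %s' % (path, 'ok' if os.path.exists(path) else 'MISSING'))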

To feed the images to VGGNet, resize the MSCOCO images to a fixed size of 224x224. Run the command below; the resized images will be stored in the image/train2014_resized/ and image/val2014_resized/ directories.

$ python resize.py
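If you need to adapt the resizing step, the following sketch shows the essential operation, assuming Pillow (PIL) and the directory layout above; the repository's resize.py may differ in details:

import os
from PIL import Image

def resize_folder(src_dir, dst_dir, size=(224, 224)):
    # Resize every image in src_dir to a fixed size and save it to dst_dir.
    if not os.path.exists(dst_dir):
        os.makedirs(dst_dir)
    for name in os.listdir(src_dir):
        if not name.lower().endswith(('.jpg', '.jpeg', '.png')):
            continue
        with Image.open(os.path.join(src_dir, name)) as img:
            # ANTIALIAS is the high-quality downsampling filter in Pillow versions of this era.
            img.convert('RGB').resize(size, Image.ANTIALIAS).save(os.path.join(dst_dir, name))

resize_folder('image/train2014', 'image/train2014_resized')
resize_folder('image/val2014', 'image/val2014_resized')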

Before training the model, you have to preprocess the MSCOCO caption dataset.

Train the model

To generate the caption dataset and image feature vectors, run the command below.

$ python prepro.py

Adjust max_length to limit the caption length and word_count_threshold to filter out infrequent words.
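Roughly, captions longer than max_length words are dropped during preprocessing, and the vocabulary keeps only words that occur at least word_count_threshold times. A hedged sketch of that logic (special-token names and exact behavior are assumptions; see prepro.py for the actual implementation):

from collections import Counter

def filter_captions(captions, max_length=15):
    # Keep only captions with at most max_length words.
    return [cap for cap in captions if len(cap.split()) <= max_length]

def build_vocab(captions, word_count_threshold=1):
    # Keep only words that appear at least word_count_threshold times.
    counts = Counter(w for cap in captions for w in cap.lower().split())
    word_to_idx = {'<NULL>': 0, '<START>': 1, '<END>': 2}  # assumed special tokens
    for w, c in counts.items():
        if c >= word_count_threshold:
            word_to_idx[w] = len(word_to_idx)
    return word_to_idx

caps = filter_captions(['a plane flying in the sky', 'a large elephant standing in a dry grass field'])
print(build_vocab(caps, word_count_threshold=1))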

To train the image captioning model, run the command below.

$ python train.py

Adjust n_time_step to fit different sentence lengths; it should be large enough to cover the longest caption produced by preprocessing.
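n_time_step is the number of decoding steps the model unrolls, so it must cover the longest preprocessed caption plus the start and end markers. A rough sketch of how a caption becomes a fixed-length index sequence (the token names follow the preprocessing sketch above and are assumptions):

def encode_caption(caption, word_to_idx, seq_length=17):
    # <START> w1 ... wn <END>, padded with <NULL> up to seq_length tokens.
    # With this layout the decoder unrolls n_time_step = seq_length - 1 steps,
    # predicting each token from the previous one.
    ids = [word_to_idx['<START>']]
    ids += [word_to_idx.get(w, word_to_idx['<NULL>']) for w in caption.lower().split()]
    ids.append(word_to_idx['<END>'])
    ids += [word_to_idx['<NULL>']] * (seq_length - len(ids))
    return ids[:seq_length]

vocab = {'<NULL>': 0, '<START>': 1, '<END>': 2, 'a': 3, 'plane': 4}
print(encode_caption('a plane', vocab, seq_length=8))  # [1, 3, 4, 2, 0, 0, 0, 0]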

Evaluate the model

To generate captions, visualize attention weights, and evaluate the model, see evaluate_model.ipynb.
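The attention visualization amounts to upsampling each word's 14x14 attention map to the image resolution and blending it over the image. A minimal matplotlib/scikit-image sketch of that overlay (shapes and names are assumptions; the notebook's code may differ):

import numpy as np
import matplotlib.pyplot as plt
import skimage.transform

def show_attention(image, words, alphas):
    # image:  (224, 224, 3) array
    # words:  list of generated words
    # alphas: (len(words), 196) attention weights over the 14x14 feature grid
    for t, word in enumerate(words):
        plt.subplot((len(words) + 3) // 4, 4, t + 1)
        plt.imshow(image)
        # Upsample the 14x14 attention map to 224x224 and blend it over the image.
        att = skimage.transform.pyramid_expand(alphas[t].reshape(14, 14), upscale=16, sigma=20)
        plt.imshow(att, alpha=0.7, cmap='gray')
        plt.title(word)
        plt.axis('off')
    plt.show()

# Toy demonstration with random data.
img = np.random.rand(224, 224, 3)
alphas = np.random.rand(3, 196)
alphas /= alphas.sum(axis=1, keepdims=True)
show_attention(img, ['a', 'plane', 'flying'], alphas)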


Results


Training data

(1) Generated caption: A plane flying in the sky with a landing gear down.

[attention visualization image]

Validation data

(1) Generated caption: A large elephant standing in a dry grass field.

[attention visualization image]

Test data

(1) Generated caption: A plane flying over a body of water.

[attention visualization image]