[📕Project] [🤗Gradio Demo] [📕Paper]
The Efficient Track Anything Model (EfficientTAM) uses a plain, lightweight ViT image encoder, and we propose an efficient memory cross-attention to further improve efficiency. Our EfficientTAMs are trained on the SA-1B (image) and SA-V (video) datasets, and achieve performance comparable to SAM 2 with improved efficiency. EfficientTAM runs at >10 frames per second with reasonable video segmentation performance on an iPhone 15. Try the family of EfficientTAMs in our [🤗Gradio Demo].
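The memory cross-attention is the main efficiency lever on the video side. As a rough illustration of the idea, the PyTorch sketch below average-pools the spatial memory embeddings to shorten the key/value sequence before cross-attending; the module name, pooling scheme, and shapes are our assumptions for illustration, not the repository's actual implementation.

```python
import torch
import torch.nn.functional as F
from torch import nn


class EfficientMemoryCrossAttention(nn.Module):
    """Hypothetical sketch: cross-attention from current-frame queries to a
    spatially pooled memory bank. Pooling the memory keys/values shrinks the
    attention sequence length, which is the gist of the efficiency gain."""

    def __init__(self, dim: int, num_heads: int = 8, pool_size: int = 2):
        super().__init__()
        self.num_heads = num_heads
        self.pool_size = pool_size
        self.q_proj = nn.Linear(dim, dim)
        self.kv_proj = nn.Linear(dim, 2 * dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, queries: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # queries: (B, N, C) current-frame tokens
        # memory:  (B, H, W, C) spatial memory embeddings
        B, H, W, C = memory.shape
        # Average-pool the memory grid to reduce the key/value count.
        pooled = (
            F.avg_pool2d(memory.permute(0, 3, 1, 2), self.pool_size)
            .flatten(2)
            .transpose(1, 2)
        )  # (B, H*W / pool_size**2, C)
        q = self.q_proj(queries)
        k, v = self.kv_proj(pooled).chunk(2, dim=-1)

        def split_heads(t: torch.Tensor) -> torch.Tensor:
            return t.view(B, -1, self.num_heads, C // self.num_heads).transpose(1, 2)

        q, k, v = split_heads(q), split_heads(k), split_heads(v)
        out = F.scaled_dot_product_attention(q, k, v)  # (B, heads, N, C/heads)
        out = out.transpose(1, 2).reshape(B, -1, C)
        return self.out_proj(out)
```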
[Dec. 2, 2024] We release the codebase of Efficient Track Anything.
The online demo and examples can be found on the project page.
Video segmentation comparison: SAM 2 vs. EfficientTAM.
Image segmentation comparison (Input Image, SAM, EfficientSAM, SAM 2, EfficientTAM): point prompt, box prompt, segment everything.
EfficientTAM checkpoints will be available soon on the Hugging Face Space.
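Until the checkpoints land, and since the codebase follows SAM 2-style tooling, a video-prediction workflow would plausibly look like the sketch below. Every module path, builder function, config, and checkpoint name here is a hypothetical placeholder modeled on the SAM 2 predictor API, not a confirmed EfficientTAM interface.

```python
# Hypothetical usage sketch: names below are assumptions modeled on the
# SAM 2 video predictor workflow, not a confirmed EfficientTAM API.
import torch

from efficient_track_anything.build_efficienttam import (  # assumed module path
    build_efficienttam_video_predictor,
)

predictor = build_efficienttam_video_predictor(
    "configs/efficienttam_s.yaml",    # assumed config name
    "checkpoints/efficienttam_s.pt",  # assumed checkpoint name
)

with torch.inference_mode():
    state = predictor.init_state(video_path="videos/example.mp4")
    # Add a positive point prompt on frame 0 for object id 1.
    predictor.add_new_points_or_box(
        state, frame_idx=0, obj_id=1, points=[[480, 270]], labels=[1]
    )
    # Propagate the mask through the rest of the video.
    for frame_idx, obj_ids, masks in predictor.propagate_in_video(state):
        ...  # consume per-frame masks
```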
If you're using Efficient Track Anything in your research or applications, please cite using this BibTeX:
@article{xiong2024efficienttam,
  title={Efficient Track Anything},
  author={Yunyang Xiong and Chong Zhou and Xiaoyu Xiang and Lemeng Wu and Chenchen Zhu and Zechun Liu and Saksham Suri and Balakrishnan Varadarajan and Ramya Akula and Forrest Iandola and Raghuraman Krishnamoorthi and Bilge Soran and Vikas Chandra},
  journal={arXiv preprint arXiv:2411.18933},
  year={2024}
}