SoundSpaces Challenge 2023

This repository contains starter code for the 2023 challenge, details of the tasks, and the training and evaluation setups. For an overview of the SoundSpaces Challenge, visit soundspaces.org/challenge.

This year, we are hosting two challenges. The first is on the audio-visual navigation task [1], where an agent must find a sound-making object in an unmapped 3D environment using visual and auditory perception. The second is on the active audio-visual source separation task [3], where an agent must separate the time-varying sound of a target object from an audio mixture of spatial, time-varying sounds emitted by multiple sources.

Task

In AudioGoal navigation (AudioNav), an agent is spawned at a random starting position and orientation in an unseen environment. A sound-emitting object is also spawned at a random location in the same environment. At each time step, the agent receives a one-second audio input in the form of a waveform and needs to navigate to the target location. No ground-truth map is available and the agent must use only its sensory input (audio and RGB-D) to navigate.

In Active Audio-Visual Separation (active AV separation), an agent is spawned at a random starting position and orientation in an unseen environment. Multiple sound-emitting objects, each emitting a time-varying sound, are also spawned at random locations in the same environment. At each time step, the agent receives a one-second audio input in the form of a waveform, which is a mixture of the spatial sounds from all sources. It needs to navigate so as to separate the audio of a target source, denoted by a target class label, at every step of its motion. No ground-truth map is available and the agent must use only its sensory input (audio and RGB) to navigate. The current version of the challenge considers separation scenarios such as speech vs. speech and speech vs. music.

Dataset

The challenge will be conducted on the SoundSpaces Dataset, which is based on AI Habitat, Matterport3D, and Replica. For this challenge, we use the Matterport3D dataset due to the diversity and scale of its environments. This challenge focuses on evaluating agents' ability to generalize to unheard sounds and unseen environments. For AudioNav, the training and validation splits are the same as those used in the Unheard Sound experiments reported in the SoundSpaces paper. They can be downloaded from the SoundSpaces dataset page (including minival). For active AV separation, the training and validation splits are the same as those used in the Unheard Sound experiments reported in the Active AV Dynamic Separation paper.

Evaluation

For AudioNav, once the agent calls the STOP action, it is evaluated using the 'Success weighted by Path Length' (SPL) metric [2]. An episode is deemed successful if, on calling the STOP action, the agent is within 0.36 m (2x agent-radius) of the goal position.
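
As a rough illustration of the metric described above, the sketch below computes SPL following its definition in [2]: the mean over episodes of S_i * l_i / max(p_i, l_i), where S_i is the success indicator, l_i the shortest-path length to the goal, and p_i the length of the path the agent actually took. The function names, the inclusive comparison against 0.36 m, and the handling of zero-length paths are assumptions made for this example; the values produced by eval.py remain authoritative.

```python
from typing import Sequence

# Success radius quoted in the text (2x agent radius), in meters.
SUCCESS_DISTANCE_M = 0.36


def episode_success(final_distance_to_goal: float, called_stop: bool) -> bool:
    """An episode succeeds only if STOP was called close enough to the goal.

    Treating a distance of exactly 0.36 m as a success is an assumption.
    """
    return called_stop and final_distance_to_goal <= SUCCESS_DISTANCE_M


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    agent_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over all episodes."""
    if not (len(successes) == len(shortest_lengths) == len(agent_lengths)):
        raise ValueError("all inputs must have one entry per episode")
    if not successes:
        raise ValueError("at least one episode is required")

    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, agent_lengths):
        if not success:
            continue  # failed episodes contribute zero
        denom = max(taken, shortest)
        # Assumption: an episode that starts at the goal counts as a full score.
        total += 1.0 if denom == 0 else shortest / denom
    return total / len(successes)
```

For example, three episodes with shortest paths of 5, 4 and 3 m, agent paths of 5, 8 and 3 m, and a failure on the last one score 1, 0.5 and 0 respectively, so an agent is rewarded both for reaching the goal and for doing so efficiently.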

For active AV separation, the agent is evaluated using the 'Scale-invariant source-to-distortion ratio' (SI-SDR) metric, averaged over the whole agent trajectory.
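
To make the metric concrete, here is a minimal sketch of the commonly used SI-SDR definition: the estimate is projected onto the reference, and the energy of that projection is compared with the energy of the residual, in decibels. The function names, the `eps` stabilizer, and the use of plain Python lists of samples are assumptions for illustration; how the challenge code handles channels and per-step predictions is not specified here.

```python
import math
from typing import Sequence


def si_sdr(estimate: Sequence[float], reference: Sequence[float], eps: float = 1e-8) -> float:
    """Scale-invariant source-to-distortion ratio in dB for one audio segment."""
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")

    # Optimal scaling of the reference that best explains the estimate.
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    scale = dot / (ref_energy + eps)

    # Split the estimate into a scaled-target part and a distortion part.
    target = [scale * r for r in reference]
    distortion = [e - t for e, t in zip(estimate, target)]

    target_energy = sum(t * t for t in target)
    distortion_energy = sum(d * d for d in distortion)
    return 10.0 * math.log10((target_energy + eps) / (distortion_energy + eps))


def trajectory_si_sdr(
    estimates: Sequence[Sequence[float]],
    references: Sequence[Sequence[float]],
) -> float:
    """Average SI-SDR over the steps of one agent trajectory."""
    scores = [si_sdr(e, r) for e, r in zip(estimates, references)]
    if not scores:
        raise ValueError("trajectory must contain at least one step")
    return sum(scores) / len(scores)
```

Because the reference is rescaled before the comparison, multiplying the estimate by a constant leaves the score essentially unchanged, which is the property the "scale-invariant" part of the name refers to.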

Participation Guidelines

Participate in the contest by registering on the EvalAI challenge page and creating a team. Participants will upload JSON files containing the evaluation metric values for both challenges. For AudioNav, participants will also upload the trajectories executed by their model, which will be used to validate the submitted performance values. For active AV separation, the winning teams will later be asked to turn in their code and checkpoints for inspection. Suspicious submissions will be reviewed and, if necessary, the participating team will be disqualified. Instructions for evaluation and online submission are provided below.

Evaluation

For AudioNav,

  1. Clone the challenge repository:

    git clone https://github.com/facebookresearch/soundspaces-challenge.git
    cd soundspaces-challenge
  2. Implement your own agent or try one of ours. We provide an agent in agent.py that takes random actions:

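    import os
    
    import numpy
    import habitat
    import soundspaces
    
    class RandomAgent(habitat.Agent):
        def __init__(self, task_config):
            # Assumes a habitat config that lists the task's actions under TASK.POSSIBLE_ACTIONS.
            self._POSSIBLE_ACTIONS = task_config.TASK.POSSIBLE_ACTIONS
    
        def reset(self):
            pass
    
        def act(self, observations):
            # Pick a uniformly random action index, ignoring the observations.
            return numpy.random.choice(len(self._POSSIBLE_ACTIONS))
    
    def main():
        # Assumes the config path is passed via CHALLENGE_CONFIG_FILE, as in the evaluation command.
        config = habitat.get_config(os.environ["CHALLENGE_CONFIG_FILE"])
        agent = RandomAgent(task_config=config)
        challenge = soundspaces.Challenge()
        challenge.submit(agent)
    
    if __name__ == "__main__":
        main()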
  3. Follow the instructions for downloading the SoundSpaces dataset and place all data under the data/ folder.

  4. Evaluate the random agent locally:

    env CHALLENGE_CONFIG_FILE="configs/challenge_random.local.yaml" python agent.py 

    This calls eval.py, which dumps a JSON file that contains a Python dictionary of the following type:

    eval_dict = {
        "ACTIONS": {
            f"{scene_id_1}_{episode_id_1}": [action_1_1, ..., 0],
            f"{scene_id_2}_{episode_id_2}": [action_2_1, ..., 0],
        },
        "SPL": average_spl,
        "SOFT_SPL": average_softspl,
        "DISTANCE_TO_GOAL": average_distance_to_goal,
        "SUCCESS": average_success,
    }

    Make sure that the JSON file dumped when evaluating your agent has exactly the same structure. The easiest way to ensure this is to leave eval.py unmodified.
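
If you do change the evaluation code, a quick structural check before uploading can catch obvious mistakes. The sketch below is a hypothetical helper, not part of the challenge repository: `validate_eval_dict` and `REQUIRED_METRICS` are names invented for this example, and the checks only cover the keys and value types shown in the dictionary above.

```python
from numbers import Real
from typing import Any, Dict, List

# Scalar metrics that appear in the dictionary layout shown above.
REQUIRED_METRICS = ("SPL", "SOFT_SPL", "DISTANCE_TO_GOAL", "SUCCESS")


def validate_eval_dict(eval_dict: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means the layout looks right."""
    problems: List[str] = []

    actions = eval_dict.get("ACTIONS")
    if not isinstance(actions, dict):
        problems.append("ACTIONS must be a dictionary keyed by '<scene_id>_<episode_id>'")
    else:
        for key, sequence in actions.items():
            # bool is a subclass of int in Python, so exclude it explicitly.
            is_int_list = isinstance(sequence, list) and all(
                isinstance(a, int) and not isinstance(a, bool) for a in sequence
            )
            if not is_int_list:
                problems.append(f"ACTIONS[{key!r}] must be a list of integer actions")

    for name in REQUIRED_METRICS:
        value = eval_dict.get(name)
        if isinstance(value, bool) or not isinstance(value, Real):
            problems.append(f"{name} must be a number")

    return problems
```

A typical use would be to load the dumped file with `json.load` and refuse to submit if the returned list is non-empty. The check is deliberately shallow: it does not verify that the metric values are consistent with the recorded actions.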

For active AV separation, follow the instructions in the challenge branch of the active-AV-dynamic-separation repository.

Online submission

Follow the instructions in the submit tab of the EvalAI challenge page (will open soon!) to upload your evaluation JSON file.

Valid challenge phases are AudioNav {Minival, Test-Standard} Phase and AudioSep Test-Standard Phase.

The challenge consists of the following phases:

  1. AudioNav Minival Phase: This split is the same as the one used in ./test_locally_audionav_rgbd.sh. The purpose of this phase/split is sanity checking -- to confirm that your online submission to EvalAI doesn't run into any issues during evaluation. Each team is allowed a maximum of 30 submissions per day for this phase.
  2. AudioNav Test-Standard Phase: The purpose of this phase is to serve as the public leaderboard establishing the state of the art for AudioNav; this is what should be used to report results in papers. The relevant split for this phase is test_multiple_unheard. Each team is allowed a maximum of 10 submissions per day for this phase. As a reminder, the submitted trajectories will be used to validate the submitted performance values. Suspicious submissions will be reviewed and, if necessary, the participating team will be disqualified.
  3. AudioSep Test-Standard Phase: The purpose of this phase is to serve as the public leaderboard establishing the state of the art for active AV separation; this is what should be used to report results in papers. The relevant split for this phase is testUnheard_1000episodes. Each team is allowed a maximum of 30 submissions per day for this phase. As a reminder, the winning teams of the active AV separation challenge will later be asked to turn in their code and checkpoints for inspection. Suspicious submissions will be reviewed and, if necessary, the participating team will be disqualified.

Note: If you run into any issues or have questions, email the organizers or open an issue on this repository.

Baselines and Starter Code

  1. AudioNav: We have included both the configs and Python scripts for av-nav and av-wan. Note that the MapNav environment used by av-wan is baked into the environment container and can't be changed. If you want to modify mapping or planning, we suggest rewriting the planning for-loop in the agent code.

  2. Active AV Separation: We have included configs and Python scripts in the challenge branch of the active-AV-dynamic-separation repository.

Acknowledgments

We thank the Habitat team for the challenge template.

References

[1] SoundSpaces: Audio-Visual Navigation in 3D Environments. Changan Chen*, Unnat Jain*, Carl Schissler, Sebastia Vicenc Amengual Gari, Ziad Al-Halah, Vamsi Krishna Ithapu, Philip Robinson, Kristen Grauman. ECCV, 2020.

[2] On evaluation of embodied navigation agents. Peter Anderson, Angel Chang, Devendra Singh Chaplot, Alexey Dosovitskiy, Saurabh Gupta, Vladlen Koltun, Jana Kosecka, Jitendra Malik, Roozbeh Mottaghi, Manolis Savva, Amir R. Zamir. arXiv:1807.06757, 2018.

[3] Active Audio-Visual Separation of Dynamic Sound Sources. Sagnik Majumder, Kristen Grauman. ECCV, 2022.

License

This repo is MIT licensed, as found in the LICENSE file.