Recently, voice wake-up has entered people's everyday lives through smart speakers and in-vehicle devices. Personalized voice wake-up, which combines customized wake-up word detection with speaker voiceprint verification, is also gaining attention. In the personalized voice wake-up scenario, many topics remain worth exploring, for example: how speech models can better adapt to different wake-up words, and how models can jointly optimize the two tasks of wake-up word detection and speaker voiceprint recognition. Also, given that automated machine learning (AutoML), meta-learning, and other methods in the artificial intelligence field have already achieved successful results on speech recognition tasks, whether those methods can be used to improve personalized wake-up is another problem worth exploring.
In order to promote technological development and narrow the gap between academic research and practical applications, 4Paradigm, together with National Taiwan University, Northwestern Polytechnical University, Southern University of Science and Technology, and ChaLearn, will organize the Auto-KWS Challenge and a special session at the INTERSPEECH 2021 conference. In this challenge we will release a multilingual dataset (dialects included) for Personalized Keyword Spotting (Auto-KWS), which closely resembles real-world scenarios: each recorder is assigned a unique wake-up word and can freely choose their recording environment and a familiar dialect. In addition, the competition will test and evaluate participants' algorithms through the AutoML.ai competition platform. Participants will submit code and pre-trained models, and algorithm evaluation will be conducted under unified platform resources. After the competition, the dataset will be released on the AutoML.ai platform as an open benchmark available for research, to further boost the exchange of ideas and discussion in this area.
The challenge website is: https://www.automl.ai/competitions/15.
Please sign up on the AutoML.ai platform and register for the competition via the following form: https://docs.google.com/forms/d/1Qs7erVOgNW8KFX2G-0nQlsOqYi62hbD4MFSC10PSYIM/edit. Once your registration is approved, you will receive a link to the training datasets at your registered email address.
Note: Participants are only allowed to submit code via one AutoML.ai account. We will manually check participants' identities and disqualify duplicate or inauthentic accounts.
In the last decade, machine learning (ML) and deep learning (DL) have achieved remarkable success in speech-related tasks, e.g., speaker verification (SV), automatic speech recognition (ASR), and keyword spotting (KWS). However, in practice, it is very difficult to obtain good performance without expertise in machine learning and speech processing. Automated Machine Learning (AutoML) was proposed to explore automatic pipelines that train effective models for a given task without any human intervention. Moreover, methods belonging to AutoML, such as Automated Deep Learning (AutoDL) and meta-learning, have been used in KWS and SV tasks, respectively. A series of AutoML competitions, e.g., Automated Natural Language Processing (AutoNLP) and Automated Computer Vision (AutoCV), have been organized by 4Paradigm, Inc. and ChaLearn (sponsored by Google). These competitions have drawn a lot of attention from both academic researchers and industrial practitioners.
Keyword spotting, usually serving as the entry point of smart device terminals such as mobile phones, smart speakers, or other intelligent terminals, has received a lot of attention in both academia and industry. Meanwhile, driven by both user experience and security considerations, the personalized wake-up mode has more application scenarios and requirements. Conventionally, the solution pipeline combines a KWS system with a text-dependent speaker verification (TDSV) system, in which case the two systems are optimized separately. Moreover, only a small amount of data from the target speaker is usually available, so both KWS and speaker verification (SV) can be considered low-resource tasks in this setting.
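As a rough illustration of this conventional two-stage pipeline, the sketch below fuses a keyword-spotting confidence with a speaker-verification cosine score using simple thresholds. The kws_confidence and speaker_embedding functions are hypothetical placeholders for whatever models a solution might use; this is not the baseline provided in the starting kit, only a minimal example of score fusion.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two speaker embeddings.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def wake_decision(test_wav, enroll_embs, kws_confidence, speaker_embedding,
                  kws_thresh=0.5, sv_thresh=0.6):
    """Hypothetical fusion of a KWS score and an SV score.

    kws_confidence(wav) -> float in [0, 1]   (keyword detector, placeholder)
    speaker_embedding(wav) -> np.ndarray     (speaker encoder, placeholder)
    enroll_embs: embeddings extracted from the speaker's enrollment utterances
    """
    if kws_confidence(test_wav) < kws_thresh:
        return 0  # keyword not detected
    emb = speaker_embedding(test_wav)
    sv_score = max(cosine(emb, e) for e in enroll_embs)
    return 1 if sv_score >= sv_thresh else 0
```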
In this challenge, we propose Automated Machine Learning for Personalized Keyword Spotting (Auto-KWS), which calls for automated solutions to personalized keyword spotting tasks. There are several specific questions that participants can further explore, including but not limited to:
• How to automatically handle multiple languages, multiple accents, or various keywords?
• How to automatically make better use of additional labeled corpora?
• How to integrate the keyword spotting and speaker verification tasks?
• How to jointly optimize personalized keyword spotting with speaker verification?
• How to design multi-task learning for personalized keyword spotting with speaker verification?
• How to automatically design effective neural network structures?
• How to reasonably use meta-learning, few-shot learning, or other AutoML technologies in this task?
Additionally, participants should also consider:
• How to automatically and efficiently select an appropriate machine learning model and its hyper-parameters?
• How to make the solution more generic, i.e., how to make it applicable for unseen tasks?
• How to keep the computational and memory cost acceptable?
We have already organized two successful automated speech classification challenges: AutoSpeech at ACML 2019 and AutoSpeech 2020 at INTERSPEECH 2020, the first two challenges to combine AutoML and speech tasks. This time, our Auto-KWS challenge focuses on personalized keyword spotting for the first time, and the released database will also serve as a benchmark for research in this field and boost the exchange of ideas and discussion in this area.
1st Prize: 2000 USD
2nd Prize: 1500 USD
3rd Prize: 500 USD
Hung-Yi Lee, College of Electrical Engineering and Computer Science, National Taiwan University
Lei Xie, Audio, Speech and Language Processing Lab (NPU-ASLP), Northwestern Polytechnical University
Tom Ko, Southern University of Science and Technology
Wei-Wei Tu, 4Paradigm Inc.
Isabelle Guyon, Université Paris-Saclay, ChaLearn
Qiang Yang, Hong Kong University of Science and Technology
Chunyu Zhao, 4Paradigm Inc.
Jie Chen, 4Paradigm Inc.
Jingsong Wang, 4Paradigm Inc.
Qijie Shao, NPU-ASLP
Shouxiang Liu, 4Paradigm Inc.
Xiawei Guo, 4Paradigm Inc.
Xiong Wang, NPU-ASLP
Yuxuan He, 4Paradigm Inc.
Zhen Xu, 4Paradigm Inc.
Please contact the organizers if you have any problem concerning this challenge.
Previous AutoML Challenges:
- AutoSpeech2020@InterSpeech2020
Founded in early 2015, 4Paradigm is one of the world's leading AI technology and service providers for industrial applications. 4Paradigm's flagship product, the AI Prophet, is an AI development platform that enables enterprises to effortlessly build their own AI applications and thereby significantly increase their operational efficiency. Using the AI Prophet, a company can develop a data-driven "AI Core System", which can largely be regarded as a second core system next to the traditional transaction-oriented Core Banking System (IBM Mainframe) often found in banks. Beyond this, 4Paradigm has also successfully developed more than 100 AI solutions for use in various settings such as finance, telecommunication, and internet applications. These solutions include, but are not limited to, smart pricing, real-time anti-fraud systems, precision marketing, and personalized recommendation. And while it is clear that 4Paradigm can completely reshape the way an organization uses its data, its scope of services does not stop there. 4Paradigm combines state-of-the-art machine learning technologies with practical experience to bring together a team of experts ranging from scientists to architects. This team has successfully built China's largest machine learning system and the world's first commercial deep learning system. However, 4Paradigm's success does not stop there. With its core team pioneering research on transfer learning, 4Paradigm takes the lead in this area and, as a result, has drawn great attention from worldwide tech giants.
ChaLearn is a non-profit organization with vast experience in the organization of academic challenges. ChaLearn is interested in all aspects of challenge organization, including data gathering procedures, evaluation protocols, novel challenge scenarios (e.g., competitions), training for challenge organizers, challenge analytics, result dissemination and, ultimately, advancing the state-of-the-art through challenges.
Starting kit and baselines can also be found here: https://github.com/janson9192/autokws2021
Since this is a challenge with code submission, we provide two baselines for testing purposes.
To make a test submission, download the starting kit and follow the instructions in the readme.md file. Click the blue "Upload a Submission" button in the upper right corner of the page and re-upload the starting kit. Attention: you must first click the orange "Feedback Phase" tab if you want to get ranked on all datasets for your submission. You will not get ranked if you submit under the green "Dataset1" tab. "Dataset1" contains the feedback dataset recorded from 20 speakers. To check the progress of your submissions, please click the "My Submissions" tab. To view your submission score, please click the "Results" tab. Note that the detailed score for each speaker's data can be found under the "learning curve" button.
The starting kit contains everything you need to create your own code submission. Please follow the readme file, modify the initialize.sh, enrollment.sh, and predict.sh files, and test them on your local computer with the same handling programs and Docker image as those on the AutoML.ai platform.
You can download the starting kit via the following link: https://github.com/janson9192/autokws2021
Note that the CUDA version in this Docker image is 10; if the CUDA version on your own machine is lower than 10, you may not be able to use the GPU inside the container.
You can test your code in the exact same environment as the AutoML.ai environment using docker. You can run the ingestion program (to produce predictions) and the scoring program (to evaluate your predictions) on sample data.
1. If you are new to docker, install docker from https://docs.docker.com/get-started/.
2. In the shell, change to the starting-kit directory and run
docker run --gpus '"device=0"' -it -v "$(pwd):/app/auto-kws" janson91/autokws2021:gpu /bin/bash
3. Now that you are inside the Docker container's bash shell, run the local test program
python run_local_test.py --dataset_dir=`pwd`/sample_data/practice --solution_dir=`pwd`/sample_data/practice_solution --code_dir=./code_submission
This runs the ingestion and scoring programs simultaneously; the predictions and scoring results are written to the sample_result_submissions and scoring_output directories.
If "docker pull" takes too long, you can just download docker from:
Google Drive: https://drive.google.com/file/d/18BjHpY_Hy9-Wd0eAVvTOOvlj4ok7vH41/view?usp=sharing
Baidu Yun:https://pan.baidu.com/s/10aSR6Rm_57hOF5PC-mISqg password: 9wyu
and run command "docker load -i autokws2021docker.tar"
The interface is simple and generic: you must supply these three files: initialize.sh, enrollment.sh, and predict.sh.
To make a submission, cd into submission_dir and run "zip -r submission.zip *" (zip the contents, not the directory itself), then use the "Upload a Submission" button. Please note that you must first click the orange "On-line Phase" tab if you want to make a submission on all datasets simultaneously and get ranked in the challenge. In addition, the ranking on the public leaderboard is determined by each participant's LAST code submission.
In the starting kit, we provide a Docker image that simulates the running environment of our challenge platform, and a conda environment (py36) has been created inside it. Participants can check the Python version and installed Python packages with the following commands:
python --version
pip list
source activate py36
……
On our platform, for each submission, the allocated computational resources are:
All data are recorded with near-field mobile phones (placed in front of the speaker at a distance of about 0.2 m). Each sample is recorded as a single-channel, 16-bit stream at a 16 kHz sampling rate. There are four datasets: the training dataset, practice dataset, feedback dataset, and private dataset. The training dataset, recorded from around 100 recorders, is used by participants to develop Auto-KWS solutions. The practice dataset contains data from 5 speakers, each with 5 enrollment utterances and several test utterances. The practice dataset, together with the downloadable Docker image, provides an example of how the platform calls the participants' code. Both the training and practice datasets can be downloaded for local debugging. The feedback dataset and private dataset have the same format as the practice dataset, are used for evaluation, and are therefore hidden from participants. Participants' final solutions will be evaluated by the platform on these two datasets during the feedback phase and private phase, respectively, without any human intervention.
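As a quick sanity check on the audio format described above, the following minimal sketch (assuming the soundfile package is available, as in many speech processing setups) verifies that a recording is single-channel, 16-bit PCM at 16 kHz:

```python
import soundfile as sf

def check_format(wav_path):
    # Inspect a recording and confirm it matches the Auto-KWS format:
    # single channel, 16-bit PCM, 16 kHz sampling rate.
    info = sf.info(wav_path)
    assert info.samplerate == 16000, f"unexpected sample rate: {info.samplerate}"
    assert info.channels == 1, f"expected mono audio, got {info.channels} channels"
    assert info.subtype == "PCM_16", f"unexpected subtype: {info.subtype}"
    return info

# Example (hypothetical path following the practice dataset layout):
# check_format("practice/P001/enrollment/001.wav")
```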
Here is a summary of the four datasets. (The specific numbers may be slightly adjusted when the challenge is officially launched.)
| Dataset | Speaker Num | Phase | Keywords/enrollment num |
|---|---|---|---|
| Training Dataset | 100 | Before feedback phase | 10 |
| Practice Dataset | 5 | Before feedback phase | 5 |
| Feedback Dataset | 20 | Feedback phase | 5 |
| Private Dataset | 40 | Final phase | 5 |
Here is the structure of the training and practice datasets:
training dataset
├── T001
│ ├── enrollment
│ │ ├── 001.wav
│ │ └── ......
│ └── others
│ ├── 002.wav
│ └── ......
├── T002
└── ......
"enrollment" dir in training dataset contains the enrollment utterances of each speaker, and "others" dir contains other utterances recorded by the speaker, such as command words or some free-text. The audio in "others" are negative samples, and you can also use them to make more negative samples.
practice dataset
├── P001
│ ├── enrollment
│ │ ├── 001.wav
│ │ └── ......
│ └── test
│ ├── 001.wav
│ └── ......
├── P002
├── ......
└── test_label.txt
"test" dir in practice dataset contains both positive and negative samples. The labels of test samples record in "test_label.txt", 1 and 0 represent positive and negative samples, respectively. For each speaker, samples which contains the wake-up word from the speaker are positive (wake-up word only at the end of audio). All other samples, such as other utterances from the speaker, the wake-up word recorded by other speakers, and any other audio that NOT contains the wake-up word recoreded by the speaker, are negative.
Note: negative samples such as the wake-up word recorded by other speakers, and any other audio that does not contain the wake-up word recorded by the speaker, are not included in the "others" folder. We purposefully designed the competition this way to leave more room for the participants. You can generate more useful negative samples to train your model through external resources, data augmentation techniques, etc., as sketched below.
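As one example of such augmentation, here is a minimal sketch (assuming numpy and soundfile are available) that mixes white noise into an existing recording at a chosen signal-to-noise ratio; real systems would more likely mix in recorded noise, music, or other speech:

```python
import numpy as np
import soundfile as sf

def add_noise(in_wav, out_wav, snr_db=10.0, seed=0):
    # Mix white Gaussian noise into a mono recording at the requested SNR (in dB)
    # and write the result as 16-bit PCM, producing an additional training sample.
    signal, sr = sf.read(in_wav)
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(len(signal))
    signal_power = np.mean(signal ** 2) + 1e-12
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10.0)))
    noisy = np.clip(signal + scale * noise, -1.0, 1.0)
    sf.write(out_wav, noisy, sr, subtype="PCM_16")

# Example (hypothetical paths):
# add_noise("T001/others/002.wav", "augmented/T001_002_noisy.wav", snr_db=5.0)
```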
This challenge has three phases: Feedback Phase, Check Phase and Final Phase.
Before the feedback phase, participants are provided with the training dataset (recorded from around 100 persons) and the practice dataset (recorded from 5 persons). Participants can download these data and use them to develop their solutions offline. During the feedback phase, participants can upload their solutions to the platform to receive immediate performance feedback for validation. The data used in this phase come from another 20 recorders and are hidden from the participants. Then, in the check phase, participants can submit their code only once, to make sure it runs properly on the platform. (Note: using other open-source data, such as datasets from http://www.openslr.org/resources.php, is also allowed. Just remember to indicate the data source in your submission.) The platform will indicate success or failure to the users, but detailed logs will be hidden. Lastly, in the final phase, participants' solutions will be evaluated on a private dataset (recorded from another 40 persons). Once participants submit their code, the platform runs their algorithm automatically against each recorder's data with its own time budget. The final ranking will be generated based on the scores calculated in this phase.
The datasets in all phases share the same structure, and the platform applies the same evaluation logic in every phase. The evaluation task is constrained by a time budget. In each task, after initialization, the platform calls the 'enrollment' function, which is allowed to run for 5 minutes. Then the platform calls the 'test' function to predict the label of each test utterance. For each speaker, the test process is constrained by a time budget, which is calculated from the real-time factor (RTF) and the total duration of the test audio. When the time budget of 'enrollment' or 'test' runs out, the platform automatically terminates the process, and samples that are still waiting to be predicted are counted as errors.
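Because unpredicted samples count as errors once the budget runs out, a submission may want to track its own elapsed time during the test stage. The following minimal sketch shows one way to do this; the predict_one function and the default label are placeholders, not part of the official interface:

```python
import time

def predict_with_budget(test_wavs, predict_one, budget_seconds, default_label=0):
    """Predict labels for test_wavs, falling back to a default label
    when the remaining time budget is about to be exhausted.

    predict_one(wav_path) -> 0 or 1   (hypothetical per-utterance predictor)
    """
    start = time.monotonic()
    labels = []
    for wav in test_wavs:
        elapsed = time.monotonic() - start
        if elapsed >= 0.95 * budget_seconds:
            # Not enough time left: emit the default label instead of risking
            # a platform timeout, which would count the sample as an error.
            labels.append(default_label)
            continue
        labels.append(predict_one(wav))
    return labels
```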
For each dataset, we calculate the wake-up score score_i from the miss rate (MR) and the false alarm rate (FAR) in the following form:
score_i = MR + alpha × FAR
where the constant alpha is a penalty coefficient.
The final score takes all score_i values into consideration.
The real-time factor (RTF), denoted F_r, is calculated for each dataset as follows:
F_r = T_process / T_data
where T_process is the total processing time of all test data, and T_data is the total duration of the test audio. Note that T_process only includes the inference time for each test utterance, since the enrollment and initialization processes have already been completed.
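To make the metric concrete, here is a small sketch that computes MR, FAR, the per-dataset score, and the RTF from predicted and reference labels. The value of alpha is set by the organizers, so it is left as a parameter:

```python
def dataset_score(y_true, y_pred, alpha):
    """Compute the per-dataset wake-up score: score = MR + alpha * FAR.

    y_true, y_pred: lists of 0/1 labels (1 = wake-up / positive).
    alpha: penalty coefficient chosen by the organizers.
    """
    positives = sum(1 for t in y_true if t == 1)
    negatives = len(y_true) - positives
    misses = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    false_alarms = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    miss_rate = misses / positives if positives else 0.0
    false_alarm_rate = false_alarms / negatives if negatives else 0.0
    return miss_rate + alpha * false_alarm_rate

def real_time_factor(total_process_seconds, total_audio_seconds):
    # RTF = total inference time / total duration of the test audio.
    return total_process_seconds / total_audio_seconds

# Example: score one speaker's test set with a hypothetical alpha.
# dataset_score([1, 0, 1, 0], [1, 0, 0, 1], alpha=1.0)  # MR=0.5, FAR=0.5 -> 1.0
```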
No, they can make entries that show on the leaderboard for test purposes and to stimulate participation, but they are excluded from winning prizes.
No, except accepting the TERMS AND CONDITIONS.
No, you can join the challenge until one week before the end of the feedback phase. After that, we will require real personal identification (notified by the organizers) to avoid duplicate accounts.
You can get the download link for the training and practice datasets only after your registration has been approved by the committee.
To make a valid challenge entry, click the blue button on the upper right side "Upload a Submission". This will ensure that you submit on all 5 datasets of the challenge simultaneously. You may also make a submission on a single dataset for debug purposes, but it will not count towards the final ranking.
We provide a Starting Kit with step-by-step instructions in "README.md". Also additional information can be found on "Quick Start" page.
Yes, a $4000 prize pool.
| | 1st place | 2nd place | 3rd place |
|---|---|---|---|
| Prize | $2000 | $1500 | $500 |
Yes, participation is by code submission.
No. You just grant to the ORGANIZERS a license to use your code for evaluation purposes during the challenge. You retain all other rights.
Yes, we will provide the fact sheet in due time.
We are running your submissions on Google Cloud NVIDIA Tesla P100 GPUs. In non-peak times we plan to use 10 workers, each of which has one NVIDIA Tesla P100 GPU (running CUDA 10 with cuDNN 7.5 drivers), 4 vCPUs, 26 GB of memory, and 100 GB of disk.
The PARTICIPANTS will be informed if the computational resources increase. They will NOT decrease.
YES.
YES. The ranking of participants will be determined by a final blind test: a SINGLE SUBMISSION made on the final test submission site will be evaluated on five new test datasets in a completely "blind testing" manner. The final test ranking will determine the winners.
Wall time.
In principle, no more than its time budget. We kill the process if the time budget is exceeded. Submissions are queued and run on a first come, first served basis. We are using several identical servers. Contact us if your submission is stuck for more than 24 hours. You can check the execution time on the leaderboard.
Three per day, but up to a total computational time of 500 minutes (submissions taking longer will be aborted). This may be subject to change, according to the number of participants. Please respect other users. It is forbidden to register under multiple user IDs to gain an advantage and make more submissions. Violators will be DISQUALIFIED FROM THE CONTEST.
No. Please contact us if you think the failure is due to the platform rather than to your code and we will try to resolve the problem promptly.
This should be avoided. If any of the three processes (initialize.sh, enrollment.sh, predict.sh) exceeds its time budget, the submission handling process will be killed, and the predictions made so far (with their corresponding timestamps) will be used for evaluation. If a submission exceeds the total compute time per day, all running tasks will be killed by AutoML.ai, the status will be marked 'Failed', and a score of -1.0 will be produced.
No, sorry, not for this challenge.
Yes. Any Linux executable can run on the system, provided that it fulfills our Python interface and you bundle all necessary libraries with your submission.
janson91/autokws2021:gpu
When you submit code to AutoML.ai, your code is executed inside a Docker container. This environment can be exactly reproduced on your local machine by downloading the corresponding docker image. The docker environment of the challenge contains Kaldi, Anaconda libraries, TensorFlow, and PyTorch (among other things).
Your last submission is shown automatically on the leaderboard. You cannot choose which submission to select. If you want another submission than the last one you submitted to "count" and be displayed on the leaderboard, you need to re-submit it.
No. If you accidentally register multiple times or have multiple accounts from members of the same team, please notify the ORGANIZERS. Teams or solo PARTICIPANTS with multiple accounts will be disqualified.
We have disabled AutoML.ai team registration. To join as a team, just share one account with your team. The team leader is responsible for making submissions and observing the rules.
You cannot. If you need to destroy your team, contact us.
It is up to you and the team leader to make arrangements. However, you cannot participate in multiple teams.
No. If we discover that you are trying to cheat in this way you will be disqualified. All your actions are logged and your code will be examined if you win.
ALL INFORMATION, SOFTWARE, DOCUMENTATION, AND DATA ARE PROVIDED "AS-IS". UPSUD, CHALEARN, IDF, AND/OR OTHER ORGANIZERS AND SPONSORS DISCLAIM ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR ANY PARTICULAR PURPOSE, AND THE WARRANTY OF NON-INFRINGEMENT OF ANY THIRD PARTY'S INTELLECTUAL PROPERTY RIGHTS. IN NO EVENT SHALL ISABELLE GUYON AND/OR OTHER ORGANIZERS BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF SOFTWARE, DOCUMENTS, MATERIALS, PUBLICATIONS, OR INFORMATION MADE AVAILABLE FOR THE CHALLENGE. In case of dispute or possible exclusion/disqualification from the competition, the PARTICIPANTS agree not to take immediate legal action against the ORGANIZERS or SPONSORS. Decisions can be appealed by submitting a letter to the CHALEARN president, and disputes will be resolved by the CHALEARN board of directors. See contact information.
For questions of general interest, THE PARTICIPANTS should post their questions to the forum.
Other questions should be directed to the organizers.
Start: Feb. 21, 2021, midnight
Description: Please make submissions by clicking the 'Submit' button below. You can then view your algorithm's submission results on each dataset in the corresponding tab (Dataset 1, Dataset 2, etc.).
| Label | Description | Start |
|---|---|---|
| Dataset 1 | None | Feb. 21, 2021, midnight |
April 2, 2021, 11 p.m.