  • Competition Ends: April 15, 2021, noon


Recently, voice wake-up has entered people's everyday lives through smart speakers and in-vehicle devices. Personalized voice wake-up, which combines customized wake-up word detection with speaker voiceprint verification, is also gaining attention. Many topics in this scenario are still worth exploring, including how speech models can better adapt to different wake-up words, and how the two tasks, wake-up word detection and speaker voiceprint recognition, can be optimized jointly. Moreover, given that automated machine learning (AutoML), meta-learning, and other methods in the artificial intelligence field have already produced strong results on speech recognition tasks, whether those methods can improve the personalized wake-up scenario is another question worth exploring.

To promote technological development and narrow the gap between academic research and practical applications, 4Paradigm, together with National Taiwan University, Northwestern Polytechnical University, Southern University of Science and Technology, and ChaLearn, will organize this Auto-KWS Challenge and a special session at the INTERSPEECH 2021 conference. In this challenge we will release a multilingual dataset (dialects included) for Personalized Keyword Spotting (Auto-KWS) that closely resembles real-world scenarios: each recorder is assigned a unique wake-up word and is free to choose their recording environment and a familiar dialect. In addition, the competition will test and evaluate the participants' algorithms through the AutoML.ai competition platform: participants submit code and pre-trained models, and the algorithms are evaluated under unified platform resources. After the competition, the dataset will be released on the AutoML.ai platform as an open benchmark for research, to further boost the exchange of ideas and discussion in this area.


The challenge website is: https://www.automl.ai/competitions/11.

Please sign up on the AutoML.ai platform and register for the competition via the following form: https://docs.google.com/forms/d/1Qs7erVOgNW8KFX2G-0nQlsOqYi62hbD4MFSC10PSYIM/edit. Once your registration is approved, you will receive a link to the training datasets at your registered email address.

Note: Participants are only allowed to submit code via one AutoML.ai account. We will manually check participants' identities and disqualify duplicate or inauthentic accounts.

Important Dates

  • Feb 5th: Release of training data and practice data
  • Feb 10th: Release of baseline
  • Feb 26th: Feedback phase starts
  • Mar 26th: Feedback phase ends, private phase starts
  • Mar 26th: Paper submission deadline
  • Mar 27th: Check phase starts
  • Apr 1st: Check phase ends, final phase starts
  • Apr 5th: Final phase ends
  • Jun 2nd: Paper acceptance/rejection notification
  • Aug 31st: Opening of INTERSPEECH 2021

Auto-KWS 2021 Challenge

In the last decade, machine learning (ML) and deep learning (DL) have achieved remarkable success in speech-related tasks, e.g., speaker verification (SV), automatic speech recognition (ASR), and keyword spotting (KWS). In practice, however, it is very difficult to reach good performance without expertise in both machine learning and speech processing. Automated machine learning (AutoML) explores automatic pipelines that train effective models for a given task without human intervention. Moreover, methods belonging to AutoML, such as automated deep learning (AutoDL) and meta-learning, have already been used in KWS and SV tasks respectively. A series of AutoML competitions, e.g., automated natural language processing (AutoNLP) and automated computer vision (AutoCV), have been organized by 4Paradigm, Inc. and ChaLearn (sponsored by Google). These competitions have drawn a lot of attention from both academic researchers and industrial practitioners.

Keyword spotting, usually serving as the entrance to smart terminals such as mobile phones, smart speakers, and other intelligent devices, has received a lot of attention in both academia and industry. Meanwhile, for reasons of both fun and security, the personalized wake-up mode has even more application scenarios and requirements. Conventionally, the solution pipeline combines a KWS system with a text-dependent speaker verification (TDSV) system, in which case the two systems are optimized separately. On the other hand, there is typically very little data from the target speaker, so both KWS and speaker verification (SV) can be treated as low-resource tasks in this setting.

In this challenge, we propose automated machine learning for Personalized Keyword Spotting (Auto-KWS), which aims at automated solutions for personalized keyword spotting tasks. There are several specific questions that participants can further explore, including but not limited to:
• How to automatically handle multilingual, multi-accent, or varied keywords?
• How to make better use of additional tagged corpus automatically?
• How to integrate keyword spotting task and speaker verification task?
• How to jointly optimize personalized keyword spotting with speaker verification?
• How to design multi-task learning for personalized keyword spotting with speaker verification?
• How to automatically design effective neural network structures?
• How to reasonably use meta-learning, few-shot learning, or other AutoML technologies in this task?

Additionally, participants should also consider:
• How to automatically and efficiently select appropriate machine learning model and hyper-parameters?
• How to make the solution more generic, i.e., how to make it applicable for unseen tasks?
• How to keep the computational and memory cost acceptable?

We have already organized two successful automated speech classification challenges, AutoSpeech (ACML 2019) and AutoSpeech 2020 (INTERSPEECH 2020), which were the first two challenges to combine AutoML and speech tasks. This time, the Auto-KWS challenge focuses on personalized keyword spotting for the first time, and the released database will also serve as a benchmark for research in this field and boost the exchange of ideas and discussion in this area.


1st Prize: 2000 USD

2nd Prize: 1500 USD

3rd Prize: 500 USD



Hung-Yi Lee, College of Electrical Engineering and Computer Science, National Taiwan University

Lei Xie, Audio, Speech and Language Processing Lab (NPU-ASLP), Northwestern Polytechnical University

Tom Ko, Southern University of Science and Technology

Wei-Wei Tu, 4Paradigm Inc.

Isabelle Guyon, Université Paris-Saclay, ChaLearn

Qiang Yang, Hong Kong University of Science and Technology

Committee (alphabetical order)

Chunyu Zhao, 4Paradigm Inc.

Jie Chen, 4Paradigm Inc.

Jingsong Wang, 4Paradigm Inc.

Qijie Shao, NPU-ASLP

Shouxiang Liu, 4Paradigm Inc.

Xiawei Guo, 4Paradigm Inc.

Xiong Wang, NPU-ASLP

Yuxuan He, 4Paradigm Inc.

Zhen Xu, 4Paradigm Inc.


Organization Institutes




Please contact the organizers if you have any problem concerning this challenge.


About AutoML 

Previous AutoML Challenges:

- First AutoML Challenge



- AutoML@PAKDD2019

- AutoML@KDDCUP2019










About 4Paradigm Inc.

Founded in early 2015, 4Paradigm is one of the world's leading AI technology and service providers for industrial applications. 4Paradigm's flagship product, the AI Prophet, is an AI development platform that enables enterprises to effortlessly build their own AI applications and thereby significantly increase their operational efficiency. Using the AI Prophet, a company can develop a data-driven "AI Core System", which can largely be regarded as a second core system next to the traditional transaction-oriented core banking system (IBM mainframe) often found in banks. Beyond this, 4Paradigm has successfully developed more than 100 AI solutions for settings such as finance, telecommunications, and internet applications, including but not limited to smart pricing, real-time anti-fraud systems, precision marketing, and personalized recommendation. And while 4Paradigm can completely reshape the paradigm by which an organization uses its data, its scope of services does not stop there. 4Paradigm draws on state-of-the-art machine learning technologies and practical experience to bring together a team of experts ranging from scientists to architects. This team has built China's largest machine learning system and the world's first commercial deep learning system. With its core team pioneering research on transfer learning, 4Paradigm takes the lead in this area and, as a result, has drawn great attention from worldwide tech giants.

About ChaLearn

ChaLearn is a non-profit organization with vast experience in the organization of academic challenges. ChaLearn is interested in all aspects of challenge organization, including data gathering procedures, evaluation protocols, novel challenge scenarios (e.g., competitions), training for challenge organizers, challenge analytics, result dissemination and, ultimately, advancing the state-of-the-art through challenges.

Quick start

The training dataset, practice dataset, and 2 baselines have been sent to participants' registered email addresses.

Starting kit and baselines can also be found here: https://github.com/janson9192/autokws2021

Since this is a challenge with code submission, we provide two baselines for testing purposes.

To make a test submission, download the starting kit and follow the instructions in the readme.md file. Click the blue "Upload a Submission" button in the upper right corner of the page and re-upload the starting kit. Attention: you must first click the orange tab "Feedback Phase" if you want to get ranked on all datasets for your submission. You will not get ranked if you submit under the green tab "Dataset1". "Dataset1" contains the feedback dataset, recorded from 20 speakers. To check progress on your submissions, click the "My Submissions" tab. To view your submission score, click the "Results" tab. Note that the detailed score for each speaker's data can be found under the "learning curve" button.

Starting kit

The starting kit contains everything you need to create your own code submission. Please follow the readme file, simply modify the initialize.sh, enrollment.sh, and predict.sh files, and test them on your local computer with the same handling programs and Docker image as those of the AutoML.ai platform.

You could download starting kit via the following link: https://github.com/janson9192/autokws2021

Note that the CUDA version in this Docker image is 10; if the CUDA version on your own machine is lower than 10, you may not be able to use the GPU inside the container.

Local development and testing:

You can test your code in the exact same environment as the AutoML.ai environment using docker. You can run the ingestion program (to produce predictions) and the scoring program (to evaluate your predictions) on sample data.

1. If you are new to docker, install docker from https://docs.docker.com/get-started/.

2. In the shell, change to the starting-kit directory, run

  docker run --gpus '"device=0"' -it -v "$(pwd):/app/auto-kws" janson91/autokws2021:gpu /bin/bash

3. Now you are in the bash of the docker container, run the local test program

  python run_local_test.py --dataset_dir=`pwd`/sample_data/practice --solution_dir=`pwd`/sample_data/practice_solution --code_dir=./code_submission

This runs the ingestion and scoring programs simultaneously; the predictions and scoring results are written to the sample_result_submissions and scoring_output directories.

If "docker pull" takes too long, you can instead download the Docker image from:

Google Drive: https://drive.google.com/file/d/18BjHpY_Hy9-Wd0eAVvTOOvlj4ok7vH41/view?usp=sharing

Baidu Yun:https://pan.baidu.com/s/10aSR6Rm_57hOF5PC-mISqg  password: 9wyu

and run the command "docker load -i autokws2021docker.tar".



The interface is simple and generic: you must supply these three files: initialize.sh, enrollment.sh, and predict.sh.

To make a submission, cd into your submission directory and run "zip -r submission.zip *" (zipping the contents, not the directory itself), then use the "Upload a Submission" button. Please note that you must first click the orange tab "On-line Phase" if you want to make a submission on all datasets simultaneously and get ranked in the challenge. Also note that the ranking in the public leaderboard is determined by the LAST code submission of each participant.
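If you prefer, the same archive can be produced programmatically. The sketch below (the function name is ours, not part of the starting kit) mirrors the `zip -r submission.zip *` command, keeping the three scripts at the root of the archive rather than inside an enclosing directory:

```python
import zipfile
from pathlib import Path

def make_submission(submission_dir: str, out_zip: str = "submission.zip") -> str:
    """Zip the *contents* of submission_dir (no enclosing directory),
    as the platform expects."""
    src = Path(submission_dir)
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as z:
        for f in sorted(src.rglob("*")):
            if f.is_file():
                # arcname relative to the submission dir, so files sit at the root
                z.write(f, f.relative_to(src))
    return out_zip
```

Write the output zip outside the submission directory so the archive does not pick itself up.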

Computational limitations

  • A submission is limited to: 30 min of initialization, 5 min of enrollment, and 1 min + 0.25 × total_test_duration for testing.
  • We currently limit participants to 500 minutes of computing time per day (subject to change, depending on the number of participants).
  • Participants are limited to 3 submissions per day per dataset.
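Read literally, the limits above give each speaker's prediction step a budget of one minute plus a quarter of that speaker's total test-audio duration. A minimal sketch of that reading (the function name is ours):

```python
def prediction_budget_seconds(total_test_duration_s: float) -> float:
    """Per-speaker prediction budget: 1 min + 0.25 * total test-audio duration,
    per the submission limits above."""
    return 60.0 + 0.25 * total_test_duration_s
```

For example, under this reading, 200 s of test audio for a speaker would allow 110 s of prediction time.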

Running Environment

In the starting kit, we provide a Docker image that simulates the running environment of our challenge platform. A conda environment (py36) has been created in the image. Participants can activate it and check the Python version and installed packages with the following commands:

 source activate py36

 python --version

 pip list


On our platform, for each submission, the allocated computational resources are:

  • CPU: 4 Cores
  • GPU: an NVIDIA Tesla P100 (CUDA 10, cuDNN 7.5)
  • Memory: 26 GB
  • Disk: 100 GB

Challenge Rules

  • General Terms: This challenge is governed by the General ChaLearn Contest Rule Terms, the Codalab Terms and Conditions, and the specific rules set forth.
  • Announcements: To receive announcements and be informed of any change in rules, the participants must provide a valid email.
  • Conditions of participation: Participation requires complying with the rules of the challenge. Prize eligibility is restricted by Chinese government export regulations, see the General ChaLearn Contest Rule Terms. The organizers, sponsors, their students, close family members (parents, sibling, spouse or children) and household members, as well as any person having had access to the truth values or to any information about the data or the challenge design giving him (or her) an unfair advantage, are excluded from participation. A disqualified person may submit one or several entries in the challenge and request to have them evaluated, provided that they notify the organizers of their conflict of interest. If a disqualified person submits an entry, this entry will not be part of the final ranking and does not qualify for prizes. The participants should be aware that ChaLearn and the organizers reserve the right to evaluate for scientific purposes any entry made in the challenge, whether or not it qualifies for prizes.
  • Dissemination: The challenge is organized in conjunction with the INTERSPEECH 2021 conference. 
  • Registration: The participants must register to AutoML.ai and provide a valid email address. Team participants and solo competitors must register only once and provide one email address. Teams or solo participants registering multiple times to gain an advantage in the competition will be disqualified. One participant can only be registered in one team. Note that you can join the challenge until one week before the end of feedback phase. Real personal identifications will be required (notified by organizers) at the end of feedback phase to avoid duplicate accounts and award claim.
  • Anonymity: The participants who do not present their results at the challenge session can elect to remain anonymous by using a pseudonym. Their results will be published on the leaderboard under that pseudonym, and their real name will remain confidential. However, the participants must disclose their real identity to the organizers to join check phase and final phase and to claim any prize they might win. See our privacy policy for details.
  • Submission method: The results must be submitted through this AutoML.ai competition site. The number of submissions per day is 3, and the maximum total computational time per day is 500 minutes. Using multiple accounts to increase the number of submissions is NOT permitted. In case of problems, send email to autospeech2021@4paradigm.com. The entries must be formatted as specified on the Instructions page.
  • Prizes: The top 10 ranking participants in the Final Phase may be qualified for prizes. To compete for prizes, the participants must make a valid submission on the Final Phase website (TBA), and fill out a fact sheet briefly describing their methods before the announcement of the final winners. There is no other publication requirement. The winners will be required to make their code publicly available under an OSI-approved license such as, for instance, Apache 2.0, MIT or BSD-like license, if they accept their prize, within a week of the deadline for submitting the final results. Entries exceeding the time budget will not qualify for prizes. In case of a tie, the prize will go to the participant who submitted his/her entry first. Non winners or entrants who decline their prize retain all their rights on their entries and are not obliged to publicly release their code.
  • Cheating: We forbid people during the development phase to attempt to get a hold of the solution labels on the server (though this may be technically feasible). For the final phase, the evaluation method will make it impossible to cheat in this way. Generally, participants caught cheating will be disqualified.

All data are recorded with near-field mobile phones placed in front of the speakers at a distance of around 0.2 m. Each sample is recorded as a single-channel, 16-bit stream at a 16 kHz sampling rate. There are 4 datasets: a training dataset, a practice dataset, a feedback dataset, and a private dataset. The training dataset, recorded from around 100 recorders, is used by participants to develop Auto-KWS solutions. The practice dataset contains data from 5 speakers, each with 5 enrollment recordings and several test recordings; together with the downloadable Docker image, it provides an example of how the platform calls the participants' code. Both the training and practice datasets can be downloaded for local debugging. The feedback dataset and private dataset have the same format as the practice dataset, are used for evaluation, and are therefore hidden from participants. Participants' final solutions will be evaluated by the platform on these two datasets during the feedback phase and private phase respectively, without any human intervention.
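Since all audio is described as single-channel, 16-bit PCM at 16 kHz, a quick sanity check with Python's standard wave module can catch mismatched local files before debugging (the function name is ours, for illustration):

```python
import wave

def matches_challenge_format(path: str) -> bool:
    """Return True if a recording is single-channel, 16-bit PCM at 16 kHz,
    matching the Auto-KWS audio description."""
    with wave.open(path, "rb") as w:
        return (w.getnchannels() == 1
                and w.getsampwidth() == 2     # 16-bit == 2 bytes per sample
                and w.getframerate() == 16000)
```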

Here is a summary of 4 datasets. (The specific number may be slightly adjusted when the challenge is officially launched.)


 Dataset            Speaker Num  Phase Available        Keywords/Enrollment Num
 Training Dataset   100          Before feedback phase  10
 Practice Dataset   5            Before feedback phase  5
 Feedback Dataset   20           Feedback phase         5
 Private Dataset    40           Final phase            5

Here is the structure of the training dataset and practice dataset:

practice dataset
├── P001
│   ├── enrollment
│   │   ├── 001.wav
│   │   └── ......
│   └── test
│       ├── 001.wav
│       └── ......
├── P002
├── ......
└── test_label.txt

training dataset
├── T001
│   ├── keywords
│   │   ├── 001.wav
│   │   └── ......
│   └── others
│       ├── 002.wav
│       └── ......
├── T002
└── ......
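Given the practice-style layout (per-speaker directories with enrollment/ and test/ subfolders), a dataset can be walked with a few lines of pathlib code. This is an illustrative sketch, not part of the starting kit:

```python
from pathlib import Path

def list_speaker_data(dataset_root: str) -> dict:
    """Collect each speaker's enrollment and test wavs from a
    practice-style dataset (P001/, P002/, ... each with enrollment/ and test/)."""
    root = Path(dataset_root)
    speakers = {}
    for spk_dir in sorted(d for d in root.iterdir() if d.is_dir()):
        speakers[spk_dir.name] = {
            "enrollment": sorted((spk_dir / "enrollment").glob("*.wav")),
            "test": sorted((spk_dir / "test").glob("*.wav")),
        }
    return speakers
```

Files at the root, such as test_label.txt, are skipped because only directories are treated as speakers.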

Competition protocol

This challenge has three phases: Feedback Phase, Check Phase and Final Phase.
Before the feedback phase, participants are provided with the training dataset (recorded from around 100 persons) and the practice dataset (recorded from 5 persons). Participants can download these data and use them to develop their solutions offline. During the feedback phase, participants can upload their solutions to the platform to receive immediate performance feedback for validation. The data used in this phase come from another 20 recorders and are hidden from the participants. Then, in the check phase, participants can submit their code exactly once to make sure it works properly on the platform. (Note: using other open-source data, such as datasets from http://www.openslr.org/resources.php, is also allowed; just remember to indicate the data source in your submission.) The platform will indicate success or failure to the users, but detailed logs will be hidden. Lastly, in the final phase, participants' solutions will be evaluated on a private dataset (recorded from another 40 persons). Once the participants submit their code, the platform will automatically run their algorithm against each recorder's data within its own time budget. The final ranking will be generated from the scores calculated in this phase.

The datasets in all phases share the same structure, and the platform applies the same evaluation logic in all phases. The evaluation task is constrained by the time budget. In each task, after initialization, the platform calls the 'enrollment' function, which may run for up to 5 minutes. The platform then calls the 'test' function to predict the label of each test recording. For each speaker, the test process is constrained by a time budget calculated from the real-time factor (RTF) and the total duration of the test audio. When the time budget of 'enrollment' or 'test' runs out, the platform automatically terminates the process, and samples still waiting to be predicted are counted as errors.
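The per-speaker test loop can be pictured as follows. This is our sketch of the behaviour described above, not the platform's actual code; `predict_fn` stands in for the participant's predict step:

```python
import time

def run_tests_with_budget(test_wavs, predict_fn, budget_s):
    """Predict each test utterance until the per-speaker time budget runs
    out; utterances left unpredicted are counted as errors, mirroring the
    platform behaviour described above."""
    predictions, errors = {}, []
    start = time.monotonic()
    for wav in test_wavs:
        if time.monotonic() - start > budget_s:
            errors.append(wav)          # budget exhausted: counted as an error
        else:
            predictions[wav] = predict_fn(wav)
    return predictions, errors
```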



For each dataset, we calculate the wake-up score score_i from the miss rate (MR) and the false alarm rate (FAR) as follows:

score_i = MR + alpha × FAR

where the constant alpha is a penalty coefficient. The final score takes all score_i into consideration.
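In code, with alpha left as a parameter (the organizers' actual penalty coefficient is not stated here, so the default below is only a placeholder), score_i could be computed as:

```python
def wake_up_score(n_miss, n_target, n_false_alarm, n_nontarget, alpha=1.0):
    """score_i = MR + alpha * FAR, where MR is the fraction of target
    (wake-up) utterances missed and FAR the fraction of non-target
    utterances falsely accepted. alpha=1.0 is a placeholder default."""
    mr = n_miss / n_target
    far = n_false_alarm / n_nontarget
    return mr + alpha * far
```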

About the real-time factor (RTF), denoted F_r: for each dataset, F_r is calculated as

F_r = T_process / T_data

where T_process is the total processing time of all test data, and T_data is the total duration of the test audio. Notice that T_process only includes the inference time for each test recording, since the enrollment and initialization processes have already been completed.
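For instance, if inference on 200 s of test audio takes 50 s in total, then F_r = 0.25. A minimal helper (the function name is ours):

```python
def real_time_factor(t_process_s: float, t_data_s: float) -> float:
    """F_r = T_process / T_data: total inference time divided by the total
    duration of the test audio (enrollment and initialization excluded)."""
    return t_process_s / t_data_s
```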



Can organizers compete in the challenge?

No, they can make entries that show on the leaderboard for test purposes and to stimulate participation, but they are excluded from winning prizes.

Are there prerequisites to enter the challenge?

No, except accepting the TERMS AND CONDITIONS.

Can I enter any time?

No, you can join the challenge until one week before the end of feedback phase. After that, we will require real personal identification (notified by organizers) to avoid duplicate accounts.

Where can I download the data?

You can get the download links for the "train datasets" and "practice datasets" only after your registration has been approved by the committee.

How do I make submissions?

To make a valid challenge entry, click the blue button on the upper right side "Upload a Submission". This will ensure that you submit on all 5 datasets of the challenge simultaneously. You may also make a submission on a single dataset for debug purposes, but it will not count towards the final ranking.

Do you provide tips on how to get started?

We provide a Starting Kit with step-by-step instructions in "README.md". Additional information can be found on the "Quick Start" page.

Are there prizes?

Yes, a $4000 prize pool.

         1st place  2nd place  3rd place
 Prize   $2000      $1500      $500

Do I need to submit code to participate?

Yes, participation is by code submission.

When I submit code, do I surrender all rights to that code to the SPONSORS or ORGANIZERS?

No. You just grant to the ORGANIZERS a license to use your code for evaluation purposes during the challenge. You retain all other rights.

If I win, I must submit a fact sheet, do you have a template?

Yes, we will provide the fact sheet in a suitable time.

What is your CPU/GPU computational configuration?

We run your submissions on Google Cloud NVIDIA Tesla P100 GPUs. In non-peak times we plan to use 10 workers, each with one NVIDIA Tesla P100 GPU (CUDA 10, cuDNN 7.5), 4 vCPUs, 26 GB of memory, and 100 GB of disk.

The PARTICIPANTS will be informed if the computational resources increase. They will NOT decrease.

Can I pre-train a model on my local machine and submit it?

Yes. As described in the challenge overview, participants submit code together with pre-trained models.
Will there be a final test round on separate datasets?

YES. The ranking of participants will be made from a final blind test made by evaluating a SINGLE SUBMISSION made on the final test submission site. The submission will be evaluated on five new test datasets in a completely "blind testing" manner. The final test ranking will determine the winners.

Does the time budget correspond to wall time or CPU/GPU time?

Wall time.

My submission seems stuck, how long will it run?

In principle, no more than its time budget; we kill the process if the budget is exceeded. Submissions are queued and run on a first-come, first-served basis across several identical servers. Check the execution time on the leaderboard, and contact us if your submission is stuck for more than 24 hours.

How many submissions can I make?

Three per day, but up to a total computational time of 500 minutes (submissions taking longer will be aborted). This may be subject to change, according to the number of participants. Please respect other users. It is forbidden to register under multiple user IDs to gain an advantage and make more submissions. Violators will be DISQUALIFIED FROM THE CONTEST.

Do my failed submissions count towards my number of submissions per day?

No. Please contact us if you think the failure is due to the platform rather than to your code and we will try to resolve the problem promptly.

What happens if I exceed my time budget?

This should be avoided. If any of the three processes (initialize.sh, enrollment.sh, predict.sh) exceeds its time budget, the submission handling process will be killed, and the predictions made so far (with their corresponding timestamps) will be used for evaluation. If a submission exceeds the total compute time per day, all running tasks will be killed by AutoML.ai, the status will be marked 'Failed', and a score of -1.0 will be produced.

The time budget is too small, can you increase it?

No, sorry, not for this challenge.

Can I use something else than Python code?

Yes. Any Linux executable can run on the system, provided that it fulfills our Python interface and you bundle all necessary libraries with your submission.

Which docker are you running on AutoML.ai?

The platform uses the image janson91/autokws2021:gpu, the same one referenced in the Quick Start instructions.
How do I test my code in the same environment that you are using before submitting?

When you submit code to AutoML.ai, your code is executed inside a Docker container. This environment can be exactly reproduced on your local machine by downloading the corresponding docker image. The docker environment of the challenge contains Kaldi, Anaconda libraries, TensorFlow, and PyTorch (among other things).  

What is meant by "Leaderboard modifying disallowed"?

Your last submission is shown automatically on the leaderboard. You cannot choose which submission to select. If you want another submission than the last one you submitted to "count" and be displayed on the leaderboard, you need to re-submit it.

Can I register multiple times?

No. If you accidentally register multiple times or have multiple accounts from members of the same team, please notify the ORGANIZERS. Teams or solo PARTICIPANTS with multiple accounts will be disqualified.

How can I create a team?

We have disabled AutoML.ai team registration. To join as a team, just share one account with your team. The team leader is responsible for making submissions and observing the rules.

How can I destroy a team?

You cannot. If you need to destroy your team, contact us.

Can I join or leave a team?

It is up to you and the team leader to make arrangements. However, you cannot participate in multiple teams.

Can I cheat by trying to get hold of the evaluation data and/or future frames while my code is running?

No. If we discover that you are trying to cheat in this way you will be disqualified. All your actions are logged and your code will be examined if you win.

Can I give an arbitrary hard time to the ORGANIZERS?


Where can I get additional help?

For questions of general interest, THE PARTICIPANTS should post their questions to the forum.

Other questions should be directed to the organizers.

Final Phase

Start: Feb. 21, 2021, midnight

Description: Please make submissions by clicking the 'Submit' button below. You can then view your algorithm's results on each dataset in the corresponding tab (Dataset 1, Dataset 2, etc.).


 Color  Label      Description  Start
        Dataset 1  None         Feb. 21, 2021, midnight

Competition Ends

April 15, 2021, noon

Top Ten Results

 #  Username       Average Rank  Last Submission             Total Compute Time
 1  LCCF           1.0000        April 9, 2021, 4:01 p.m.    0:00:00
 2  DKU-SMIIP      2.0000        April 9, 2021, 5:36 p.m.    9:53:21
 3  baseline1      3.0000        March 3, 2021, 7:39 a.m.    1:57:15
 4  sai1999gaurav  3.0000        April 14, 2021, 9:09 a.m.   2:16:46
 5  victkid        4.0000        April 9, 2021, 5:36 a.m.    1:51:24
 6  yucs           5.0000        April 12, 2021, 5:59 a.m.   0:00:00
 7  baseline2      6.0000        March 3, 2021, 7:49 a.m.    3:01:32