InterSpeech 2020 will be held in Shanghai, China from September 14 to 18, 2020. AutoSpeech 2020 is one of the competitions in the main conference, provided by 4Paradigm, ChaLearn, Southern University of Science and Technology, Northwestern Polytechnical University and Google.
Speech topics include (but are not limited to):
This special session follows the same submission policy of INTERSPEECH 2020.
(Regarding competition dates, please read the following sections for details.)
1st Prize: 2000 USD
2nd Prize: 1500 USD
3rd Prize: 500 USD
Please contact the organizers if you have any problem concerning this challenge.
- Wei-Wei Tu, 4Paradigm Inc., China and ChaLearn, USA
- Tom Ko, Southern University of Science and Technology, China
- Lei Xie, Northwestern Polytechnical University Xian, China
- Hugo Jair Escalante, INAOE, Mexico and ChaLearn, USA
- Isabelle Guyon, Université Paris-Saclay, France, ChaLearn, USA
- Qiang Yang, Hong Kong University of Science and Technology, Hong Kong, China
- Xiawei Guo, 4Paradigm Inc., China
- Shouxiang Liu, 4Paradigm Inc., China
- Jingsong Wang, 4Paradigm Inc., China
- Zhen Xu, 4Paradigm Inc., China
Previous AutoML Challenges:
Founded in early 2015, 4Paradigm is one of the world’s leading AI technology and service providers for industrial applications. 4Paradigm’s flagship product, the AI Prophet, is an AI development platform that enables enterprises to effortlessly build their own AI applications, and thereby significantly increase the efficiency of their operations. Using the AI Prophet, a company can develop a data-driven “AI Core System”, which could be largely regarded as a second core system next to the traditional transaction-oriented Core Banking System (IBM Mainframe) often found in banks. Beyond this, 4Paradigm has also successfully developed more than 100 AI solutions for use in various settings such as finance, telecommunication and internet applications. These solutions include, but are not limited to, smart pricing, real-time anti-fraud systems, precision marketing, personalized recommendation and more. And while it is clear that 4Paradigm can establish an entirely new paradigm for how an organization uses its data, its scope of services does not stop there. 4Paradigm uses state-of-the-art machine learning technologies and practical experience to bring together a team of experts ranging from scientists to architects. This team has successfully built China’s largest machine learning system and the world’s first commercial deep learning system. However, 4Paradigm’s success does not stop there. With its core team pioneering the research of “Transfer Learning,” 4Paradigm takes the lead in this area, and as a result, has drawn great attention from tech giants worldwide.
ChaLearn is a non-profit organization with vast experience in the organization of academic challenges. ChaLearn is interested in all aspects of challenge organization, including data gathering procedures, evaluation protocols, novel challenge scenarios (e.g., competitions), training for challenge organizers, challenge analytics, result dissemination and, ultimately, advancing the state-of-the-art through challenges.
Google, founded in 1998 by Sergey Brin and Larry Page, is a subsidiary of the holding company Alphabet Inc. More than 70 percent of worldwide online search requests are handled by Google, placing it at the heart of most Internet users’ experience. Its headquarters are in Mountain View, California. Google began as an online search firm, but it now offers more than 50 Internet services and products, from e-mail and online document creation to software for mobile phones and tablet computers. It is considered one of the Big Four technology companies, alongside Amazon, Apple and Facebook.
Baidu Wangpan: https://pan.baidu.com/s/1JSLyEjUIRaaJcWOplLtmtQ password: hcnm
Google Drive: (public datasets) https://drive.google.com/drive/folders/1O1HbjMhugwrvcbqK59wOffnppNioNeMQ
Google Drive: (starting kit) https://drive.google.com/open?id=1dbIFKuXIg8oXfYD8v0_Kw8w-WDlvmtoz
This is a challenge with code submission. We provide one baseline above for test purposes.
To make a test submission, download the starting kit and follow the instructions in the readme.md file. Then click the blue button "Upload a Submission" in the upper right corner of the page and re-upload it. You must first click the orange tab "Feedback Phase" if you want to make a submission simultaneously on all datasets and get ranked in the challenge. You may also submit on a single dataset at a time (for debugging purposes). To check progress on your submissions, go to the "My Submissions" tab. Your best submission is shown on the leaderboard visible under the "Results" tab.
The starting kit contains everything you need to create your own code submission (just by modifying the file model.py) and to test it on your local computer, with the same handling programs and Docker image as those of the Codalab platform (but the hardware environment is in general different).
The starting kit contains toy sample data. Besides that, five practice datasets are also provided so that you can develop your AutoSpeech solutions offline. These five practice datasets can be downloaded from the link at the beginning.
Note that the CUDA version in this Docker image is 10. If the CUDA version on your own machine is lower than 10, you may be unable to use the GPU inside this container.
You can test your code in the exact same environment as the Codalab environment using docker. You are able to run the ingestion program (to produce predictions) and the scoring program (to evaluate your predictions) on toy sample data.
1. If you are new to docker, install docker from https://docs.docker.com/get-started/.
2. At the shell, change to the starting-kit directory and run
(CPU) docker run -it -v "$(pwd):/app/codalab" nehzux/autospeech:gpu
(GPU) docker run --gpus '"device=0"' -it -v "$(pwd):/app/codalab" nehzux/autospeech:gpu
3. You are now in the bash shell of the docker container. Run the local test program
python run_local_test.py -dataset_dir=path_to_dataset -code_dir=path_to_model_file
It runs the ingestion and scoring programs simultaneously, and writes the predictions and scoring results to the sample_result_submissions and scoring_output directories.
The interface is simple and generic: you must supply a Python model.py, where a Model class is defined with:
The python version on our platform is 3.6.8. Below we define the interface of Model class in detail.
__init__(self, meta_data):
train(self, training_data, remaining_time_budget):
test(self, test_data, remaining_time_budget):
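The three methods above can be sketched as a trivial baseline that always predicts the most frequent training class. This is only an illustration of the interface shape: the structure of training_data (assumed here to be a pair of samples and one-hot labels), the structure of test_data (assumed to be a list of waveforms) and the return type of test are assumptions, so check the starting kit README for the actual contract.

```python
class Model:
    """Hypothetical baseline: always predicts the most frequent training class."""

    def __init__(self, meta_data):
        # Assumption: meta_data is the parsed meta.json dictionary.
        self.class_num = meta_data["class_num"]
        self.majority_class = 0

    def train(self, training_data, remaining_time_budget=None):
        # Assumption: training_data is a pair (samples, one_hot_labels).
        _samples, labels = training_data
        counts = [0] * self.class_num
        for row in labels:
            for class_index, value in enumerate(row):
                counts[class_index] += value
        self.majority_class = counts.index(max(counts))

    def test(self, test_data, remaining_time_budget=None):
        # Assumption: one row of class scores is returned per test sample.
        row = [0.0] * self.class_num
        row[self.majority_class] = 1.0
        return [list(row) for _ in test_data]
```

Plain lists keep the sketch dependency-free; a real submission would typically replace the counting logic with feature extraction and a trained classifier.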
To make submissions, zip model.py and its dependency files (without the directory), then use the "Upload a Submission" button. Please note that you must first click the orange tab "On-line Phase" if you want to make a submission simultaneously on all datasets and get ranked in the challenge. You may also submit on a single dataset at a time (for debugging purposes). In addition, the ranking on the public leaderboard is determined by the LAST code submission of each participant.
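One way to build such an archive so that model.py sits at the top level of the zip, rather than inside a parent directory, is sketched below. The function name make_submission_zip and the example paths are hypothetical; only the requirement that files are zipped without the enclosing directory comes from the instructions above.

```python
import zipfile
from pathlib import Path


def make_submission_zip(code_dir, zip_path):
    """Zip every file under code_dir with paths relative to code_dir."""
    code_dir = Path(code_dir)
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(code_dir.rglob("*")):
            # Skip directories and the archive itself if it lives in code_dir.
            if path.is_file() and path.resolve() != zip_path.resolve():
                archive.write(path, arcname=path.relative_to(code_dir).as_posix())
    return zip_path
```

For example, make_submission_zip("my_code", "submission.zip") stores my_code/model.py as model.py inside the archive.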
In the starting-kit, we provide a docker that simulates the running environment of our challenge platform. Participants can check the python version and installed python packages with the following commands:
python --version
pip list
On our platform, for each submission, the allocated computational resources are:
All the datasets consist of an audio file, a label file and a meta file, where the audio file and label file are split into train and test parts. The audio file ({train,test}.pkl) contains the audio samples, formatted as a list of vectors.
Example:
[
[-1.2207031e-04, 3.0517578e-05, -1.5258789e-04, ..., -8.8500977e-04, -8.5449219e-04, -1.3732910e-03],
[ 9.1552734e-05, 7.0190430e-04, 1.0375977e-03, ..., -7.6293945e-04, 2.7465820e-04, 1.0375977e-03],
[ 1.8920898e-03, 1.6784668e-03, 1.4648438e-03, ..., 3.0517578e-05, -2.7465820e-04, -3.0517578e-04],
[0.02307129, 0.02386475, 0.02462769, ..., 0.02420044, 0.02410889, 0.02429199],
[ 6.1035156e-05, 1.2207031e-04, 4.5776367e-04, ..., -1.2207031e-04, -6.1035156e-04, -3.6621094e-04],
[0.03787231, 0.03686523, 0.03723145, ..., 0.03497314, 0.03594971, 0.0350647 ],
...,
]
Label file ({train, dataset_name}.solution) consists of the labels of the instances in one-hot format. Note that each line corresponds to the sample at the same position in the audio file.
Example:
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Meta file (meta.json) is a JSON file containing the meta information about the dataset. Descriptions of the keys in the meta file:
class_num : number of classes in the dataset
train_num : the number of training instances
test_num : the number of test instances
time_budget : the time budget of the dataset, 1800s for all the datasets
Example:
{
"class_num": 10,
"train_num": 428,
"test_num": 107,
"time_budget": 1800
}
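A practice dataset laid out as described above could be read with a small helper like the following. This is a sketch under stated assumptions: the function name load_training_set is hypothetical, and the file names train.pkl, train.solution and meta.json are inferred from the naming patterns given in this section, so adjust them if the downloaded archives differ.

```python
import json
import pickle
from pathlib import Path


def load_training_set(dataset_dir):
    """Return (samples, one_hot_labels, meta) for the training split."""
    dataset_dir = Path(dataset_dir)
    with open(dataset_dir / "meta.json") as f:
        meta = json.load(f)
    # Only unpickle files obtained from a source you trust.
    with open(dataset_dir / "train.pkl", "rb") as f:
        samples = pickle.load(f)
    with open(dataset_dir / "train.solution") as f:
        labels = [[int(float(v)) for v in line.split()]
                  for line in f if line.strip()]
    # Line i of the label file describes sample i of the audio file.
    if len(samples) != len(labels):
        raise ValueError("number of samples and label rows differ")
    if any(len(row) != meta["class_num"] for row in labels):
        raise ValueError("label width does not match class_num")
    return samples, labels, meta
```

The two checks mirror the constraints stated above: one label line per audio sample, and one column per class.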
- V. Panayotov, G. Chen, D. Povey and S. Khudanpur, "Librispeech: An ASR corpus based on public domain audio books," 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, QLD, 2015, pp. 5206-5210.
- Weinberger, Steven. (2015). Speech Accent Archive. George Mason University. Retrieved from http://accent.gmu.edu
- http://www.expressive-speech.net/, Berlin emotional speech database
- CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages https://arxiv.org/abs/1903.11269
- D. Ellis (2007). Classifying Music Audio with Timbral and Chroma Features, Proc. Int. Conf. on Music Information Retrieval ISMIR-07, Vienna, Austria, Sep. 2007.
This challenge has three phases. The participants are provided with five practice datasets which can be downloaded, so that they can develop their AutoSpeech solutions offline. Then, the code is uploaded to the platform and participants receive immediate feedback on the performance of their method on another five validation datasets. After the Feedback Phase terminates, there is a Check Phase, where participants are allowed to submit their code only once on private datasets in order to debug. Participants won't be able to read detailed logs, but they will be able to see whether their code reports errors. Last, in the Final Phase, participants' solutions will be evaluated on five test datasets. The ranking in the Final Phase will count towards determining the winners.
Code submitted is trained and tested automatically, without any human intervention. Code submitted on Feedback (resp. Final) Phase is run on all five feedback (resp. final) datasets in parallel on separate compute workers, each one with its own time budget.
The identities of the datasets used for testing on the platform are concealed. The data are provided in a raw form (no feature extraction) to encourage researchers to use Deep Learning methods performing automatic feature learning, although this is NOT a requirement. All problems are multi-label classification problems. The tasks are constrained by the time budget.
It is the responsibility of the participants to make sure that neither the "train" nor the "test" method exceeds the "remaining_time_budget". The "train" method can manage its time budget so that it trains in varying time increments. Note that the model is initialized only once during the submission process, so participants can control the model's behavior at each train step through its member variables. Because the area under the learning curve is used as the metric, there is an incentive not to spend the whole "overall_time_budget" on the first iteration.
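One possible way to train in growing increments while respecting the budget is sketched below. The class name BudgetedTrainer, the doubling schedule, the safety margin and the placeholder _run_one_epoch are all illustrative assumptions, not part of the challenge API; the point is only that state kept in member variables lets each call to train do a different amount of work.

```python
import time


class BudgetedTrainer:
    """Hypothetical helper: short first step, longer steps on later calls."""

    def __init__(self, first_step_epochs=1, safety_margin=0.1):
        self.first_step_epochs = first_step_epochs
        self.safety_margin = safety_margin  # fraction of the budget left unused
        self.train_calls = 0                # persists across calls to train
        self.epochs_done = 0

    def train(self, training_data, remaining_time_budget):
        # Double the planned work at every call so early predictions come fast.
        planned = self.first_step_epochs * 2 ** self.train_calls
        self.train_calls += 1
        deadline = time.monotonic() + remaining_time_budget * (1 - self.safety_margin)
        for _ in range(planned):
            if time.monotonic() >= deadline:
                break  # stop early rather than exceed the budget
            self._run_one_epoch(training_data)
            self.epochs_done += 1

    def _run_one_epoch(self, training_data):
        # Placeholder for one pass of real training.
        pass
```

A short first step produces an early point on the learning curve, and later steps refine the model for as long as budget remains.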
For each dataset, we compute the Area under Learning Curve (ALC). The learning curve is drawn as follows:
After we compute the ALC for all datasets, the overall ranking is used as the final score for evaluation and is shown on the leaderboard. It is computed by averaging the ranks (among all participants) of the ALC obtained on each dataset.
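The averaging of ranks described above can be illustrated with a short sketch. The function name average_ranks and the input layout (a dictionary mapping each dataset to per-participant ALC values) are hypothetical, and the tie handling (tied participants share the mean of their positions) is an assumption, since the tie rule is not stated here.

```python
def average_ranks(alc_by_dataset):
    """Map {dataset: {participant: alc}} to {participant: mean rank}."""
    ranks = {}
    for scores in alc_by_dataset.values():
        # Higher ALC is better, so rank 1 goes to the largest value.
        ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
        i = 0
        while i < len(ordered):
            j = i
            while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
                j += 1
            # Assumption: tied participants share the mean of their positions.
            shared_rank = ((i + 1) + (j + 1)) / 2
            for name, _ in ordered[i:j + 1]:
                ranks.setdefault(name, []).append(shared_rank)
            i = j + 1
    return {name: sum(values) / len(values) for name, values in ranks.items()}
```

A lower average rank is better; a participant who is first on every dataset gets an average rank of 1.0.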
Examples of learning curves:
No, they can make entries that show on the leaderboard for test purposes and to stimulate participation, but they are excluded from winning prizes.
No, except accepting the TERMS AND CONDITIONS.
No, you can join the challenge until one week before the end of feedback phase. After that, we will require real personal identification (notified by organizers) to avoid duplicate accounts.
You can download "practice datasets" only from the Instructions page. The data on which your code is evaluated cannot be downloaded; it will be visible only to your code, on the Codalab platform.
To make a valid challenge entry, click the blue button on the upper right side "Upload a Submission". This will ensure that you submit on all 5 datasets of the challenge simultaneously. You may also make a submission on a single dataset for debug purposes, but it will not count towards the final ranking.
We provide a Starting Kit in Python with step-by-step instructions in "README.md".
Yes, a $4000 prize pool.
 | 1st place | 2nd place | 3rd place |
---|---|---|---|
Prize | $2000 | $1500 | $500 |
Yes, participation is by code submission.
No. You just grant to the ORGANIZERS a license to use your code for evaluation purposes during the challenge. You retain all other rights.
Yes, we will provide the fact sheet in due time.
We are running your submissions on Google Cloud NVIDIA Tesla P100 GPUs. In non-peak times we are planning to use 10 workers, each of which will have one NVIDIA Tesla P100 GPU (running CUDA 10 with drivers cuDNN 7.5) and 4 vCPUs, with 26 GB of memory, 100 GB disk.
The PARTICIPANTS will be informed if the computational resources increase. They will NOT decrease.
This is not explicitly forbidden, but it is discouraged. We prefer if all calculations are performed on the server. If you submit a pre-trained model, you will have to disclose it in the fact sheets.
YES. The ranking of participants will be made from a final blind test made by evaluating a SINGLE SUBMISSION made on the final test submission site. The submission will be evaluated on five new test datasets in a completely "blind testing" manner. The final test ranking will determine the winners.
20 min is granted for initialization. Each execution must run in less than 30 minutes (1800 seconds) for each dataset. Your cumulative time is limited to 500 minutes per day in total.
Wall time.
In principle no more than its time budget. We kill the process if the time budget is exceeded. Submissions are queued and run on a first-come, first-served basis. We are using several identical servers. Contact us if your submission is stuck for more than 24 hours. Check the execution time on the leaderboard.
Two per day, but up to a total computational time of 500 minutes (submissions taking longer will be aborted). This may be subject to change, according to the number of participants. Please respect other users. It is forbidden to register under multiple user IDs to gain an advantage and make more submissions. Violators will be DISQUALIFIED FROM THE CONTEST.
No. Please contact us if you think the failure is due to the platform rather than to your code and we will try to resolve the problem promptly.
This should be avoided. In the case where a submission exceeds 30 minutes of time budget for a particular task (dataset), the submission handling process (ingestion program in particular) will be killed when time budget is used up and predictions made so far (with their corresponding timestamps) will be used for evaluation. In the other case where a submission exceeds the total compute time per day, all running tasks will be killed by CodaLab and the status will be marked 'Failed' and a score of -1.0 will be produced.
No, sorry, not for this challenge.
Anytime learning with balanced accuracy is used. For more details, go to the 'Get Started' -> 'Evaluation' -> 'Metrics' section.
The code was tested under Python 3.6.8. We are running Python 3.6.8 on the server and the same libraries are available.
Yes. Any Linux executable can run on the system, provided that it fulfills our Python interface and you bundle all necessary libraries with your submission.
No.
nehzux/autospeech:gpu
When you submit code to Codalab, your code is executed inside a Docker container. This environment can be exactly reproduced on your local machine by downloading the corresponding docker image. The docker environment of the challenge contains Anaconda libraries, TensorFlow, and PyTorch (among other things).
Your last submission is shown automatically on the leaderboard. You cannot choose which submission to select. If you want another submission than the last one you submitted to "count" and be displayed on the leaderboard, you need to re-submit it.
No. If you accidentally register multiple times or have multiple accounts from members of the same team, please notify the ORGANIZERS. Teams or solo PARTICIPANTS with multiple accounts will be disqualified.
We have disabled Codalab team registration. To join as a team, just share one account with your team. The team leader is responsible for making submissions and observing the rules.
You cannot. If you need to destroy your team, contact us.
It is up to you and the team leader to make arrangements. However, you cannot participate in multiple teams.
No. If we discover that you are trying to cheat in this way you will be disqualified. All your actions are logged and your code will be examined if you win.
ALL INFORMATION, SOFTWARE, DOCUMENTATION, AND DATA ARE PROVIDED "AS-IS". UPSUD, CHALEARN, IDF, AND/OR OTHER ORGANIZERS AND SPONSORS DISCLAIM ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR ANY PARTICULAR PURPOSE, AND THE WARRANTY OF NON-INFRINGEMENT OF ANY THIRD PARTY'S INTELLECTUAL PROPERTY RIGHTS. IN NO EVENT SHALL ISABELLE GUYON AND/OR OTHER ORGANIZERS BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF SOFTWARE, DOCUMENTS, MATERIALS, PUBLICATIONS, OR INFORMATION MADE AVAILABLE FOR THE CHALLENGE. In case of dispute or possible exclusion/disqualification from the competition, the PARTICIPANTS agree not to take immediate legal action against the ORGANIZERS or SPONSORS. Decisions can be appealed by submitting a letter to the CHALEARN president, and disputes will be resolved by the CHALEARN board of directors. See contact information.
For questions of general interest, THE PARTICIPANTS should post their questions to the forum.
Other questions should be directed to the organizers.
Start: March 11, 2020, 3:59 p.m.
Description: Please make submissions by clicking on the following 'Submit' button. Then you can view the submission results of your algorithm on each dataset in the corresponding tab (Dataset 1, Dataset 2, etc).
Label | Description | Start |
---|---|---|
Dataset 1 | None | March 11, 2020, 3:59 p.m. |
Dataset 2 | None | March 11, 2020, 3:59 p.m. |
Dataset 3 | None | March 11, 2020, 3:59 p.m. |
Dataset 4 | None | March 11, 2020, 3:59 p.m. |
Dataset 5 | None | March 11, 2020, 3:59 p.m. |
May 15, 2020, 3:59 p.m.