  • 100 Participants
  • 168 Submissions
  • Competition Ends: July 3, 2021, 3 a.m.

IJCAI 2021 AutoRL Competition

Collocated with ICAPS 2021


Dear participants, the Check Phase has been launched at https://www.automl.ai/competitions/16.

Please submit your solution to the new website so that we can check whether it can be correctly evaluated by the private environments. Please note that each participant has only TWO chances to submit their code.


In a dynamic job shop scheduling problem (DJSSP), limited machine resources are used to produce a collection of jobs, each comprising a sequence of operations that should be processed on a specific set of machines. A job is considered done once its last operation is finished. In addition, DJSSPs take into account stochastic events such as random job arrivals, unintended machine breakdowns, and due-time changes, all of which happen frequently in real-world manufacturing. In this challenge, participants are invited to design computer programs capable of automatically generating policies (usually in the form of agents) for a collection of DJSSPs. For a given DJSSP, the learned scheduling policy/agent(s) is expected to allocate machine resources to jobs both spatially and temporally, so as to finish all the jobs while optimizing some business metrics.

We developed a total of thirteen DJSSP virtual environments (simulators) for this challenge, with which agents are trained and tested. They are based on real-world production lines but are carefully designed to make the problem both challenging and manageable. All environments share a uniform interface, but the configurations of machines, jobs, metrics, and events vary among tasks to test the generalization ability of submitted solutions. A practical AutoRL solution should be able to generalize to a wide range of unseen tasks. In this challenge, the participant's solution should automatically train agent(s) for each given DJSSP environment so as to maximize the long-term return, and the evaluation is based on the performance of the agents on a collection of different environments.

The competition will be held in three phases. In the Feedback Phase, participants can upload their solutions and get daily feedback on how they perform on five feedback environments, and make improvements accordingly. Then, participants need to submit their final solution to the Check Phase to test whether it can be properly evaluated; each participant is offered one opportunity to make corrections. Lastly, in the final phase, namely the Private Phase, solutions are evaluated on five unseen private environments and ranked. The participant whose solution has the top average rank over all the private environments is the winner.


Important Dates

  • Apr. 21st, 2021: Beginning of the competition (the Feedback Phase).

  • Jul. 2nd, 2021: End of the Feedback Phase and beginning of the Check Phase.

  • Jul. 9th, 2021: End of the Check Phase and beginning of the Private Phase.

  • Jul. 16th, 2021: End of the Private Phase.

  • Jul. 23rd, 2021: Announcement of the winner.

  • Aug. 21st, 2021: Beginning of IJCAI 2021 conference.


Dynamic Job Shop Scheduling Challenge

Job shop scheduling is a key problem in many real-world manufacturing systems and has been studied for decades. The dynamic job shop scheduling problem (DJSSP), which extends the original problem with commonly encountered stochastic events, has recently attracted academic and industrial attention. Various methods have been proposed to solve DJSSPs, ranging from classical operations-research methods and heuristics to meta-heuristic algorithms. Reinforcement Learning (RL), an emerging decision-making method, has also been employed. Because a DJSSP is a sequential decision-making problem that can be cast as a Markov Decision Process (MDP), and because RL techniques have recently been applied successfully to games and real-world problems, we believe that RL is a promising approach to DJSSP and that existing work leaves considerable room for improvement. However, most of these RL successes depend on domain expertise (usually in both RL and the business) and enormous computing power. Automatic Reinforcement Learning (AutoRL) aims to lower the threshold of RL applications, so that more organizations and individuals can benefit, by training agents with as little human effort as possible. Given an RL problem, an AutoRL method should automatically form the state and action spaces, generate network architectures (if deep RL methods are employed), select a proper training algorithm and hyper-parameters, and finally output agents, balancing effectiveness and efficiency so that the agents reach reasonable performance at acceptable cost.

To summarize, the motivations for this DJSSP challenge are:

  • Dynamic job shop scheduling is an important real-world problem;

  • RL is a promising solution to DJSSP;

  • AutoRL is necessary for the extensive application of RL techniques.

To prevail in the proposed challenge, participants should propose automatic solutions that can effectively and efficiently train agents for a given set of DJSSP environments. The solutions should be designed to, with an environment as input, automatically construct the state and action spaces, shape rewards, generate neural network architectures (if deep RL method is used), select RL algorithm and its hyper-parameters, and train agents. Here, we list some specific questions that the participants should consider and answer:

  • How to improve the in-distribution generalization capability of RL approaches?

  • How to design a generic solution that is applicable in unseen DJSSP tasks?

  • How to represent the states and actions in DJSSPs so that learning performance and efficiency can be improved?

  • How to automatically shape the reward to make the learning more efficient?

  • How to automatically generate network architectures for a given task?

  • How to automatically choose the RL algorithm and tune its hyper-parameters?

  • How to improve data efficiency?

  • How to keep the computational and memory cost acceptable?


  • First Place Prize: 5,000 USD

  • Second Place Prize: 2,000 USD

  • Third Place Prize: 1,000 USD

  • Special Prize: Places 4th and 5th: 500 USD


Wei-Wei Tu, 4Paradigm Inc.

Hugo Jair Escalante, Instituto Nacional de Astrofisica, Optica y Electronica, INAOE, Mexico

Isabelle Guyon, Université Paris-Saclay, ChaLearn

Qiang Yang, Hong Kong University of Science and Technology

Committee (alphabetical order)

Bin Feng, 4Paradigm Inc.

Mengshuo Wang, 4Paradigm Inc.

Tailin Wu, 4Paradigm Inc.

Xiawei Guo, 4Paradigm Inc.

Yuxuan He, 4Paradigm Inc.



Please contact the organizers if you have any problems concerning this challenge.


About AutoML

Previous experience in organizing AutoML Challenges:

AutoGraph@KDD2020 (KDD Cup 2020)

AutoML@KDD2019 (KDD Cup 2019)

AutoML@PAKDD2019 (competition session)

AutoML@PAKDD2018 (competition session)

AutoML@NeurIPS2018 (competition session)

AutoML@ICML2018 (workshop)

AutoML@IJCNN2019 (competition session)

AutoML@PRICAI2018 (workshop)

AutoML@WSDM2020 (competition session)

AutoDL@NeurIPS2019 (ongoing challenge)



About 4Paradigm Inc.

Founded in early 2015, 4Paradigm is one of the world’s leading AI technology and service providers for industrial applications. 4Paradigm’s flagship product – the AI Prophet – is an AI development platform that enables enterprises to effortlessly build their own AI applications, and thereby significantly increase their operation’s efficiency. Using the AI Prophet, a company can develop a data-driven “AI Core System”, which could be largely regarded as a second core system next to the traditional transaction-oriented Core Banking System (IBM Mainframe) often found in banks. Beyond this, 4Paradigm has also successfully developed more than 100 AI solutions for use in various settings such as finance, telecommunication, and internet applications. These solutions include, but are not limited to, smart pricing, real-time anti-fraud systems, precision marketing, and personalized recommendation. And while it is clear that 4Paradigm can set up a completely new paradigm for how an organization uses its data, its scope of services does not stop there. 4Paradigm uses state-of-the-art machine learning technologies and practical experience to bring together a team of experts ranging from scientists to architects. This team has successfully built China’s largest machine learning system and the world’s first commercial deep learning system. However, 4Paradigm’s success does not stop there. With its core team pioneering the research of “Transfer Learning,” 4Paradigm takes the lead in this area, and as a result, has drawn great attention from worldwide tech giants.

About ChaLearn

ChaLearn is a non-profit organization with vast experience in the organization of academic challenges. ChaLearn is interested in all aspects of challenge organization, including data gathering procedures, evaluation protocols, novel challenge scenarios (e.g., competitions), training for challenge organizers, challenge analytics, result dissemination, and, ultimately, advancing the state-of-the-art through challenges.

Dynamic Job Shop Scheduling Problem

In a Job Shop Scheduling Problem (JSSP), the task is to assign a collection of jobs to a set of machines. Each job has its own sequence of operations, each of which should be processed on a fixed set of machines and takes a specific processing time.

The most studied static JSSP assumes that all information about the jobs and machines is known ahead of time, and no unexpected events will occur during production. However, static JSSP is usually unrealistic since various stochastic events, such as machine breakdown, new job arrival, due dates changing, occur in real-world manufacturing systems.

In contrast, Dynamic Job Shop Scheduling Problem (DJSSP) takes these events into consideration, and better matches the reality. Here we will introduce elements in DJSSPs that are considered in this challenge.

Machines are the major resources in DJSSP. Given enough time, a machine can process a specific operation. Machines of one type can process a specific set of operations. In our setting, a machine can only process one operation (i.e., one job) at a time.

Jobs: A job consists of a sequence of operations, each of which should be processed on a specific set of machines and takes a certain time, namely the processing time. The following figure illustrates two types of jobs. Rectangles denote different operations, and the colors indicate the type of machines that can process the corresponding operations. For instance, a Type-A job comprises three operations "a1", "a2", and "a3", where "a1" and "a3" can be processed by one type of machine while "a2" needs another.

The arrival time of a job is the time the job arrives at the production line. It is the time that the first operation can be processed but not necessarily the time it is actually processed.

Each job is assigned a priority, and a job with a higher priority is more important.

Illustration of jobs


Policy: A policy, in DJSSP, assigns specific jobs to an available machine at a specific point in time. The following figure gives a sample schedule that is generated by a policy. Three machines are considered, among which "M1" and "M2" are of one type, and "M3" of the other. Two jobs, one of each job type in the last figure, need to be processed. The Type-B job arrives slightly later than the Type-A one. The policy assigns the Type-A job to "M1" at time 0, and the Type-B job to "M2" at time 1. Then, at time 8, it simultaneously assigns them to "M3" and "M1", respectively. We omit the remaining actions.

Illustration of a schedule and its makespan.

Metrics: In this challenge, two metrics will be considered to evaluate a policy, namely makespan and pending time violation:

  • Makespan: makespan is the total length of the schedule, i.e., the amount of time starting from the arrival of the first job and ending at the completion of all jobs. The last figure gives an example. The shorter the makespan is, the better.
  • Pending Time Violation (PTV): as shown in the next figure, pending time requirements are posed between any two adjacent operations. That is to say, after the finish of the former operation, the next operation should be assigned to an available machine within the corresponding pending time margin, and otherwise, a pending time violation occurs. For the first operation, the pending time starts at the job arrival. Violation of pending time will lead to a penalty that is in direct proportion to the job's priority (no PTV will be considered for a job with 0 priority).

In this challenge, these metrics are presented in the form of rewards, a common element in reinforcement learning, which will be discussed on the Environment page.
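As a small worked example of the makespan definition above, the sketch below computes it from per-job arrival and completion times. The helper name `makespan` and the numbers are made up for illustration; they are not part of the challenge interface.

```python
def makespan(arrivals, completions):
    """Time from the arrival of the first job to the completion of the last job."""
    return max(completions.values()) - min(arrivals.values())


# Illustrative values only (minutes).
arrivals = {'J01': 0, 'J02': 1}        # arrival time of each job
completions = {'J01': 20, 'J02': 24}   # finish time of each job's last operation
print(makespan(arrivals, completions))  # 24
```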

Stochastic Events: In this challenge, we consider random machine breakdown, one of the most representative stochastic events in real-world manufacturing systems. When a machine breaks down, it remains unavailable until the end of the episode. To keep the problem manageable, we carefully designed these events and ensured that the randomness does not make the problem intractable.

Diversity: In this challenge, participants' solutions are required to train agents for a set of different DJSSP environments. The diversity among these environments should be both challenging and manageable. We consider diversity in the following aspects:

  • Numbers and types of machines;
  • Numbers and arrival of jobs;
  • Occurrence of machine breakdowns.


Challenge Rules

  • General Terms: This challenge is governed by the General ChaLearn Contest Rule Terms, the Codalab Terms and Conditions, and the specific rules set forth.
  • Announcements: To receive announcements and be informed of any change in rules, the participants must provide a valid email.
  • Conditions of participation: Participation requires complying with the rules of the challenge. Prize eligibility is restricted by Chinese government export regulations, see the General ChaLearn Contest Rule Terms. The organizers, sponsors, their students, close family members (parents, sibling, spouse or children) and household members, as well as any person having had access to the truth values or to any information about the data or the challenge design giving him (or her) an unfair advantage, are excluded from participation. A disqualified person may submit one or several entries in the challenge and request to have them evaluated, provided that they notify the organizers of their conflict of interest. If a disqualified person submits an entry, this entry will not be part of the final ranking and does not qualify for prizes. The participants should be aware that ChaLearn and the organizers reserve the right to evaluate for scientific purposes any entry made in the challenge, whether or not it qualifies for prizes.
  • Dissemination: The challenge is organized in conjunction with the IJCAI2021 conference.
  • Registration: The participants must register on AutoML.ai and provide a valid email address. Team participants and solo competitors must register only once and provide one email address. Teams or solo participants registering multiple times to gain an advantage in the competition will be disqualified. One participant can only be registered in one team. Note that you can join the challenge until one week before the end of the Feedback Phase. Real personal identification will be required (as notified by the organizers) at the end of the Feedback Phase, both to rule out duplicate accounts and to process award claims.
  • Anonymity: The participants who do not present their results at the challenge session can elect to remain anonymous by using a pseudonym. Their results will be published on the leaderboard under that pseudonym, and their real name will remain confidential. However, the participants must disclose their real identity to the organizers to join check phase and final phase and to claim any prize they might win. See our privacy policy for details.
  • Submission method: The results must be submitted through this AutoML.ai competition site. The number of submissions per day is 1 and maximum total computational time per day is 18 hours. Using multiple accounts to increase the number of submissions is considered cheating and NOT permitted. In case of any problems, please send email to autorl2021@4paradigm.com. The entries must be formatted as specified on the Instructions page.
  • Prizes: The top 5 ranking participants in the Final Phase may be qualified for prizes. To compete for prizes, the participants must make a valid submission on the Final Phase website (TBA), and fill out a fact sheet briefly describing their methods before the announcement of the final winners. There is no other publication requirement. The winners will be required to make their code publicly available under an OSI-approved license such as, for instance, Apache 2.0, MIT or BSD-like license, if they accept their prize, within a week of the deadline for submitting the final results. Entries exceeding the time budget will not qualify for prizes. In case of a tie, the prize will go to the participant who submitted his/her entry first. Non winners or entrants who decline their prize retain all their rights on their entries and are not obliged to publicly release their code.
  • Cheating: Any dishonest means to get advantage in the challenge (e.g., attempt to modify the environment or to get a hold of all unprovided configuration files) are considered cheating and will lead to disqualification.


Frequently Asked Questions

About submission

Q: How do I submit my code? A: First follow the instructions in the README.md file in the starting kit to prepare your zip file. Then, find the My Submission page on the competition website, press Feedback Phase, and click the Upload a Submission button to upload your zip file.

Q: How many machine resources can I use? A: The submission will be run on a workstation with 6 CPU cores, 56GB RAM, and 1 K80 GPU.

About third-party packages/modules

Q: Can I use a non-open-source module/package in my submission? A: Yes, you can. However, in order to accept the prize, the whole solution must be open-sourced under an OSI-approved license.

Q: How can I get the absolute path to solution.py on the competition platform? A: You can get the absolute path of its directory by adding the following code to solution.py:

import os
abs_path = os.path.abspath(os.path.dirname(__file__))

About the setup

Q: Can I use APIs of the Environment that are not listed on the "Get Started" page? A: Using these APIs is NOT suggested, since, in the Private Phase, these APIs may be changed or made unavailable.

Q: When a machine breaks down, what happens to the job it is processing? A: When a breakdown occurs, if the processing of the job's current operation is done, the job enters the next stage (i.e., it can be processed by the next machines); otherwise, the job is put back on the waitlist, and the operation needs to be re-processed from the very beginning by another machine. You can get the status of all jobs in job_status.

Q: Can a machine have more than one type? A: No, one machine has only one type.

Q: Can I know in advance when a machine will break down? A: No. The breakdown can be observed only at the very moment it happens.


In this challenge, DJSSP environments (simulators) are developed based on real-world production lines. An environment simulates the production procedure of a job shop in an iterative manner. In each "step" it takes in the decisions (made by agents), updates the state of the job shop (e.g., assigns jobs to the specified machines, advances the operations in progress, finishes operations and makes machines available again, etc.), and outputs information to the agents.

The unit of time used in all environments is "minute".


Each environment in this challenge is an instance of the Environment class, implemented in Python. In the rest of this section, we will assume env is an instance of Environment. The interface of Environment includes:

machines is a static member variable (i.e., does not change during simulation) that provides information about machines in the environment. Machines are organized by types (in the form of a dict), and for each type all machine ids (string) are given in a list. Here is an example of machines.

env.machines = {
    'A': ['M01', 'M02'],	# type: list of machine ids, ids are strings
    'B': ['M03'],
    ...	# other types
}
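When making decisions it is often handy to look up a machine's type by its id. A minimal sketch, assuming machines has the dict-of-lists shape shown above; the helper name `machine_type_of` is hypothetical and not part of the Environment interface.

```python
def machine_type_of(machines):
    """Invert {type: [machine ids]} into {machine id: type}."""
    return {
        machine_id: machine_type
        for machine_type, machine_ids in machines.items()
        for machine_id in machine_ids
    }


# Illustrative data in the same shape as env.machines.
machines = {'A': ['M01', 'M02'], 'B': ['M03']}
lookup = machine_type_of(machines)
print(lookup['M03'])  # B
```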

job_types is a static member variable that provides information about all job types considered in the environment. For each type, the information of all operations is listed in order. For each operation, operation name (string), required machine type (string), processing time (int), pending time requirement (int) are given. Here is an example of job_types.

env.job_types = {
    'a': [	# job type: its operations, in processing order
        {'op_name': 'AO1',	# operation name
         'machine_type': 'A',	# required machine type
         'process_time': 10,	# processing time (minutes)
         'max_pend_time': 10},	# maximal pending time (minutes)
        {'op_name': 'AO2',
         ...},	# same fields as above
        ...	# other operations
    ],
    ... # other job types
}
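One simple quantity that can be derived from job_types is the total processing time of each job type, which bounds from below how long any job of that type takes. The sketch assumes each job type maps to a list of operation dicts with a 'process_time' field, as in the configuration file format; the helper name and the values for the second operation are illustrative.

```python
def total_process_time(job_types):
    """Sum the processing time (minutes) of all operations for each job type."""
    return {
        job_type: sum(op['process_time'] for op in operations)
        for job_type, operations in job_types.items()
    }


# Illustrative data; the second operation's values are made up.
job_types = {
    'a': [
        {'op_name': 'AO1', 'machine_type': 'A', 'process_time': 10, 'max_pend_time': 10},
        {'op_name': 'AO2', 'machine_type': 'B', 'process_time': 5, 'max_pend_time': 8},
    ],
}
min_minutes = total_process_time(job_types)
print(min_minutes)  # {'a': 15}
```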

jobs is a static member variable that provides information about all jobs that should be processed in the environment. Jobs are organized by type (string), and for each job, the arrival time (int) and the priority (int, 0 to 5) are given. Here is an example of jobs.

env.jobs = {
    'a': {
        'J01': {
            'arrival': 10,
            'priority': 3,  # 0 ~ 5
        },
        ... # other jobs
    },
    ... # other job types
}
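Because jobs is nested by type, a flat view sorted by arrival time can be easier to work with. A minimal sketch, assuming the nested shape shown above; `jobs_by_arrival` is a hypothetical helper and the second job is made up.

```python
def jobs_by_arrival(jobs):
    """Flatten {type: {job id: info}} into (arrival, job id, job type) tuples, earliest first."""
    flat = [
        (info['arrival'], job_id, job_type)
        for job_type, jobs_of_type in jobs.items()
        for job_id, info in jobs_of_type.items()
    ]
    return sorted(flat)


# Illustrative data in the same shape as env.jobs.
jobs = {
    'a': {'J01': {'arrival': 10, 'priority': 3}},
    'b': {'J02': {'arrival': 0, 'priority': 1}},
}
ordered = jobs_by_arrival(jobs)
print(ordered)  # [(0, 'J02', 'b'), (10, 'J01', 'a')]
```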

__init__() is the initialization method of the environment class. It takes only one input (besides self):

  1. config, the path to the configuration file that indicates the job types, machines, jobs, PTV weights, and breakdown events. Details on configuration files can be found below.

The usage of __init__() is as follows.

 # usage of env.__init__()
 env = Env(config='./config/xxx/train_1.yaml')

reset() is a method that initializes the environment, outputs machine and job status (machine_status and job_status respectively, see below for details), current time (time), as well as the candidate job list (job_list) for the first step. The usage of reset() is as follows.

# usage of env.reset()
machine_status, job_status, time, job_list = env.reset()
  • machine_status provides the current status of all machines in the environment. For each machine (key in a dict), machine type (string), current status ('down', 'idle', or 'work'), and remaining process time (int), job under processing (job id, string), and pending job list (set of job ids) will be given. Here is an example of machine_status.
# example of machine_status
machine_status = {
    'M01': {
        'type': 'A',	# machine type
        'status': 'idle',	# 'down', 'idle', 'work'
            # 'down' means the machine is suffering a breakdown,
            # 'idle' means the machine is available for another job,
            # 'work' means the machine is processing a job
        'remain_time': 5,	# remaining time of processing.
            # if the machine is 'down' or 'idle', remain_time is 0
        'job': 'J01',	# job under processing, None if 'down' or 'idle'
        'job_list': {'J02', 'J05', ... }, # set() of job ids
    },
    ...	# other machines
}
  • job_status provides the current status of all jobs in the environment. For each job (key in a dict), job type (string), priority (int), current status ('work', 'pending', 'to_arrive', or 'done'), remaining arrival time (int), current operation (string), remaining process time (int), remaining pending time (int), and the processing machine (string) are given. Here is an example of job_status.
# example of job_status
job_status = {
    'J01': {
        'type': 'a', # job type
        'priority': 3, # int, 0 ~ 5
        'status': 'work',	# 'work', 'pending', 'to_arrive', or 'done'
            # 'work' means the job is under processing
            # 'pending' means the job is waiting for processing
            # 'to_arrive' means the job has not arrived
            # 'done' means the job is finished
        'arrival': 0,	# remaining time to arrive, 0 if the job has arrived,
            # > 0 if the job has not arrived
            # None if the remaining time is out of the time window
        'op': 'AO2',	# the current operation name,
            # if not arrived, uses the first operation name
        'remain_process_time': 5,	# remaining process time
            # if 'pending', uses the processing time of the current operation
        'remain_pending_time': 8,	# remaining pending time
            # can be less than 0; if so, the pending time has been violated
            # inf if the job has not yet arrived
            # if the status is 'work', keeps the last value before processing
        'machine': 'M01',	# processing machine, None if not under processing
    },
    ... # other jobs
}
  • time is an int that indicates the current simulation time, in minutes.
  • job_list is a dict where each machine id is the key and the value is a list of jobs that can currently be processed by the machine.
# example of job_list
job_list = {
    'M01': ['J01', 'J02'],	# list of job ids
    'M02': ['J01', 'J02'],
    'M03': [],	# empty list if the machine is not idle or no jobs are available
    ...	# other machines
}
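The outputs of reset() are already enough to draft a first decision rule. The sketch below is a hand-written baseline, not a recommended solution: for each machine it picks the candidate job with the least remaining pending time (ties broken by higher priority) and builds a machine-to-job dict of the kind step() takes as input. The function name is hypothetical, and it assumes that a job listed for several machines must be given to only one of them.

```python
def greedy_assignment(job_status, job_list):
    """Give each machine its most urgent candidate job, never reusing a job."""
    assignment = {}
    taken = set()
    for machine_id in sorted(job_list):
        candidates = [j for j in job_list[machine_id] if j not in taken]
        if not candidates:
            continue  # nothing to assign to this machine
        # Most urgent = least remaining pending time; ties go to higher priority.
        best = min(
            candidates,
            key=lambda j: (job_status[j]['remain_pending_time'],
                           -job_status[j]['priority']),
        )
        assignment[machine_id] = best
        taken.add(best)
    return assignment


# Illustrative inputs containing only the fields the rule reads.
job_status = {
    'J01': {'priority': 3, 'remain_pending_time': 8},
    'J02': {'priority': 1, 'remain_pending_time': 2},
}
job_list = {'M01': ['J01', 'J02'], 'M02': ['J01', 'J02'], 'M03': []}
print(greedy_assignment(job_status, job_list))
# {'M01': 'J02', 'M02': 'J01'}
```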

step() is a method that takes the decisions (job assignment) as input, simulates one step further, and outputs the new machine and job status, the new candidate job lists, and the intermediate reward(s) the agent(s) receive in the step. It also indicates whether the environment has terminated (i.e., all jobs are done or the time limit is exceeded). The usage of step() and examples of its input and outputs are as follows.

# usage of step()
machine_status, job_status, time, reward, job_list, done = env.step(job_assignment)
  • machine_status, job_status, time, and job_list are the same as the outputs of reset().
  • job_assignment is the input of step() that indicates which machine accepts which job at the current step. It is a dict whose keys are the machines (a subset of all machines) that accept jobs at this step.
# input: job_assignment
job_assignment = {
    'M01': 'J01',	# assigned job
    'M02': 'J02',
    'M03': None,	# None if no job is assigned
    ...	# other machines
}
  • reward describes the rewards agent gets for this step.
# examples of reward
reward = {
    'makespan': -1,	# reward related to makespan
    'PTV': 0,	# reward related to pending time violation
                # for each violated job, for each timestep (min)
                # PTV = job_priority * ptv_weight
}
  • done is a bool variable that indicates whether the episode is finished (True) or not (False). An episode terminates if all jobs are done, or if a maximum number of steps is reached.
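Putting reset() and step() together, one episode can be rolled out as sketched below. Here env is assumed to be an already constructed environment, and policy is a hypothetical callable, supplied by the participant, that maps the current observation to a job_assignment dict; only the reset()/step() signatures come from the description above.

```python
def run_episode(env, policy):
    """Roll out one episode and return the summed 'makespan' and 'PTV' rewards."""
    machine_status, job_status, time, job_list = env.reset()
    total = {'makespan': 0, 'PTV': 0}
    done = False
    while not done:
        # Ask the (hypothetical) policy which machine takes which job.
        job_assignment = policy(machine_status, job_status, time, job_list)
        machine_status, job_status, time, reward, job_list, done = env.step(job_assignment)
        for key in total:
            total[key] += reward[key]
    return total
```

Keeping the two reward components separate in the returned dict leaves the choice of how to combine them to the caller.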

Configuration File

A configuration file describes a DJSSP problem with specific machines, job types, jobs, machine breakdown events, and the importance of pending time violations. Each configuration file is a .yaml file that comprises five parts:

  • job_types: for each job type, information about all of its operations is given, including: 1) the name of the operation, 2) the required machine type, 3) the processing time, and 4) the maximal pending time;
 a:	# job type
  - op_name: A01
    machine_type: A
    process_time: 10
    max_pend_time: 10
  - op_name: A02	# other operation
 ...	# other type
  • machines: for each machine type, ids of the corresponding machines are listed;
 A:	# machine type
  - M001	# machines of this type
  - M002
  - M003
  - M004
  ...	# other machines
 ...	# other types
  • jobs: for each type, information about all corresponding jobs is given, including: 1) the arrival time, and 2) the priority of the job;
 a:		# job type
  J0001:		# job id
   arrival: 0	# arrival time
   priority: 0	# priority
  ...	# other jobs
 ... # other job types
  • breakdown: indicates the machines that will break down and the corresponding times;
 - M01:	0 # breakdown machine and breakdown time
 - ... # other machines
  • ptv: indicates the weight of pending time violation;
ptv: 10	# the weight of pending time violation

Reward Calculation

For each passing minute, the agent gets a -1 'makespan' reward. And for each minute that a job violates its maximal pending time constraint, the agent gets a penalty equal to job_priority * ptv for this job, where ptv is the pending time violation weight defined in the configuration file.

The rewards related to makespan and pending time violations are given separately. During the evaluation, we calculate the total reward by simply adding them up, but the participants can make use of the split reward signals.
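The per-minute rule above can be written out as a short function. This is a sketch of the stated formula, not the simulator's code: it assumes a violation is signalled by a 'pending' job whose remain_pending_time is below 0 (as described for job_status), and that the penalty enters the 'PTV' reward with a negative sign.

```python
def minute_reward(job_status, ptv_weight):
    """Reward for one simulated minute, split into 'makespan' and 'PTV' parts."""
    ptv_penalty = sum(
        job['priority'] * ptv_weight
        for job in job_status.values()
        # Assumption: only waiting jobs past their pending margin are penalized.
        if job['status'] == 'pending' and job['remain_pending_time'] < 0
    )
    return {'makespan': -1, 'PTV': -ptv_penalty}


# Illustrative snapshot: J01 is violating, J02 has priority 0, J03 is being processed.
job_status = {
    'J01': {'priority': 3, 'status': 'pending', 'remain_pending_time': -2},
    'J02': {'priority': 0, 'status': 'pending', 'remain_pending_time': -5},
    'J03': {'priority': 5, 'status': 'work', 'remain_pending_time': 4},
}
reward = minute_reward(job_status, ptv_weight=10)
print(reward)                 # {'makespan': -1, 'PTV': -30}
print(sum(reward.values()))   # -31, the total used in evaluation
```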

Environment Characteristics

Environment  Type      #Machines  #Jobs
-----------  --------  ---------  -----
Env01        Public    ~10        ~40
Env02        Public    ~10        ~100
Env03        Public    ~20        ~200
Env11        Feedback  ~10        ~50
Env12        Feedback  ~10        ~100
Env13        Feedback  ~20        ~150
Env14        Feedback  ~20        ~200
Env15        Feedback  ~20        ~200
Env21        Private   ~10        ~50
Env22        Private   ~20        ~100
Env23        Private   ~20        ~150
Env24        Private   ~20        ~200
Env25        Private   ~20        ~200



In this challenge, three phases are designed to let participants develop their solutions and check their correctness, and to let the organizers evaluate the final submissions. To be more specific, the three phases are:

Feedback Phase

In this phase, participants develop their solution and make improvements based on feedback. Solutions will be uploaded to the platform and participants will receive feedback on the performance evaluated by five feedback environments. Each participant is allowed to make ONE submission per day.

Check Phase

In this phase, participants will be able to check whether their submission (migrated from the previous phase) can be successfully evaluated by the five private environments. Participants will be informed if a failure caused by memory or overtime issues occurs, but no performance results will be revealed. Each participant will be given one chance to correct their submission.

Private Phase

This is the blind test phase with no submissions. In this phase, solutions will be automatically evaluated by the private environments, and the participants' final scores will be calculated based on the results in this phase.


The following table shows the numbers of environments for different phases.

public  feedback  private  total
------  --------  -------  -----
3       5         5        13

The public environments will be included in the starting kit. The feedback and private environments cannot be downloaded by participants.



Evaluation Method

In each phase, the solution will be evaluated by all the corresponding environments. Now we will demonstrate how the solution is evaluated by one specific environment.

For each environment, two stages, namely the training stage and the testing stage, are used to evaluate the solution. In each stage, five configurations will be used. All these ten configuration files are generated by a generator with fixed distribution or random process.

Participants should submit their solution in the form of a Trainer (details can be found on the Submission page).

In the training stage, the Environment class, and a list containing the paths to five training configuration files are sent to the Trainer and the latter automatically trains an Agent object (agent).

Then, in the testing stage, the agent is tested using the Environment class and five unseen testing configuration files. The average long-term return (i.e., the sum of intermediate rewards over an episode) across these five configurations is used to evaluate the agent's performance; higher is better. If the environment involves randomness, repeated rollouts are performed and the average return is used.
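The testing-stage score described above can be sketched as follows. This is only an illustration under stated assumptions: `rollout` is a hypothetical callable standing in for one episode of agent-environment interaction (the real interaction API is defined in the Environment Page and is not reproduced here), and the way repeated rollouts are combined with the per-configuration average is assumed, not taken from the platform's scoring program.

```python
from statistics import mean


def episode_return(rewards):
    """Long-term return: the sum of intermediate rewards over one episode."""
    return sum(rewards)


def average_return(rollout, test_conf_list, n_repeats=1):
    """Average the return over all testing configurations.

    `rollout(conf_path)` is a hypothetical callable that runs one episode on
    the given configuration file and returns the list of intermediate rewards.
    """
    per_config = []
    for conf_path in test_conf_list:
        # Repeat the rollout when the environment is stochastic (assumed scheme).
        returns = [episode_return(rollout(conf_path)) for _ in range(n_repeats)]
        per_config.append(mean(returns))
    return mean(per_config)
```

With `n_repeats=1` this reduces to the plain average of the five episode returns.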

The following figure illustrates the evaluation procedure on one environment.



For each phase, each solution is ranked on each environment by its average long-term return, and the average rank across all environments is used as the solution's rank. For each environment, 10 repeated runs are performed to evaluate the policy, and the total testing time budget for each environment is 1800 seconds (i.e., 1800 seconds for all 10 episodes). The average rank in the Private Phase is the final rank of a participant's solution.
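As a rough illustration of the average-rank scheme, the sketch below ranks solutions on each environment and then averages the ranks. The function names and the input layout are hypothetical, every solution is assumed to have a score on every environment, and the tie rule (tied solutions share the best rank of their group) is an assumption, since the page does not say how ties are broken.

```python
def rank_on_environment(scores):
    """Rank solutions on one environment; rank 1 is the highest average return.

    `scores` maps solution name -> average long-term return.
    Tied solutions share the best rank of their group (an assumption).
    """
    ordered = sorted(scores.values(), reverse=True)
    return {name: ordered.index(score) + 1 for name, score in scores.items()}


def average_ranks(results):
    """Average each solution's rank over all environments.

    `results` maps environment name -> {solution name: average return}.
    """
    ranks_per_solution = {}
    for scores in results.values():
        for name, rank in rank_on_environment(scores).items():
            ranks_per_solution.setdefault(name, []).append(rank)
    return {name: sum(r) / len(r) for name, r in ranks_per_solution.items()}
```

A lower average rank is better, so a solution that is consistently second can beat one that alternates between first and last.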

Computing Resource

The submission will be run on a workstation with 6 CPU cores, 56GB RAM, and 1 K80 GPU.

Running Time Constraints

For each environment, Trainer.train() is allowed to run for a given time budget (in seconds; see the Submission page for the detailed interface). If the time budget is exceeded, the solution will be assigned a low score for that environment. Hence, participants should track the running time inside Trainer.train().
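One possible way to track the running time is to compute a deadline once and check it between short units of work. The helper below is a minimal sketch: `run_within_budget`, `step`, and the `safety_margin` value are hypothetical names and choices, not part of the competition interface.

```python
from time import monotonic


def run_within_budget(step, budget_sec, safety_margin=0.1):
    """Call `step()` repeatedly until the usable part of the budget is spent.

    `step` is a hypothetical callable doing one short unit of training work.
    A fraction `safety_margin` of the budget is kept in reserve for wrapping
    up (for example, building the Agent to return).
    Returns the number of completed steps.
    """
    deadline = monotonic() + budget_sec * (1.0 - safety_margin)
    n_steps = 0
    while monotonic() < deadline:
        step()
        n_steps += 1
    return n_steps
```

Inside a trainer this could be called as `run_within_budget(self.train_one_iteration, time)`, where `train_one_iteration` is again a hypothetical method. The deadline is only checked between steps, so each step has to be short relative to the reserve, otherwise a single long step can still overrun the budget.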

Python Environment

The Python environment on the competition platform has the following packages installed (only those frequently used in ML and RL are listed).

Package                  Version
------------------------ -------------------
conda                    4.9.2
gym                      0.18.0
numpy                    1.19.5
pandas                   1.2.3
pip                      20.2.4
ray                      2.0.0.dev0
redis                    3.5.3
scipy                    1.6.2
tensorflow               2.4.1
torch                    1.8.1

To install other third-party packages, please refer to the Submission page.


Submission, Starting Kit, and Baselines

The starting kit can be downloaded from Baidu Netdisk (the extraction code is hnjk) or Google Cloud.


The solution should be a Python module, solution.py, in which a Trainer class is implemented.


The Trainer should have two main methods, __init__() and train(), as stated below:

__init__(Env, train_conf_list)

The initialization method has two inputs (besides self):

  1. Env, an environment class defined in the Environment Page;
  2. train_conf_list, a list of paths to five training configuration files.

The interface of the __init__() method is as follows.

class Trainer:
    def __init__(self, Env, train_conf_list):
        ...

In most cases, participants need to initialize Environment objects with specific configuration files inside the trainer. The initialization method is described in the Environment Page.

env = Env(config='path_to_the_configuration_file')
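For example, a trainer might create one environment per training configuration file in its constructor. This is a minimal sketch: the attribute names are hypothetical, and whether several environments can be held in memory at once, or reused across episodes, is not stated on this page.

```python
class Trainer:
    def __init__(self, Env, train_conf_list):
        # Keep the class and the paths so fresh environments can be built later.
        self.Env = Env
        self.train_conf_list = list(train_conf_list)
        # One environment per training configuration file (assumed to be
        # affordable in memory; create them lazily otherwise).
        self.envs = [Env(config=path) for path in self.train_conf_list]
```

Storing `Env` itself as well as the instances leaves the option of rebuilding an environment from its configuration file at the start of each episode.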


train() has one input, time, which is the time budget for training in seconds. train() outputs an Agent object for the given environment. The interface of Trainer.train() is as follows.

class Trainer:
    def train(self, time):
        # takes the time budget (sec) as input
        return agent  # outputs an Agent object

During the evaluation on a given environment, if the running time of train() exceeds the given time budget, the evaluation process will be terminated and the solution will be assigned a low score for the environment.


Agent is the output of Trainer.train() and it is an object that represents the learned agent(s) or policy for the given DJSSP environment.

An Agent object should have an act() method that takes machine status, job status, current time, and job list as inputs and outputs a job assignment (see Environment Page for more details).

Note that multi-agent algorithms can be employed in this challenge. Participants can integrate multiple agents into a single Agent object, as long as the interface protocol is followed. The interface of Agent.act() is as follows.

class Agent:
    def act(self, machine_status, job_status, time, job_list):
        return job_assignment  # outputs job assignment
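To illustrate how several agents might be integrated behind the single act() interface, here is a hedged sketch of a wrapper that delegates to one policy per machine. The data layout is assumed, not taken from the Environment Page: `machine_status` and the returned job assignment are treated as dictionaries keyed by machine id, and each policy is a hypothetical callable.

```python
class Agent:
    """Wraps one policy per machine behind the single act() interface."""

    def __init__(self, policies, default_policy):
        # Hypothetical layout: {machine_id: callable}; machines without an
        # entry fall back to `default_policy`.
        self.policies = policies
        self.default_policy = default_policy

    def act(self, machine_status, job_status, time, job_list):
        job_assignment = {}
        # Assumption: iterating over machine_status yields machine ids.
        for machine_id in machine_status:
            policy = self.policies.get(machine_id, self.default_policy)
            job_assignment[machine_id] = policy(
                machine_id, machine_status, job_status, time, job_list
            )
        return job_assignment
```

A shared-policy setup is the special case where `policies` is empty and every machine uses `default_policy`.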

Dependencies and Third-party packages

The challenge platform will automatically append the folder where solution.py is located to $PYTHONPATH.

If third-party packages (that are not listed in the Setup page) are used, put a requirements.txt file in the same folder as solution.py.

Starting Kit

To help participants develop their solutions, we provide a starting kit that includes:

  • the Environment class with three sets of public configuration files;
  • two baseline solutions;
  • the programs that we use on our platform to run participants' solutions and score their results.

Please find the download URL at the top of this page.

Please read README.md carefully to learn how to run local tests and upload your code.


Reinforcement Learning-based Baseline

A simple multi-agent reinforcement learning (MARL) method is included in the starting kit. Each machine is treated as an agent, and a single shared policy is learned for all machines. To improve the generalization ability of the solution, each agent's observation combines its machine status with statistical information on jobs and other machines. Simple priority rules (rules that determine which job to process first) are used as actions. Specifically, an agent can take one of two actions: "pending-time-first" (if at least one job with non-zero priority is waiting, process the job with the shortest remaining pending time, weighted by priority; otherwise, process the job with the shortest processing time, i.e., the shortest-processing-time rule) and "waiting" (do nothing and wait for further jobs). All agents share a global reward signal, which is simply the sum of the makespan- and PTV-related rewards. Parameter-sharing PPO (PS-PPO) is used as the MARL algorithm. The neural network architecture is generated automatically, with a self-adaptive input layer and all other layers fixed. For each environment, the agents are trained only on the provided training configuration files, with one of the five files sampled per episode.
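The "pending-time-first" action described above can be sketched as a small selection function. This is not the baseline's actual code: the job fields (`priority`, `remaining_pending_time`, `processing_time`) are hypothetical names, and the weighting (dividing the remaining pending time by the priority) is only one plausible reading of "weighted by the priority".

```python
def pending_time_first(waiting_jobs):
    """Pick the next job from `waiting_jobs`, or None if nothing is waiting.

    Each job is a dict with the hypothetical keys 'priority',
    'remaining_pending_time' and 'processing_time'.
    """
    if not waiting_jobs:
        return None
    prioritized = [job for job in waiting_jobs if job["priority"] > 0]
    if prioritized:
        # Assumed weighting: dividing by the priority makes higher-priority
        # jobs look closer to their pending-time limit.
        return min(
            prioritized,
            key=lambda job: job["remaining_pending_time"] / job["priority"],
        )
    # All waiting jobs have zero priority: shortest processing time first.
    return min(waiting_jobs, key=lambda job: job["processing_time"])
```

The "waiting" action then simply corresponds to assigning no job to the machine for the current decision step.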

We believe the performance and efficiency of the baseline method can be much improved if more sophisticated methods are employed and automated, for instance, better action and state representations, hyper-parameter tuning, reward shaping, better training mechanisms, and model-based approaches, just to name a few.

Rule-based Baseline

We also provide a rule-based baseline method built on the pending-time-first priority rule. Unlike the plain pending-time-first rule, when all waiting jobs have zero priority, this rule estimates whether processing the job with the shortest processing time would cause a future pending time violation. If so, the rule waits; otherwise, it follows the shortest-processing-time rule.


During the feedback phase, after submitting your code, please see the "My Submissions" tab for the running results.

A "Finished" status means the code is evaluated correctly, and the results can be found in the "Results" tab.

If the status is "Failed", a low score will be assigned to each environment, and the results can be found in the table "Results - Feedback Phase". (The table "Results - Dataset x" keeps the score of your last successful submission, but that score is not used for ranking. Hence, only the information in the "Feedback Phase" table counts!)


Feedback Phase

Start: April 21, 2021, midnight

Description: Please make submissions by clicking on the following 'Submit' button. You can then view the results of your algorithm on each dataset in the corresponding tab (Dataset 1, Dataset 2, etc.).


Label      Description  Start
---------  -----------  ------------------------
Dataset 1  None         April 21, 2021, midnight
Dataset 2  None         April 21, 2021, midnight
Dataset 3  None         April 21, 2021, midnight
Dataset 4  None         April 21, 2021, midnight
Dataset 5  None         April 21, 2021, midnight

Competition Ends

July 3, 2021, 3 a.m.
