ACCURATE TOTAL WEIGHTED TARDINESS MINIMIZATION IN TIGHT-TARDY PROGRESSIVE SINGLE MACHINE SCHEDULING WITH PREEMPTIONS BY NO IDLE PERIODS

Background. The problem of minimizing total weighted tardiness can be solved either exactly by the corresponding models or heuristically. As of October 2019, nearly the best heuristic is the one based on remaining available and processing periods. The heuristic is extremely rapid compared to the exact solution models, but its accuracy can be anywhere from 100 % to intolerably low. Objective. Given the lack of knowledge about the relationship between the heuristic and the Boolean linear programming model that provides exact solutions, the goal is to study the statistical difference between them for the preemptive single machine scheduling problem by no idle periods, in which processing periods are equal and progressively running release and due dates are set tightly. Methods. The relative gap of the heuristic is defined, and its variation against the increasing complexity of job scheduling problems is studied. The complexity is determined by the number of jobs and the number of job processing periods. The computation times of the heuristic and the exact model are registered as well. Results. The heuristic has successfully replaced the exact model in no less than 72 % of non-timeout instances, where it schedules with the same minimal total weighted tardiness (100 % accuracy). This rate is about 90 % to 97 % on average, although huge gaps may appear in the remaining cases. In the practice of fast-refreshable schedules, the Boolean linear programming model is indeed hardly tractable in scheduling no less than 14 two-parted jobs and no less than 10 three-parted jobs. Scheduling jobs divided into a greater number of parts each will have a significantly lower worst gap than scheduling jobs divided into a lesser number of parts. If a job is divisible, it is strongly recommended to divide the job into as many parts as possible. If scheduling only 2 jobs is impossible, it is strongly recommended to artificially increase the number of jobs to be scheduled.
Conclusions. Total weighted tardiness minimization in tight-tardy progressive single machine scheduling with preemptions by no idle periods can be sufficiently accurate by the heuristic if no less than 7 jobs divided into no less than five parts each are scheduled (the “7/5” pattern). An exception to this rule is that the heuristic always schedules just 2 jobs at 100 % accuracy, regardless of the number of parts into which each job is divided (the “2/any” exception). An intermediate case between the “7/5” pattern and the “2/any” exception is that scheduling 3 jobs divided into either four or five parts is sufficiently accurate as well, where the inaccuracy does not exceed 0.7 %. In other cases the heuristic is either inapplicable or there is a high risk of obtaining intolerable gaps. The inapplicability does not directly imply poor accuracy, but it implies an unpredictable accuracy drop. For example, 974 of 1000 instances of 3 two-parted jobs have been scheduled with 100 % accuracy, but 26 instances have been scheduled with an average gap of 13.31 %, which is quite intolerable and thus makes the heuristic inapplicable.


Introduction
In job scheduling on a single resource/machine, a very important problem arises when some jobs are required to be completed by their due dates. A due date is a kind of schedule expiration date, before which the job can be scheduled in any way favorable to the system. If a job is completed after its due date, an additional payment is imposed [1,2]. The payment can be expressed financially as well as by a reduction of availability. The purpose is to minimize the additional payments. More formally, this is referred to as minimization of tardiness.
The problem is related to minimization of total weighted completion time [2,3]. Whereas the latter operates over associating each job with its processing time/period, release date, and priority weight, the tardiness problem includes due dates. Thus, if priority weights differ, the general problem is to minimize total weighted tardiness. A special requirement, by which no idle time intervals are allowed [1,4,5], can be attached. Additionally, preemptions can be allowed [6].
It is known that the preemptive single machine scheduling problem of minimizing total tardiness (when jobs have no weights) with arbitrary release and due dates by equal processing periods is polynomially solvable [7]. However, struggling to compute a schedule with the exactly minimal total weighted tardiness even for a few jobs may become very resource-consuming (implying processor clock speed, memory space, and time of computations) [2,5]. As the number of jobs and the numbers of their processing periods increase, intractability of the problem dramatically grows [5,8,9].
Obviously, the minimal number of jobs is 2. Release and due dates are often given as integers. Setting priority weights to integers is always realizable. Therefore, a schedule ensuring the exactly minimal total weighted tardiness can be found with the respective integer linear programming problem. Models based on the branch-and-bound approach are commonly used for that [8,10]. They resemble those intended to find the exactly minimal total weighted completion time [8,9]. Along with models for obtaining an exact solution, there are a lot of heuristics that find an approximate solution, which often coincides with the exact one and thus also ensures the minimal total weighted tardiness [4,5,7,11]. The heuristics operate with the remaining available period [5], remaining slack [5,11], and remaining processing period [8,9]. However, the accuracy of the heuristics for minimizing total weighted tardiness is not as high as the accuracy of the rule of weighted shortest remaining processing period for total weighted completion time minimization [5,11]. For instance, the recently substantiated heuristic, in which the decisive ratio is the priority weight divided by the maximum of the remaining processing period and the remaining available period [5], may produce schedules of a few jobs whose total weighted tardiness exceeds the minimum by 50 % or more. Nevertheless, the heuristics are extremely rapid compared to the exact solution models [8].
An open question is how close the exact and approximate solutions are for definite types of the scheduling problem. The closeness is meant as the average gap, although maximal gaps are considered as well. Another open question is what computational time is taken to find an approximate schedule by the heuristic compared to the computational time of the exact model. Finally, it would be very useful to learn whether "pathological" cases exist in which the gap is too big, and thus application of the heuristic for such cases is practically impossible. Therefore, real benefits of schedule approximation by the heuristic are to be estimated along with emphasizing cases in which it is inapplicable.

Problem statement
There are a lot of types of the preemptive single machine scheduling problem with arbitrary release and due dates by equal processing periods. A special class is that in which both release and due dates are monotonically increasing. A subclass of this one is that where almost all the jobs are tardy as their due dates are tightly set after the respective release dates, although one job can always be completed without tardiness. Hence, given the lack of knowledge about the relationship between the mentioned heuristic and the Boolean linear programming model that provides exact solutions, the goal is to study the statistical difference between them for the preemptive single machine scheduling problem by no idle periods, in which processing periods are equal and progressively running release and due dates are set tightly. The tight-tardy progressive single machine scheduling with preemptions by no idle periods is one of the hardest cases, which could serve as an "upper bound" for the statistical difference. When the tightness is relaxed, the difference is expected to be less. To achieve the said goal, a computational study will be carried out in order to see the inaccuracy of the heuristic and how it varies against the increasing complexity/size of job scheduling problems. The computation times are to be compared as well. The research is expected to find a point at which a hardly tractable exact model could be "linked" to a sufficiently accurate heuristic (a "lossless transfer" from the exactness to approximation). Such a point may not exist (e. g., the exact solution is searched impracticably long, whereas the heuristic solution is still too inaccurate), but this should be shown and discussed anyway.

Exact solution by the Boolean linear programming model
First, consider an approach to find the exactly minimal total weighted tardiness. Let $N$ be the number of jobs, $N \geq 2$. Integer $r_n$ is the time moment at which job $n$ becomes available for processing. So, in the case of equal processing periods, (1) is a vector of processing periods, (2) is a vector of priority weights, (3) is a vector of release dates, and (4) is a vector of due dates. Without loss of generality, to ensure the condition of "the proper start", let vector (3) consist of a non-decreasing set of integers, where $r_1 = 1$. This is the first additional constraint on release dates (3). The second one comes from the requirement that there are no idle periods, i. e. the respective no-idle condition holds as well. Narrowing the problem to the already mentioned tight-tardy progressive single machine scheduling (with preemptions by no idle periods), the release dates are $r_n = n$ for every $n = \overline{1, N}$ (7), and the due dates are set by (8). Condition (8) is equivalent to condition (9). In other words, condition (9) implies that one job herein can always be completed without tardiness. The goal is to minimize the total weighted tardiness, i. e. to schedule the jobs so that sum (10) is minimal; every term of this sum is the priority weight of a job multiplied by the maximum of 0 and the difference between the moment of completing the job and its due date. For this, the Boolean linear programming model can be used [8,9]. So, sum (12) is the exactly minimal total weighted tardiness for those $N$ jobs. Obviously, a few optimal schedules ensuring the same minimum (28) can exist.
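The objective described above can be illustrated on tiny instances with a short sketch. It is not the Boolean linear programming model of the paper: it swaps in an exhaustive dynamic program over the remaining job parts, which reaches the same minimum but is practical only for a handful of jobs. The function name `exact_min_twt`, the one-slot-per-part time grid, and the tight due dates $d_n = r_n + H - 1$ are assumptions made for illustration, since the corresponding formulas are not recoverable from the text.

```python
from functools import lru_cache


def exact_min_twt(weights, parts, release=None, due=None):
    """Exhaustively find the minimal total weighted tardiness (tiny instances)."""
    n_jobs = len(weights)
    # Assumed tight-tardy progressive dates: r_n = n and d_n = r_n + H - 1.
    release = release or [n + 1 for n in range(n_jobs)]
    due = due or [release[n] + parts - 1 for n in range(n_jobs)]
    horizon = n_jobs * parts  # no idle periods: every slot 1..N*H is used

    @lru_cache(maxsize=None)
    def best(remaining):
        done = horizon - sum(remaining)
        if done == horizon:
            return 0
        t = done + 1  # the time slot being filled now
        result = float("inf")  # stays infinite if slot t would be idle
        for n in range(n_jobs):
            if remaining[n] > 0 and release[n] <= t:
                rest = remaining[:n] + (remaining[n] - 1,) + remaining[n + 1:]
                # A job pays for tardiness only when its last part is placed.
                cost = weights[n] * max(0, t - due[n]) if remaining[n] == 1 else 0
                result = min(result, cost + best(rest))
        return result

    return best(tuple([parts] * n_jobs))
```

Under these assumptions, `exact_min_twt([3, 5], 2)` evaluates two two-parted jobs with weights 3 and 5. The state space grows as $(H + 1)^N$, so the sketch only serves to cross-check small cases.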

A heuristic based on remaining available and processing periods
The heuristic is an online scheduling algorithm, which returns a schedule stepwise as time $t$ progresses. Let (29) be a starting vector containing the remaining processing periods. Later on, elements of vector (29) will be decreased as time $t$ progresses. Denote the set of jobs available at time $t$ by (30). For every set of available jobs, the remaining available time (31) is calculated. Paper [5] claims that the remaining slack (32) must be found also. Then a set of decisive ratios (33) is calculated. With remaining slack (32), however, it is easy to see that the ratio in (33) factually is (34), because the denominator in the central fraction of statement (34) becomes equal to the remaining processing period $q_i$ of job $i$ under the respective condition. So, a simplified ratio is considered instead of (33). The maximal ratio is achieved at subset (36). Assignment (39) executed by condition (38) for subset (36) implies that, when there are two or more maximal decisive ratios, the earliest job is preferred to be scheduled. This is especially reasonable for the tight-tardy progressive single machine scheduling, in which the earlier releasable job has an earlier due date. An approximately minimal total weighted tardiness is calculated successively for every $n = \overline{1, N}$ by finding the moment after which job $n$ is completed. Finally, using formula (10), an approximately minimal total weighted tardiness that corresponds to the heuristic schedule is obtained. This schedule often coincides with the schedule produced by exact solution (22).
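The stepwise rule can be sketched as follows. This is a minimal illustration and not the authors' implementation: the decisive ratio is taken as the priority weight divided by the maximum of the remaining processing period and the remaining available period (as described in the Introduction), while the exact form of the remaining available period, `max(0, d - t + 1)`, the tight due dates $d_n = r_n + H - 1$, and the name `heuristic_schedule` are assumptions.

```python
from fractions import Fraction


def heuristic_schedule(weights, parts, release=None, due=None):
    """Greedy online schedule; returns (schedule, total weighted tardiness)."""
    n_jobs = len(weights)
    # Assumed tight-tardy progressive dates: r_n = n and d_n = r_n + H - 1.
    release = release or [n + 1 for n in range(n_jobs)]
    due = due or [release[n] + parts - 1 for n in range(n_jobs)]
    remaining = [parts] * n_jobs
    schedule, tardiness = [], 0

    for t in range(1, n_jobs * parts + 1):
        available = [n for n in range(n_jobs)
                     if release[n] <= t and remaining[n] > 0]
        if not available:
            raise ValueError(f"idle period at time {t}")

        def ratio(n):
            # Assumed remaining available period: slots left up to the due date.
            available_period = max(0, due[n] - t + 1)
            return Fraction(weights[n], max(remaining[n], available_period))

        # The maximal ratio wins; ties go to the earliest job (smallest index).
        chosen = max(available, key=lambda n: (ratio(n), -n))
        remaining[chosen] -= 1
        schedule.append(chosen + 1)  # 1-based job numbers
        if remaining[chosen] == 0:
            tardiness += weights[chosen] * max(0, t - due[chosen])
    return schedule, tardiness
```

Exact fractions are used so that equal ratios are detected as ties without floating-point noise, which matters because the tie-breaking rule (earliest job first) is part of the heuristic.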

Computational study
In computational studies, the relative error or gap (43) of total weighted tardiness minimization in scheduling $N$ jobs is expressed in percentage terms. Inasmuch as the computation time of the heuristic is always far less than the computation time of achieving minimum (23) by (24)-(27), it is suitable to use a computation time ratio (44). Consider the following generator of the scheduling problem instances. Priority weights (2) are generated by (45), which applies the integer part function to a pseudorandom $1 \times N$ vector whose entries are drawn from the standard uniform distribution on the open interval $(0; 1)$ (e. g., see [8,12]). Release dates (3) are taken as (7), and due dates are taken as (9). At first, we try to schedule up to 12 jobs. The minimal number of job parts is 2, whereas scheduling jobs of a single part has no tardiness. A reasonable time period, within which a schedule is expected to be found, is of the order of a minute. As the size of the scheduling problem grows, the exact model (1)-(27) may fail to finish within this period, so one-minute timeouts are registered. Fig. 1 shows gap (43) averaged over 1000 instances generated for each $N = \overline{2, 12}$ when each job has only two processing periods (i. e., is divided into two identical parts). The "correct" gap, from which one-minute timeouts are excluded, is plotted on the same axes. There is no gap in scheduling only a pair of jobs. Scheduling 12 jobs is the most inaccurate on average, although the gap of 0.38 % seems to be tolerable. However, the instance with priority weights (46), whose optimal schedule is (47), is scheduled by the heuristic as (48), whose total weighted tardiness is 282 (the second part of job 1 and the two parts of job 2 appear to have been just interchanged). The overall number of such bad gap instances is not as small as it would have seemed with the top average gap of 0.38 %: there are 72 instances (out of the grand total of 11000) when the gap is no less than 10 %, and there are 203 instances when the gap is no less than 5 % (which still can be intolerable).
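The statistics used in this study can be sketched in a few lines. The sketch assumes that gap (43) is the relative excess of the heuristic tardiness over the exact minimum in percent, and that the "correct" gap is the mean over instances on which the exact model did not time out. The record layout `(approx, exact, timed_out)` and all function names are hypothetical, and the sample data are made up for illustration.

```python
def relative_gap(approx, exact):
    """Relative gap in percent (assumed form of the gap definition)."""
    return 100.0 * (approx - exact) / exact


def correct_gap(records):
    """Mean gap over non-timeout records; None if every record timed out."""
    gaps = [relative_gap(a, e) for a, e, timed_out in records if not timed_out]
    return sum(gaps) / len(gaps) if gaps else None


def count_gaps_at_least(records, threshold):
    """Number of non-timeout records whose gap is no less than the threshold."""
    return sum(1 for a, e, timed_out in records
               if not timed_out and relative_gap(a, e) >= threshold)


# Made-up records: (heuristic tardiness, exact tardiness, exact model timed out).
records = [(110, 100, False), (100, 100, False), (90, 60, True)]
print(correct_gap(records))              # mean over the two non-timeout records
print(count_gaps_at_least(records, 10))  # instances with a gap of 10 % or more
```

Counting instances above a threshold alongside the mean is what exposes the bad cases: a low average gap can coexist with a noticeable number of instances whose gap is 5 % or 10 % and more.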
There are no one-minute-timeout cases when scheduling fewer than 10 jobs. But starting from 10 jobs, the difference between gap (43) and the "correct" gap for two-parted jobs becomes more apparent. The "incorrect" gap in Fig. 2 is shown just for this interval of the number of jobs. It is expectedly decreasing as the number of one-minute-timeout cases for the interval is clearly increasing (see Fig. 3, where the number of "successful" timeouts is shown as well). Obviously, 29 % of timeouts in scheduling 12 jobs mean that the exact model is too slow for this case. Meanwhile, the respective computation time ratio resembles an exponential increase (Fig. 4).
In scheduling 11 two-parted jobs, 88 one-minute-timeout cases came out of 1000. Therefore, we study scheduling three-parted jobs only for up to 10 jobs. Fig. 5 shows the gap for three-parted jobs, which drops starting from 8 jobs because of timeouts (averaging over 1000 instances becomes incorrect from this point on). The "correct" gap and the "incorrect" gap for three-parted jobs are both shown on the same plot (Fig. 6). Note that the "correct" gap for 10 three-parted jobs is about 1.6 % (which is far greater than the "correct" gap of 0.38 % for 12 jobs in Fig. 1), but it is averaged over just 86 cases (see Fig. 7) as against those 910 cases without timeouts for 12 jobs (Fig. 3). This is why the respective computation time ratio herein does not resemble an exponential increase (Fig. 8): the scheduling problem has grown and the heuristic takes more time to find a solution, whereas the exact model, in cases without timeouts, still takes no more than 60 seconds. Not considering scheduling of 11 and 12 jobs, the ratio for three-parted jobs is apparently greater than that for two-parted jobs. Another noticeable fact is that the worst gap here is 18.05 %, and it is produced by an instance that behaves similarly to the instance with priority weights (46) and schedules (47) and (48) (once again, the last two parts of job 1 and the three parts of job 2 appear to have been just interchanged). The overall number of bad gap instances is now smaller: there are only 23 instances (out of the grand total of 9000) when the gap is no less than 10 %, and there are 54 instances when the gap is no less than 5 %.
In scheduling 9 three-parted jobs, 612 one-minute-timeout cases came out of 1000. Therefore, we study scheduling four-parted jobs only for up to 8 jobs. Fig. 9 shows the gap for four-parted jobs, which drops starting from 7 jobs because of timeouts. The "correct" gap and the "incorrect" gap for four-parted jobs, both shown on the same plot (Fig. 10), resemble those in Fig. 6. However, a plotted value at 8 four-parted jobs is about 119.65 times that at 7 four-parted jobs, which is pretty weird in itself. Indeed, Fig. 11 showing the number of one-minute-timeout cases does not contain a huge jump. As in the previous case, the respective computation time ratio herein does not resemble an exponential increase (Fig. 12), having a drop at $N = 8$. Meanwhile, the worst gap here, 9.28 %, is caught in scheduling 6 jobs. The overall number of bad gap instances is small: there are only 3 instances (out of the grand total of 7000) when the gap is no less than 5 %, and there are 50 instances when the gap is no less than 1 % (which can be interpreted as a tolerable value). Furthermore, only 89 instances have been scheduled with a gap of no less than 0.1 %.
Consequently, we continue to study scheduling for an increased number of job parts. Fig. 13 shows the gap for five-parted jobs, which drops starting from 6 jobs because of timeouts. Once again, the "correct" gap and the "incorrect" gap for five-parted jobs, both shown on the same plot (Fig. 14), resemble those in the previous cases (for three- and four-parted jobs). However, now there is an unexpected drop at 7 jobs, whereas all 1000 instances at 8 jobs are timeouts (Fig. 15). The respective computation time ratio (Fig. 16) does not resemble that in Fig. 12. Meanwhile, the worst gap here drops to 7.12 %, caught in scheduling 6 jobs once again. The overall number of bad gap instances is small, similarly to the previous case: there are only 4 instances when the gap is no less than 5 %, and there are 32 instances when the gap is no less than 1 %. Furthermore, only 64 instances have been scheduled with a gap of no less than 0.1 % (which is a quite tolerable value). As is easily seen, the worst scheduling cases, when the gap reaches practically intolerable values, cannot be excluded or prevented. In such cases, the heuristic by (29)-(39) does not help and the exact model remains the only means to find a schedule.
Knowing the worst gap values is thus fundamentally necessary. The maximal relative gap is shown for the studied schedulings in Fig. 17, with a value of 1054 reported for 12 two-parted jobs. It is clearly seen that a greater number of processing periods lessens the worst gap values. The top worst maximal relative gap, at 5 two-parted jobs, is obtained by the instance with (46), whose optimal schedule (47) with the minimal total weighted tardiness of 210 is "torn" into an approximate schedule (48) with the total weighted tardiness of 282. Fig. 17 also suggests that the top worst gaps are concentrated over scheduling 3 to 6 jobs.
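As a worked check of the worst case quoted above, and assuming that the relative gap is the excess of the heuristic total weighted tardiness over the exact minimum, expressed in percent of that minimum, the instance with (46) gives:

```latex
\[
100 \cdot \frac{282 - 210}{210} = 100 \cdot \frac{72}{210} \approx 34.29\ \%.
\]
```

The form of the gap is an assumption here, since the defining formula is not recoverable from the text; under it, the heuristic schedule of this instance costs roughly a third more than the optimal one.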
It is clear that the study of the predominant one-minute-timeout cases should be supplemented. For this, let the timeout threshold be extended to 10 minutes, and consider the relative gap from which 10-minute timeouts (600 seconds) are excluded. Unlike the initial study, for the supplementary study we take 200 instances (instead of 1000). This is forced by two factors. Firstly, the increased span for solving (from 1 minute to 10 minutes), considering the number of jobs at which timeouts are very likely, may require a far longer computation time than that for the initial study. Secondly, the amount of 1000 instances is somewhat overstated for reliable averaging and therefore can be reduced. Fig. 18 shows the "correct" gap averaged over only those cases from which 10-minute timeouts are excluded. This can be imagined as a natural extension of the "correct" gap in Fig. 1. Obviously, the peak at 15 jobs is accidental. The respective gap (43) and the "incorrect" gap are shown in Fig. 19, where we see that the 10-minute timeouts start here at 12 jobs. The number of 10-minute-timeout cases (Fig. 20) is immensely increasing, and only 34 instances of scheduling 16 two-parted jobs have been solved in no more than 600 seconds. The respective computation time ratio (Fig. 21) starts roughly resembling an S-shaped curve. As is clearly seen, in scheduling 16 two-parted jobs, the heuristic by (29)-(39) is about one million times (!) faster than the Boolean linear programming model by (1)-(27).
In scheduling three-parted jobs, the "correct" gap averaged over only those cases from which 10-minute timeouts are excluded has a decreasing trend (Fig. 22). Now at 10 jobs we can see only an accidental peak, rather than that huge jump up to about 1.6 % in Fig. 6. Consequently, that value of the "correct" gap for 10 three-parted jobs in Fig. 6 is a computational artifact, although it has been revealed under the same generator of the scheduling problem instances, where priority weights (2) are pseudorandomly generated by (45). The respective gap (43) and the "incorrect" gap are shown in Fig. 23, where we see that the 10-minute timeouts start at 9 three-parted jobs. The number of 10-minute-timeout cases (Fig. 24) is increasing even more immensely, and now only 12 instances of scheduling 16 three-parted jobs have been solved in no more than 600 seconds. This is why the respective computation time ratio (Fig. 25) has been "contorted" again (similarly to Figs. 8, 12, 16). Nevertheless, in scheduling 12 three-parted jobs, the heuristic is about half a million times faster than the Boolean linear programming model. This ratio in scheduling 12 two-parted jobs (Fig. 21) is about 2.5 times less.
The maximal relative gap for the supplementary study (Figs. 18-25) is shown in Fig. 26. These two polylines confirm the previously inferred suspicion that the worst gap values drop as the number of processing periods increases. Indeed, the huge value of the maximal relative gap for 15 two-parted jobs is a computational artifact issuing from a single instance of priority weights. Such huge-gap computational artifacts are not frequent enough to be considered statistically regular.
In the end, it is very important to learn the ratio of non-timeout instances, in which the heuristic gives the minimal total weighted tardiness, to the total number of non-timeout instances (i. e., a fraction or percentage of cases when the exact model is factually needless). Fig. 27, in which the supplementary study results are drawn with a thicker line, and a bigger square marker corresponds to a greater number of processing periods, shows that the heuristic has successfully replaced the exact model in no less than 72 % of non-timeout instances. As the number of jobs increases, the fluctuations in those polylines become more severe, which is explained by the decreasing number of non-timeout instances causing less statistical reliability of the ratio. In scheduling just 2 jobs, whichever way they are parted, the heuristic always gives the minimal total weighted tardiness.
Any further extensions and supplements of the computational study are worthless as they would require raising the timeout threshold up to half an hour or even a few hours, and that leads to what we call "intractability" of the exact model. For example, the schedules of departures and arrivals of an international airport are frequently corrected (e. g., by reason of specific meteorological conditions, delays, flight cancellations, etc.), and thus spending even a few minutes on recalculating an optimal schedule may be critical. This is not a direct task of the air traffic controller. Special software should track the optimality of the schedules in a fast-refreshable manner. Otherwise the profitability of the airline will drop, the rate of flights will be reduced, and subsequently a lot of the concerned passengers will suffer losses.

Discussion
Turning back to the expected research result, where is the point of the "lossless transfer" from the exactness to approximation? Could the hardly tractable Boolean linear programming model by (1)-(27) eventually be "linked" to the sufficiently accurate heuristic by (29)-(39)? Based on Figs. 1-3, 5-7, 9-11, 13-15, 17-20, 22-24, and 26, the answer to the first question is negative: there is no "lossless transfer" point. On average, the heuristic produces schedules with the same minimal total weighted tardiness as the exact model does at a pretty high rate (see Fig. 27). However, "pathological" cases like those with priority weights (46), (49), (52), (55), whose corresponding optimal schedules (47), (50), (53), (56) do not differ that much from the respective heuristic schedules (48), (51), (54), (57), do exist. Unfortunately, they cannot be predicted or systemized. This is so because the way in which the heuristic "tears" the optimal schedule is unclear so far. For example, the heuristic "tears" the optimal schedule of one such instance, and a similar seeming "tear"-in-schedule example is that with priority weights (55), for which heuristic schedule (57) is not optimal.
The Boolean linear programming model by (1)-(27) is indeed hardly tractable in scheduling no less than 14 two-parted jobs (see Fig. 20) and no less than 10 three-parted jobs (see Fig. 24). Whereas the top worst maximal gap for three-parted jobs is not greater than 6 % (for no less than 10 three-parted jobs), it may be beyond 10 % for two-parted jobs (see those peaks in Fig. 26). Therefore, scheduling three-parted jobs by the heuristic herein is preferable to scheduling two-parted jobs. Moreover, based on Fig. 17, the following can be generalized for the heuristic:
1. Scheduling jobs divided into a greater number of parts each will have a significantly lower worst gap than scheduling jobs divided into a lesser number of parts.
2. If a job is divisible, it is strongly recommended to divide the job into as great number of its parts as possible (thus, allowing more preemptions, job shifts, job "tears", etc.).
3. If scheduling only 2 jobs is impossible, it is strongly recommended to artificially increase the number of jobs to be scheduled (along with superdividing each job).
Hence, the listed three items make a "link" to the sufficiently accurate heuristic by (29)-(39) plausible. Besides, the heuristic is inapplicable to scheduling a small number of jobs (up to 12) divided into only two parts each. The high risk of obtaining a huge heuristic gap exists for three-parted jobs as well. On the other hand, scheduling either 3 or no less than 7 jobs divided into five parts each ensures the most accurate heuristic schedules (very close to the minimal total weighted tardiness). Specifically, scheduling 3 five-parted jobs has had no gaps through 1000 instances (see Fig. 14).

Conclusions
The heuristic based on remaining available and processing periods is an extremely rapid and simple technique of scheduling with minimizing total weighted tardiness: it has taken on average between 0.13 and 1.2 milliseconds to complete a schedule for the generated instances. In the worst cases, the heuristic's computation time has varied between 1.1 and 44 milliseconds. It is obvious that the computation time (both for the heuristic and the Boolean linear programming model) stretches out as the job scheduling problem complexity/size increases. Even with this stretch, the computation time ratio is still gigantic: it can reach beyond $10^6$ in scheduling multiple jobs having two processing periods each, although it drops down to $10^4$ when the number of processing periods (or job parts) increases. Meanwhile, the risk of obtaining a huge heuristic gap is higher at the computation time ratio maxima.
Total weighted tardiness minimization in tight-tardy progressive single machine scheduling with preemptions by no idle periods can be sufficiently accurate by the heuristic if no less than 7 jobs divided into no less than five parts each are scheduled. This is the main conclusion (let it be called the "7/5" pattern). An exception to this rule is that the heuristic always schedules just 2 jobs at 100 % accuracy, regardless of the number of parts into which each job is divided (let it be called the "2/any" exception). An intermediate case between the "7/5" pattern and the "2/any" exception is that scheduling 3 jobs divided into either four or five parts is sufficiently accurate as well, where the inaccuracy does not exceed 0.7 %. These three cases (the "7/5" pattern, the "2/any" exception, and the intermediate case) constitute a domain where the heuristic is fully applicable and should entirely replace the exact approach. In other cases (which can be thought of as the complement of the domain) the heuristic is either inapplicable or there is a high risk of obtaining intolerable gaps. In particular, the complement includes any number of jobs greater than 2 divided into either two or three parts each (wherein the heuristic is inapplicable). The adjacent cases (like 3 three-parted jobs or 6 five-parted jobs) carry a risk of a 3.5 % to 7 % gap. Nevertheless, the inapplicability does not mean that the heuristic schedules, e. g., 3 two-parted jobs with poor accuracy as a rule, because 974 of 1000 instances have been scheduled with the minimal total weighted tardiness (i. e., with 100 % accuracy); however, 26 instances have been unpredictably scheduled with an average gap of 13.31 %, which is quite intolerable and thus makes the heuristic inapplicable.
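The figures for 3 two-parted jobs show why an average alone is misleading. Assuming the 974 exactly scheduled instances contribute a zero gap each and 13.31 % is the mean over the remaining 26 instances, the gap averaged over all 1000 instances is

```latex
\[
\frac{974 \cdot 0 + 26 \cdot 13.31}{1000} = \frac{346.06}{1000} \approx 0.35\ \%.
\]
```

So an overall average of about 0.35 % looks tolerable, while 2.6 % of the instances are scheduled with a gap whose mean is 13.31 %.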
This research has confirmed that, although the computation time gain may become significantly smaller, it is better to schedule big-sized job problems. As the size grows, the inaccuracy drops, and the drop is expected to be even deeper for tardy-relaxed problems because the studied case serves as the "upper bound". Whether this needs correction when the processing periods are unequal is a question requiring further research.