Performance bounds and suboptimal policies for multi-class queue
In this paper, we consider a general class of queueing systems with multiple job types and a flexible service facility. We use a stochastic control policy to determine the performance loss in a multi-class M/M/1 queue. The underlying system is a Markov decision process (MDP). We show how to compute performance bounds for the stochastic control policy of an MDP with an average-cost criterion. In practice, heuristic control policies are often used because mathematically optimal policies can be hard to compute and to run. We derive bounds on the performance of an arbitrary policy relative to an optimal one; the goal is to quantify the gap between optimality and a specific policy. In particular, this study shows that the average queue length under any non-idling policy is within a factor, determined by the service rates, of the optimal value.
Keywords: queueing system, multiple job classes, stochastic control policy.
DOI: 10.14529/mmp190104
We are interested in using the average cost per period (ACPP) in a multi-class M/M/1 queue to determine the performance loss associated with using a given control policy. We consider non-idling policies, i.e. policies that always serve jobs as long as the queue is nonempty. The problem considered in this paper is the control of a single-server queue with multiple job classes. It was shown in [1, 2] that the optimal policy for this problem is known, but implementing that optimal policy requires exact information about the service rate of each class. Note, however, that it is difficult to analyse the performance of some policies, such as FIFO (see [1]).
Indeed, it is known (see, for example, [3, 4]) that the $c\mu$-rule is the optimal control in two main settings: (i) generally distributed service requirements among all non-preemptive disciplines, and (ii) exponentially distributed service requirements among all preemptive disciplines; in the preemptive case the $c\mu$-rule is optimal only if the service times are exponential. The queueing system that we work on is the discrete-time case; the continuous-time case and its optimal control policy, the $c\mu$-rule, have been studied before. In the discrete-time case, optimality of the $c\mu$-rule was established in [4, 5]. We recall that the $c\mu$-rule is the discipline that gives strict priority in descending order of $c_k\mu_k$, where $c_k$ and $\mu_k$ denote the cost and the inverse of the mean service requirement, respectively, of class $k$.
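To make the rule concrete, here is a minimal sketch (our own illustration; the class labels, costs, and rates are hypothetical, not taken from the paper):

```python
# The c-mu rule: give strict priority in descending order of c_k * mu_k.
# Hypothetical per-class holding costs c_k and service rates mu_k.
costs = {"A": 3.0, "B": 1.0, "C": 2.0}
rates = {"A": 0.5, "B": 2.0, "C": 0.8}   # mu_k = 1 / mean service requirement

priority = sorted(costs, key=lambda k: costs[k] * rates[k], reverse=True)
print(priority)  # ['B', 'C', 'A']: c*mu = 2.0, 1.6, 1.5
```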
The problem we consider is a Markov decision process with an average-cost-per-period objective. The main contribution of this paper is a method for determining the performance loss incurred by not using an optimal control. We present a systematic approach for evaluating the difference between the optimal cost and the cost of a specific policy, and we use it to relate the cost of a specific sub-optimal policy to that of an optimal policy. We believe that our results open an interesting topic for further research. For instance, well-known optimality results in a single-class queue, such as the optimality of the shortest-service-time discipline or the optimality of FCFS, can be derived as corollaries. To gain insight into the structure of the optimal policy in the multi-class case, we consider several relevant cases where the service time distributions are exponential.
Finding an optimal control policy for a Markov decision process with an infinite (or even very large) state space is a prominent problem, since computing such a policy is often intractable [3]. Even when the general form of the optimal control policy is known, running it is difficult: the process evolves in discrete time slots, and in each time slot a large computational effort is needed to evaluate the costs of the control actions [6]. These are the reasons why sub-optimal heuristic control policies are used in practice.
In the present work, using the general methodology discussed above, we derive a sub-optimality factor for the queue that contains only the service rates and holds for any considered policy. The obtained bound establishes that the control policy cannot affect the queue length much when the service rates are approximately equal. Our contribution is a bound on the ACPP for the problem of controlling multi-class queues. Many authors have worked on bounds for Markov chain problems and on the optimal control and analysis of multi-class queues.
For the average cost of Markov decision processes over finite-state Markov chains, as in [7], bounds are used to establish convergence of an iteration algorithm. For the general state spaces considered here, the obtained bounds are related to Lyapunov theorems for Markov chains; a similar upper bound can be found in [5]. For systems with positive unbounded costs, the standard Lyapunov theorems yield upper bounds. Our bounds can be viewed as a generalization of both the Lyapunov bounds and the finite-state bounds, with the feature that even with unbounded costs they provide both an upper and a lower bound.
1. Problem Definition
The model we consider is a multi-class queueing system in which each job belongs to one of several distinct classes. If the system deals with a single job class, the order in which jobs are served is unimportant and does not affect quantities such as the average queue length: simple control policies (such as first-come, first-served (FCFS)) and more complex ones give the same results. When classes differ in their service times, however, the order of service does affect such quantities. Many policies that minimize the average queue length in a finite multi-class M/M/1 queueing system can be found in [3]. A multi-class queue differs from a single-class queue in that, for instance, (1) jobs of some classes arrive more frequently than those of others, and (2) the service times of some classes are longer than those of others.
In the optimal policy for the considered model, classes are prioritized according to their average service times: job classes with shorter average service times have higher priority than job classes with longer ones. We also allow preemption in service: we temporarily stop serving a low-priority job when a job of higher priority arrives. This means that we distinguish between classes, need statistical information about the service of all classes, and may preempt jobs in service if needed. As for sub-optimal control policies, consider the set of policies under which the server serves jobs whenever the queue is nonempty, without idling. We call these non-idling policies. We show that there is a factor, containing only the service rates, that bounds the queue lengths. When the service rates of all classes are approximately equal, the obtained bound quantifies the intuition that the control policy affects the queue length only slightly.
1.1. Notation
Consider a discrete-time Markov decision process given by a general state space $Y$, measurable with respect to a given $\sigma$-field $\mathcal{B}(Y)$, a finite set $A$ of actions available at each time slot, and a measurable cost function $c : Y \times A \to \mathbb{R}$. At state $y \in Y$, action $a \in A$ incurs the specified cost.
The state evolves according to a stochastic kernel $p : \mathcal{B}(Y) \times Y \times A \to [0,1]$, defined by
$$p(B, y, a) = \Pr\big[Y_{t+1} \in B \mid Y_t = y,\ A_t = a\big], \qquad \forall\, y \in Y,\ a \in A,\ B \in \mathcal{B}(Y).$$
In addition, $p(B, \cdot, \cdot) : Y \times A \to [0,1]$ is a measurable function for each $B \in \mathcal{B}(Y)$. We consider the performance of the system subject to static state-feedback policies: a policy is a measurable function $\rho : Y \to A$ that depends on the running system state and chooses the action in each time slot. We define the set of all measurable policies as $Q = \{\rho : Y \to A \mid \rho \text{ is measurable}\}$.
Under a policy $\rho \in Q$, the state evolution is a time-homogeneous Markov chain $(Y_0, Y_1, \ldots)$ whose transition probabilities are specified by the stochastic kernel $p(B, y, \rho(y))$.
For convenience of notation we define
$$\mathbb{E}C = \frac{1}{t}\sum_{k=0}^{t-1}\mathbb{E}\big[c(Y_k, \rho(Y_k)) \mid Y_0 = y\big], \qquad \mathbb{E}c = \frac{1}{t}\sum_{k=0}^{t-1}\mathbb{E}\big[c(Y_k) \mid Y_0 = y\big],$$
and we write $\mathbb{E}g$ for the expectation of $g(Y_{t+1})$ conditioned on $Y_t = y$. We consider the following performance function for a system under a specific policy $\rho \in Q$, in terms of the ACPP:
$$J(\rho) = \limsup_{t\to\infty} \mathbb{E}C,$$
and we define
$$J_{\mathrm{opt}} = \inf_{\rho \in Q}\Big\{\liminf_{t\to\infty} \mathbb{E}C\Big\}.$$
Our goal is to find, and to develop tools for determining, policies whose ACPP $J(\rho)$ is close to $J_{\mathrm{opt}}$.
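As a sanity check on these definitions, the ACPP of a given static policy can be estimated by Monte Carlo simulation. The sketch below is our own illustration; the callables `sample_next`, `policy`, and `cost` are assumptions of the sketch, not notation from the paper.

```python
import random

def estimate_acpp(sample_next, policy, cost, y0, horizon=100_000, seed=0):
    """Estimate (1/t) * sum_{k<t} E[c(Y_k, rho(Y_k)) | Y_0 = y0] along one long path.

    sample_next(y, a, rng): draws Y_{t+1} from the kernel p(., y, a)  (assumed helper)
    policy(y):              static state-feedback policy rho          (assumed helper)
    cost(y, a):             per-slot cost c(y, a)                     (assumed helper)
    """
    rng = random.Random(seed)
    y, total = y0, 0.0
    for _ in range(horizon):
        a = policy(y)
        total += cost(y, a)
        y = sample_next(y, a, rng)
    return total / horizon
```

For an ergodic chain this long-run average converges to $J(\rho)$, which is what the bounds of the following sections bracket.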
2. Bounds for a Markov Chain Without Control
Y. Wang and S. Boyd [8] showed how to compute performance bounds for finite-horizon stochastic control problems with linear system dynamics and arbitrary constraints, objective, and noise distribution, including asymmetric costs and constraint sets. Here we present a methodology for determining bounds on the ACPP. Before presenting a lower bound on the ACPP, we first find bounds for a Markov chain without control (or with a given state-feedback control); we then develop this approach to obtain a lower bound on the ACPP for any policy, and finally we bound the difference between optimality and a given policy using an upper bound on the cost of the given policy and a lower bound on the cost of any policy, respectively. We begin with the following theorem.
Theorem 1. Suppose $g : Y \to \mathbb{R}$ is a measurable function. Define $C(y) = c(y) + \mathbb{E}g - g(y)$ and
$$\alpha_u = \sup_{y \in Y}\{C(y)\}, \qquad \alpha_l = \inf_{y \in Y}\{C(y)\}.$$
If there exists an $\varepsilon > 0$ such that
$$\sup_{y \in Y}\Big\{\mathbb{E}\big[|g(Y_{t+1})|^{1+\varepsilon} \mid Y_t = y\big] - |g(y)|^{1+\varepsilon}\Big\} < \infty, \tag{1}$$
then for all $y \in Y$,
$$\alpha_u \ge \limsup_{t\to\infty} \mathbb{E}c, \qquad \alpha_l \le \liminf_{t\to\infty} \mathbb{E}c.$$
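As a small numeric illustration of how Theorem 1 is used (our own example, not from the paper): for a single-class discrete-time queue with Bernoulli arrivals of rate $\lambda$ and service completions of rate $\theta$, cost $c(y) = y$, and quadratic test function $g(y) = y^2/(2(\theta - \lambda))$, the drift $C(y)$ is constant for $y \ge 1$, so $\alpha_u$ and $\alpha_l$ can be computed directly and bracket the simulated average queue length:

```python
import random

lam, theta = 0.3, 0.6            # hypothetical rates with lam < theta
K = 1.0 / (2 * (theta - lam))    # chosen so the linear-in-y term of C(y) cancels

def drift_C(y):
    """C(y) = c(y) + E[g(Y_{t+1}) | Y_t = y] - g(y) with g(y) = K*y^2."""
    if y == 0:
        return K * lam                           # Y' = W, so E[g(Y')] = K * E[W^2] = K*lam
    mean = lam - theta                           # E[Y' - y] for y >= 1
    second = lam + theta - 2 * lam * theta       # E[(Y' - y)^2] for y >= 1
    return y + K * (2 * y * mean + second)

vals = [drift_C(y) for y in range(200)]
alpha_u, alpha_l = max(vals), min(vals)          # 0.9 and 0.5 for these rates

# Simulate the chain; the long-run average queue length lands between the bounds.
rng, y, total, T = random.Random(1), 0, 0, 10**6
for _ in range(T):
    total += y
    y += (rng.random() < lam) - (y > 0 and rng.random() < theta)
print(alpha_l, total / T, alpha_u)               # approx. 0.5 <= 0.7 <= 0.9
```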
This theorem is the main result of this paper; we use it to determine upper and lower bounds on the average cost incurred by Markov chains with general measurable state spaces. Before proving Theorem 1, we establish the existence of all the expectations that we need.
Lemma 1. Suppose $g : Y \to \mathbb{R}$ is a measurable function such that
$$\sup_{y \in Y} C(y) < \infty, \qquad \inf_{y \in Y} C(y) > -\infty,$$
and suppose there exists an $\varepsilon > 0$ such that
$$\sup_{y \in Y}\Big\{\mathbb{E}\big[|g(Y_{t+1})|^{1+\varepsilon} \mid Y_t = y\big] - |g(y)|^{1+\varepsilon}\Big\} < \infty. \tag{1}$$
Then for all $y \in Y$,
$$\mathbb{E}\big[|g(Y_{t+1})|^{1+\varepsilon} \,\big|\, Y_0 = y\big] < \infty, \tag{2}$$
$$\bigg|\mathbb{E}\Big[\sum_{k=0}^{t-1} c(Y_k) \,\Big|\, Y_0 = y\Big]\bigg| < \infty. \tag{3}$$
Proof. Suppose the supremum in (1) equals $M < \infty$. An immediate consequence is
$$\mathbb{E}\big[|g(Y_{t+1})|^{1+\varepsilon} \mid Y_t = y\big] \le M + |g(y)|^{1+\varepsilon}, \qquad \forall\, y \in Y.$$
Clearly $\mathbb{E}[|g(Y_0)|^{1+\varepsilon} \mid Y_0 = y] < \infty$. By induction, if $\mathbb{E}[|g(Y_t)|^{1+\varepsilon} \mid Y_0 = y] < \infty$ for some $t$, then
$$\mathbb{E}\big[|g(Y_{t+1})|^{1+\varepsilon} \mid Y_0 = y\big] \le M + \mathbb{E}\big[|g(Y_t)|^{1+\varepsilon} \mid Y_0 = y\big] < \infty,$$
and as a result
$$\mathbb{E}\big[|g(Y_t)|^{1+\varepsilon} \mid Y_0 = y\big] < \infty \quad \text{for all } t.$$
From the fact that $g(y) \le 1 + |g(y)|^{1+\varepsilon}$,
$$\mathbb{E}[g(Y_t) \mid Y_0 = y] \le 1 + \mathbb{E}\big[|g(Y_t)|^{1+\varepsilon} \mid Y_0 = y\big] < \infty, \qquad \forall\, y \in Y,$$
and since $-g(y) \le 1 + |g(y)|^{1+\varepsilon}$,
$$-\infty < -1 - \mathbb{E}\big[|g(Y_t)|^{1+\varepsilon} \mid Y_0 = y\big] \le \mathbb{E}[g(Y_t) \mid Y_0 = y], \qquad \forall\, y \in Y.$$
Now if
$$\sup_{y \in Y} C(y) \le \alpha_u < \infty,$$
then
$$c(y) \le \alpha_u + g(y) - \mathbb{E}[g(Y_{t+1}) \mid Y_t = y], \qquad \forall\, y \in Y,$$
where $|\mathbb{E}[g(Y_{t+1}) \mid Y_t = y]| = |\mathbb{E}[g(Y_1) \mid Y_0 = y]| < \infty$. Accordingly,
$$\mathbb{E}[c(Y_t) \mid Y_0 = y] < \infty, \qquad \forall\, t,\ \forall\, y \in Y. \tag{4}$$
Similarly, if
$$\inf_{y \in Y} C(y) \ge \alpha_l > -\infty,$$
then
$$c(y) \ge \alpha_l + g(y) - \mathbb{E}[g(Y_{t+1}) \mid Y_t = y], \qquad \forall\, y \in Y,$$
and accordingly
$$-\infty < \mathbb{E}[c(Y_t) \mid Y_0 = y], \qquad \forall\, t,\ \forall\, y \in Y. \tag{5}$$
Now it is clear that (3) follows by combining (4) and (5). $\square$
Now we can prove Theorem 1.

Proof of Theorem 1. By the definition of $\alpha_u$ we have
$$\alpha_u \ge \frac{1}{t}\sum_{k=0}^{t-1}\mathbb{E}\big[C(Y_k) \mid Y_0 = y\big] = \frac{1}{t}\sum_{k=0}^{t-1}\mathbb{E}\big[c(Y_k) \mid Y_0 = y\big] + \frac{1}{t}\Big(\mathbb{E}\big[g(Y_t) \mid Y_0 = y\big] - g(y)\Big),$$
which means
$$\frac{1}{t}\sum_{k=0}^{t-1}\mathbb{E}\big[c(Y_k) \mid Y_0 = y\big] \le \alpha_u + \frac{1}{t}\Big(g(y) - \mathbb{E}\big[g(Y_t) \mid Y_0 = y\big]\Big).$$
In the same way,
$$\frac{1}{t}\sum_{k=0}^{t-1}\mathbb{E}\big[c(Y_k) \mid Y_0 = y\big] \ge \alpha_l + \frac{1}{t}\Big(g(y) - \mathbb{E}\big[g(Y_t) \mid Y_0 = y\big]\Big).$$
Thus, if we show that $\lim_{t\to\infty} \frac{1}{t}\,\mathbb{E}[g(Y_t) \mid Y_0 = y] = 0$ for all $y \in Y$, the proof is complete.

Suppose the supremum of $\mathbb{E}[|g(Y_{t+1})|^{1+\varepsilon} \mid Y_t = y] - |g(y)|^{1+\varepsilon}$ over all $y \in Y$ equals $M$, where $M < \infty$. Then, telescoping,
$$M \ge \frac{1}{t}\sum_{k=0}^{t-1}\Big(\mathbb{E}\big[|g(Y_{k+1})|^{1+\varepsilon} \mid Y_0 = y\big] - \mathbb{E}\big[|g(Y_k)|^{1+\varepsilon} \mid Y_0 = y\big]\Big) = \frac{1}{t}\Big(\mathbb{E}\big[|g(Y_t)|^{1+\varepsilon} \mid Y_0 = y\big] - |g(y)|^{1+\varepsilon}\Big). \tag{6}$$
We also know that
$$\big(|g(y)|^{1+\varepsilon} + tM\big)^{\frac{1}{1+\varepsilon}} \le |g(y)| + (tM)^{\frac{1}{1+\varepsilon}}, \tag{7}$$
therefore (6) and (7), together with Jensen's inequality, imply that
$$\mathbb{E}\big[|g(Y_t)| \mid Y_0 = y\big] \le |g(y)| + (tM)^{\frac{1}{1+\varepsilon}}. \tag{8}$$
Taking the limit superior as $t \to \infty$ on both sides of (8), we have
$$\limsup_{t\to\infty} \frac{1}{t}\,\mathbb{E}\big[|g(Y_t)| \mid Y_0 = y\big] \le \lim_{t\to\infty} \frac{1}{t}\Big(|g(y)| + (tM)^{\frac{1}{1+\varepsilon}}\Big) = 0,$$
and so
$$\lim_{t\to\infty} \frac{1}{t}\,\mathbb{E}\big[g(Y_t) \mid Y_0 = y\big] = 0, \qquad \forall\, y \in Y. \qquad \square$$
3. Bounds with Control
We used Theorem 1 to provide bounds on the ACPP incurred by Markov chains with general measurable state spaces. In this section we extend the result to a lower bound on the ACPP incurred by any policy, which allows us to bound the difference between $J(\rho)$ and $J_{\mathrm{opt}}$ for a specific $\rho$.
Lemma 2. Suppose $g : Y \to \mathbb{R}$ is a measurable function and let
$$\alpha_l = \inf_{y \in Y,\, a \in A}\big\{c(y,a) + \mathbb{E}[g(Y_{t+1}) \mid Y_t = y, A_t = a] - g(y)\big\}.$$
Then any static state-feedback control policy $\rho : Y \to A$ such that
$$\sup_{y \in Y}\big\{\mathbb{E}\big[|g(Y_{t+1})|^{1+\varepsilon} \mid Y_t = y, A_t = \rho(y)\big] - |g(y)|^{1+\varepsilon}\big\} < \infty$$
has an ACPP satisfying
$$\alpha_l \le \liminf_{t\to\infty} \frac{1}{t}\sum_{k=0}^{t-1}\mathbb{E}\big[c(Y_k, \rho(Y_k)) \mid Y_0 = y\big], \qquad \forall\, y \in Y.$$

Proof. For any state-feedback policy $\rho \in Q$,
$$\alpha_l = \inf_{y \in Y,\, a \in A}\big\{c(y,a) + \mathbb{E}[g(Y_{t+1}) \mid Y_t = y, A_t = a] - g(y)\big\} \le \inf_{y \in Y}\big\{c(y,\rho(y)) + \mathbb{E}[g(Y_{t+1}) \mid Y_t = y, A_t = \rho(y)] - g(y)\big\}.$$
Applying Theorem 1 completes the proof. $\square$
4. Control Policy of Multi-Class Queue
The queueing system under consideration is a discrete-time model. In each time slot $t$, $W_{t,i} \in \{0,1\}$ jobs of class $i$ arrive at the queue. We restrict the model to at most one arrival per time slot (which can be achieved by taking a sufficiently small time interval), i.e. $\sum_i W_{t,i} \le 1$. Let $W_t = (W_{t,1}, \ldots, W_{t,N})$ denote the arrival vector. The vectors $W_t$ and $W_{t'}$ are i.i.d. for distinct time slots $t' \neq t$. We write $\lambda_i = \mathbb{E}[W_{t,i}]$, which is independent of $t$. Let $Y_{t,i}$ denote the number of class-$i$ jobs at time $t$, and let $Y_t = (Y_{t,1}, \ldots, Y_{t,N})$ be the state vector. We also define the control $A_{t,i}$ in each time slot $t$ by
$$A_{t,i} = \begin{cases} 1, & \text{if a job of class } i \text{ is being serviced},\\ 0, & \text{otherwise}. \end{cases}$$
The service of a class-$i$ job that is being served in a given time slot completes with probability $\theta_i$, independently of the service history. We define the number of class-$i$ departures in time slot $t$ (jobs served successfully) as the random variable $D_{t,i} = A_{t,i}\, I(Y_{t,i})\, B_i$, where $B_i$ is a Bernoulli random variable with $\mathbb{E}[B_i] = \theta_i$ and $I$ is the indicator function
$$I(y) = \begin{cases} 0, & \text{if } y = 0,\\ 1, & \text{otherwise}. \end{cases}$$
The queue-length dynamics then take the form $Y_{t+1,i} = Y_{t,i} + W_{t,i} - D_{t,i}$ for each $i \in \{1, \ldots, N\}$.
This problem is a Markov decision process with the ACPP criterion, since we aim to choose how to serve the different job classes so as to minimize the average queue length. The state space of the model is $Y = \mathbb{Z}_+^N$, and the actions are chosen from $A = \big\{a \in \{0,1\}^N \mid \sum_i a_i = 1\big\}$.
In each time slot $t$, the cost is the total number of jobs over all classes, $c(Y_t) = \sum_{i=1}^N Y_{t,i}$. Let $\rho : Y \to A$ be a policy and let $(Y_0, Y_1, \ldots)$ be the queue-length process under this policy. Then the average queue length under policy $\rho$ is
$$J(\rho) = \limsup_{t\to\infty} \frac{1}{t}\sum_{k=0}^{t-1}\mathbb{E}\big[c(Y_k) \mid Y_0 = 0\big].$$
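To illustrate these definitions, the following sketch (our own construction, with all rates hypothetical) simulates the dynamics $Y_{t+1,i} = Y_{t,i} + W_{t,i} - D_{t,i}$ and estimates the average queue length $J(\rho)$ for a non-idling, preemptive static priority policy:

```python
import random

def simulate_J(lams, thetas, priority, T=200_000, seed=0):
    """Estimate J(rho): the long-run average total queue length.

    lams[i]   -- arrival probability of class i per slot (sum(lams) <= 1)
    thetas[i] -- service-completion probability of class i per slot
    priority  -- class indices from highest to lowest priority (assumed policy)
    """
    rng = random.Random(seed)
    N = len(lams)
    y = [0] * N
    total = 0
    for _ in range(T):
        total += sum(y)
        # At most one arrival per slot: class i arrives with probability lams[i].
        u, w, acc = rng.random(), [0] * N, 0.0
        for i in range(N):
            acc += lams[i]
            if u < acc:
                w[i] = 1
                break
        # Non-idling: serve the highest-priority nonempty class.
        d = [0] * N
        for i in priority:
            if y[i] > 0:
                d[i] = rng.random() < thetas[i]
                break
        for i in range(N):
            y[i] += w[i] - d[i]
    return total / T
```

Starting from $Y_0 = 0$, the returned time average approximates $J(\rho)$ for the chosen priority order.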
The problem we consider is how the control policy affects the average queue length. Stability is the property under which bounded queue lengths exist for a given policy. We call a policy non-idling if, whenever $y \neq 0$ and $\rho(y) = a$, the served class satisfies $y_i > 0$ for the index $i$ with $a_i = 1$; that is, the server never idles while the queue is nonempty. Lemma 3 covers the case in which the system cannot be stabilized by any non-idling policy; since the result is standard (see [16]), we omit the proof.
Lemma 3. If $\sum_{i=1}^N \lambda_i/\theta_i > 1$, then there is no non-idling policy with bounded average queue length.
Theorem 2. Let $Q_{NI}$ be the set of all non-idling policies, let $\rho \in Q_{NI}$ be arbitrary, and let
$$J_{\mathrm{opt}} = \inf_{\rho \in Q_{NI}}\Big\{\liminf_{t\to\infty} \mathbb{E}C\Big\}.$$
If $\sum_{i=1}^N \lambda_i/\theta_i < 1$, then $J(\rho) < \infty$ and
$$\frac{J(\rho)}{J_{\mathrm{opt}}} \le \frac{\max_i\{\theta_i\}}{\min_i\{\theta_i\}}.$$
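As a worked instance (with our own illustrative numbers): for $N = 2$ classes with $\lambda = (0.2,\, 0.2)$ and $\theta = (0.9,\, 0.5)$, the stability condition holds since
$$\sum_{i=1}^{2} \frac{\lambda_i}{\theta_i} = \frac{0.2}{0.9} + \frac{0.2}{0.5} \approx 0.62 < 1,$$
and Theorem 2 guarantees that every non-idling policy achieves an average queue length within a factor $\max_i\{\theta_i\}/\min_i\{\theta_i\} = 0.9/0.5 = 1.8$ of the optimal one.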
Proof. To obtain a lower bound on $J_{\mathrm{opt}}$, we define
$$g_l(y) = K_1\Bigg(\bigg(\sum_{i=1}^N \frac{y_i}{\theta_i}\bigg)^2 + K_2\sum_{i=1}^N \frac{y_i}{\theta_i}\Bigg),$$
where the coefficient $K_1$ is
$$K_1 = \frac{\min_i\{\theta_i\}}{2\Big(1 - \sum_{i=1}^N \lambda_i/\theta_i\Big)}$$
and $K_2$ is a second constant that depends only on the arrival and service rates. Let $\Delta_l(y,a) = \mathbb{E}[g_l(Y_{t+1}) \mid Y_t = y, A_t = a] - g_l(y)$. For all $y \neq 0$, an action $a$ that minimizes $c(y) + \Delta_l(y,a)$ serves a nonempty class and hence has
$$\mathbb{E}\bigg[\sum_{i=1}^N \frac{D_{t,i}}{\theta_i} \,\bigg|\, Y_t = y,\ A_t = a\bigg] = 1.$$
Therefore,
$$\min_{a \in A}\{c(y) + \Delta_l(y,a)\} = \sum_{i=1}^N\bigg(1 - \frac{\min_j\{\theta_j\}}{\theta_i}\bigg) y_i + \alpha \min_j\{\theta_j\},$$
in which the sum is nonnegative, so $\alpha_l = \alpha \min_j\{\theta_j\}$.

To obtain an upper bound on $J(\rho)$, we use the function
$$g_u(y) = K_3\Bigg(\bigg(\sum_{i=1}^N \frac{y_i}{\theta_i}\bigg)^2 + K_2\sum_{i=1}^N \frac{y_i}{\theta_i}\Bigg),$$
where the constant $K_3$ is
$$K_3 = \frac{\max_i\{\theta_i\}}{2\Big(1 - \sum_{i=1}^N \lambda_i/\theta_i\Big)}.$$
Let $\Delta_u(y) = \mathbb{E}[g_u(Y_{t+1}) \mid Y_t = y] - g_u(y)$. Using the queue-length dynamics $Y_{t+1,i} = Y_{t,i} + W_{t,i} - D_{t,i}$, for any non-idling policy we have
$$c(y) + \Delta_u(y) = \sum_{i=1}^N\bigg(1 - \frac{\max_j\{\theta_j\}}{\theta_i}\bigg) y_i + \alpha \max_j\{\theta_j\},$$
in which the sum is nonpositive, so $\alpha_u = \alpha \max_j\{\theta_j\}$; here $\alpha$ is the common constant, depending only on the rates $\lambda_i$ and $\theta_i$, that collects the second-moment terms of the drift over the denominator $2\big(1 - \sum_{i=1}^N \lambda_i/\theta_i\big)$.

To finish the proof, we need to show that for any non-idling policy $\rho \in Q_{NI}$,
$$\sup_{y \in Y}\big\{\mathbb{E}[g_u(Y_{t+1})^2 \mid Y_t = y, A_t = \rho(y)] - g_u(y)^2\big\} < \infty$$
and
$$\sup_{y \in Y}\big\{\mathbb{E}[g_l(Y_{t+1})^2 \mid Y_t = y, A_t = \rho(y)] - g_l(y)^2\big\} < \infty.$$
The functions $g_u$ and $g_l$ both have the form
$$g(x) = K(x^2 + K_2 x), \qquad x = \sum_{i=1}^N \frac{y_i}{\theta_i},$$
and squaring $g$ gives
$$g(x)^2 = K^2\big(x^4 + 2K_2 x^3 + K_2^2 x^2\big).$$
For any non-idling policy, the expected drift $\mathbb{E}[g(Y_{t+1})^2 \mid Y_t = y, A_t = \rho(y)] - g(y)^2$ is a polynomial of degree three in $x$ whose third-order term is proportional to
$$-\bigg(1 - \sum_{i=1}^N \frac{\lambda_i}{\theta_i}\bigg)\bigg(\sum_{i=1}^N \frac{y_i}{\theta_i}\bigg)^3,$$
which is negative for all $y \neq 0$ when the system is stable. This means that the expected drift of $g^2$ is bounded above for all policies in $Q_{NI}$. Hence the bounds $\alpha_u$ and $\alpha_l$ are well-founded and satisfy
$$\frac{J(\rho)}{J_{\mathrm{opt}}} \le \frac{\alpha_u}{\alpha_l} = \frac{\max_i\{\theta_i\}}{\min_i\{\theta_i\}}. \qquad \square$$
The point of this work is that we did not simply use the maximum and minimum service rates of the queues to obtain lower and upper bounds on the average queue length of the multi-class queue. Although approaches based on the minimum and maximum service rates do give bounds, they can produce a large gap between those bounds for given service rates. In fact, the bound provided by Theorem 2 is tighter than the bound obtained by considering only the minimum and maximum service rates.
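Using the `simulate_J` sketch from Section 4 (with the same hypothetical rates as the worked instance after Theorem 2), the ratio bound can be checked empirically: since both priority orders below are non-idling, each is within the factor of $J_{\mathrm{opt}}$, so their ratio cannot exceed $\max_i\{\theta_i\}/\min_i\{\theta_i\}$.

```python
# Hypothetical rates; stability: 0.2/0.9 + 0.2/0.5 < 1.
lams, thetas = [0.2, 0.2], [0.9, 0.5]

J_fast_first = simulate_J(lams, thetas, priority=[0, 1])  # faster class first
J_slow_first = simulate_J(lams, thetas, priority=[1, 0])  # deliberately reversed

# J_slow_first / J_fast_first <= J_slow_first / J_opt <= max(theta) / min(theta)
print(J_slow_first / J_fast_first, "<=", max(thetas) / min(thetas))
```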
Conclusion
In this paper we considered a queueing problem formulated as a Markov decision process on a general state space, and we described a method for computing bounds on the costs of such processes under the average-cost-per-period criterion. Our method naturally yields a factor that can be used in the problem of controlling a multi-class queue to obtain a performance bound. The bound we found relates the average queue length achieved by an arbitrary policy to that achieved by an optimal policy, and it differs substantially from the bound obtained by applying the minimum and maximum service rates of the queues serving multi-class jobs.
References

1. Atar R., Mandelbaum A., Reiman M.I. Scheduling a Multi-Class Queue with Many Exponential Servers: Asymptotic Optimality in Heavy Traffic. The Annals of Applied Probability, 2004, vol. 14, no. 3, pp. 1084-1134. DOI: 10.1214/105051604000000233
2. Regan K., Boutilier C. Robust Policy Computation in Reward-Uncertain MDPs Using Nondominated Policies. Twenty-Fourth AAAI Conference on Artificial Intelligence, Atlanta, July 2010, pp. 1127-1133.
3. Kebarighotbi A., Cassandras C.G. Optimal Scheduling of Parallel Queues with Stochastic Flow Models: The cµ-Rule Revisited. IFAC Proceedings Volumes, 2011, no. 44, pp. 8223-8228.
4. Shanthikumar J., Yao D. Multiclass Queueing Systems: Polymatroidal Structure and Optimal Scheduling Control. Operations Research, 1992, vol. 40, no. 2, pp. 293-299. DOI: 10.1287/opre.40.3.S293
5. Meyn S., Tweedie R. Markov Chains and Stochastic Stability. London, Springer, 1993. DOI: 10.1007/978-1-4471-3267-7
6. Puterman M.L. Markov Decision Processes: Discrete Stochastic Dynamic Programming. New Jersey, John Wiley and Sons, 2009.
7. Schweitzer P.J., Seidmann A. Generalized Polynomial Approximations in Markovian Decision Processes. Journal of Mathematical Analysis and Applications, 1985, vol. 110, pp. 568-582. DOI: 10.1016/0022-247X(85)90317-8
8. Yang Wang, Boyd S. Performance Bounds and Sub-Optimal Policies for Linear Stochastic Control via LMIs. International Journal of Robust and Nonlinear Control, 2011, no. 21, pp. 1710-1728. DOI: 10.1002/rnc.1665
9. Osipova N., Ayesta U., Avrachenkov K. Optimal Policy for Multi-Class Scheduling in a Single Server Queue. 21st International Teletraffic Congress, 2009, p. 10951139.
10. Jia Li, Zhang H.M. Bounding Queuing System Performance with Variational Theory. Transportation Research Procedia, 2015, no. 7, pp. 519-535.
11. Senderovich A., Weidlich M., Gal A., Mandelbaum A. Queue Mining for Delay Prediction in Multi-Class Service Processes. Information Systems, 2015, no. 53, pp. 278-295. DOI: 10.1016/j.is.2015.03.010
12. Huang Qing, Chakravarthy S.R. Analytical and Simulation Modeling of a Multi-Server Queue with Markovian Arrivals and Priority Services. Simulation Modelling Practice and Theory, 2012, no. 28, pp. 12-26. DOI: 10.1016/j.simpat.2012.05.010
13. Casale G., Sansottera A., Cremonesi P. Compact Markov-Modulated Models for Multiclass Trace Fitting. European Journal of Operational Research, 2016, vol. 255, no. 3, pp. 822-833. DOI: 10.1016/j.ejor.2016.06.005
14. Lefeber E., Lammer S., Rooda J.E. Optimal Control of a Deterministic Multiclass Queuing System for Which Several Queues Can Be Served Simultaneously. Systems and Control Letters, 2011, vol. 60, no. 7, pp. 524-529. DOI: 10.1016/j.sysconle.2011.04.010
15. Walraevens J., Bruneel H., Fiems D., Wittevrongel S. Delay Analysis of Multiclass Queues with Correlated Train Arrivals and a Hybrid Priority/FIFO Scheduling Discipline. Applied Mathematical Modelling, 2017, no. 45, pp. 823-839. DOI: 10.1016/j.apm.2017.01.044
16. Kleinrock L. Queueing Systems. Volume II: Computer Applications. New Jersey, John Wiley and Sons, 1976.
17. Ching-Tarng Hsieh, Lam S.S. Two Classes of Performance Bounds for Closed Queueing Networks. Performance Evaluation, 1987, vol. 7, no. 1, pp. 3-30. DOI: 10.1016/0166-5316(87)90054-X
18. Koukopoulos D., Mavronicolas M., Spirakis P. Performance and Stability Bounds for Dynamic Networks. Journal of Parallel and Distributed Computing, 2007, vol. 67, no. 4, pp. 386-399. DOI: 10.1016/j.jpdc.2006.11.005