What does Wikipedia say about Queueing Theory: According to Wikipedia Queueing Theory can be defined as, “The mathematical study of waiting lines, or queues. In queueing theory a model is constructed so that queue lengths and waiting time can be predicted. Queueing theory is generally considered a branch of operations research because the results are often used when making business decisions about the resources needed to provide a service. Queueing theory has its origins in research by Agner Krarup Erlang when he created models to describe the Copenhagen telephone exchange. The ideas have since seen applications including telecommunication, traffic engineering, computing and the design of factories, shops, offices and hospitals.
Agner Krarup Erlang, a Danish engineer who worked for the Copenhagen Telephone Exchange, published the first paper on what would now be called queueing theory in 1909. He modeled the number of telephone calls arriving at an exchange by a Poisson process and solved the M/D/1 queue in 1917 and M/D/k queueing model in 1920. In Kendall’s notation:
- M stands for Markov or memoryless and means arrivals occur according to a Poisson process
- D stands for deterministic and means jobs arriving at the queue require a fixed amount of service
- k describes the number of servers at the queueing node (k = 1, 2,…). If there are more jobs at the node than there are servers then jobs will queue and wait for service“
You can read more about the basics of Queueing Theory at Wikipedia.
Why Do We Need Queueing Theory: As any system gets congested or loaded a queue of sorts starts to build up. This build up of congestion could apply to components within the system and also the system as a whole with queueing building up externally. A good understanding of the service times, the queueing times and the relationship of workload (increase and decrease) is essential to understanding the principles underpinning Queueing Theory. Queueing is part of our every day life i.e. from queueing at ATMs to withdraw money to waiting to be served at the counter when purchasing your cup of freshly brewed coffee every morning. We don’t generally realize how life around us plays out as a system of systems with Queueing Theory applicable to most of them.
Lets now focus on compute infrastructure which is a very common application of Queueing Theory for Systems Performance Engineers. There are various reasons why you need to understand the basics of Queueing Theory here.
- Like every system any information technology system has limited resources
- Every system provides certain functionality and does so with a given set of service times
- As the workload processed by the system goes up the utilization levels of the system go up and queueing at the various resources gradually starts to build up
- An increase in queueing does not change the service time for the system but rather increases the amount of wait time at a given resource
- The increase in workload causes building up of queues outside the system and potentially at various key resource blocks across the system (depending on how the system is designed)
- A perceived increase in response time is noted. This includes a constant “Service Time” and a variable “Wait Time”
So What Do We Do With All This Queueing Models: Thus as you have noted above an understanding of Queueing Theory helps Systems Performance Engineers understand how an increase in user workload within a system can cause an increase in customer wait times and hence overall perceived response times. Armed with this information the System Performance Engineer can do a few things. He or she can:
- Tune the system bottlenecks so as to increase transactional throughput and hence delay the impact of queues
- Impact the system configuration by increasing the amount of compute resources available for processing. This again will change the behavior of the system
- Impact the way the users access the system and prioritize user requests. This allows for processing of higher prioritized user requests first followed by the lower priority ones.
- Provide a pool of resources for the system to consume so that an increase in workload triggers an increase in compute resources and avoids building up of queues
There are various ways a Performance Engineer could impact the design of a system with a solid understanding of queues.
Kendall’s Notation for Queueing Theory: Before we look into the various Queueing Theory equations let us go over the notation system designed by Kendall to describe a Queueing system.
Let us denote a system by A / B / m / K / n/ D, where
- A: distribution function of the inter-arrival times,
- B: distribution function of the service times,
- m: number of servers,
- K: capacity of the system, the maximum number of customers in the system including the one being serviced,
- n: population size, number of sources of customers,
- D: service discipline.
Exponentially distributed random variables are notated by M, meaning Markovain or
memoryless. Furthermore, if the population size and the capacity is infinite, the service discipline is FIFO (First In First Out), then they are omitted.
Hence M/M/1 denotes a system with Poisson arrivals, exponentially distributed service times and a single server. M/G/m denotes an m-server system with Poisson arrivals and generally distributed service times. M/M/r/K/n stands for a system where the customers arrive from a finite-source with n elements where they stay for an exponentially distributed time, the service times are exponentially distributed, the service is carried
out according to the request’s arrival by r severs, and the system capacity is K.
Kendall’s equations are generally used to describe Queueing Systems making it easy to convey the system configuration using a single equation.
Queueing Theory Equations: Here is a summary of the various key Queueing Theory equations:
- U = (St*X)/M …[U = System Utilization, St = Service Time, X = Throughput, M = No Of Servers/CPU’s]
- Ql = λ Q … [Q = Queueing Time, Ql= Queue Length, λ = Arrival Rate]
- Rt-cpu (For CPU Systems) = St / [1-Um] …[Rt-cpu=CPU Response Time, St=Service Time, U = System Utilization, M = No Of Servers/CPU’s]
- Rt-io (For IO Systems) = St / [1-U] …[Rt-io=IO Response Time, St=Service Time, U = System Utilization]
- Rt = St + Qt … [ Rt = Response Time, St=Service Time]
- λq = λsys / Qn … [ λq = Arrival Rate Per Queue, λsys = Total Arrival Rate for the System, Qn = Number of Queues ]
- Ec = ErlangC (m, St, λq) = [ (M St λq)m / M! ] / [ (1 – m St) SigmaK=0M=1 [ ((m St λq)K / K! ) + ] ((m St λq)M / M! ) ]
The above equations have probably made your head spin a bit and I wouldn’t blame you for that. Not being a mathematician myself it took me a while to piece together the whole picture and work out how to best use the above equations to model performance of the system for increases in workload. For those who want to avoid creating complex models using error prone excel see the option provided below.
Modeling Systems & Forecasting Performance : To teach yourself the concepts of Performance Modeling & to experience how easy Forecasting System Performance could be, please visit VisualizeIT.