In the past, programmers would find reassurance to their problems of running extremely large programs in the yearly speed up of computer processors. Every year or so, the speed of computers would double and a faster computer would be in the market which would be able to rapidly execute their large sized code. But, in today's world this is not the case, analysing the GHz speed of the processors in a desktop computer of 2 years ago, versus the speed new desktop computers present, proves that it has barely, if any, increased, this mainly due to the fact that it is very difficult to build reliable low-cost processors that run significantly faster. This is the reason why, the solution to running faster code has focused on doubling the number of processors that exist on a single chip. Actually researchers believe, that in the following years we will have systems, which present twice the number of cores with every new technology generation.
It is important to note, that these multi-processors have laid the road for using parallel computing, in which a large program can be divided into smaller programs, each of which is then assigned to a processor with shared or independent resources. Parallelism is what is generally used today, for providing Performance improvement.
This approach although generally highly functional, has shown in some cases to degrade the performance considerably! The problem is that the scheduling of parallel jobs is a very complicated task which is highly dependent on a series of different factors: the workload, the blocking algorithm being utilised, the local operating system, the parallel programming language and the machine architecture. Expert humans are who tend to make the design specifications for these highly complicated tasks,and as a result they tend to be somewhat rigid and unsophisticated .
Because of this, machine learning techniques have in recent years provided a solution to this problem. Machine learning is a field which intends to build computer systems that automatically improve with experience. Researchers have been applying machine learning algorithms to problems of resource allocation, scheduling, load balancing, and design space exploration, among other things.
Such is the work done, in Cost-Aware Parallel Workload Allocation Approach based on Machine Learning Techniques. Here the authors tackle the problem of finding adequate workload allocations in a cost-aware manner, by learning from training examples how to allocate parallel workload among Java threads.
Given a program, their system computes its feature vectors, and utilising a nearest neighbour approach finds from the training examples, the best parallel scheme for this new program.
One may initially wonder, what type of training examples were utilised for this problem and how were they generated?
The training examples came from a series of programs coded in java, which presented different for loops. From each for loop its corresponding feature vector was calculated along with its associated label, this conformed each training example. In this case, the feature vector corresponded simply to the workload the for loop presented, and the label to the optimal number of threads that should be utilised with that specific workload.
The programs which were utilised for the training examples were manually selected, each had the purpose of bringing a certain workload variety to the training pool.The labels were set by an automatic program, which tested each workload (loop description) with a different number of threads, and then calculated what was the optimal thread number required for that specific workload to achieve optimal performance. It is important to note, that in this approach the computation cost of calculating the feature vectors was diminished by calculating an implicit estimate of the workload, the features which conformed the workload were: 1) loop depth; 2) loop size; 3) number of arrays used; 4) number of statements within the loop body, and 5) number of array references.
Since not all program features play an equal role in workload estimation different weights were assigned to different features during classification, with higher weights given to feature 1), 2) and 4). Within the paper it was not clearly explained how the values of these weights were assigned or what their values were. It might have been adequate to also utilise a learning algorithm which was capable of finding the most adequate weights given a certain training example, because it might be the case that under certain conditions a feature might be weaker for classification than an other, and therefore other weights need to be utilised. A broader explanation on the weight manner could have provided more insight and restricted this speculation, but it is interesting to ponder none the less.
On the other hand, in this example the authors opted for a supervised learning approach, where each training example that was handed to the system was manually selected and labelled. This is clearly a tedious task to do, and at times may not be the most optimal, since manually finding which examples provide more information for the learning process in comparison to other possible training examples is difficult and non-trivial. One therefore wonders if an unsupervised learning algorithm could have provided better results. In this type of approach, the machine can be "thrown into the wild" and through observations discover previously unknown structures or relationships between instances or their components, this could eliminate the problems mentioned previously, but has the shortcoming that if the learning phase is done online the algorithm might take much longer than if an instance based learning approach had been taken.
Furthermore it does not seem that their approach accounts for any long-term consequences. Each decision within a for loop was done independently of what had been decided for the other for loops within the program, this might mean, the decision to use X amount of threads for that workload might be locally optimal but not globally optimal, this could in the long run deteriorate the performance. This situation seems to suggest that for this problem instead of using instant based learning, a better approach would have been to use Reinforcement Learning. In Reinforcement Learning, the machine is not told which actions to take, but rather must infer them, by analysing what yields the best reward. The following figure presents an overview of how reinforcement learning works
For this particular case, if they authors had used this other learning method, it could have been possible for the machine to analyse the program as a whole, and decide then what the best long-term thread allocation for each workload would be. In specific, the machine would have interacted with the "environment" (in this case the supplied java program) over a discrete set of time steps. In each step the machine would have "sensed" the environment's current state (which would match the number of threads being used in each for loop) and executed an action(an action would correspond to assigning or removing more threads to certain for loops). This action would modify the environment’s state (which the machine could sense in the next time step) and produced an immediate reward (The reward would be the overall performance obtained for that particular thread assignation).
The machine’s objective would be to maximise its long-term cumulative reward by learning an optimal policy that maps states to actions.
In its most basic form, Reinforcement Learning brings a knowledge-free trial-and-error methodology in which the machine intents various actions in numerous system states, and learns from the consequences of each action.
From this, it is clear that the advantage of using this learning method is that no explicit model of either the computing system being managed or of the external process that generates workload or traffic are necessary.Additionally, Reinforcement Learning is capable of treating dynamical phenomena in the environment, as mentioned before, it can analyse how current decisions may have delayed consequences in both future rewards and future observed states.
Now, while this can sound very promising, it is necessary to also take into consideration, the challenges which Reinforcement Learning faces in the real world. Firstly, Reinforcement Learning can suffer from poor scalability in large state spaces, furthermore in times the performance obtained during online training can be below average, due to the lack of domain knowledge or good heuristics. In addition, because reinforcement learning procedures need to include "exploration" of actions, the selection of actions can be exceedingly costly to implement in a live system. This is the reason why, many modern applications that utilise reinforcement learning in order to address the above practical limitations,take a hybrid approach. Such an example, is the work done in A Hybrid Reinforcement Learning Approach to Autonomic Resource Allocation. Here the authors propose for the machine to have an offline training phase. They suggest that given enough training examples which follow a certain optimisation policy , the learner (machine) using reinforcement learning will be able to converge to the correct value function, it will be able to find a new policy which greedily maximises the value function and is able to improve the original policy that was given. In this form, the poor performance that is obtained by using live online training is avoided. Another benefit of their method is that multiple iterations can be done: Through training a new policy, which is the improved version of the original policy, is obtained. This improved policy can then be feed into the system again, acting as the original non-optimal policy, with this second policy a second data set is collected, which can then be used to train a further improved policy. This enables the possibility of running the algorithm iteratively till a desired "reward" is obtained.
It was mentioned before that reinforcement learning, presents the problem of having expensive exploration of actions, the authors tackled this problem by replacing the generally used lookup table for representing the value function with a nonlinear function approximator, in particular a neural network. A function approximator provides a solution to the mentioned issue, because it is mechanism for generalising training experience across states, therefore it is no longer necessary to visit every state in the state space. It also allows for generalisation across actions, so that the need for exploratory off-policy actions is also greatly reduced.
Their hybrid Reinforcement Learning approach was tested on realistic prototype Data Center, which dynamically allocates servers among multiple web applications so as to maximise the expected sum of SLA (service level agreement) payments in each application.
Although their proposed solution resolves most of the problems encountered with reinforcement learning, we can observe an aspect of their work, that might call for improvement: In their algorithm with each iteration, the model of the system is modified. They always assume the model "learned" from the use of a certain set of policies can never be applied to a set conformed of other policies. The authors never explored if this is always the case, could a model learned with certain policies still be valid under other policies which hold a degree of similarity to the original policies, or is it always necessary to learn from scratch the model, as a result of changes to an active set of policies?
Additionally, the authors utilised a neural network for finding the states to explore, and although this did solve the exploratory problem mentioned before, because the neural network has hidden states it is not possible to determine beforehand exactly how many states will be explored given the current used policies, knowing beforehand this number could improve computational costs as better planning can be done. In the work done in "An Adaptive Reinforcement Learning Approach to Policy-driven Automatic Management", the authors addressed this problems and show how a Reinforcement Learning Model can be adapted to accommodate this.
The authors analysed how previously learned information about the use of policies can be effectively used in a new scenario. For this, they consider policy modifications as well as the amount of time used to form the model before the changes. Similarly to the work in Hybrid Reinforcement Learning Approach to Autonomic Resource Allocation, a state transition model, which uses a set of active expectation policies is defined, but in difference to Hybrid Reinforcement Learning Approach to Autonomic Resource Allocation, instead of using a neural network, the authors capture the management system's behaviour through a state-transition graph, what their system is lacking and could be beneficial in the future is mapping directly how changes in policies effect the state-transition models