The activity selection problem is a combinatorial optimization problem concerning the selection of non-conflicting activities to perform within a given time frame, given a set of activities each marked by a start time (si) and finish time (fi). The problem is to select the maximum number of activities that can be performed by a single person or machine, assuming that a person can only work on a single activity at a time. The activity selection problem is also known as the Interval scheduling maximization problem (ISMP), which is a special type of the more general Interval Scheduling problem.
A classic application of this problem is in scheduling a room for multiple competing events, each having its own time requirements (start and end time), and many more arise within the framework of operations research.
Formal definition
Assume there exist n activities with each of them being represented by a start time si and finish time fi. Two activities i and j are said to be non-conflicting if si ≥ fj or sj ≥ fi. The activity selection problem consists in finding the maximal solution set (S) of non-conflicting activities, or more precisely there must exist no solution set S' such that |S'| > |S| in the case that multiple maximal solutions have equal sizes.
Optimal solution
The activity selection problem is notable in that using a greedy algorithm to find a solution will always result in an optimal solution. A pseudocode sketch of the iterative version of the algorithm and a proof of the optimality of its result are included below.
Algorithm
Greedy-Iterative-Activity-Selector(A, s, f): Sort A by finish times stored in f S = {A[1]} k = 1 n = A.length for i = 2 to n: if s[i] ≥ f[k]: S = S U {A[i]} k = i return SExplanation
Line 1: This algorithm is called Greedy-Iterative-Activity-Selector, because it is first of all a greedy algorithm, and then it is iterative. There's also a recursive version of this greedy algorithm.
- A {\displaystyle A} is an array containing the activities.
- s {\displaystyle s} is an array containing the start times of the activities in A {\displaystyle A} .
- f {\displaystyle f} is an array containing the finish times of the activities in A {\displaystyle A} .
Note that these arrays are indexed starting from 1 up to the length of the corresponding array.
Line 3: Sorts in increasing order of finish times the array of activities A {\displaystyle A} by using the finish times stored in the array f {\displaystyle f} . This operation can be done in O ( n ⋅ log n ) {\displaystyle O(n\cdot \log n)} time, using for example merge sort, heap sort, or quick sort algorithms.
Line 4: Creates a set S {\displaystyle S} to store the selected activities, and initialises it with the activity A [ 1 ] {\displaystyle A[1]} that has the earliest finish time.
Line 5: Creates a variable k {\displaystyle k} that keeps track of the index of the last selected activity.
Line 9: Starts iterating from the second element of that array A {\displaystyle A} up to its last element.
Lines 10,11: If the start time s [ i ] {\displaystyle s[i]} of the i t h {\displaystyle ith} activity ( A [ i ] {\displaystyle A[i]} ) is greater or equal to the finish time f [ k ] {\displaystyle f[k]} of the last selected activity ( A [ k ] {\displaystyle A[k]} ), then A [ i ] {\displaystyle A[i]} is compatible to the selected activities in the set S {\displaystyle S} , and thus it can be added to S {\displaystyle S} .
Line 12: The index of the last selected activity is updated to the just added activity A [ i ] {\displaystyle A[i]} .
Proof of optimality
Let S = { 1 , 2 , … , n } {\displaystyle S=\{1,2,\ldots ,n\}} be the set of activities ordered by finish time. Assume that A ⊆ S {\displaystyle A\subseteq S} is an optimal solution, also ordered by finish time; and that the index of the first activity in A is k ≠ 1 {\displaystyle k\neq 1} , i.e., this optimal solution does not start with the greedy choice. We will show that B = ( A ∖ { k } ) ∪ { 1 } {\displaystyle B=(A\setminus \{k\})\cup \{1\}} , which begins with the greedy choice (activity 1), is another optimal solution. Since f 1 ≤ f k {\displaystyle f_{1}\leq f_{k}} , and the activities in A are disjoint by definition, the activities in B are also disjoint. Since B has the same number of activities as A, that is, | A | = | B | {\displaystyle |A|=|B|} , B is also optimal.
Once the greedy choice is made, the problem reduces to finding an optimal solution for the subproblem. If A is an optimal solution to the original problem S containing the greedy choice, then A ′ = A ∖ { 1 } {\displaystyle A^{\prime }=A\setminus \{1\}} is an optimal solution to the activity-selection problem S ′ = { i ∈ S : s i ≥ f 1 } {\displaystyle S'=\{i\in S:s_{i}\geq f_{1}\}} .
Why? If this were not the case, pick a solution B′ to S′ with more activities than A′ containing the greedy choice for S′. Then, adding 1 to B′ would yield a feasible solution B to S with more activities than A, contradicting the optimality.
Weighted activity selection problem
The generalized version of the activity selection problem involves selecting an optimal set of non-overlapping activities such that the total weight is maximized. Unlike the unweighted version, there is no greedy solution to the weighted activity selection problem. However, a dynamic programming solution can readily be formed using the following approach:1
Consider an optimal solution containing activity k. We now have non-overlapping activities on the left and right of k. We can recursively find solutions for these two sets because of optimal sub-structure. As we don't know k, we can try each of the activities. This approach leads to an O ( n 3 ) {\displaystyle O(n^{3})} solution. This can be optimized further considering that for each set of activities in ( i , j ) {\displaystyle (i,j)} , we can find the optimal solution if we had known the solution for ( i , t ) {\displaystyle (i,t)} , where t is the last non-overlapping interval with j in ( i , j ) {\displaystyle (i,j)} . This yields an O ( n 2 ) {\displaystyle O(n^{2})} solution. This can be further optimized considering the fact that we do not need to consider all ranges ( i , j ) {\displaystyle (i,j)} but instead just ( 1 , j ) {\displaystyle (1,j)} . The following algorithm thus yields an O ( n log n ) {\displaystyle O(n\log n)} solution:
Weighted-Activity-Selection(S): // S = list of activities sort S by finish time opt[0] = 0 // opt[j] represents optimal solution (sum of weights of selected activities) for S[1,2..,j] for i = 1 to n: t = binary search to find activity with finish time <= start time for i // if there are more than one such activities, choose the one with last finish time opt[i] = MAX(opt[i-1], opt[t] + w(i)) return opt[n]External links
References
Dynamic Programming with introduction to Weighted Activity Selection http://www.cs.princeton.edu/~wayne/cs423/lectures/dynamic-programming-4up.pdf ↩