## What will we cover in this tutorial?

The Knapsack Problem is defined as follows.

Given a set of items, each with a weight and a value, determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible.

https://en.wikipedia.org/wiki/Knapsack_problem

In this tutorial we will give an efficient algorithm, the greedy approximation algorithm, which gives at least 1/2 of the optimal solution.

## Step 1: Understand the Knapsack Problem

Imagine you have 3 items. Item 1 has weight 2 kg with value $20. The next item, item 2, has weight 1 kg with value $15. Finally, item 3 has weight 3 kg, but value $35.

The challenge is that you can only bring 4 kg. How to get the most value with you?

You can take item 1 and item 2, then you have 3 kg and for $45 of value. You cannot take item 3, as that will add up to 6 kg, which is too much.

On the other hand, you could take item 2 and item 3, then you have 4 kg in total and for $50.

The Knapsack Problem is to determine the most valuable items to take within some bound of weight.

Now this seems easy to solve, or does it? Actually, the Knapsack Problem is NP-complete, which means it is difficult to solve.

## Step 2: A greedy approximation algorithm to solve it efficiently

To get the optimal solution is difficult, but there exists efficient approximations that are easy to implement.

Let’s take a look at the greedy approximation algorithm.

```
def greedy_algorithm(w, v, W):
ordered = []
for vi, wi in zip(v, w):
ordered.append((vi/wi, vi, wi))
ordered.sort(reverse=True)
X = []
weight = 0
val = 0
for _, vi, wi in ordered:
if weight + wi <= W:
X.append((vi, wi))
weight += wi
val += vi
return X, val
```

The above algorithm is not final. But let’s start by understanding what it does.

First it orders the items based on the highest value per weight. Then it simply adds the items along in that order.

Hence if we have the items from step 1.

Then we would have them ordered, Item 2 (15 $/kg), Item 3 (11.67 $/kg), and Item 1 (10 $/kg). Hence, the algorithm would add Item 2 first, as it gives the most value per kg. Then it adds Item 3, as this is the item left, which adds the most value per kg. Then there is no more to add.

Why is this optimal? Well you add the items in the order of what adds the most value.

## Step 3: Modifying the greedy approximation to give at least 1/2 of the optimal solution

In the example from Step 2, it all turned out fine.

But we have a problem.

Imagine the items. Item 1′ of value $15 and 1 kg. Then Item 2′ of value $40 and 4 kg. We still assume we can bring 4 kg.

The above algorithm would order it by Item 1′ (15 $/kg) and Item 2′ (10 $/kg).

That would only result in taking Item 1′ of 1 kg and value $20. Because the next item, Item 2′, weighs too much to bring along.

The optimal solution is taking Item 2′ of $40 of value.

The algorithm from previous step can be modified a bit to ensure at least 1/2 of value of the optimal solution. Notice, that the above example does not even give that.

Let’s examine the following modification.

```
def modified_greedy_algorithm(w, v, W):
ordered = []
for vi, wi in zip(v, w):
ordered.append((vi/wi, vi, wi))
ordered.sort(reverse=True)
S1 = []
weight = 0
val = 0
for _, vi, wi in ordered:
if weight + wi <= W:
S1.append((vi, wi))
weight += wi
val += vi
else:
S2 = [(vi, wi)]
val2 = vi
if val > val2:
return S1, val
else:
return S2, val2
```

It is almost identical. Except that when it reaches the first item, which does not fit in, it checks if that item alone gives higher value than the already added items.

Why does that give at least the 1/2 of the optimal solution.

Well, if we could take partial items, then the optimal value would be given by the following. Let **k** be the first item that does not fit in. Then items **1, 2, …, k – 1**, and a portion of item **k** be the optimal solution. Let that portion be given by * alpha. *Notice, that

*is great or equal to 0 end less and equal to 1.*

**alpha**Hence an upper bound of the optimal value is given by **v_1 + v_2 + … + v_(k-1) + alpha*v_k = V_max_bound**.

Hence, we have that items **1, 2, …, k – 1** or item **k** alone, must be at least half of **V_max_bound**. Hence, it shows that the above algorithm gives at least half of the optimal solution.