As Data Scientists we become acquainted with the concept of optimization very early in our careers. Optimization lies at the heart of every machine learning model. But our relationship with optimization goes way back; we’ve been [unknowingly] solving optimization problems since before we can remember:
- The fastest way to get to work
- Organizing our budget to get the most out of it
- Planning workouts for maximum impact in the least amount of time
- Doing meal-prep every Sunday
- Packing a suitcase for a long vacation
These are just a few examples of everyday optimization. Optimization is the way of life. It can be as simple as the examples I just mentioned, or as complex as The Traveling Salesman Problem.
Mathematical Optimization is the problem of selecting the best alternative, given some constraints, from a set of available alternatives. An optimization problem has three basic components: (1) an objective function, which we either want to maximize or minimize; (2) a set of variables which control the objective function; (3) a set of constraints that allows the variables to take certain values while excluding others.
Linear Programming (LP) is one of the major subfields of mathematical optimization. It is also one of the simplest approaches to solving an optimization problem.
What is Linear Programming?
Let me start by saying that programming in this context does not refer to computer programming. Linear Programming — a.k.a Linear Optimization is a technique to find the best outcome in a mathematical model where the objective function and the constraints are represented by linear relationships.
In order to formulate a linear program, we need to understand the concepts of its parts.
- Decision variables: what we’d like to determine. The unknowns.
- Objective function: the linear function representing the quantities to be maximized or minimized.
- Constraints: the system of equalities or inequalities representing the restrictions on the decision variables.
- Non-negativity restrictions: the values of the decision variables should be greater than or equal to zero.
Solving a Linear Programming Problem
Taking our four definitions from above, we can generically set up a process for solving an LP problem:
- Identify decision variables
- Identify the objective function
- Specify the constraints
- State the non-negativity restrictions
There are a few different ways to solve an LP problem: the Graphical Method, the Fourier-Motzkin Elimination, the Simplex Method, the Criss-Cross Algorithm, Karmarkar’s Algorithms, to name a few. There are also a few commercial and open-source software options available: Gurobi, CPLEX, MATLAB, SAS, GAMS, Mathematica, PuLP, Pyomo, SciPy, are some examples of what’s available.
Learn by example — A practical tutorial using PuLP
Let’s look at an example based on The Transportation Problem, which was one of the original applications of linear programming models.
The following is taken from Introduction to Operations Research by Juraj Stacho:
A furniture company owns three warehouses in the New York City area and needs to deliver chairs to its three shops in the city for tomorrow. The three shops demand 4, 2, and 3 units respectively. The current stock level of chairs in the three warehouses is 8, 6, and 3 respectively. Delivery costs from each warehouse to each store are different due to different distances. Find the least expensive way to deliver the chairs to the stores.
Formulating the Linear Problem:
- Decision variable: the number of chairs to be delivered from the i-th warehouse to the j-th shop
- m sources: three warehouses
- n destinations: three shops
- Constraints: supply from the i-th warehouse, demand from the j-th shop
- The objective: minimize the total cost of transportation
Since a solution only exists if supply = demand
a dummy demand variable was created. Below is the code for this LP problem using the PuLP library.
import pulp as plp # Start with creating a problem instance lp_prob = plp.LpProblem("transportationProblem", plp.LpMinimize) # Supply w1 = 8 w2 = 6 w3 = 3 # Demand s1 = 4 s2 = 2 s3 = 3 dummy_demand = 8 # Delivery rates from shops to warehouses in USD # Since there is an 8 unit surplus in supply, # We will created a dummy destination for that at USD 0 cost s1_w1 = 7 s2_w1 = 3 s3_w1 = 4 sd_w1 = 0 s1_w2 = 4 s2_w2 = 2 s3_w2 = 2 sd_w2 = 0 s1_w3 = 2 s2_w3 = 1 s3_w3 = 5 sd_w3 = 0 # Decision variables # We want the number of items transported from the i-th source # To the j-th destionation x = plp.LpVariable.dicts("item_quatity", list(range(1, 13)), lowBound = 0, cat = "Integer") # Objective Function lp_prob += (x[1] * s1_w1 + x[2] * s2_w1 + x[3] * s3_w1 + x[4] * sd_w1 + x[5] * s1_w2 + x[6] * s2_w2 + x[7] * s3_w2 + x[8] * sd_w2 + x[9] * s1_w3 + x[10] * s2_w3 + x[11] * s3_w3 + x[12] * sd_w3) # Constraints # Supply Constraints lp_prob += x[1] + x[2] + x[3] + x[4] == w1 lp_prob += x[5] + x[6] + x[7] + x[8] == w2 lp_prob += x[9] + x[10] + x[11] + x[12] == w3 # Demand Constraints lp_prob += x[1] + x[5] + x[9] == s1 lp_prob += x[2] + x[6] + x[10] == s2 lp_prob += x[3] + x[7] + x[11] == s3 lp_prob += x[4] + x[8] + x[12] == dummy_demand # Solve lp_prob.solve() plp.LpStatus[lp_prob.status] # Print our decision variable values print("Items from Warehouse 1 to Store 1 = {}".format(x[1].varValue)) print("Items from Warehouse 1 to Store 2 = {}".format(x[2].varValue)) print("Items from Warehouse 1 to Store 3 = {}".format(x[3].varValue)) print("Items from Warehouse 2 to Store 1 = {}".format(x[5].varValue)) print("Items from Warehouse 2 to Store 2 = {}".format(x[6].varValue)) print("Items from Warehouse 2 to Store 3 = {}".format(x[7].varValue)) print("Items from Warehouse 3 to Store 1 = {}".format(x[9].varValue)) print("Items from Warehouse 3 to Store 2 = {}".format(x[10].varValue)) print("Items from Warehouse 3 to Store 3 = {}".format(x[11].varValue)) # Print our objective function value print("Total transportation cost is {}".format(plp.value(lp_prob.objective)))
Conclusion
Optimization is crucial for investigating complex issues, identifying and solving problems, and data-driven decision-making. Furthermore, understanding optimization theory is essential to Data Science as a craft. Many practical problems can be modeled and solved with mathematical optimization techniques, where inputs and constraints are obtained from machine learning model’s predictions — but that’s just one thought from a newbie Data Scientist.
Source: towardsdatascience