Professor Powell’s Decision Analytics Series — Topics 20-

Optimal Dynamics
Dec 27, 2020

This is the continuation of my decision analytics series. The first part, covering Topics 1–19, can be found here.

Topic 20: Making decisions under uncertainty

Posted: Nov 29, 2020

I am now going to return to the challenge of actually making decisions over time, in the presence of different forms of uncertainty. The decision might be how much inventory to order, what treatment to give a patient, how much to invest in an asset, what price to charge, or what turn to take in a network while driving to an appointment … the list is endless.

In a deterministic world, we focus on making a decision at time t (call this x_t). When we are working in the presence of uncertainty, we want to choose a *method* for making a decision. We are going to call this method a *policy*, which could be as simple as “sell if the stock price rises by $10”. Over the next series of posts, I am going to summarize four classes of policies (these are actually “meta-classes”) that will include every possible method for making decisions, including whatever method you are already using.
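To make the idea of a policy as a *method* concrete, here is a minimal Python sketch of the "sell if the stock price rises by $10" rule. The `State` fields, the function name, and the reading of "rises by $10" as "at least $10 above the purchase price" are assumptions made for illustration, not part of the original post.

```python
from dataclasses import dataclass


@dataclass
class State:
    """What we know when we decide (hypothetical fields for illustration)."""
    purchase_price: float  # price we paid for the stock
    current_price: float   # latest observed price


def sell_policy(state: State, threshold: float = 10.0) -> str:
    """Return 'sell' once the gain reaches `threshold` dollars, else 'hold'."""
    gain = state.current_price - state.purchase_price
    return "sell" if gain >= threshold else "hold"


# The same rule is applied to whatever price we happen to observe.
decisions = [sell_policy(State(50.0, price)) for price in (48.0, 55.0, 60.0, 63.5)]
```

The point of the sketch is that the policy is a function of the state: we fix the rule in advance and let it produce the decision as prices unfold. The `threshold` argument is the kind of tunable parameter discussed in the later topics.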

This series will make you aware of all four classes (so you can make the best choice), and help ensure you are doing the best you can within a class.

Topic 21: The four classes of policies

Posted: December 21, 2020

When making decisions over time, we need to choose a *policy*, which is a method for making decisions. There are four (meta) classes of policies (a meta-class is a class of classes). These four classes cover any method studied in the academic literature, or any method you might be using today.

The four classes can be grouped into two major classes:

  1. The policy search class: This is where we have a family of functions (such as order more supplies when your inventory is below s, and order up to S). These policies are always governed by tunable parameters (such as s and S). These are the simplest policies (but you have to tune the parameters).
  2. The lookahead class: This is where we start in a state S_t (such as a location on a network, or the current inventory) and make a decision x_t that takes us to a new state S_{t+1}. We choose the decision that maximizes the contribution from the decision, plus the value (the future contributions) of the state that the decision takes us to.
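The two major classes can be written down compactly. The following is a sketch in one common notation, not a formula taken from the post: the contribution function C, the value function V, the parameter vector θ, the horizon T, and the policy symbol X^π are all assumed here for illustration.

```latex
% Policy search: tune the parameters \theta of a family of functions
% X^{\pi}(S_t \mid \theta) so that it performs well over time.
\max_{\theta} \; \mathbb{E}\left[ \sum_{t=0}^{T} C\big(S_t, X^{\pi}(S_t \mid \theta)\big) \right]

% Lookahead: pick the decision that maximizes the contribution now
% plus the (expected) value of the state the decision takes us to.
X^{\pi}(S_t) = \arg\max_{x_t} \Big( C(S_t, x_t)
    + \mathbb{E}\big[ V_{t+1}(S_{t+1}) \mid S_t, x_t \big] \Big)
```

In the first line the optimization is over the parameters of the rule (such as s and S). In the second, the optimization is over the decision x_t itself, and the expectation reflects that S_{t+1} may depend on information we have not yet seen.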

Each of these classes can be divided into two subclasses to give us the four classes. I will step through all four in the coming weeks, starting with the simplest.

Topic 22: Policy function approximations

Posted: December 18, 2020

The first of the four classes of policies is policy function approximations (or PFAs). PFAs are the simplest and most widely used class of policy, and the one most familiar to us humans. Examples are simple rules such as buy low, sell high; order when the inventory is below s, in which case order up to S; if it is cold, wear a coat. More sophisticated examples might use a linear model (set the speed of a ship proportional to how late it is running) or even a neural network. All PFAs use tunable parameters that have to be optimized to make them work well over time. PFAs are just like machine learning models, and include any function that might be considered when doing statistical model fitting. The only difference is that instead of fitting a model to data, we tune the function to optimize some metric.

PFAs are simple, but the price of simplicity is tunable parameters, and tuning is hard! The easiest way to do tuning is in a simulator (this is offline), although there are settings where tuning has to be done in the field (online).
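As an illustration of offline tuning, here is a minimal Python sketch that simulates the order-up-to rule and picks (s, S) by grid search. Everything about the simulator is an assumption made up for this example: the uniform demand between 0 and 20, the price, the unit, holding, and fixed ordering costs, and the function names. A plain grid search is only one simple way to tune.

```python
import random


def order_up_to_policy(inventory, s, S):
    """(s, S) rule: if inventory is below s, order enough to reach S."""
    return S - inventory if inventory < s else 0


def simulate(s, S, n_periods=200, seed=0, price=10.0, unit_cost=6.0,
             holding_cost=1.0, fixed_order_cost=20.0):
    """Total profit along one simulated demand path (all numbers are made up)."""
    rng = random.Random(seed)
    inventory, profit = 0, 0.0
    for _ in range(n_periods):
        order = order_up_to_policy(inventory, s, S)
        if order > 0:
            profit -= fixed_order_cost + unit_cost * order
        inventory += order
        demand = rng.randint(0, 20)      # assumed demand model
        sales = min(inventory, demand)   # unmet demand is lost
        inventory -= sales
        profit += price * sales - holding_cost * inventory
    return profit


def tune(candidates, n_paths=20):
    """Grid search: average profit over the same sample paths for each (s, S)."""
    def average_profit(params):
        s, S = params
        return sum(simulate(s, S, seed=k) for k in range(n_paths)) / n_paths
    return max(candidates, key=average_profit)


candidates = [(s, S) for s in range(0, 21, 5) for S in range(5, 41, 5) if s <= S]
best_s, best_S = tune(candidates)
```

Reusing the same seeds for every candidate means each (s, S) pair is judged on identical demand paths, which makes the comparison less noisy. Tuning online, in the field, would replace `simulate` with real observations and is considerably harder.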

Topic 23: Cost function approximations

Posted: January 1, 2021

Cost function approximations (CFAs) are like policy function approximations, with the exception that they require solving some kind of optimization problem. For example, we would like to choose the ad to display that will produce the most clicks, but we don’t know how many clicks each ad will produce. A powerful strategy is to maximize the expected number of clicks (this is known as “exploitation”), but sometimes our estimates are off and we just have to try other choices to learn (known as “exploration”). Other examples of CFAs arise when you add a buffer to the time your navigation system estimates to get to a destination. Airlines use CFAs when they add schedule slack in their optimization models for scheduling aircraft. CFAs are all simplified optimization problems, but have tunable parameters that have to be tuned just like PFAs. But since they are optimization problems, we can solve them using commercial solvers (for linear programs, integer programs, …), which makes it possible to handle high-dimensional problems. Parametric CFAs have been widely overlooked by the research community.
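The ad example can be sketched in Python as follows. This is one possible parametric CFA, not necessarily the form the author has in mind: the policy maximizes the estimated click rate plus a bonus, scaled by a tunable parameter `theta`, that shrinks as an ad is shown more often. The bonus formula, the true click rates, and all function names are assumptions for illustration.

```python
import math
import random


def choose_ad(estimates, counts, theta):
    """Maximize estimated click rate plus theta times an uncertainty bonus."""
    scores = [m + theta / math.sqrt(n + 1) for m, n in zip(estimates, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def simulate_clicks(theta, true_rates, n_rounds=2000, seed=0):
    """Total clicks when the policy runs against made-up true click rates."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_rates)
    counts = [0] * len(true_rates)
    total = 0
    for _ in range(n_rounds):
        ad = choose_ad(estimates, counts, theta)
        click = 1 if rng.random() < true_rates[ad] else 0
        counts[ad] += 1
        estimates[ad] += (click - estimates[ad]) / counts[ad]  # running mean
        total += click
    return total


def tune_theta(thetas, true_rates, n_paths=10):
    """Pick the theta with the most clicks summed over the sample paths."""
    def total_clicks(theta):
        return sum(simulate_clicks(theta, true_rates, seed=k) for k in range(n_paths))
    return max(thetas, key=total_clicks)


true_rates = [0.02, 0.05, 0.04]  # known to the simulator only, not to the policy
best_theta = tune_theta([0.0, 0.1, 0.5, 1.0, 2.0], true_rates)
```

With `theta = 0` the policy is pure exploitation. Larger values push it to try ads whose estimates rest on few observations. Here the optimization is a simple maximum over a handful of ads; in the airline example the same idea would appear as slack parameters inside a much larger optimization model.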
