Distributionally Robust Optimization

We consider another risk-averse decision making approach using distributionally robust optimization (DRO). To this end, we begin with a nominal stochastic optimization problem:

(24)\max_{s \in S} \sum_{\sigma \in \Sigma} q^\sigma f(s,\xi^\sigma).

In our context, the nominal model is a stochastic capital budgeting problem in which we maximize expected NPV, so we may take f(s,\xi^\sigma) = NPV(s,\xi^\sigma). Here, s \in S indicates the constraints that a prioritized solution must satisfy when we prioritize project selection subject to uncertainty in costs and NPV for each project, as well as uncertainty in resource availability. The goal is to prioritize in order to maximize expected NPV under the nominal distribution specified by the probability mass function q^\sigma, \sigma \in \Sigma.

We suppose that \xi is a discrete random vector with finite sample space \Omega, so that \xi^\omega, \omega \in \Omega, enumerates all possible realizations. We further suppose that we only have observations \xi^\sigma, \sigma \in \Sigma \subset \Omega, possibly a strict subset, arising in a data-driven setting. In such a data-driven setting we might have, for example, q^\sigma = 1 / |\Sigma| for all \sigma \in \Sigma.

A DRO variant of this nominal stochastic optimization model is:

(25)\max_{s \in S} \min_{p \in P} \sum_{\omega \in \Omega} p^\omega f(s,\xi^\omega).

Here, we may view this DRO model as playing a “game” against nature. First we select s \in S, and then, knowing s, nature selects a worst-case probability distribution p \in P to minimize the expected NPV, which we seek to maximize. The set P is the distributional uncertainty set (DUS), which we will define more precisely below. For now, it is enough to think of P as a neighborhood of probability distributions centered on the given probability mass function q, with the radius of the neighborhood specified by a parameter \varepsilon.

If \varepsilon = 0, the DRO model reduces to the nominal model.
If \varepsilon is very large, nature will select the single worst-case scenario (e.g., lowest budgets, highest costs, lowest NPVs). This is typically too conservative to be useful.

With moderate values of \varepsilon, we obtain solutions that hedge against deviations from q without being excessively conservative.

We do not interpret nature as malevolent. Rather, the inner minimization \min_{p\in P} acts as a regularizer to avoid over-fitting our solution to one specific estimated distribution, in much the same way that regularizers are used in high-dimensional statistics and machine learning.

Defining a Distributional Uncertainty Set via the Wasserstein Distance

We constrain nature to choose a probability mass function p that is “close” to the nominal (or empirical) distribution q. We define the DUS as:

(26)P = \Bigl\{ p : D(p,q) \le \varepsilon,\; \sum_{\omega \in \Omega} p^\omega = 1,\; p^\omega \ge 0,\; \omega \in \Omega \Bigr\},

where D(p,q) is a distance between nature’s choice p and the nominal distribution q. The radius \varepsilon governs how much we allow p to deviate from q, and thus the degree of conservatism.

There are many ways to measure D(p,q), including the Kolmogorov–Smirnov distance, Kullback–Leibler divergence, chi-squared distances, total variation, and more general \psi-divergences. The Wasserstein distance, based on optimal transport, is particularly useful in DRO. For a distribution with known finite support, the Wasserstein distance D(p,q) between a nominal distribution q^\sigma, \sigma \in \Sigma, and a candidate robust distribution p^\omega, \omega \in \Omega, is given by the optimal value of the following transportation problem:

(27)\begin{aligned} D(q,p) = \min_z \quad & \sum_{\sigma \in \Sigma} \sum_{\omega \in \Omega} d_{\sigma,\omega} z_{\sigma,\omega} \\ \text{s.t.} \quad & \sum_{\omega \in \Omega} z_{\sigma,\omega} = q^\sigma, && \sigma \in \Sigma, \\ & \sum_{\sigma \in \Sigma} z_{\sigma,\omega} = p^\omega, && \omega \in \Omega, \\ & z_{\sigma,\omega} \ge 0, && \sigma \in \Sigma,\; \omega \in \Omega. \end{aligned}

Intuitively, z_{\sigma,\omega} denotes how much probability mass q^\sigma must be transported a distance d_{\sigma,\omega} from \xi^\sigma to \xi^\omega. If \Omega = \Sigma, p^\omega = q^\omega for all \omega, and d_{\omega,\omega} = 0, then D(p,q) = 0.

To fully specify D(p,q), we must define d_{\sigma,\omega} = \text{dist}(\xi^\sigma,\xi^\omega). We may choose \text{dist}(\cdot,\cdot) to be, for example, an \ell_\eta-norm:

(28)d_{\sigma,\omega} = \|\xi^\sigma - \xi^\omega\|_\eta.

With the Wasserstein distance, and a given distribution q, we define:

(29)P = \Bigl\{ p : D(p,q) \le \varepsilon,\; \sum_{\omega \in \Omega} p^\omega = 1,\; p^\omega \ge 0,\; \omega \in \Omega \Bigr\}

for a chosen radius \varepsilon. We think of P as a ball (or neighborhood) of probability distributions centered at q. With \Sigma \subset \Omega:

If \varepsilon = 0, then P collapses to the singleton \{q\}.
Larger \varepsilon creates larger neighborhoods and more conservative robust models.

We can represent P using the following extended-variable constraints:

(30)\begin{aligned} & \sum_{\sigma \in \Sigma} \sum_{\omega \in \Omega} d_{\sigma,\omega} z_{\sigma,\omega} \le \varepsilon, \\ & \sum_{\omega \in \Omega} z_{\sigma,\omega} = q^\sigma, && \sigma \in \Sigma, \\ & \sum_{\sigma \in \Sigma} z_{\sigma,\omega} = p^\omega, && \omega \in \Omega, \\ & z_{\sigma,\omega} \ge 0, && \sigma \in \Sigma,\; \omega \in \Omega. \end{aligned}

Towards a Computationally Tractable Reformulation

Due to the nested \max \min in the DRO model, it is not directly amenable to off-the-shelf solvers. We therefore reformulate it.

For the moment, fix s \in S, so that f(s,\xi^\omega) is a known numerical value for each \omega \in \Omega. Nature’s problem can then be written as:

(31)\begin{aligned} \min_{p,z} \quad & \sum_{\omega \in \Omega} p^\omega f(s,\xi^\omega) \\ \text{s.t.} \quad & \sum_{\sigma \in \Sigma} \sum_{\omega \in \Omega} d_{\sigma,\omega} z_{\sigma,\omega} \le \varepsilon && : [-\gamma], \\ & \sum_{\omega \in \Omega} z_{\sigma,\omega} = q^\sigma, && \sigma \in \Sigma : [\nu^\sigma], \\ & \sum_{\sigma \in \Sigma} z_{\sigma,\omega} = p^\omega, && \omega \in \Omega : [\beta^\omega], \\ & z_{\sigma,\omega} \ge 0, && \sigma \in \Sigma,\; \omega \in \Omega. \end{aligned}

Here \gamma, \nu^\sigma, and \beta^\omega are dual variables. Nature optimizes over z and p to select a worst-case distribution within radius \varepsilon of q.

Taking the dual of this linear DRO program, and substituting \beta^\omega = -f(s,\xi^\omega), we obtain:

(32)\begin{aligned} \max_{\gamma,\nu} \quad & -\gamma \varepsilon + \sum_{\sigma \in \Sigma} \nu^\sigma q^\sigma \\ \text{s.t.} \quad & -\gamma d_{\sigma,\omega} + \nu^\sigma \le f(s,\xi^\omega), && \sigma \in \Sigma,\; \omega \in \Omega, \\ & \gamma \ge 0. \end{aligned}

To build intuition, consider two extremes:

Case \varepsilon = 0. There is no penalty for large \gamma in the objective. As \gamma grows large, the constraints for \sigma \ne \omega become vacuous. For \sigma = \omega, d_{\sigma,\sigma} = 0, so the constraint becomes \nu^\sigma \le f(s,\xi^\sigma). Optimizing yields \sum_\sigma q^\sigma f(s,\xi^\sigma), which is exactly the nominal expected value.
Case \varepsilon \to \infty. To avoid a large penalty -\gamma \varepsilon in the objective, we must have \gamma = 0. The constraint reduces to \nu^\sigma \le f(s,\xi^\omega) for all \omega, so \nu^\sigma = \min_\omega f(s,\xi^\omega). The objective becomes:

(33)\sum_\sigma q^\sigma \min_\omega f(s,\xi^\omega) = \min_\omega f(s,\xi^\omega),

which corresponds to nature placing probability 1 on the worst-case scenario.

Specializing s \in S to the capital budgeting prioritization decisions, and f(s,\xi^\omega) to the NPV under scenario \omega, the DRO variant of the stochastic capital budgeting problem is:

(34)\begin{aligned} \max_{x,y,\gamma,\nu} \quad & -\gamma \varepsilon + \sum_{\sigma \in \Sigma} \nu^\sigma q^\sigma \\ \text{s.t.} \quad & -\gamma d_{\sigma,\omega} + \nu^\sigma \le \sum_{i \in I} \sum_{j \in J_i} a_{ij}^\omega x_{ij}^\omega, && \sigma \in \Sigma,\; \omega \in \Omega, \\ & y_{ii'} + y_{i'i} \ge 1, && i < i',\; i,i' \in I, \\ & \sum_{j=1}^{J_i} x_{ij}^\omega \ge \sum_{j=1}^{J_i} x_{i'j}^\omega + y_{ii'} - 1, && i \ne i',\; i,i' \in I,\; \omega \in \Omega, \\ & \sum_{i \in I} \sum_{j \in J_i} c_{ijkt}^\omega x_{ij}^\omega \le b_{kt}^\omega, && k \in K,\; t \in T,\; \omega \in \Omega, \\ & \sum_{j \in J_i} x_{ij}^\omega \le 1, && i \in I,\; \omega \in \Omega, \\ & \gamma \ge 0. \end{aligned}

LOGOS Settings for DRO Problems

The DRO approach extends the stochastic optimization approach discussed in Section Prioritizing Project Selection to Hedge Against Uncertainty. Both share the same input structure except for the <Settings> block.

In both cases, the user specifies a collection of scenarios via an <Uncertainties> block. The <problem_type> within <Settings> selects the type of DRO problem. Currently available DRO problem types are:

'droskp'
'dromkp'
'dromckp'

The user can use <radius_ambiguity> to control the Wasserstein radius \varepsilon for DRO problems.

Example LOGOS input XML for DRO:

<Settings>
<Logos>
  <solver>cbc</solver>
  <solverOptions>
    <StochSolver>EF</StochSolver>
    <!-- epsilon radius -->
    <radius_ambiguity>0.1</radius_ambiguity>
  </solverOptions>
  <sense>maximize</sense>
  <problem_type>droskp</problem_type>
</Settings>
</Logos>

DRO for Single Knapsack Problem

Model formulation.

(35)\begin{aligned} \max_{x,y} \max_{\gamma,\nu} \quad & -\gamma \varepsilon + \sum_{\sigma \in \Sigma} \nu^\sigma q^\sigma \\ \text{s.t.} \quad & -\gamma d_{\sigma,\omega} + \nu^\sigma \le \sum_{i \in I} a_i^\omega x_i^\omega, && \sigma \in \Sigma,\; \omega \in \Omega, \\ & \sum_{i \in I} c_i^\omega x_i^\omega \le b^\omega, && \omega \in \Omega, \\ & y_{ii'} + y_{i'i} \ge 1, && i < i', \\ & x_i^\omega \ge x_{i'}^\omega + y_{ii'} - 1, && i \ne i', \\ & x_i^\omega, y_{ii'}^\omega \in \{0,1\}, \\ & \gamma \ge 0. \end{aligned}

See Section DRO for Multi-Dimensional Knapsack Problem for an example LOGOS input file. The multi-dimensional knapsack problem is a straightforward extension of the single-dimensional case, and both share <problem_type> 'droskp'.

DRO for Multi-Dimensional Knapsack Problem

Model formulation.

(36)\begin{aligned} \max_{x,y} \max_{\gamma,\nu} \quad & -\gamma \varepsilon + \sum_{\sigma \in \Sigma} \nu^\sigma q^\sigma \\ \text{s.t.} \quad & -\gamma d_{\sigma,\omega} + \nu^\sigma \le \sum_{i \in I} a_i^\omega x_i^\omega, && \sigma \in \Sigma,\; \omega \in \Omega, \\ & \sum_{i \in I} c_{it}^\omega x_i^\omega \le b_t^\omega, && t \in T,\; \omega \in \Omega, \\ & y_{ii'} + y_{i'i} \ge 1, && i < i', \\ & x_i^\omega \ge x_{i'}^\omega + y_{ii'} - 1, && i \ne i', \\ & x_i^\omega, y_{ii'}^\omega \in \{0,1\}, \\ & \gamma \ge 0. \end{aligned}

Example LOGOS input XML:

<Logos>
  <Sets>
    <investments>
      1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16
    </investments>
    <time_periods>
      1,2,3,4,5
    </time_periods>
  </Sets>

  <Parameters>
    <net_present_values index="investments">
      2.315,0.824,22.459,60.589,0.667,5.173,4.003,0.582,0.122,
      -2.870,-0.102,-0.278,-0.322,-3.996,-0.246,-20.155
    </net_present_values>
    <costs index="investments, time_periods">
      0.219,0.257,0.085,0.0,0.0,
      0.0,0.0,0.122,0.103,0.013,
      5.044,1.839,0.0,0.0,0.0,
      6.74,6.134,10.442,0.0,0.0,
      0.425,0.0,0.0,0.0,0.0,
      2.125,2.122,0.0,0.0,0.0,
      2.387,0.19,0.012,2.383,0.192,
      0.0,0.95,0.0,0.0,0.0,
      0.03,0.03,0.688,0.0,0.0,
      0,0.2,0.763,0.739,2.539,
      0.081,0.032,0,0,0,
      0.3,0,0,0,0,
      0.347,0,0,0,0,
      4.025,0.297,0,0,0,
      0.095,0.095,0.095,0,0,
      5.487,5.664,0.5,6.803,6.778
    </costs>
    <available_capitals index="time_periods">
      18,18,18,18,18
    </available_capitals>
  </Parameters>

  <Uncertainties>
    <available_capitals>
      <totalScenarios>10</totalScenarios>
      <probabilities>
        0.012, 0.019, 0.032, 0.052, 0.086,
        0.142, 0.235, 0.188, 0.141, 0.093
      </probabilities>
      <!--
        scenarios is ordered by numberScenarios * parametersIndex,
        the number of scenarios is determined by the length of
        <probabilities>; here: 10 * 5
      -->
      <scenarios>
        11, 11, 11, 11, 11,
        12, 12, 12, 12, 12,
        13, 13, 13, 13, 13,
        14, 14, 14, 14, 14,
        15, 15, 15, 15, 15,
        16, 16, 16, 16, 16,
        17, 17, 17, 17, 17,
        18, 18, 18, 18, 18,
        19, 19, 19, 19, 19,
        20, 20, 20, 20, 20
      </scenarios>
    </available_capitals>
  </Uncertainties>

  <Settings>
    <mandatory>10,11,12,13,14,15,16</mandatory>
    <solver>cbc</solver>
    <solverOptions>
      <StochSolver>EF</StochSolver>
      <!-- epsilon radius -->
      <radius_ambiguity>0.1</radius_ambiguity>
    </solverOptions>
    <sense>maximize</sense>
    <problem_type>droskp</problem_type>
  </Settings>
</Logos>

DRO for Multiple Knapsack Problem

Model formulation.

(37)\begin{aligned} \max_{x,y} \max_{\gamma,\nu} \quad & -\gamma \varepsilon + \sum_{\sigma \in \Sigma} \nu^\sigma q^\sigma \\ \text{s.t.} \quad & -\gamma d_{\sigma,\omega} + \nu^\sigma \le \sum_{i \in I} \sum_{m \in M} a_i^\omega x_{im}^\omega, && \sigma \in \Sigma,\; \omega \in \Omega, \\ & \sum_{i \in I} c_i^\omega x_{im}^\omega \le b_m^\omega, && m \in M,\; \omega \in \Omega, \\ & y_{ii'} + y_{i'i} \ge 1, && i < i', \\ & \sum_{m=1}^M x_{im}^\omega \ge \sum_{m=1}^M x_{i'm}^\omega + y_{ii'} - 1, && i \ne i',\; i,i' \in I,\; \omega \in \Omega, \\ & x_{im}^\omega, y_{ii'}^\omega \in \{0,1\}, \\ & \gamma \ge 0. \end{aligned}

Example LOGOS input XML:

<Logos>
  <Sets>
    <investments>
      1,2,3,4,5,6,7,8,9,10
    </investments>
    <capitals>
      unit_1, unit_2
    </capitals>
  </Sets>

  <Parameters>
    <net_present_values index="investments">
      78, 35, 89, 36, 94, 75, 74, 79, 80, 16
    </net_present_values>
    <costs index="investments">
      18, 9, 23, 20, 59, 61, 70, 75, 76, 30
    </costs>
    <available_capitals index="capitals">
      103, 156
    </available_capitals>
  </Parameters>

  <Uncertainties>
    <available_capitals>
      <totalScenarios>10</totalScenarios>
      <probabilities>
        0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
      </probabilities>
      <scenarios>
        101, 154,
        102, 155,
        103, 156,
        104, 157,
        105, 158,
        106, 159,
        107, 160,
        108, 161,
        109, 162,
        110, 163
      </scenarios>
    </available_capitals>
  </Uncertainties>

  <Settings>
    <solver>cbc</solver>
    <solverOptions>
      <StochSolver>EF</StochSolver>
      <!-- epsilon radius -->
      <radius_ambiguity>0.1</radius_ambiguity>
    </solverOptions>
    <sense>maximize</sense>
    <problem_type>dromkp</problem_type>
  </Settings>
</Logos>

DRO for Multiple-Choice Knapsack Problem

Model formulation.

(38)\begin{aligned} \max_{x,y} \max_{\gamma,\nu} \quad & -\gamma \varepsilon + \sum_{\sigma \in \Sigma} \nu^\sigma q^\sigma \\ \text{s.t.} \quad & -\gamma d_{\sigma,\omega} + \nu^\sigma \le \sum_{i \in I} \sum_{j \in J_i} a_{ij}^\omega x_{ij}^\omega, && \sigma \in \Sigma,\; \omega \in \Omega, \\ & \sum_{i \in I} \sum_{j \in J_i} c_{ijkt}^\omega x_{ij}^\omega \le b_{kt}^\omega, && k \in K,\; t \in T,\; \omega \in \Omega, \\ & y_{ii'} + y_{i'i} \ge 1, && i < i', \\ & \sum_{j=1}^{J_i} x_{ij}^\omega \ge \sum_{j=1}^{J_i} x_{i'j}^\omega + y_{ii'} - 1, && i \ne i',\; i,i' \in I,\; \omega \in \Omega, \\ & x_{ij}^\omega, y_{ii'}^\omega \in \{0,1\}, \\ & \gamma \ge 0. \end{aligned}

Example LOGOS input XML:

<Logos>
  <Sets>
    <investments>
      1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
      11, 12, 13, 14, 15, 16, 17
    </investments>
    <options index="investments">
      1;
      1;
      1;
      1,2,3;
      1,2,3,4;
      1,2,3,4,5,6,7;
      1;
      1;
      1;
      1;
      1;
      1;
      1;
      1;
      1;
      1;
      1
    </options>
  </Sets>

  <Parameters>
    <net_present_values index="options">
      2.046
      2.679
      2.489
      2.61
      2.313
      1.02
      3.013
      2.55
      3.351
      3.423
      3.781
      2.525
      2.169
      2.267
      2.747
      4.309
      6.452
      2.849
      7.945
      2.538
      1.761
      3.002
      3.449
      2.865
      3.999
      2.283
      0.9
      8.608
    </net_present_values>
    <costs index="options">
      36538462
      83849038
      4615385
      2788461538
      2692307692
      5480769231
      1634615385
      2981730768
      7211538462
      9038461538
      649038462
      650000000
      216346154
      212500000
      3076923077
      3942307692
      1144230769
      675721154
      1442307692
      99711538
      4807692
      123076923
      138461538
      86538462
      108653846
      75092404
      6413462
      147932692
    </costs>
    <available_capitals>
      15E9
    </available_capitals>
  </Parameters>

  <Uncertainties>
    <available_capitals>
      <totalScenarios>3</totalScenarios>
      <probabilities>
        0.2,0.6,0.2
      </probabilities>
      <scenarios>
        5E9,10E9,15E9
      </scenarios>
    </available_capitals>
  </Uncertainties>

  <Settings>
    <solver>glpk</solver>
    <solverOptions>
      <StochSolver>EF</StochSolver>
      <!-- epsilon radius -->
      <radius_ambiguity>0.1</radius_ambiguity>
    </solverOptions>
    <sense>maximize</sense>
    <problem_type>dromckp</problem_type>
  </Settings>
</Logos>