Robust and Pareto Optimality of Insurance Contracts

The optimal insurance problem is a fast-growing topic concerned with the most efficient contract that an insurance player may obtain. The classical problem investigates the ideal contract under the assumption that the underlying risk distribution is known, i.e. by ignoring parameter and model risks. Taking these sources of risk into account, the decision-maker aims to identify a robust optimal contract that is not sensitive to the chosen risk distribution. We focus on Value-at-Risk (VaR) and Conditional Value-at-Risk (CVaR)-based decisions, but further extensions to other risk measures are readily possible. The Worst-case scenario and Worst-case regret robust models, which have already been used in the robust optimisation literature on the investment portfolio problem, are discussed in this paper. Closed-form solutions are obtained for the VaR Worst-case scenario case, while Linear Programming (LP) formulations are provided for all other cases. A caveat of robust optimisation is that the optimal solution may not be unique, and therefore it may not be economically acceptable, i.e. Pareto optimal. This issue is numerically addressed and simple numerical methods are found for constructing insurance contracts that are both Pareto and robust optimal. Our numerical illustrations show weak evidence in favour of our robust solutions for VaR-based decisions, while our robust methods are clearly preferred for CVaR-based decisions.


Introduction
Finding the optimal insurance contract has been a topic of interest in the actuarial science and insurance literature for more than 50 years. The seminal papers of Borch (1960) and Arrow (1963) opened this field of research and, since then, many papers have discussed this problem under various assumptions on the risk preferences of the insurance players involved in the contract and on how the cost of insurance (known as the premium) is quantified. Specifically, the optimal contracts in the context of Expected Utility Theory are investigated, amongst others, in Kaluszka (2005), Kaluszka and Okolewski (2008) and Cai and Wei (2012). Extensive research has been conducted for the case in which preferences are expressed via coherent risk measures (as defined in Artzner et al., 1999; recall that CVaR is an element of this class) and VaR; for example, see Cai and Tan (2007), Balbás et al. (2009, 2011), Asimit et al. (2013b), Cheung et al. (2014) and Cai and Weng (2016), among others.
The choice of a risk measure is usually subjective, but VaR and CVaR represent the most known risk measures used in the insurance industry. Solvency II and Swiss Solvency Test are the regulatory regimes for all (re)insurance companies that operate within the European Union and Switzerland, respectively, and their capital requirements are solely based on VaR and CVaR.
For these and other reasons, these standard risk measures have received special attention from academics, practitioners and regulators, and vivid discussions have arisen among them. VaR is criticised for its lack of sub-additivity and it may create regulatory arbitrage in an insurance group (see Asimit et al., 2013a). A detailed discussion on possible regulatory arbitrages in a CVaR-based regime is provided in Koch-Medina and Munari (2016). A desirable property for a risk measure is elicitability, which allows one to compare competing forecasting methods, a property that VaR does have (see Gneiting, 2011). The lack of elicitability for CVaR has been addressed via joint elicitability, a concept formalised in Fissler and Ziegel (2016), but flagged earlier by Acerbi and Szekely (2014). Robustness properties of a risk measure are also of great interest, since they imply that the estimate is insensitive to data contamination. Parameter risk (uncertainty in parameter estimation) and model risk (uncertainty in model selection) are the two main sources of uncertainty in modelling. Robust statistics has its roots in the papers of Huber (1964) and Hampel (1968), though it has been shown to be less appropriate in the context of risk management (see, for example, Cont et al., 2010). A more detailed discussion is given in the next section due to its length. Finally, a summary of all properties exhibited by the two risk measures is detailed in the comprehensive work of Emmer et al. (2015), but the general conclusion is that there is no evidence of a global advantage of one risk measure over the other.
Whenever the model and parameter risks are present, it is prudent to consider insurance contracts that are optimal under a set of plausible models and this is precisely what robust optimisation does. It is a vast area of research with applications in various fields and a standard reference is Ben-Tal et al. (2009), while comprehensive surveys can be found in Ben-Tal and Nemirovski (2008), Bertsimas et al. (2011) and Gabrel et al. (2014).
The aim of the paper is to identify the optimal insurance contract under model/parameter risk in the robust optimisation sense and to understand how robust these solutions are from a practical point of view. That is, we aim to explain how large the uncertainty set should be for relatively small or medium-sized historical data sets, as is expected in insurance practice. At the same time, since the insurance contract is in fact a risk allocation, it is of great interest to find whether or not our robust contracts are Pareto optimal. Robust optimisation may lead to inefficient risk allocations, i.e. ones that are not Pareto optimal, which are clearly not acceptable, and special attention is given to this issue by providing a simple methodology to overcome such caveats of robust optimisation. Our numerical illustrations have shown weak evidence in favour of our robust solutions for VaR-based decisions, which is not surprising due to the erratic behaviour of VaR. On the contrary, CVaR-based decisions are made more robust via robust optimisation than via statistical methods, which can be explained by the fact that CVaR takes into account some part of the tail risk, as opposed to VaR. Either Worst-case scenario or Worst-case regret robust optimisation is preferred (compared to the classical statistical methods) for less (statistically) robust risk measures that are purely tail risk measures, where the estimation is based on a small portion of the sample that explains only the tail risk. We also find that the Worst-case optimisation is once again advantageous even for risk measures that are sensitive to the entire sample, i.e. are not based only on the tail risk.
The structure of the paper is as follows. The next section contains the necessary background and the mathematical formulation of our problems. Sections 3 and 4 investigate the VaR and CVaR-based optimal insurance contracts, and also discuss simple extensions for distortion risk measures when the moral hazard is removed. These robust solutions are further refined in Section 5 so that they become Pareto optimal as well. Extensive numerical examples are elaborated in Section 6, which help in justifying our conclusions summarised in Section 7.

Background and Problem Definition
2.1. Optimal insurance. An insurance contract represents a risk transfer between two parties, the insurance buyer (or simply buyer) and the insurance seller (or simply seller). When the buyer is itself an insurance company, the transfer becomes a reinsurance contract and the seller is called a reinsurer. Let X ≥ 0 be the total amount that the buyer is liable to pay in the absence of any risk transfer. The seller agrees to pay R[X], the amount by which the entire loss exceeds the buyer's retained amount I[X], and clearly I[X] + R[X] = X. The most common risk transfers are the Proportional and Stop-loss contracts, for which I[X] = cX (with 0 ≤ c ≤ 1) and I[X] = min{X, M}, respectively. In order to avoid moral hazard issues (both players should be incentivised to reduce the overall risk, i.e. I and R should be non-decreasing functions), the feasibility set is often restricted to the comonotone risk transfers

C_co := { R : 0 ≤ R[x] ≤ x, R[x] and x − R[x] are non-decreasing in x }.

The comonotone risk transfers (as defined above) are omnipresent in practice, but this is not always the case, and the mathematical formulation of the feasibility set then becomes

C := { R : 0 ≤ R[x] ≤ x }.

Let P be the insurance premium, and further assume that any feasible contract satisfies 0 ≤ P ≤ P̄, where P̄ represents the maximal amount of premium that the buyer would accept to pay. If the loss distribution is known, then premium calculations are possible via certain rules, known as premium principles; a concise review of premium principles can be found in Young (2004). Specifically, if P is the probability measure for X, then P ≥ ω_0 + (1 + θ)H_P(R[X]), where ω_0 ≥ 0 represents some fixed/administrative costs, θ ≥ 0 is the risk loading parameter/factor, and H is a monotone functional on the space of non-negative random variables that depends on the seller's choice of premium principle. The monotonicity property is of practical importance and means that if two random losses satisfy Y ≤ Z, then H_P(Y) ≤ H_P(Z).
A commonly encountered premium principle is the distortion premium principle (see Wang et al., 1997),

H_P(Y) = ∫_0^∞ g(P(Y > y)) dy,      (2.1)

for any non-negative loss random variable Y, where g : [0, 1] → [0, 1] is non-decreasing with g(0) = 0 and g(1) = 1, known as a distortion function. When the distortion function is taken to be the identity function, we obtain the expected value premium principle, which is standard in the insurance industry. The mathematical formulation of the optimal insurance problem becomes

min_{(R,P)∈C×[0,P̄]} ρ_P(X − R[X] + P)  s.t.  ω_0 + (1 + θ)H_P(R[X]) ≤ P,      (2.2)

where ρ_P is a risk measure chosen by the buyer to order its preferences towards risk. As explained in Section 1, it is first assumed in this paper that ρ_P ∈ {VaR, CVaR}. Recall that the subscript P indicates the probability measure under which the risk measurement is made. The VaR of a loss variable Y at a confidence level α ∈ (0, 1) is given by

VaR_α(Y; P) = inf{ y ∈ ℝ : P(Y ≤ y) ≥ α }.

Note that VaR_α is representable as in (2.1) with g(t) = 1_{{t > 1−α}}, where 1_A represents the indicator operator that assigns the value 1 if A is true and 0 otherwise. The CVaR risk measure is defined in Rockafellar and Uryasev (2000) as

CVaR_α(Y; P) = inf_{t∈ℝ} { t + E_P[(Y − t)_+]/(1 − α) }.

Alternative representations are known in the literature (see, for example, Acerbi and Tasche, 2002) and one of them is as in (2.1) with g(t) = (t/(1 − α)) ∧ 1. Due to the monotonicity property of VaR, CVaR and the functional H, (2.2) becomes much simpler when removing the economic constraint P ≤ P̄, and it has been investigated under various sets of assumptions. Recently, Cheung and Lo (2015) included the latter constraint and analytically solved (2.2) for a large class of premium principles and risk measures, including the class from (2.1).
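Since both risk measures recur throughout the paper, a small self-contained Python sketch may help fix ideas; it estimates VaR as an empirical order statistic and CVaR via the Rockafellar-Uryasev representation above (the sample of losses is illustrative):

```python
import numpy as np

def var(sample, alpha):
    """Empirical VaR_alpha: smallest y with P(Y <= y) >= alpha."""
    ys = np.sort(np.asarray(sample, dtype=float))
    idx = int(np.ceil(alpha * len(ys))) - 1   # 1-based order statistic
    return ys[max(idx, 0)]

def cvar(sample, alpha):
    """Empirical CVaR_alpha via CVaR = min_t { t + E[(Y - t)_+]/(1 - alpha) };
    the empirical alpha-quantile is a minimiser of the objective."""
    ys = np.asarray(sample, dtype=float)
    t = var(sample, alpha)
    return t + np.mean(np.maximum(ys - t, 0.0)) / (1.0 - alpha)

losses = np.array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
print(var(losses, 0.9))    # -> 8.0
print(cvar(losses, 0.9))   # -> 9.0
```

For continuous distributions, CVaR computed this way coincides with the average loss beyond VaR, which is the interpretation used throughout the paper.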
The existing literature assumes that the loss distribution is known with certainty, and as a result, the parameter and model risks are removed. Small and medium-sized samples (present in non-high-frequency data, as is usually the case in insurance) raise many questions when estimating any parameter, even if the model risk is completely removed, i.e. the chosen model is correct. Large samples are more concerned with model risk, which can be reduced if the model is carefully selected. Thus, if we know what we need to estimate — for example, the optimal objective function value from (2.2), for which a closed-form solution is required — the elicitability (see Gneiting, 2011) of this functional (induced by the optimal objective function value) is the next step in order to compare various models and reduce the model risk. While VaR is elicitable, and VaR and CVaR are jointly elicitable, our functional may not be elicitable, or it may be impossible to assess the presence of this property, since one has to find a scoring function to measure the estimation error under the plausible models. Therefore, model selection for VaR/CVaR does not apply to our problem, even though these two simple risk measures are well accepted as "good" risk measures. Now, even if we can select the "best" possible model that reduces the uncertainty in the optimal objective function value, we in fact solve a secondary problem, since the main purpose of this exercise is to obtain a robust decision (with respect to the insurance contract, i.e. R).
Therefore, it would be interesting to identify a more robust optimal insurance contract that takes into account the parameter and/or model error. Thus, we assume that the reference probability measure P is unknown and could be one of the m possible probability measures {P_1, P_2, ..., P_m}. Consequently, the premium feasibility constraint becomes

P ≥ ω_0 + (1 + θ) max_{k∈M} H_{P_k}(R[X]).      (2.3)

A prudent and hopefully robust decision is obtained when investigating the worst-case scenario optimisation problem

min_{(R,P)} max_{k∈M} ρ_{P_k}(X − R[X] + P).      (2.4)

An alternative prudent decision can be achieved via the worst-case regret optimisation problem

min_{(R,P)} max_{k∈M} { ρ_{P_k}(X − R[X] + P) − ρ*_k },      (2.5)

where the buyer's "regret" is measured with respect to the m benchmark values ρ*_k. Naturally, these values are the optimal objective values for the individual models and are variants of (2.2). Specifically,

ρ*_k = min_{(R,P)} { ρ_{P_k}(X − R[X] + P) : ω_0 + (1 + θ)H_{P_k}(R[X]) ≤ P ≤ P̄ }.      (2.6)

These robust representations have been seen before in various guises. The worst-case type decisions were axiomatically investigated by Gilboa and Schmeidler (1989) in the expected utility context. Not surprisingly, robust optimisation within Portfolio Theory has its counterpart; among others, see El Ghaoui et al. (2003), Fukushima (2009), Polak et al. (2010), Zymler et al. (2013) and Kakouris and Rustem (2014). The worst-case and worst-case regret CVaR-based decisions in portfolio optimisation are discussed in Huang et al. (2010). To the best of our knowledge, the optimal insurance contract problem under parameter/model uncertainty has been investigated only by Balbás et al. (2015), where only the worst-case is investigated for a large class of risk measures that includes CVaR, but not VaR, and a particular choice of the uncertainty set of probability measures.
We now discuss the choice of feasibility set, i.e. C or C_co. Note that whenever the risk transfer is made between two large insurance companies, moral hazard may not be an issue, due to the presence of rating agencies; a rating downgrade has a huge negative commercial impact for such insurance companies and thus, moral hazard is less likely to occur. One may also argue that a risk transfer within an insurance group does not necessarily have to exclude moral hazard, due to the common ownership of the buyer and seller. Nevertheless, the insurance regulator requires the insurance buyer to justify the commercial purpose of such a risk transfer. In the absence of distributional uncertainty, there is a huge literature that discusses whether or not the indemnity of an insurance contract should be comonotone, but in general, the conclusion depends on the nature of the underwritten risk. On the other hand, the classical Pareto optimality problem explains the shape of an "optimal" contract and the extensive existing literature discusses how viable the comonotonicity property is; an interesting discussion appears in Huberman et al. (1983). Optimal transfers are shown to be comonotone (for a large class of risk preferences) in Landsberger and Meilijson (1994) if the total risk is finite, while Ludkovski and Rüschendorf (2008) extend these results to unbounded risks. In summary, choosing between the sets of feasible contracts given by C or C_co is related to the specific nature of the total risk that is shared and the insurance players' risk preferences, whenever the total risk distribution is known.
In the presence of distributional uncertainty, the choice of feasibility set is likewise sensitive to the nature of the total risk. Therefore, solutions to Problems 2.4-2.6 are given over the non-comonotone contract set C whenever possible; otherwise, the comonotone contract set C_co is chosen. Recall that we do not intend to characterise the optimal contract, but instead we examine when our proposed robust methods reduce the effect of distributional uncertainty.
Note that the feasible sets of Problems 2.4-2.6 are empty if ω_0 > P̄. We now gather the set of assumptions, stated as Assumption 2.1, under which the results of the paper hold.
Assumption 2.1. We consider m possible probability models {P_1, ..., P_m}, and the reference probability model P may or may not belong to this set. Denote M = {1, ..., m}. Let X ≥ 0 be a loss random variable and denote by F_k(·) = P_k(X ≤ ·), k ∈ M, its cumulative distribution function (cdf) under P_k, in which case we write X ∼ P_k, and by F̄_k(·) = 1 − F_k(·) its corresponding survival function. Moreover, ω_0 ≤ P̄. The premium principle is based on a monotone functional H.

2.2. Robustness of risk measures.
In the last few years there has been a wide and open debate on the robustness properties of VaR and CVaR, with relevant contributions from regulators, practitioners and academics. These risk measures, which we denote for brevity by ρ, depend on the probability model P used. The key question is whether a small perturbation of the probability model P results in a small perturbation of ρ_P, which is made precise in the next definition.
Definition 2.1. Let X_n, n ≥ 1, be a sequence of random variables with distributions P_n, n ≥ 1, and let X be a random variable with distribution P. A risk measure ρ is (statistically) robust at X if ρ_{P_n}(X_n) → ρ_P(X) whenever P_n converges to P, i.e. if ρ is continuous with respect to the ϕ-weak topology. We refer the interested reader to Emmer et al. (2015) for a brief summary of the topic. Statistical robustness is particularly relevant when the probability measure is estimated from available data; indeed, if the estimated probability measure P_n is sufficiently close to the real one (that is, d(P_n, P) → 0) and the risk measure is robust, then ρ_{P_n} can be considered a good approximation of ρ_P.
Due to data scarcity, as is often the case in practice, estimates based on the empirical measure exhibit weak statistical evidence, and alternative methods need to be considered.
For example, a more conservative approach is to consider a robustified risk measure ρ̃ defined as follows:

ρ̃(X) := sup_{P∈S} ρ_P(X),

where ρ_P(X) represents the risk measure for the random loss X with probability distribution P and S is a set of candidate models. This approach is not new in the literature; it is at the basis of decision making under ambiguity (that is, when there is uncertainty about the probability distribution). The simple idea behind this approach is that when there is ambiguity between different models, a conservative and therefore robust approach is to select the one that represents the worst scenario. In Assumption 2.1, we assume that the real probability model P may not belong to the set S; indeed, since P is unknown, we cannot guarantee that it belongs to the set of models considered. Note that taking the supremum over a set of models reduces the impact of model risk, but it cannot eliminate it completely.
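A minimal numerical sketch of the robustified measure: with a finite candidate set and a discrete loss (all values below are illustrative assumptions, not data from the paper), ρ̃ is simply the largest of the model-wise CVaRs:

```python
import numpy as np

def discrete_cvar(x, p, alpha):
    """CVaR_alpha of a discrete loss with support x and probabilities p."""
    order = np.argsort(x)
    x, p = np.asarray(x, float)[order], np.asarray(p, float)[order]
    cum = np.cumsum(p)
    i = np.searchsorted(cum, alpha)          # index of the alpha-quantile
    t = x[i]                                 # a minimiser of the R-U objective
    return t + np.dot(p, np.maximum(x - t, 0.0)) / (1.0 - alpha)

# Two toy candidate models on a common support: rho_tilde = worst model-wise CVaR.
x = [0.0, 10.0, 100.0]
models = [np.array([0.90, 0.08, 0.02]),
          np.array([0.85, 0.10, 0.05])]
rho_tilde = max(discrete_cvar(x, p, alpha=0.95) for p in models)
print(rho_tilde)   # -> 100.0 (the second model drives the worst case)
```

Here the second model places mass 0.05 on the largest loss, so its CVaR at the 95% level equals that loss and dominates the first model's CVaR.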
The specification of the set S plays a crucial role in the worst-case approach and, in general, is a difficult task. Clearly, selecting a wide set increases the chances of including the real model; taking S to be the set of all mixtures λ_1 P_1 + · · · + λ_m P_m of the candidate models is precisely what Zhu and Fukushima (2009) consider when ρ ≡ CVaR. It is immediate that

max_{k∈M} ρ_{P_k}(X) ≤ sup_{P∈S} ρ_P(X)      (2.9)

holds for any risk measure. Proposition 2.1 shows that the two "worst-case" definitions are identical if ρ ≡ VaR, and it is followed by an example showing that the above inequality may hold strictly if ρ ≡ CVaR.
Proposition 2.1. Let {P_1, ..., P_m} be a set of candidate models and let S be the set of all their mixtures. Then,

max_{k∈M} VaR_α(X; P_k) = sup_{P∈S} VaR_α(X; P).

Proof. Without loss of generality, we may assume that m = 2. It is well-known that VaR has convex level sets, i.e. if two probability models P_1, P_2 are such that VaR_α(X; P_1) = VaR_α(X; P_2), then VaR_α(X; λP_1 + (1 − λ)P_2) = VaR_α(X; P_1) for any λ ∈ [0, 1] (see Gneiting, 2011). Further, VaR is monotone and translation invariant (see properties (a) and (b) from Section 2.3) and therefore, we can apply Lemma 2.2 in Bellini and Bignozzi (2015) to obtain that VaR is quasi-linear. That is,

min{ VaR_α(X; P_1), VaR_α(X; P_2) } ≤ VaR_α(X; λP_1 + (1 − λ)P_2) ≤ max{ VaR_α(X; P_1), VaR_α(X; P_2) } for any λ ∈ [0, 1],

which in turn implies that sup_{P∈S} VaR_α(X; P) ≤ max_{k∈M} VaR_α(X; P_k). The latter and (2.9) conclude the proof.
As anticipated, the same result does not hold for CVaR. Indeed, the following example shows that max_{k∈M} CVaR_α(X; P_k) < sup_{P∈S} CVaR_α(X; P) may hold.
Proof. Let X and Y be comonotone and assume ρ is comonotonic additive; then

ρ̃(X + Y) = sup_{k∈M} ρ_{P_k}(X + Y) = sup_{k∈M} { ρ_{P_k}(X) + ρ_{P_k}(Y) } ≤ ρ̃(X) + ρ̃(Y).

If ρ is comonotonic subadditive, then it is sufficient to replace the second equality from above with a less-than-or-equal-to inequality.
Relaxing the assumption of properties (f)-(g) is rather common in a model uncertainty setting, where no pre-specified reference probability measure is available (for example, see Song and Yan, 2009). The next example illustrates that the robustified measure CVaR̃_α(X) := sup_{k∈M} CVaR_α(X; P_k) may be strictly comonotonic subadditive, i.e. CVaR̃_α(X + Y) < CVaR̃_α(X) + CVaR̃_α(Y) may hold for comonotone losses X and Y.
Example 2.2. Consider a discrete random variable X which takes only three values, i.e.

VaR Robust Optimisation
In this section, we solve the worst-case scenario optimisation problem (2.4) and the worst-case regret optimisation problem (2.5) under the C_co × [0, P̄] feasibility set, when the risk measure is ρ ≡ VaR_α. Since VaR is comonotonic additive and translation invariant,

VaR_α(I[X] + P; P_k) = I[VaR_α(X; P_k)] + P

for any I ∈ C_co. For brevity, we denote a_k = VaR_α(X; P_k) and a* = max_{k∈M} a_k.
3.1. Worst-case scenario VaR optimisation problem. We first observe that the objective function is increasing and continuous in P and that any feasible premium P is bounded below by

P*_R := ω_0 + (1 + θ) max_{k∈M} H_{P_k}(R[X])

for any fixed R ∈ C_co. The latter leads us to define the following subset of C_co, which essentially puts an upper bound on the set of feasible contracts:

C′ := { R ∈ C_co : P*_R ≤ P̄ }.      (3.1)

Eq. (3.1) helps in justifying the next lemma.
Lemma 3.1. If Assumption 2.1 holds with ρ ≡ VaR_α, then any contract R ∈ C_co is feasible for Problem 2.4 with feasibility set C_co × [0, P̄] if and only if R ∈ C′. Further, for any fixed R ∈ C′, the optimal premium is given by P*_R, and the optimisation problem from (2.4) is equivalent to

min_{ξ∈[0,a*]} min_{R∈C_ξ} { a* − ξ + P*_R },  where C_ξ := { R ∈ C′ : R[a*] = ξ }.      (3.3)

Due to the presence of the premium constraint P*_R ≤ P̄, the set C_ξ could be empty if ξ is too large. The next result explains the effective range of ξ for the outer minimisation of Problem 3.3.
The proof relies on the simple observation that the insurance layer contract

R*_ξ[x] := min{ (x − (a* − ξ))_+, ξ }

belongs to C_co with R*_ξ[a*] = ξ, and the fact that this contract is minimal in the following sense: R[X] ≥ R*_ξ[X] for any R ∈ C_co with R[a*] = ξ.

Lemma 3.2. If Assumption 2.1 holds with ρ ≡ VaR_α, then for any ξ ∈ [0, a*], the set C_ξ is non-empty if and only if ξ ≤ ∆, where ∆ := max{ ξ ∈ [0, a*] : P*_{R*_ξ} ≤ P̄ }.

Proof. If 0 ≤ ξ ≤ ∆, the contract R*_ξ belongs to C_ξ by construction. To prove the converse, suppose that there exists a contract R ∈ C_ξ with ξ > ∆. Since R[X] ≥ R*_ξ[X], the monotonicity of H gives P*_R ≥ P*_{R*_ξ} > P̄, which contradicts the definition of C′.
Recall that the premium principle H is monotone. By Lemma 3.2 and the minimality of the insurance layer contract R*_ξ in the set C_ξ, the inner minimisation of Problem 3.3 is solved by the contract R*_ξ whenever 0 ≤ ξ ≤ ∆. Therefore, it remains to obtain the optimal value of ξ in the outer minimisation, which is essentially a one-dimensional problem. We summarise our findings for the worst-case scenario VaR optimisation problem in the next theorem.
Theorem 3.1. If Assumption 2.1 holds with ρ ≡ VaR_α, then a solution (R*, P*) of Problem 2.4, assumed to be solved over the set C_co × [0, P̄], is given by

R* = R*_{ξ*},  P* = P*_{R*_{ξ*}},  where ξ* ∈ argmin_{ξ∈[0,∆]} { a* − ξ + P*_{R*_ξ} }.      (3.4)

Moreover, the optimal objective value is a* − ξ* + P*.
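Theorem 3.1 reduces the worst-case VaR problem to a one-dimensional search over the width ξ of an insurance layer ending at a*. The following hedged sketch implements this reduction under the expected-value premium principle (g the identity); the two simulated models, ω_0, θ and the premium budget are illustrative assumptions, and a grid search stands in for the directional-derivative analysis:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two candidate models, each represented by a Monte Carlo sample (assumption).
samples = [rng.exponential(1.0, 5000), rng.exponential(1.3, 5000)]
alpha, omega0, theta, Pbar = 0.95, 0.1, 0.2, 5.0

a = [np.quantile(s, alpha) for s in samples]   # a_k = VaR_alpha(X; P_k)
a_star = max(a)

def layer(s, lo, hi):
    """Indemnity of the layer contract over [lo, hi] applied to a sample."""
    return np.clip(s - lo, 0.0, hi - lo)

def premium(xi):
    """Worst-case expected-value premium of the layer [a* - xi, a*]."""
    return omega0 + (1 + theta) * max(np.mean(layer(s, a_star - xi, a_star))
                                      for s in samples)

xis = np.linspace(0.0, a_star, 400)
feasible = [xi for xi in xis if premium(xi) <= Pbar]   # the range [0, Delta]
G = [a_star - xi + premium(xi) for xi in feasible]     # worst-case VaR objective
xi_opt = feasible[int(np.argmin(G))]
print(xi_opt, min(G))
```

With a genuine distortion premium, premium(xi) would instead integrate g over the layer's survival probabilities; the one-dimensional search is unchanged.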
Remark 3.1. The solution of (3.4) is unique as long as g and all F i are strictly increasing functions.
In the rest of this section, we demonstrate how Problem 3.4 can be solved rather explicitly whenever H is a distortion premium principle as given in (2.1). Note that g is non-decreasing, and thus H satisfies the comonotonic additivity property (for details, see Dhaene et al., 2012). Consequently,

H_{P_k}(R*_ξ[X]) = ∫_{a*−ξ}^{a*} g(F̄_k(t)) dt =: g_k(ξ).

Also, the above function is convex in ξ ∈ [0, a*] for any k ∈ M, since g is non-decreasing. Thus,

G(ξ) := a* − ξ + ω_0 + (1 + θ) max_{k∈M} g_k(ξ)

is convex in ξ ∈ [0, ∆]. Therefore, Problem 3.4 can be solved by finding the directional derivatives of G. To this end, we define the directional derivative of an arbitrary convex function H at ξ along the direction d ∈ ℝ, if it exists, as

H′(ξ; d) := lim_{s↓0} ( H(ξ + sd) − H(ξ) ) / s.

Lemma 3.3. Assume that H satisfies (2.1) and denote g_k(ξ) := ∫_{a*−ξ}^{a*} g(F̄_k(t)) dt for any 0 ≤ ξ ≤ ∆. The directional derivative of G at ξ along the direction d ∈ ℝ is given by

G′(ξ; d) = −d + (1 + θ) max_{k∈K(ξ)} g_k′(ξ; d),  where K(ξ) := { k ∈ M : g_k(ξ) = max_{j∈M} g_j(ξ) }.

Proof. The right-hand and left-hand derivatives of g_k at ξ are given by g_k′(ξ; 1) = g(F̄_k((a* − ξ)−)) and −g_k′(ξ; −1) = g(F̄_k(a* − ξ)), respectively. Therefore, the directional derivative of g_k at ξ ∈ [0, ∆] along the direction d ∈ ℝ equals d g(F̄_k((a* − ξ)−)) if d > 0, and d g(F̄_k(a* − ξ)) if d < 0. Our claim follows from the classical Danskin's Theorem (see, for example, Corollary 1.30 of Güler, 2010), which asserts that the directional derivative of a pointwise maximum of convex functions is attained over the set of active indices. The proof is now complete.
3.2. Worst-case regret VaR optimisation problem. We turn our attention to the worst-case regret VaR optimisation from (2.5). Since we are no longer able to use a similar argumentation as in the previous subsection, the usual approach in the existing literature is to assume a discretely distributed X. That is, X takes values in {x_1, ..., x_n}, where without loss of generality it can be assumed that x_1 ≤ · · · ≤ x_n. Let us denote p_ik = P_k(X = x_i) and p_k = (p_1k, ..., p_nk)^T. Clearly, p_k ≥ 0 and 1^T p_k = 1 for all k ∈ M, where 0 and 1 are the n-dimensional column vectors of zeroes and ones, respectively.
By convention, inequalities and equalities between vectors are understood componentwise.
Denote R[x_i] = y_i. If R ∈ C_co, then we should have

0 ≤ y_i − y_{i−1} ≤ x_i − x_{i−1},  1 ≤ i ≤ n,

where by convention y_0 = x_0 = 0. The above can be rewritten as 0 ≤ y ≤ x and 0 ≤ Ay ≤ Ax, with the (n − 1) × n matrix A given by A_{ij} = −1 if j = i, A_{ij} = 1 if j = i + 1, and A_{ij} = 0 otherwise. Since x − y is increasingly ordered, we then have

VaR_α(X − R[X]; P_k) = x_{π(k)} − y_{π(k)},  where π(k) := min{ i : p_{1k} + · · · + p_{ik} ≥ α }.

In order to make our optimisation problems tractable, we assume that H satisfies (2.1), and thus H is comonotonic additive as a result of Dhaene et al. (2012). Specifically, if g is a left-continuous function, then

H_{P_k}(R[X]) = Σ_{i=1}^n (y_i − y_{i−1}) g(p_{ik} + · · · + p_{nk}).      (3.5)

Consequently, Problem 2.5 is an LP, and we state this result as Proposition 3.1.
Proposition 3.1. Let Assumption 2.1 hold with ρ ≡ VaR_α. If X is a discrete random variable that takes the values x_1, ..., x_n such that x_1 ≤ ... ≤ x_n and H satisfies (2.1), then solving Problem 2.5 over the set C_co × [0, P̄] is equivalent to solving

min r
s.t. x_{π(k)} − y_{π(k)} + P − ρ*_k ≤ r, k ∈ M,
     ω_0 + (1 + θ) Σ_{i=1}^n (y_i − y_{i−1}) g(p_{ik} + · · · + p_{nk}) ≤ P ≤ P̄, k ∈ M,
     0 ≤ y ≤ x, 0 ≤ Ay ≤ Ax,

where ρ*_k is the optimal objective value of Problem 2.6. That is, ρ*_k is obtained by solving the above problem for the single model P_k with the regret term removed.

Remark 3.2. Keeping the same set of assumptions as given in Proposition 3.1, solving Problem 2.4 over the set C_co × [0, P̄] is equivalent to solving the same LP with the first set of constraints replaced by x_{π(k)} − y_{π(k)} + P ≤ r, k ∈ M.

Remark 3.3. Due to relation (3.5), a variant of the LP reformulations from Proposition 3.1 and Remark 3.2 can be written for any case in which the risk measure ρ is a distortion risk measure.
The key assumptions are that R ∈ C_co and the fact that distortion risk measures are comonotonic additive; thus, Problems 2.4-2.6 can be reformulated as LPs for any comonotonic additive risk measure ρ. For example, any risk measure that satisfies (2.1) is comonotonic additive (see Dhaene et al., 2012). Note that y and x − y are increasingly ordered (as x is increasingly ordered), which makes the optimisation problems under the set C_co × [0, P̄] tractable. The lack of ordering can be overcome only if ρ ≡ CVaR_α, as can be seen in Section 4, where the comonotonicity assumption is removed. Finally, if the cost of insurance follows a different premium calculation, i.e. does not satisfy (2.1), then the corresponding constraints may not be linear, but they are Second-Order Cone Programming (SOCP) representable for all well-known premium calculations (for details, see Asimit et al., 2017), in which case we only require I ∈ C_co, i.e. x − y increasingly ordered, in order to preserve the linearity of the objective functions. Thus, if H does not satisfy (2.1), the optimisation problems are of SOCP type.
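As a hedged sketch of the worst-case VaR LP in Remark 3.2, the following Python code assembles and solves the problem with SciPy's linprog; the loss support x, the two candidate models p, and the economic parameters are toy assumptions, and the expected-value premium principle (H_{P_k} = E_{P_k}) replaces the general distortion principle:

```python
import numpy as np
from scipy.optimize import linprog

x = np.array([0.0, 1.0, 2.0, 4.0, 8.0])
p = np.array([[0.5, 0.25, 0.125, 0.0625, 0.0625],   # model P_1
              [0.25, 0.25, 0.25, 0.125, 0.125]])    # model P_2
n, m = x.size, p.shape[0]
alpha, omega0, theta, Pbar = 0.8, 0.0, 0.1, 2.0
pi = [int(np.searchsorted(np.cumsum(pk), alpha)) for pk in p]  # VaR indices

c = np.zeros(n + 2); c[-1] = 1.0           # decision vector (y, P, r): minimise r
A_ub, b_ub = [], []
for k in range(m):
    row = np.zeros(n + 2)                  # x_pi(k) - y_pi(k) + P <= r
    row[pi[k]], row[n], row[n + 1] = -1.0, 1.0, -1.0
    A_ub.append(row); b_ub.append(-x[pi[k]])
    prem = np.zeros(n + 2)                 # omega0 + (1+theta) E_k[y] <= P, all k
    prem[:n], prem[n] = (1 + theta) * p[k], -1.0
    A_ub.append(prem); b_ub.append(-omega0)
D = np.diff(np.eye(n), axis=0)             # rows give increments y_{i+1} - y_i
for row, rhs in zip(np.vstack([-D, D]),
                    np.concatenate([np.zeros(n - 1), D @ x])):
    A_ub.append(np.concatenate([row, [0.0, 0.0]])); b_ub.append(rhs)

bounds = [(0.0, v) for v in x] + [(0.0, Pbar), (None, None)]
res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=bounds)
print(res.status, res.fun)                 # status 0 = optimal
```

The increment constraints 0 ≤ Dy ≤ Dx enforce comonotonicity exactly as in the 0 ≤ Ay ≤ Ax reformulation; for the regret problem one would subtract the benchmark ρ*_k inside the first constraint set.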

CVaR Robust Optimisation
The current section provides numerical solutions to the CVaR-type Problems 2.4 and 2.5 under similar assumptions to the ones made in Section 3.2. The crucial change is that the set of feasible solutions, namely C × [0, P̄], is larger and moral hazard is permitted. Recall that if moral hazard were excluded, then the optimisation problems could have been solved as in Section 3.2. Problem 2.4 now becomes

min_{(R,P)∈C×[0,P̄]} max_{k∈M} CVaR_α(X − R[X] + P; P_k)  s.t.  ω_0 + (1 + θ) max_{k∈M} E_{P_k}[R[X]] ≤ P,      (4.1)

where, without loss of generality, H_{P_k} = E_{P_k} with k ∈ M could be assumed (see Remark 3.3).
In addition, Problem 2.5 is given by

min_{(R,P)∈C×[0,P̄]} max_{k∈M} { CVaR_α(X − R[X] + P; P_k) − ρ*_k }  s.t.  ω_0 + (1 + θ) max_{k∈M} E_{P_k}[R[X]] ≤ P.      (4.2)

Recall that ρ*_k represents the optimal value of the objective function from Problem 2.6 with ρ ≡ CVaR_α and is given by

ρ*_k = min_{(R,P)∈C×[0,P̄]} { CVaR_α(X − R[X] + P; P_k) : ω_0 + (1 + θ) E_{P_k}[R[X]] ≤ P }.      (4.3)

In the remaining part of this section, we show that Problems 4.1-4.3 can be reduced to LP reformulations. The next theorem deals with Problem 4.1.
Theorem 4.1. Let Assumption 2.1 hold with ρ ≡ CVaR_α. If X is a discrete random variable that takes the values x_1, ..., x_n and H_{P_k} = E_{P_k} for all k ∈ M, then solving Problem 4.1 over the set C × [0, P̄] is equivalent to the LP

min z
s.t. t_k + (1/(1 − α)) Σ_{i=1}^n p_ik ξ_ik + P ≤ z, k ∈ M,
     ξ_ik ≥ x_i − y_i − t_k, ξ_ik ≥ 0, 1 ≤ i ≤ n, k ∈ M,      (4.4)
     ω_0 + (1 + θ) Σ_{i=1}^n p_ik y_i ≤ P ≤ P̄, k ∈ M,
     0 ≤ y ≤ x.

Proof. Let ξ_k = (ξ_1k, ..., ξ_nk)^T for all k ∈ M. Then, by the Rockafellar and Uryasev (2000) representation of CVaR, (4.1) may be equivalently formulated as (4.5). Note that the objective function in (4.5) is increasing in ξ_ik for all 1 ≤ i ≤ n and k ∈ M. Thus, the first two constraints of the latter optimisation problem ensure that ξ_k = (x − y − 1 t_k)_+ for k ∈ M, and (4.5) can be rewritten as (4.6). We now show that (y*, ξ*, P*, z*) solves Problem 4.6 if and only if (t*, y*, ξ*, P*, z*) solves Problem 4.4. Suppose that (y*, ξ*, P*, z*) solves Problem 4.6, which implies that (t*, y*, ξ*, P*, z*) is a feasible solution to Problem 4.4. If (t*, y*, ξ*, P*, z*) does not solve Problem 4.4, then there exists a feasible solution (t′, y′, ξ′, P′, z′) such that z′ < z*. Now, for all k ∈ M, the constraints of Problem 4.4 guarantee that (y′, ξ′, P′, z′) is feasible for Problem 4.6, which contradicts that (y*, ξ*, P*, z*) is an optimal solution to Problem 4.6.
Conversely, suppose that (t′, y′, ξ′, P′, z′) solves Problem 4.4. Equation (4.8) implies that (y′, ξ′, P′, z′) is feasible for Problem 4.6. If (y′, ξ′, P′, z′) does not solve Problem 4.6, then there exists a feasible solution (y*, ξ*, P*, z*) such that z* < z′. Then, (t*, y*, ξ*, P*, z*), with t* defined as in (4.7), is feasible for Problem 4.4 and attains the smaller objective value z* < z′. The latter contradicts our initial assumption that (t′, y′, ξ′, P′, z′) is an optimal solution to Problem 4.4. The proof is now complete.
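The LP of Theorem 4.1 can be assembled mechanically. The following hedged sketch uses SciPy's linprog on toy data (the support x, the two models p, ω_0, θ and P̄ are assumptions); the decision vector stacks (y, t, ξ, P, z) following the Rockafellar-Uryasev construction in the proof:

```python
import numpy as np
from scipy.optimize import linprog

x = np.array([0.0, 1.0, 3.0, 9.0])
p = np.array([[0.7, 0.1, 0.1, 0.1],        # model P_1
              [0.6, 0.2, 0.1, 0.1]])       # model P_2
n, m = x.size, p.shape[0]
alpha, omega0, theta, Pbar = 0.9, 0.0, 0.1, 3.0

N = n + m + n * m + 2                      # y (n), t (m), xi (n*m), P, z
ix = lambda i, k: n + m + k * n + i        # position of xi_ik
iP, iz = N - 2, N - 1

c = np.zeros(N); c[iz] = 1.0               # minimise z
A_ub, b_ub = [], []
for k in range(m):
    row = np.zeros(N)                      # t_k + sum_i p_ik xi_ik/(1-alpha) + P <= z
    row[n + k], row[iP], row[iz] = 1.0, 1.0, -1.0
    for i in range(n):
        row[ix(i, k)] = p[k, i] / (1.0 - alpha)
    A_ub.append(row); b_ub.append(0.0)
    prem = np.zeros(N)                     # omega0 + (1+theta) E_k[y] <= P
    prem[:n], prem[iP] = (1 + theta) * p[k], -1.0
    A_ub.append(prem); b_ub.append(-omega0)
    for i in range(n):                     # xi_ik >= x_i - y_i - t_k
        row = np.zeros(N)
        row[ix(i, k)], row[i], row[n + k] = -1.0, -1.0, -1.0
        A_ub.append(row); b_ub.append(-x[i])

bounds = ([(0.0, v) for v in x] + [(None, None)] * m
          + [(0.0, None)] * (n * m) + [(0.0, Pbar), (None, None)])
res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=bounds)
print(res.status, res.fun)                 # optimal worst-case CVaR + premium
```

Since no comonotonicity constraint is imposed here, only 0 ≤ y ≤ x bounds the contract; the regret variant of Proposition 4.1 subtracts the benchmarks ρ*_k inside the first constraint set.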
Finally, we solve Problems 4.2 and 4.3. By following the same arguments as provided in the proof of Theorem 4.1, one may show our claims from Proposition 4.1, and therefore, the proofs are left to the reader.
Proposition 4.1. Let Assumption 2.1 hold with ρ ≡ CVaR_α. If X is a discrete random variable that takes the values x_1, ..., x_n and H_{P_k} = E_{P_k} for all k ∈ M, then solving Problem 4.3 over the set C × [0, P̄] is equivalent to solving the LP obtained from (4.4) by restricting all constraints to the single model P_k, i.e.

min z_k
s.t. t_k + (1/(1 − α)) Σ_{i=1}^n p_ik ξ_ik + P ≤ z_k,
     ξ_ik ≥ x_i − y_i − t_k, ξ_ik ≥ 0, 1 ≤ i ≤ n,
     ω_0 + (1 + θ) Σ_{i=1}^n p_ik y_i ≤ P ≤ P̄,
     0 ≤ y ≤ x.

Moreover, Problem 4.2 is equivalent to the LP obtained from (4.4) by replacing its first set of constraints with

t_k + (1/(1 − α)) Σ_{i=1}^n p_ik ξ_ik + P − ρ*_k ≤ z, k ∈ M.

Pareto Robust Optimisation
Robust optimal contracts have been found in Sections 3 and 4 without discussing the drawbacks of such solutions. Recall that a robust solution solves a problem of the form

min_{x∈F} max_{k∈M} f_k(x),      (5.1)

where F denotes the feasibility set and f_k the objective function under model k, and we denote by X_Ro the set of optimal solutions of (5.1). A solution is Pareto optimal in the sense that no improvement could be made for one or more players without affecting the allocation of at least one player. The mathematical formulation of the Pareto solution set corresponding to (5.1) is given by

X_Pa := { x ∈ F : there is no x̃ ∈ F such that f_k(x̃) ≤ f_k(x) for all k ∈ M and at least one inequality is strict }.
It is not surprising that a Pareto solution may not be an element of X_Ro, since worst-case type solutions are concerned only with extreme scenarios. Further, x* ∈ X_Ro does not always imply that x* ∈ X_Pa when (5.1) admits multiple solutions. It is not difficult to show that if x* is the unique solution of (5.1), then x* ∈ X_Pa. Therefore, it is possible to solve (5.1) and produce a robust solution that is suboptimal for all concurrent objectives, which plays havoc with the entire decision process. Recall that Remark 3.1 explains when the closed-form solution is unique, and one may show in that case that the unique solution is Pareto optimal as well. Appa (2002) and Mangasarian (1979) provide methodologies for checking the uniqueness of an LP solution and therefore, there is no issue with linear-type (5.1) optimisation problems with a unique solution. It is still not clear how to verify whether a solution of (5.1) is an element of X_Pa. In addition, it would be interesting to provide a constructive method to generate solutions from X_Ro ∩ X_Pa. These are the aims of this section. Specifically, we first note that the discrete versions of (2.4) and (2.5) have the following linear representation:

min_{x∈F} max_{k∈M} c_k^T x,  with F a polyhedral feasibility set.      (5.2)

Theorem 5.1. Let x* be any optimal solution of (5.2), where the latter problem is assumed to be non-trivial, i.e. the c_k with k ∈ M are not all null vectors. Consider the following optimisation problem:

min Σ_{k∈M} c_k^T y  s.t.  x* + y ∈ F, c_k^T y ≤ 0 for all k ∈ M.      (5.3)

If the optimal value in (5.3) is zero, then x* ∈ X_Ro ∩ X_Pa in (5.2). If the optimal value in (5.3) is negative, then x* + y* ∈ X_Ro ∩ X_Pa in (5.2), where y* is an optimal solution of (5.3).
Proof. It is not difficult to see that the optimal objective value of (5.3) is always non-positive, since y = 0 is feasible. Assume now that the optimal objective value is zero, but that x* is a robust optimal solution of (5.2) that is not a Pareto solution. Then there exists a feasible solution x̂ of (5.2) such that

(5.4)  c_k^T x̂ ≤ c_k^T x* for all k ∈ M,

and at least one inequality holds strictly. Denote ŷ = x̂ − x*; since x̂ is a feasible solution of (5.2), we get that x* + ŷ = x̂ is feasible in (5.2). Recall that Eq. (5.4) tells us that c_k^T x̂ ≤ c_k^T x* for all k ∈ M, and in turn we get that c_k^T ŷ ≤ 0 for all k ∈ M. Thus, ŷ is a feasible solution of (5.3). Moreover, Eq. (5.4) suggests that c_k^T x̂ < c_k^T x* for some k ∈ M, and therefore c_k^T ŷ is negative for some k ∈ M. Consequently, the optimal objective value in (5.3) is negative, which contradicts our assumption of a null optimal objective value.
Assume now that the optimal objective value in (5.3) is negative. Note first that x* + y* is feasible in (5.2), since y* is feasible in (5.3). Assume that x* + y* is a robust optimal solution of (5.2) but not a Pareto solution. Thus, there exists a feasible solution x of (5.2) such that x* + y* is Pareto dominated by x. The mathematical formulation of the latter is that

(5.5)  c_k^T x ≤ c_k^T (x* + y*) for all k ∈ M,

and at least one inequality holds strictly. Denote y = x − x*; since x is a feasible solution of (5.2), one may find that x* + y = x is feasible in (5.2). Now, Eq. (5.5) and the fact that y* is a feasible solution of (5.3) imply that

c_k^T y = c_k^T x − c_k^T x* ≤ c_k^T (x* + y*) − c_k^T x* = c_k^T y* ≤ 0 for all k ∈ M,

which shows that y is feasible in (5.3). We also know that at least one inequality in Eq. (5.5) is strict and, as a result, one of the above inequalities holds strictly, which yields Σ_{k∈M} c_k^T y < Σ_{k∈M} c_k^T y*. The latter contradicts the optimality of y* in (5.3). The proof is now complete.
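Theorem 5.1 yields a simple two-stage recipe: solve the robust LP, then solve the auxiliary problem (5.3) to either certify Pareto optimality or construct an improved robust and Pareto optimal point. The following Python sketch (not the authors' code) implements the second stage with SciPy's LP solver, under the assumption that the feasible set of (5.2) is written in the generic form {x : A_ub x ≤ b_ub}; the toy problem in the test is hypothetical:

```python
import numpy as np
from scipy.optimize import linprog

def pareto_improve(x_star, C, A_ub, b_ub):
    """Pareto test of Theorem 5.1 (sketch).

    x_star     : a robust optimal solution of the LP (5.2)
    C          : matrix whose rows are the objective vectors c_k, k in M
    A_ub, b_ub : feasible set {x : A_ub @ x <= b_ub} of (5.2)

    Solves (5.3): min sum_k c_k^T y  s.t.  A_ub (x* + y) <= b_ub and
    c_k^T y <= 0 for all k.  Returns (is_pareto, improved_point).
    """
    n = len(x_star)
    # Stack feasibility of x* + y (shifted rhs) with the c_k^T y <= 0 rows.
    A = np.vstack([A_ub, C])
    b = np.concatenate([b_ub - A_ub @ x_star, np.zeros(C.shape[0])])
    res = linprog(C.sum(axis=0), A_ub=A, b_ub=b,
                  bounds=[(None, None)] * n, method="highs")
    assert res.status == 0, "auxiliary LP (5.3) should be solvable"
    if res.fun >= -1e-9:           # optimal value zero: x* is already Pareto
        return True, x_star
    return False, x_star + res.x   # negative value: x* + y* is robust and Pareto
```

As a usage illustration, take two objectives c_1 = (1, 0), c_2 = (0, 1) over the box [0, 1]^2 with the extra constraint x_1 ≥ 0.5: the point (0.5, 0.5) is robust optimal (worst-case value 0.5) but not Pareto, and the routine moves it to (0.5, 0), which is both.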

Numerical analysis
This section provides numerical illustrations of our worst-case scenario and worst-case regret optimisation problems from (2.4) and (2.5), respectively. Recall that in order to solve these problems empirically, a sample x = (x_1, x_2, ..., x_n)^T is drawn from the underlying distribution of X, and in turn we find the optimal insurance contract y* = (y*_1, y*_2, ..., y*_n)^T and the optimal premium P*. Let (y*_wc, P*_wc) and (y*_wr, P*_wr) denote the empirical optimal solutions of our robust models (2.4) and (2.5), respectively. Our main aim is to give a quality comparison between (y*_w, P*_w), w ∈ {wc, wr}, and a best possible choice (y*, P*). Essentially, the latter is the "best solution" based on estimating a particular model chosen via two well-known standard statistical goodness-of-fit methods, namely the Akaike Information Criterion (AIC) and the Corrected Akaike Information Criterion (AICc), which are denoted as (y*_AIC, P*_AIC) and (y*_AICC, P*_AICC), respectively. We believe that these comparisons are fair and explain the advantages and disadvantages of robust optimisation over standard optimisation after choosing the most significant model (in the statistical sense). Finally, recall that all optimisations are implemented on a standard desktop computer. The preferred Model k* under the AIC criterion is chosen to have the smallest AIC value, i.e. k* = argmin_{k∈M} AIC_k. On the other hand, the AICc value of Model k penalises the utility of each model for its complexity when the sample size n is not large, i.e.
AICc_k = 2q_k × n/(n − 1 − q_k) − 2 ln L_k. Similarly, the preferred Model k** under the AICc criterion is chosen to have the smallest AICc value, i.e. k** = argmin_{k∈M} AICc_k. Finally, (y*_AIC, P*_AIC) and (y*_AICC, P*_AICC) are obtained by solving (2.4) with M = {k*} and M = {k**}, respectively. Denote the underlying distribution of X as Model 0, which is equipped with its discretised probability vector p*_0, obtainable as before from F_0, the cdf of X, i.e. a LogNormal distribution with the parameters defined earlier. Let (y*_T, P*_T) be the optimal solution obtained by solving the non-robust version of Problem (2.4) with M = {0} as given by p*_0. This optimal solution mimics the ideal optimal decision, since the "true" distribution is assumed to be known, and thus all robust methods are compared with the decision under Model 0. Clearly, model risk induces uncertainty about the model choice, and this issue is numerically investigated in the remainder of this section. In order to compare the various decisions, we need to measure the distance between the robust methods and the one obtained via Model 0. That is, each optimal contract y*_ξ, where ξ ∈ {wc, wr, AIC, AICC}, is compared to the benchmark optimal contract y*_T via

∆_ξ = Σ_{i=1}^n |y*_{iξ} − y*_{iT}| × p_{i0} for all ξ ∈ {wc, wr, AIC, AICC}.
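The model-selection step above is easy to reproduce. The following Python sketch (with hypothetical model names and log-likelihood values chosen for illustration) computes AIC_k = 2q_k − 2 ln L_k and the paper's small-sample AICc variant, and returns the preferred models k* and k**:

```python
def aic(loglik, q):
    # AIC_k = 2 q_k - 2 ln L_k, where q_k is the number of estimated
    # parameters and loglik is the maximised log-likelihood ln L_k.
    return 2 * q - 2 * loglik

def aicc(loglik, q, n):
    # Small-sample corrected form used in the text:
    # AICc_k = 2 q_k * n / (n - 1 - q_k) - 2 ln L_k.
    return 2 * q * n / (n - 1 - q) - 2 * loglik

def select_model(models, n):
    """models: dict name -> (maximised log-likelihood, number of parameters).
    Returns the AIC-preferred model k* and the AICc-preferred model k**."""
    k_star = min(models, key=lambda k: aic(*models[k]))
    k_2star = min(models, key=lambda k: aicc(*models[k], n))
    return k_star, k_2star
```

Note how the two criteria can disagree for small n: a heavily parametrised model with a slightly better fit may win under AIC but lose under AICc, whose complexity penalty grows as n approaches q_k + 1.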
Clearly, the smaller the value of ∆_ξ, the more robust the achieved decision. This criterion, further called the simple criterion, assesses only the choice of the optimal contract, and it would be interesting to understand the possible drawbacks of those robust contracts, which may require increased premiums. A composite criterion is therefore needed when comparing (y*_w, P*_w) to (y*_c, P*_c), and is given by:
a) (y*_w, P*_w) is preferred and called a "good scenario" if ∆_w < ∆_c and P*_w − P*_c < 10^{-2};
b) (y*_c, P*_c) is preferred and called a "bad scenario" if ∆_w > ∆_c and P*_w − P*_c > −10^{-2};
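Both criteria can be sketched in a few lines of Python (a minimal illustration of the definitions above, not the authors' experimental code); note that scenarios satisfying neither a) nor b) are left unclassified:

```python
def distance(y_xi, y_T, p0):
    # Simple criterion: Delta_xi = sum_i |y*_{i,xi} - y*_{i,T}| * p_{i,0}
    return sum(abs(a - b) * p for a, b, p in zip(y_xi, y_T, p0))

def classify(delta_w, P_w, delta_c, P_c, eps=1e-2):
    """Composite criterion: 'good' if the robust contract is closer to the
    benchmark without costing materially more; 'bad' if the non-robust
    (AIC/AICc) contract is closer and not materially cheaper."""
    if delta_w < delta_c and P_w - P_c < eps:
        return "good"
    if delta_w > delta_c and P_w - P_c > -eps:
        return "bad"
    return "inconclusive"
```

For example, a robust contract with a smaller ∆ and a premium only 0.005 above the non-robust one is a "good scenario", whereas a larger ∆ combined with a premium surcharge of 0.02 is a "bad scenario".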
for any given w ∈ {wc, wr} and c ∈ {AIC, AICC}. Our numerical illustrations generate samples of size n ∈ {25, 50, 100, 250} with N = 500 replications and compare the robust optimal decisions to the AIC non-robust optimal decisions under the two criteria (simple and composite). Extensive numerical experiments (for various parametric models and sample sizes) have shown that the AIC and AICc-based optimal decisions lead to similar results, and for this reason only the AIC results are reported below. That is, we display the number of "good" and "bad" scenarios, namely G_{w,AIC} and B_{w,AIC}, respectively, where w ∈ {wc, wr}.
We first examine the VaR_{0.75}-based optimal solutions for the simple test, where only the robustness of the risk transfer is analysed. Tables 6.7-6.10 replicate Tables 6.3-6.6, where the set of feasible solutions is reduced such that the insurance contracts are assumed to be comonotone. That is, the first two rows of Tables 6.7-6.10 are computed as explained in Remark 3.2, while the results of the third and fourth rows are based on an LP formulation similar to the one from Proposition 3.1. As before, the last two rows are obtained by optimising the WCVaR risk measure, as defined in (2.8), but adding the comonotonicity constraint. Restricting our optimisation to comonotone contracts does not change our conclusions, but we observe a loss of power amongst all three robust methods, which could be explained by the fact that the additional constraint increases the complexity of the problem. The general conclusions do not change and there is clear evidence to recommend our worst-case scenario method, which outperforms the WCVaR robust method of Zhu and Fukushima (2009). As a final remark, it is worth mentioning that, after applying Theorem 5.1 to our robust methods, all numerical results remain unchanged. Therefore, the power of the results is similar to that displayed in this section, which suggests that one should use our worst-case method in conjunction with Theorem 5.1 in order to obtain a robust insurance contract that is economically viable for both insurance players.
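For a sorted loss sample x_1 ≤ ... ≤ x_n, the comonotonicity restriction mentioned above is linear: both the indemnity y and the retained loss x − y must be non-decreasing, i.e. 0 ≤ y_{i+1} − y_i ≤ x_{i+1} − x_i for each i. The following Python sketch (an assumed standard representation of the constraint, not taken from the paper) builds these increment bounds and checks a candidate contract against them:

```python
def comonotone_constraints(x_sorted):
    """Increment bounds enforcing comonotonicity for a sorted loss sample
    x_1 <= ... <= x_n: both the indemnity y and the retained loss x - y are
    non-decreasing iff 0 <= y_{i+1} - y_i <= x_{i+1} - x_i for each i.
    Returns the list of (lower, upper) bounds on consecutive increments,
    which can be appended as linear rows to the LP formulation."""
    return [(0.0, x_sorted[i + 1] - x_sorted[i])
            for i in range(len(x_sorted) - 1)]

def is_comonotone(y, x_sorted, tol=1e-9):
    """Check a candidate contract y against the increment bounds."""
    return all(lo - tol <= y[i + 1] - y[i] <= hi + tol
               for i, (lo, hi) in enumerate(comonotone_constraints(x_sorted)))
```

In an LP, each pair of bounds contributes two inequality rows per sample point, which is the extra problem size that plausibly explains the loss of power observed in Tables 6.7-6.10.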

Conclusions
The VaR and CVaR-based optimal insurance contract has been investigated under uncertainty, where the model risk is taken into account. This source of uncertainty is considered by incorporating multiple plausible models that the decision-maker would have available via estimation, proxy models or expert opinion consultation. Model risk always represents an important source of uncertainty in risk modelling and it is more pronounced when data scarcity is present.
Our aim has been to provide a robust decision, and not to produce a distributionally robust model of the underlying insurance risk. Two robust methods are proposed, namely the Worst-case and Worst-case regret methods. Our numerical results have shown that our Worst-case method outperforms the Worst-case regret method for CVaR-based decisions. Moreover, our Worst-case method proved to be more robust than the Worst-case CVaR method proposed by Zhu and Fukushima (2009). Unfortunately, the VaR-based decisions are not efficiently robustified for all sample sizes by either of the methods proposed in this paper, though encouraging results are obtained for small samples. Another achievement of this paper relates to the well-known caveat of robust optimisation that the optimal decision may be economically unacceptable; that is, the optimal contract may not be efficient in the Pareto optimality sense. We resolve this issue by providing a simple numerical method that allows one to identify a Pareto and robust optimal decision that is (numerically) shown to be efficient in reducing the model risk.