Necessary and sufficient Kuhn-Tucker conditions. Necessary conditions for the minimum of a function

Formulation of the problem

Consider a nonlinear optimization problem: a function f(x) must be minimized subject to constraints g_i(x) ≤ 0, i = 1, …, m.
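Stated in symbols, under one common sign convention (other sources write the constraints as g_i(x) ≥ 0 instead), the problem reads:

\min_{x \in \mathbb{R}^n} f(x) \quad \text{subject to} \quad g_i(x) \le 0, \qquad i = 1, \dots, m.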

William Karush found the necessary conditions in his thesis for the general case in which the imposed conditions may contain both equalities and inequalities. Independently, Harold Kuhn and Albert Tucker later came to the same conclusions.

Necessary conditions for the minimum of a function

If, under the imposed constraints, x* is a solution to the problem, then there exists a non-zero vector of Lagrange multipliers λ such that for the Lagrange function L(x, λ) = λ₀ f(x) + Σᵢ λᵢ gᵢ(x) the conditions of stationarity, complementary slackness and non-negativity (listed below) are satisfied.

Sufficient conditions for the minimum of a function

The listed necessary conditions for the minimum of a function are not sufficient in the general case. There are several variants of additional conditions that make them sufficient.

Simple formulation

If, for an admissible point x*, the conditions of stationarity, complementary slackness and non-negativity are satisfied, and in addition λ₀ > 0, then x* is a solution to the problem.

Weaker Conditions

If, for an admissible point x*, the conditions of stationarity, complementary slackness and non-negativity are satisfied, and in addition Slater's condition holds (there exists a point x̃ at which all inequality constraints are strict, g_i(x̃) < 0), then x* is a solution to the problem.



The Kuhn-Tucker theorem generalizes the solution methods of optimization problems from linear programming to the nonlinear case, which by analogy received the not entirely successful name "nonlinear programming"; the linear equality constraints are replaced by nonlinear inequality constraints.

The Karush-Kuhn-Tucker conditions (KKT conditions) are formally necessary conditions for optimality in a nonlinear programming problem. In addition, the necessary conditions include so-called regularity conditions on stationary points: the non-degeneracy (linear independence) of the set of constraint gradients. The method is a generalization of the Lagrange multiplier method to the case of inequality constraints.

Kuhn and Tucker generalized the Lagrange multiplier method (used to construct optimality criteria for problems with equality constraints) to the general nonlinear programming problem with constraints in the form of both equalities and inequalities.

The main approach in nonlinear programming is still the use of the Lagrange function, obtained by transferring the constraints into the objective function. The Kuhn-Tucker conditions are also derived from this principle.

William Karush found the necessary conditions for the general case of equality and inequality constraints in his master's thesis of 1939. Independently, Harold Kuhn and Albert Tucker came to the same conclusions later, in 1951, and their results were more general.

If, under the imposed constraints, x* is a solution to the problem, then there exists a vector of Lagrange multipliers λ = (λ₀, λ₁, …, λ_m) such that for the Lagrange function L(x, λ) = λ₀ f(x) + Σᵢ λᵢ gᵢ(x) the following conditions are met (a numerical check of these conditions is sketched after the list):

- stationarity: ∇ₓ L(x*, λ) = 0;

- complementary slackness: λᵢ gᵢ(x*) = 0, i = 1, …, m;

- non-negativity: λᵢ ≥ 0, i = 1, …, m (and λ₀ ≥ 0).
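A minimal numerical sketch of checking these conditions at a candidate point; the example problem and the candidate (x*, λ) below are assumptions chosen for illustration, not taken from the text:

import numpy as np

# Assumed example (not from the text):
#   minimize f(x) = (x1 - 1)^2 + (x2 - 2)^2  subject to g(x) = x1 + x2 - 2 <= 0.
# The candidate x* = (0.5, 1.5) with multiplier lam = 1 solves it.

def f_grad(x):
    return np.array([2 * (x[0] - 1), 2 * (x[1] - 2)])

def g(x):
    return x[0] + x[1] - 2

def g_grad(x):
    return np.array([1.0, 1.0])

x_star, lam = np.array([0.5, 1.5]), 1.0

stationarity = f_grad(x_star) + lam * g_grad(x_star)   # should be ~0
slackness = lam * g(x_star)                            # should be ~0
print(np.allclose(stationarity, 0), np.isclose(slackness, 0), lam >= 0, g(x_star) <= 1e-12)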

The listed necessary conditions for a minimum of the function are not sufficient in the general case. There are several variants of additional conditions that make them sufficient:

- simple condition: 1) the point is stationary, 2) the complementary slackness condition is satisfied, 3) the Lagrange multipliers are non-negative and λ₀ > 0;

- Slater's condition (weaker): 1) the point is stationary, 2) the complementary slackness condition is satisfied, 3) there exists a point x̃ at which all inequality constraints hold strictly (gᵢ(x̃) < 0).

Let us proceed directly to the proof of the Kuhn-Tucker theorem.

If a continuous function F(x) of n variables x = (x₁, …, xₙ) has a maximum at the point x_opt, then there exists ε > 0 such that for all x in the ε-neighborhood of the point x_opt

F(x) ≤ F(x_opt), i.e.

F(x) − F(x_opt) ≤ 0.

Let us choose two kinds of increments of x_j along the j-th coordinate:

Δx_j = x_j − x_j,opt > 0,

Δx_j = x_j − x_j,opt < 0.



Dividing F(x) − F(x_opt) ≤ 0 by Δx_j, we obtain [F(x) − F(x_opt)]/Δx_j ≤ 0 for Δx_j > 0 and [F(x) − F(x_opt)]/Δx_j ≥ 0 for Δx_j < 0. Passing to the limit in these relations as Δx_j → 0, we get ∂F/∂x_j ≤ 0 and ∂F/∂x_j ≥ 0. From these relations it follows that

∂F/∂x_j = 0 at x = x_opt, j = 1, …, n. (3.6.1)

A similar relation can be obtained for the case of a minimum of the function. Thus, conditions (3.6.1) are necessary for the function F(x) to attain a maximum or a minimum at the point x_opt: if there is an extremum, then conditions (3.6.1) are satisfied. But the vanishing of all the derivatives at the point x_opt does not yet guarantee the existence of an extremum there, i.e. conditions (3.6.1) are not sufficient. Geometrically this means that, for a function of one variable, a zero derivative may correspond to an inflection point rather than a maximum (or minimum); for a function of two variables, to a saddle point rather than an extremum, and so on. Therefore the points x_opt at which relations (3.6.1) are satisfied are called stationary.

Note that condition (3.6.1) was obtained thanks to the possibility of giving the variable x_j increments of both signs, which is where the two inequalities, for Δx_j > 0 and for Δx_j < 0, arose. If the admissible range of x is limited to non-negative values, x ≥ 0, then inside the region, where x_j > 0, condition (3.6.1) remains valid, since increments of both signs are allowed there. On the boundary of the region x ≥ 0, where x_j = 0, only positive increments Δx_j > 0 are allowed, we can speak only of a one-sided derivative, and the following necessary condition for a maximum follows: ∂F/∂x_j ≤ 0 at x_j = 0.

Accordingly, the necessary condition for a minimum on the boundary of the region, x_j = 0, is written as ∂F/∂x_j ≥ 0.

b) Conditional extremum problem

In the conditional extremum problem it is required to determine the maximum (or minimum) of the function F(x) under the constraints

g_i(x) = b_i, i = 1, …, m.

As in the classical calculus of variations, the method of Lagrange multipliers is used here; it consists in introducing the Lagrange function

Φ(x, λ) = F(x) − Σᵢ λᵢ [g_i(x) − b_i], (3.6.2)

where λ_i are the undetermined Lagrange multipliers.



Regarding the function as a special case of a functional, we find that the necessary conditions for an extremum are obtained by direct differentiation of relation (3.6.2) and are written in the form

∂Φ/∂x_j = ∂F/∂x_j − Σᵢ λᵢ ∂g_i/∂x_j = 0, j = 1, …, n,

∂Φ/∂λ_i = −[g_i(x) − b_i] = 0, i = 1, …, m.
If we introduce the vectors λ = (λ₁, …, λ_m), φ(x) = (g₁(x), …, g_m(x)) and b = (b₁, …, b_m), these relations can be rewritten as

grad Φ = grad F − λ grad φ = 0; (3.6.6)

φ(x) = b,

where the equality of vectors to zero is understood componentwise.
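A small symbolic sketch of solving the stationarity system (3.6.6) with sympy; the objective, the constraint and the symbol names below are assumptions chosen only for illustration:

import sympy as sp

# Assumed example (not from the text):
# maximize F(x1, x2) = x1 * x2 subject to g(x1, x2) = x1 + x2 = 4.
x1, x2, lam = sp.symbols('x1 x2 lam', real=True)
F = x1 * x2
g = x1 + x2 - 4                       # g(x) - b = 0

Phi = F - lam * g                     # Lagrange function (3.6.2)
stationarity = [sp.diff(Phi, v) for v in (x1, x2, lam)]
print(sp.solve(stationarity, (x1, x2, lam), dict=True))
# -> [{x1: 2, x2: 2, lam: 2}], i.e. the constrained extremum is at (2, 2)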



Figure 3.6.1 - Explanation of the conditional extremum problem

For n = 2 and m = 1 the geometric problem of finding a conditional extremum reduces (Fig. 3.6.1) to finding the point of tangency A of the curve φ = b with one of the level curves F = const.

Theorem 7.2. For the convex programming problem (7.4)-(7.6) whose set of admissible solutions has the regularity property, a plan X* is an optimal plan if and only if there exists a vector Λ* ≥ 0 such that (X*, Λ*) is a saddle point of the Lagrange function.

Assuming that the objective function and the constraint functions are continuous and differentiable, the Kuhn-Tucker theorem can be supplemented with analytical expressions that give a necessary and sufficient condition for a point to be a saddle point of the Lagrange function, i.e. a solution of the convex programming problem. In these expressions, ∂L/∂x_j and ∂L/∂λ_i denote the values of the corresponding partial derivatives of the Lagrange function evaluated at the saddle point.
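For a maximization problem with non-negative variables X ≥ 0 and multipliers Λ ≥ 0 these analytical conditions are usually written as follows (a standard textbook form; the exact signs depend on whether (7.4)-(7.6) is stated as a maximization or a minimization):

\frac{\partial L}{\partial x_j} \le 0, \qquad x_j^{*}\,\frac{\partial L}{\partial x_j} = 0, \qquad x_j^{*} \ge 0, \qquad j = 1, \dots, n;

\frac{\partial L}{\partial \lambda_i} \ge 0, \qquad \lambda_i^{*}\,\frac{\partial L}{\partial \lambda_i} = 0, \qquad \lambda_i^{*} \ge 0, \qquad i = 1, \dots, m,

with all derivatives evaluated at the saddle point (X^{*}, \Lambda^{*}).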

The Kuhn-Tucker theorem occupies a central place in the theory of nonlinear programming. It is also valid for so-called quadratic programming problems, i.e. problems whose objective function is the sum of a linear and a quadratic form, subject to linear constraints.

In the general case there is no efficient computational method for finding the global extremum of a nonlinear programming problem unless it is known that every local extremum is also global. In convex and quadratic programming problems the set of feasible solutions is convex, so if the objective function is concave (for a maximization problem), any local maximum is also a global one.
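A minimal sketch of such a convex (quadratic) problem solved numerically with scipy; the problem data are assumed for illustration, and, because the problem is convex, the local solution returned by the solver is also the global one:

import numpy as np
from scipy.optimize import minimize

# Assumed illustration (not from the text):
#   maximize F(x) = -(x1 - 2)^2 - (x2 - 3)^2   subject to x1 + x2 <= 4, x >= 0.
obj = lambda x: (x[0] - 2) ** 2 + (x[1] - 3) ** 2            # minimize -F
cons = [{'type': 'ineq', 'fun': lambda x: 4 - x[0] - x[1]}]  # 4 - x1 - x2 >= 0
res = minimize(obj, x0=np.zeros(2), bounds=[(0, None), (0, None)],
               constraints=cons, method='SLSQP')
print(res.x)   # approximately [1.5, 2.5]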

Example 7.3. Using the Kuhn-Tucker theorem, we find the extremum of the objective function under the given restrictions. We solve the problem graphically (Fig. 7.2) and find the optimum point and the corresponding value of the objective function (see the solution below for more details). Let us show that there is a vector Y⁰ ≥ 0 for which the Kuhn-Tucker conditions are satisfied at the optimum point. We compose the Lagrange function:

Solution

We find the candidate point from the system of stationarity conditions of the Lagrange function. From the second equation we express one of the unknowns and substitute it into the first equation of the system, then differentiate with respect to the remaining variables and multipliers. Substituting the coordinates of the optimum point into the resulting expressions, we obtain partial derivatives greater than zero; since the saddle-point conditions require the corresponding products to vanish, the associated multipliers must equal zero. The remaining multipliers, on the other hand, may take non-zero values, because their partial derivatives vanish at the optimum. Thus all the Kuhn-Tucker conditions are satisfied, and therefore this point is an extremum point.

Fig. 7.2. Graphical solution of the nonlinear programming problem of Example 7.3

The necessary optimality condition for a nonlinear programming problem can be formulated as follows. Let M* be an optimal solution of problem (7.4)-(7.6), and let the functions F(M) and g_i(M), i = 1, …, m, be differentiable at this point. If M* is a non-singular point of the admissible set of problem (7.4)-(7.6), then it is a Kuhn-Tucker point of this problem.

Thus, if the admissible set of problem (7.4)-(7.6) has no singular points and the functions F(M), g_i(M) (i = 1, …, m) are differentiable at all of its points, then the optimal solution of this problem should be sought among the Kuhn-Tucker points.

As is known from mathematical analysis, a singular point of the set of admissible solutions of a system of equations and inequalities (i.e. of the feasible set) is a point at which the gradients of those constraint functions that turn into equalities at it, i.e. of the active constraints, are linearly dependent. For example, a point of the feasible set at which the gradients of two active constraints are proportional to each other is a singular point of that set: these gradients indeed form a linearly dependent system.

The gradient of a function F at a point M₀ is the vector whose coordinates are equal to the values of the partial derivatives of F at the point M₀. This vector shows the direction of the fastest growth of the function.
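A minimal sketch of estimating the gradient numerically by forward differences; the function used here is an assumption for illustration:

import numpy as np

def numerical_gradient(F, M0, h=1e-6):
    M0 = np.asarray(M0, dtype=float)
    grad = np.zeros_like(M0)
    for j in range(M0.size):
        step = np.zeros_like(M0)
        step[j] = h
        grad[j] = (F(M0 + step) - F(M0)) / h   # partial derivative w.r.t. x_j
    return grad

F = lambda x: x[0] ** 2 + 3 * x[1]
print(numerical_gradient(F, [1.0, 2.0]))       # approximately [2., 3.]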

The necessary optimality condition is of little use for solving most nonlinear programming problems, because only in rare cases is it possible to find all the Kuhn-Tucker points of a nonlinear programming problem.

Sufficient condition for optimality for a nonlinear programming problem: if (M*, Λ*) is a saddle point of the Lagrange function for problem (7.4)-(7.6), then M* is an optimal solution of this problem.

This condition is not necessary in the general case: the Lagrange function may have no saddle points even though the original nonlinear programming problem has an optimal solution.

Various methods are known that allow one to find an approximate optimal solution of a nonlinear programming problem. However, these methods provide a reasonably good approximation only for convex programming problems, where every local extremum is also global. In general, gradient methods are understood as methods in which the movement towards the optimum point of a function follows the direction of the gradient of this function: starting from some point, a sequence of transitions to other points is carried out until an acceptable solution of the original problem is found. Gradient methods can be divided into two groups.

The first group includes methods in which the points examined do not leave the region of feasible solutions of the problem. The most common of these is the Frank-Wolfe method. Such methods also include Rosen's projected gradient method and Zoutendijk's method of feasible directions.

The second group includes methods in which the points examined may or may not belong to the feasible region. However, as a result of the iterative process a point of the feasible region is found that gives an acceptable solution.

Of these methods, the most commonly used are the penalty function method and the Arrow-Hurwicz method.

When solving problems with gradient methods, including those mentioned above, the iterative process is continued until the gradient of the function at the current point becomes equal to zero, or until ‖grad f‖ ≤ ε, where ε is a sufficiently small positive number characterizing the accuracy of the resulting solution.

Solving complex nonlinear programming problems with gradient methods involves a large amount of computation and is advisable only with the use of a computer.

Using an example, we will show the general idea of gradient methods for solving nonlinear programming problems.

Let there be a function of two variables f(x₁, x₂). Let the initial values of the variables be x₁⁰ and x₂⁰, with the corresponding value of the function f(x₁⁰, x₂⁰). If this point is not an extremum, then new values of the variables x₁¹ and x₂¹ are determined so that the next value of the function, f(x₁¹, x₂¹), turns out to be closer to the optimum than the previous one.

How are the increments Δx₁ and Δx₂ determined? At the point (x₁⁰, x₂⁰) the direction of change of the function towards the extremum is determined using the gradient. The greater the value of a partial derivative, the faster the function changes towards the extremum. Therefore the increments Δx₁ and Δx₂ are chosen proportional to the values of the partial derivatives ∂f/∂x₁ and ∂f/∂x₂ at the point (x₁⁰, x₂⁰). Thus x₁¹ = x₁⁰ + t·∂f/∂x₁ and x₂¹ = x₂⁰ + t·∂f/∂x₂, where t is a value called the step, which sets the scale of change of x₁ and x₂.
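A minimal sketch of this gradient step in code; the objective below is only an assumed stand-in with its maximum at the point (3, 2) mentioned in Example 7.4:

import numpy as np

def grad_f(x):
    # gradient of the assumed objective f = -(x1 - 3)^2 - (x2 - 2)^2
    return np.array([-2 * (x[0] - 3), -2 * (x[1] - 2)])

def gradient_ascent(x0, t=0.1, eps=1e-6, max_iter=1000):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) <= eps:       # stopping rule ||grad f|| <= eps
            break
        x = x + t * g                      # move along the gradient, towards the maximum
    return x

print(gradient_ascent([0.0, 0.0]))         # approaches [3., 2.]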

Example 7.4.

Let it be necessary to maximize a function of two variables whose maximum is attained at the point (3; 2); its partial derivatives are computed at each step of the iterations below.

At the initial point, with the chosen step t, we compute the partial derivatives and obtain the new values of the variables and of the function.

Let us perform one more iteration. At the new point, with the same step t, we again compute the partial derivatives and obtain the next values of the variables and of the function, which are closer to the maximum.

Fig. 7.3. Geometric interpretation of two steps of the gradient method for Example 7.4

Fig. 7.3 shows the movement along the gradient that was carried out in this example. The ratio of the partial derivatives indicates the direction of change of the function towards the maximum. If we take the gradient with a minus sign, we move towards the minimum.

A specific difficulty with gradient methods is the choice of the step size t. If the steps are small, the search for the optimum takes a very long time; if the step is chosen too large, the optimum can be "overshot". The intermediate problem of choosing the step size is solved by appropriate methods, for example by a line search such as the one sketched below. If the step t is determined only approximately, then, as a rule, the iterations land not at the optimum itself but in its neighbourhood. Gradient methods find local optima, so when searching for a global optimum with a gradient method there is often doubt as to whether the optimum found is global. This is a disadvantage of the method when solving non-convex programming problems.
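One common recipe for the step-size subproblem is a backtracking (Armijo) line search; the sketch below, including its constants, is a textbook-style assumption rather than a method prescribed by the text:

import numpy as np

def backtracking_step(f, grad_f, x, t0=1.0, beta=0.5, c=1e-4):
    g = grad_f(x)
    t = t0
    # shrink t until the ascent step gives a sufficient increase of f
    while f(x + t * g) < f(x) + c * t * np.dot(g, g):
        t *= beta
    return t

f = lambda x: -(x[0] - 3) ** 2 - (x[1] - 2) ** 2
grad_f = lambda x: np.array([-2 * (x[0] - 3), -2 * (x[1] - 2)])
x = np.array([0.0, 0.0])
print(backtracking_step(f, grad_f, x))   # a step size that guarantees progress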

Self-test questions

    Statement of the nonlinear programming problem.

    The process of finding a solution to a nonlinear programming problem using its geometric interpretation.

    Algorithm of the Lagrange method for solving a nonlinear programming problem.

    Application of the Lagrange method to the solution of a nonlinear programming problem in the case where the constraints are inequalities.

    Auxiliary definitions and theorems necessary for considering the central theorem of nonlinear programming - the Kuhn-Tucker theorem.

    Kuhn-Tucker theorem.

    Necessary and sufficient optimality condition for a nonlinear programming problem.

    The general meaning of gradient methods for approximate solution of nonlinear programming problems.

    Groups of gradient methods for approximate solution of nonlinear programming problems.

In the previous section the Kuhn-Tucker conditions were constructed for constrained optimization problems. Using the Lagrange multiplier method, an intuitive idea was obtained that the Kuhn-Tucker conditions are closely related to the necessary optimality conditions. In this section rigorous formulations of the necessary and sufficient conditions for the optimality of a solution of a nonlinear programming problem are considered.

Theorem 1. Necessity of the Kuhn-Tucker conditions

Consider the nonlinear programming problem (0)-(2). Let the objective and constraint functions be differentiable, and let x* be an admissible solution of this problem. Denote by I the set of indices of the constraints active at x*. Further, let the gradients of the active constraints at x* be linearly independent. If x* is an optimal solution of the nonlinear programming problem, then there exists a pair of multiplier vectors that, together with x*, is a solution of the Kuhn-Tucker problem (3)-(7).

The requirement that the gradients of the active constraints be linearly independent is known as the linear independence condition and is essentially a regularity condition on the feasible region; it is almost always satisfied for optimization problems encountered in practice. However, in general, verifying the linear independence condition is very difficult, since it requires the optimal solution of the problem to be known in advance. At the same time, the linear independence condition is always satisfied for nonlinear programming problems that have the following properties.

  • 1. All equality and inequality constraints contain linear functions.
  • 2. All inequality constraints contain concave functions, all equality constraints contain linear functions, and there is at least one feasible point x that lies in the interior of the region defined by the inequality constraints. In other words, there is a point x at which all inequality constraints are satisfied as strict inequalities.

If the condition of linear independence at the optimum point is not satisfied, then the Kuhn-Tucker problem may not have a solution.
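A minimal sketch of how the linear independence (regularity) condition can be checked numerically via the rank of the matrix of active-constraint gradients; the gradient vectors are assumed for illustration:

import numpy as np

active_gradients = np.array([
    [1.0, 0.0],    # gradient of the first active constraint
    [2.0, 0.0],    # gradient of the second active constraint (parallel to the first)
])
rank = np.linalg.matrix_rank(active_gradients)
print(rank == active_gradients.shape[0])   # False: the gradients are linearly dependent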

Consider the problem: minimize the objective function under the stated restrictions. Fig. 1 shows the region of feasible solutions of the nonlinear problem formulated above. It is clear that this problem has an optimal solution. Let us show that the linear independence condition is not satisfied at the optimum point (1, 0).

Fig. 1. The feasible region of the example.

It is easy to see that the gradients of the active constraints at this point are linearly dependent, i.e. the linear independence condition is not satisfied at the point (1, 0).

Let us write down the Kuhn-Tucker conditions and check whether they are satisfied at the point (1, 0). Conditions (3), (6) and (7) take the following form:

At the point (1, 0), equation (11) forces values of the multipliers that are incompatible with what equation (14) gives. Therefore the optimum point is not a Kuhn-Tucker point.

Note that violation of the linear independence condition does not necessarily mean that a Kuhn-Tucker point does not exist. To confirm this, let us replace the objective function of this example with another one. In this case the optimum is still attained at the point (1, 0), at which the linear independence condition is not satisfied. The Kuhn-Tucker conditions (12)-(16) remain unchanged, while equation (11) takes a new form.

It is easy to check that the point (1, 0) is then a Kuhn-Tucker point, i.e. it satisfies the Kuhn-Tucker conditions.

The theorem on the necessity of the Kuhn-Tucker conditions allows us to identify non-optimal points. In other words, Theorem 1 can be used to prove that a given feasible point that satisfies the linear independence condition is not optimal if it does not satisfy the Kuhn-Tucker conditions. On the other hand, if at this point the Kuhn-Tucker conditions are satisfied, then there is no guarantee that the optimal solution to the nonlinear problem has been found. As an example, consider the following nonlinear programming problem.

The following theorem establishes the conditions under which the Kuhn-Tucker point automatically corresponds to the optimal solution to a nonlinear programming problem.

Theorem 2. Sufficiency of the Kuhn-Tucker conditions

Consider the nonlinear programming problem (0)-(2). Let the objective function be convex, all inequality constraints contain concave functions, and the equality constraints contain linear functions. Then, if there exists a solution that satisfies the Kuhn-Tucker conditions (3)-(7), the corresponding point x* is an optimal solution of the nonlinear programming problem.

If the conditions of Theorem 2 are met, then finding the Kuhn-Tucker point provides an optimal solution to the nonlinear programming problem.

Theorem 2 can also be used to prove that a given solution of a nonlinear programming problem is optimal. To illustrate, consider the example again: minimize the objective function under the restrictions stated there.

Using Theorem 2, let us prove that the solution is optimal. We compute the Hessian matrix of the objective function. Since this matrix is positive semidefinite for all x, the objective function is convex. The first inequality constraint contains a linear function, which is both convex and concave at the same time. To show that the second constraint function is concave, we compute its Hessian; since that matrix is negative definite, the function is concave. The function entering the equality constraint is linear. Consequently, all conditions of Theorem 2 are satisfied; if we show that the point is a Kuhn-Tucker point, we will indeed establish the optimality of the solution. The Kuhn-Tucker conditions for this example have the form

The point satisfies constraints (24)-(26) and is therefore admissible. Equations (22) and (23) determine the multipliers; substituting the point, we obtain their values. Thus the solution x* = (1, 5) satisfies the Kuhn-Tucker conditions. Since the conditions of Theorem 2 are met, x* = (1, 5) is the optimal solution of the problem of this example. Note that there are also other values of the multipliers that satisfy the system (22)-(29).
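A minimal sketch of the definiteness checks used above, via eigenvalues of the Hessians; the matrices are assumed placeholders rather than the Hessians of the example's (unstated) functions:

import numpy as np

H_objective = np.array([[2.0, 0.0],
                        [0.0, 4.0]])     # eigenvalues >= 0 -> positive semidefinite -> convex
H_constraint = np.array([[-2.0, 0.0],
                         [0.0, -6.0]])   # eigenvalues < 0 -> negative definite -> concave
print(np.all(np.linalg.eigvalsh(H_objective) >= 0))   # True
print(np.all(np.linalg.eigvalsh(H_constraint) < 0))   # True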

Notes

  • 1. For problems encountered in practice the linear independence condition is usually satisfied. If all functions of the problem are differentiable, a Kuhn-Tucker point should be regarded as a possible optimum point. Thus, many nonlinear programming methods converge to a Kuhn-Tucker point. (Here it is appropriate to draw an analogy with unconstrained optimization, where the corresponding algorithms allow one to determine a stationary point.)
  • 2. If the conditions of Theorem 2 are met, a Kuhn-Tucker point is at the same time a global minimum point. Unfortunately, checking the sufficient conditions is very difficult and, in addition, applied problems often lack the required properties. It should be noted that the presence of even one nonlinear equality constraint violates the assumptions of Theorem 2.
  • 3. The sufficient conditions established by Theorem 2 can be generalized to problems with non-convex functions in the inequality constraints, non-convex objective functions and nonlinear equality constraints. Such generalizations use quasi-convex and pseudo-convex functions.

The Kuhn-Tucker theorems are a generic name for statements that generalize Lagrange's theorem to the case of optimization problems with inequality constraints, i.e. problems of the following type:

f(x) → max,

g_j(x) ≥ 0, j = 1, …, m, (∗)

x = (x₁, …, x_n) ∈ X.

Here f: X → R is (in accordance with established terminology) the objective function, g_j: X → R, j = 1, …, m, are the constraint functions, and X ⊆ Rⁿ is an open set.

Theorem 196 (John's theorem in terms of a saddle point):

Let the functions f(⋅), g₁(⋅), …, g_m(⋅) be concave and let x̄ be a solution of problem (∗) such that x̄ ∈ int X.

Then there exist Lagrange multipliers λ_j ≥ 0, j = 0, …, m, not all equal to zero, such that x̄ is a solution of the problem of maximizing the Lagrange function L(x, λ) over x ∈ X.

We now present these statements for the case when the functions f, g_j are differentiable (the Kuhn-Tucker theorems in differential form).

Recall that the function

L(x, λ) = λ₀ f(x) + Σ_{j=1}^{m} λ_j g_j(x)

is called the Lagrange function (Lagrangian) of this problem, and the coefficients λ_j are the Lagrange multipliers.

The following statement holds.

Theorem 197 (John's theorem for differentiable functions):

Let x̄ be a solution of problem (∗) such that x̄ ∈ int X, and let the functions f(⋅), g₁(⋅), …, g_m(⋅) be differentiable at the point x̄.

Then there exist Lagrange multipliers λ_j ≥ 0, j = 0, …, m, not all equal to zero, such that the following relations (the Kuhn-Tucker conditions) are satisfied:

∂L(x̄, λ)/∂x_i = 0, i = 1, …, n (stationarity);

Σ_{j=1}^{m} λ_j g_j(x̄) = 0 (complementary slackness conditions).

Note that the complementary slackness conditions can be written in the form

g_j(x̄) λ_j = 0, j = 1, …, m.

From these conditions it follows that if a Lagrange multiplier is positive (λ_j > 0), then the corresponding constraint holds as an equality at the solution of the problem (at x = x̄), i.e. g_j(x̄) = 0; in other words, this constraint is active. On the other hand, if g_j(x̄) > 0, then the corresponding Lagrange multiplier λ_j equals zero.

If some of the constraints in problem (∗) have the form of non-negativity constraints on some of the variables x_i, then one need not introduce Lagrange multipliers for them, writing such constraints separately:

g_j(x) ≥ 0, j = 1, …, m, (∗∗)

x_i ≥ 0, i ∈ P ⊆ {1, …, n}.

At an interior point (in the sense that x̄ ∈ int X) the first-order conditions for i ∈ P then take the form

∂L(x̄, λ)/∂x_i ≤ 0, i ∈ P.

For i ∉ P, as in the representation of the problem in the form (∗), the derivative of the Lagrange function with respect to that variable satisfies ∂L(x̄, λ)/∂x_i = 0.

In addition, the complementary slackness conditions

x̄_i ∂L(x̄, λ)/∂x_i = 0, i ∈ P,

are also satisfied.

From the second of these conditions it follows that for x̄_i > 0 (i ∈ P) we have ∂L(x̄, λ)/∂x_i = 0. On the other hand, if ∂L(x̄, λ)/∂x_i < 0, then x̄_i = 0.

Another modification of the theorem is associated with the presence of equality constraints in the problem. Denote the set of the corresponding indices by E. The problem takes the following form:

g_j(x) ≥ 0, j ∈ {1, …, m}\E,

g_j(x) = 0, j ∈ E, (∗∗∗)

x_i ≥ 0, i ∈ P ⊆ {1, …, n}.

In this case John's theorem drops the requirement that all Lagrange multipliers be non-negative: the Lagrange multipliers λ_j for j ∈ E may have arbitrary sign.

John's theorem does not guarantee that the Lagrange multiplier of the objective function, λ₀, is nonzero. However, if λ₀ = 0, then the Kuhn-Tucker conditions characterize not the solution of the problem under consideration but the structure of the set of constraints at the point x̄, and the theorem has no direct connection with the maximization of the function f(⋅) that interests us, since the gradient of f(⋅) itself disappears from the Kuhn-Tucker conditions.

Therefore it is important to characterize conditions that guarantee λ₀ > 0.

Such conditions are called regularity conditions.

In the case when the problem under consideration is convex, one of the regularity conditions is the so-called Slater condition: there exists an admissible point x̃ at which all inequality constraints are satisfied as strict inequalities, g_j(x̃) > 0.

In the case when the objective function and the constraints of the problem are differentiable, the simplest regularity condition is formulated in terms of the gradients of the constraint functions and has the following form: the gradients of the constraints active at the point x̄ are linearly independent. (The non-negativity constraints should also be included among the constraints considered here.)

Let us denote by A the set of indices of the constraints that are active at the optimum point x̄ (including the indices of all equality constraints), i.e.

g_j(x̄) = 0, j ∈ A.

Then, if the constraint gradients ∇g_j(x̄), j ∈ A, are linearly independent, then λ₀ > 0. This condition is called the Kuhn-Tucker regularity condition.

Note that if λ₀ > 0, then without loss of generality we can set λ₀ = 1, which is usually done.

The corresponding theorem is called the (direct) Kuhn-Tucker theorem.

Theorem 198 (direct Kuhn-Tucker theorem, necessary condition for optimality):

Let the functions f(⋅), g₁(⋅), …, g_m(⋅) be differentiable, and let x̄ be a solution of problem (∗) such that x̄ ∈ int X and the Kuhn-Tucker regularity condition is satisfied.

Then there exist Lagrange multipliers λ_j ≥ 0, j = 1, …, m, such that with λ₀ = 1 the following relations are satisfied:

∂L(x̄, λ)/∂x_i = 0, i = 1, …, n;

λ_j g_j(x̄) = 0, j = 1, …, m.

It is easy to reformulate this theorem for problems (∗∗) and (∗∗∗); the same modifications of the Kuhn-Tucker conditions are required here as in John's theorem.

The stationarity conditions

∂L(x̄, λ)/∂x_i = 0, i = 1, …, n,

can be rewritten as

∇f(x̄) = − Σ_{j=1}^{m} λ_j ∇g_j(x̄).

This relation shows that at the optimum point the gradient of the objective function is a linear combination of the antigradients of the constraints, and all coefficients of this linear combination are non-negative. Fig. 17.1 illustrates this property. Intuitively, the idea behind it is that if any coefficient of the linear combination were negative, then it would be possible to increase the value of the objective function by moving along this constraint.
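A minimal numerical sketch of this property: the gradient of the objective is expressed as a non-negative combination of the antigradients of the active constraints using non-negative least squares; the gradient vectors are assumed for illustration:

import numpy as np
from scipy.optimize import nnls

A = np.column_stack([np.array([1.0, 0.0]),    # antigradient -grad g_1(x)
                     np.array([0.0, 2.0])])   # antigradient -grad g_2(x)
grad_f = np.array([1.0, 1.0])
coeffs, residual = nnls(A, grad_f)
print(coeffs, residual)                       # [1.  0.5], residual ~ 0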

One of the converse versions of the Kuhn-Tucker theorem states that, when the functions f(⋅) and g_k(⋅) are concave, the fulfilment of these conditions at an admissible point x̄ (i.e. a point satisfying the constraints) for some Lagrange multipliers satisfying the requirements of the direct theorem guarantees that x̄ is a solution of the problem.

Theorem 199 (converse Kuhn-Tucker theorem / sufficient condition for optimality):

Let f(⋅) be a differentiable concave function, g₁(⋅), …, g_m(⋅) differentiable quasi-concave functions, let the set X be convex, and let the point x̄ be admissible in problem (∗), with x̄ ∈ int X.

Let there exist, in addition, Lagrange multipliers λ_j ≥ 0, j = 1, …, m, such that with λ₀ = 1 the following relations are satisfied:

∂L(x̄, λ)/∂x_i = 0, i = 1, …, n;

λ_j g_j(x̄) = 0, j = 1, …, m.

Then x̄ is a solution of problem (∗).

The theorem can be reformulated in an obvious way for problems (∗∗) and (∗∗∗). For problem (∗∗∗) the equality constraints can only be linear (this is because an equality constraint g_j(x) = 0 has to be represented by two inequality constraints, g_j(x) ≥ 0 and −g_j(x) ≥ 0, each of which must be given by a quasi-concave function; this is possible only if the constraint is linear).

In another version of the sufficient optimality condition, the assumption that the objective function is concave is replaced by the assumption of quasi-concavity, with the added condition ∇f(x̄) ≠ 0.