Organizacija, Volume 41, Number 5, September-October 2008 — Research papers

Organization in Finance Prepared by Stochastic Differential Equations with Additive and Nonlinear Models and Continuous Optimization

Pakize Taylan 1,2, Gerhard-Wilhelm Weber 1,3

1 Middle East Technical University, Institute of Applied Mathematics, 06531 Ankara, Turkey, gweber@metu.edu.tr
2 Dicle University, Department of Mathematics, 21280 Diyarbakir, Turkey, ptaylan@dicle.edu.tr
3 University of Siegen, Faculty of Economics, Management and Law, 57076 Siegen, Germany

A central element in the organization of financial means by a person, a company or a societal group is the constitution, analysis and optimization of portfolios. This requires the time-dependent modeling of processes. Like many processes in nature, technology and economy, financial processes are subject to stochastic fluctuations. We therefore consider stochastic differential equations (Kloeden, Platen and Schurz, 1994), since in reality, especially in the financial sector, many processes are affected by noise. A drawback is that these equations are hard to represent on a computer and hard to solve. In this paper, we express them in a simplified, approximate manner by means of both a discretization and additive models based on splines. Our parameter estimation refers to the linearly involved spline coefficients, as prepared in (Taylan and Weber, 2007), and to the partially nonlinearly involved probabilistic parameters. We construct a penalized residual sum of squares for this model and treat the occurring nonlinearities by the Gauss-Newton and Levenberg-Marquardt methods for determining the iteration step. We also investigate when the related minimization program can be written as a Tikhonov regularization problem (sometimes called ridge regression), and we treat it using continuous optimization techniques. In particular, we prepare access to the elegant framework of conic quadratic programming.
These convex optimization problems are very well structured, herewith resembling linear programs and, hence, permitting the use of interior point methods (Nesterov and Nemirovskii, 1994).

Key words: Stochastic Differential Equations, Regression, Statistical Learning, Parameter Estimation, Splines, Gauss-Newton Method, Levenberg-Marquardt Method, Smoothing, Stability, Penalty Methods, Tikhonov Regularization, Continuous Optimization, Conic Quadratic Programming

1 Introduction

This paper is devoted to a modeling of financial processes which may serve as a basis of analysis and structural investigation. An important expression of this structure, the composition of its parts - its organization of financial assets - is the portfolio, consisting of securities such as bonds, stocks, certificates, etc. The organization of this portfolio requires pricing, hedging, optimization and optimal control, applied to single assets and price processes as well as to larger portfolios. The present study focusses on the first part of this modeling, called regression, especially parameter estimation. Real-world data from the financial sector and the sciences are often characterized by their great number and by a high variation. At the same time, the data need to be well understood, and they have to serve as the basis of future prediction. Both the real situation and the practical requirements are hard to balance (Hastie, Tibshirani and Friedman, 2001; Taylan and Weber, 2007; Taylan, Weber and Beck, 2007). In fact, the related mathematical modeling faces non-differentiability and a high sensitivity of the model with respect to the slightest perturbations of the data. Our paper addresses this challenge by discussing and elaborating the corresponding parameter estimation problem by means of Tikhonov regularization, conic quadratic programming and nonlinear regression methods.
Herewith, we offer an alternative view of and approach to stochastic differential equations (SDEs), and we invite future research and practical applications. As a preparation, we first introduce our methodology of statistical learning, namely additive models, which we then exploit systematically. Indeed, we will apply them to SDEs, using modern methods of regularization and optimization. We shall address both the linear and the nonlinear case of parameter estimation. By this we develop and improve the results of (Taylan and Weber, 2007).

2 Classical Additive Models

Regression models are very important in many applied areas; the additive model (Buja, Hastie and Tibshirani, 1989) is one of them. These models estimate an additive approximation of the multivariate regression function. For N observations on a response (or dependent) variable Y, denoted by y = (y_1, y_2,..., y_N)^T, measured at the design vectors x_i = (x_{i1}, x_{i2},..., x_{im})^T, the additive model is defined by

(2.1)  Y = β_0 + Σ_{j=1}^m f_j(X_j) + ε,

where the errors ε are independent of the factors X_j, with E(ε) = 0 and Var(ε) = σ². Here, the functions f_j are arbitrary unknown, univariate functions; they are mostly considered to be splines, and we denote their estimates by f̂_j. The standard convention consists in assuming E(f_j(X_j)) = 0, since otherwise there would be a free constant in each of the functions (Hastie, Tibshirani and Friedman, 2001); all such constants are summarized in the intercept (bias) β_0.

2.1 Estimation Equations for the Additive Model

Additive models have a strong motivation as a useful data analytic tool. Each function is estimated by an algorithm proposed by Friedman and Stuetzle (1981) called the backfitting (or Gauss-Seidel) algorithm. As the estimator for β_0, the mean of the response variable Y is used: β̂_0 = E(Y).
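The backfitting cycle just mentioned — estimate β_0 by the mean of Y, then cyclically smooth the partial residuals against each predictor — can be sketched in a few lines. This is a minimal illustration under simplifying assumptions of our own (a k-nearest-neighbour running-mean smoother, two predictors, synthetic data); it is not the authors' implementation.

```python
import math
import random

def smooth(x, r, k=10):
    """Running-mean smoother: estimate E(r | x) at each sample point
    by averaging the partial residuals of the k nearest x-values."""
    out = []
    for xi in x:
        idx = sorted(range(len(x)), key=lambda j: abs(x[j] - xi))[:k]
        out.append(sum(r[j] for j in idx) / k)
    return out

def backfit(X, y, n_iter=20):
    """Backfitting (Gauss-Seidel) for an additive model
    y = b0 + f_1(x_1) + ... + f_m(x_m) + error."""
    N, m = len(y), len(X)
    b0 = sum(y) / N                      # intercept: mean of the response
    f = [[0.0] * N for _ in range(m)]    # function estimates at the data
    for _ in range(n_iter):
        for j in range(m):
            # partial residual r_j = y - b0 - sum_{k != j} f_k(x_k)
            r = [y[i] - b0 - sum(f[k][i] for k in range(m) if k != j)
                 for i in range(N)]
            f[j] = smooth(X[j], r)
            mean_fj = sum(f[j]) / N      # enforce E(f_j) = 0
            f[j] = [v - mean_fj for v in f[j]]
    return b0, f

# toy data: y = 1 + sin(x1) + 0.5*x2^2 + noise
random.seed(0)
N = 200
x1 = [random.uniform(-3, 3) for _ in range(N)]
x2 = [random.uniform(-3, 3) for _ in range(N)]
y = [1 + math.sin(a) + 0.5 * b * b + random.gauss(0, 0.1)
     for a, b in zip(x1, x2)]
b0, (f1, f2) = backfit([x1, x2], y)
```

Any reasonable scatterplot smoother can replace `smooth`; in the paper's setting the smoothers are spline fits.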
This procedure depends on the partial residual against X_j,

(2.2)  r_j = Y − β_0 − Σ_{k≠j} f_k(X_k),

and it consists of estimating each smooth function by holding all the other ones fixed. Then, E(r_j | X_j) = f_j(X_j), which minimizes E(Y − β_0 − Σ_{j=1}^m f_j(X_j))² (Friedman and Stuetzle, 1981; Hastie and Tibshirani, 1987).

3 Stochastic Differential Equations

3.1 Definition (Stochastic Differential Equations)

Many phenomena in nature, technology and economy are modelled by means of a deterministic differential equation with initial value x_0 ∈ D:

ẋ (:= dx/dt) = a(x, t),  x(0) = x_0.

But this type of modeling omits stochastic fluctuations and is not appropriate for, e.g., stock prices. To take stochastic movements into account, stochastic differential equations (SDEs) are used, since they arise in the modeling of many phenomena, such as random dynamics in the physical, biological and social sciences, in engineering and in economy. Solutions of these equations are often diffusion processes and, hence, they are connected to the subject of partial differential equations. We try to find a solution of these equations by an additive approximation (cf. Section 2), well known in the statistical area, using spline functions. Typically, a stochastic differential equation, equipped with an initial value, is given by

(3.1)  Ẋ_t = a(X_t, t) + b(X_t, t) ξ_t  (t ∈ [0, ∞)),  X(0) = x_0,

where a is the deterministic part, b ξ_t is the stochastic part, and ξ_t denotes a generalized stochastic process (Kloeden, Platen and Schurz, 1994; Oksendal, 2003). An example of a generalized stochastic process is white noise. For a generalized stochastic process, derivatives of any order can be defined. Suppose that ξ_t is the generalized derivative of a Wiener process, which is used to model the motion of stock prices that instantly respond to the numerous pieces of upcoming information. A one-dimensional Wiener process (or Brownian motion) is a time-continuous process with the following properties:

1. W_0 = 0 with probability one.
2. W_t ∼ N(0, t) for all t ∈ (0, T]; that is, for each t the random variable W_t is normally distributed with mean E[W_t] = 0 and variance Var[W_t] = E[W_t²] = t.

3. All increments ΔW_t := W_{t+Δt} − W_t on non-overlapping time intervals are independent; that is, the displacements W_{t_2} − W_{t_1} and W_{t_4} − W_{t_3} are independent for all 0 ≤ t_1 < t_2 ≤ t_3 < t_4.

We note that multi-dimensional Wiener processes can be defined similarly. A Wiener process is almost nowhere differentiable. To obtain our approximate and, then, smoothened model, we treat W_t as if it were differentiable (a first approach which is widespread in the literature). Then, white noise ξ_t is defined as ξ_t = Ẇ_t = dW_t/dt, and a Wiener process can be obtained by smoothing the white noise. If we replace ξ_t dt by dW_t in equation (3.1), then this stochastic differential equation can be rewritten as

(3.2)  dX_t = a(X_t, t) dt + b(X_t, t) dW_t,

where a(X_t, t) and b(X_t, t) are the drift and diffusion term, respectively, and X_t is the solution which we try to find based on the experimental data. Equation (3.2) is called an Ito SDE. Here we want to simulate values of X_t, since we do not know its distribution. For this reason, we simulate a discretized version of the SDE.

3.2 Discretization of the SDE

There are a number of discretization schemes available; we choose the Milstein scheme. Then, we represent an approximation X̄_j (j ∈ ℕ) of the process X_{t_j} by

(3.3)  X̄_{j+1} = X̄_j + a(X̄_j, t_j)(t_{j+1} − t_j) + b(X̄_j, t_j)(W_{j+1} − W_j) + ½ (b'b)(X̄_j, t_j)[(W_{j+1} − W_j)² − (t_{j+1} − t_j)],

where the prime "'" denotes the derivative with respect to t.
Now, particularly referring to the finitely many sample (data) points (X̄_j, t_j) (j = 1, 2,..., N), we get

(3.4)  Ẋ̄_j = a(X̄_j, t_j) + b(X̄_j, t_j) (ΔW_j / h_j) + ½ (b'b)(X̄_j, t_j) ((ΔW_j)² / h_j − 1).

Here, the value Ẋ̄_j represents a difference quotient based on the j-th experimental data point X̄_j and on the step lengths Δt_j = h_j := t_{j+1} − t_j between neighbouring sampling times:

Ẋ̄_j := (X̄_{j+1} − X̄_j) / h_j,  if j = 1, 2,..., N−1,
Ẋ̄_N := (X̄_N − X̄_{N−1}) / h_{N−1},  if j = N.

The relations (3.4) cannot be expected to hold exactly, since they involve real data, but we satisfy them best in the approximate sense of least squares of errors. For the sake of convenience, we still write "=" instead of the approximation symbol "≈", and we shall study the least-squares estimation in Subsection 3.3. Since W_t ∼ N(0, t), the increments ΔW_j are independent on non-overlapping intervals and, moreover, Var(ΔW_j) = Δt_j; hence, the normally distributed increments can be simulated with the help of standard normally distributed random numbers Z_j. Herewith, we obtain a discrete model for a Wiener process:

(3.5)  ΔW_j = Z_j √(Δt_j),  Z_j ∼ N(0, 1).

If we use this value in our discretized equation, we obtain

(3.6)  Ẋ̄_j = a(X̄_j, t_j) + b(X̄_j, t_j) (Z_j / √h_j) + ½ (b'b)(X̄_j, t_j)(Z_j² − 1).

For simplicity, we write equation (3.6) as

(3.7)  Ẋ̄_j = G_j + H_j c_j + F_j d_j,

where c_j := Z_j / √h_j, d_j := ½ (Z_j² − 1), G_j := a(X̄_j, t_j), H_j := b(X̄_j, t_j) and F_j := (b'b)(X̄_j, t_j). To find the unknown values of G_j, H_j and F_j, we consider the following optimization problem:

(3.8)  min_y Σ_{j=1}^N | Ẋ̄_j − (G_j + H_j c_j + F_j d_j) |².

Here, y is a vector which comprises all the parameters in the Milstein model. We point out that vector-valued processes could be studied as well, then referring to sums of terms in the Euclidean norm ||·||_2. Data from the stock market, but also from other sources of information or communication, have a high variation. Therefore, we must use a parameter estimation method which diminishes this high variation and gives a smoother approximation to the data.
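The discretization (3.3) with the discrete Wiener increments (3.5) is easy to simulate. The following is a hedged sketch, not the authors' code: for the derivative in the (b'b) correction term we use the usual Milstein convention ∂b/∂x (rather than the time derivative used above), and the drift and diffusion functions chosen for illustration are our own.

```python
import math
import random

def milstein_path(a, b, db_dx, x0, T, N, seed=42):
    """Simulate one path of dX_t = a(X_t,t) dt + b(X_t,t) dW_t with the
    Milstein scheme (3.3), using the discrete Wiener increments (3.5):
    dW_j = Z_j * sqrt(h_j), Z_j ~ N(0,1)."""
    random.seed(seed)
    h = T / N                                   # uniform step lengths h_j
    t, x = 0.0, x0
    path = [x0]
    for _ in range(N):
        z = random.gauss(0.0, 1.0)
        dW = z * math.sqrt(h)                   # eq. (3.5)
        x = (x + a(x, t) * h + b(x, t) * dW
               + 0.5 * db_dx(x, t) * b(x, t) * (dW * dW - h))
        t += h
        path.append(x)
    return path

# illustration: geometric Brownian motion, a(x,t) = mu*x, b(x,t) = sigma*x
mu, sigma = 0.1, 0.2
path = milstein_path(lambda x, t: mu * x,
                     lambda x, t: sigma * x,
                     lambda x, t: sigma,        # db/dx
                     x0=1.0, T=1.0, N=1000)
```

From such a simulated path, the difference quotients Ẋ̄_j of (3.4) can be formed directly as `(path[j+1] - path[j]) / h`.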
3.3 Least-Squares Estimation with Splines

Splines are flexible, and they allow us to avoid the large oscillations observed for high-degree polynomial approximation. We recall that these functions can be described as linear combinations of basis splines and approximate the data (X̄_j, t_j) smoothly. Therefore, we approximate each of the functions underlying the values G_j = a(X̄_j, t_j), H_j = b(X̄_j, t_j) and F_j = (b'b)(X̄_j, t_j) in an additive way built on basis splines. This treatment is very useful for the stability of the model in the presence of the many and highly varying data. Let us use basis splines for each function, characterized by a separation of variables (coordinates); e.g., in equation (3.7):

(3.9)
G_j = a(X̄_j, t_j) = α_0 + Σ_{p=1}^2 f_p(U_{j,p}) = α_0 + Σ_{p=1}^2 Σ_{l=1}^{d_p^f} α_l^p B_l^p(U_{j,p}),
H_j c_j = b(X̄_j, t_j) c_j = β_0 + Σ_{r=1}^2 g_r(U_{j,r}) = β_0 + Σ_{r=1}^2 Σ_{m=1}^{d_r^g} β_m^r C_m^r(U_{j,r}),
F_j d_j = (b'b)(X̄_j, t_j) d_j = γ_0 + Σ_{s=1}^2 h_s(U_{j,s}) = γ_0 + Σ_{s=1}^2 Σ_{n=1}^{d_s^h} γ_n^s D_n^s(U_{j,s}),

where U_j = (U_{j,1}, U_{j,2})^T := (X̄_j, t_j)^T. Let us give an example of how one can obtain bases of splines. If we denote the k-th order basis spline by B_{j,k}, a polynomial of degree k−1 with knots, say, x_j, then a great benefit of using basis splines is provided by the following recursive algorithm (De Boor, 2001):

(3.10)
B_{j,1}(x) = 1, if x_j ≤ x < x_{j+1}; 0, otherwise,
B_{j,k}(x) = ((x − x_j) / (x_{j+k−1} − x_j)) B_{j,k−1}(x) + ((x_{j+k} − x) / (x_{j+k} − x_{j+1})) B_{j+1,k−1}(x).

3.4 The Penalized Residual Sum of Squares Problem for the SDE

We construct the penalized residual sum of squares for the SDE in the following form:

(3.11)  PRSS(θ, f, g, h) := Σ_{j=1}^N (Ẋ̄_j − (G_j + H_j c_j + F_j d_j))² + Σ_{p=1}^2 λ_p ∫ [f_p''(U_p)]² dU_p + Σ_{r=1}^2 μ_r ∫ [g_r''(U_r)]² dU_r + Σ_{s=1}^2 φ_s ∫ [h_s''(U_s)]² dU_s.

Here, for convenience, we use the integral symbol "∫" as a dummy in the sense of ∫_{a_k}^{b_k}, where [a_k, b_k] (k = p, r, s) are appropriately large intervals on which the respective integration takes place. Furthermore, λ_p, μ_r, φ_s ≥ 0 are smoothing (or penalty) parameters; they represent a tradeoff between the first and the second term. Large values of λ_p, μ_r, φ_s yield smoother curves, smaller values result in more fluctuation.
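The Cox-de Boor recursion (3.10) is straightforward to implement; the following self-contained sketch (function and variable names are our own choice) evaluates one basis spline and checks, on a fully covered interior interval, the partition-of-unity property of B-splines.

```python
def bspline(j, k, x, knots):
    """Value at x of the k-th order basis spline B_{j,k} (degree k-1)
    over the given knot sequence, via the Cox-de Boor recursion (3.10)."""
    if k == 1:
        return 1.0 if knots[j] <= x < knots[j + 1] else 0.0
    left, right = 0.0, 0.0
    den_l = knots[j + k - 1] - knots[j]
    if den_l > 0:
        left = (x - knots[j]) / den_l * bspline(j, k - 1, x, knots)
    den_r = knots[j + k] - knots[j + 1]
    if den_r > 0:
        right = (knots[j + k] - x) / den_r * bspline(j + 1, k - 1, x, knots)
    return left + right

# cubic B-splines (order k = 4) on the equidistant knots 0, 1, ..., 8;
# on the interior interval [3, 5] the basis sums to one (partition of unity)
knots = list(range(9))
k = 4
x = 4.3
vals = [bspline(j, k, x, knots) for j in range(len(knots) - k)]
```

The guard on zero denominators implements the usual 0/0 := 0 convention for repeated knots.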
If we use the additive form based on the basis splines for each function, then PRSS becomes

(3.12)  Σ_{j=1}^N ( Ẋ̄_j − ( α_0 + Σ_{p=1}^2 Σ_{l=1}^{d_p^f} α_l^p B_l^p(U_{j,p}) + β_0 + Σ_{r=1}^2 Σ_{m=1}^{d_r^g} β_m^r C_m^r(U_{j,r}) + γ_0 + Σ_{s=1}^2 Σ_{n=1}^{d_s^h} γ_n^s D_n^s(U_{j,s}) ) )² + penalty terms as in (3.11).

For simplicity, we introduce the following matrix notation:

(3.13)  G_j + H_j c_j + F_j d_j = A_j θ,

where A_j = (B_j, C_j, D_j), B_j = (1, B_j^1, B_j^2), C_j = (1, C_j^1, C_j^2), D_j = (1, D_j^1, D_j^2), with

B_j^p = (B_1^p(U_{j,p}), B_2^p(U_{j,p}),..., B_{d_p^f}^p(U_{j,p}))  (p = 1, 2),
C_j^r = (C_1^r(U_{j,r}), C_2^r(U_{j,r}),..., C_{d_r^g}^r(U_{j,r}))  (r = 1, 2),
D_j^s = (D_1^s(U_{j,s}), D_2^s(U_{j,s}),..., D_{d_s^h}^s(U_{j,s}))  (s = 1, 2),

and θ = (α^T, β^T, γ^T)^T, α = (α_0, α^{1T}, α^{2T})^T, α^p = (α_1^p, α_2^p,..., α_{d_p^f}^p)^T (p = 1, 2), β = (β_0, β^{1T}, β^{2T})^T, β^r = (β_1^r, β_2^r,..., β_{d_r^g}^r)^T (r = 1, 2), γ = (γ_0, γ^{1T}, γ^{2T})^T, γ^s = (γ_1^s, γ_2^s,..., γ_{d_s^h}^s)^T (s = 1, 2). Now, we can write the residual sum of squares as the squared length of the difference between Ẋ̄ and Aθ, where A is the matrix containing the row vectors A_j, and Ẋ̄ is our vector of difference quotients standing for the change rates of the experimental data:

(3.14)  Σ_{j=1}^N ( Ẋ̄_j − A_j θ )² = || Ẋ̄ − Aθ ||²,

where A = (A_1^T, A_2^T,..., A_N^T)^T and Ẋ̄ = (Ẋ̄_1, Ẋ̄_2,..., Ẋ̄_N)^T. Indeed, we get a discretized form of each integration term in the following way:

(3.15)  ∫ [f_p''(U_p)]² dU_p ≈ Σ_{j=1}^{N−1} [f_p''(U_{j,p})]² (U_{j+1,p} − U_{j,p}).

Using such Riemann sums, we can discretize and represent each integral by the squared length of a vector, namely,

(3.16)
∫ [f_p''(U_p)]² dU_p ≈ Σ_{j=1}^{N−1} ( B_j^{p''} α^p )² (U_{j+1,p} − U_{j,p}) = || A_B^p α^p ||²  (p = 1, 2),
∫ [g_r''(U_r)]² dU_r ≈ || A_C^r β^r ||²  (r = 1, 2),
∫ [h_s''(U_s)]² dU_s ≈ || A_D^s γ^s ||²  (s = 1, 2).
Here,

A_B^p := ( B_1^{p''T} √u_1, B_2^{p''T} √u_2,..., B_{N−1}^{p''T} √u_{N−1} )^T,  u_j := U_{j+1,p} − U_{j,p},
A_C^r := ( C_1^{r''T} √v_1, C_2^{r''T} √v_2,..., C_{N−1}^{r''T} √v_{N−1} )^T,  v_j := U_{j+1,r} − U_{j,r},
A_D^s := ( D_1^{s''T} √w_1, D_2^{s''T} √w_2,..., D_{N−1}^{s''T} √w_{N−1} )^T,  w_j := U_{j+1,s} − U_{j,s}  (j = 1, 2,..., N−1).

Using this discretized form in (3.11), PRSS looks as follows:

(3.17)  PRSS(θ, f, g, h) = || Ẋ̄ − Aθ ||² + Σ_{p=1}^2 λ_p || A_B^p α^p ||² + Σ_{r=1}^2 μ_r || A_C^r β^r ||² + Σ_{s=1}^2 φ_s || A_D^s γ^s ||².

But, rather than a single parameter, there is a finite sequence of tradeoff or penalty parameters λ = (λ_1, λ_2, μ_1, μ_2, φ_1, φ_2), such that this equation is not yet a Tikhonov regularization problem with one such parameter. For this reason, we make a uniform penalization by taking the same value λ_p = μ_r = φ_s = λ =: δ² for each term. Then, our approximation of PRSS can be rearranged as

(3.18)  PRSS(θ, f, g, h) = || Ẋ̄ − Aθ ||² + δ² || Lθ ||²,

with the (6(N−1) × m)-matrix

L := ( 0  A_B^1  0      0  0      0      0  0      0
       0  0      A_B^2  0  0      0      0  0      0
       0  0      0      0  A_C^1  0      0  0      0
       0  0      0      0  0      A_C^2  0  0      0
       0  0      0      0  0      0      0  A_D^1  0
       0  0      0      0  0      0      0  0      A_D^2 ),

where the zero columns correspond to the intercepts α_0, β_0 and γ_0, which remain unpenalized. Herewith, based on the basis splines, we have identified the minimization of PRSS for a stochastic differential equation as a Tikhonov regularization problem (Aster, Borchers and Thurber, 2005):

(3.19)  min_θ || Aθ − Ẋ̄ ||² + δ² || Lθ ||²,

with penalty parameter λ = δ². This regularization method is also known as ridge regression; it is very helpful for problems whose solution does not exist, or is not unique, or is not stable under perturbations of the data. The MATLAB Regularization Toolbox can be used for its solution (Aster, Borchers and Thurber, 2005).

4 An Alternative Solution of the Tikhonov Regularization Problem by Conic Quadratic Programming

4.1 Construction of the Conic Quadratic Programming Problem

We just mentioned that we can solve a Tikhonov regularization problem with the MATLAB Regularization Toolbox. In addition, we shall explain how to treat our problem by continuous optimization techniques, which we consider a complementary key technology and alternative to the concept of Tikhonov regularization.
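Problem (3.19) also admits a closed-form solution through the normal equations (AᵀA + δ²LᵀL)θ = AᵀẊ̄. The following is a minimal sketch on a tiny synthetic system (the matrices and data are invented for illustration, and the Gaussian elimination solver is generic), not a treatment of the full spline design matrices above.

```python
def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(M)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c]
                                for c in range(r + 1, n))) / aug[r][r]
    return x

def tikhonov(A, xdot, L, delta):
    """theta minimizing ||A theta - xdot||^2 + delta^2 ||L theta||^2,
    via the normal equations (A^T A + delta^2 L^T L) theta = A^T xdot."""
    n = len(A[0])
    AtA = [[sum(A[k][i] * A[k][j] for k in range(len(A))) for j in range(n)]
           for i in range(n)]
    LtL = [[sum(L[k][i] * L[k][j] for k in range(len(L))) for j in range(n)]
           for i in range(n)]
    M = [[AtA[i][j] + delta ** 2 * LtL[i][j] for j in range(n)]
         for i in range(n)]
    rhs = [sum(A[k][i] * xdot[k] for k in range(len(A))) for i in range(n)]
    return solve(M, rhs)

# tiny illustration: 4 observations, 2 coefficients, identity penalty
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
xdot = [1.1, 1.9, 3.1, 3.9]
L = [[1.0, 0.0], [0.0, 1.0]]
theta0 = tikhonov(A, xdot, L, delta=0.0)   # plain least squares
theta1 = tikhonov(A, xdot, L, delta=2.0)   # heavier penalty shrinks theta
```

Increasing δ shrinks the coefficient vector toward zero, trading fit for stability, exactly the tradeoff discussed for (3.18).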
In particular, we apply the elegant framework of conic quadratic programming (CQP). Indeed, based on an appropriate, learning-based choice of a bound M, we reformulate our Tikhonov regularization as the following optimization problem:

(4.1)  min_θ || Aθ − Ẋ̄ ||²,  subject to  || Lθ ||² ≤ M.

Introducing an additional variable t, this problem can equivalently be written as

min_{t,θ} t,  subject to  || Aθ − Ẋ̄ || ≤ t,  || Lθ || ≤ √M,

i.e., as a conic quadratic program with the cone conditions

( Aθ − Ẋ̄, t )^T ∈ L^{N+1},  ( Lθ, √M )^T ∈ L^{6(N−1)+1},

where L^{N+1} and L^{6(N−1)+1} are the (N+1)- and (6(N−1)+1)-dimensional ice-cream (or second-order, or Lorentz) cones, defined by

L^n := { x = (x_1, x_2,..., x_n)^T ∈ ℝ^n | x_n ≥ √(x_1² + x_2² + ... + x_{n−1}²) }  (n ≥ 2).

Then, we can also write the dual of the latter problem as

(4.6)  max (Ẋ̄^T, 0) κ_1 + (0_{6(N−1)}^T, −√M) κ_2
subject to
( 0_N^T 1 ; A^T 0_m ) κ_1 + ( 0_{6(N−1)}^T 0 ; L^T 0_m ) κ_2 = ( 1 ; 0_m ),
κ_1 ∈ L^{N+1},  κ_2 ∈ L^{6(N−1)+1}.

Moreover, (t, θ, χ_1, χ_2, κ_1, κ_2) is a primal-dual optimal solution if the following constraints hold in the corresponding ice-cream (second-order, or Lorentz) cones:

(4.7)
χ_1 = ( 0_N A ; 1 0_m^T ) ( t ; θ ) − ( Ẋ̄ ; 0 ) = ( Aθ − Ẋ̄ ; t ),
χ_2 = ( 0_{6(N−1)} L ; 0 0_m^T ) ( t ; θ ) + ( 0_{6(N−1)} ; √M ) = ( Lθ ; √M ),
κ_1^T χ_1 = 0,  κ_2^T χ_2 = 0,
χ_1, κ_1 ∈ L^{N+1},  χ_2, κ_2 ∈ L^{6(N−1)+1}.

4.2 On Solution Methods for Conic Quadratic Programming

For solving "well-structured" convex problems like conic quadratic problems, there are interior point methods (IPMs), first introduced by Karmarkar (1984). IPMs classically are based on the interior points of the feasible set of the optimization problem; this set is assumed to be closed and convex. Then, an interior penalty function (barrier) F(x) is chosen, well defined (and smooth and strongly convex) in the interior of the feasible set. This function "blows up" as a sequence from the interior approaches a boundary point of the feasible set (Nesterov and Nemirovskii, 1994). Of great importance are primal-dual IPMs, which refer to the pair of primal and dual variables. The canonical barrier function for the second-order (Lorentz) cones

L^n := { x = (x_1, x_2,..., x_n)^T ∈ ℝ^n | x_n ≥ √(x_1² + ... + x_{n−1}²) }  (n ≥ 2)

is defined by

L^n(x) := −ln(x_n² − x_1² − ... − x_{n−1}²) = −ln(x^T J_n x),  where J_n = ( −I_{n−1} 0 ; 0 1 ).
The parameter of this barrier is α(L^n) = 2. These algorithms have the advantage of employing the structure of the problem, of allowing better complexity bounds, and of exhibiting a much better practical performance.

5 On Nonlinear Dependence on Parameters and Their Estimation

Let us return to equation (3.2) again, with two ways of generalization. (i) The model functions a(·) and b(·) may not only depend on the parameters which appear as coefficients in the linear combination of basis splines, but also on truly probabilistic (stochastic) parameters. (ii) Differently from the earlier linear dependence on the parameters, the dependence on the newly considered parameters may be nonlinear. In that case, we should use a nonlinear parameter estimation method such as, e.g., the Gauss-Newton or the Levenberg-Marquardt method (Nash and Sofer, 1996). Concerning (i), for example, we consider the following stochastic differential equation:

dX_t = μ X_t dt + σ X_t dW_t,  X(0) = x_0,

where X_t = X(t) denotes the (random) price of a stock at time t ≥ 0, μ > 0 and σ are parameters called the drift and volatility of the stock, and x_0 is the starting price. Then, referring to the finitely many sample (data) points (X̄_k, t_k) (k = 1, 2,..., N), we get

Ẋ̄_k = μ X̄_k + σ X̄_k (ΔW_k / h_k) + ½ σ² (P'P)(t_k) ((ΔW_k)² / h_k − 1) =: g(X̄_k, μ, σ).

To determine the unknown values of μ and σ, we consider the following optimization problem:

(5.1)  min_β f(β) := ½ Σ_{k=1}^N ( Ẋ̄_k − g(X̄_k, μ, σ) )² = ½ Σ_{k=1}^N f_k²(β).

Here, β = (μ, σ)^T and P(X) := X, hence P'(t_k) := 0 (since P does not depend on t), and the objective function f(β) of parameter estimation is defined linearly in the squared auxiliary functions f_k (k = 1, 2,..., N). This problem representation would also hold true if the quadratic term ½ σ² (P'P)(t_k)((ΔW_k)²/h_k − 1) did not vanish, and it holds in many further examples where (ii) the parametric dependence is indeed nonlinear. Nonlinear parametric dependence can occur by the composition of stochastic processes.
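For the stock-price equation above, the drift and volatility can be recovered from a discretely observed path. The sketch below does not implement the least-squares problem (5.1) itself (which involves the simulated increments ΔW_k); instead it uses the standard moment estimators based on log-returns, a swapped-in textbook technique, on an exactly simulated path.

```python
import math
import random

def simulate_gbm(mu, sigma, x0, h, n, seed=1):
    """Exact simulation of dX = mu*X dt + sigma*X dW on a grid of step
    length h: X_{k+1} = X_k * exp((mu - sigma^2/2) h + sigma * dW)."""
    random.seed(seed)
    x, path = x0, [x0]
    for _ in range(n):
        dw = random.gauss(0.0, math.sqrt(h))
        x *= math.exp((mu - 0.5 * sigma ** 2) * h + sigma * dw)
        path.append(x)
    return path

def estimate_mu_sigma(path, h):
    """Moment estimates of drift and volatility from log-returns:
    r_k ~ N((mu - sigma^2/2) h, sigma^2 h)."""
    r = [math.log(path[k + 1] / path[k]) for k in range(len(path) - 1)]
    n = len(r)
    mean_r = sum(r) / n
    var_r = sum((x - mean_r) ** 2 for x in r) / (n - 1)
    sigma_hat = math.sqrt(var_r / h)
    mu_hat = mean_r / h + 0.5 * sigma_hat ** 2
    return mu_hat, sigma_hat

# daily grid (h = 1/250) over a long horizon; true mu = 0.1, sigma = 0.2
path = simulate_gbm(mu=0.1, sigma=0.2, x0=1.0, h=1 / 250, n=25000)
mu_hat, sigma_hat = estimate_mu_sigma(path, h=1 / 250)
```

As is well known, the volatility is estimated far more accurately than the drift for a fixed horizon, which is one reason the paper's regularized formulations are attractive for noisy financial data.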
For example, in financial modelling, the dynamics of the wealth V_t from time t to t + dt, up to the maturity time T, may be given by

dV_t = ( (q_t^T(m − re) + r) V_t − c_t ) dt + q_t^T σ V_t dW_t,  V_0 = v_0,

where q_t is the vector of fractions of wealth invested in the risky assets at time t, and c_t is the consumption at time t. We can easily identify both

a(t, V_t, c_t, q_t; r, m) := (q_t^T(m − re) + r) V_t − c_t  and  b(t, V_t, q_t; σ) := q_t^T σ V_t.

Here, r is the short-term interest rate, m denotes the vector of expected rates of return, e is the vector consisting of ones, and σ stands for the volatility matrix of the risky assets. The entire parameter β := (r, m^T, σ)^T (arranged as a column vector) is assumed to be constant through time (Akume, 2007). Finally, W is a Wiener process with the property that dW_t is N(0, dt)-distributed. While the dependence of the right-hand side of the stochastic differential equation on β is linear, nonlinear parametric dependencies can occur via the insertion of the processes c_t and q_t into a and b, but also if r becomes a stochastic process r_t, e.g., in the following way. Namely, as a direct example of nonlinearity, the stochastic interest rate r_t may be given by

dr_t = a(R − r_t) dt + σ_t r_t^φ dW_t,

where σ_t and W_t are the volatility and a Brownian motion, respectively. Here, a is a positive constant, and the drift term a(R − r_t) is positive for R > r_t and negative for R < r_t (Seydel, 2003). We denote a(t, r_t; R) := a(R − r_t) and b(t, r_t, σ_t; φ) := σ_t r_t^φ. This process for the interest rate can be attached to a price or wealth process. By such interest rate processes and by the composition of stochastic processes, further parameters, such as (R, φ), can implicitly and in a partially nonlinear way enter the interest rate dynamics r_t and the processes beyond that dynamics.
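The mean-reverting behaviour of such a rate process is easy to see in simulation. The following Euler-Maruyama sketch uses parameter values of our own choosing and fixes the exponent φ = 1/2 (a CIR-type model, one common special case); it is an illustration, not the authors' calibration.

```python
import math
import random

def simulate_rate(a, R, sigma, phi, r0, h, n, seed=7):
    """Euler-Maruyama path of the mean-reverting rate
    dr = a*(R - r) dt + sigma * r^phi dW.
    The max(r, 0) guard keeps the diffusion term real if the Euler
    step ever dips slightly below zero."""
    random.seed(seed)
    r, path = r0, [r0]
    for _ in range(n):
        dw = random.gauss(0.0, math.sqrt(h))
        r = r + a * (R - r) * h + sigma * max(r, 0.0) ** phi * dw
        path.append(r)
    return path

# pull toward the long-run level R = 0.05 starting from r0 = 0.01
path = simulate_rate(a=2.0, R=0.05, sigma=0.1, phi=0.5,
                     r0=0.01, h=1 / 250, n=2500)
avg_tail = sum(path[-500:]) / 500      # time average over the last 2 years
```

After the initial transient, the path fluctuates around the long-run level R, mirroring the sign behaviour of the drift term a(R − r_t) described above.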
In fact, the financial sector, with the modeling and prediction of stock prices and interest rates, is the most prominent application area here. Moreover, mixed linear-nonlinear dependences on the parameters are possible, due to linearly and nonlinearly involved parameters of various kinds. The optimization problem (5.1) is a nonlinear least-squares estimation (or nonlinear regression) problem. In the context of data fitting, each of the functions f_k corresponds to a residual of our discrete approximation problem, as it may arise in mathematical modelling or in an inverse problem. Let us present the basic ideas of nonlinear regression theory with the help of (Nash and Sofer, 1996). Now, (5.1) can be represented in vector notation:

(5.2)  min_β f(β) := ½ F^T(β) F(β),

where F is the vector-valued function F(β) := (f_1(β),..., f_N(β))^T (β ∈ ℝ^p), and where the factor ½ serves for a more "optimal" normalization of the derivatives. In fact, by the chain rule we obtain

(5.3)  ∇f(β) = ∇F(β) F(β),

where ∇F(β) is a (p × N)-matrix-valued function. By row-wise differentiation of ∇f(β) and using this gradient representation, we obtain the Hessian matrix of f:

(5.4)  ∇²f(β) = ∇F(β) ∇F^T(β) + Σ_{k=1}^N f_k(β) ∇²f_k(β).

Let β* be a solution of (5.1) and suppose that f(β*) = 0. Then, f_k(β*) = 0 (k = 1, 2,..., N), i.e., all the residuals vanish and the model fits the data without error. As a result, F(β*) = 0 and, by (5.3), ∇f(β*) = 0, which just confirms our first-order necessary optimality condition. Furthermore, the Hessian of f becomes ∇²f(β*) = ∇F(β*) ∇F^T(β*), which is a positive semi-definite matrix, just as expected from the second-order necessary optimality condition. In the case where ∇F(β*) is a matrix of full rank, i.e., rank(∇F(β*)) = p, the Hessian ∇²f(β*) is positive definite, i.e., the second-order sufficient optimality condition is satisfied, so that β* is a strict local minimizer.
From this basic idea, a number of specialized nonlinear least-squares methods derive. The simplest of these methods, called Gauss-Newton, uses this approximative description in an indirect way. It replaces the Hessian in the Newton equation

(5.5)  ∇²f(β) q = −∇f(β)

by its first summand, such that we obtain the relation

(5.6)  ∇F(β) ∇F^T(β) q = −∇F(β) F(β),

where q is the Gauss-Newton increment q = β¹ − β⁰. If F(β*) ≈ 0 and rank(∇F(β*)) = p (p ≤ N), then, near a solution β*, Gauss-Newton behaves like Newton's method, but we need not pay the computational cost of calculating second derivatives. Gauss-Newton's method sometimes behaves poorly if there are one or several outliers, i.e., if the model does not fit the data well, or if ∇F(β*) is not of full rank p. In these cases, the approximation of the Hessian is poor. Many other nonlinear least-squares methods can be interpreted as using an approximation of the second additive term in the formula for the Hessian, i.e., of

(5.7)  Σ_{k=1}^N f_k(β) ∇²f_k(β).

Levenberg-Marquardt's method uses the simplest of these approximations:

(5.8)  Σ_{k=1}^N f_k(β) ∇²f_k(β) ≈ λ I_p,

with some scalar λ > 0. This approximation yields the following linear system:

(5.9)  ( ∇F(β) ∇F^T(β) + λ I_p ) q = −∇F(β) F(β).

The Levenberg-Marquardt method is often implemented in the context of a trust-region strategy. There, q is obtained, e.g., by minimizing a quadratic model of the objective function with the Gauss-Newton approximation of the Hessian:

(5.10)  min_q Q(q) := f(β) + q^T ∇F(β) F(β) + ½ q^T ∇F(β) ∇F^T(β) q,  subject to  || Lq ||² ≤ M'.
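The Levenberg-Marquardt iteration (5.9) can be sketched on a tiny example. The model y ≈ b1·exp(b2·t), the data, and the fixed damping λ are our own choices for illustration (practical implementations adapt λ or use the trust-region form (5.10)); with p = 2 parameters, the linear system is solved by Cramer's rule.

```python
import math

def levenberg_marquardt(t, y, beta, lam=1e-3, n_iter=50):
    """Levenberg-Marquardt iteration (5.9) for the model
    y ~ b1 * exp(b2 * t), residuals f_k = y_k - b1 * exp(b2 * t_k).
    Solves the 2x2 system (grad F grad F^T + lam*I) q = -grad F * F."""
    b1, b2 = beta
    n = len(t)
    for _ in range(n_iter):
        F = [y[k] - b1 * math.exp(b2 * t[k]) for k in range(n)]
        # rows of the (p x N) matrix grad F (here p = 2)
        J1 = [-math.exp(b2 * t[k]) for k in range(n)]               # df/db1
        J2 = [-b1 * t[k] * math.exp(b2 * t[k]) for k in range(n)]   # df/db2
        a11 = sum(v * v for v in J1) + lam
        a22 = sum(v * v for v in J2) + lam
        a12 = sum(J1[k] * J2[k] for k in range(n))
        g1 = -sum(J1[k] * F[k] for k in range(n))
        g2 = -sum(J2[k] * F[k] for k in range(n))
        det = a11 * a22 - a12 * a12
        q1 = (g1 * a22 - a12 * g2) / det        # Cramer's rule
        q2 = (a11 * g2 - g1 * a12) / det
        b1, b2 = b1 + q1, b2 + q2
    return b1, b2

# noise-free data from b1 = 2, b2 = -0.5; recover them from a rough start
t = [0.1 * k for k in range(20)]
y = [2.0 * math.exp(-0.5 * tk) for tk in t]
b1, b2 = levenberg_marquardt(t, y, beta=(1.0, 0.0))
```

Since the data are noise-free, the residuals vanish at the solution and the fixed damping λ introduces no bias in the limit: the iteration converges to the exact parameters.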
Then, we consider the general problem form (Nemirovski, 2002)

min c^T x,  subject to  || D_i x − d_i || ≤ p_i^T x − q_i  (i = 1, 2),

with

x = (t, q^T)^T,  c = (1, 0,..., 0)^T,
D_1 = (0, ∇F^T(β)),  d_1 = −F(β),  p_1 = (1, 0,..., 0)^T,  q_1 = 0,
D_2 = (0, L),  d_2 = 0,  p_2 = 0_{p+1},  q_2 = −√M'.

6 Concluding Remarks

This paper gave a new contribution to problems related to SDEs, using regression under an additive model or a nonlinear formulation, as a preparatory step on the way to organizing assets in terms of portfolios. We made modern methods of inverse problems and continuous optimization, especially CQP and methods from nonlinear regression, accessible and usable. Herewith, a bridge has been offered between statistical learning and data mining on the one hand, and the powerful tools prepared for well-structured convex optimization problems (Boyd and Vandenberghe, 2004) and Newton- and steepest-descent type regression methods (Nash and Sofer, 1996) on the other hand. We hope that future research and theoretical and applied achievements on this fruitful interface will be stimulated by our paper. The study on the prediction of credit-default risk (Iscanoglu Çekiç, Weber and Taylan, 2007) has already shown the value of our generalized additive model approach. Indeed, further combined applications of our methods to real-world data from the areas of finance, science and technology may be expected, where our contribution can be utilized.

References

Akume, D. A. (2007). Risk Constrained Dynamic Portfolio Management, Doctoral Thesis, University of Yaounde I, Department of Mathematics.
Aster, R., Borchers, B. & Thurber, C. (2005). Parameter Estimation and Inverse Problems, Elsevier Academic Press, London.
Boyd, S. & Vandenberghe, L. (2004). Convex Optimization, Cambridge University Press, Cambridge.
Buja, A., Hastie, T. & Tibshirani, R. (1989). Linear smoothers and additive models, The Annals of Statistics 17(2): 453-510.
De Boor, C. (2001). A Practical Guide to Splines, Springer Verlag, New York.
Friedman, J. H. & Stuetzle, W. (1981).
Projection pursuit regression, J. Amer. Statist. Assoc. 76: 817-823.
Hastie, T. & Tibshirani, R. (1987). Generalized additive models: some applications, J. Amer. Statist. Assoc. 82(398): 371-386.
Hastie, T., Tibshirani, R. & Friedman, J. H. (2001). The Elements of Statistical Learning, Springer Verlag, New York.
Iscanoglu Çekiç, A., Weber, G.-W. & Taylan, P. (2007). Predicting default probabilities with generalized additive models for emerging markets, Invited Lecture, Graduate Summer School on New Advances in Statistics, METU, August 11-24, 2007. http://144.122.137.55/gweber/.
Karmarkar, N. (1984). A new polynomial-time algorithm for linear programming, Combinatorica 4(4): 373-395.
Kloeden, P. E., Platen, E. & Schurz, H. (1994). Numerical Solution of SDE Through Computer Experiments, Springer Verlag, New York.
Nash, S. G. & Sofer, A. (1996). Linear and Nonlinear Programming, McGraw-Hill, New York.
Nemirovski, A. (2002). Five Lectures on Modern Convex Optimization. Available from: http://iew3.technion.ac.il/Labs/Opt/opt/LN/Final.pdf (Accessed 5 April 2007).
Nesterov, Y. E. & Nemirovskii, A. S. (1994). Interior-Point Polynomial Algorithms in Convex Programming, SIAM Publications, Philadelphia.
Oksendal, B. K. (2003). Stochastic Differential Equations: An Introduction with Applications, Springer, Berlin.
Seydel, R. U. (2003). Tools for Computational Finance, Springer, Berlin.
Taylan, P. & Weber, G.-W. (2007). New approaches to regression in financial mathematics by additive models, J. Comp. Techn. 12(2): 3-22.
Taylan, P. & Weber, G.-W. (2007). Approximation of stochastic differential equations by additive models using splines and conic programming, to appear in: Dubois, D. M. (ed.), Proceedings of CASYS'07, Eighth International Conference on Computing Anticipatory Systems, American Institute of Physics.
Taylan, P., Weber, G.-W. & Beck, A.
(2007). New approaches to regression by generalized additive models and continuous optimization for modern applications in finance, science and technology, Optimization 56(5-6): 675-698.

Pakize Taylan joined the Mathematics Department of the Art and Science Faculty of Dicle University in Diyarbakir, Turkey, in February 1990. She received an M.Sc. degree in 1993 and a Ph.D. degree in 1999 from Dicle University in the field of mathematical statistics, and she became an assistant professor in 2000. She has taught courses in Turkish Language, Probability and Statistics, and Mathematical Statistics. She worked at Middle East Technical University, Ankara, in a post-doctoral position. Her research interests are linear regression, nonlinear regression, spline regression, optimization and financial mathematics. She has published journal articles and has presented her studies at national and international conferences. She currently serves as head of Applied Mathematics at the Department of Mathematics, Art and Science Faculty, of her university.

Gerhard-Wilhelm Weber works at the IAM of METU, Ankara, Turkey, in the Department of Financial Mathematics and the Department of Scientific Computing; he is Assistant to the Director. Furthermore, he is Guest Professor at the Faculty of Economics, Management and Law of the University of Siegen, Germany, and a collaborator at the Center for Research on Optimization and Control (CEOC), University of Aveiro, Portugal. He received his Diploma and Doctorate in Mathematics and Economics at RWTH Aachen, and his Habilitation at TU Darmstadt. He held professorships by proxy at the Institute of Mathematics, University of Cologne, and at the Faculty of Mathematics, TU Chemnitz, Germany; then he worked at the Cologne Bioinformatics Center. Since 2003 he has been at METU, Ankara.
He chaired EUROPT, where he is now Past Chair; he is Past Chair of the EURO Working Group on OR in Computational Biology, Bioinformatics and Medicine, Vice Chair of the EURO Working Group on OR for Development, Honorary Chair of the EURO Working Group on Complex Societal Problems, and he represents the German OR Society in Turkey. Prof. Weber has received a number of awards and has been a member of different research projects. His research interests lie in the areas of continuous optimization, financial mathematics, operations research, optimal control, selected topics from discrete optimization, dynamical systems, statistical learning and computational statistics, inverse problems, topology, complexity theory, computational biology and bioinformatics, environmental protection, and the sector of development and societal complexity. Prof. Weber has published numerous articles, guest-edited 12 special issues, serves on editorial boards of journals, and has written many referee and other reports. He co-organized about 50 scientific events (among them also large ones of the EURO and INFORMS series) and is a member of about 20 scientific organizations. Prof. Weber has given presentations all over the world, at scientific events and in seminars.