Motion Representations for Learning Robots and Human-Robot Interaction
Polynomial Splines and Movement Primitives

Aleš Ude
Jožef Stefan Institute, Ljubljana
Faculty of Electrical Engineering, University of Ljubljana

__________________________________________________________________________

The catalogue record of the publication (CIP) was prepared by the National and University Library in Ljubljana
COBISS.SI-ID 222235395
ISBN 978-961-243-475-5 (PDF)

___________________________________________________________________________

URL: https://www.ijs.si/usr/aude/teaching/motion_reps.pdf

Copyright © 2025 Založba FE. All rights reserved. Reproduction (including photocopying) of this work, in whole or in part, without the prior permission of Založba FE is prohibited.

Title: Motion Representations for Learning Robots and Human-Robot Interaction
Author: Aleš Ude
Publisher: Založba FE, Ljubljana
Issued by: Fakulteta za elektrotehniko, Ljubljana; Institut Jožef Stefan, Ljubljana
Editor: prof. dr. Sašo Tomažič
Reviewers: Matjaž Mihelj, Andrej Gams
Place and year of publication: Ljubljana, 2025
1st electronic edition

Preface

The performance of complex tasks in robotics often involves executing a variety of movements that require distinct control policies. For instance, using a tool might involve picking it up, positioning it appropriately, and performing a task – all of which demand separate motion trajectories. This textbook focuses on the specification of robot motion, bridging the gap between task-level planning and precise motion execution.

Traditional industrial robots often rely on straightforward linear or circular movements, whereas robots in unstructured environments require more complex and adaptive trajectories. Representing and learning such motions presents a significant challenge, as they are difficult to program directly using conventional methods. Consequently, learning from demonstration and reinforcement learning have become key methodologies for acquiring flexible motion representations.
These methods not only enable robots to replicate demonstrated behaviors but also allow them to adapt learned skills to novel scenarios, thereby enhancing their versatility.

In dynamic environments, static time-parameterized representations, such as polynomials and splines, are often insufficient due to their inability to accommodate interruptions or environmental changes. To mitigate this, motion representations based on autonomous dynamical systems have been developed, eliminating the dependency on time as a trajectory parameter. These representations ensure that desired motions remain adaptable to variations in task conditions, enabling smooth and safe operation even in unpredictable scenarios. The ability to dynamically adapt trajectories based on real-time feedback has become a cornerstone of modern robotics, ensuring both safety and task success in diverse environments.

The ability to encode multiple trajectories into a unified representation is another critical advancement in robotics. By leveraging data from multiple demonstrations, robots can learn generalized motion patterns that adapt to new situations. This requires combining computational efficiency with the ability to encode variability and uncertainty. Methods like probabilistic and dynamic movement primitives address this by providing parametric representations that support on-the-fly adaptations to changes in the environment or task objectives. Moreover, these representations facilitate the integration of motion planning and control into a cohesive framework, bridging the gap between abstract task planning and physical execution.

This textbook introduces a range of modern motion representation techniques, including polynomial splines, dynamic movement primitives (DMPs), probabilistic movement primitives (ProMPs), and dynamic systems based on Gaussian mixture models (GMMs).
These representations address the limitations of static trajectory planning, enabling adaptation to dynamic environments and variability in task requirements. Computational methods for learning and parameterizing these representations are presented, with emphasis on their role in encoding complex robot skills. Practical examples and case studies illustrate how these techniques are applied in real-world scenarios, providing readers with a clear understanding of their strengths and limitations.

The materials are in part based on extensive research on motion representations at the Department of Automatics, Biocybernetics, and Robotics at Jožef Stefan Institute. They are intended for advanced students and researchers in robotics. The text requires a foundational understanding of mathematics, control engineering, and robotics. It serves as an essential resource for those wishing to deepen their knowledge of motion representation and learning in robotics. By providing a comprehensive overview of state-of-the-art methodologies, the text aims to equip readers with the tools to address current and future challenges in robot motion control and learning.

Contents

1 Preliminaries 9
1.1 Control policies 10
1.2 Motion specification in Cartesian space 11
1.2.1 Representation of orientation by rotation matrices 12
1.2.2 Quaternions 14
1.2.3 Logarithmic map 18
1.3 Programming by demonstration 20
1.3.1 Programming by Demonstration Levels 21
1.3.2 Challenges of Programming by Demonstration 24
1.4 Data collection for robot programming by demonstration 25
1.4.1 Marker-Based Optical Trackers 25
1.4.2 Inertial Measurement Units (IMUs) 29
1.4.3 Electromagnetic Trackers 29
1.4.4 Computer Vision Techniques 30
1.4.5 Teleoperation 31
1.4.6 Virtual Reality Systems 31
1.4.7 Wearable Sensor Systems 32
1.4.8 Kinesthetic Guidance 33
1.4.9 Summary 34

2 Polynomials and splines 35
2.1 Point-to-point motion 36
2.1.1 Minimum jerk trajectories 36
2.1.2 SLERP trajectories 37
2.2 Polynomial splines 38
2.2.1 B-splines 39
2.2.2 Natural and complete smoothing splines 41
2.3 Summary 45

3 Dynamic Movement Primitives 47
3.1 Control Policies as Dynamic Systems 48
3.1.1 Discrete (point-to-point) movements 49
3.1.2 Periodic (rhythmic) movements 52
3.2 Computing DMP parameters from a single demonstration 52
3.2.1 Adaptive frequency oscillators 57
3.2.2 Integration of DMPs for robot control 58
3.3 Cartesian Space DMPs 60
3.3.1 Quaternion based DMPs 60
3.3.2 Position trajectories 63
3.3.3 Rotation matrix based DMPs 63
3.3.4 Comparison of DMPs defined on SO(3) 64
3.4 Third-order DMPs and Sequencing 67
3.4.1 Third-order DMPs in Cartesian space 68
3.5 Modulation of DMPs 68
3.5.1 Phase stopping 69
3.5.2 Robustness to perturbations through coupling of DMPs 70
3.5.3 Obstacle avoidance 71
3.6 Learning of DMPs from Multiple Demonstrations 73
3.6.1 Action generalization using dynamic movement primitives and local weighting 74
3.7 Summary 79

4 Dynamic Systems and Gaussian Mixture Regression 81
4.1 Robot Motion Representation and Dynamical Systems 81
4.2 Lyapunov Stability Theory and Conditions 82
4.3 Design of Dynamic Systems by PBD 83
4.3.1 Data Collection 83
4.3.2 Gaussian Mixture Model (GMM) 84
4.3.3 Gaussian Mixture Regression (GMR) 85
4.3.4 Stable Estimator of Dynamical Systems (SEDS) 87
4.3.5 Selecting Optimal Number of Mixture Components for SEDS 90
4.3.6 Accuracy of Motion Reproduction with SEDS 91
4.4 Summary 93

5 Probabilistic Movement Primitives (ProMPs) 95
5.1 Mathematical Formulation of ProMPs 95
5.1.1 Using Phase Variables Instead of Time 96
5.1.2 Marginalizing Over Weights to Compute Likelihoods 97
5.1.3 Product of Gaussian Distributions 98
5.2 Learning the Parameters of the Weight Distribution 99
5.2.1 Motion Reproduction 101
5.3 Ensuring Specific Configuration at a Given Time 103
5.3.1 Conditioning on a Specific Configuration 103
5.3.2 Detailed Derivation of Conditioned Mean and Covariance 103
5.3.3 Reproducing the Conditioned Trajectory 104
5.3.4 Bayesian Basis for Conditioning 106
5.4 Summary and Discussion 106

6 Motion Representations and Robot Learning 109

Chapter 1

Preliminaries

When specifying robot motion, we distinguish between the concepts of path and trajectory. A path specifies the robot's spatial motion without considering its time course, i.e. the time at which the robot reaches a point on the path remains undefined. The path becomes a trajectory once the timing is determined, i.e. the time at which the robot reaches each point on the path is specified. The term path planning refers to finding the spatial path of robot motion from the given initial position to the final position while satisfying spatial constraints, e.g. avoiding obstacles. If the path is specified, the term trajectory planning refers to finding the optimal velocity and acceleration profiles for the given path that satisfy time constraints, e.g. velocity and acceleration limits.
However, the spatial course and timing of the trajectory can also be determined simultaneously, in which case the term trajectory planning refers to the specification of a complete robot trajectory. Finally, robot control deals with accurately tracking the specified trajectory with a real (physical) robot in all controllable degrees of freedom (DOF).

Trajectory planning can be used to generate short-term, continuous motions for robot manipulators. Such motions are often called movement, motion or motor primitives. They are considered the basic building blocks of robot behaviors. Generally speaking, motion primitives are the building blocks from which complex robot skills are generated by querying, sequencing and superposition of movement primitives. Here the term robot skill refers to an entire class of short-term motion plans that can be used to achieve the desired result, e.g. moving the robot hand to different desired end-positions and orientations along a certain type of reaching trajectory, or throwing motions so that an object hits the desired target location. More complex behaviors such as cooking, assembling a product or tidying up a room are never planned as one single long trajectory or skill but are rather constructed from a sequential execution of multiple movement primitives or skills. The problem of finding an optimal sequence of robotic operations/skills to solve a given task is referred to as task planning.

In this textbook we focus on representations that can be used for the specification of movement primitives and on data-driven approaches for the learning of robot skills constructed from movement primitives. Most of the motion representations discussed in this textbook have built-in mechanisms to specify complex skills from a combination of movement primitives.
We primarily present optimization-based and statistical approaches and omit other approaches such as reinforcement learning, which would require another book to cover adequately.

1.1 Control policies

To analyze what kind of representation is most suitable to drive the motion of a robot in an unstructured, dynamic environment, we analyze the controllers used to track the desired joint space motion trajectory, which is for example specified as a superposition of basis functions with parameters α (see Chapter 2). A proportional-derivative (PD) controller may be used for this purpose

Δy(y, α, t) = K_y (y_d(t) − y) + K_ẏ (ẏ_d(t) − ẏ).   (1.1)

Here Δy ∈ ℝ^n is the computed change of the joint angle positions. It depends on the current robot configuration and velocity y, ẏ ∈ ℝ^n and on the desired configuration and velocity y_d(t), ẏ_d(t) ∈ ℝ^n at time t. K_y, K_ẏ ∈ ℝ^{n×n} are respectively the positional and velocity gains, and n is the number of robot degrees of freedom. If specified as a linear superposition of basis functions, the desired trajectory is computed as

y_d(t) = Σ_i α_i B_i(t),   (1.2)
ẏ_d(t) = Σ_i α_i Ḃ_i(t),   (1.3)

where B_i are the basis functions used to encode the desired trajectory and α_i the corresponding weights. Eq. (1.1) thus specifies a time-dependent control policy that the robot follows to realize the desired motion.

Formulation (1.1) works well for robots operating in known environments where no disturbances occur. However, as explained in [31], such control policies are not very flexible. For example, if a robot arm is wiping a glass and is held stationary for a few seconds, the PD controller will try to catch up, i.e. reduce the discrepancy between the desired position y_d(t) and the actual robot position y, by the shortest possible path once the robot arm is released.
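A minimal numerical illustration of this catch-up behavior; all numbers (gains, reference profile, hold interval) are assumptions chosen for the demonstration, not values from the text. The state-dependent point attractor used for contrast is a simple instance of the autonomous systems discussed below.

```python
# Illustrative sketch (all numbers are assumptions chosen for the demo):
# a time-indexed reference keeps advancing while the robot is physically
# held, so the PD policy (1.1) commands a large catch-up motion at release;
# a state-dependent point attractor y_dot = k*(g - y) computes the same
# command during and after the hold.

g = 1.0                  # goal position
y_held = 0.2             # position at which the robot arm is held
Ky = 20.0                # positional PD gain (assumption)
k = 2.0                  # attractor gain (assumption)

# Time-indexed reference: y_d(t) ramps linearly to g within 1 s.
def y_d(t):
    return min(t, 1.0) * g

# The clock keeps running while the arm is held; by the release time
# t = 2 s the reference has reached g and the PD policy sees the full
# accumulated error.
pd_cmd_at_release = Ky * (y_d(2.0) - y_held)

# The autonomous policy depends only on the current state y, so its
# command is unchanged by the hold: no catch-up jump.
ds_cmd_during_hold = k * (g - y_held)
ds_cmd_at_release = k * (g - y_held)

print(pd_cmd_at_release)                        # -> 16.0
print(ds_cmd_during_hold == ds_cmd_at_release)  # -> True
```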
This can cause the robot to attempt moving through the glass, which will not result in successful task execution, may trigger the robot's safety mechanism and abort the motion, and can potentially cause damage either to the environment or to the robot. Thus a more flexible approach is needed to plan motions in dynamic environments.

An alternative is offered by autonomous dynamic systems, where the change of state is computed using a differential equation

ẏ = f(y, α).   (1.4)

Given the current state of the robot y and the control parameters α, the above system is used to compute the desired change of robot configuration, Δy = ẏΔt, and other kinematic target quantities, e.g. the desired velocity and acceleration. The computed target values can then be used by robot controllers to compute the motor commands [31]. This way it is easier to deal with perturbations in dynamic environments.

The above system is called autonomous because it removes the explicit time dependency from motion planning. When motion is computed according to (1.4), the system automatically deals with perturbations such as the robot being prevented from moving, because Eq. (1.4) computes the same target values as long as the robot state y does not change. In addition, in some cases it is possible to build into Eq. (1.4) mechanisms that deal with changes to the desired final position of the robot, avoidance of robot joint limits, obstacle avoidance, motion speed scaling, etc.

1.2 Motion specification in Cartesian space

A rigid body is a solid object in which the distances between any two points remain constant, ensuring that the object does not deform and maintains its shape and size throughout its motion. In robotics, we are typically interested in the motion of solid objects, such as the links of a robot manipulator or an object the robot manipulates.
Rigid body motion can be divided into translation, where every point in the object moves uniformly in the same direction, and rotation, where the object turns about a fixed axis. Importantly, during rigid body motion both the distances between any two points and the cross products of any two vectors within the object are preserved, maintaining the relative angles and orientations of vectors.

The specification of robot motion in 3-D Cartesian space involves the specification of the position and orientation of the robot's end-effector at any moment in time. The coordinate frame of the robot's end-effector is usually attached to the tool center point. The robot end-effector position is defined as the coordinates p = [x, y, z]ᵀ ∈ ℝ^3 of the frame attached to the robot's tool center point in the robot's base system. The end-effector orientation is defined as the instantaneous rotation of the coordinate frame attached to the tool center point with respect to the robot's base frame.

Unlike the position, the robot's orientation cannot be represented as a 3-D vector because the set of all orientations is not a vector space. Since any rotation can be characterized by a 3 × 3 rotation matrix, the space of all orientations can be represented by rotation matrices, which form the special orthogonal group SO(3)

SO(3) = { R ∈ ℝ^{3×3} | R Rᵀ = I, det(R) = 1 }.   (1.5)

While SO(3) is not a vector space, it forms a group with respect to matrix multiplication. It is also a real three-dimensional manifold [22]. Thus planning the orientational robot motion is equivalent to trajectory planning in SO(3). Every orientation can be identified with a unique R ∈ SO(3). Any orientation trajectory on time interval [0, T] can be written as R(t) ∈ SO(3), 0 ≤ t ≤ T. As a 3-D manifold, SO(3) can be parameterized by three parameters, e.g. Euler angles. However, such parameterizations always contain singularities.
It turns out that there exists no minimal (3-D) representation of orientation that

1. contains no singularities, i.e. a continuous orientational motion in 3-D space is continuous also in the 3-D parameter space, and

2. is continuously differentiable, i.e. the partial derivatives of the parameters with respect to any differential rotation, at any orientation, are finite.

It is preferable to use parameterizations that do not introduce artificial discontinuities not existing in the real world into motion planning. As outlined above, such representations of orientation have more than three parameters and must therefore fulfill additional constraints in a higher dimensional space to be guaranteed to lie on the constraint manifold. In the following we study two such representations: rotation matrices and quaternions.

1.2.1 Representation of orientation by rotation matrices

The position of a point p(t) on a rigid body that is rotating about some axis can be expressed as a function of the initial position vector p_0 and a rotation matrix R(t)

p(t) = R(t) p_0.   (1.6)

To find the angular velocity ω(t), we take the time derivative of the position vector p(t):

ṗ(t) = Ṙ(t) p_0.   (1.7)

Since R(t) R(t)ᵀ = I, we can compute Ṙ(t) R(t)ᵀ + R(t) Ṙ(t)ᵀ = 0. We obtain Ṙ(t) R(t)ᵀ = −R(t) Ṙ(t)ᵀ = −(Ṙ(t) R(t)ᵀ)ᵀ, which means that

Ω(t) = Ṙ(t) R(t)ᵀ   (1.8)

is a skew symmetric matrix, i.e. Ω(t)ᵀ = −Ω(t). Any skew symmetric matrix can be written as

Ω(t) = [ω(t)]× = [    0      −ω_z(t)   ω_y(t)
                    ω_z(t)      0     −ω_x(t)
                   −ω_y(t)    ω_x(t)     0    ],   ω(t) = [ω_x(t), ω_y(t), ω_z(t)]ᵀ.   (1.9)

We can write

ṗ(t) = Ṙ(t) p_0 = Ṙ(t) R(t)ᵀ p(t) = Ω(t) p(t) = ω(t) × p(t).   (1.10)

ω(t) defines the angular velocity of the rotational motion.
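The identity [ω]× p = ω × p and the skew symmetry of Ω in Eqs. (1.9)–(1.10) can be checked numerically; a short sketch, assuming numpy and arbitrarily chosen vectors:

```python
import numpy as np

# Quick numerical check of Eqs. (1.9)-(1.10): the matrix [w]x applied to
# a point equals the cross product w x p, and [w]x is skew symmetric.
def skew(w):
    wx, wy, wz = w
    return np.array([[0.0, -wz,  wy],
                     [ wz, 0.0, -wx],
                     [-wy,  wx, 0.0]])

w = np.array([0.3, -1.2, 0.7])   # arbitrary angular velocity (assumption)
p = np.array([1.0, 2.0, 3.0])    # arbitrary point (assumption)

print(np.allclose(skew(w) @ p, np.cross(w, p)))   # -> True
print(np.allclose(skew(w).T, -skew(w)))           # -> True
```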
If a rigid body rotates with constant angular velocity ω, we obtain a homogeneous linear differential equation with constant coefficients

ṗ(t) = ω × p(t) = [ω]× p(t).   (1.11)

It is well known from calculus that such equations have an analytic solution given by the matrix exponential map,

p(t) = e^{t[ω]×} p(0),   (1.12)

where p(0) is the initial point position. By comparing Eq. (1.6) and (1.12), we obtain the following expression for rotation with constant angular velocity ω

R(t) = e^{[ωt]×}   (1.13)
     = I + [ωt]× + (1/2!) [ωt]×^2 + (1/3!) [ωt]×^3 + …,   (1.14)

where (1.14) is obtained using the definition of the matrix exponential.

We next show that the angular velocity ω is equal to the product of the rotation angle θ in unit time and the rotation axis n, i.e. ω = θn. Under rotational motion, the points along the axis of rotation do not move. We can deduce from Eq. (1.11) that this is the case for points along ω since ω × ω = 0. Thus the angular velocity vector ω is aligned with the axis of rotation, n = ω/∥ω∥. Let's also define θ = ∥ω∥. Assuming small θ and using Rodrigues' formula (1.16), we obtain the following approximation for R(n, θ)

R(n, θ) ≈ I + θ[n]×.

We can then calculate the positional change as follows

∥Δp∥ = ∥p(t) − p_0∥ = ∥(I + θ[n]×) p_0 − p_0∥ = θ ∥n × p_0∥ = θ ∥p_0∥ sin(β),

where β is the angle between the vectors n and p_0 and we applied the formula ∥n × p_0∥ = ∥n∥ ∥p_0∥ sin(β). For points orthogonal to the axis of rotation, we have β = π/2 and we obtain ∥Δp∥ = θ ∥p_0∥. For the point p_0 on a circular path around the origin, the positional change is equal to θ ∥p_0∥, which shows that θ is indeed the rotation angle. From the angular velocity we can thus compute the axis of rotation n = [n_x, n_y, n_z]ᵀ = ω/∥ω∥ and the angle θ = ∥ω∥ travelled per time unit.
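The exponential map (1.13)–(1.14) can also be evaluated numerically by truncating the series; the following sketch (assuming numpy, with an arbitrarily chosen angular velocity) verifies that the result is a rotation matrix and agrees with Rodrigues' formula (1.16) referenced above:

```python
import numpy as np

# Numerical sketch of Eqs. (1.13)-(1.14): evaluate the matrix exponential
# by its truncated power series and check that it is a rotation matrix
# and agrees with the closed-form Rodrigues expression (1.16).
def skew(w):
    wx, wy, wz = w
    return np.array([[0.0, -wz, wy], [wz, 0.0, -wx], [-wy, wx, 0.0]])

def mat_exp(A, terms=30):
    # truncated series I + A + A^2/2! + ... (sufficient for small ||A||)
    E, term = np.eye(3), np.eye(3)
    for i in range(1, terms):
        term = term @ A / i
        E = E + term
    return E

w = np.array([0.2, 0.5, -0.4])   # constant angular velocity (assumption)
t = 1.3
theta = np.linalg.norm(w * t)
n = w * t / theta

R = mat_exp(skew(w * t))
R_rodrigues = np.eye(3) + np.sin(theta) * skew(n) \
    + (1.0 - np.cos(theta)) * skew(n) @ skew(n)

print(np.allclose(R @ R.T, np.eye(3)))     # orthogonal -> True
print(np.isclose(np.linalg.det(R), 1.0))   # det(R) = 1 -> True
print(np.allclose(R, R_rodrigues))         # matches Eq. (1.16) -> True
```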
The rotation about axis n by angle θ in unit time can then be expressed by

R(n, θ) = e^{[ω]×} = e^{θ[n]×} = I + θ[n]× + (θ^2/2!) [n]×^2 + (θ^3/3!) [n]×^3 + …   (1.15)

Using the fact that for a unit axis vector n it holds [n]×^3 = −[n]× and using the formulae for the Taylor series expansion of sine and cosine, the exponential map (1.15) can be computed by Rodrigues' formula

e^{θ[n]×} = I + (θ − θ^3/3! + θ^5/5! − …) [n]× + (θ^2/2! − θ^4/4! + θ^6/6! − …) [n]×^2
          = I + sin(θ) [n]× + (1 − cos(θ)) [n]×^2.   (1.16)

It can be shown also algebraically (see [22], page 29) that R(n, θ) ∈ ℝ^{3×3} is a rotation matrix, i.e. it is orthogonal, R Rᵀ = I, and has a determinant equal to 1. The exponential map (1.15) is surjective, i.e. for any rotation matrix R there exist a unit axis n and an angle 0 ≤ θ ≤ π so that Eq. (1.16) holds (see Section 1.2.3 for a constructive proof). Furthermore, it is easy although somewhat tedious to verify that equation (1.16) can be rewritten as

R(n, θ) = [ n_x^2 (1−c) + c          n_x n_y (1−c) − n_z s    n_x n_z (1−c) + n_y s
            n_y n_x (1−c) + n_z s    n_y^2 (1−c) + c          n_y n_z (1−c) − n_x s
            n_z n_x (1−c) − n_y s    n_z n_y (1−c) + n_x s    n_z^2 (1−c) + c      ],   (1.17)

where c = cos(θ) and s = sin(θ).

1.2.2 Quaternions

The parametrization of rotation matrices by the exponential map defined in equation (1.15) shows that the space of all orientations is three-dimensional. Since rotation matrices have 9 parameters, it can be cumbersome to use them to represent orientation trajectories. An alternative representation is offered by quaternions, a generalization of complex numbers developed by William Rowan Hamilton. A quaternion q is an ordered four-parameter vector that consists of a scalar part v ∈ ℝ and a vector part u = [u_x, u_y, u_z]ᵀ ∈ ℝ^3. The space of all quaternions H forms an algebra, i.e. it is a vector space with multiplication ∗.
The addition of two quaternions is defined as usual

[v_1; u_1] + [v_2; u_2] = [v_1 + v_2; u_1 + u_2].   (1.18)

By interpreting real numbers v ∈ ℝ as quaternions with zero vector part and real 3-D vectors u ∈ ℝ^3 as quaternions with zero scalar part, we can write

q = [v; u] = v + u.   (1.19)

The distinguishing property of quaternions is that we can define a quaternion multiplication

[v_1; u_1] ∗ [v_2; u_2] = [v_1 v_2 − u_1ᵀ u_2; v_1 u_2 + v_2 u_1 + u_1 × u_2].   (1.20)

Note that since in general u_1 × u_2 ≠ u_2 × u_1, quaternion multiplication is not commutative, i.e. q_1 ∗ q_2 ≠ q_2 ∗ q_1. It is, however, easy to check that quaternion multiplication is associative, i.e. q_1 ∗ (q_2 ∗ q_3) = (q_1 ∗ q_2) ∗ q_3. The inverse of quaternion q ≠ 0 with respect to multiplication ∗ is given by

q^{−1} = [v; u]^{−1} = [v/(v^2 + ∥u∥^2); −u/(v^2 + ∥u∥^2)].   (1.21)

The conjugation of a quaternion is defined as

q̄ = [v; u]‾ = [v; −u].   (1.22)

Note that for unit quaternions q, ∥q∥ = √(q ∗ q̄) = √(v^2 + ∥u∥^2) = 1, thus conjugation and inverse are in this case equivalent, i.e. q^{−1} = q̄.

Unit quaternions are the subset of all quaternions q ∈ H such that ∥q∥ = 1. They form the unit sphere S^3 in ℝ^4. Let's interpret an arbitrary point p ∈ ℝ^3 as a quaternion with zero scalar part. Such quaternions are called pure quaternions. For any unit quaternion q, we define the following operator

L_q(p) = q ∗ p ∗ q̄.   (1.23)

We can write

q ∗ p ∗ q̄ = [v; u] ∗ [0; p] ∗ [v; −u]
          = [−uᵀp; v p + u × p] ∗ [v; −u]
          = [−v uᵀp + (v p + u × p)ᵀ u; (uᵀp) u + v (v p + u × p) − (v p + u × p) × u]
          = [0; (uᵀp) u + v^2 p + 2 v u × p − ((uᵀu) p − (pᵀu) u)]
          = [0; 2 (uᵀp) u + v^2 p + 2 v u × p + (v^2 − 1) p]
          = [0; (2 u uᵀ + (2 v^2 − 1) I + 2 v [u]×) p].   (1.24)

From (1.24) we can see that, just like p, L_q(p) is also a pure quaternion.
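The quaternion product (1.20) and the operator L_q from Eq. (1.23) can be implemented directly; a minimal sketch, assuming numpy and the scalar-first convention q = (v, u):

```python
import numpy as np

# Sketch of the quaternion product (1.20) and the rotation operator
# L_q(p) = q * p * conj(q) from Eq. (1.23); scalar part first.
def qmul(q1, q2):
    v1, u1 = q1[0], q1[1:]
    v2, u2 = q2[0], q2[1:]
    return np.concatenate(([v1 * v2 - u1 @ u2],
                           v1 * u2 + v2 * u1 + np.cross(u1, u2)))

def qconj(q):
    return np.concatenate(([q[0]], -q[1:]))

def rotate(q, p):
    # embed p as a pure quaternion, conjugate by q, return the vector part
    return qmul(qmul(q, np.concatenate(([0.0], p))), qconj(q))[1:]

# Rotation by 90 degrees about the z-axis: q = (cos(pi/4), sin(pi/4)*e_z),
# which maps the x-axis to the y-axis.
q = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
print(np.allclose(rotate(q, np.array([1.0, 0.0, 0.0])), [0.0, 1.0, 0.0]))  # -> True
```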
We can write

L_q(p) = [ 2(v^2 + u_x^2) − 1      2(u_x u_y − v u_z)     2(u_x u_z + v u_y)
           2(u_x u_y + v u_z)      2(v^2 + u_y^2) − 1     2(u_y u_z − v u_x)
           2(u_x u_z − v u_y)      2(u_y u_z + v u_x)     2(v^2 + u_z^2) − 1 ] p.   (1.25)

Let's assume 0 ≤ θ < 2π and n = [n_x, n_y, n_z]ᵀ ∈ ℝ^3, ∥n∥ = 1. We define a quaternion q(n, θ) ∈ H as follows

q(n, θ) = [v; u] = [cos(θ/2); sin(θ/2) n] = [cos(θ/2); sin(θ/2) n_x; sin(θ/2) n_y; sin(θ/2) n_z].   (1.26)

We can easily see that ∥q(n, θ)∥ = √(cos^2(θ/2) + sin^2(θ/2) nᵀn) = 1, thus q(n, θ) is a unit quaternion. Moreover, for any unit quaternion q = [v; u], we can compute θ = 2 arccos(v), n = u/∥u∥ (with n = 0 for the special case of v = 1, u = 0). Thus the map q(n, θ) is surjective, i.e. for any unit quaternion q there exist θ, n so that q = q(n, θ). This means that for any quaternion q that defines the operator L_q in Eq. (1.25), there exist n, θ so that q = q(n, θ). Let's write

R = [ r_11  r_12  r_13
      r_21  r_22  r_23
      r_31  r_32  r_33 ].   (1.27)

Using standard trigonometric formulae and algebraic manipulations, we can now show that the matrix R(n, θ) from Eq. (1.17) is equal to the matrix specified in Eq. (1.25), where q = q(n, θ) is defined as in equation (1.26). For example,

r_11 = n_x^2 (1 − cos(θ)) + cos(θ) = n_x^2 (1 − (1 − 2 sin^2(θ/2))) + 2 cos^2(θ/2) − 1
     = 2 n_x^2 sin^2(θ/2) + 2 cos^2(θ/2) − 1 = 2(v^2 + u_x^2) − 1.   (1.28)

The proof for the other coefficients of R(n, θ) is similar. Thus

L_q(p) = R(n, θ) p.   (1.29)

We observe that q ∗ p ∗ q̄ is equivalent to the rotation of p about the rotation axis n by angle θ. Thus, just like rotation matrices, unit quaternions can be used to compute rotations. This expression also makes it clear that we can use quaternions to compute a new rotation from two successive rotations:

q_2 ∗ (q_1 ∗ p ∗ q̄_1) ∗ q̄_2 = (q_2 ∗ q_1) ∗ p ∗ (q_2 ∗ q_1)‾.   (1.30)
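Equalities (1.29) and (1.30) can be verified numerically; a sketch assuming numpy, with arbitrarily chosen axes and angles:

```python
import numpy as np

# Numerical check of Eqs. (1.29)-(1.30): rotating with q(n, theta) agrees
# with the rotation matrix R(n, theta), and the product q2 * q1 represents
# the composed rotation R2 @ R1. Axes and angles below are assumptions.
def skew(u):
    return np.array([[0.0, -u[2], u[1]], [u[2], 0.0, -u[0]], [-u[1], u[0], 0.0]])

def rot_matrix(n, theta):                      # Rodrigues' formula (1.16)
    return np.eye(3) + np.sin(theta) * skew(n) + (1 - np.cos(theta)) * skew(n) @ skew(n)

def quat(n, theta):                            # Eq. (1.26)
    return np.concatenate(([np.cos(theta / 2)], np.sin(theta / 2) * n))

def qmul(a, b):                                # Eq. (1.20)
    return np.concatenate(([a[0] * b[0] - a[1:] @ b[1:]],
                           a[0] * b[1:] + b[0] * a[1:] + np.cross(a[1:], b[1:])))

def rotate(q, p):                              # Eq. (1.23)
    qc = np.concatenate(([q[0]], -q[1:]))
    return qmul(qmul(q, np.concatenate(([0.0], p))), qc)[1:]

n1, n2 = np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 0.0])
t1, t2 = 0.9, -0.4
p = np.array([0.3, -1.0, 2.0])

q1, q2 = quat(n1, t1), quat(n2, t2)
print(np.allclose(rotate(q1, p), rot_matrix(n1, t1) @ p))            # -> True
print(np.allclose(rotate(qmul(q2, q1), p),
                  rot_matrix(n2, t2) @ rot_matrix(n1, t1) @ p))      # -> True
```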
Thus quaternion multiplication can be used to compute a new quaternion q_2 ∗ q_1 combining two successive rotations. It follows that the unit quaternion Δq that rotates the starting orientation q_1 to the target orientation q_2 can be computed as follows

Δq = q_2 ∗ q̄_1.   (1.31)

Unit quaternions provide a singularity-free, non-minimal representation of orientation (with four instead of three parameters). However, this representation is not unique. This can be seen by observing that the matrix in Eq. (1.25) only contains quadratic terms. Thus a unit quaternion q ∈ S^3 and its antipode −q ∈ S^3 are transformed into the same rotation matrix R(n, θ). Since

cos((2π − θ)/2) = cos(π − θ/2) = −cos(θ/2),
sin((2π − θ)/2) (−n) = sin(π − θ/2) (−n) = −sin(θ/2) n,

and using the fact that each unit quaternion pair q and −q represents the same orientation, we can conclude that the rotation about axis −n by angle 2π − θ results in the same orientation as the rotation about axis n by angle θ. Thus unit quaternions provide a double covering of the space of all orientations.

We already know that using formula (1.25), we can transform any unit quaternion to its corresponding rotation matrix. To derive the formulae for mapping a rotation matrix to the corresponding quaternion, we equate the matrices in Eq. (1.27) and (1.25). Using simple algebraic manipulations and considering that ∥q∥^2 = v^2 + u_x^2 + u_y^2 + u_z^2 = 1, we obtain the following formulae

v = ± (1/4) √( (r_11 + r_22 + r_33 + 1)^2 + (r_32 − r_23)^2 + (r_13 − r_31)^2 + (r_21 − r_12)^2 ),   (1.32)
u_x = (1/4) s_x √( (r_32 − r_23)^2 + (r_11 − r_22 − r_33 + 1)^2 + (r_21 + r_12)^2 + (r_31 + r_13)^2 ),   (1.33)
u_y = (1/4) s_y √( (r_13 − r_31)^2 + (r_21 + r_12)^2 + (r_22 − r_11 − r_33 + 1)^2 + (r_32 + r_23)^2 ),   (1.34)
u_z = (1/4) s_z √( (r_21 − r_12)^2 + (r_31 + r_13)^2 + (r_32 + r_23)^2 + (r_33 − r_11 − r_22 + 1)^2 ),   (1.35)

where s_x, s_y, s_z ∈ {−1, 1} respectively determine the signs of u_x, u_y and u_z.
This choice arises because of the sign ambiguity (unit quaternions q and −q represent the same rotation and thus map onto the same rotation matrix). If we select the nonnegative value of v, i.e. v ≥ 0, then s_x, s_y and s_z must be assigned the signs of r_32 − r_23, r_13 − r_31, and r_21 − r_12, respectively. This mapping was developed by Cayley and has been shown to be superior to other mappings for the conversion of rotation matrices to unit quaternions used in robotics, both due to its simplicity and numerical stability.

1.2.3 Logarithmic map

The rotation matrix logarithm is defined as an inverse of the exponential map (1.15). Given a rotation matrix R and using Eq. (1.26) and (1.32), we can compute the angle of rotation as follows

θ = 2 arccos( (1/4) √( (r_11 + r_22 + r_33 + 1)^2 + (r_32 − r_23)^2 + (r_13 − r_31)^2 + (r_21 − r_12)^2 ) ).   (1.36)

In the above formula we selected the positive value of v from Eq. (1.32). For θ ≠ 0, the unit axis of rotation can be obtained by normalizing the vector u as computed by Eq. (1.33) – (1.35), n = u/∥u∥. The matrix logarithm can now be computed as follows

log(R) = { [0, 0, 0]ᵀ,  R = I,
         { θn,          otherwise.   (1.37)

The components of log(R) ∈ ℝ^3 are called exponential coordinates of R. This representation of orientation is commonly referred to as the axis-angle representation. Note that ∥log(R)∥ = ∥θn∥ = θ because n is a unit vector and we use the non-negative value of v from Eq. (1.32) to compute θ, thus 0 ≤ θ ≤ π. For θ = π we cannot compute n using the above formula because in this case, the off-diagonal terms of the rotation matrix do not provide information about the direction of the rotation axis (which is nevertheless defined up to a sign ambiguity). Recall that the rotation about axis n by angle θ is equal to the rotation about axis −n by angle 2π − θ. For θ = π this means that the rotations by angle π about axis n and about axis −n are the same. However, we can still compute ∥log(R)∥ = ∥±πn∥ = π.
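The extraction formulas (1.32)–(1.35) with the sign convention described above (choosing v ≥ 0), and the logarithmic map (1.36)–(1.37) built on top of them, can be sketched as follows (assuming numpy; the function names are illustrative, and the θ = π special case is not handled):

```python
import numpy as np

# Sketch of the rotation-matrix-to-quaternion extraction (1.32)-(1.35)
# with v >= 0, and the logarithmic map (1.36)-(1.37) built on top of it.
def mat_to_quat(R):
    d = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    v = 0.25 * np.sqrt((np.trace(R) + 1) ** 2 + d @ d)
    u = 0.25 * np.array([
        np.sqrt(d[0]**2 + (R[0,0]-R[1,1]-R[2,2]+1)**2 + (R[1,0]+R[0,1])**2 + (R[2,0]+R[0,2])**2),
        np.sqrt(d[1]**2 + (R[1,0]+R[0,1])**2 + (R[1,1]-R[0,0]-R[2,2]+1)**2 + (R[2,1]+R[1,2])**2),
        np.sqrt(d[2]**2 + (R[2,0]+R[0,2])**2 + (R[2,1]+R[1,2])**2 + (R[2,2]-R[0,0]-R[1,1]+1)**2)])
    return np.concatenate(([v], np.sign(d) * u))   # s_x, s_y, s_z = signs of d

def so3_log(R):
    # theta = pi is not handled here (see the discussion in the text)
    q = mat_to_quat(R)
    v, u = q[0], q[1:]
    if np.linalg.norm(u) < 1e-12:                  # R = I
        return np.zeros(3)
    return 2 * np.arccos(np.clip(v, -1.0, 1.0)) * u / np.linalg.norm(u)

# Example: rotation by 0.5 rad about the z-axis.
c, s = np.cos(0.5), np.sin(0.5)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
print(np.allclose(so3_log(R), [0.0, 0.0, 0.5]))    # -> True
```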
Thus the norm ∥log(R)∥ is defined for all R ∈ SO(3). It can be used to define a metric on SO(3)

  d(R1, R2) = { ∥log(R2 R1ᵀ)∥,  θ < π,
              { π,              otherwise.  (1.38)

Note that d(I, R) = ∥log(R)∥.

To define a logarithmic map for unit quaternions, we first define the quaternion exponential. Note that for any pure unit quaternion n, i.e. a unit quaternion with scalar part equal to 0, we have n ∗ n = −1, i.e. its square is equal to the quaternion −1, which has zero vector part. Let's define

  exp(θn) = 1 + θn + (θn)²/2! + (θn)³/3! + ...
          = (1 − θ²/2! + θ⁴/4! − ...) + (θ − θ³/3! + θ⁵/5! − ...)n
          = cos(θ) + sin(θ)n.  (1.39)

Here θn is interpreted as a pure quaternion with zero scalar part. By comparing Eq. (1.26) and (1.39) we can write

  q(n, θ) = exp((θ/2) n) = cos(θ/2) + sin(θ/2) n.  (1.40)

For any unit quaternion q = [v, uᵀ]ᵀ ≠ [−1, 0, 0, 0]ᵀ, its logarithm is defined as the inverse of the exponential. It can easily be computed analytically from (1.40)

  log(q) = log([v, uᵀ]ᵀ) = { arccos(v) u/∥u∥,  ∥u∥ ≠ 0,
                           { [0, 0, 0]ᵀ,       otherwise.  (1.41)

Note that unlike with the rotation matrix logarithm, where several choices regarding signs needed to be made to compute the logarithm as defined in Eq. (1.37), no sign choices are needed in Eq. (1.41). This makes it easier to use quaternions instead of rotation matrices for planning orientational motions. The logarithm exists for any unit quaternion except for q = −1, where the rotation axis is undefined. Nevertheless, we can compute the angle θ for any unit quaternion including q = −1, where θ = arccos(−1) = π.

Let's now assume that we have two quaternions q1 and q2 and compute the difference quaternion ∆q = q2 ∗ q̄1 according to (1.31). We have shown in Eq. (1.26) that there exist an angle θ and an axis n so that ∆q = exp(θn/2). We can thus compute the angular velocity that takes q1 to q2 within unit time as ω = θn.
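A minimal sketch of these quaternion operations, with the layout [v, uᵀ] used in the text (the helper names are my own):

```python
import numpy as np

def quat_mul(p, q):
    """Quaternion product p * q with layout [scalar, vector]."""
    pv, pu = p[0], p[1:]
    qv, qu = q[0], q[1:]
    return np.concatenate(([pv*qv - pu @ qu], pv*qu + qv*pu + np.cross(pu, qu)))

def quat_conj(q):
    """Quaternion conjugate: negate the vector part."""
    return np.concatenate(([q[0]], -q[1:]))

def quat_log(q):
    """Quaternion logarithm per Eq. (1.41), returned as a 3-vector."""
    v, u = q[0], q[1:]
    norm_u = np.linalg.norm(u)
    if np.isclose(norm_u, 0.0):
        return np.zeros(3)
    return np.arccos(np.clip(v, -1.0, 1.0)) * u / norm_u

def angular_velocity(q1, q2):
    """Constant angular velocity omega = theta*n rotating q1 to q2 in unit time."""
    return 2 * quat_log(quat_mul(q2, quat_conj(q1)))
```

For example, rotating the identity orientation to a 90° rotation about z in unit time yields ω = [0, 0, π/2].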
Using the quaternion logarithm (1.41), we can now compute the constant angular velocity that rotates unit quaternion q1 to unit quaternion q2 within unit time as follows

  ω = θn = 2 log(exp(θn/2)) = 2 log(q2 ∗ q̄1).  (1.42)

The quaternion logarithm can also be used to define a distance metric on S³ [34]

  d(q1, q2) = { 2π,                q1 ∗ q̄2 = [−1, 0, 0, 0]ᵀ,
              { 2∥log(q1 ∗ q̄2)∥,  otherwise.  (1.43)

Note that d defined above is not a metric on SO(3) because d(q, −q) = 2π although q and −q represent the same orientation. It is nevertheless possible to compute the distance on SO(3) also using the quaternion logarithm. This can be done by checking the norms of the two logarithms, 2∥log(q1 ∗ q̄2)∥ and 2∥log(−q1 ∗ q̄2)∥. The smaller value should then be taken to obtain the distance on SO(3).

Finally, we derive the relationship between the quaternion derivative and angular velocity. For a small time interval ∆t, the rotation can be approximated as a small rotation by an angle ∆θ around the axis given by the angular velocity vector ω. According to (1.40), such a rotation is defined by the quaternion

  q(∆θ, ω/∥ω∥) = cos(∆θ/2) + sin(∆θ/2) ω/∥ω∥ ≈ 1 + (∆θ/2) ω/∥ω∥ ≈ 1 + ω∆t/2.  (1.44)

The right-hand side of this equation was derived from the relationship between the angular velocity ω and the rate of change of the rotation angle

  ∆θ/∆t ≈ ∥ω∥.  (1.45)

We obtain

  q(t + ∆t) ≈ (1 + ω∆t/2) ∗ q(t) = q(t) + (ω∆t/2) ∗ q(t)  (1.46)

and

  q̇ = lim_{∆t→0} (q(t + ∆t) − q(t))/∆t = (1/2) ω ∗ q.  (1.47)

Note that Eq. (1.8) provides a similar relationship between the rotation matrix derivative and angular velocity

  Ṙ = [ω]× R.  (1.48)

1.3 Programming by demonstration

Robot Programming by Demonstration (PbD), also known as Learning from Demonstration (LfD), Imitation Learning, or Teaching by Showing, is an approach that allows robots to learn tasks by observing and imitating human demonstrators.
This method offers a more intuitive way to program robots, especially for those who may not have extensive technical expertise in robotics. By bridging the gap between human intuition and robotic execution, PbD provides a platform for teaching robots complex tasks in unstructured environments.

Traditional robot programming can be a daunting task. It requires in-depth knowledge of robotics principles and programming skills. Programming robots often involves an intricate understanding of kinematics, dynamics, control systems, and sensor integration. Developers need to write detailed code to specify every movement and decision a robot must make, which is a time-consuming and error-prone process. Furthermore, the diversity of environments and tasks necessitates domain-specific expertise to tailor robotic behaviors appropriately. As robots are employed in more complex and varied applications, the limitations of traditional programming approaches become increasingly apparent. This complexity creates a barrier for those who wish to leverage robotic technology without the technical background. Programming by Demonstration offers a more accessible solution for such users.

Programming by Demonstration aims to simplify robot programming by allowing robots to learn tasks from human demonstrations. Instead of explicitly programming every detail of a task, PbD involves showing the robot how the task is performed, enabling it to learn and generalize from these examples. This method reduces the need for extensive coding and provides a more natural interface for interacting with robots. The benefits of PbD are manifold. It makes robot programming more accessible to non-experts, significantly reduces the time and effort required to teach robots new tasks, and increases adaptability to different environments and tasks.
By focusing on human-like learning and imitation, PbD allows robots to handle tasks with more flexibility and efficiency, particularly in unstructured or changing environments.

1.3.1 Programming by Demonstration Levels

Programming by Demonstration can be understood across several levels, each addressing different aspects of robot learning and task execution. These levels include

• trajectory level,
• skill level,
• task level,
• symbolic level, and
• hybrid PbD.

Trajectory-level programming by demonstration

Trajectory-level programming focuses on capturing and replicating the precise paths and movements demonstrated by a human instructor. Using modern motion capture systems, trajectory-level Programming by Demonstration (PbD) involves recording the demonstrator's movements and translating them into robot-executable trajectories. This approach ensures that robots follow the exact movements necessary for task completion, making it ideal for tasks that require high precision and accuracy, such as surgical procedures, welding, or assembly tasks in manufacturing.

The strength of trajectory-level programming lies in its ability to replicate complex, continuous motions that may be difficult to describe through traditional programming methods. By directly capturing human motion and mapping it to the robot motion, robots can achieve smooth and natural movements that closely mimic human performance. However, trajectory-level programming can struggle with adapting to variations in task execution, as it emphasizes specific motion reproduction rather than task understanding. This limitation can make it challenging for robots to adapt to changes in the environment or to different task conditions, such as varying object sizes or positions. Additionally, trajectory-level programming may not handle dynamic or unpredictable environments well, as it lacks the flexibility to adapt to real-time changes.
Skill-level programming by demonstration

Skill-level programming provides an intermediate layer between trajectory-level and task-level programming. It focuses on teaching robots reusable skills or behaviors that can be combined to perform complex tasks. Skills such as grasping, manipulating objects, or navigating obstacles are abstracted from specific tasks and can be reused in different contexts. These skills serve as building blocks for more intricate operations, emphasizing modularity and flexibility. This approach enables robots to adapt and generalize across various tasks by learning skills that can be transferred between different applications.

A critical component of skill-level programming is the concept of movement primitives, which are fundamental building blocks of complex behaviors. Movement primitives represent basic actions or motions that can be combined and sequenced to form complete skills. By learning these primitives through Programming by Demonstration (PbD), robots can capture essential movements from human demonstrations and use them as a basis for developing more complex skills. For instance, a movement primitive for a robot arm might involve the basic motion required to reach out and grasp an object, which can then be adapted and modified for various tasks involving different objects or environments.

In skill-level programming by demonstration, robots observe human demonstrations to learn these skills. The process involves capturing key actions and movements and abstracting them into skills and movement primitives that can be applied in various situations. For example, a robot might learn the skill of grasping by observing a human picking up objects of different shapes and sizes, then generalize this skill to handle new objects it has never encountered before. This ability to generalize makes skill-level PbD particularly valuable in dynamic environments where tasks vary widely.
Combining skill programming by demonstration with other techniques, such as reinforcement learning, can further enhance the learning process. While PbD is effective for initial skill acquisition, reinforcement learning can refine and optimize these skills based on feedback from the environment. This synergy allows robots to continuously improve their performance, making skills more robust and adaptable to different scenarios. A key challenge in this approach is identifying and segmenting skills and movement primitives from continuous demonstrations, which requires sophisticated algorithms to analyze and classify actions accurately. Moreover, ensuring that skills are transferable between different tasks and environments is crucial, necessitating a deep understanding of the underlying principles of each task and the ability to abstract these principles into generalized skills.

Task-level programming by demonstration

Task-level programming by demonstration (PbD) shifts the focus from replicating precise movements to achieving high-level goals and action sequences necessary for completing a task. At this level, tasks are abstracted into a series of steps that need to be executed in a specific order. The assumption is that the robot already knows how to execute low-level operations, such as moving an arm or gripping an object, and these operations are combined into a sequence to perform a more complex task. Task-level PbD provides a structured approach to replicating demonstrated behaviors and decision-making processes by observing human actions and translating them into symbolic or abstract representations. This approach is particularly useful for tasks requiring coordination and the execution of multiple actions in a precise sequence, such as assembly operations, navigation, or collaborative tasks with humans or other robots.
By focusing on high-level goals rather than specific movements, task-level PbD allows robots to interpret the intent behind actions and understand how to adapt them to new or changing situations. This flexibility enables robots to adjust their actions based on real-time feedback and environmental changes, improving their ability to function autonomously and effectively in dynamic environments. While task-level PbD offers generalization across similar tasks, it often requires adaptation to the specific context or environment in which the task is performed. This adaptation can be achieved through sensory feedback, machine learning techniques, or user input, which help the robot refine its behavior in varying scenarios. By incorporating these elements, robots can learn from demonstrations to not only replicate tasks but also enhance their decision-making capabilities. Additionally, task-level PbD can be integrated with other levels, such as trajectory-level programming, to ensure that high-level plans are executed with the necessary precision and accuracy. This integration creates a comprehensive framework for robotic learning, combining the strengths of both high-level abstraction and detailed execution.

Symbolic-level programming by demonstration

Symbolic-level programming by demonstration (PbD) involves using symbols and logical constructs to teach robots complex tasks through human demonstrations. This approach focuses on abstract reasoning and decision-making, allowing robots to learn from demonstrations by representing tasks as high-level symbols that capture the essential features and relationships within the task environment. Through symbolic-level PbD, robots are capable of understanding abstract concepts such as objects, actions, and goals, enabling them to generalize learned knowledge to new situations and adapt to changes more effectively than with lower-level programming.
In symbolic-level PbD, robots observe human actions and extract symbolic representations that define the task's structure and intent. For example, in a household setting, a robot might observe a human placing objects in specific locations and represent this task with symbols like "pick," "place," "object," and "destination." This symbolic representation allows the robot to plan and execute similar tasks even when conditions change, such as when objects or destinations differ from the original demonstration. By abstracting tasks into symbolic forms, symbolic-level PbD facilitates a higher level of autonomy and decision-making, as robots can reason about the task's goals and constraints and adjust their actions accordingly.

One of the main challenges of symbolic-level programming by demonstration is converting sensory data into meaningful symbolic representations. This requires sophisticated algorithms capable of interpreting and categorizing complex sensory inputs. Additionally, integrating symbolic reasoning with real-time control systems can be difficult, as it necessitates seamless coordination between high-level planning and precise execution.

Hybrid robot programming by demonstration

Hybrid robot programming by demonstration (PbD) combines multiple learning levels to enable robots to perform complex tasks efficiently and flexibly. For instance, in a furniture assembly task, a robot utilizes trajectory-level programming to capture precise movements from human demonstrations, such as aligning parts and inserting screws. These precise actions are then abstracted into skill-level programming, where the robot develops reusable skills like "screwing" and "aligning" that can be adapted to various assembly scenarios. Task-level programming coordinates these skills into a coherent sequence to achieve the overall assembly goal, allowing the robot to execute multiple actions efficiently and adjust its plan as needed.
At the same time, symbolic-level programming provides high-level reasoning capabilities, enabling the robot to interpret assembly instructions and make decisions about the sequence and tools required for different furniture designs. This integration of programming levels allows the robot to adapt to changing conditions and perform complex tasks with precision and autonomy, enhancing its ability to operate in dynamic environments.

1.3.2 Challenges of Programming by Demonstration

While robot programming by demonstration offers significant advantages, it also presents various challenges. A major challenge is the correspondence problem, which arises due to differences in human and robot body structures, making it difficult to map human motions directly to robot actions. Human muscles allow for smooth, continuous, and adaptable movements due to their complex structure and ability to modulate force naturally, while traditional robot actuators produce less compliant movements due to their mechanical nature. Additionally, it is generally difficult to capture and replicate the forces and torques arising in human demonstrations, especially when robots lack tactile feedback, which is crucial for tasks involving physical interactions. Dynamics play a critical role here, as robots need to understand not just the positions and velocities of movements but also how forces and torques affect interactions with objects and the environment. This complexity requires sophisticated models to translate the demonstrated dynamic tasks into robotic actions accurately.

To address these challenges comprehensively, several strategies are employed. Programming by demonstration is easier if robots with anthropomorphic features are used to imitate demonstrated tasks. This can simplify the mapping of human motions to robot actions by, to a degree, mimicking human anatomy and movement patterns.
In some cases, focusing on the goal of an action rather than replicating the exact motion can circumvent the need for direct motion mapping, allowing robots to achieve the desired outcomes without accurately replicating the demonstrated motion. Integrating sensors that provide feedback on force, torque, and environmental conditions further enhances robots' ability to adapt their actions dynamically. The integration of sensory feedback supports adaptive control strategies, which are especially important for handling dynamic interactions during complex tasks.

Another significant challenge is acquiring a sufficient number of high-quality human demonstrations for the desired tasks. Human demonstrations are often limited due to practical constraints such as time, effort, and variability among different human instructors. This scarcity of data can hinder the effectiveness of PbD approaches and the robots' ability to generalize across different tasks and environments. Advanced machine learning algorithms can help address this issue by enabling robots to learn from fewer examples through techniques such as data augmentation, transfer learning, and reinforcement learning. Additionally, simulation environments offer a safe and effective way to test and refine the computed programs before deploying them in the real world. This approach minimizes the risk of damage to both the robot and its surroundings and enables knowledge refinement without the need for physical execution of tasks.

1.4 Data collection for robot programming by demonstration

In robot programming by demonstration (PbD), collecting accurate data on human actions and movements is essential for enabling robots to learn and replicate tasks effectively. Various systems are employed to capture these demonstrations, each offering a unique approach to recording and interpreting human behaviors. These systems can be categorized based on the technology used, and each has its own advantages and challenges.
1.4.1 Marker-Based Optical Trackers

Marker-based optical tracking systems are widely used in robotics for capturing detailed motion data. These systems rely on cameras and markers to track the position and orientation of objects in three-dimensional space. They are divided into two types: passive and active optical trackers.

Passive optical trackers use markers that reflect infrared light. These markers are often spherical and coated with retro-reflective material. The system includes multiple infrared cameras that emit light, which is then reflected by the markers. The cameras capture these reflections and use the data to triangulate the precise location of each marker in space. The simplicity and cost-effectiveness of passive markers make them appealing for many applications. However, they are highly susceptible to occlusion and require a controlled environment to maintain accuracy.

Active optical trackers, on the other hand, use markers that emit their own infrared signals. These markers are often equipped with LEDs that actively send signals to the cameras, which then process the data to determine each marker's position. The advantage of active markers lies in their ability to maintain tracking even when occlusions occur. The unique signals emitted by active markers allow for continuous identification, making them more reliable in dynamic environments. However, active markers require a power source, which adds complexity and potential signal interference from other infrared devices. The power source is usually a small battery, which slightly increases the weight of the markers.

One of the key challenges in using optical trackers is the need for data post-processing. After data collection, it is often necessary to correct for occlusions, noise, and inconsistencies in marker detection.
This involves using sophisticated algorithms to smooth data, fill in missing information, and ensure that the captured motion data accurately reflects the intended movements.

Motion capture pipeline for PbD

In PbD, captured motion data is used to teach robots how to perform tasks by replicating human actions. This involves translating human movements into robot kinematics. A typical processing pipeline is as follows:

• Marker Placement: Markers are placed on or near joints such as shoulders, elbows, wrists, hips, knees, and ankles. This placement is crucial to accurately capture the motion of the limbs. For example, placing markers at the lateral and medial sides of the knee joint helps capture knee flexion and extension. Markers can also be attached to body segments like the torso, head, and limbs (middle of the upper arm, forearm, thigh, and calf) to track the movement and rotation of these parts accurately. On larger body segments, markers may be arranged in a triangular or tetrahedral configuration to provide more precise data about the segment's orientation and deformation. This setup helps in accurately defining the 3D space occupied by the limb. Ensuring symmetrical placement on bilateral body parts is essential for balanced data capture. This means placing markers at equivalent positions on both the left and right limbs. To minimize errors due to skin movement (soft tissue artifacts), markers should be securely attached and, when possible, placed over areas where skin movement is minimal. Strapping markers over tight clothing can also help reduce artifacts. Passive markers are often attached using double-sided adhesive tape, ensuring they remain securely in place during motion. Active markers, due to their additional weight, may be attached using straps or clips, providing more secure attachment, especially during dynamic activities.
Well-placed markers simplify the data processing pipeline, as the software can more easily identify and track markers throughout the motion sequence. This reduces the need for manual corrections.

• Camera Setup and Calibration: Multiple high-resolution cameras are arranged around the capture area to provide complete coverage. The camera placement should ensure that each marker is always observed by more than one camera. The cameras are equipped with infrared filters to detect light reflected or emitted by the markers. They are calibrated to define a global coordinate system in which marker motion is computed. Calibration involves capturing known points or patterns and adjusting camera settings to minimize distortion and align perspectives. Calibration often includes a process called "wand calibration", where a wand with known markers is moved through the capture volume to help align and synchronize the cameras. Proper calibration ensures that all cameras work in harmony to produce accurate 3D reconstructions of the captured motion.

• Image Data Acquisition: The subject, fitted with markers as explained above, performs movements within the capture volume. Cameras capture the motion from different angles simultaneously. The system synchronizes images from all cameras to ensure a coherent measurement of marker positions. This synchronization is crucial for accurately reconstructing the 3D trajectories of the markers.

• Marker Data Acquisition and Processing: The software identifies and labels each marker in the captured frames, distinguishing between markers based on patterns of movement and spatial relationships. Active systems simplify this process as each marker can emit a unique signal. The system triangulates the 3D positions of the markers using data from multiple camera perspectives. This involves solving mathematical equations to determine the precise spatial coordinates of each marker.
Filtering techniques, such as Kalman filters or spline smoothing, can then be applied to reduce noise and remove artifacts from the data, ensuring smoother motion paths. Additional post-processing steps may include correcting for any remaining noise, dealing with marker swaps (where markers might get mislabeled), and refining the trajectory data to ensure it accurately reflects the intended motion.

• Skeleton Mapping and Kinematic Analysis: The 3D marker data is mapped onto a skeleton model representing the subject's body structure. This mapping involves computing the joint angles of the skeleton model so that the model markers match the measured 3D marker positions. Typically, the marker positions are first transformed into the 3D poses of body segments, and the joint angles are computed from the relative orientation of adjacent body segments. For example, the elbow joint angle is derived from the orientation of the forearm relative to the upper arm. Skeleton mapping and human motion analysis perform best when personalized skeleton models are used. These models are derived from a generic skeleton model of the human body, which contains predefined joint locations and lengths of limbs and other body parts. The generic skeleton model serves as a baseline that can be adjusted to fit an individual's specific anatomical features. The scaling of the generic model to create personalized models is accomplished by using marker data to measure limb lengths and joint positions. This process often involves an automatic calibration step, where the system aligns the template's proportions with the subject's unique body dimensions, ensuring precise motion representation. By adjusting the template, the motion capture system can accurately map the subject's movements and provide detailed kinematic analysis.

There are several challenges that must be tackled by the described pipeline. A significant issue is soft tissue artifacts.
Markers placed on the skin can move independently of the underlying bone structure, leading to inaccuracies in joint angle calculations. Techniques such as marker clusters and algorithms that estimate bone positions based on skin movement help mitigate this problem. Occlusions, where markers are temporarily hidden by other body parts or objects, can cause data gaps. Increasing the number of cameras can help maintain continuous marker visibility. A similar challenge is marker swapping due to fast or complex motions that cause the system to confuse markers. This can be addressed with uniquely identifiable active markers or with advanced algorithms that predict each marker's trajectory. Environmental conditions, such as poor lighting, reflections, or interference from other infrared sources, can also disrupt data capture. Controlled environments, or adaptive systems that can handle variable conditions, are therefore essential. Additionally, infrared-based systems may suffer from interference caused by external infrared sources like sunlight, reducing accuracy. Using filters and adjusting camera sensitivity can help mitigate this interference. Human error during marker placement or system configuration can lead to inaccurate data. Standardized procedures and automated setup systems can reduce the impact of human error, enhancing the reliability of motion capture systems. Lastly, in applications requiring real-time feedback, latency can be a critical issue, affecting the responsiveness of the system. Optimizing algorithms for faster processing and reducing computational demands can minimize latency, providing more immediate feedback and improving the user experience.
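The triangulation step of the pipeline above can be sketched with the standard linear (DLT) method. The camera matrices and pixel coordinates are assumed to come from the calibration and image-acquisition steps; a real system would additionally correct lens distortion and refine the result nonlinearly:

```python
import numpy as np

def triangulate_marker(Ps, xs):
    """Linear (DLT) triangulation of one marker from two or more calibrated views.
    Ps: list of 3x4 camera projection matrices obtained from calibration.
    xs: list of corresponding 2D marker detections (u, v), one per camera.
    Each view contributes two linear constraints on the homogeneous 3D point;
    the least-squares solution is the last right singular vector of the stack."""
    A = []
    for P, (u, v) in zip(Ps, xs):
        A.append(u * P[2] - P[0])
        A.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    X = Vt[-1]
    return X[:3] / X[3]   # de-homogenize
```

With two ideal pinhole cameras displaced along the x-axis, the marker position is recovered exactly from its two image projections.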
1.4.2 Inertial Measurement Units (IMUs)

Inertial Measurement Units (IMUs) are advanced sensor systems that integrate accelerometers, gyroscopes, and sometimes magnetometers to provide comprehensive data about an object's orientation, velocity, and gravitational forces. They are widely used for motion capture in various applications, including robot programming by demonstration (PbD).

One of the primary advantages of IMUs is their portability. IMUs are small, lightweight, and easy to attach to different parts of the human body. This portability allows for the recording of natural human movements in diverse environments, including outdoor settings, without the need for a fixed setup. Unlike optical systems, which may require specific lighting conditions or line-of-sight visibility, IMUs can function effectively in a wide range of settings, providing flexibility in data collection. IMUs excel in capturing rotational data, offering precise measurements of body segment orientations. By attaching IMUs to key points on the body, such as the limbs and torso, it is possible to accurately compute joint angles and understand how different body parts interact during task execution.

IMUs can also measure the relative position of body parts by integrating the acceleration data captured by the accelerometers. This process involves integrating acceleration over time to obtain velocity and then integrating velocity to derive displacement. While this method provides a way to estimate how much a particular body segment has moved relative to its initial position, it is inherently subject to errors. Any inaccuracies in the initial acceleration measurements are compounded through the integration process, leading to drift over time. Drift refers to the accumulation of errors in the calculated position and orientation, which can result from sensor noise, bias, temperature fluctuations, and prolonged use.
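The double-integration procedure, and the drift it causes, can be illustrated with a naive dead-reckoning sketch for a single axis (the function name, sampling step, and bias value are illustrative):

```python
import numpy as np

def integrate_position(acc, dt):
    """Naive dead reckoning along one axis: integrate acceleration samples to
    velocity, then velocity to position, using simple rectangle-rule sums."""
    vel = np.cumsum(acc) * dt
    return np.cumsum(vel) * dt

# A constant accelerometer bias b produces a position error growing like 0.5*b*t^2,
# even though the sensor never actually moves:
dt = 0.01                              # 100 Hz sampling (assumed)
t = np.arange(0.0, 10.0, dt)
bias = 0.05                            # m/s^2, a small assumed sensor bias
drift = integrate_position(np.full(t.size, bias), dt)
```

After only 10 s the accumulated position error is already about 0.5 · 0.05 · 10² = 2.5 m, which is why IMU-based position estimates require periodic correction from other sensors.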
Despite their benefits, IMUs present several challenges that need to be addressed to maximize their effectiveness in PbD. One significant issue is drift, a phenomenon where small errors in measurement accumulate over time, leading to inaccuracies in the reported position and orientation. Implementing algorithms to correct drift and recalibrating sensors regularly is essential to ensure reliable data capture. Kalman filters or other sensor fusion techniques are often used to mitigate drift by combining IMU data with information from other sensors.

1.4.3 Electromagnetic Trackers

Electromagnetic tracking systems use magnetic fields to determine the position and orientation of sensors relative to a fixed transmitter. Like IMUs and unlike optical systems, electromagnetic trackers do not require a direct line of sight between the transmitter and sensors, making them highly effective in cluttered or confined environments where visual interference is common. This capability allows for seamless motion capture and teaching of robots in complex settings. The real-time tracking capability of electromagnetic systems provides immediate feedback, which is vital for applications that demand fast and accurate motion capture, such as PbD.

However, electromagnetic tracking systems face some significant challenges, primarily due to their sensitivity to metal and electronic interference. Such interference can distort the magnetic field and lead to inaccuracies in position and orientation data, necessitating careful setup to avoid potential sources of disruption. Additionally, the effective range of these systems is typically limited to a few meters from the transmitter, which can restrict their use in large spaces. For robots to learn effectively in PbD, it is crucial to strategically position transmitters to cover the intended area adequately.
While calibration is usually straightforward, it may need to be repeated if environmental conditions change, ensuring the accuracy and integrity of the captured data. In the context of PbD, electromagnetic tracking systems are beneficial for applications that require continuous and unobstructed data capture. They are particularly useful in scenarios where optical systems might fail, such as in virtual reality environments.

1.4.4 Computer Vision Techniques

Computer vision systems leverage cameras and AI algorithms to analyze visual data and track movements without the need for markers. This markerless motion capture approach simplifies setup, reduces time and cost, and provides a more natural and unobtrusive method for capturing human motion. Such systems are versatile and applicable across various environments, making them a valuable tool in robotic programming by demonstration (PbD). Deep learning has significantly enhanced computer vision systems, particularly for pose and posture estimation in robot PbD. Algorithms such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) extract features and patterns from complex visual data. CNNs, which excel in image recognition and classification tasks, are particularly effective for identifying and tracking body parts and objects during demonstrations. By employing multiple layers of convolutional filters, CNNs detect intricate patterns in visual data, facilitating accurate motion tracking without requiring manual feature extraction. RNNs complement this capability by processing sequential data, enabling the detection and reconstruction of complex action sequences demonstrated by humans. Several systems have successfully integrated deep learning techniques for pose detection, which is critical for robot PbD. OpenPose and Google's PoseNet estimate human poses in real time by identifying key points on the human body, such as joints and limbs.
These systems use CNNs to analyze video input, translating pixel information into detailed pose estimations. OpenPose is widely recognized for its ability to detect multiple body parts simultaneously, including hands and facial landmarks, while PoseNet focuses on identifying key body joints and provides efficient solutions for single-person pose estimation. Both systems exemplify the power of CNNs in analyzing complex motion patterns. Additionally, DeepLabCut is an open-source toolkit that uses CNNs for precise motion capture. By training the network to recognize specific body parts in a small subset of labeled images, DeepLabCut generalizes this knowledge to new frames, which enables accurate tracking of posture and motion. MediaPipe, developed by Google, employs machine learning models to detect landmarks on the body, face, and hands. Through the combined use of CNNs and graphical models, MediaPipe ensures spatial and temporal consistency. This makes it a robust solution for understanding human motion in real time. Despite their advantages, computer vision systems face several challenges in robot PbD. Accurate processing requires complex algorithms, significant computational power, and high-performance hardware, which can limit real-world applications. Environmental factors such as lighting conditions, shadows, and occlusions can also affect system performance, leading to inconsistencies in image quality and tracking accuracy. Addressing these challenges requires the development of robust algorithms capable of adapting to varying conditions and compensating for such disturbances. As technology advances, these challenges are gradually being mitigated, making computer vision systems increasingly viable for robotic programming by demonstration.
1.4.5 Teleoperation

In a typical teleoperation setup for PbD, a human operator uses control interfaces such as joysticks or haptic devices to guide the robot's actions. As the operator controls the robot, the robot's movements are recorded, capturing the precise motion trajectories. If the robot is equipped with force-torque sensors, the applied forces due to environmental interactions can also be recorded. The operator can observe how the robot moves either directly or through live video feeds. Haptic devices offer tactile feedback, allowing the operator to "feel" the environment and perform delicate manipulations with precision. The data collected through teleoperation focuses on the robot's actual movements rather than the operator's commands. By recording how the robot moves in response to human guidance, the system captures detailed information about the robot's kinematics and dynamics. Despite its advantages, teleoperation for PbD faces challenges. Network latency can affect the responsiveness of the control system, leading to inaccuracies in executing commands, especially in fast-paced or safety-critical tasks. To mitigate these delays, robust communication systems and predictive algorithms are necessary to ensure smooth control. Additionally, translating human inputs into robot actions can be challenging, especially for intricate tasks.

1.4.6 Virtual Reality Systems

Virtual Reality (VR) systems provide a simulated environment where users can perform tasks and demonstrate actions to gather data for PbD. VR systems like Oculus Rift and Microsoft HoloLens offer immersive experiences that facilitate detailed task demonstrations by allowing users to interact with virtual objects and environments as if they were real. The primary advantage of VR systems in PbD is the creation of a safe demonstration environment. This enables users to explore complex tasks without the fear of causing damage or harm.
VR systems provide a rich, controlled environment for capturing demonstrations, allowing for precise manipulation and interaction that might be difficult to achieve in the real world. However, one of the significant challenges when transitioning from VR to real-world applications is the reality gap. This gap arises due to differences between the simulated environment and the physical world, particularly concerning physical dynamics and interactions. For example, a robot learning a task in VR might not encounter the same friction, gravity, or collision responses when performing the task in the real world. These discrepancies can lead to errors or inefficiencies in the robot's performance when moving from the virtual to the physical environment. Addressing the reality gap requires sophisticated modeling techniques to ensure that the VR environment accurately reflects the real-world conditions. Modeling environments with sufficient accuracy for PbD is a complex and challenging task. It involves creating detailed simulations of physical properties, such as material textures, forces, and kinematic constraints. This level of detail requires advanced software and computational power to achieve realistic simulations, making it a significant barrier to the widespread adoption of VR in PbD setups. For instance, simulating the weight and resistance of objects, the tactile feedback when interacting with surfaces, and the dynamic response of objects to forces all require computationally expensive algorithms. Moreover, VR systems often require sophisticated hardware, such as high-resolution headsets and motion tracking equipment, to ensure realistic simulations. This adds to the complexity and cost of using VR for PbD. Despite these challenges, ongoing advancements in VR technology continue to improve the fidelity and realism of simulations, gradually narrowing the reality gap and enhancing the effectiveness of VR in teaching robots through demonstration.
1.4.7 Wearable Sensor Systems

Wearable sensors, such as data gloves and sensesuits, are worn directly on the body to capture detailed motion data. In the case of humanoid robots, sensesuits are designed to replicate the kinematic structure of the target humanoid robot. They consist of passive mechanical structures embedded with sensors, such as goniometers, which measure joint angles and their motion. This design ensures that the recorded human motion can be directly mapped onto the robot without complex transformation algorithms. This eliminates the correspondence problem often encountered in PbD. Wearable sensors offer high-resolution data, capturing even the most subtle movements and enabling the recording of complex trajectories essential for sophisticated robotic tasks. Despite their advantages, wearable systems can be cumbersome and may interfere with the user's natural movement. The bulkiness can restrict fluidity, leading to less naturalistic data capture. Moreover, data gloves require frequent calibration to maintain accuracy and can experience drift over time, compromising the precision of the data collected. While necessary, the calibration processes can be time-consuming and may not always eliminate all sources of error. Sensesuits, much like clothing, are typically designed in specific sizes, which can be a limitation when working with human demonstrators of varying heights and body types. This means that a different sensesuit may be required for taller or shorter individuals. Additionally, because sensesuits are constructed to mirror a specific kinematic structure, they are only suitable for robots with that kinematic structure. This bespoke design approach makes sensesuits more expensive than other, more adaptable measurement devices.

1.4.8 Kinesthetic Guidance

Kinesthetic guidance involves physically guiding a robot through the desired motions.
In this approach, the human instructor directly interacts with the robot, manually moving its end-effector to demonstrate the specific movements needed for a task. This hands-on method allows for precise control over the robot’s movements, making it a highly effective way to teach robots complex motions. The primary advantage of kinesthetic guidance is its directness and intuitiveness. Since the human instructor physically guides the robot, the motions are recorded directly in the robot’s joint space, eliminating the correspondence problem often encountered in other Programming by Demonstration (PbD) methods. This direct mapping ensures that the robot accurately learns the desired movements without needing complex transformation algorithms to translate human motion into robotic actions. The instructor can specify key points for point-to-point movements or capture entire joint space trajectories, providing flexibility in how the motion data is recorded. This approach is particularly useful for tasks where the focus is on the robot’s end-effector motion, such as manipulating objects, assembling components, or performing precise tasks like welding. Some robots, like the Franka Emika, are equipped with button interfaces that enhance the kinesthetic guidance process. These interfaces allow the instructor to interact with the robot’s control system directly during the guidance process. By pressing buttons located on the robot, the instructor can easily switch between different modes, such as teaching mode and playback mode, or lock specific joints to restrict movement to certain axes. This capability makes it easier to demonstrate complex motions and adjust the robot’s posture without interrupting the teaching process. The button interfaces provide real-time feedback and control, allowing the operator to focus on the subtleties of the task at hand while ensuring that the robot accurately records the intended path. 
Despite its advantages, kinesthetic guidance has some limitations and requires skilled operators to guide the robot accurately. The instructor must have a good understanding of the task and the robot's capabilities to ensure that the guided motion is both precise and safe. Additionally, while kinesthetic guidance excels in teaching end-effector movements, it can be challenging to ensure specific robot configurations, especially for redundant robots with more than six degrees of freedom. Redundant robots have multiple ways to achieve the same end-effector position, making it difficult to control the robot's entire posture during guidance. This can lead to suboptimal configurations that might not be ideal for certain tasks or environments. Furthermore, kinesthetic guidance is less practical for dual-arm robots and other robots requiring simultaneous control of multiple limbs. Since the instructor typically uses both hands to guide a single robot arm, managing two arms at once can be cumbersome and impractical. This limitation necessitates alternative approaches to handle multi-limb tasks effectively. The development of advanced interfaces and control strategies continues to be an area of active research, aiming to improve the usability and effectiveness of kinesthetic guidance for complex robotic systems.

1.4.9 Summary

Each of these systems offers unique benefits and limitations, making them suitable for different types of applications in robot programming by demonstration. The choice of data collection system depends on factors such as the task's complexity, the environment, and the desired level of precision and adaptability. Combining these technologies can often yield the best results, leveraging the strengths of each system to overcome the limitations of others.
By employing a diverse range of data collection techniques, robots can achieve a more comprehensive understanding of human behavior, leading to more effective and versatile robotic systems.

Chapter 2

Polynomials and splines

A standard approach to planning robotic motions is to specify a time-dependent parametric trajectory that determines the robot's motion from the start at $t = 0$ to its end at $t = T$. Trajectories are typically defined in the robot's configuration space, specifying the joint coordinates $y(t) \in \mathbb{R}^n$, $\forall t \in [0, T]$, where $n$ is the number of the robot's degrees of freedom, or in the task space, such as 3-D Cartesian space. In task space, the robot's position $p(t) \in \mathbb{R}^3$ and orientation $q(t) \in \mathrm{SO}(3)$, expressed as unit quaternions $q \in S^3 \subset \mathbb{R}^4$, are typically specified. Other task spaces, such as a 2-D plane, may also be relevant depending on task constraints.

Smoothness and continuity are essential for generating feasible trajectories. To ensure stable and dynamically consistent motion, trajectories should be continuous up to at least the second derivative, meaning the position, velocity, and acceleration profiles are smooth throughout. For tasks requiring precise control, such as point-to-point movements or following a complex path, this continuity minimizes abrupt transitions, reduces wear on actuators, and ensures energy-efficient motion. For simple point-to-point motions, minimum jerk trajectories provide an effective solution. Minimum jerk trajectories generate smooth straight-line spatial paths in Cartesian space with speed profiles determined by fifth-order polynomials. Orientation changes are handled using SLERP (spherical linear interpolation), which ensures the shortest path in quaternion space. However, more complex tasks, such as following curved paths or adapting to obstacles, require more flexible representations.
Splines, particularly B-splines, address these challenges by enabling smooth trajectories over multiple segments with high-order continuity. Splines allow local control, meaning changes in one segment affect only a small neighborhood. This makes them suitable for handling dynamic constraints and adapting to new tasks. This flexibility is especially valuable for learning robots, which must balance noisy sensor data, task constraints, and trajectory smoothness. The differences between polynomials and splines stem from their structure and flexibility. While polynomials are simple and computationally efficient, they are limited to describing single-segment trajectories and may suffer from oscillations when used for longer paths. Splines, on the other hand, are composed of piecewise polynomial segments, making them more robust and versatile for complex trajectory generation. Splines such as B-splines and natural splines offer additional benefits like smooth transitions, the ability to interpolate or approximate data, and ease of handling constraints.

In this chapter, we explore the mathematical tools of polynomials and splines for trajectory generation. We begin with point-to-point motion using minimum jerk trajectories and SLERP for orientation, followed by a detailed discussion of spline representations. This includes polynomial splines and B-splines, and their role in generating complex, smooth robot motions.

2.1 Point-to-point motion

Minimum jerk point-to-point movements between two positions in Cartesian space with zero initial and final velocities and accelerations result in straight-line spatial paths with speed profiles defined by a fifth-order polynomial. Note that a straight line is the shortest path between two points in space. Equivalently, a SLERP trajectory defines the shortest path between two orientations in the orientation space.
The term SLERP has been coined in computer graphics and stands for spherical linear interpolation. To define such trajectories, the following data needs to be available:

• the initial pose $t_i, q_i$ and the final pose $t_f, q_f$, where $t_i, t_f \in \mathbb{R}^3$ respectively denote the initial and final position and $q_i, q_f \in S^3 \subset \mathbb{R}^4$ the initial and final orientation expressed as unit quaternions, and

• the desired travel time $T$.

We write
$$p_{\mathrm{minJrk}}(t;\, t_i, q_i, t_f, q_f, T) = \left[\, t(t;\, t_i, t_f, T)^{\mathsf{T}},\; q(t;\, q_i, q_f, T)^{\mathsf{T}} \,\right]^{\mathsf{T}}. \tag{2.1}$$
Next, we first provide the formulas for the position trajectory $t(t)$, followed by the formulas for the SLERP trajectory $q(t)$ expressed in the space of unit quaternions.

2.1.1 Minimum jerk trajectories

Given the initial and final positions $t_i, t_f \in \mathbb{R}^3$, the minimum jerk position trajectory $t(t) \in \mathbb{R}^3$ is given by a fifth-order polynomial
$$t(t;\, t_i, t_f, T) = a_6 t^5 + a_5 t^4 + a_4 t^3 + a_3 t^2 + a_2 t + a_1, \tag{2.2}$$
with the coefficients computed as follows:
$$a_1 = t_i, \tag{2.3}$$
$$a_2 = 0, \tag{2.4}$$
$$a_3 = 0, \tag{2.5}$$
$$a_4 = \frac{10(t_f - t_i)}{T^3}, \tag{2.6}$$
$$a_5 = -\frac{15(t_f - t_i)}{T^4}, \tag{2.7}$$
$$a_6 = \frac{6(t_f - t_i)}{T^5}. \tag{2.8}$$
Here $a_i \in \mathbb{R}^3$, $\forall i$. Note that
$$t(t;\, t_i, t_f, T) = t_i + (t_f - t_i)\left[ 6(t/T)^5 - 15(t/T)^4 + 10(t/T)^3 \right]. \tag{2.9}$$

2.1.2 SLERP trajectories

Given the initial and final orientations $q_i, q_f \in S^3$, a SLERP trajectory is defined as follows:
$$q(t;\, q_i, q_f, T) = \vartheta_1(t)\, q_i + \vartheta_2(t)\, q_f, \quad 0 \le t \le T, \tag{2.10}$$
where
$$\vartheta_1(t) = \frac{\sin\!\big((1 - s(t))\,\theta\big)}{\sin(\theta)}, \tag{2.11}$$
$$\vartheta_2(t) = \frac{\sin\!\big(s(t)\,\theta\big)}{\sin(\theta)}, \tag{2.12}$$
and $\theta$ is equal to the half of the rotation angle needed to rotate $q_i$ to $q_f$. It can be calculated as
$$\theta = \arccos(v), \quad (v, u) = q_f * \overline{q}_i, \quad v \in \mathbb{R},\; u \in \mathbb{R}^3, \tag{2.13}$$
where $*$ denotes the quaternion product and $\overline{q}$ the quaternion conjugation as defined in Eqs. (1.20) and (1.22), respectively. To compute the shortest path in SO(3), we need to select the quaternion ($q_f$ or $-q_f$) closer to $q_i$.
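A minimal pure-Python sketch of these formulas follows (the function names are our own, not from any robotics library). It implements the closed-form minimum jerk profile of Eq. (2.9), the quaternion operations, and the SLERP trajectory of Eq. (2.10) with the fifth-order timing profile, including the shortest-path sign selection just described:

```python
import math

def min_jerk(t, ti, tf, T):
    """Minimum jerk position profile, Eq. (2.9); ti and tf are scalar
    coordinates here (apply per axis for 3-D positions)."""
    s = t / T
    return ti + (tf - ti) * (6 * s**5 - 15 * s**4 + 10 * s**3)

def quat_conj(q):
    """Quaternion conjugation, q = (w, x, y, z)."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def quat_mul(p, q):
    """Quaternion product p * q."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (pw*qw - px*qx - py*qy - pz*qz,
            pw*qx + px*qw + py*qz - pz*qy,
            pw*qy - px*qz + py*qw + pz*qx,
            pw*qz + px*qy - py*qx + pz*qw)

def slerp(t, qi, qf, T):
    """SLERP trajectory of Eq. (2.10) with minimum jerk timing s(t)."""
    # Shortest-path selection: replace qf with -qf if needed so that the
    # scalar part v of qf * conj(qi) is non-negative.
    v = quat_mul(qf, quat_conj(qi))[0]
    if v < 0.0:
        qf, v = tuple(-c for c in qf), -v
    theta = math.acos(min(1.0, v))
    if theta < 1e-8:                    # orientations (almost) coincide
        return qi
    s = 6*(t/T)**5 - 15*(t/T)**4 + 10*(t/T)**3
    th1 = math.sin((1.0 - s) * theta) / math.sin(theta)
    th2 = math.sin(s * theta) / math.sin(theta)
    return tuple(th1*a + th2*b for a, b in zip(qi, qf))
```

At $t = 0$ and $t = T$ both functions reproduce the boundary values, and the quaternions returned by `slerp` remain unit length for all $t$.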
If $v \ge 0$, then $q_f$ is closer to $q_i$; otherwise we replace $q_f$ with $-q_f$. This way, $v$ is always non-negative. The temporal course of motion is determined by a monotonously increasing function $s(t) \in \mathbb{R}$, $0 \le s(t) \le 1$, $s(0) = 0$, $s(T) = 1$. A good choice is the fifth-order polynomial that provides the speed profile of minimum jerk trajectories
$$s(t) = 6(t/T)^5 - 15(t/T)^4 + 10(t/T)^3. \tag{2.14}$$
It can be shown that $q(t) \in S^3$, $\forall t$. Since $\dot{s}(0) = \ddot{s}(0) = \dot{s}(T) = \ddot{s}(T) = 0$, the robot is at rest with zero angular velocity and acceleration at the beginning and the end of motion. The angular velocity $\omega$ at time $t$ can be computed using the formula
$$\omega(t) = 2\, \dot{q}(t) * \overline{q}(t), \tag{2.15}$$
where $q(t)$ is defined as in (2.10),
$$\dot{q}(t) = \dot{\vartheta}_1(t)\, q_i + \dot{\vartheta}_2(t)\, q_f, \tag{2.16}$$
$$\dot{\vartheta}_1(t) = -\frac{\cos\!\big((1 - s(t))\,\theta\big)}{\sin(\theta)}\, \dot{s}(t)\, \theta, \tag{2.17}$$
$$\dot{\vartheta}_2(t) = \frac{\cos\!\big(s(t)\,\theta\big)}{\sin(\theta)}\, \dot{s}(t)\, \theta, \tag{2.18}$$
and $s(t)$ is defined as in (2.14).

2.2 Polynomial splines

Due to their flexibility and ease of computation, spline functions have an important role in approximation theory and statistics. From the statistical point of view, they are described in great detail in [6, 37], whereas the classic book of de Boor [5] provides their detailed description from the numerical analysis and approximation theory point of view. They have also been applied for the generation of complex robot trajectories [33].

A polynomial spline function of order $k$ on the strictly increasing knot sequence $t_1 < \ldots < t_N$ is a polynomial of order $k - 1$ on every interval $t_i \le t < t_{i+1}$, $i = 1, \ldots, N - 2$, and $t_{N-1} \le t \le t_N$,
$$s_{i,k}(t) = \sum_{j=0}^{k-1} \gamma_{ij}\, t^j, \quad i = 1, \ldots, N - 1. \tag{2.19}$$
For $k = 2$, we obtain linear splines, for $k = 4$ cubic splines, and for $k = 6$ quintic splines. These are the most useful polynomial splines for robotic applications. As there are $N - 1$ intervals, a polynomial spline of form (2.19) on knot sequence $\{t_i\}_{i=1}^{N}$ has $k(N - 1)$ parameters.
However, a spline function useful for robotic applications needs to be continuous and have continuous derivatives up to at least the second order. To achieve continuity, we impose the constraints
$$s_{i,k}^{(j)}(t_{i+1}) = s_{i+1,k}^{(j)}(t_{i+1}), \quad i = 1, \ldots, N - 2, \; j = 0, \ldots, d, \tag{2.20}$$
where $d$ is the highest derivative order that is required to be continuous. Besides $k(N-1)$ parameters, we now also have $(d+1)(N-2)$ constraints. In this case there are altogether $(k - d - 1)N + 2(d + 1) - k$ free parameters. For cubic splines with continuous derivatives up to the second order, we have $k = 4$, $d = 2$. Thus in this case there are $N + 2$ free parameters. Similarly, in case of quintic splines with continuous derivatives up to the fourth order, we obtain $N + 4$ free parameters.

A linear combination of polynomials of a certain order is a polynomial of the same order. Thus the space of all spline functions as defined in (2.19) is a vector space. A linear combination of spline functions that fulfill constraints (2.20) also fulfills these same constraints. This means that the space of all spline functions of order $k$ that fulfill constraints (2.20) is also a vector space. Its dimension is $(k - d - 1)N + 2(d + 1) - k$. In the case of polynomial splines with continuous derivatives, the representation from equation (2.19) is not ideal because it has more parameters than the dimension of the vector space it represents. In the following we show how to construct a minimal set of basis spline functions that can be used to represent the vector space of polynomial splines with continuous derivatives.

2.2.1 B-splines

Let's assume that we have a nondecreasing knot sequence $\tau = \{\tau_1, \ldots, \tau_{\tilde{N}}\}$, $\tau_i \le \tau_{i+1}$. A B-spline of order 1 is defined as follows
$$B_{i,1}(t) = \begin{cases} 1, & \tau_i \le t < \tau_{i+1}, \\ 1, & \tau_i < \tau_{i+1} = \tau_{\tilde{N}},\; t = \tau_{\tilde{N}}, \\ 0, & \text{otherwise}, \end{cases} \quad i = 1, \ldots, \tilde{N} - 1. \tag{2.21}$$
Unlike at the beginning of Section 2.2, here we allow that a knot appears more than once in the knot sequence $\tau$, i.e. $\tau_i = \tau_{i+1}$. Note that according to this definition, $B_{i,1} = 0$ if $\tau_i = \tau_{i+1}$. Moreover, $B_{i,1}(t) \ne 0$ only on the interval $[\tau_i, \tau_{i+1})$ (or $[\tau_i, \tau_{i+1}]$ for one B-spline at the right end of the knot sequence $\tau$). Note also that for any $t \in [\tau_1, \tau_{\tilde{N}}]$, we obtain
$$\sum_{i=1}^{\tilde{N}-1} B_{i,1}(t) = 1.$$
From the first order splines defined in equation (2.21), we recursively obtain the B-splines of higher orders ($k > 1$)
$$B_{i,k}(t) = \gamma_{i,k}(t)\, B_{i,k-1}(t) + \big(1 - \gamma_{i+1,k}(t)\big)\, B_{i+1,k-1}(t), \tag{2.22}$$
$$\gamma_{i,k}(t) = \begin{cases} \dfrac{t - \tau_i}{\tau_{i+k-1} - \tau_i}, & \tau_i \ne \tau_{i+k-1}, \\ 0, & \text{otherwise}. \end{cases}$$
Note that for $\tau_i \ne \tau_{i+k-1}$, $\tau_{i+1} \ne \tau_{i+k}$, equation (2.22) can be rewritten as
$$B_{i,k}(t) = \frac{t - \tau_i}{\tau_{i+k-1} - \tau_i}\, B_{i,k-1}(t) + \frac{\tau_{i+k} - t}{\tau_{i+k} - \tau_{i+1}}\, B_{i+1,k-1}(t). \tag{2.23}$$
By the recursive formulas (2.21), (2.22), $\tilde{N} - k$ basis spline functions of order $k$ can be obtained. Since $B_{i,1}$ are defined as constant step functions and higher order B-splines are obtained from lower order B-splines via multiplications that are linear in $t$, every B-spline of order $k$ is a piecewise polynomial of order $k - 1$ on knot sequence $\tau$.

Next we prove by induction that $B_{i,k}(t) = 0$ outside of the support interval $[\tau_i, \tau_{i+k}]$ and that $B_{i,k}(t) > 0$, $\forall t \in (\tau_i, \tau_{i+k})$. By definition (2.21) of B-splines of order 1, both are true for $k = 1$. Higher order B-splines are defined by the recursive formula (2.22), where $B_{i,k}$ is computed as a linear combination of $B_{i,k-1}$ and $B_{i+1,k-1}$. By the induction assumption, $B_{i,k-1}$ and $B_{i+1,k-1}$ vanish outside of the intervals $[\tau_i, \tau_{i+k-1}]$ and $[\tau_{i+1}, \tau_{i+k}]$, respectively. Thus their linear combination $B_{i,k}$ is zero outside of the interval $[\tau_i, \tau_{i+k}]$. Similarly, by the induction assumption $B_{i,k-1}$ and $B_{i+1,k-1}$ are greater than zero on $(\tau_i, \tau_{i+k-1})$ and $(\tau_{i+1}, \tau_{i+k})$, respectively.
Given that both $\gamma_{i,k}(t)$ and $1 - \gamma_{i+1,k}(t)$ are greater than zero on these same intervals and $B_{i,k-1}(\tau_{i+k-1}) = 0$, $B_{i+1,k-1}(\tau_{i+k-1}) > 0$, we have proven that $B_{i,k}(t) > 0$, $\forall t \in (\tau_i, \tau_{i+k})$.

An important property of B-splines is that although the lowest order B-splines $B_{i,1}$ are not continuous, the two weights $\gamma_{i,2}(t)$ and $1 - \gamma_{i+1,2}(t)$ are exactly right to cancel at each intermediate point $\tau_{i+1}$ the jump discontinuity of $B_{i,1}(t)$ and $B_{i+1,1}(t)$, so that $B_{i,2}(t)$ are continuous also at the internal knot points $\tau_{i+1}$. The order of continuity increases also in derivatives as higher order B-splines are computed. If there are no multiple internal knot points, the B-splines of order $k$ are continuously differentiable up to the derivative of order $k - 2$ at the internal knot points. The multiplicity of internal knot points reduces the order of continuity. The proof of this property relies on Marsden's identity, a fundamental result in spline theory. Intuitively, Marsden's identity provides a recursive relationship that combines adjacent B-splines to construct higher-order B-splines. The identity ensures that the resulting B-splines maintain the necessary continuity and smoothness properties. Specifically, Marsden's identity guarantees that the weighting functions used to combine adjacent B-splines inherently smooth out discontinuities in both the function and its derivatives, up to the required order. This mathematical mechanism underpins the construction of B-splines and explains their continuity properties. For a comprehensive proof and deeper understanding of Marsden's identity, we refer the reader to de Boor's seminal book on splines [5].

In summary, B-splines defined by equations (2.21) and (2.22) have the following properties:

1. Each B-spline $B_{i,k}$ depends only on the knots $\{\tau_i, \ldots, \tau_{i+k}\}$.

2. Each B-spline $B_{i,k}$ is a piecewise polynomial on the interval $[\tau_i, \tau_{i+k})$, thus it can be written as
$$B_{i,k}(t) = \sum_{j=0}^{k-1} s_{i+j,k}(t)\, B_{i+j,1}(t), \tag{2.24}$$
where $s_{i+j,k}$ are defined as in equation (2.19).

3. $B_{i,k}(t) = 0$ outside of the interval $[\tau_i, \tau_{i+k}]$, and $B_{i,k}$ is equal to zero everywhere if $\tau_i = \tau_{i+k}$.

4. $B_{i,k}(t) > 0$, $\forall t \in (\tau_i, \tau_{i+k})$.

5. Each $B_{i,k}$ is $k - 1 - m_j$ times continuously differentiable at every knot point $\tau_j$, where $m_j$ is the multiplicity of the knot $\tau_j$.

Figure 2.1: Cubic B-splines on the knot sequence {0, 0, 0, 0, 1, 2, 3, 4, 4, 5, 6, 6, 6, 6}.

Example cubic B-splines with multiple knots at the edge of the support interval are shown in Fig. 2.1.

2.2.2 Natural and complete smoothing splines

Let's now assume a monotonically strictly increasing sequence $\{t_i\}_{i=1}^N$, $t_i < t_{i+1}$, and define a knot sequence $\tau = \{\tau_i\}_{i=1}^{N+4m-2}$, where $\tau_1 = \ldots = \tau_{2m} = t_1$, $\tau_{N+2m-1} = \ldots = \tau_{N+4m-2} = t_N$, and $\tau_{2m+1} = t_2, \ldots, \tau_{N-2+2m} = t_{N-1}$. As the multiplicity of all internal knot points in $\tau$ is equal to 1, the B-splines of order $2m$ defined on knot sequence $\tau$ are $2m - 2$ times continuously differentiable on $[t_1, t_N]$. The multiplicity of the edge knot points $t_1$ and $t_N$ does not cause discontinuities on the interval $[t_1, t_N]$ because the B-splines at the edges are respectively right and left continuous. By induction we can show that B-splines $\{B_{i,2m}\}_{i=1}^{N+2m-2}$ associated with knot sequence $\tau$ form a local partition of unity, i.e.
$$\sum_{i=1}^{N+2m-2} B_{i,2m}(t) = 1, \quad \forall t \in [t_1, t_N]. \tag{2.25}$$
This property holds on the complete support interval $[t_1, t_N]$ because of the $2m$ multiple knots at the edges. Without multiple knots, additional knots outside of the support interval $[t_1, t_N]$ would be needed to obtain the local partition of unity property in the neighborhood of $t_1$ and $t_N$.

Next we observe that the B-splines $\{B_{i,2m}\}_{i=1}^{N+2m-2}$ are linearly independent. To prove this we must show that if $\sum_{i=1}^{N+2m-2} \alpha_i B_{i,2m}(t) = 0$, $\forall t \in [t_1, t_N]$, then $\alpha_i = 0$, $\forall i$.
For any interval $(t_j, t_{j+1})$, the local support property implies that there are only $2m$ B-splines different from zero on this interval, i.e. $B_{j,2m}, \ldots, B_{j+2m-1,2m}$. Thus for any $t \in (t_j, t_{j+1})$, we obtain $\sum_{i=0}^{2m-1} \alpha_{j+i} B_{j+i,2m}(t) = 0$. But on the interval $(t_j, t_{j+1})$, B-splines are just polynomials of degree $2m - 1$. Using Marsden's identity, one can show that each basis polynomial $t^i$, $i = 0, \ldots, 2m - 1$, can be written as a linear combination of $B_{j+i,2m}(t)$. Thus $\{B_{j+i,2m}\}_{i=0}^{2m-1}$ are linearly independent on $(t_j, t_{j+1})$. It follows that if $\sum_{i=0}^{2m-1} \alpha_{j+i} B_{j+i,2m}(t) = 0$, then $\alpha_{j+i} = 0$, $\forall i$. Since this is true for any interval $(t_j, t_{j+1})$, it follows that the B-splines are linearly independent.

In Section 2.2 we have seen that the space of polynomial splines with $d$ continuous derivatives is at least $(k - d - 1)N + 2(d + 1) - k$ dimensional. Inserting $k = 2m$, $d = 2m - 2$ into this formula, we obtain $(k - d - 1)N + 2(d + 1) - k = N + 2m - 2$. This means that B-splines $\{B_{i,2m}\}_{i=1}^{N+2m-2}$ form the basis for the space of polynomial splines with $2m - 2$ continuous derivatives.

In summary, for B-splines $\{B_{i,2m}\}_{i=1}^{N+2m-2}$ defined on knot sequence $\tau$, we can add the following properties to the list from Section 2.2.1:

6. $\{B_{i,2m}\}_{i=1}^{N+2m-2}$ form a local partition of unity.

7. $\{B_{i,2m}\}_{i=1}^{N+2m-2}$ form the basis of polynomial splines with $2m - 2$ continuous derivatives. Thus each polynomial spline can be written as
$$s(t) = \sum_{i=1}^{N+2m-2} \alpha_i B_{i,2m}(t). \tag{2.26}$$

Moreover, since the support interval of every $B_{i,2m}$ is equal to $[t_i, t_{i+2m}]$, we obtain for $t_j \le t \le t_{j+1}$
$$s(t) = \sum_{i=j-2m+1}^{j} \alpha_i B_{i,2m}(t). \tag{2.27}$$
Thus for any $t \in [t_1, t_N]$, we need to consider the values of only $2m$ B-splines to calculate the value of $s$ at $t$.

Let's assume now that for a given robot degree of freedom, we have a sequence of desired values $\{t_i, f_i\}_{i=1}^N$.
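The recursive definition of Eqs. (2.21)-(2.22) translates directly into code. The sketch below (function names are our own) evaluates $B_{i,k}$ on a given knot sequence using plain Python; the partition-of-unity property (2.25) provides a convenient sanity check of the implementation:

```python
def bspline(i, k, t, tau):
    """Evaluate the B-spline B_{i,k}(t) on knot sequence tau
    (0-based index i), following the recursion of Eqs. (2.21)-(2.22)."""
    if k == 1:
        # Order-1 B-splines: characteristic functions of [tau_i, tau_{i+1}),
        # closed at the right end of the whole knot sequence.
        if tau[i] <= t < tau[i + 1]:
            return 1.0
        if tau[i] < tau[i + 1] == tau[-1] and t == tau[-1]:
            return 1.0
        return 0.0

    def gamma(j):
        # gamma_{j,k}(t) of Eq. (2.22), with the zero convention for
        # coincident knots.
        denom = tau[j + k - 1] - tau[j]
        return (t - tau[j]) / denom if denom != 0.0 else 0.0

    return gamma(i) * bspline(i, k - 1, t, tau) + \
        (1.0 - gamma(i + 1)) * bspline(i + 1, k - 1, t, tau)

# Cubic B-splines (order k = 4) on the knot sequence of Fig. 2.1;
# there are len(tau) - k basis functions.
tau = [0, 0, 0, 0, 1, 2, 3, 4, 4, 5, 6, 6, 6, 6]
n_basis = len(tau) - 4
t = 2.5
values = [bspline(i, 4, t, tau) for i in range(n_basis)]
print(sum(values))  # partition of unity: the values sum to 1
```

The same evaluation also exhibits the local support property: for each $t$, at most $k$ of the basis functions are nonzero, which is what makes the interpolation systems discussed next banded and cheap to solve.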
Let's consider the following variational problem:

minimize \int_{t_1}^{t_N} \left(f^{(m)}(t)\right)^2 dt,  f ∈ L_2^m[t_1, t_N],   (2.28)

under the conditions

f(t_i) = f_i,  i = 1, ..., N,

where L_2^m[t_1, t_N] denotes the space of functions with continuous derivatives up to order m − 1 and square-integrable derivative f^{(m)}. It is well known [6, 37] that the optimization problem (2.28) has a unique solution s that belongs to the space of polynomial splines of order 2m with 2m − 2 continuous derivatives, with the further conditions

s^{(m+i)}(t_1) = s^{(m+i)}(t_N) = 0,  i = 0, ..., m − 2.   (2.29)

Its solution can thus be written in the form (2.26). Polynomial splines that solve optimization problem (2.28) are called natural (smoothing) splines [5]. For cubic natural splines we obtain the additional conditions \ddot{s}(t_1) = \ddot{s}(t_N) = 0, while for quintic natural splines, the additional conditions are given as s^{(3)}(t_1) = s^{(3)}(t_N) = s^{(4)}(t_1) = s^{(4)}(t_N) = 0. We already know that the dimension of the space of polynomial splines of order 2m on knot sequence τ is equal to N + 2m − 2. Thus we can compute the natural spline interpolating the values f_i at t_i by solving the following linear interpolation system

\sum_{i=1}^{N+2m-2} \alpha_i B_{i,2m}^{(m+j)}(t_1) = 0,  j = m − 2, ..., 0,

\sum_{i=1}^{N+2m-2} \alpha_i B_{i,2m}(t_j) = f_j,  j = 1, ..., N,   (2.30)

\sum_{i=1}^{N+2m-2} \alpha_i B_{i,2m}^{(m+j)}(t_N) = 0,  j = 0, ..., m − 2.

This system can be rewritten in matrix form

X\boldsymbol{\alpha} = \mathbf{f},   (2.31)

where X ∈ R^{(N+2m-2)×(N+2m-2)}, \boldsymbol{\alpha} ∈ R^{N+2m-2}, and \mathbf{f} ∈ R^{N+2m-2} are given by

X = \begin{bmatrix}
B_{1,2m}^{(2m-2)}(t_1) & B_{2,2m}^{(2m-2)}(t_1) & \cdots & B_{\tilde{N},2m}^{(2m-2)}(t_1) \\
\vdots & \vdots & \ddots & \vdots \\
B_{1,2m}^{(m)}(t_1) & B_{2,2m}^{(m)}(t_1) & \cdots & B_{\tilde{N},2m}^{(m)}(t_1) \\
B_{1,2m}(t_1) & B_{2,2m}(t_1) & \cdots & B_{\tilde{N},2m}(t_1) \\
\vdots & \vdots & \ddots & \vdots \\
B_{1,2m}(t_N) & B_{2,2m}(t_N) & \cdots & B_{\tilde{N},2m}(t_N) \\
B_{1,2m}^{(m)}(t_N) & B_{2,2m}^{(m)}(t_N) & \cdots & B_{\tilde{N},2m}^{(m)}(t_N) \\
\vdots & \vdots & \ddots & \vdots \\
B_{1,2m}^{(2m-2)}(t_N) & B_{2,2m}^{(2m-2)}(t_N) & \cdots & B_{\tilde{N},2m}^{(2m-2)}(t_N)
\end{bmatrix},
\boldsymbol{\alpha} = \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_{\tilde{N}} \end{bmatrix},
\mathbf{f} = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ f_1 \\ \vdots \\ f_N \\ 0 \\ \vdots \\ 0 \end{bmatrix},   (2.32)

where \tilde{N} = N + 2m − 2. Due to the limited support of the B-splines B_{i,2m}, the system matrix X is banded, with only 2m values different from 0 in every row. Thus the system (2.31) can be solved efficiently. The partition of unity property of B-splines further contributes to the stability of numerical computations.

For robot trajectories, it is often required that the robot starts and ends its motion with velocities and accelerations equal to zero. It turns out that the variational problem

minimize \int_{t_1}^{t_N} \left(f^{(m)}(t)\right)^2 dt,  f ∈ L_2^m[t_1, t_N],   (2.33)

under the conditions

f(t_i) = f_i,  i = 1, ..., N,
f^{(j)}(t_1) = f^{(j)}(t_N) = 0,  j = 1, ..., m − 1,

has a unique solution that belongs to the space of polynomial splines of order 2m on knot sequence τ. Polynomial splines that solve optimization problem (2.33) are called complete smoothing splines [11]. Unlike in the case of natural splines, there are no additional conditions in this case. This is not surprising, as variational problem (2.33) already contains N + 2m − 2 conditions, which is equal to the dimensionality of the space of polynomial splines of order 2m on knot sequence τ. Similarly to variational problem (2.28), we can solve (2.33) by solving equation system (2.31), where the right-hand side vector \mathbf{f} and variable vector \boldsymbol{\alpha} are exactly the same as in (2.32), whereas the system matrix X has different first and last m − 1 rows: instead of derivatives of order m + j, j = 0, ..., m − 2, we compute the derivatives of order 1 + j, j = 0, ..., m − 2.

Interpolation works well if there is no noise in the desired positions f_j that need to be interpolated. However, if the desired positions are noisy, it is better to only approximate them. Let's now assume that we have a sequence of noisy desired positions \{\tilde{t}_j, \tilde{f}_j\}_{j=1}^{M}. We assume that M > N + 2m − 2 and t_1 = \tilde{t}_1, t_N = \tilde{t}_M, whereas the intermediate times \tilde{t}_2, ..., \tilde{t}_{M-1} may or may not coincide with the knot points t_2, ..., t_{N-1}. We can now solve the following overdetermined system of linear equations with equality constraints:

minimize \frac{1}{2}\|X\boldsymbol{\alpha} − \tilde{\mathbf{f}}\|^2   (2.34)

under the conditions

\sum_{i=1}^{N+2m-2} \alpha_i B_{i,2m}^{(j)}(t_1) = 0,  j = 1, ..., m − 1,

\sum_{i=1}^{N+2m-2} \alpha_i B_{i,2m}^{(j)}(t_N) = 0,  j = 1, ..., m − 1.

These types of problems are called constrained linear least squares. In (2.34), X ∈ R^{M×(N+2m-2)}, \boldsymbol{\alpha} ∈ R^{N+2m-2}, and \tilde{\mathbf{f}} ∈ R^{M} are computed as follows:

X = \begin{bmatrix}
B_{1,2m}(\tilde{t}_1) & B_{2,2m}(\tilde{t}_1) & \cdots & B_{N+2m-2,2m}(\tilde{t}_1) \\
B_{1,2m}(\tilde{t}_2) & B_{2,2m}(\tilde{t}_2) & \cdots & B_{N+2m-2,2m}(\tilde{t}_2) \\
\vdots & \vdots & \ddots & \vdots \\
B_{1,2m}(\tilde{t}_M) & B_{2,2m}(\tilde{t}_M) & \cdots & B_{N+2m-2,2m}(\tilde{t}_M)
\end{bmatrix},
\boldsymbol{\alpha} = \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_{N+2m-2} \end{bmatrix},
\tilde{\mathbf{f}} = \begin{bmatrix} \tilde{f}_1 \\ \tilde{f}_2 \\ \vdots \\ \tilde{f}_M \end{bmatrix}.   (2.35)

The relationship between the number of measurements M and the number of basis functions N + 2m − 2 determines the amount of smoothing versus interpolation. If N is too large, the polynomial spline computed by solving the constrained linear least squares problem (2.34) will result in a noisy trajectory because the measurements \{\tilde{f}_j\}_{j=1}^{M} will be nearly interpolated. If N is too small, the resulting trajectory will be oversmoothed and will not follow the data well.
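For illustration, natural and clamped cubic spline interpolation, as well as least-squares spline smoothing of noisy data, are available in SciPy. The following sketch uses CubicSpline's 'natural' and 'clamped' boundary options, which correspond to the natural and complete cubic splines above, and make_lsq_spline, which solves the unconstrained analogue of (2.34); the knot counts and test signal are illustrative choices.

```python
import numpy as np
from scipy.interpolate import CubicSpline, make_lsq_spline

# --- natural cubic spline (m = 2): interpolates exactly, with the
# boundary conditions s''(t_1) = s''(t_N) = 0 of Eq. (2.29)
ti = np.linspace(0.0, 1.0, 6)                # knot points t_1, ..., t_N
fi = np.sin(2 * np.pi * ti)                  # desired positions f_i
s_nat = CubicSpline(ti, fi, bc_type='natural')
print(np.allclose(s_nat(ti), fi))            # interpolation conditions hold
print(float(s_nat(ti[0], 2)))                # ~0: zero end curvature

# --- clamped (complete) cubic spline: prescribes zero end velocities
s_cmp = CubicSpline(ti, fi, bc_type='clamped')
print(float(s_cmp(ti[0], 1)))                # ~0: zero end velocity

# --- least-squares smoothing of noisy samples with M >> N + 2m - 2
rng = np.random.default_rng(0)
tm = np.linspace(0.0, 1.0, 200)              # measurement times, M = 200
fm = np.sin(2 * np.pi * tm) + 0.05 * rng.standard_normal(200)
k = 3                                        # cubic
inner = np.linspace(0.0, 1.0, 8)[1:-1]       # few interior knots -> smoothing
knots = np.r_[[0.0] * (k + 1), inner, [1.0] * (k + 1)]
s_lsq = make_lsq_spline(tm, fm, knots, k=k)
rms = np.sqrt(np.mean((s_lsq(tm) - np.sin(2 * np.pi * tm)) ** 2))
print(rms)                                   # well below the noise level
```

Increasing the number of interior knots moves the fit towards interpolation of the noisy measurements, exactly the trade-off discussed above.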
One can start with a large N, so that the resulting spline nearly interpolates the data, and then gradually decrease it until the errors

\left\{ \sum_{i=1}^{N+2m-2} \alpha_i B_{i,2m}(\tilde{t}_j) − \tilde{f}_j \right\}_{j=1}^{M}

become comparable to the expected standard deviation of the noise in the data. Bisection can be used to determine the optimal N.

2.3 Summary

Polynomial and spline-based trajectory representations are foundational tools in robotics for generating smooth and flexible robot motions. Their ability to ensure continuity up to the second or higher derivatives makes them particularly suitable for robotic applications where dynamic constraints and smooth transitions are essential. Polynomial trajectories, such as minimum jerk motions, provide an analytically simple and computationally efficient way to achieve point-to-point movements with zero initial and final velocities and accelerations. These trajectories guarantee straight-line paths in Cartesian space and shortest-path orientations using SLERP, which are advantageous for tasks requiring precision and minimal computational overhead. However, their limitation lies in their lack of flexibility for complex, multi-segment paths.

Splines, particularly B-splines, address this limitation by enabling the construction of complex, continuous trajectories over multiple segments. Their flexibility in handling variable knot placement and their ability to maintain high-order continuity make them a robust choice for generating smooth paths in robotics. B-splines offer local control, as changes to one segment of the trajectory only affect a small neighborhood, and their partition of unity property ensures numerical stability. Natural and complete splines further extend these benefits by providing solutions to variational problems that balance smoothness and adherence to boundary or data constraints.
These properties make splines highly effective for interpolation and approximation tasks, such as fitting noisy data or generating motion paths that satisfy specified conditions.

Despite their advantages, spline-based methods have some shortcomings. The choice of the number and placement of knots can significantly impact the quality of the trajectory, and improper settings may lead to oversmoothed or noisy paths. Moreover, while modern software packages simplify spline computation, understanding their underlying mathematical principles remains critical for designing trajectories with desired properties. These challenges highlight the trade-offs between simplicity, flexibility, and computational complexity in spline-based trajectory design.

Overall, the combination of polynomials for simple tasks and splines for complex, segmented paths offers a versatile approach to trajectory generation in robotics, catering to a wide range of applications while addressing the unique requirements of robotic motion planning and control.

Chapter 3

Dynamic Movement Primitives

As explained in Section 1.1, policies defined by PD controllers and desired time-dependent trajectories are typically only valid near the desired trajectory. They lack flexibility and are not suitable for dynamic and unpredictable environments, such as our homes. As an alternative, representations based on parametric systems of autonomous differential equations (1.4) can be used.

The goal of a motion generation system is to ensure that the robot successfully completes the desired task. In many cases, this involves point-to-point movements for activities like pick-and-place operations and grasping. There are also many tasks involving periodic movements, such as polishing and repetitive pick-and-place operations. Whether the specific path taken during these tasks matters can vary depending on the application.
In this chapter, we will show that these two types of movements can be formalized using attractor dynamics: point attractors for point-to-point movements and limit cycle attractors for repetitive, periodic movements. To address these movement challenges, we describe an approach to specifying robot control policies based on autonomous dynamic systems. We establish a modeling framework that addresses both discrete and periodic movements within a unified and coherent dynamic systems framework. The resulting representation is called Dynamic Movement Primitives (DMPs), which were developed by Schaal, Ijspeert, and others [13, 14, 32, 12]. A notable feature of DMPs is the inclusion of adjustable parameters that do not compromise the stability of the dynamic system. This provides the basis for their integration with optimization techniques to generate a wide range of robot motions.

We will demonstrate that DMPs provide a comprehensive framework for the specification of robot control policies. They result in smooth kinematic control policies, which are essential for implementing robust robot behaviors. DMPs are highly suitable for controlling robots in uncertain environments because they: 1) guarantee smooth transitions in the event of sudden changes in movement description or disturbances, 2) are not explicitly dependent on time, 3) provide an appropriate framework for trajectory learning and adaptation using imitation and reinforcement learning algorithms, and 4) allow the learning of complete families of trajectories from multiple demonstrations using statistical learning algorithms. One of the unique properties of DMPs is their ability to modulate trajectories in real time based on sensory feedback. For example, the phase-stopping mechanism enables the robot to halt the progress of the phase and track the desired trajectory even when a large position error arises.
Such control responses are much more difficult to implement with time-dependent representations, such as polynomial splines.

3.1 Control Policies as Dynamic Systems

We continue by explaining the theoretical foundations of dynamic movement primitives. First we define discrete movement primitives, which can encode control policies for discrete point-to-point movements. This type of DMP is based on a set of nonlinear differential equations with well-defined attractor dynamics. There are a few different but essentially equivalent formulations in the literature; in the following we use the one outlined in [31, 12].

For a single robot degree of freedom, here denoted by y, which can either be one of the internal joint angles or one of the external task-space coordinates, the following system of linear differential equations with constant coefficients has been used to derive DMPs:

τ\dot{z} = α_z(β_z(g − y) − z),   (3.1)
τ\dot{y} = z.   (3.2)

Note that the auxiliary variable z is just a scaled velocity of the control variable, i.e. \dot{y}. The gain constants α_z, β_z > 0 have an interpretation in terms of spring stiffness and damping. With the parameters appropriately set, these equations form a globally stable linear dynamic system with g as a unique point attractor, which means that for any start configuration y = y_0, the variable y reaches g after a transient motion, just like a stretched spring, upon release, returns to its equilibrium point [31]. τ > 0 is called a time constant and can be used to speed up (when τ decreases) or slow down (when τ increases) the motion.

Let's analyze why the above system is useful. We start by writing down a general solution of the non-homogeneous linear differential equation system (3.1) – (3.2). It is well known that the general solution of such a system can be written as a sum of a particular and the homogeneous solution, i.e. [z(t), y(t)]^T = [z_p(t), y_p(t)]^T + [z_h(t), y_h(t)]^T. Here [z_p(t), y_p(t)]^T denotes any function that solves system (3.1) – (3.2), while [z_h(t), y_h(t)]^T is the general solution of the homogeneous part of Eq. (3.1) – (3.2), i.e.

\begin{bmatrix} \dot{z} \\ \dot{y} \end{bmatrix} = \frac{1}{τ}\begin{bmatrix} −α_z(β_z y + z) \\ z \end{bmatrix} = A\begin{bmatrix} z \\ y \end{bmatrix},  A = \frac{1}{τ}\begin{bmatrix} −α_z & −α_zβ_z \\ 1 & 0 \end{bmatrix}.   (3.3)

It is easy to check that the constant function [z_p(t), y_p(t)]^T = [0, g]^T solves the system (3.1) – (3.2). Additionally, it is well known that the general solution of the homogeneous system (3.3) is given by [z_h(t), y_h(t)]^T = \exp(At)\,\mathbf{c}. Thus, the general solution of Eq. (3.1) – (3.2) can be written as

\begin{bmatrix} z(t) \\ y(t) \end{bmatrix} = \begin{bmatrix} 0 \\ g \end{bmatrix} + \exp(At)\,\mathbf{c},   (3.4)

where \mathbf{c} ∈ R^2 should be calculated from the initial conditions [z(0), y(0)]^T = [z_0, y_0]^T. The eigenvalues of A are given by

λ_{1,2} = \frac{−α_z ± \sqrt{α_z^2 − 4α_zβ_z}}{2τ}.

Solution (3.4) converges to [0, g]^T if the real parts of λ_{1,2} are smaller than 0, which is true for any α_z, β_z, τ > 0. The system is critically damped, which means that y converges to g without oscillating and faster than for any other choice of A, if A has a double negative real eigenvalue. This happens at α_z = 4β_z. It is important to note here that changing τ or g does not compromise the stability of dynamic system (3.1) – (3.2). Regardless of their choice, the system remains critically damped. Only the attractor point and the speed of convergence towards the attractor point change with different τ and g.

3.1.1 Discrete (point-to-point) movements

Differential equations (3.1) – (3.2) ensure that y converges to g and can therefore be used to realize discrete point-to-point movements. To enlarge the rather limited set of trajectories that can be encoded by (3.1) – (3.2) and thus enable the representation of general point-to-point movements, we can add a nonlinear component to Eq. (3.1).
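A minimal Euler simulation of the linear system (3.1) – (3.2) confirms the behavior derived above: with α_z = 4β_z the system is critically damped, so y approaches g without overshoot. The gains, goal, and duration below are illustrative choices.

```python
import numpy as np

# Euler simulation of Eqs. (3.1)-(3.2); alpha_z = 4*beta_z gives a double
# negative eigenvalue -alpha_z/(2*tau), i.e. critical damping.
alpha_z, beta_z = 12.0, 3.0
tau, g, dt = 1.0, 1.5, 0.001
y, z = 0.0, 0.0                       # start at rest at y_0 = 0
ys = []
for _ in range(int(5 * tau / dt)):    # simulate for 5*tau seconds
    z += (alpha_z * (beta_z * (g - y) - z) / tau) * dt
    y += (z / tau) * dt
    ys.append(y)
ys = np.array(ys)
print(abs(ys[-1] - g))                # ~0: converged to the attractor g
print(ys.max() <= g)                  # no overshoot (critical damping)
```

Changing τ or g in this loop only rescales the speed of convergence or moves the attractor; it never destabilizes the system, as stated in the text.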
One possibility is to add a linear combination of radial basis functions [31]:

f(x) = \frac{\sum_{i=1}^{N} w_i Ψ_i(x)}{\sum_{i=1}^{N} Ψ_i(x)}\, x\,(g − y_0),  Ψ_i(x) = \exp\left(−h_i(x − c_i)^2\right),   (3.5)

where c_i are the centers of the radial basis functions distributed along the trajectory and h_i > 0. The term g − y_0, y_0 = y(0), is used to scale the trajectory if the initial and/or final configuration change. As long as the beginning and the end of movement are kept constant, this scaling factor has no effect and can also be omitted. A phase variable x is used in Eq. (3.5) instead of time to make the dependency of the resulting control policy on time more implicit. Its dynamics is defined by

τ\dot{x} = −α_x x,   (3.6)

with the initial value x(0) = 1. (3.6) is called a canonical system. The gain α_x must be positive. Given the initial condition x(0) = 1, we can compute the analytical solution of the first-order differential equation (3.6):

x(t) = \exp\left(−\frac{α_x}{τ}t\right).   (3.7)

Note that according to this definition, the phase decreases from 1 to 0 over time. As shown in Section 3.5.1, the appealing property of using the phase variable x instead of explicit time is that by appropriately modifying Eq. (3.6), the evolution of time can be stopped to account for perturbations during motion. There is no need to manage the internal clock of the system.

We now define the following system of nonlinear differential equations:

τ\dot{z} = α_z(β_z(g − y) − z) + f(x),   (3.8)
τ\dot{y} = z.   (3.9)

In this formulation, the nonlinear function f is called a forcing term. It is defined as a function of the phase variable x to avoid explicit time dependency. Since the solution to Eq. (3.6) with the initial value x(0) = 1 is given by x(t) = \exp(−α_x t/τ), the phase variable x and consequently f(x) tend to 0 as time increases.
Hence the influence of the nonlinear forcing term f(x) vanishes with time, and the system (3.8), (3.9), (3.6) is guaranteed to converge to [0, g]^T, just like the system (3.1), (3.2). The control policy specified by the variable y and its first- and second-order derivatives defines what we call a Dynamic Movement Primitive (DMP). For a robot with several degrees of freedom, each DOF is represented by its own differential equation system (3.8), (3.9), whereas the phase x defined by Eq. (3.6) remains common across all robot degrees of freedom. Note that the free parameters \boldsymbol{α} from Eq. (1.1) and (1.4) consist of the free parameters \{w_i\}_{i=1}^{N}, g, and τ.

As the differential equation systems (3.8), (3.9) are independent of each other, we can write

τ\dot{\mathbf{z}} = α_z(β_z(\mathbf{g} − \mathbf{y}) − \mathbf{z}) + \mathbf{f}(x),   (3.10)
τ\dot{\mathbf{y}} = \mathbf{z},   (3.11)

where \mathbf{y}, \mathbf{z} ∈ R^n are the vectors combining all robot joints and auxiliary variables, while n is the number of degrees of freedom. Note that each degree of freedom has its own forcing term defined as a superposition of radial basis functions; these are gathered in the vector forcing term \mathbf{f}(x) = [f_1(x), ..., f_n(x)]^T.

The parameters c_i and h_i of Eq. (3.5) are usually defined so that the Gaussian basis functions are uniformly distributed in the time domain. This can be achieved by setting the following distribution pattern in the phase domain, given N basis functions:

c_i = \exp\left(−α_x\frac{i − 1}{N − 1}\right),  i = 1, ..., N,   (3.12)

h_i = \frac{1}{2(c_{i+1} − c_i)^2},  i = 1, ..., N − 1,  h_N = h_{N−1}.   (3.13)

[Figure 3.1 is a block diagram: the canonical system outputs the phase, which feeds the forcing terms f_1, ..., f_n into transformation systems 1, ..., n; each transformation system outputs position, velocity, and acceleration for its degree of freedom.]

Figure 3.1: Schematic presentation of the canonical system and the transformation system that together form a DMP.
The canonical system generates the phase that drives the transformation systems, which generate motor commands for all degrees of freedom.

Note that c_1 = 1 = x(0) and c_N = \exp(−α_x) = x(t_T), where t_T is the time at which the dynamic movement primitive ends.

System (3.10), (3.11) is called the transformation system. In dynamic movement primitives, the canonical system (3.6) and the transformation systems work together to model and generate complex movements. The canonical system plays the role of a temporal scaling mechanism, controlling the progression of the phase of the movement. It provides a phase variable that evolves monotonically from a starting point to an end point. This phase variable serves as a reference for the transformation system, ensuring that the movement is time-independent and can be adjusted for different durations without altering the fundamental characteristics of the motion.

The transformation system, on the other hand, is responsible for shaping the movement trajectory itself. It takes the phase variable provided by the canonical system and applies it to a set of basis functions or nonlinear terms that define the desired trajectory. These basis functions are weighted and combined to produce the movement, allowing the DMP to generate complex, smooth, and adaptable trajectories. The transformation system also includes components that ensure the generated trajectory smoothly converges to a predefined goal, making the motion robust to changes in starting or goal positions.

Together, the canonical and transformation systems allow DMPs to encode and reproduce flexible, goal-directed movements that can be easily modified for different tasks and environmental conditions, making them highly effective for robotic learning and control applications. Their interplay is shown in Fig. 3.1.
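The placement rule (3.12) – (3.13) can be sketched as follows; N and α_x are illustrative values.

```python
import numpy as np

# Centers and widths of the Gaussian basis functions, Eqs. (3.12)-(3.13):
# centers uniform in time map to exponentially spaced centers in phase.
alpha_x, N = 2.0, 10
i = np.arange(1, N + 1)
c = np.exp(-alpha_x * (i - 1) / (N - 1))    # Eq. (3.12)
h = np.empty(N)
h[:-1] = 1.0 / (2.0 * (c[1:] - c[:-1]) ** 2)  # Eq. (3.13)
h[-1] = h[-2]

def psi(x):
    """Basis activations Psi_i(x) of Eq. (3.5) for a scalar phase x."""
    return np.exp(-h * (x - c) ** 2)

print(c[0])                  # c_1 = x(0) = 1
print(c[-1])                 # c_N = exp(-alpha_x) = x(t_T)
print(np.argmax(psi(1.0)))   # at the start of motion the first kernel dominates
```

Because the phase decays exponentially, equal spacing of the c_i in the phase domain would crowd all kernels into the early part of the motion; the exponential spacing above keeps them uniform in time.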
3.1.2 Periodic (rhythmic) movements

In the case of periodic movements, we form the forcing term by a superposition of periodic functions [31]:

f(ϕ) = \frac{\sum_{i=1}^{N} w_i Γ_i(ϕ)}{\sum_{i=1}^{N} Γ_i(ϕ)}\, r,  Γ_i(ϕ) = \exp\left(h_i(\cos(ϕ − c_i) − 1)\right).   (3.14)

Here r is the amplitude of the oscillator and h_i > 0. Writing Ω = 1/τ, Eqs. (3.10) and (3.11) are replaced by

\dot{\mathbf{z}} = Ω\left(α_z(β_z(\mathbf{g} − \mathbf{y}) − \mathbf{z}) + \mathbf{f}(ϕ)\right),   (3.15)
\dot{\mathbf{y}} = Ω\mathbf{z}.   (3.16)

The phase variable ϕ has been introduced in this case to avoid the explicit dependency on time. The phase is assumed to move with constant speed

\dot{ϕ} = Ω,   (3.17)

where Ω is the frequency of oscillation. In the case of periodic functions, we distribute the basis functions uniformly along the phase interval [0, 2π], i.e.

c_i = \frac{(2i − 1)π}{2N},  i = 1, ..., N,   (3.18)

and h_i = 2.5π/N, ∀i. The phase interval [0, 2π] describes one period of motion.

3.2 Computing DMP parameters from a single demonstration

Next we will show that Eqs. (3.8), (3.9), and (3.6) can be used to approximate any discrete point-to-point movement. Discrete DMPs were designed to provide a representation with free parameters that enables an accurate representation of a desired point-to-point movement and at the same time permits the modulation of different properties of the encoded trajectory. The shape parameters w_i are the key when it comes to accurately representing a desired motion trajectory.

As explained in Section 1.3, programming by demonstration is an effective way of specifying motion trajectories. In this approach, a human instructor demonstrates the desired motion using one of the approaches described in Section 1.4. We obtain the following measurements

\{\mathbf{y}_d(t_j), t_j\}_{j=0}^{T},   (3.19)

where \mathbf{y}_d(t_j) ∈ R^n are the desired joint angles and T + 1 is the number of the measured robot configurations on the robot trajectory.
To compute a DMP from this data, we must first numerically compute the velocities and accelerations from the measured data. Central finite differences can be used for this purpose. In the general case of non-uniformly sampled data, we obtain the following formulas:

\dot{y}_d(t_j) = \frac{1}{t_{j+1} − t_{j−1}}\left( (t_j − t_{j−1})\frac{y_d(t_{j+1}) − y_d(t_j)}{t_{j+1} − t_j} + (t_{j+1} − t_j)\frac{y_d(t_j) − y_d(t_{j−1})}{t_j − t_{j−1}} \right),   (3.20)

\ddot{y}_d(t_j) = \frac{2}{t_{j+1} − t_{j−1}}\left( \frac{y_d(t_{j+1}) − y_d(t_j)}{t_{j+1} − t_j} − \frac{y_d(t_j) − y_d(t_{j−1})}{t_j − t_{j−1}} \right).   (3.21)

For uniform sampling, i.e. t_{j+1} − t_j = t_j − t_{j−1} = Δt, the formulas simplify significantly:

\dot{y}_d(t_j) = \frac{y_d(t_{j+1}) − y_d(t_{j−1})}{2Δt},   (3.22)

\ddot{y}_d(t_j) = \frac{y_d(t_{j+1}) − 2y_d(t_j) + y_d(t_{j−1})}{Δt^2}.   (3.23)

These formulas are valid for j = 1, ..., T − 1. If the robot should be at rest at the beginning and end of the motion, we can set

\dot{y}_d(t_0) = \dot{y}_d(t_T) = \ddot{y}_d(t_0) = \ddot{y}_d(t_T) = 0.

The demonstrated data (3.19) must be consistent with this assumption for the real robot to be at rest at the beginning and end of the motion. If the initial and final velocities are not known, we can estimate them using left and right finite differences:

\dot{y}_d(t_0) = \frac{y_d(t_1) − y_d(t_0)}{t_1 − t_0},   (3.24)

\dot{y}_d(t_T) = \frac{y_d(t_T) − y_d(t_{T−1})}{t_T − t_{T−1}}.   (3.25)

Similarly, for the accelerations we obtain

\ddot{y}_d(t_0) = \frac{2}{t_2 − t_0}\left( \frac{y_d(t_2) − y_d(t_1)}{t_2 − t_1} − \frac{y_d(t_1) − y_d(t_0)}{t_1 − t_0} \right),   (3.26)

\ddot{y}_d(t_T) = \frac{2}{t_T − t_{T−2}}\left( \frac{y_d(t_T) − y_d(t_{T−1})}{t_T − t_{T−1}} − \frac{y_d(t_{T−1}) − y_d(t_{T−2})}{t_{T−1} − t_{T−2}} \right).   (3.27)

We thus obtain the following data that can be used to compute a DMP:

\{\mathbf{y}_d(t_j), \dot{\mathbf{y}}_d(t_j), \ddot{\mathbf{y}}_d(t_j)\}_{j=0}^{T},   (3.28)

where \mathbf{y}_d(t_j), \dot{\mathbf{y}}_d(t_j), \ddot{\mathbf{y}}_d(t_j) are the measured positions, velocities, and accelerations on the training trajectory and T + 1 is the number of sampling points.
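The finite-difference formulas above can be sketched, for the uniform-sampling case, as follows; the test trajectory is an illustrative sinusoid.

```python
import numpy as np

# Velocities and accelerations from sampled positions: central differences
# (3.22)-(3.23) in the interior, one-sided estimates at the trajectory ends
# in the spirit of Eqs. (3.24)-(3.27).
def derivatives(yd, dt):
    T = len(yd) - 1
    v = np.empty_like(yd)
    a = np.empty_like(yd)
    v[1:T] = (yd[2:] - yd[:-2]) / (2 * dt)               # Eq. (3.22)
    a[1:T] = (yd[2:] - 2 * yd[1:T] + yd[:-2]) / dt**2    # Eq. (3.23)
    v[0] = (yd[1] - yd[0]) / dt                          # Eq. (3.24)
    v[T] = (yd[T] - yd[T - 1]) / dt                      # Eq. (3.25)
    a[0] = (yd[2] - 2 * yd[1] + yd[0]) / dt**2           # cf. Eq. (3.26)
    a[T] = (yd[T] - 2 * yd[T - 1] + yd[T - 2]) / dt**2   # cf. Eq. (3.27)
    return v, a

t = np.linspace(0.0, 1.0, 201)
dt = t[1] - t[0]
yd = np.sin(2 * np.pi * t)
v, a = derivatives(yd, dt)
err_v = np.max(np.abs(v[1:-1] - 2 * np.pi * np.cos(2 * np.pi * t[1:-1])))
err_a = np.max(np.abs(a[1:-1] + (2 * np.pi) ** 2 * np.sin(2 * np.pi * t[1:-1])))
print(err_v, err_a)   # small: second-order accurate in the interior
```

The interior errors shrink quadratically with Δt, while the one-sided boundary estimates are only first-order accurate, which is why prescribing zero boundary velocities and accelerations is preferable whenever the demonstration actually starts and ends at rest.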
Computing DMP parameters by solving a system of linear equations

To compute the DMP parameters, we now rewrite the system of two first-order linear equations (3.8) – (3.9) as a second-order equation. This is done by replacing z with τ\dot{y} in Eq. (3.8):

τ^2\ddot{y} + α_zτ\dot{y} − α_zβ_z(g − y) = f(x),   (3.29)

with the components of \mathbf{f} = [f_1, ..., f_n]^T defined as in Eq. (3.5). Note that the time constant τ is the same for all degrees of freedom. A possible choice is τ = t_T, where t_T is the duration of the training movement. On the other hand, the attractor point g varies across the degrees of freedom. It can be extracted directly from the data: \mathbf{g} = \mathbf{y}_d(t_T). For every degree of freedom k, 1 ≤ k ≤ n, and using the desired values y_{d,k}(t_j), \dot{y}_{d,k}(t_j), \ddot{y}_{d,k}(t_j), we can compute the following values:

F_{k,j} = τ^2\ddot{y}_{d,k}(t_j) + α_zτ\dot{y}_{d,k}(t_j) − α_zβ_z(g_k − y_{d,k}(t_j)).   (3.30)

By considering the right-hand side of differential equation (3.29) and the definition of function f(x) in (3.5), we obtain the following system of linear equations:

\frac{\sum_{i=1}^{N} w_{k,i}Ψ_i(x_j)}{\sum_{i=1}^{N} Ψ_i(x_j)}\, x_j(g_k − y_{0,k}) = F_{k,j},  j = 0, ..., T,   (3.31)

where \mathbf{y}_0 = \mathbf{y}_d(t_0) and the phase sampling points x_j, j = 0, ..., T, are computed using Eq. (3.7). Writing

\mathbf{F}_k = \begin{bmatrix} F_{k,0} \\ \vdots \\ F_{k,T} \end{bmatrix} ∈ R^{T+1},  \mathbf{w}_k = \begin{bmatrix} w_{k,1} \\ \vdots \\ w_{k,N} \end{bmatrix} ∈ R^N,

we can rewrite the equation system (3.31) in the matrix form

X\mathbf{w}_k = \frac{1}{g_k − y_{0,k}}\mathbf{F}_k,   (3.32)

where the system matrix X ∈ R^{(T+1)×N} is given by

X = \begin{bmatrix}
\frac{Ψ_1(x_0)}{\sum_{i=1}^{N}Ψ_i(x_0)}x_0 & \cdots & \frac{Ψ_N(x_0)}{\sum_{i=1}^{N}Ψ_i(x_0)}x_0 \\
\vdots & \ddots & \vdots \\
\frac{Ψ_1(x_T)}{\sum_{i=1}^{N}Ψ_i(x_T)}x_T & \cdots & \frac{Ψ_N(x_T)}{\sum_{i=1}^{N}Ψ_i(x_T)}x_T
\end{bmatrix}.   (3.33)

The equation system (3.32) needs to be solved to estimate the weights \mathbf{w}_k specifying a DMP encoding the k-th degree of freedom of the desired motion. The weights \mathbf{w}_k can be calculated by solving the above system of linear equations in a least-squares sense. This system is overdetermined because typically T > N. The full DMP is specified by computing separate DMPs for every robot degree of freedom.

The resulting DMP ensures that the robot reaches the attractor point \mathbf{g} at time t_T. Since discrete DMPs have been designed to represent discrete point-to-point movements, the training movement must come to a full stop at the end of the demonstration if the robot is to stay at the attractor point after t_T. If any other type of motion is approximated by a DMP, the robot will overshoot the attractor point and return to it after the dynamics of the second-order system of differential equations starts dominating the motion. At least theoretically, the velocity does not need to be zero at the beginning of movement, but this is not usual in real PbD systems.

In the equations above, α_x, α_z, and β_z are constant. They are set so that the convergence of the underlying dynamic system is ensured, which is for example the case if we set α_x = 2, β_z = 3, α_z = 4β_z = 12.

Computing DMP parameters by locally weighted regression

Ijspeert et al. [14, 12] suggested estimating the DMP weights w_{k,i} independently of each other using locally weighted regression [1]. Locally weighted regression computes the weights w_{k,i} using the equations

F_{k,j} = x_j(g_k − y_{0,k})w_{k,i}.   (3.34)

This equation is obtained from (3.31) by assuming a constant forcing term w_{k,i}. The values of the kernel function Ψ_i(x_j) are in this case used to weight the importance of each of these equations, which results in the locally weighted least squares problem

\sum_{j=0}^{T} Ψ_i(x_j)\left(F_{k,j} − x_j(g_k − y_{0,k})w_{k,i}\right)^2.   (3.35)

It is easy to show that this optimization problem is solved by

w_{k,i} = \frac{\boldsymbol{ξ}_k^T Γ_i \mathbf{F}_k}{\boldsymbol{ξ}_k^T Γ_i \boldsymbol{ξ}_k},   (3.36)

where

\boldsymbol{ξ}_k = \begin{bmatrix} x_0(g_k − y_{0,k}) \\ x_1(g_k − y_{0,k}) \\ \vdots \\ x_T(g_k − y_{0,k}) \end{bmatrix},  Γ_i = \begin{bmatrix} Ψ_i(x_0) & 0 & \cdots & 0 \\ 0 & Ψ_i(x_1) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & Ψ_i(x_T) \end{bmatrix}.   (3.37)

Note that no matrix inversion is needed to compute w_{k,i} in this case. The advantage of applying the full linear system (3.32) to estimate \mathbf{w}_k is that this way we can approximate trajectories more accurately with fewer basis functions by considering the interplay between the neighboring basis functions Ψ_i of Eq. (3.5). However, the separate estimation of w_{k,i} by solving optimization problem (3.35) also has its advantages, especially in the presence of noise, when overfitting can become a problem [13].

Incremental estimation of periodic DMPs

In the case of periodic DMPs, the equation system that needs to be solved is quite similar. The second-order differential equation system equivalent to (3.29) becomes

\frac{1}{Ω^2}\ddot{y} + \frac{α_z}{Ω}\dot{y} − α_zβ_z(g − y) = f(ϕ),   (3.38)

with the components of \mathbf{f} = [f_1, ..., f_n]^T defined as in Eq. (3.14). From the measurements we obtain

F_{k,j} = \frac{1}{Ω^2}\ddot{y}_{d,k}(t_j) + \frac{α_z}{Ω}\dot{y}_{d,k}(t_j) − α_zβ_z(g_k − y_{d,k}(t_j)),   (3.39)

while the linear equation system (3.31) transforms to

\frac{\sum_{i=1}^{N} w_{k,i}Γ_i(ϕ_j)}{\sum_{i=1}^{N} Γ_i(ϕ_j)}\, r = F_{k,j},  j = 0, ..., T,  k = 1, ..., n.   (3.40)

The phase variables are simply computed as ϕ_j = Ωt_j, which we obtain by solving Eq. (3.17) with the initial value ϕ_0 = 0.

Besides the weights w_{k,i}, there are three additional parameters that need to be estimated from the data: the amplitude r, the center of oscillation \mathbf{g}, and the frequency of oscillation Ω. r and \mathbf{g} can easily be estimated from the data:

\mathbf{g} = \frac{1}{T + 1}\sum_{j=0}^{T}\mathbf{y}_d(t_j),   (3.41)

r = \max_{0 ≤ j ≤ T}\|\mathbf{y}_d(t_j) − \mathbf{g}\|.   (3.42)

The estimation of the frequency Ω is more involved and is described in the following section. Assuming r, \mathbf{g}, and Ω are known, the shape parameters \mathbf{w}_k could be computed by solving the equation system

X\mathbf{w}_k = [F_{k,0}, F_{k,1}, ..., F_{k,T}]^T   (3.43)

with the system matrix

X = r\begin{bmatrix}
\frac{Γ_1(ϕ_0)}{\sum_{i=1}^{N}Γ_i(ϕ_0)} & \cdots & \frac{Γ_N(ϕ_0)}{\sum_{i=1}^{N}Γ_i(ϕ_0)} \\
\vdots & \ddots & \vdots \\
\frac{Γ_1(ϕ_T)}{\sum_{i=1}^{N}Γ_i(ϕ_T)} & \cdots & \frac{Γ_N(ϕ_T)}{\sum_{i=1}^{N}Γ_i(ϕ_T)}
\end{bmatrix}.   (3.44)

However, periodic motions do not necessarily have an ending, and new data is supplied continuously. In such cases, the shape parameters \mathbf{w}_k can be estimated by applying recursive least squares with a forgetting factor. For the k-th degree of freedom, we obtain the following iteration:

P_j = \frac{1}{λ}\left( P_{j−1} − \frac{P_{j−1}\mathbf{x}_j\mathbf{x}_j^T P_{j−1}}{λ + \mathbf{x}_j^T P_{j−1}\mathbf{x}_j} \right),   (3.45)

\mathbf{w}_{k,j} = \mathbf{w}_{k,j−1} + \left(F_{k,j} − \mathbf{x}_j^T\mathbf{w}_{k,j−1}\right)P_j\mathbf{x}_j,   (3.46)

where j is the iteration index, \mathbf{w}_{k,j} = [w_{1,j}, ..., w_{N,j}]^T ∈ R^N and P_j ∈ R^{N×N} are the current estimates of the shape parameters and the auxiliary covariance matrix P, respectively, with the initial estimates P_0 = I, \mathbf{w}_{k,0} = \mathbf{0}. Note that the covariance matrix only depends on the choice of basis functions and is independent of the measurements \{\mathbf{y}_d(t_j), t_j\}_{j=0}^{T}. It is therefore the same for all robot degrees of freedom. The desired forcing term values F_{k,j} are as in Eq. (3.39), \mathbf{x}_j is the N-dimensional column vector associated with the corresponding row of the system matrix X from Eq. (3.44), 0 < λ ≤ 1 is the forgetting factor, and the final weights are given as \mathbf{w}_k = \mathbf{w}_{k,T}. The forgetting factor should be set to 1 if only one period of the example motion is available, but in such cases the standard least-squares solution is preferable. The recursive formulation is typically used when learning periodic movements online, i.e. when estimating the DMP parameters while the human demonstrates the desired periodic motion. If in parallel the robot executes the current estimate of the periodic DMP, the human demonstrator can observe its performance and continue demonstrating until a satisfactory performance has been achieved.
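The locally weighted regression of Eqs. (3.34) – (3.37) can be sketched as follows; because Γ_i is diagonal, the products in (3.36) reduce to weighted sums. The demonstrated trajectory and all gains are illustrative choices.

```python
import numpy as np

# Locally weighted regression, Eqs. (3.34)-(3.37): each weight w_i is
# fitted independently, with Psi_i(x_j) weighting the importance of the
# j-th sample. One degree of freedom, illustrative minimum-jerk demo.
alpha_z, beta_z, alpha_x, N = 12.0, 3.0, 2.0, 25

t = np.linspace(0.0, 1.0, 200)
tau = t[-1]
s = t / tau
yd = 10 * s**3 - 15 * s**4 + 6 * s**5           # 0 -> 1 point-to-point motion
vd = np.gradient(yd, t)
ad = np.gradient(vd, t)
y0, g = yd[0], yd[-1]

x = np.exp(-alpha_x * t / tau)                  # phase samples, Eq. (3.7)
c = np.exp(-alpha_x * np.arange(N) / (N - 1))   # Eq. (3.12)
h = np.empty(N)
h[:-1] = 1.0 / (2.0 * np.diff(c) ** 2)          # Eq. (3.13)
h[-1] = h[-2]

F = tau**2 * ad + alpha_z * tau * vd - alpha_z * beta_z * (g - yd)  # Eq. (3.30)
xi = x * (g - y0)                               # xi_k of Eq. (3.37)
Psi = np.exp(-h * (x[:, None] - c) ** 2)        # (T+1) x N kernel values

# w_i = (xi^T Gamma_i F) / (xi^T Gamma_i xi), Eq. (3.36)
w = (Psi * (xi * F)[:, None]).sum(axis=0) / (Psi * (xi**2)[:, None]).sum(axis=0)

f_fit = (Psi @ w) / Psi.sum(axis=1) * xi        # forcing term of Eq. (3.5)
print(np.max(np.abs(f_fit - F)) / np.max(np.abs(F)))  # small relative error
```

No matrix inversion appears anywhere, in agreement with the remark after Eq. (3.37); accuracy is somewhat lower than for the full least-squares solution with the same N, which is the trade-off discussed in the text.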
3.2.1 Adaptive frequency oscillators

Unlike the time duration τ in the case of discrete movements, the frequency of oscillation is not directly observable in the data. To estimate the frequency, Righetti et al. [29] suggested replacing the constant speed assumption (3.17) by the system

\dot{ϕ}_i = Ω_i − K e(t)\sin(ϕ_i),   (3.47)
\dot{Ω}_i = −K e(t)\sin(ϕ_i),   (3.48)
\dot{α}_i = η\cos(ϕ_i)\,e(t),   (3.49)

where e(t) = y_d(t) − \hat{y}(t) and \hat{y}(t) = \sum_{i=1}^{L} α_i\cos(ϕ_i). Note that if e(t) = 0, the system (3.47) – (3.49) becomes equivalent to (3.17). It has been shown that by integrating this system, the frequencies Ω_i contained in the observed motion trajectory can be estimated. The most significant frequency is selected as the base or fundamental frequency Ω of the DMP (see [8] for the integration of adaptive frequency oscillators with DMPs).

Alternatively, Petrič et al. [27] suggested extracting the motion frequency based on an adaptive frequency oscillator combined with an adaptive Fourier series. This results in a second-order system of differential equations

\dot{Ω} = −K e\sin(ϕ),   (3.50)
\dot{ϕ} = Ω − K e\sin(ϕ),   (3.51)

where Ω is the extracted frequency, ϕ is the phase, K is the coupling constant, and e(t) = y_d(t) − \hat{y}(t) is the difference between the measured signal y_d and the estimated signal \hat{y}, where \hat{y} is given by

\hat{y}(t) = \sum_{c=0}^{m}\left(α_c\cos(cϕ(t)) + β_c\sin(cϕ(t))\right).   (3.52)

Here, m denotes the size of the Fourier series. The weights α_c and β_c are updated according to the following learning rule:

\dot{α}_c = η\cos(cϕ)\,e,   (3.53)
\dot{β}_c = η\sin(cϕ)\,e,   (3.54)

where η is the learning constant (see [27] for more details). The size of the Fourier series is normally set to a low value such as m = 10.

3.2.2 Integration of DMPs for robot control

To control the robot, we need to integrate the DMP equations (3.10), (3.11), and (3.6) for discrete movements, or (3.15), (3.16), and (3.17) for periodic movements.
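An Euler simulation of the adaptive oscillator (3.50) – (3.54) illustrates frequency extraction from a demonstration signal; the gains K and η, the series size m, the test signal, and the simulation length are all illustrative choices.

```python
import numpy as np

# Adaptive frequency oscillator with an adaptive Fourier series,
# Eqs. (3.50)-(3.54), applied to the signal y_d(t) = sin(6t).
Omega_true = 6.0
K, eta, m = 20.0, 3.0, 3
dt, T_end = 1e-3, 50.0

Omega, phi = 5.0, 0.0                  # initial frequency guess and phase
alpha = np.zeros(m + 1)                # Fourier coefficients of Eq. (3.52)
beta = np.zeros(m + 1)
cc = np.arange(m + 1)
hist = []

for step in range(int(T_end / dt)):
    t = step * dt
    yd = np.sin(Omega_true * t)
    yhat = alpha @ np.cos(cc * phi) + beta @ np.sin(cc * phi)  # Eq. (3.52)
    e = yd - yhat                      # prediction error drives adaptation
    Omega += (-K * e * np.sin(phi)) * dt             # Eq. (3.50)
    phi += (Omega - K * e * np.sin(phi)) * dt        # Eq. (3.51)
    alpha += eta * np.cos(cc * phi) * e * dt         # Eq. (3.53)
    beta += eta * np.sin(cc * phi) * e * dt          # Eq. (3.54)
    hist.append(Omega)

Omega_est = float(np.mean(hist[-5000:]))             # average over last 5 s
print(round(Omega_est, 2))
```

As the Fourier series learns the signal, the error e shrinks and the frequency estimate settles near the fundamental frequency of the demonstration; the equilibrium requires Ω to match the input frequency, because at any other Ω the prediction \hat{y} drifts out of phase and e reappears.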
The simplest method to implement the integration is Euler's method, a first-order numerical technique for approximating solutions to ordinary differential equations by iteratively advancing the solution using the derivative at each step. Beginning with an initial value, it estimates the next point based on the current value and its derivative. This procedure is repeated over the desired interval to construct an approximate solution. For discrete DMPs, we obtain the following integration formulae
\[
z_{j+1} = z_j + \frac{1}{\tau}\left(\alpha_z\left(\beta_z(g - y_j) - z_j\right) + f(x_j)\right)\Delta t, \tag{3.55}
\]
\[
y_{j+1} = y_j + \frac{1}{\tau}\,z_j\,\Delta t, \tag{3.56}
\]
\[
x_{j+1} = x_j - \frac{\alpha_x}{\tau}\,x_j\,\Delta t, \tag{3.57}
\]
where Δt > 0 is the integration step. For the initial values we set \(x_0 = 1\), \(y_0\) should be set to the robot's initial configuration, and \(z_0 = \tau v_0\), where \(v_0\) is the initial robot velocity. Usually the initial velocity is equal to 0. The integration step Δt can be set to the servo rate of the robot controller. If the integration accuracy at this rate is not sufficient, Δt can be set to a smaller value, with the robot's servo rate being a multiple of the integration rate. In this case, only the integrated values computed at the robot controller's servo rate are used to control the robot. For periodic DMPs, we obtain similar integration formulae:
\[
z_{j+1} = z_j + \Omega\left(\alpha_z\left(\beta_z(g - y_j) - z_j\right) + f(\phi_j)\right)\Delta t, \tag{3.58}
\]
\[
y_{j+1} = y_j + \Omega\,z_j\,\Delta t, \tag{3.59}
\]
\[
\phi_{j+1} = \phi_j + \Omega\,\Delta t. \tag{3.60}
\]
In this case, the initial phase value is set to \(\phi_0 = 0\), whereas \(y_0\) and \(z_0\) are initialized in the same way as for discrete DMPs. Besides decreasing the integration step, the accuracy of integration can also be improved by replacing Euler's method with one of the Runge-Kutta methods, which improve upon Euler's method by using multiple intermediate evaluations of the derivative to achieve higher accuracy without reducing the step size.
While Euler’s method is a first-order method (linear error reduction with step size), the Runge-Kutta methods, such as the commonly used fourth-order Runge-Kutta (RK4), achieve higher-order accuracy (quartic error reduction with step size) by considering the derivative at multiple points within the step interval. This results in better approximations of the true DMP. An important property of DMPs is that they form a control policy, which means that even if the current state of the robot is not close to the trajectory used for training, the DMP system generates motor commands that smoothly move the robot towards the trained motion. For every phase x, an attractor field is generated that moves the robot x = 1.00 x = 0.43 x = 0.19 Dynamical 3 3 3 2.5 2.5 2.5 2 2 2 1.5 1.5 1.5 M y2 1 y2 1 y2 1 ovement 0.5 0.5 0.5 0 0 0 0.5 0.5 0.5 Primitives 0.2 1 0 1 2 3 1 0 1 2 3 1 0 1 2 3 0.4 0.6 0.8 1 y1 1 1 1 Time [s] y1 y1 x = 0.08 x = 0.04 x = 0.02 3 3 3 2.5 2.5 2.5 2 2 2 1.5 1.5 1.5 y2 1 y2 1 y2 1 0.5 0.5 0.5 0 0 0 0.5 0.5 0.5 0.2 1 0 1 2 3 1 0 1 2 3 1 0 1 2 3 0.4 1 1 1 Time [s] 0.6 0.8 1 y1 y1 y1 Figure 3.2: Attractor landscape generated by a DMP (from [12]) y y 1 (top left) fits the trajectory of Figure 1 and 2 (bottom left) fits a minimum jerk g = (g , g ) ( 1, 1). The vector plots show (z , z y , assuming that only 1 2 = ˙ 1 ˙ ) at different values of ( , y ) 2 1 2 x , x , y ˙ , and y ˙ are not perturbed. In 1 2 1 2 = x x τ y 1 = 2 from 1.0 to 0.02 (i.e., from successive steps in time). Since ˙i = zi, such a graph illustrates the instantaneous 335 ( y ¨ , y ) of the 2D trajectory if the states ( 1 ( z ˙ , z ˙ , y ˙ , y ˙ , x , x ) for clarity. The vector plots are shown for successive 1 2 1 2 ˙ 1 ˙ 2 ¨ y , y ) were pushed somewhere else in state space. Note how the system 2 1 2 g = (1, 1) when x converges to 0. 60 CHAPTER 3. DYNAMIC MOVEMENT PRIMITIVES towards the correct part of the trajectory. This is illustrated in Fig. 3.2 for a system with two degrees of freedom. 
3.3 Cartesian Space DMPs

As explained in Section 1.2, all minimal, i.e. 3-D, representations of orientation contain singularities and are therefore not suitable for representing orientation trajectories. Non-minimal representations of SO(3) should be used instead. This makes it more difficult to integrate differential equations on SO(3), because general integration methods do not have any information about the structure of SO(3) and can cause the integrated parameters to depart from the constraint manifold. A different approach should therefore be used to formulate and integrate DMPs for Cartesian space trajectories [36].

While positions are represented as vectors and involve linear operations, orientations are typically represented using quaternions, which require non-linear operations like quaternion multiplication. This difference necessitates using quaternion algebra in the transformation system to ensure valid and smooth rotational trajectories.

3.3.1 Quaternion based DMPs

As explained in Section 1.2.2, orientations can be represented by unit quaternions \(q = v + \mathbf{u} \in S^3\), where \(S^3\) is the unit sphere in \(\mathbb{R}^4\), \(v \in \mathbb{R}\), \(\mathbf{u} \in \mathbb{R}^3\). Quaternions provide a singularity-free, non-minimal representation of orientation with only four parameters. They have been utilized to represent orientation in various contexts, including robot control [3] and vision-based tracking [34]. This representation is not unique because the unit quaternions q and −q represent the same orientation. We can fully specify orientation trajectories by specifying unit quaternions q at all times t. To fully describe a movement of the robot's end effector in Cartesian space, we can specify its position trajectory \(\mathbf{p}(t) \in \mathbb{R}^3\) and orientation trajectory \(q(t) \in S^3\). Each of the four quaternion parameters could be represented by its own DMP system (3.8) – (3.9).
However, the coefficients of the orientation trajectory q(t) are not independent of each other; if we integrate them independently, the integrated quaternion will gradually start deviating from the unit quaternion space. Instead, we transform the standard DMP equations (3.10) – (3.11) into the quaternion form as follows
\[
\tau\dot{\boldsymbol{\eta}} = \alpha_z\left(\beta_z\, 2\log\left(g_o * \bar{q}\right) - \boldsymbol{\eta}\right) + \mathbf{f}_o(x), \tag{3.61}
\]
\[
\tau\dot{q} = \frac{1}{2}\,\boldsymbol{\eta} * q, \tag{3.62}
\]
where \(g_o \in S^3\) denotes the goal quaternion orientation, while the time constant τ > 0 and the control gains \(\alpha_z, \beta_z > 0\) have the same role as in standard DMPs specified by (3.10) – (3.11). x is again the phase variable defined by Eq. (3.6). See Section 1.2.2 for the definition of quaternion conjugation (1.22), quaternion product (1.20), and quaternion logarithm (1.41). The auxiliary variable \(\boldsymbol{\eta} \in \mathbb{R}^3\) is treated as a quaternion with zero scalar part in (3.62). The motivation for equation (3.61) is provided by observing that g − y in Eq. (3.10) can be interpreted as the linear velocity that takes y to g within unit time, just like \(2\log(g_o * \bar{q})\) is the angular velocity that takes orientation q to orientation \(g_o\) within unit time, as shown in (1.42). Equation (3.62) is motivated by the relationship between the quaternion derivative \(\dot{q}\) and the angular velocity \(\boldsymbol{\omega}\), \(\dot{q} = \frac{1}{2}\boldsymbol{\omega} * q\), as shown in Eq. (1.47). From (3.62) and (1.47) we can thus derive that the auxiliary variable \(\boldsymbol{\eta}\) is a scaled angular velocity,
\[
\boldsymbol{\eta} = \tau\boldsymbol{\omega}. \tag{3.63}
\]
The nonlinear forcing term
\[
\mathbf{f}_o(x) = \mathbf{D}_o\,\frac{\sum_{i=1}^{N}\mathbf{w}_i^o\,\Psi_i(x)}{\sum_{i=1}^{N}\Psi_i(x)}\,x \tag{3.64}
\]
contains free parameters \(\mathbf{w}_i^o \in \mathbb{R}^3\), which need to be determined to follow any given orientation trajectory. \(\mathbf{D}_o = \mathrm{diag}\left(2\log(g_o * \bar{q}_0)\right) \in \mathbb{R}^{3\times 3}\) is a diagonal matrix with the scaling factors \(2\log(g_o * \bar{q}_0)\) on the diagonal, where \(q_0\) is the initial orientation on the trajectory.
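The quaternion operations appearing in (3.61) – (3.62) can be sketched in code as follows; the storage convention [v, u1, u2, u3] (scalar part first) and the function names are our own illustrative choices:

```python
import numpy as np

def q_mul(a, b):                      # quaternion product, cf. (1.20)
    return np.concatenate((
        [a[0] * b[0] - a[1:] @ b[1:]],
        a[0] * b[1:] + b[0] * a[1:] + np.cross(a[1:], b[1:])))

def q_conj(q):                        # quaternion conjugation, cf. (1.22)
    return np.concatenate(([q[0]], -q[1:]))

def q_log(q):                         # quaternion logarithm, cf. (1.41)
    n = np.linalg.norm(q[1:])
    if n < 1e-12:
        return np.zeros(3)
    return np.arccos(np.clip(q[0], -1.0, 1.0)) * q[1:] / n

def q_exp(r):                         # quaternion exponential, cf. (1.39)
    th = np.linalg.norm(r)
    if th < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate(([np.cos(th)], np.sin(th) * r / th))

# sanity check: 2 log(g_o * conj(q)) is the angular velocity that takes
# q to g_o within unit time, cf. (1.42)
q = np.array([1.0, 0.0, 0.0, 0.0])            # identity orientation
g_o = q_exp(np.array([0.0, 0.0, np.pi / 4]))  # 90-degree rotation about z
print(2 * q_log(q_mul(g_o, q_conj(q))))       # -> [0, 0, pi/2]
```

For the identity start orientation and a 90-degree goal rotation about z, the error term \(2\log(g_o * \bar{q})\) is exactly the rotation vector of that rotation, as the output shows.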
The modulation by \(\mathbf{D}_o\) leads to useful scaling properties of the DMP system when the goal configuration \(g_o\), and consequently the amplitude of the movement, change (see also Section 3.3.4). Note that the logarithmic map defined on \(S^3\) is largely singularity-free; the only singularity arises at the single quaternion \(q = -1 + [0, 0, 0]^{\mathrm{T}}\), which corresponds to a rotation by the full angle 2π. As full rotations are rare in orientation trajectories, this singularity usually does not cause any issues when specifying orientation trajectories in unit quaternion space.

The data for computing a Cartesian space DMP can be gathered through programming by demonstration, which involves mapping the demonstrated motion to the positions and orientations of the end-effector
\[
\left\{\mathbf{p}_d(t_j),\, q_d(t_j),\, t_j\right\}_{j=0}^{T}. \tag{3.65}
\]
Here \(\mathbf{p}_d(t_j) \in \mathbb{R}^3\) are the desired positions of the end-effector and \(q_d(t_j) \in S^3\) the desired orientations specified by unit quaternions. The linear velocities \(\dot{\mathbf{p}}_d(t_j)\) and accelerations \(\ddot{\mathbf{p}}_d(t_j)\) can be obtained by numerical differentiation as explained in Section 3.2. To compute also the desired angular velocities \(\boldsymbol{\omega}_d(t_j)\) and angular accelerations \(\dot{\boldsymbol{\omega}}_d(t_j)\), we first numerically differentiate \(q_d(t_j)\) using the central difference formula
\[
\dot{q}_d(t_j) = \frac{(t_j - t_{j-1})\left(q_d(t_{j+1}) - q_d(t_j)\right) + (t_{j+1} - t_j)\left(q_d(t_j) - q_d(t_{j-1})\right)}{(t_{j+1} - t_{j-1})(t_{j+1} - t_j)}. \tag{3.66}
\]
This formula is valid for 1 ≤ j ≤ T − 1. If the angular velocity is 0 at the beginning and the end of the motion, we can set \(\dot{q}_d(t_0) = \dot{q}_d(t_T) = 0\); otherwise we have to use left and right finite differences as in Eq. (3.24) and (3.25) to compute \(\dot{q}_d(t_0)\) and \(\dot{q}_d(t_T)\). Using Eq. (1.47), we can now compute the desired angular velocities
\[
\boldsymbol{\omega}_d(t_j) = 2\,\dot{q}_d(t_j) * \bar{q}_d(t_j). \tag{3.67}
\]
The desired angular accelerations \(\dot{\boldsymbol{\omega}}_d(t_j)\) are then obtained using the same central difference formula as in (3.20) and (3.66).
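Under the simplifying assumption of uniform sampling, the differentiation steps (3.66) – (3.67) can be sketched as follows; the test signal is a constant rotation about z and is purely illustrative:

```python
import numpy as np

def angular_velocities(qs, dt):
    """Eq. (3.66) with uniform dt, then omega = 2 * q_dot * conj(q),
    Eq. (3.67); quaternions stored with the scalar part first."""
    q_dot = (qs[2:] - qs[:-2]) / (2.0 * dt)   # central differences
    omega = np.empty((len(q_dot), 3))
    for j, (dq, q) in enumerate(zip(q_dot, qs[1:-1])):
        u = -q[1:]                            # vector part of conj(q)
        # vector part of the quaternion product 2 * dq * conj(q)
        omega[j] = 2.0 * (dq[0] * u + q[0] * dq[1:] + np.cross(dq[1:], u))
    return omega

# usage: constant rotation about z at 1 rad/s
dt = 0.01
t = np.arange(0.0, 1.0, dt)
qs = np.stack([np.cos(t / 2), 0 * t, 0 * t, np.sin(t / 2)], axis=1)
omega = angular_velocities(qs, dt)
print(omega[50])   # close to [0, 0, 1]
```

For a rotation at a constant rate of 1 rad/s about z, the recovered angular velocity is [0, 0, 1] up to the central-difference truncation error.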
This way we obtain all of the data needed to compute a quaternion DMP
\[
\left\{q_d(t_j),\, \boldsymbol{\omega}_d(t_j),\, \dot{\boldsymbol{\omega}}_d(t_j),\, t_j\right\}_{j=0}^{T}. \tag{3.68}
\]
To compute the parameters \(\mathbf{w}_i^o\), we first rewrite Eq. (3.61) as
\[
\tau^2\dot{\boldsymbol{\omega}} - \alpha_z\left(\beta_z\, 2\log\left(g_o * \bar{q}\right) - \tau\boldsymbol{\omega}\right) = \mathbf{f}_o(x). \tag{3.69}
\]
The parameters \(\mathbf{w}_i^o\) can then be computed by solving the following system of linear equations
\[
\frac{\sum_{i=1}^{N}\mathbf{w}_i^o\,\Psi_i(x(t_j))}{\sum_{i=1}^{N}\Psi_i(x(t_j))}\,x(t_j) = \mathbf{D}_o^{-1}\left(\tau^2\dot{\boldsymbol{\omega}}_d(t_j) - \alpha_z\left(\beta_z\, 2\log\left(g_o * \bar{q}_d(t_j)\right) - \tau\boldsymbol{\omega}_d(t_j)\right)\right), \tag{3.70}
\]
where the phases \(x(t_j)\) are obtained as in (3.83). Similarly as for the standard DMPs, we can compute the goal orientation \(g_o = q_d(t_T)\) and the time constant \(\tau = t_T - t_0\) directly from the data.

Pastor et al. [26] used in Eq. (3.61) just the vector part of the quaternion product \(v + \mathbf{u} = g_o * \bar{q}\), i.e. \(\mathbf{u} \in \mathbb{R}^3\), instead of \(2\log(g_o * \bar{q})\). A similar type of error term was also used in [38]. Defining \(\mathbf{u} = \mathrm{vec}(g_o * \bar{q})\), their system can be written as
\[
\tau\dot{\boldsymbol{\eta}} = \alpha_z\left(\beta_z\,\mathrm{vec}\left(g_o * \bar{q}\right) - \boldsymbol{\eta}\right) + \mathbf{f}_o(x). \tag{3.71}
\]
While this is equivalent as far as the direction of change is concerned, such a formulation does not fully take into account the geometry of SO(3). By considering Eq. (1.42), we can see that only the logarithmic map multiplied by 2 provides a proper mapping of the quaternion product \(g_o * \bar{q}\) onto the angular velocity. Another difference is that there is no scaling factor \(\mathbf{D}_o\) in [26]. See Section 3.3.4 for the analysis of the differences between the two approaches.

For the integration of (3.61), (3.62), and (3.6), we can use the formulae
\[
\boldsymbol{\eta}(t + \Delta t) = \boldsymbol{\eta}(t) + \frac{1}{\tau}\left(\alpha_z\left(\beta_z\, 2\log\left(g_o * \bar{q}(t)\right) - \boldsymbol{\eta}(t)\right) + \mathbf{f}_o(x(t))\right)\Delta t, \tag{3.72}
\]
\[
q(t + \Delta t) = \exp\!\left(\frac{\Delta t\,\boldsymbol{\eta}(t)}{2\tau}\right) * q(t), \tag{3.73}
\]
\[
x(t + \Delta t) = x(t) - \frac{\alpha_x}{\tau}\,x(t)\,\Delta t, \tag{3.74}
\]
where the quaternion exponential is defined as in Eq. (1.39), i.e. \(\exp(\theta\mathbf{n}) = \cos(\theta) + \sin(\theta)\mathbf{n}\), and the quaternion logarithm as in Eq. (1.41). Δt > 0 is the integration step.
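A minimal sketch of the integration scheme (3.72) – (3.74) without the forcing term \(f_o\) follows; the quaternion helpers mirror Section 1.2.2 (scalar part stored first), and the gains, goal, and step size are illustrative:

```python
import numpy as np

def q_mul(a, b):
    return np.concatenate(([a[0] * b[0] - a[1:] @ b[1:]],
                           a[0] * b[1:] + b[0] * a[1:] + np.cross(a[1:], b[1:])))

def q_conj(q):
    return np.concatenate(([q[0]], -q[1:]))

def q_log(q):
    n = np.linalg.norm(q[1:])
    return np.zeros(3) if n < 1e-12 else np.arccos(np.clip(q[0], -1.0, 1.0)) * q[1:] / n

def q_exp(r):
    th = np.linalg.norm(r)
    return (np.array([1.0, 0.0, 0.0, 0.0]) if th < 1e-12
            else np.concatenate(([np.cos(th)], np.sin(th) * r / th)))

tau, dt, alpha_z, beta_z, alpha_x = 1.0, 0.002, 48.0, 12.0, 2.0
q = np.array([1.0, 0.0, 0.0, 0.0])        # initial orientation
g_o = q_exp(np.array([0.3, -0.2, 0.4]))   # goal orientation (illustrative)
eta, x = np.zeros(3), 1.0
for _ in range(int(3.0 / dt)):
    err = 2.0 * q_log(q_mul(g_o, q_conj(q)))
    eta = eta + alpha_z * (beta_z * err - eta) * dt / tau   # Eq. (3.72)
    q = q_mul(q_exp(dt * eta / (2.0 * tau)), q)             # Eq. (3.73)
    x += -alpha_x * x * dt / tau                            # Eq. (3.74)
print(np.linalg.norm(q) - 1.0)   # q stays on S^3 up to rounding
```

Because (3.73) multiplies q by a unit quaternion, the integrated orientation stays on \(S^3\) by construction, and the remaining orientation error \(2\log(g_o * \bar{q})\) converges to zero.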
3.3.2 Position trajectories

For the sake of completeness we also rewrite the DMP equations (3.10) – (3.11), which are used to encode the positional part of the trajectory, in the variable p
\[
\tau\dot{\mathbf{z}} = \alpha_z\left(\beta_z(\mathbf{g}_p - \mathbf{p}) - \mathbf{z}\right) + \mathbf{f}_p(x), \tag{3.75}
\]
\[
\tau\dot{\mathbf{p}} = \mathbf{z}, \tag{3.76}
\]
where \(\mathbf{g}_p \in \mathbb{R}^3\) denotes the final position on the recorded trajectory. The forcing term \(\mathbf{f}_p\) is defined as
\[
\mathbf{f}_p(x) = \mathbf{D}_p\,\frac{\sum_{i=1}^{N}\mathbf{w}_i^p\,\Psi_i(x)}{\sum_{i=1}^{N}\Psi_i(x)}\,x, \tag{3.77}
\]
where \(\mathbf{D}_p = \mathrm{diag}(\mathbf{g}_p - \mathbf{p}_0) \in \mathbb{R}^{3\times 3}\). As mentioned above, the diagonal matrices \(\mathbf{D}_p\) in (3.77) and \(\mathbf{D}_o\) in (3.64) are used to scale the movement amplitude if the goal configurations \(\mathbf{g}_p\) and/or \(g_o\) change. To track the desired Cartesian space trajectories, we need to integrate (3.75) – (3.76) and (3.79) – (3.80) along with the common phase (3.6). Given the robot tool center point trajectory \(\{\mathbf{p}_j, \dot{\mathbf{p}}_j, \ddot{\mathbf{p}}_j, t_j\}_{j=0}^{T}\), the free parameters can be calculated in a similar way as in (3.82), i.e. by first rewriting (3.75) – (3.76) as one second-order differential equation [12] and then solving the resulting system of linear equations
\[
\frac{\sum_{i=1}^{N}\mathbf{w}_i^p\,\Psi_i(x(t_j))}{\sum_{i=1}^{N}\Psi_i(x(t_j))}\,x(t_j) = \mathbf{D}_p^{-1}\left(\tau^2\ddot{\mathbf{p}}_d(t_j) + \alpha_z\tau\dot{\mathbf{p}}_d(t_j) - \alpha_z\beta_z\left(\mathbf{g}_p - \mathbf{p}_d(t_j)\right)\right), \tag{3.78}
\]
where the phases \(x(t_j)\) are calculated as in (3.83).

3.3.3 Rotation matrix based DMPs

Rotation matrices offer an alternative, singularity-free representation of SO(3). We can easily rewrite Eq. (3.61) – (3.62) in rotation matrix form
\[
\tau\dot{\boldsymbol{\eta}} = \alpha_z\left(\beta_z\log\left(R_g R^{\mathrm{T}}\right) - \boldsymbol{\eta}\right) + \mathbf{f}_o(x), \tag{3.79}
\]
\[
\tau\dot{R} = [\boldsymbol{\eta}]_{\times}R, \tag{3.80}
\]
where \(R_g\) denotes the goal orientation and all other parameters are just like in (3.61) – (3.62). The matrix logarithm \(\log(R_g R^{\mathrm{T}})\) has the same role as \(2\log(g_o * \bar{q})\) in Eq. (3.61), as shown by Eq. (1.37) in Section 1.2.3. Equation (3.80) is motivated by (1.8). Again we have \(\boldsymbol{\eta} = \tau\boldsymbol{\omega}\), as can easily be deduced from (3.80) and (1.8). Thus \(\boldsymbol{\eta}\) is the scaled angular velocity.
The nonlinear forcing term is defined as in (3.64) and contains free parameters \(\mathbf{w}_i^o \in \mathbb{R}^3\), which need to be determined to follow any given rotation matrix trajectory. The data for computing them is usually given as
\[
\left\{R_d(t_j),\, \boldsymbol{\omega}_d(t_j),\, \dot{\boldsymbol{\omega}}_d(t_j),\, t_j\right\}_{j=0}^{T}, \tag{3.81}
\]
where \(\boldsymbol{\omega}_d(t_j)\) and \(\dot{\boldsymbol{\omega}}_d(t_j)\) are computed by numerical differentiation in a similar way as in the case of quaternion DMPs in Sec. 3.3.1. The parameters \(\mathbf{w}_i^o\) are calculated by solving the following system of linear equations, which we obtain from (3.79)
\[
\frac{\sum_{i=1}^{N}\mathbf{w}_i^o\,\Psi_i(x(t_j))}{\sum_{i=1}^{N}\Psi_i(x(t_j))}\,x(t_j) = \mathbf{D}_o^{-1}\left(\tau^2\dot{\boldsymbol{\omega}}_d(t_j) + \alpha_z\tau\boldsymbol{\omega}_d(t_j) - \alpha_z\beta_z\log\left(R_g R_d^{\mathrm{T}}(t_j)\right)\right), \tag{3.82}
\]
where the phases \(x(t_j)\) are calculated by integrating (3.6), i.e.
\[
x(t_j) = \exp\left(-\frac{\alpha_x}{\tau}\,t_j\right), \tag{3.83}
\]
and \(j = 0, \ldots, T\). The scaling factor \(\mathbf{D}_o = \mathrm{diag}\left(\log\left(R_g R_0^{\mathrm{T}}\right)\right) \in \mathbb{R}^{3\times 3}\) is a diagonal matrix. Standard Euler formulas can be applied to integrate (3.79). To integrate (3.80), we use the expression
\[
R(t + \Delta t) = \exp\left(\frac{\Delta t\,[\boldsymbol{\eta}]_{\times}}{\tau}\right)R(t), \tag{3.84}
\]
which is guaranteed to generate a rotation matrix because the matrix exponential \(\exp(\Delta t[\boldsymbol{\eta}]_{\times}/\tau)\) results in a rotation matrix and the product of two rotation matrices is a rotation matrix. Note that the matrix exponential is defined in a similar way as the quaternion exponential (1.40), i.e. as a rotation matrix corresponding to the rotation by angle θ about the rotation axis n.

3.3.4 Comparison of DMPs defined on SO(3)

Figures 3.3 and 3.4 show that quaternion-based DMPs, which use the term \(2\log(g_o * \bar{q})\) from (3.61), and rotation-matrix-based DMPs generate better responses in terms of angular velocities than the quaternion-difference DMP that uses \(\mathrm{vec}(g_o * \bar{q})\) as in (3.71).
Both quaternion- and rotation-matrix-based DMPs converge to the attractor point much faster than the approach proposed in [26], which is the desired characteristic of the linear part of the DMP system that should ensure convergence to the attractor point. Part of the reason for this is the lack of multiplication by 2 in (3.71). If we modify (3.71) to
\[
\tau\dot{\boldsymbol{\eta}} = \alpha_z\left(\beta_z\, 2\,\mathrm{vec}\left(g_o * \bar{q}\right) - \boldsymbol{\eta}\right) + \mathbf{f}_o(x), \tag{3.85}
\]
the DMP system (3.61) – (3.62) still generates a significantly quicker response than (3.85), (3.62), and converges faster, but the difference is smaller, as we can see by comparing Fig. 3.3 and 3.6. Nevertheless, the response of (3.61) – (3.62) is clearly preferable. Fig. 3.3 and 3.4 show that, as expected, the quaternion based DMP (3.61) – (3.62) and the rotation matrix based DMP (3.79) – (3.80) generate exactly the same response in terms of angular velocity. The starting orientation was set to \(q_0 = 0.3717 + [-0.4993, -0.6162, 0.4825]^{\mathrm{T}}\) and the goal orientation to \(g_o = 0.2471 + [0.1797, 0.3182, -0.8974]^{\mathrm{T}}\). The other constants were specified as follows: \(\alpha_x = 2\), \(\alpha_z = 48\), \(\beta_z = 12\), \(\tau = 3.5\). We have thus seen theoretically and experimentally that the proposed systems (3.79) – (3.80) and (3.61) – (3.62) are significantly better than (3.85), (3.62), even though the multiplication by 2 has been added compared to the system (3.71), (3.62) from [26]. Note that 2 is the correct multiplier to obtain a critically damped system in (3.61) – (3.62). If we further increase the multiplier, the resulting system is not critically damped but starts oscillating towards the goal orientation, which is suboptimal for robot control.

Figure 3.3: Response of the quaternion-based DMP system (3.61) – (3.62) without the nonlinear term \(f_o\). The upper graph shows the time trajectories of the four unit quaternion components, and the lower graph the time trajectories of the three components of the angular velocity.

Figure 3.4: Response of the rotation matrix based DMP system (3.79) – (3.80) without the nonlinear term \(f_o\). The upper graph shows the time trajectories of the nine rotation matrix components, and the lower graph the time trajectories of the three components of the angular velocity.

Figure 3.5: Response of the DMP system (3.71), (3.62) with the quaternion difference instead of the logarithmic map (as proposed in [26]) and without the nonlinear term \(f_o\).

Figure 3.6: Response of the DMP system (3.85), (3.62) with the doubled quaternion difference instead of the logarithmic map and without the nonlinear term \(f_o\).

Figure 3.7: Reproduction of the desired minimum jerk polynomial with the rotation matrix based DMP system (3.79) – (3.80).

Figure 3.8: Reproduction of the desired minimum jerk polynomial with the quaternion based DMP system (3.61) – (3.62).
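The difference between the two error terms can also be seen with a short computation: for a relative rotation by angle θ about axis n, \(g_o * \bar{q} = \cos(\theta/2) + \sin(\theta/2)\mathbf{n}\), so the logarithmic term recovers exactly θn, while the vector part used in (3.71) yields only sin(θ/2)n. The numbers below are illustrative:

```python
import numpy as np

# For a 90-degree relative rotation about z, compare the error term
# 2*log(g_o * conj(q)) of (3.61) with vec(g_o * conj(q)) of (3.71).
theta = np.pi / 2
n = np.array([0.0, 0.0, 1.0])
rel = np.concatenate(([np.cos(theta / 2)], np.sin(theta / 2) * n))

two_log = 2.0 * np.arccos(rel[0]) * rel[1:] / np.linalg.norm(rel[1:])
vec = rel[1:]

print(np.linalg.norm(two_log))   # pi/2 ~ 1.5708, the true rotation angle
print(np.linalg.norm(vec))       # sin(pi/4) ~ 0.7071, an underestimate
```

The logarithmic term grows linearly with the rotation angle, as the angular velocity interpretation (1.42) requires, whereas the vector part saturates, which explains the slower convergence observed above.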
Next, we evaluate the performance of orientation trajectory approximation using both quaternion- and rotation-matrix-based DMPs. For this experimental evaluation, the same parameters as above were used, except for τ, which was set to 1.5. To generate an example orientation trajectory, we sampled a minimum jerk polynomial between the start quaternion \(q_0\) and the goal quaternion \(g_o\), and then normalized the resulting quaternion trajectory. The results shown in Fig. 3.7 and 3.8 demonstrate that both approaches can accurately reproduce the desired trajectories. However, this experiment also revealed a subtle issue that should be considered when using rotation-matrix-based DMPs. Specifically, the rotation matrix logarithm, as defined in (1.37), exhibits a discontinuity at \(|\log(R)| = \pi\). This occurs because, when an object is rotated by 180 degrees, the final orientation is the same regardless of the rotation direction. This discontinuity can cause problems when estimating DMPs through the system of equations (3.82), as \(f_o\) can become discontinuous when this boundary is crossed, even if the movement itself is smooth. While this issue can be resolved by adding appropriate constants to ensure the continuity of \(f_o\) in (3.79), such an approach requires the user to implement a cumbersome bookkeeping procedure. On the other hand, quaternion-based DMPs do not encounter this problem because there is no discontinuity boundary in the quaternion logarithm (1.41) on the unit sphere \(S^3\). The only singularity occurs at \(q = -1 + [0, 0, 0]^{\mathrm{T}}\), which is rarely encountered in practice and can be easily handled (for example, by assuming the previous direction vector in the logarithm when \(-1 + [0, 0, 0]^{\mathrm{T}}\) is reached on the trajectory). For this reason, we prefer the quaternion representation, which avoids the tedious bookkeeping associated with rotation-matrix-based DMPs.
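The discontinuity at \(|\log(R)| = \pi\) can be demonstrated numerically; the axis-angle extraction below is a standard formula consistent with the trace-based definition referenced in (1.37), and the 0.01 rad offset is an illustrative choice:

```python
import numpy as np

# Two rotations about z by angles just below and just above 180 degrees
# are nearly identical orientations, yet their matrix logarithms point
# in opposite directions: a jump of almost 2*pi in the rotation vector.
def rot_z(th):
    c, s = np.cos(th), np.sin(th)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def mat_log(R):
    th = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return th * w / (2.0 * np.sin(th)) if th > 1e-12 else np.zeros(3)

l_minus = mat_log(rot_z(np.pi - 0.01))   # ~ +(pi - 0.01) * z
l_plus = mat_log(rot_z(np.pi + 0.01))    # ~ -(pi - 0.01) * z
print(l_minus, l_plus)   # nearly opposite vectors
```

Consecutive samples of a smooth trajectory crossing this boundary would therefore produce a discontinuous right-hand side in (3.82), which is exactly the bookkeeping problem described above.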
3.4 Third-order DMPs and Sequencing

High-performance robot control systems, which can take into account robot dynamics, utilize the desired positions, velocities, and accelerations to generate motor commands. The second-order system (3.8) – (3.9) ensures only continuous positions and velocities if the goal g suddenly changes during robot motion. This problem can be alleviated by instead using a third-order system [32]
\[
\tau\dot{z} = \alpha_z\left(\beta_z(r - y) - z\right) + f(x), \tag{3.86}
\]
\[
\tau\dot{y} = z, \tag{3.87}
\]
\[
\tau\dot{r} = \alpha_g(g - r). \tag{3.88}
\]
The phase x is again given by Eq. (3.6). It is easy to see that the general solution of the linear part of the equation system (3.86) – (3.88), i.e. the above expression without f(x), is given by
\[
\begin{bmatrix} z(t) \\ y(t) \\ r(t) \end{bmatrix} =
\begin{bmatrix} 0 \\ g \\ g \end{bmatrix} +
\exp\left(\frac{1}{\tau}\mathbf{A}t\right)\mathbf{c}, \qquad
\mathbf{A} = \begin{bmatrix} -\alpha_z & -\alpha_z\beta_z & \alpha_z\beta_z \\ 1 & 0 & 0 \\ 0 & 0 & -\alpha_g \end{bmatrix}. \tag{3.89}
\]
Matrix A has eigenvalues at \(\lambda_1 = -\alpha_g/\tau\) and \(\lambda_{2,3} = \left(-\alpha_z \pm \sqrt{\alpha_z^2 - 4\alpha_z\beta_z}\right)/(2\tau)\), which all have negative real parts if \(\tau, \alpha_z, \beta_z, \alpha_g > 0\). Thus for such parameters the system is guaranteed to converge to its unique attractor point \([0, g, g]^{\mathrm{T}}\). Just like before, the system is critically damped if \(\alpha_z = 4\beta_z\), in which case \(\lambda_{2,3} = -\alpha_z/(2\tau)\). Also like before, since f(x) vanishes as t → ∞, the nonlinear system (3.86) – (3.88) is guaranteed to converge to this same attractor point. Note that by setting r(t) = g, the third-order system (3.86) – (3.88) becomes equivalent to the second-order system (3.8) – (3.9). Therefore we can (and should) use the original second-order system when learning DMP parameters from a single demonstration. The third-order system should only be used at execution time. If the final destination g suddenly changes, the third-order system ensures that r gradually transitions to the new goal position and that the robot acceleration, which is defined by Eq. (3.86) and \(\ddot{y} = \dot{z}/\tau\), remains continuous.
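A minimal sketch of the third-order system (3.86) – (3.88) without the forcing term, with the goal switched during the movement; the gains are illustrative, with \(\alpha_z = 4\beta_z\) keeping the system critically damped:

```python
import numpy as np

tau, dt = 1.0, 0.001
alpha_z, beta_z, alpha_g = 48.0, 12.0, 24.0
y, z, r, g = 0.0, 0.0, 1.0, 1.0        # start with r = g
ys = []
for j in range(int(3.0 / dt)):
    if j == 500:                        # goal suddenly changes at 0.5 s
        g = 2.0
    z += alpha_z * (beta_z * (r - y) - z) * dt / tau   # Eq. (3.86)
    y += z * dt / tau                                   # Eq. (3.87)
    r += alpha_g * (g - r) * dt / tau                   # Eq. (3.88)
    ys.append(y)
print(ys[-1])   # converges to the new goal g = 2.0
```

Because r follows the first-order filter (3.88), the jump in g never enters (3.86) directly, so the acceleration \(\ddot{y} = \dot{z}/\tau\) stays continuous throughout the goal switch.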
The third-order system (3.86) – (3.88) is also useful to continuously transition from one DMP to another.

Figure 3.9: Sequencing of movement primitives. The application of third-order DMPs (red full line) prevents the jump in acceleration, which can be observed when using second-order DMPs (blue dashed line).

From (3.86) and (3.87) we obtain
\[
r = \frac{\tau^2\ddot{y} + \tau\alpha_z\dot{y} + \alpha_z\beta_z y - f(x)}{\alpha_z\beta_z}. \tag{3.90}
\]
Thus if the robot position, velocity, and acceleration at the transition time are given by \(y_c\), \(\dot{y}_c\), and \(\ddot{y}_c\), the next DMP should be initialized to
\[
y(0) = y_c, \quad z(0) = \tau\dot{y}_c, \quad r(0) = \frac{\tau^2\ddot{y}_c + \tau\alpha_z\dot{y}_c + \alpha_z\beta_z y_c - f(1)}{\alpha_z\beta_z}, \quad x(0) = 1. \tag{3.91}
\]
In this way the positions, velocities, and accelerations remain continuous as the robot transitions from one DMP to another.

3.4.1 Third-order DMPs in Cartesian space

For quaternion based DMPs, we can define the following third-order system
\[
\tau\dot{\boldsymbol{\eta}} = \alpha_z\left(\beta_z\, 2\log\left(r * \bar{q}\right) - \boldsymbol{\eta}\right) + \mathbf{f}_o(x), \tag{3.92}
\]
\[
\tau\dot{q} = \frac{1}{2}\,\boldsymbol{\eta} * q, \tag{3.93}
\]
\[
\tau\dot{r} = \alpha_r\, 2\log\left(g_o * \bar{r}\right) * r. \tag{3.94}
\]
Equivalently to r in (3.86), r in (3.92) is now a variable that continuously transitions to the goal \(g_o\), even if the goal suddenly changes. Equation (3.94) should be integrated together with (3.61) – (3.62) using formula (3.73).

Figure 3.10: Response of the DMP system (3.61), (3.62), (3.94) to the switching of the goal orientation during the movement. The goal of the movement was changed at 0.4 seconds. The change of goal orientation corresponded to a rotation of 30.4 degrees. The original movement is shown in Fig. 3.8.

3.5 Modulation of DMPs

Modulation of DMPs is a powerful technique that enhances the flexibility and adaptability of robotic motion. By modulating the parameters of the DMPs, such as the goal position, trajectory shape, or even the speed of movement, robots can dynamically adjust their actions in response to changes in the environment or the task at hand. This capability is crucial for tasks that require adaptation to unforeseen obstacles, variations in object placement, or changes in task requirements.
For instance, modulation can be used to scale a movement trajectory to reach a new goal location, modify the trajectory to avoid an obstacle, or adjust the speed of execution to meet timing constraints. This adaptability makes DMPs particularly useful in real-world applications where conditions can change rapidly and unpredictably, allowing robots to perform complex tasks with a level of autonomy and robustness that would be difficult to achieve with static, pre-programmed movements.

3.5.1 Phase stopping

Normally, Eq. (3.6) is not analytically integrated to (3.7). Instead, at execution time (3.6) and (3.9) are modified to
\[
\tau\dot{x} = -\frac{\alpha_x x}{1 + \alpha_{px}\,|\tilde{y} - y|}, \tag{3.95}
\]
\[
\tau\dot{y} = z + \alpha_{py}(\tilde{y} - y), \tag{3.96}
\]
where y and \(\tilde{y}\) respectively denote the desired and the actual robot position. Note that if the robot cannot follow the desired motion, \(\alpha_{px}|\tilde{y} - y|\) becomes large, which in turn makes the phase change \(\dot{x}\) small. Thus the phase evolution is stopped until the robot catches up with the desired configuration y (see Fig. 3.11). On the other hand, if the robot follows the desired movement precisely, Eq. (3.95) and (3.96) are no different from (3.6) and (3.9), respectively. We can also define this behavior for Cartesian-space DMPs. In this case the original equation for the phase (3.6) is replaced with
\[
\tau\dot{x} = -\frac{\alpha_x x}{1 + \alpha_{px}\left(\|\tilde{\mathbf{p}} - \mathbf{p}\| + \gamma\, d(\tilde{q}, q)\right)}, \tag{3.97}
\]

Figure 3.11: The effects of phase stopping on quaternion based DMPs defined by (3.97) and (3.99). The original, unperturbed trajectory is indicated with the dashed lines, and the perturbed movement, which was stopped between 0.9 and 1.4 seconds, with full lines. Note that the robot can smoothly resume its movement once the perturbation has been removed.
where \(\|\tilde{\mathbf{p}} - \mathbf{p}\| + \gamma\, d(\tilde{q}, q)\) is the trajectory tracking error, \(\tilde{\mathbf{p}}\) and \(\tilde{q}\) are the actual position and orientation of the tool center point, respectively, and \(\mathbf{p}\) and q the corresponding DMP control outputs. Note that in the case of large tracking errors, the error value \(\|\tilde{\mathbf{p}} - \mathbf{p}\| + \gamma\, d(\tilde{q}, q)\) becomes large, which in turn makes the phase change \(\dot{x}\) small. Thus the phase evolution is stopped until the robot reduces the tracking error. In the context of Cartesian space DMPs, Eq. (3.96) becomes different for the positional and the orientational part of the trajectory, which are respectively encoded by Eq. (3.76) and (3.62). We obtain
\[
\tau\dot{\mathbf{p}} = \mathbf{z} + \alpha_{pp}(\tilde{\mathbf{p}} - \mathbf{p}), \tag{3.98}
\]
\[
\tau\dot{q} = \frac{1}{2}\left(\boldsymbol{\eta} + \alpha_{pr}\, 2\log\left(\tilde{q} * \bar{q}\right)\right) * q. \tag{3.99}
\]
Eq. (3.99) is integrated using a formula analogous to (3.73).

3.5.2 Robustness to perturbations through coupling of DMPs

Another advantage of DMPs is that they are robust against perturbations [12], e.g. when a robot is physically pushed away from the desired motion trajectory while moving. The standard approach to perturbations is – in the case of Cartesian space DMPs – to simply continue integrating Eq. (3.6), (3.75) – (3.76), (3.61) – (3.62) after the perturbation has arisen. This way the robot gradually returns to the desired trajectory. However, this approach does not ensure that the complete movement is reproduced, just that the robot returns to the original movement. The phase stopping mechanism can be used if it is necessary to execute the complete movement. For this purpose, we enhance Eq. (3.97), (3.98), (3.99) as follows
\[
\tau\dot{x} = -\frac{\alpha_x x}{1 + \alpha_{px}\,\varepsilon(\tilde{\mathbf{p}}, \mathbf{p}_d, \mathbf{p}, \tilde{q}, q_d, q)}, \tag{3.100}
\]
\[
\tau\dot{\mathbf{p}} = \mathbf{z} + \alpha_{pp}(\tilde{\mathbf{p}} + \mathbf{p}_d - 2\mathbf{p}), \tag{3.101}
\]
\[
\tau\dot{q} = \frac{1}{2}\left(\boldsymbol{\eta} + \alpha_{pr}\, 2\left(\log\left(\tilde{q} * \bar{q}\right) + \log\left(q_d * \bar{q}\right)\right)\right) * q, \tag{3.102}
\]
where \(\varepsilon = \|\tilde{\mathbf{p}} - \mathbf{p}\| + \|\mathbf{p}_d - \mathbf{p}\| + \gamma\left(d(\tilde{q}, q) + d(q_d, q)\right)\), \(\mathbf{p}_d\) and \(q_d\) are the desired position and orientation, respectively, and all other variables and constants are as in (3.97), (3.98), (3.99).
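Phase stopping can be sketched in a small simulation based on the joint-space equations (3.95) – (3.96); the perturbation model (freezing the measured position to emulate a blocked robot) and all gains are illustrative assumptions:

```python
import numpy as np

# One degree of freedom, no forcing term: the measured position y_tilde
# is artificially frozen between 0.3 s and 0.8 s; the growing tracking
# error then keeps the phase nearly constant, Eq. (3.95).
tau, dt = 1.0, 0.001
alpha_z, beta_z, alpha_x = 8.0, 2.0, 2.0
alpha_px, alpha_py = 1000.0, 10.0
g, x, y, z, y_tilde = 1.0, 1.0, 0.0, 0.0, 0.0
phase_log = []
for j in range(int(2.0 / dt)):
    t = j * dt
    err = abs(y_tilde - y)
    x += -alpha_x * x / (1.0 + alpha_px * err) * dt / tau   # Eq. (3.95)
    z += alpha_z * (beta_z * (g - y) - z) * dt / tau
    y += (z + alpha_py * (y_tilde - y)) * dt / tau          # Eq. (3.96)
    if not (0.3 <= t < 0.8):
        y_tilde = y            # perfect tracking outside the blockage
    phase_log.append(x)
print(phase_log[299], phase_log[799])   # phase barely advances while blocked
print(y)                                # the movement still completes
```

While the robot is blocked, the phase decays by only a few percent instead of the nominal exponential rate, so the remaining part of the movement is executed once the blockage is removed.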
The unit quaternion distance metric $d$ defined in (1.43) can be used here. The complete approach to control the robot is specified by the following coupled DMP system:

• A DMP system constructed from (3.61), (3.75), (3.101), (3.102), and (3.6). Its outputs are used to generate motor commands.

• A separate DMP that uses the standard DMP equations (3.61) – (3.62) and (3.75) – (3.76) with the same constants and variables as the above system. This DMP generates the desired position and orientation for (3.101) – (3.102).

• The coupling of both DMP systems occurs through the phase equation (3.100), which is used by both.

The described coupled system halts the phase evolution whenever the robot controller is unable to track the commanded position and orientation or whenever the commanded position and orientation deviate from the desired values. The second term in (3.101), (3.102) ensures that the system can recover once the perturbation is removed. After recovery, the robot continues the movement at the spot on the trajectory where it was before the onset of the perturbation. Note that the original DMP system is equivalent to the one proposed in this section if the robot tracks the desired trajectory perfectly and there are no unexpected perturbations.

Figure 3.12: Response of the DMP system (3.61), (3.62) to a perturbation. The orientation trajectory was perturbed by a rotation of 28.3 degrees and the angular velocity was simultaneously set to 0. The panels show the quaternion components, the angular velocity (rad/s), and the phase as functions of time; the graphs show the convergence of the perturbed movement to the final goal orientation.

3.5.3 Obstacle avoidance

To realize obstacle avoidance in Cartesian space, the trajectory should be planned in this space as well. Let's assume that we have a three degree-of-freedom discrete DMP that models point-to-point reaching in Cartesian space. We denote the 3-D position vector of the discrete dynamical system by $\mathbf{p} = [p_x, p_y, p_z]^\top$. The objective is to generate a reaching movement to the goal state $\mathbf{g} = [g_x, g_y, g_z]^\top$ without hitting obstacles. On the way to the goal state, an obstacle is positioned at $\mathbf{o} = [o_x, o_y, o_z]^\top$; the goal is to avoid this obstacle. A suitable coupling term $\mathbf{C}_t = [C_{t,x}, C_{t,y}, C_{t,z}]^\top$ for obstacle avoidance can be formulated as follows:

$$\mathbf{C}_t(\mathbf{y}, \dot{\mathbf{y}}) = \gamma\, \mathrm{sig}(\|\mathbf{o} - \mathbf{y}\|)\, \mathbf{R}\dot{\mathbf{y}}\, (\pi - \varphi) \exp(-\beta\varphi), \quad (3.103)$$

where

$$\varphi = \arccos\left(\frac{(\mathbf{o} - \mathbf{y})^\top \dot{\mathbf{y}}}{\|\mathbf{o} - \mathbf{y}\|\,\|\dot{\mathbf{y}}\|}\right), \quad (3.104)$$

$$\mathrm{sig}(x) = \frac{1}{1 + e^{\eta(x - d)}}, \quad (3.105)$$

$$\mathbf{R} = \exp\left(\left(\frac{\pi}{2} - \varphi\right)\mathbf{n}\right), \quad (3.106)$$

$$\mathbf{n} = \frac{(\mathbf{o} - \mathbf{y}) \times \dot{\mathbf{y}}}{\|\mathbf{o} - \mathbf{y}\|\,\|\dot{\mathbf{y}}\|}. \quad (3.107)$$

Figure 3.13: Obstacle avoidance: In both graphs, the obstacle is represented by a yellow sphere located on the trajectory that starts at a red circle. The left graph shows two scenarios: one where the DMP trajectory passes through the point obstacle when there is no coupling term for obstacle avoidance, and another where a 2-D DMP successfully avoids the obstacle using a coupling term. The right graph illustrates DMP paths in 3-D. Without a coupling term for obstacle avoidance, the DMP trajectory passes through the point obstacle. However, when the same DMP starts from multiple random positions with the coupling term activated, all resulting paths successfully avoid the obstacle, demonstrating the effectiveness of the obstacle avoidance mechanism.

$\gamma$, $\beta$, and $\eta$ are scaling factors and $d$ is the distance at which the obstacle should start affecting the robot motion. The above coupling term generates a velocity component that lies in the plane defined by the vectors $\mathbf{o} - \mathbf{y}$ and $\dot{\mathbf{y}}$ and is orthogonal to the line $\mathbf{o} - \mathbf{y}$ connecting the robot tip and the obstacle. We can ensure that the robot tip avoids the obstacle by adding the coupling term $\mathbf{C}_t$ to Eq.
(3.9):

$$\tau \dot{\mathbf{z}} = \alpha_z(\beta_z(\mathbf{g} - \mathbf{y}) - \mathbf{z}) + \mathbf{f}(x) + \mathbf{C}_t(\mathbf{y}, \mathbf{z}/\tau). \quad (3.108)$$

The resulting behavior is shown in Fig. 3.13. Note that in this way we can only ensure that the robot tip avoids the obstacle; the rest of the robot could still collide with it.

3.6 Learning of DMPs from Multiple Demonstrations

Since DMPs have not been designed to represent whole classes of movements, a standard DMP typically reproduces a specific movement, which can be modulated as described in Section 3.5. However, it is not possible to encode multiple trajectories within one DMP. It turns out, however, that by computing local task models, it is possible to generate a specific DMP that is adapted to the current configuration of the external world [35]. This approach has the advantage that the generalized DMPs are computed by local regression methods, which apply local, linear optimization, as opposed to some other approaches, which apply global, nonlinear optimization methods to compute more complex motion representations [9].

3.6.1 Action generalization using dynamic movement primitives and local weighting

Next we describe a methodology for generalizing multiple example trajectories to new, previously unobserved situations. This approach differs from the scenario in Section 3.2, where only a single example trajectory is available. In this context, we assume that each example trajectory is associated with parameters that describe the key characteristics of the task, typically its goal. These parameters act as query points into an example database. To illustrate this, consider the problem of throwing a ball towards a specific target. Suppose we have a set of example robot trajectories that resulted in successful ball throws toward various targets. The goal of generalization is to generate a trajectory that will accurately direct the ball toward any given target in space.
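Before turning to generalization, the obstacle-avoidance coupling term (3.103) – (3.107) can also be illustrated in code. The following is a minimal NumPy sketch, not the exact implementation used in the experiments: the gains `gamma`, `beta`, `eta` and the influence distance `d` are hypothetical values, and the rotation axis is normalized before applying Rodrigues' formula:

```python
import numpy as np

def coupling_term(y, y_dot, o, gamma=300.0, beta=2.0, eta=5.0, d=0.5):
    """Obstacle-avoidance coupling term, Eqs. (3.103)-(3.107)."""
    r = o - y                                    # vector from robot tip to obstacle
    norm_r, norm_v = np.linalg.norm(r), np.linalg.norm(y_dot)
    if norm_r < 1e-10 or norm_v < 1e-10:
        return np.zeros(3)
    # (3.104): angle between the line to the obstacle and the current velocity
    phi = np.arccos(np.clip(r @ y_dot / (norm_r * norm_v), -1.0, 1.0))
    # (3.105): sigmoid that activates the term within distance d of the obstacle
    sig = 1.0 / (1.0 + np.exp(eta * (norm_r - d)))
    # (3.107): rotation axis perpendicular to the plane spanned by r and y_dot
    n = np.cross(r, y_dot) / (norm_r * norm_v)
    if np.linalg.norm(n) < 1e-10:                # velocity aimed exactly at obstacle
        n = np.array([0.0, 0.0, 1.0])            # arbitrary fallback axis (assumption)
    axis = n / np.linalg.norm(n)
    # (3.106): rotation by (pi/2 - phi) about the axis, via Rodrigues' formula
    theta = np.pi / 2 - phi
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    R = np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)
    # (3.103): rotated velocity, scaled down with growing angle and distance
    return gamma * sig * (R @ y_dot) * (np.pi - phi) * np.exp(-beta * phi)

# Obstacle slightly off the direct path: the term pushes the tip away from it.
C = coupling_term(np.zeros(3), np.array([1.0, 0.0, 0.0]), np.array([0.3, 0.05, 0.0]))
assert C[1] < 0.0  # obstacle lies at +y, so the avoidance term points to -y
```

Adding this term to the transformation system as in (3.108) bends the integrated path around the obstacle, as illustrated in Fig. 3.13.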
In this case, the parameters defining the goal of the task are the target positions.

We begin with a set of example trajectories, each accompanied by parameters that characterize the task:

$$\mathcal{Z} = \left\{\, \mathbf{y}_d^i(t_{i,j}),\ \dot{\mathbf{y}}_d^i(t_{i,j}),\ \ddot{\mathbf{y}}_d^i(t_{i,j});\ \mathbf{q}_i \;\middle|\; i = 1, \dots, M,\ j = 0, \dots, T_i \,\right\}, \quad (3.109)$$

where $\mathbf{y}_d^i(t_{i,j})$, $\dot{\mathbf{y}}_d^i(t_{i,j})$, $\ddot{\mathbf{y}}_d^i(t_{i,j})$ represent the measured positions, velocities, and accelerations along the $i$-th trajectory, $M$ is the number of example trajectories, and $T_i + 1$ is the number of sampling points on each trajectory. The parameters $\mathbf{q}_i \in \mathbb{R}^m$ describe the task in each example scenario and are typically related to the goal of the action. They serve as query points into a database of example trajectories. This type of data can be obtained by any of the data collection methodologies described in Section 1.4.

To explain the concept of query points, let's consider tasks such as reaching, ball throwing, and drumming. In the case of reaching movements, the goal of an action (or query point) is described by the final reaching destination in Cartesian space, whereas the trajectories and consequently the attractor points of the DMPs can be defined in joint space. Thus the query and attractor points are connected through the robot kinematics. In other cases, the queries are less directly associated with the parameters of the DMP. For example, in the case of ball throwing, the goal is characterized by the position of the basket into which the ball should be thrown. This position is not directly encoded in the parameters of the discrete DMP as given by Eqs. (3.10) – (3.11). As an example of periodic movements, we consider drumming, where the height at which the drum is mounted can vary. In this case the robot needs to adapt its drumming motion to the varying height of the drum, which serves as a query point that is not directly encoded in the periodic DMP Eqs. (3.15) – (3.16).
In summary, the query points normally originate from an intuitive characterization of the task, but the functional relationship between them and the DMP parameters may not be straightforward. Thus the challenge is to generate new DMPs that specify optimal movements for any new query point $\mathbf{q}$, which typically does not coincide with any of the example queries $\mathbf{q}_k$. As explained in Section 3.1, DMPs are generally defined by the parameters $\mathbf{w}$, $\tau$ (or $r$ and $\Omega = 1/\tau$ in the case of periodic movements), and $\mathbf{g}$. Therefore, it is necessary to learn a function that maps query points to these parameters

$$G(\mathcal{Z}): \mathbf{q} \longmapsto [\mathbf{w}_1^\top, \dots, \mathbf{w}_n^\top, \mathbf{g}^\top, \tau]^\top \quad (3.110)$$

or

$$G(\mathcal{Z}): \mathbf{q} \longmapsto [\mathbf{w}_1^\top, \dots, \mathbf{w}_n^\top, \mathbf{g}^\top, r, \Omega]^\top. \quad (3.111)$$

The examples $\mathcal{Z}$ specified in (3.109) are the data that can be used to learn this function. Note that $G(\mathcal{Z})$ becomes a function only by constraining the solution trajectory to be as similar as possible to the example trajectories. For example, there are many different ways to throw a ball into a basket at a certain location. The relationship between the basket positions (query points) and DMP parameters as given in Eq. (3.110) becomes a function only by requiring that the generalized throwing movements are similar to the example throws. This similarity criterion should in practice be embedded into the optimization process.

In general, the functional relationship between $\mathbf{q}$ and the DMP parameters is unknown. It is often not feasible to identify a global model that provides a good approximation of the function $G(\mathcal{Z})$. To avoid global model identification, weighted averaging and locally weighted regression (LWR) can be applied. These methods are motivated by the observation that some data points are more relevant than others. Control policies that solve the task in situations that are very different from the current one do not carry much information about the optimal policy in the current situation.
LWR is a regression method that fits local models to nearby data [1] and has a lower computational complexity than many other nonparametric regression methods such as Gaussian process regression (GPR) [23]. With LWR there is no global optimization step; instead, training data is stored and utilized when local models need to be computed. Such approaches are also known as lazy learning because the processing of training data is deferred until a query needs to be answered.

To fully specify a DMP that defines the motion at a new query point, we need to estimate the shape parameters $\mathbf{w}$, goal $g$, and time constant $\tau$ in the case of discrete movements, and the shape parameters $\mathbf{w}$, center point $g$, amplitude $r$, and frequency of motion $\Omega$ in the case of periodic movements. In Section 3.2 we have already explained the generation of DMPs using only one training trajectory. Here we consider the estimation of these parameters from multiple demonstrations. The rest of the DMP parameters ($\alpha_z$, $\beta_z$, and $\alpha_x$) remain fixed and are determined so that the convergence of the underlying dynamic system is guaranteed.

Generalization of goal, timing, amplitude, and frequency

The goal parameter $g$, time constant $\tau$, amplitude $r$, and frequency $\Omega$ are either directly measured or estimated from the data. They have a clear physical interpretation. From the data (3.109), we thus obtain a sequence of measurements for discrete and periodic movements, respectively

$$\{g_i, \tau_i\}_{i=1}^M \quad \text{and} \quad \{g_i, r_i, \Omega_i\}_{i=1}^M. \quad (3.112)$$

For every new query point $\mathbf{q}$, the new parameters $g$, $\tau$, $r$, and $\Omega$ can be estimated using a distance-weighted criterion that puts higher weight on example queries $\mathbf{q}_i$ that are close to the current query $\mathbf{q}$. The distance-weighted error criterion is in this case given by

$$C(\mathbf{q}) = \sum_{i=1}^M (z - z_i)^2 K(d(\mathbf{q}, \mathbf{q}_i)), \quad (3.113)$$

where $z$ is one of $g_k$, $k = 1, \dots, n$, $\tau$, $r$, and $\Omega$.
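For each of these parameters, minimizing (3.113) over $z$ yields a kernel-weighted average of the example values. A minimal sketch, with one-dimensional query points, a tricube kernel, and hypothetical numbers:

```python
import numpy as np

def tricube(u):
    """Tricube kernel (finite support on |u| < 1)."""
    return (1.0 - np.abs(u) ** 3) ** 3 * (np.abs(u) < 1.0)

def weighted_estimate(q, queries, values, a=1.0):
    """Kernel-weighted average minimizing (3.113) for one scalar parameter."""
    w = tricube(np.abs(q - queries) / a)
    return np.sum(w * values) / np.sum(w)

# Generalize the time constant tau to a new query point q = 0.5.
queries = np.array([0.0, 1.0, 2.0])  # example task parameters (queries)
taus = np.array([1.0, 1.5, 2.0])     # demonstrated movement durations
tau_new = weighted_estimate(0.5, queries, taus, a=1.5)
assert 1.0 < tau_new < 1.5  # interpolates between the two nearest examples
```

The third example lies at kernel distance 1 and therefore receives zero weight, illustrating the finite support of the tricube kernel.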
The solutions are given by

$$g = \frac{1}{\sum_{i=1}^M K(d(\mathbf{q}, \mathbf{q}_i))} \sum_{i=1}^M K(d(\mathbf{q}, \mathbf{q}_i))\, g_i, \quad (3.114)$$

$$\mathbf{y}_0 = \frac{1}{\sum_{i=1}^M K(d(\mathbf{q}, \mathbf{q}_i))} \sum_{i=1}^M K(d(\mathbf{q}, \mathbf{q}_i))\, \mathbf{y}_d^i(t_{i,0}), \quad (3.115)$$

$$\tau = \frac{1}{\sum_{i=1}^M K(d(\mathbf{q}, \mathbf{q}_i))} \sum_{i=1}^M K(d(\mathbf{q}, \mathbf{q}_i))\, \tau_i, \quad (3.116)$$

$$r = \frac{1}{\sum_{i=1}^M K(d(\mathbf{q}, \mathbf{q}_i))} \sum_{i=1}^M K(d(\mathbf{q}, \mathbf{q}_i))\, r_i, \quad (3.117)$$

$$\Omega = \frac{1}{\sum_{i=1}^M K(d(\mathbf{q}, \mathbf{q}_i))} \sum_{i=1}^M K(d(\mathbf{q}, \mathbf{q}_i))\, \Omega_i. \quad (3.118)$$

There are many possibilities to select the weighting kernel $K$ and the distance $d$ [1]. One possible choice is the tricube kernel

$$K(d) = \begin{cases} (1 - |d|^3)^3, & \text{if } |d| < 1, \\ 0, & \text{otherwise}, \end{cases} \quad (3.119)$$

and the weighted Euclidean distance

$$d(\mathbf{q}, \mathbf{q}_i) = \|\mathbf{D}(\mathbf{q} - \mathbf{q}_i)\|, \quad \mathbf{D} = \mathrm{diag}(1/a_l) \in \mathbb{R}^{m \times m},\ a_l > 0. \quad (3.120)$$

The tricube kernel $K$ has finite support and continuous first and second derivatives, which means that the first two derivatives of the prediction (as a function of query points) are also continuous. The finite support of $K$ is useful because it can help reduce the computational complexity of different optimization problems. As discussed in [1], the choice of weighting function is rarely critical for the performance of locally weighted regression.

Generalization of weights by locally weighted regression

The synthesis of the shape parameters $\mathbf{w}$ is more complex because these parameters are not directly measurable and do not have any physical interpretation. This makes their computation using a simple weighting approach problematic. It is therefore better to estimate them using locally weighted regression. Based on Eq. (3.32), we can formulate a locally weighted regression problem to compute the optimal parameters $\mathbf{w}$ for the given query point $\mathbf{q}$. We first compute the target values for all example trajectories

$$F^i_{k,j} = \tau_i^2\, \ddot{y}^i_{d,k}(t_{i,j}) + \alpha_z \tau_i\, \dot{y}^i_{d,k}(t_{i,j}) - \alpha_z \beta_z\left(g_k - y^i_{d,k}(t_{i,j})\right), \quad (3.121)$$

where $i = 1, \dots, M$, $j = 0, \dots, T_i$ and $k = 1, \dots, n$. Writing

$$\mathbf{f}^i_k = [F^i_{k,0}, F^i_{k,1}, \dots, F^i_{k,T_i}]^\top, \quad (3.122)$$

we can formulate the following optimization problems (for all degrees of freedom)

$$\min_{\mathbf{w}_k \in \mathbb{R}^N} \sum_{i=1}^M \left\| \mathbf{X}^i \mathbf{w}_k - \frac{1}{g_k - y^i_{k,0}}\, \mathbf{f}^i_k \right\|^2 K(d(\mathbf{q}, \mathbf{q}_i)), \quad (3.123)$$

where $1 \leq k \leq n$ indexes the degrees of freedom, $\mathbf{w}_k \in \mathbb{R}^N$ are the DMP weights associated with the $k$-th degree of freedom, and $\mathbf{X}^i$ is the system matrix of basis-function values associated with the $i$-th example trajectory (cf. Eq. (3.32)). Note that the above criterion includes the goal parameter $g_k$, the initial configuration $y_{k,0}$ on the trajectory, and the timing constant $\tau$. The generalized $g$, $\mathbf{y}_0$ and $\tau$, as respectively computed in (3.114), (3.115), and (3.116), should be used here. The kernel $K$ and the distance $d$ in the space of query points determine how much influence each of the example movements has on the final estimate of the control policy. The influence of each example movement should diminish with the distance of the query point $\mathbf{q}$ from the data point $\mathbf{q}_k$. Ideally, $\mathbf{D}$ should be determined so that the error in the execution of the task is as small as possible. To reduce the resulting problem to a one-dimensional optimization problem while still accounting for the possibly varying spacing across dimensions of the query point space, we can define

$$a_i = c \max_{j=1,\dots,M}\ \min_{k=1,\dots,M,\ k \neq j} |q_{j,i} - q_{k,i}|. \quad (3.124)$$

In this way, $\mathbf{D}$ is fully specified by a single parameter $c \in \mathbb{R}$. For regularly spaced data, $c \approx 2.2$ often gives good results. With such $c$, the number of example trajectories with nonzero $K(d(\mathbf{q}, \mathbf{q}_k))$ is about $4^m$, where $m$ is the dimensionality of the query point. Automatic methods for determining $c$ are also possible. Atkeson et al. [1] suggested minimizing the validation set error, which evaluates the difference between the predicted output and the observed value on a validation set, where the data in the validation set are not used for training. Problem (3.123) is a linear least squares problem that is further simplified because the example trajectories for which $K$ vanishes do not influence generalization.
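The weighted least-squares problem (3.123) can be solved by stacking the example systems scaled by the square roots of their kernel weights. A minimal sketch with hypothetical basis-function matrices $\mathbf{X}^i$ and noiseless targets:

```python
import numpy as np

def lwr_weights(X_list, f_list, kernel_weights):
    """Solve min_w sum_i K_i * ||X_i w - f_i||^2 as one least-squares system."""
    rows, targets = [], []
    for X, f, k in zip(X_list, f_list, kernel_weights):
        if k > 0.0:  # examples with vanishing kernel weight are simply skipped
            rows.append(np.sqrt(k) * X)
            targets.append(np.sqrt(k) * f)
    A = np.vstack(rows)
    b = np.concatenate(targets)
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w

# Two example trajectories, N = 3 basis functions, targets generated from w_true.
rng = np.random.default_rng(0)
X1, X2 = rng.random((50, 3)), rng.random((50, 3))
w_true = np.array([0.5, -1.0, 2.0])
w = lwr_weights([X1, X2], [X1 @ w_true, X2 @ w_true], [0.9, 0.4])
assert np.allclose(w, w_true)
```

Skipping the examples whose kernel weight is zero is exactly what makes the finite support of the tricube kernel computationally attractive.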
Thus the size of the system matrix associated with the objective function (3.123) is reduced. The computational complexity of solving the least squares system (3.123) is $O(N^2 T)$, $T = \sum_{k=1}^M T_k$, and thus increases linearly with the number of data points considered by LWR and quadratically with the number of radial basis functions used in (3.5) and (3.14), respectively. Due to our choice of the weighting kernel $K$, we normally have $K(d(\mathbf{q}, \mathbf{q}_k)) = 0$ for many $k$. Moreover, by cutting the support of the basis functions (3.5) and (3.14) once their value falls below a certain threshold, the matrices $\mathbf{X}^i$ become sparse as well. The quadratic dependence on the number of basis functions is not a problem because this number is generally much lower than the number of data points. In most practical examples, one can expect around 10,000 – 50,000 data points and 25 – 50 basis functions for DMPs. These facts make the computational complexity sufficiently low to solve the least-squares problem (3.123) on the fly using standard methods from sparse matrix algebra.

The calculation of weights for periodic movements is very similar. Instead of Eq. (3.32), the derivation of the locally weighted regression problem should be based on Eq. (3.43). To generalize the learned movements to new situations, we store the estimated amplitudes and frequencies and the sampled movements from the last few movement periods (e.g. five) that produced a good movement on the robot, as judged by the human teacher. We can now compute the target values for all trajectories

$$F^i_{k,j} = \frac{1}{\Omega_i^2}\, \ddot{y}^i_{d,k}(t_{i,j}) + \frac{\alpha_z}{\Omega_i}\, \dot{y}^i_{d,k}(t_{i,j}) - \alpha_z \beta_z\left(g_k - y^i_{d,k}(t_{i,j})\right), \quad (3.125)$$

where $i = 1, \dots, M$, $j = 0, \dots, T_i$ and $k = 1, \dots, n$. Writing

$$\mathbf{f}^i_k = [F^i_{k,0}, F^i_{k,1}, \dots, F^i_{k,T_i}]^\top, \quad (3.126)$$

we can formulate the optimization problems (for all degrees of freedom)

$$\min_{\mathbf{w}_k \in \mathbb{R}^N} \sum_{i=1}^M \left\| \mathbf{X}^i \mathbf{w}_k - \mathbf{f}^i_k \right\|^2 K(d(\mathbf{q}, \mathbf{q}_i)), \quad (3.127)$$

where $1 \leq k \leq n$ indexes the degrees of freedom and $\mathbf{w}_k \in \mathbb{R}^N$ are the DMP weights associated with the $k$-th degree of freedom. Similarly to the discrete case, the above criterion includes the goal parameter $g_k$, the amplitude $r$ and the frequency $\Omega$. The generalized $g$, $r$ and $\Omega$, as computed by (3.114), (3.117), and (3.118) respectively, should be used.

procedure CollectTrainingData
    acquire trajectory points $\{\mathbf{y}_d^k(t_{k,j}), \dot{\mathbf{y}}_d^k(t_{k,j}), \ddot{\mathbf{y}}_d^k(t_{k,j}) \mid j = 0, \dots, T_k\}$, $k = 1, \dots, M$, e.g. by kinesthetic guiding or direct imitation;
    extract the attractor points $\{g_k\}$, starting points $\{\mathbf{y}_{0,k}\}$ and time constants $\{\tau_k\}$, or amplitudes $\{r_k\}$ and frequencies $\{\Omega_k\}$, $k = 1, \dots, M$;
    associate the acquired trajectories with query points $\{\mathbf{q}_k\}$;

procedure GeneralizeTrajectory
    for a given query point $\mathbf{q}^*$, estimate the attractor point $g^*$, starting point $\mathbf{y}_0^*$, and time constant $\tau^*$, or amplitude $r^*$ and frequency $\Omega^*$, using (3.114), (3.115) and (3.116), or (3.117) and (3.118), respectively;
    minimize (3.123) or (3.127) to estimate the parameters $\mathbf{w}^* \in \mathbb{R}^N$;
    output the generalized DMP $\{g^*, \mathbf{y}_0^*, \tau^*, \{\mathbf{w}_k^*\}\}$ or $\{g^*, r^*, \Omega^*, \{\mathbf{w}_k^*\}\}$ for query $\mathbf{q}^*$;

Figure 3.14: Training and generalization of goal-directed actions.

3.7 Summary

Dynamic Movement Primitives (DMPs) represent a powerful and flexible framework for the specification and execution of robot control policies. This chapter introduced the theoretical foundations of DMPs and demonstrated their utility in representing and learning robot motions through a comprehensive approach grounded in dynamic systems theory. By relying on autonomous, nonlinear differential equations, DMPs ensure smooth kinematic trajectories and allow for robust performance across a wide range of tasks and conditions.
DMPs offer a unified framework to model both discrete (point-to-point) and periodic (repetitive) movements, making them suitable for different applications in robotics. They incorporate adjustable parameters that enable the encoding of diverse trajectories while maintaining the stability of the underlying system. This ensures that robots can adapt their movements dynamically without compromising the smoothness or reliability of motion execution.

A distinguishing feature of DMPs is their modular design, which separates the temporal progression of movements (handled by a canonical system) from the spatial shaping of trajectories (handled by transformation systems). This separation allows for time-independent motion representation and facilitates adaptation to varying task durations. Additionally, DMPs integrate nonlinear forcing terms that can be learned from demonstrations using techniques such as locally weighted regression. This capability enables robots to generalize learned trajectories to new scenarios and goals, even in unstructured environments.

The chapter also explored advanced capabilities of DMPs, such as modulation and adaptation. Modulation techniques allow for dynamic adjustments of motion parameters like goal position, trajectory amplitude, and movement frequency. Furthermore, DMPs can incorporate coupling terms to account for sensory feedback, enabling real-time responses to environmental changes. For example, mechanisms like phase stopping and trajectory coupling enhance the robustness of DMPs against perturbations and ensure smooth recovery from disturbances.

Another significant advantage of DMPs is their compatibility with learning from demonstration (LfD) methodologies. DMPs can be initialized from a single demonstration or synthesized from multiple examples to generalize across task variations.
This makes them particularly suitable for tasks that require both precision and adaptability, such as object manipulation, assembly, and human-robot collaboration.

The chapter also introduced extensions of DMPs to Cartesian space, where both position and orientation trajectories are represented seamlessly. Quaternion-based formulations provide a singularity-free representation of rotational trajectories, ensuring smooth and robust orientation control. Similarly, periodic DMPs enable the encoding of rhythmic motions, with applications ranging from locomotion to repetitive industrial tasks.

Evaluations of DMPs demonstrated their effectiveness in learning and reproducing complex motion patterns while ensuring stability and robustness. The ability to adapt trajectories in real time based on sensory feedback further underscores their suitability for dynamic and uncertain environments. The integration of DMPs with online learning methods, such as reinforcement learning and statistical modeling, holds promise for extending their capabilities in real-world applications.

In conclusion, DMPs provide a comprehensive and practical framework for robotic motion generation and control. Their blend of stability, flexibility, and adaptability makes them a valuable tool for addressing the challenges of modern robotics.

Chapter 4

Dynamic Systems and Gaussian Mixture Regression

4.1 Robot Motion Representation and Dynamical Systems

DMPs are not the only type of dynamical system that can be used for robot control. An alternative approach, also based on dynamical systems, has been developed in [9] and [15]. This approach models robot motion as a dynamical system whose evolution encodes the desired robot motion in each robot state. Let's denote the robot's state by $\mathbf{y} \in \mathbb{R}^n$, where $\mathbf{y}$ typically represents the robot's position (and sometimes velocity) and $n$ is the number of degrees of freedom.
With this approach, the motion is modeled as a first-order autonomous dynamical system:

$$\dot{\mathbf{y}} = \mathbf{f}(\mathbf{y}), \quad (4.1)$$

where $\mathbf{f}: \mathbb{R}^n \to \mathbb{R}^n$ is a continuous and differentiable vector field that defines the rate of change of the state. To represent the desired motions, the function $\mathbf{f}(\mathbf{y})$ should be designed such that the system converges to a desired attractor point. For a point-to-point motion, the attractor is typically a fixed point $\mathbf{y}^*$, and the goal is to ensure that:

$$\lim_{t \to \infty} \mathbf{y}(t) = \mathbf{y}^* \quad \text{for all } \mathbf{y}(0) \in \mathbb{R}^n. \quad (4.2)$$

The stability of dynamical systems is a critical property in ensuring the desired behavior of robotic motion. For a fixed point $\mathbf{y}^*$ to act as an attractor, the vector field $\mathbf{f}(\mathbf{y})$ must be designed such that the equilibrium point $\mathbf{y}^*$ is globally asymptotically stable. This means that for any initial state $\mathbf{y}(0)$, the state trajectories $\mathbf{y}(t)$ converge to $\mathbf{y}^*$ over time, and small perturbations do not lead to diverging trajectories. The SEDS framework ensures this property by embedding stability guarantees into its design. It leverages Lyapunov theory to ensure that the system's energy-like function decreases monotonically as the state evolves, enabling robust recovery from perturbations and changes in the environment.

While DMPs are a widely used and powerful approach for motion representation in robotics, the SEDS approach provides a complementary framework with unique strengths. DMPs excel at replicating specific demonstrated trajectories using second-order dynamics with constant coefficients, ensuring inherent stability as the phase variable approaches zero. This stability, combined with their computational efficiency and the ability to adapt to different start and goal configurations, makes DMPs particularly effective for tasks requiring fast, reliable, and precise trajectory reproduction.
Moreover, DMPs often require only a single demonstration to encode a trajectory, making them well-suited for applications where simplicity and speed are paramount.

The SEDS approach, however, offers significant advantages in scenarios requiring adaptability and generalization across diverse conditions. Unlike DMPs, which are focused on a single trajectory, SEDS represents motion as a continuous vector field over the state space. This enables SEDS to capture an entire class of trajectories, providing robust responses to perturbations, varying initial conditions, and dynamic environmental changes. By directly encoding attractor dynamics within the representation, SEDS eliminates the need for replanning when deviations occur, a feature shared with DMPs.

In the rest of this chapter, we explain how to design functions $\mathbf{f}$ with stable attractor dynamics and how to learn such functions from human demonstrations using the SEDS framework.

4.2 Lyapunov Stability Theory and Conditions

To design functions $\mathbf{f}$ with stable attractor dynamics, we first introduce Lyapunov stability theory, which provides a framework to analyze the stability of dynamical systems. A system described by $\dot{\mathbf{y}} = \mathbf{f}(\mathbf{y})$ is said to be stable if small perturbations of the initial state $\mathbf{y}(0)$ result in bounded deviations of $\mathbf{y}(t)$ for all future times. The stability conditions are assessed using a Lyapunov function $V(\mathbf{y}): \mathbb{R}^n \to \mathbb{R}$, which acts as an energy-like measure of the system's state. The key concepts are as follows:

1. Stability: The attractor point $\mathbf{y}^*$ is stable if, for every $\epsilon > 0$, there exists a $\delta > 0$ such that $\|\mathbf{y}(0) - \mathbf{y}^*\| < \delta$ implies $\|\mathbf{y}(t) - \mathbf{y}^*\| < \epsilon$ for all $t \geq 0$.

2. Asymptotic Stability: The attractor point $\mathbf{y}^*$ is asymptotically stable if it is stable and, in addition, $\lim_{t \to \infty} \mathbf{y}(t) = \mathbf{y}^*$ for all $\mathbf{y}(0)$ in a neighborhood of $\mathbf{y}^*$.

3. Global Asymptotic Stability: The equilibrium point $\mathbf{y}^*$ is globally asymptotically stable if $\lim_{t \to \infty} \mathbf{y}(t) = \mathbf{y}^*$ for all $\mathbf{y}(0) \in \mathbb{R}^n$.
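As a concrete illustration of global asymptotic stability, the linear system $\dot{\mathbf{y}} = -(\mathbf{y} - \mathbf{y}^*)$ converges to $\mathbf{y}^*$ from any initial state. A short Euler-integration sketch (step size and iteration count are arbitrary choices):

```python
import numpy as np

def simulate(y0, y_star, dt=0.01, steps=2000):
    """Euler integration of the globally asymptotically stable system
    dy/dt = -(y - y_star)."""
    y = np.asarray(y0, dtype=float)
    for _ in range(steps):
        y = y + dt * (-(y - y_star))
    return y

y_star = np.array([1.0, -2.0])
# Trajectories from very different initial states all reach the attractor.
for y0 in ([10.0, 10.0], [-5.0, 3.0], [0.0, 0.0]):
    assert np.linalg.norm(simulate(y0, y_star) - y_star) < 1e-6
```

Along each trajectory, the quadratic function $V(\mathbf{y}) = \frac{1}{2}\|\mathbf{y} - \mathbf{y}^*\|^2$ decreases monotonically, which is exactly the behavior formalized by the Lyapunov conditions.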
The Lyapunov function $V(\mathbf{y})$ must satisfy the following conditions to prove stability:

• $V(\mathbf{y}) > 0$ for all $\mathbf{y} \neq \mathbf{y}^*$ and $V(\mathbf{y}^*) = 0$ (positive definiteness),

• $\dot{V}(\mathbf{y}) = \nabla V(\mathbf{y})^\top \mathbf{f}(\mathbf{y}) \leq 0$ (negative semi-definiteness),

• for asymptotic stability, $\dot{V}(\mathbf{y}) < 0$ for all $\mathbf{y} \neq \mathbf{y}^*$,

• for global asymptotic stability, $V(\mathbf{y}) \to \infty$ as $\|\mathbf{y}\| \to \infty$ (radial unboundedness).

An example of a candidate Lyapunov function is the quadratic function:

$$V(\mathbf{y}) = \frac{1}{2}\|\mathbf{y} - \mathbf{y}^*\|^2, \quad (4.3)$$

which ensures that the system's trajectories converge to $\mathbf{y}^*$ if $\dot{V}(\mathbf{y}) < 0$.

By analyzing the above conditions, we can prove the stability of an attractor without computing an explicit solution to the system's differential equations. This is in contrast to indirect methods, which rely on analyzing the solutions of the system. The direct method evaluates stability by constructing a Lyapunov function and examining its behavior along the system's trajectories. This makes it particularly powerful for analyzing nonlinear systems where explicit solutions are often infeasible. Stability of the attractor is guaranteed if a Lyapunov function $V(\mathbf{y})$ exists such that:

1. $V(\mathbf{y}) > 0$ for all $\mathbf{y} \neq \mathbf{y}^*$ and $V(\mathbf{y}^*) = 0$,

2. $\dot{V}(\mathbf{y}) = \nabla V(\mathbf{y})^\top \mathbf{f}(\mathbf{y}) < 0$ for all $\mathbf{y} \neq \mathbf{y}^*$.

For the quadratic function (4.3), the time derivative along the trajectories of the system is:

$$\dot{V}(\mathbf{y}) = (\mathbf{y} - \mathbf{y}^*)^\top \mathbf{f}(\mathbf{y}). \quad (4.4)$$

To ensure stability, $\mathbf{f}(\mathbf{y})$ must be designed such that $\dot{V}(\mathbf{y}) < 0$ for $\mathbf{y} \neq \mathbf{y}^*$.

4.3 Design of Dynamic Systems by PBD

The desired motion is often learned from user demonstrations. Given a dataset of trajectories $\mathcal{D} = \{(\mathbf{y}_i, \dot{\mathbf{y}}_i)\}_{i=1}^N$, the goal is to fit a function $\mathbf{f}(\mathbf{y})$ that replicates the observed behavior and fulfills the Lyapunov stability criteria. This can be achieved using probabilistic models, e.g. Gaussian Mixture Models (GMM) combined with Gaussian Mixture Regression (GMR).

4.3.1 Data Collection

A human demonstrates the desired behavior multiple times.
From each trajectory we sample the recorded states and their corresponding velocities:

$$\mathcal{D} = \left\{\{(\mathbf{y}_{n,t}, \dot{\mathbf{y}}_{n,t})\}_{t=0}^{T_n}\right\}_{n=1}^N, \quad (4.5)$$

where $n$ indexes the user demonstrations and $t$ the recorded positions and velocities on the demonstrated motion. The number of demonstrations needed depends on the complexity of the task and the variability of the trajectories. For tasks with low variability, a few demonstrations (e.g., 3–5) may suffice to capture the key characteristics of the motion. For tasks with higher variability or complex dynamics, more demonstrations are required to accurately compute the dynamical system. It is crucial to ensure that the demonstrations cover the entire state space relevant to the task to prevent poor generalization.

4.3.2 Gaussian Mixture Model (GMM)

A Gaussian Mixture Model (GMM) is a probabilistic model used to represent data as a mixture of multiple Gaussian distributions. It is particularly effective for modeling complex data distributions and is widely used in tasks such as clustering, density estimation, and trajectory modeling in robotics. It is defined as follows:

$$p(\mathbf{x}) = \sum_{k=1}^K \pi_k\, \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k), \quad (4.6)$$

where:

• $K$ is the number of Gaussian components,

• $\pi_k$ are the mixing coefficients, with $\sum_{k=1}^K \pi_k = 1$ and $\pi_k \geq 0$,

• $\mathcal{N}(\mathbf{x}; \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$ is the $k$-th Gaussian component with mean vector $\boldsymbol{\mu}_k$ and covariance matrix $\boldsymbol{\Sigma}_k$.

Each Gaussian component is expressed as:

$$\mathcal{N}(\mathbf{x}; \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) = \frac{1}{(2\pi)^{d/2}|\boldsymbol{\Sigma}_k|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_k)^\top \boldsymbol{\Sigma}_k^{-1}(\mathbf{x} - \boldsymbol{\mu}_k)\right), \quad (4.7)$$

where $d$ is the dimensionality of $\mathbf{x}$ and $|\boldsymbol{\Sigma}_k|$ is the determinant of $\boldsymbol{\Sigma}_k$. The parameters of a GMM include:

• the mixing coefficients $\pi_k$,

• the means $\boldsymbol{\mu}_k$,

• the covariance matrices $\boldsymbol{\Sigma}_k$.

These parameters are typically estimated using the Expectation-Maximization (EM) algorithm, which iteratively maximizes the likelihood of the data under the model. In robotics, GMMs are often used to model trajectory distributions.
Given a set of demonstrated trajectories, GMMs can encode the variability and correlations in the data, providing a compact representation for learning and reproducing motion.

To derive the velocity field function $\mathbf{f}$ from (4.1), we model the dataset $\mathcal{D}$ defined in Eq. (4.5) using a Gaussian Mixture Model (GMM):

$$p(\mathbf{y}, \dot{\mathbf{y}}) = \sum_{k=1}^K \pi_k\, \mathcal{N}(\mathbf{y}, \dot{\mathbf{y}}; \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k), \quad (4.8)$$

where $p(\mathbf{y}, \dot{\mathbf{y}})$ is the joint probability of position $\mathbf{y}$ and velocity $\dot{\mathbf{y}}$, $\pi_k$ are the mixing coefficients, $\sum_{k=1}^K \pi_k = 1$, and $\boldsymbol{\mu}_k \in \mathbb{R}^{2n}$, $\boldsymbol{\Sigma}_k \in \mathbb{R}^{2n \times 2n}$ are the mean and covariance of the $k$-th component

$$\mathcal{N}(\mathbf{y}, \dot{\mathbf{y}}; \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) = \frac{1}{(2\pi)^{n}|\boldsymbol{\Sigma}_k|^{1/2}} \exp\left(-\frac{1}{2}\left(\begin{bmatrix}\mathbf{y}\\ \dot{\mathbf{y}}\end{bmatrix} - \boldsymbol{\mu}_k\right)^\top \boldsymbol{\Sigma}_k^{-1}\left(\begin{bmatrix}\mathbf{y}\\ \dot{\mathbf{y}}\end{bmatrix} - \boldsymbol{\mu}_k\right)\right). \quad (4.9)$$

Model (4.8) can also be interpreted as

$$p(\mathbf{y}, \dot{\mathbf{y}}) = \sum_{k=1}^K P(k)\, p(\mathbf{y}, \dot{\mathbf{y}} \mid k), \quad (4.10)$$

where $P(k) = \pi_k$ denotes the probability of picking the $k$-th component and $p(\mathbf{y}, \dot{\mathbf{y}} \mid k) = \mathcal{N}(\mathbf{y}, \dot{\mathbf{y}}; \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$ is the conditional probability that the datapoint $[\mathbf{y}^\top, \dot{\mathbf{y}}^\top]^\top$ belongs to the $k$-th component. The likelihood of the data under the GMM is given by

$$\mathcal{L} = \prod_{n=1}^N \prod_{t=0}^{T_n} \sum_{k=1}^K \pi_k\, \mathcal{N}\!\left(\begin{bmatrix}\mathbf{y}_{n,t}\\ \dot{\mathbf{y}}_{n,t}\end{bmatrix}; \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\right), \quad (4.11)$$

and the log-likelihood, which is typically maximized, is given by:

$$\ln(\mathcal{L}) = \sum_{n=1}^N \sum_{t=0}^{T_n} \ln\left(\sum_{k=1}^K \pi_k\, \mathcal{N}\!\left(\begin{bmatrix}\mathbf{y}_{n,t}\\ \dot{\mathbf{y}}_{n,t}\end{bmatrix}; \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\right)\right). \quad (4.12)$$

The number of mixture components $K$ is a critical parameter that affects the model's ability to represent the underlying data. In the context of trajectory representation by dynamical systems, $K$ determines the resolution at which the trajectory dynamics are modeled. Each Gaussian component corresponds to a region of the state space, capturing the local behavior of the system. Thus, selecting $K$ has direct implications for the fidelity of the learned motion. A concrete procedure for determining $K$ is provided in Section 4.3.5.
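The EM estimation of the GMM parameters can be sketched in a few lines. The following is a deliberately simplified one-dimensional version (the model (4.8) operates on the joint $2n$-dimensional data $[\mathbf{y}; \dot{\mathbf{y}}]$; scalar data and quantile-based initialization are used here purely for illustration):

```python
import numpy as np

def em_gmm_1d(x, K=2, iters=100):
    """Fit a 1-D Gaussian mixture (4.6) with the EM algorithm."""
    pi = np.full(K, 1.0 / K)
    mu = np.quantile(x, (np.arange(K) + 0.5) / K)  # spread initial means over data
    var = np.full(K, np.var(x))
    for _ in range(iters):
        # E-step: responsibilities h[n, k] of each component for each datapoint
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        h = pi * dens
        h /= h.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing coefficients, means, and variances
        Nk = h.sum(axis=0)
        pi = Nk / len(x)
        mu = (h * x[:, None]).sum(axis=0) / Nk
        var = (h * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
    return pi, mu, var

# Two well-separated clusters; EM recovers means near 0 and 5.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 0.3, 300), rng.normal(5.0, 0.3, 300)])
pi, mu, var = em_gmm_1d(x)
assert abs(pi.sum() - 1.0) < 1e-9
assert np.allclose(np.sort(mu), [0.0, 5.0], atol=0.2)
```

Each EM iteration is guaranteed not to decrease the log-likelihood (4.12); the number of components $K$ trades off modeling resolution against overfitting, as discussed above.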
4.3.3 Gaussian Mixture Regression (GMR)

Gaussian Mixture Regression (GMR) is a technique derived from a Gaussian Mixture Model (GMM) to model the conditional distribution of output variables given input variables. We start by assuming that the joint probability distribution of $[\mathbf{y}^\top, \dot{\mathbf{y}}^\top]^\top$ is modeled by a GMM as in (4.8). Let's write

$$\boldsymbol{\mu}_k = \begin{bmatrix} \boldsymbol{\mu}_{k,y} \\ \boldsymbol{\mu}_{k,\dot{y}} \end{bmatrix}, \quad (4.13)$$

$$\boldsymbol{\Sigma}_k = \begin{bmatrix} \boldsymbol{\Sigma}_{k,y} & \boldsymbol{\Sigma}_{k,y,\dot{y}} \\ \boldsymbol{\Sigma}_{k,\dot{y},y} & \boldsymbol{\Sigma}_{k,\dot{y}} \end{bmatrix}. \quad (4.14)$$

The conditional distribution $p(\dot{\mathbf{y}} \mid \mathbf{y}, k)$ for each component is then given as:

$$p(\dot{\mathbf{y}} \mid \mathbf{y}, k) = \mathcal{N}(\dot{\mathbf{y}}; \hat{\boldsymbol{\mu}}_k, \hat{\boldsymbol{\Sigma}}_k), \quad (4.15)$$

where:

$$\hat{\boldsymbol{\mu}}_k = \boldsymbol{\mu}_{k,\dot{y}} + \boldsymbol{\Sigma}_{k,\dot{y},y}\boldsymbol{\Sigma}_{k,y}^{-1}(\mathbf{y} - \boldsymbol{\mu}_{k,y}), \quad (4.16)$$

$$\hat{\boldsymbol{\Sigma}}_k = \boldsymbol{\Sigma}_{k,\dot{y}} - \boldsymbol{\Sigma}_{k,\dot{y},y}\boldsymbol{\Sigma}_{k,y}^{-1}\boldsymbol{\Sigma}_{k,y,\dot{y}}. \quad (4.17)$$

The overall conditional distribution $p(\dot{\mathbf{y}} \mid \mathbf{y})$ is a mixture of the conditional distributions for all components:

$$p(\dot{\mathbf{y}} \mid \mathbf{y}) = \sum_{k=1}^K h_k(\mathbf{y})\, \mathcal{N}(\dot{\mathbf{y}}; \hat{\boldsymbol{\mu}}_k, \hat{\boldsymbol{\Sigma}}_k), \quad (4.18)$$

where $h_k(\mathbf{y})$ are the responsibilities:

$$h_k(\mathbf{y}) = \frac{P(k)\, p(\mathbf{y} \mid k)}{\sum_{i=1}^K P(i)\, p(\mathbf{y} \mid i)} = \frac{\pi_k\, \mathcal{N}(\mathbf{y}; \boldsymbol{\mu}_{k,y}, \boldsymbol{\Sigma}_{k,y})}{\sum_{i=1}^K \pi_i\, \mathcal{N}(\mathbf{y}; \boldsymbol{\mu}_{i,y}, \boldsymbol{\Sigma}_{i,y})}. \quad (4.19)$$

The expected value of $\dot{\mathbf{y}}$ is then computed as follows:

$$\dot{\mathbf{y}} = \mathbf{f}(\mathbf{y}) = \mathbb{E}[\dot{\mathbf{y}} \mid \mathbf{y}] = \sum_{k=1}^K h_k(\mathbf{y})\, \hat{\boldsymbol{\mu}}_k. \quad (4.20)$$

Writing

$$\mathbf{A}_k = \boldsymbol{\Sigma}_{k,\dot{y},y}\boldsymbol{\Sigma}_{k,y}^{-1}, \quad (4.21)$$

$$\mathbf{b}_k = \boldsymbol{\mu}_{k,\dot{y}} - \boldsymbol{\Sigma}_{k,\dot{y},y}\boldsymbol{\Sigma}_{k,y}^{-1}\boldsymbol{\mu}_{k,y} = \boldsymbol{\mu}_{k,\dot{y}} - \mathbf{A}_k\boldsymbol{\mu}_{k,y}, \quad (4.22)$$

we can rewrite Eq. (4.16) as

$$\hat{\boldsymbol{\mu}}_k = \boldsymbol{\mu}_{k,\dot{y}} + \mathbf{A}_k(\mathbf{y} - \boldsymbol{\mu}_{k,y}) = \mathbf{A}_k\mathbf{y} + \mathbf{b}_k. \quad (4.23)$$

We obtain the following expression for the velocity field $\dot{\mathbf{y}}$

$$\dot{\mathbf{y}} = \mathbf{f}(\mathbf{y}) = \sum_{k=1}^K h_k(\mathbf{y})(\mathbf{A}_k\mathbf{y} + \mathbf{b}_k). \quad (4.24)$$

GMR provides a smooth and probabilistic mapping from $\mathbf{y}$ to $\dot{\mathbf{y}}$, making it ideal for tasks like trajectory learning and motion planning in robotics.

4.3.4 Stable Estimator of Dynamical Systems (SEDS)

The SEDS framework is a powerful approach for learning stable motion dynamics from demonstrations.
By leveraging principles from Lyapunov stability theory, SEDS ensures that the resulting dynamical system converges to a desired attractor point while reproducing the demonstrated motion. This makes it particularly suitable for robotics applications where robustness and adaptability are critical, such as trajectory planning, manipulation, and navigation tasks.

SEDS models the motion as a mixture of Gaussians, where each component captures a segment of the observed behavior. The framework enforces global asymptotic stability by constraining the parameters of the learned dynamical system. These stability guarantees are achieved through a Lyapunov function, which ensures that the system's trajectories converge to the desired attractor from any initial condition within the state space.

A key strength of SEDS lies in its ability to balance accurate reproduction of demonstrated motions with mathematical stability guarantees. This dual objective is formulated as an optimization problem, where the parameters of the Gaussian Mixture Model (GMM) are estimated while satisfying stability constraints. The following sections present the mathematical formulation of this optimization problem and its variants.

Next we explain how the unknown parameters \pi_k, \mu_k, \Sigma_k forming the Gaussian mixture model (4.8) are computed in SEDS. The standard approach of applying an expectation-maximization algorithm to estimate these parameters is insufficient because with a classic EM algorithm it is not possible to guarantee that the resulting system (4.24) is asymptotically stable. Thus in the following we derive sufficient conditions on the parameters \pi_k, \mu_k, \Sigma_k that guarantee the asymptotic stability of (4.24).

To prove stability we utilize the Lyapunov function

V(y) = \frac{1}{2} (y - y^*)^\top (y - y^*).   (4.25)
Its derivative is given by

\dot{V}(y) = \frac{dV}{dy} \frac{dy}{dt} = \frac{d}{dy}\left( \frac{1}{2}(y - y^*)^\top (y - y^*) \right) \dot{y} = (y - y^*)^\top \dot{y}
 = (y - y^*)^\top \sum_{k=1}^{K} h_k(y)(A_k y + b_k)
 = (y - y^*)^\top \sum_{k=1}^{K} h_k(y)\left( A_k (y - y^*) + A_k y^* + b_k \right)
 = \sum_{k=1}^{K} h_k(y)\left( (y - y^*)^\top A_k (y - y^*) + (y - y^*)^\top (A_k y^* + b_k) \right).   (4.26)

The definition of responsibilities (4.19) guarantees that h_k(y) > 0. Note also that

(y - y^*)^\top A_k (y - y^*) = (y - y^*)^\top \left( \frac{1}{2}(A_k + A_k^\top) + \frac{1}{2}(A_k - A_k^\top) \right)(y - y^*) = \frac{1}{2} (y - y^*)^\top (A_k + A_k^\top)(y - y^*).   (4.27)

We can thus guarantee that \dot{V}(y) < 0 for all y \neq y^* if the symmetric matrices A_k + A_k^\top are negative definite, i.e. A_k + A_k^\top \prec 0, and if A_k y^* + b_k = 0. Note that these are sufficient, not necessary conditions. As explained in Section 4.2, it follows from Lyapunov theory that an arbitrary differential equation system (4.24) with negative definite matrices A_k + A_k^\top and fulfilling A_k y^* + b_k = 0 is asymptotically stable at the target y^*.

We can now finally define an optimization problem to compute the unknown parameters \pi_k, \mu_k, \Sigma_k. Two different optimization problems have been proposed in [15]. The Stable Estimator of Dynamical Systems (SEDS) Likelihood framework optimizes a log-likelihood function while ensuring global asymptotic stability. This results in the following optimization problem, stated as minimization of the negative log-likelihood:

\min_{\pi_k, \mu_k, \Sigma_k} J(\theta) = -\frac{1}{T} \sum_{n=1}^{N} \sum_{t=0}^{T_n} \log\left( \sum_{k=1}^{K} \pi_k \mathcal{N}(y_{n,t}, \dot{y}_{n,t}; \mu_k, \Sigma_k) \right),
subject to:
 A_k y^* + b_k = 0, \quad \forall k,
 A_k + A_k^\top \prec 0, \quad \forall k,
 \Sigma_k \succ 0, \quad \forall k,
 0 < \pi_k < 1, \quad \forall k,
 \sum_{k=1}^{K} \pi_k = 1.   (4.28)

Here T is the overall number of datapoints in (4.5). Alternatively, the SEDS Mean Squared
Error framework optimizes

\min_{\pi_k, \mu_k, \Sigma_k} J(\theta) = \frac{1}{2T} \sum_{n=1}^{N} \sum_{t=0}^{T_n} \| \dot{y}_{n,t} - f(y_{n,t}) \|^2,
subject to:
 A_k y^* + b_k = 0, \quad \forall k,
 A_k + A_k^\top \prec 0, \quad \forall k,
 \Sigma_k \succ 0, \quad \forall k,
 0 < \pi_k < 1, \quad \forall k,
 \sum_{k=1}^{K} \pi_k = 1.   (4.29)

The SEDS-Likelihood and SEDS-MSE optimization problems can be solved using standard libraries for nonlinear programming.

The SEDS-likelihood approach optimizes the probability of the observed data under the Gaussian Mixture Model (GMM) used to represent the dynamics. It seeks the parameters that maximize the likelihood of the data, leveraging the probabilistic nature of the GMM. It provides a robust probabilistic framework, making it well suited for tasks involving uncertainty or noisy demonstrations. By capturing the underlying data distribution, it can effectively handle complex or multimodal trajectory patterns. It is particularly valuable in applications where uncertainty quantification is important, as the probabilistic nature allows for modeling the variance in trajectories. However, the resulting optimization process can be computationally expensive due to the need to calculate the likelihood of multiple components and enforce stability constraints. It may be less intuitive for users focused on direct trajectory reproduction, as the emphasis is on probability rather than error minimization.

The SEDS-MSE approach, on the other hand, directly minimizes the error between the velocities predicted by the learned model and those observed in the demonstrations. It focuses on accurate trajectory reproduction by reducing the Euclidean distance between predicted and observed values. The objective is straightforward and intuitive, as it directly minimizes the mismatch between the demonstrated and reproduced velocities. It is computationally efficient as it avoids the probabilistic calculations required by the SEDS-likelihood approach.
This approach is easier to interpret and validate in terms of achieving precise trajectory reproduction. However, unlike the likelihood approach, it does not explicitly account for uncertainty or variability in the data, which can be a limitation in tasks with noisy or multimodal data. It may be less robust to outliers or variations in the demonstrations, as it focuses solely on minimizing the deterministic error.

In summary, the SEDS-likelihood approach is probabilistic, focusing on capturing the underlying distribution of the data and ensuring robustness to noise. It is computationally intensive but better suited for tasks that involve significant variability or require modeling uncertainty. In contrast, the SEDS-MSE approach prioritizes accuracy and simplicity, directly targeting the precise reproduction of demonstrated trajectories. It is computationally lighter and more intuitive but less equipped to handle variability and noise.

4.3.5 Selecting the Optimal Number of Mixture Components for SEDS

In the SEDS framework, the choice of K must also account for the stability constraints imposed during parameter estimation. A larger K provides greater flexibility to model complex dynamics but may introduce challenges in satisfying the Lyapunov stability conditions. Conversely, a smaller K simplifies stability enforcement but may lack the resolution to accurately capture the demonstrated trajectories.

To ensure a balance, the designer must ensure that K is sufficient to capture the variability in the demonstration data, including transitions between different regions of the state space. On the other hand, K should be small enough that the optimization problem remains computationally tractable while satisfying the stability constraints. A smaller K often results in a more interpretable model, which can be beneficial for understanding the learned dynamics.
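Whatever value of K is chosen, the learned parameters must in the end satisfy the stability constraints of Eq. (4.28). A minimal numpy check, with a helper name of our own choosing, might look like:

```python
import numpy as np

def satisfies_seds_constraints(A_list, b_list, y_star, tol=1e-9):
    """Verify the sufficient SEDS stability conditions of Eq. (4.28):
    A_k + A_k^T negative definite and A_k y* + b_k = 0 for every k."""
    for A_k, b_k in zip(A_list, b_list):
        # Negative definiteness of the symmetric part: largest eigenvalue < 0.
        if np.max(np.linalg.eigvalsh(A_k + A_k.T)) >= -tol:
            return False
        # The attractor y* must be an equilibrium of every linear component.
        if np.linalg.norm(A_k @ y_star + b_k) > tol:
            return False
    return True
```

Because the conditions are sufficient rather than necessary, a negative result does not by itself prove instability; it only means the SEDS guarantees no longer apply.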
In practice, K is usually determined experimentally, starting with a small number of components and increasing it until the model adequately captures the variability in the demonstrated trajectories. The selection process should also account for the trade-offs between accuracy, stability, and computational complexity inherent to the SEDS framework:

Initial Selection Based on Data Complexity: Start with a small value of K (e.g., 2-3) to capture the basic structure of the data. Increase K iteratively, adding components as necessary to capture finer details in the trajectories.

Evaluation Metrics: Use model selection criteria such as the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC) to balance model fit and complexity. These criteria are defined as:

AIC = 2m - 2\ln(\hat{L}),   (4.30)

BIC = m\ln(T) - 2\ln(\hat{L}),   (4.31)

where m is the number of estimated parameters (\pi_k, \mu_k, \Sigma_k), T is the number of data points, and \ln(\hat{L}) is the maximized log-likelihood of the model as specified in (4.12). Lower AIC or BIC scores indicate better trade-offs between model complexity and data fit.

Cross-Validation: Divide the data into training and validation sets. Fit the GMM with different values of K and evaluate the performance on the validation set. Select the K that minimizes the error on the validation set while avoiding overfitting.

Stability Analysis: Ensure that the chosen K allows the model to satisfy the Lyapunov stability constraints in the SEDS framework. Test the model's ability to converge to the attractor for a range of initial conditions.

4.3.6 Accuracy of Motion Reproduction with SEDS

To evaluate the Stable Estimator of Dynamical Systems (SEDS) framework, we analyze its performance in reproducing demonstrated trajectories and its ability to handle perturbations.
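As an aside, the criteria of Eqs. (4.30) and (4.31) from the selection procedure above are straightforward to compute once a model has been fitted. A small sketch, where the parameter count assumes full covariance matrices and the helper names are our own:

```python
import numpy as np

def gmm_param_count(K, d):
    """Free parameters m of a K-component, full-covariance GMM in d dimensions:
    K - 1 independent mixing coefficients, K*d mean entries, and
    K*d*(d + 1)/2 entries of the symmetric covariance matrices."""
    return (K - 1) + K * d + K * d * (d + 1) // 2

def aic(log_lik, m):
    """Akaike Information Criterion, Eq. (4.30)."""
    return 2 * m - 2 * log_lik

def bic(log_lik, m, T):
    """Bayesian Information Criterion, Eq. (4.31)."""
    return m * np.log(T) - 2 * log_lik
```

In a typical selection loop one fits the GMM for K = 1, 2, ... and keeps the K with the lowest BIC (or AIC) before checking the stability constraints.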
While the accuracy of motion reproduction with SEDS is usually not as high as what can be achieved with Dynamic Movement Primitives (DMPs), the system reliably converges to the desired attractor. This convergence ensures robustness and adaptability in dynamic environments, a key advantage of the SEDS framework over methods focusing solely on trajectory accuracy.

Increasing the number of mixture components K in the Gaussian Mixture Model (GMM) used by SEDS can improve the accuracy of motion reproduction. A larger K allows the system to capture finer details in the demonstrated trajectories. However, this improvement in accuracy comes at the cost of increased complexity in the optimization problem. As K grows, ensuring that the stability constraints are satisfied becomes more computationally challenging, and the resulting system may be harder to interpret.

Figure 4.2 illustrates how Gaussian Mixture Models (GMM) within the SEDS framework approximate demonstrated 2-D movements by displaying the mean values and covariance matrices of the Gaussians alongside the reproduced trajectories. Each Gaussian is represented as an ellipse, where the center denotes the mean and the shape and size reflect the covariance matrix, indicating the variance and orientation of the Gaussian. The trajectories pass through these ellipses, demonstrating how the means act as attractors that guide the motion while the covariance influences the spread and variability around the mean. The number of components determines the level of detail captured: with more Gaussians, finer details of the motion are preserved, while fewer Gaussians yield a smoother but less precise approximation. As the trajectory progresses, it transitions smoothly between regions dominated by different Gaussians, blending their influences based on proximity and variance. This ensures a continuous and stable motion.
The visual representation in both position and phase planes highlights how each Gaussian shapes specific segments of the trajectory, creating a dynamic interplay that accurately and stably replicates the demonstrated movement.

Figure 4.3 illustrates the global asymptotic stability of the SEDS framework through streamlines that visualize the system's behavior. Each streamline represents the trajectory of the system as it evolves from a different initial position within the state space. The graphs highlight that, regardless of the initial starting position, all trajectories converge to the desired attractor point. This behavior demonstrates the stability of the learned dynamical system and validates its ability to generalize across diverse initial conditions. The streamlines underscore the core strength of SEDS in embedding stability directly within the learned model.

Figure 4.1: Each row in the figure shows the reproduction of demonstrated 2-D movements with a single dynamic system. The red dotted lines show the demonstrated trajectories and the full colored lines the trajectories reproduced by the computed dynamic system. The figures show trajectories in the (y1, y2) position plane and the (y1, ẏ1) and (y2, ẏ2) phase planes.

Unlike approaches that rely on external control mechanisms to recover from perturbations, the SEDS framework inherently returns to the desired trajectory due to its attractor dynamics.
This feature makes it particularly suitable for applications requiring consistent convergence to target points in dynamic or uncertain environments.

Beyond accuracy, a key strength of SEDS is its inherent robustness to perturbations. The dynamic system, governed by Lyapunov stability principles, ensures that the system trajectories naturally return to the desired motion even when subjected to disturbances, as demonstrated in Fig. 4.4.

Figure 4.2: Each column in the figure shows the Gaussian mixture models used to represent the demonstrated movements. The means and covariance matrices in the (y1, y2) position plane and the phase planes (y1, ẏ1) and (y2, ẏ2) are shown. The same data as in Fig. 4.1 was used to train the models.

4.4 Summary

The Stable Estimator of Dynamical Systems (SEDS) framework offers a robust, flexible, and mathematically grounded approach to learning stable dynamical systems for robot motion representation. This chapter has explored the foundational principles, design, and evaluation of SEDS, emphasizing its unique advantages and limitations when compared to other methods such as Dynamic Movement Primitives (DMPs).

SEDS leverages Gaussian Mixture Models (GMM) and Gaussian Mixture Regression (GMR) to encode the variability in demonstrated motions while ensuring stability through Lyapunov theory [15, 16]. This guarantees that the system converges to the desired attractor point under all initial conditions. This built-in stability is a significant advantage, allowing SEDS to respond adaptively to perturbations and environmental variations without requiring external stabilization mechanisms.

Figure 4.3: Convergence towards the attractor point for different starting positions. The same data as in Fig.
4.1 was used to train the models.

Figure 4.4: Robustness of the learned models to perturbations. The same data as in Fig. 4.1 was used to train the models. During motion reproduction, the robot's end-effector was perturbed at t = 1 second with the displacement vector [−30; 30] mm.

A key strength of SEDS lies in its ability to balance motion reproduction accuracy with stability guarantees. While its accuracy in reproducing individual trajectories may not match that of DMPs, SEDS excels in tasks requiring generalization and adaptability. Its ability to recover from perturbations and return to the desired trajectory makes it particularly suitable for dynamic and uncertain environments [9]. However, the choice of the number of mixture components, K, in the GMM significantly impacts performance. A larger K can improve the fidelity of motion reproduction by capturing finer details in trajectories but adds complexity to the optimization process, potentially affecting computational efficiency.

Unlike DMPs, which prioritize simplicity and efficient trajectory reproduction, SEDS's strength lies in its broader applicability to tasks that demand adaptability and inherent stability. While well suited for continuous state spaces, one limitation of SEDS is that ensuring stability for complex trajectories with many attractors may require careful tuning or domain-specific heuristics, which can increase implementation complexity. Recent advancements, such as learning Lyapunov functions alongside policies using neural networks, further extend the capabilities of SEDS [4].

Compared to other trajectory planning techniques, such as polynomials and splines, SEDS benefits from its dynamic nature. Splines, while optimized for smoothness and precision in predefined paths, are static representations that require external intervention to handle perturbations. In contrast, SEDS inherently adapts through its vector field dynamics, making it more robust to disturbances and better suited for complex, changing environments [30].
In summary, SEDS provides a powerful framework for designing dynamic systems that are accurate in reproducing demonstrated motions while being inherently stable and robust. Its applicability extends to various robotics tasks, including manipulation, navigation, and interaction, where adaptability and stability are paramount. Despite requiring careful parameter selection and computational resources for complex tasks, SEDS remains a cornerstone method for learning and representing stable dynamical systems in robotics.

Chapter 5

Probabilistic Movement Primitives (ProMPs)

Probabilistic Movement Primitives (ProMPs) provide a powerful approach to learning and reproducing motion in robotics, extending the versatility and adaptability of previous frameworks like Dynamic Movement Primitives (DMPs) and Stable Estimator of Dynamical Systems (SEDS) with some unique features. Unlike DMPs, which focus on deterministic trajectory generation, or SEDS, which emphasizes stability through Lyapunov functions, ProMPs employ a probabilistic formulation to capture the variability and uncertainty inherent in motion demonstrations. This chapter explores the foundations of ProMPs, their mathematical formulation, and their comparative advantages over DMPs and SEDS.

5.1 Mathematical Formulation of ProMPs

At the core of ProMPs lies the assumption that trajectories can be represented as a linear combination of basis functions with weights drawn from a multivariate Gaussian distribution. Formally, for a single degree of freedom (DoF), a trajectory y(t), which now includes both position y(t) and velocity \dot{y}(t), is expressed as

y(t) = \Phi(t) w + \epsilon,   (5.1)

where \Phi(t) is a 2 \times n matrix of time-dependent basis functions, w \in \mathbb{R}^n is an n-dimensional vector of weights, and \epsilon \sim \mathcal{N}(0, \Sigma_\epsilon) is Gaussian noise capturing trajectory variability, with

\Sigma_\epsilon = \begin{bmatrix} \sigma_y^2 & 0 \\ 0 & \sigma_{\dot{y}}^2 \end{bmatrix}

to account for different noise levels in position and velocity. The state vector y(t) is given by

y(t) = \begin{bmatrix} y(t) \\ \dot{y}(t) \end{bmatrix},   (5.2)

which ensures that both position and velocity are included in the representation. The rows of the basis function matrix \Phi(t) are defined as

\Phi(t) = \begin{bmatrix} \phi(t)^\top \\ \dot{\phi}(t)^\top \end{bmatrix},   (5.3)

where \phi(t) is an n \times 1 vector of basis functions for position, and \dot{\phi}(t) is its derivative with respect to time, representing the basis functions for velocity. Each element of \phi(t) is defined as

\phi_i(t) = \exp\left( -\frac{(t - c_i)^2}{2 h_i^2} \right),   (5.4)

where c_i and h_i are the center and width of the i-th basis function, respectively. These parameters determine the temporal placement and scale of the basis functions, enabling flexible representation of trajectories. The complete vector of basis functions is then given by

\phi(t) = \begin{bmatrix} \phi_1(t) & \phi_2(t) & \ldots & \phi_n(t) \end{bmatrix}^\top,   (5.5)

where n is the number of Gaussian basis functions. The coefficients of the matrix \Phi(t) are thus

\Phi(t) = \begin{bmatrix} \phi_1(t) & \phi_2(t) & \ldots & \phi_n(t) \\ \dot{\phi}_1(t) & \dot{\phi}_2(t) & \ldots & \dot{\phi}_n(t) \end{bmatrix},   (5.6)

where \phi_i(t) is the i-th basis function evaluated at time t, and \dot{\phi}_i(t) is its time derivative. This formulation ensures that \Phi(t) encapsulates both position and velocity contributions for all basis functions.

The weights w are modeled probabilistically:

w \sim \mathcal{N}(\mu_w, \Sigma_w),   (5.7)

where \mu_w \in \mathbb{R}^n and \Sigma_w \in \mathbb{R}^{n \times n} are the mean and covariance matrix of the weight distribution, respectively. This probabilistic modeling enables ProMPs to capture both the mean behavior and the variability of motion across multiple demonstrations.

5.1.1 Using Phase Variables Instead of Time

While the primary formulation of ProMPs uses time t to index the basis functions, it is often advantageous to use a phase variable s instead. The phase variable provides a normalized representation of progress through the trajectory, independent of absolute time. The phase variable s(t) is defined as

s(t) = \frac{t}{T},   (5.8)
where T is the total duration of the trajectory. This normalization maps t \in [0, T] to s \in [0, 1]. The basis functions are then parameterized by s rather than t:

\phi_i(s) = \exp\left( -\frac{(s - c_i)^2}{2 h_i^2} \right),   (5.9)

\dot{\phi}_i(s) = -\dot{s}(t)\, \frac{s - c_i}{h_i^2}\, \phi_i(s),   (5.10)

where \dot{s}(t) = \frac{1}{T} is the time derivative of the phase variable. Using the phase variable allows for time-scaling invariance, which is particularly useful when trajectories of varying durations need to be compared or reproduced. The trajectory representation is then expressed as

y(s) = \Phi(s) w + \epsilon,   (5.11)

where \Phi(s) replaces \Phi(t), and the structure of \Phi(s) remains consistent with the definition above.

5.1.2 Marginalizing Over Weights to Compute Likelihoods

Let us assume now that we observe a trajectory y(t). Assuming model (5.7), its likelihood at each time t is computed by marginalizing over the weights:

p(y(t)) = \int p(y(t)|w)\, p(w)\, dw.   (5.12)

This integral can be solved efficiently due to the Gaussian assumptions, yielding

p(y(t)) = \mathcal{N}(y(t); \Phi(t) \mu_w, \Phi(t) \Sigma_w \Phi(t)^\top + \Sigma_\epsilon),   (5.13)

where \Phi(t) is the matrix of basis functions evaluated at time t. To transition from Eq. (5.12) to Eq. (5.13), we exploit the fact that both p(y(t)|w) and p(w) are Gaussian distributions, which makes the integral tractable. The likelihood of observing the trajectory y(t) given the weights w is Gaussian, expressed as

p(y(t)|w) = \mathcal{N}(y(t); \Phi(t) w, \Sigma_\epsilon),   (5.14)

where \Sigma_\epsilon is the covariance matrix representing independent Gaussian noise, and \Phi(t) w represents the mean of this conditional Gaussian. Similarly, the prior distribution over the weights is also Gaussian:

p(w) = \mathcal{N}(w; \mu_w, \Sigma_w),   (5.15)

where \mu_w and \Sigma_w are the mean and covariance of the weights, respectively. Substituting these into Eq.
(5.12), the marginalization integral becomes

p(y(t)) = \int \mathcal{N}(y(t); \Phi(t) w, \Sigma_\epsilon)\, \mathcal{N}(w; \mu_w, \Sigma_w)\, dw.   (5.16)

The product of these two Gaussian distributions yields another Gaussian distribution. As explained below (see Section 5.1.3), the marginal distribution of y(t) has mean \Phi(t) \mu_w and covariance \Phi(t) \Sigma_w \Phi(t)^\top + \Sigma_\epsilon. Consequently, the integral simplifies directly to

p(y(t)) = \mathcal{N}(y(t); \Phi(t) \mu_w, \Phi(t) \Sigma_w \Phi(t)^\top + \Sigma_\epsilon).   (5.17)

This result leverages the properties of Gaussian distributions, specifically their closed-form solutions for convolution (marginalization in this case) and combination. By carefully handling the mean and covariance terms, we arrive at the desired probabilistic description of y(t) in Eq. (5.17).

The above formulation highlights the elegance of ProMPs in combining probabilistic reasoning with basis function representation. Each basis function contributes to the overall trajectory generation, while the covariance \Sigma_w captures the uncertainty and variability in motion.

5.1.3 Product of Gaussian Distributions

The integral in Eq. (5.12) involves the product of two Gaussian distributions:

p(y(t)) = \int \mathcal{N}(y(t); \Phi(t) w, \Sigma_\epsilon)\, \mathcal{N}(w; \mu_w, \Sigma_w)\, dw.   (5.18)

To compute this integral, we use the fact that the product of two Gaussian distributions yields another Gaussian distribution, scaled by a normalization factor. This enables the integral to be solved in closed form. The conditional likelihood p(y(t)|w) is given as

p(y(t)|w) = \mathcal{N}(y(t); \Phi(t) w, \Sigma_\epsilon),   (5.19)

while the prior on the weights is given by

p(w) = \mathcal{N}(w; \mu_w, \Sigma_w).   (5.20)

The product of these distributions is

p(y(t), w) = \mathcal{N}(y(t); \Phi(t) w, \Sigma_\epsilon)\, \mathcal{N}(w; \mu_w, \Sigma_w).   (5.21)

To compute p(y(t)), we marginalize over w:

p(y(t)) = \int p(y(t), w)\, dw.   (5.22)

Substituting the product form:

p(y(t)) = \int \mathcal{N}(y(t); \Phi(t) w, \Sigma_\epsilon)\, \mathcal{N}(w; \mu_w, \Sigma_w)\, dw.
(5.23)

This integral can be evaluated directly because the product of Gaussians is Gaussian. Specifically, the marginal distribution of y(t) is another Gaussian with:

• Mean: \Phi(t) \mu_w,
• Covariance: \Phi(t) \Sigma_w \Phi(t)^\top + \Sigma_\epsilon.

Eq. (5.17) is then obtained using this mean and covariance. This result captures both the mean and variance of the observed trajectory y(t), combining contributions from the weight distribution and the noise model. The term \Phi(t) \mu_w represents the mean trajectory, while \Phi(t) \Sigma_w \Phi(t)^\top + \Sigma_\epsilon encapsulates the variability due to weight uncertainty and measurement noise.

5.2 Learning the Parameters of the Weight Distribution

In general, ProMPs are learned from multiple demonstrations. The diversity of demonstrations should reveal the variance of the task and the uncertainty of the execution. The parameters of the weight distribution, \mu_w and \Sigma_w, are learned from a set of N demonstrated trajectories \{y_i\}_{i=1}^{N}. Each trajectory y_i(t), including both position and velocity, is represented as a set of observations \{y_i(t_j)\}_{j=1}^{T_i} over discrete time steps t_1, t_2, \ldots, t_{T_i}. Each observation y_i(t_j) is defined as

y_i(t_j) = \begin{bmatrix} y_i(t_j) \\ \dot{y}_i(t_j) \end{bmatrix},   (5.24)

where y_i(t_j) and \dot{y}_i(t_j) are the position and velocity at time t_j, respectively. Note that the durations of the demonstrations, T_i, are not necessarily constant. For each demonstration, y_i(t) is vectorized into

Y_i = \begin{bmatrix} y_i(t_1) \\ y_i(t_2) \\ \vdots \\ y_i(t_{T_i}) \end{bmatrix},   (5.25)

where Y_i \in \mathbb{R}^{2T_i}. This vectorization enables a consistent formulation for parameter estimation, incorporating both position and velocity at each time step.

The basis function matrix \Phi(t) incorporates both position and velocity contributions. Its 2 \times n structure is defined as

\Phi(t) = \begin{bmatrix} \phi_1(t) & \phi_2(t) & \ldots & \phi_n(t) \\ \dot{\phi}_1(t) & \dot{\phi}_2(t) & \ldots & \dot{\phi}_n(t) \end{bmatrix},   (5.26)

where \phi_i(t) is the i-th Gaussian basis function evaluated at time t and \dot{\phi}_i(t) is its time derivative. To concatenate these basis functions over all time steps of a demonstration, the full concatenated matrix \Phi is constructed as

\Phi = \begin{bmatrix} \Phi(t_1) \\ \Phi(t_2) \\ \vdots \\ \Phi(t_{T_i}) \end{bmatrix},   (5.27)

where \Phi \in \mathbb{R}^{2T_i \times n} represents the basis function contributions for all positions and velocities across the entire demonstration.

To estimate \mu_w and \Sigma_w, we first compute the weights for each demonstration by solving the regularized least-squares problem

w_i = \arg\min_w \sum_{j=1}^{T_i} \| y_i(t_j) - \Phi(t_j) w \|^2 + \lambda \| w \|^2,   (5.28)

where \lambda is a regularization parameter. The closed-form solution is given by

w_i = (\Phi^\top \Phi + \lambda I)^{-1} \Phi^\top Y_i,   (5.29)

where \Phi is the concatenated matrix of basis functions evaluated at all time steps, and Y_i is the vectorized trajectory of the i-th demonstration, including both position and velocity.

During this process, it is often necessary to normalize the trajectories to ensure a consistent time scale across demonstrations. This normalization adjusts each demonstration to a common duration T_{norm} using a mapping from the original time scale [0, T_i] to the normalized time scale [0, T_{norm}]. The normalized time parameter \tau(t) is defined as

\tau(t) = \frac{t}{T_i} T_{norm},   (5.30)

where t \in [0, T_i] represents the original time scale. The normalized basis functions and their derivatives are then computed using \tau(t) to ensure compatibility across varying durations.

Once the weights \{w_i\}_{i=1}^{N} are computed, the mean and covariance are estimated as

\mu_w = \frac{1}{N} \sum_{i=1}^{N} w_i,   (5.31)

\Sigma_w = \frac{1}{N} \sum_{i=1}^{N} (w_i - \mu_w)(w_i - \mu_w)^\top.   (5.32)

5.2.1 Motion Reproduction

During reproduction, a new trajectory is generated by sampling weights from the learned distribution:

w_{sampled} \sim \mathcal{N}(\mu_w, \Sigma_w).
(5.33)

The trajectory is then computed as

y(t) = \Phi(t) w_{sampled},   (5.34)

where \Phi(t) accounts for both position and velocity contributions.

Fig. 5.1 shows how the ProMP model captures and reproduces motion trajectories based on the training data. The training data highlights the variability across multiple demonstrations, showing the inherent differences in the motion. The reproduced ProMPs illustrate the model's ability to learn the central tendency of the motion, depicted as the mean trajectory, while simultaneously encoding the variability through the 1-sigma interval. This probabilistic representation not only captures the typical motion pattern but also provides a clear understanding of the uncertainty, showcasing the model's strength in generalizing to similar trajectories while maintaining robustness to variability in the data. The graphs confirm that the ProMP implementation accurately reflects the desired probabilistic properties of the motion.

Figure 5.1: The upper row graphs show the data for training a ProMP and the lower row the reproduced ProMPs (red curves) with the 1-sigma interval (green curves).

Figure 5.2: The left graph shows the position obtained by integrating the mean velocity of a ProMP and the variance of the training data. The right graph shows the mean velocity and its variance computed by the ProMP.

ProMPs are designed to encode and reproduce not only position trajectories but also velocity profiles, ensuring a comprehensive representation of motion dynamics, as shown in Figure 5.2. By capturing both the mean velocity and its variability, ProMPs enable robots to replicate the nuances of demonstrated motions with greater accuracy. This ability to model velocity alongside position is crucial for tasks that require precise dynamic execution, such as manipulation or interaction in variable environments, where both spatial and temporal aspects of motion are essential for success.
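The pipeline of Sections 5.1-5.2 fits in a short numpy sketch. The code below is illustrative only (all names are our own, and the demonstrations are synthetic): it builds the basis matrix of Eqs. (5.4) and (5.26)-(5.27), estimates the weight distribution via Eqs. (5.28)-(5.32), and reproduces a trajectory via Eqs. (5.33)-(5.34):

```python
import numpy as np

def promp_basis(t, centers, widths):
    """2 x n matrix Phi(t) of Eq. (5.26): Gaussian bases of Eq. (5.4)
    in the first row, their time derivatives in the second."""
    phi = np.exp(-(t - centers) ** 2 / (2 * widths ** 2))
    dphi = -(t - centers) / widths ** 2 * phi
    return np.vstack([phi, dphi])

def learn_promp(demos, centers, widths, lam=1e-6):
    """Estimate (mu_w, Sigma_w) from demonstrations, Eqs. (5.28)-(5.32).

    Each demo is a tuple (times, positions, velocities)."""
    W = []
    for times, pos, vel in demos:
        Phi = np.vstack([promp_basis(t, centers, widths) for t in times])  # Eq. (5.27)
        Y = np.ravel(np.column_stack([pos, vel]))                          # Eq. (5.25)
        w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(len(centers)),
                            Phi.T @ Y)                                     # Eq. (5.29)
        W.append(w)
    W = np.array(W)
    mu_w = W.mean(axis=0)                              # Eq. (5.31)
    Sigma_w = (W - mu_w).T @ (W - mu_w) / len(W)       # Eq. (5.32)
    return mu_w, Sigma_w

def reproduce(times, mu_w, Sigma_w, centers, widths, rng=None):
    """Sample weights (Eq. (5.33)) and roll out the trajectory (Eq. (5.34))."""
    rng = np.random.default_rng(rng)
    w = rng.multivariate_normal(mu_w, Sigma_w)
    return np.array([promp_basis(t, centers, widths) @ w for t in times])
```

Each row of the returned trajectory contains a position and a velocity sample, mirroring the 2 x n structure of \Phi(t).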
This and the previous example were generated using the code from [7].

To ensure that the reproduced trajectory conforms to the desired duration T_{reproduction}, we align the time scaling with the normalized duration T_{norm}. The time parameter t for reproduction is mapped to the normalized time scale as

\tau(t) = \frac{t}{T_{reproduction}} T_{norm}.   (5.35)

The basis functions and their derivatives are then evaluated as

\phi_i(\tau(t)) = \exp\left( -\frac{(\tau(t) - c_i)^2}{2 h_i^2} \right),   (5.36)

\dot{\phi}_i(\tau(t)) = -\dot{\tau}(t)\, \frac{\tau(t) - c_i}{h_i^2}\, \phi_i(\tau(t)),   (5.37)

where \dot{\tau}(t) = \frac{T_{norm}}{T_{reproduction}} is the scaling factor for the time derivative. This ensures that the reproduced trajectory adheres to the desired temporal constraints while retaining the learned variability and central tendencies of the demonstrations.

5.3 Ensuring Specific Configuration at a Given Time

In robotic applications, it is often required that the trajectory achieve a specific configuration y_g (e.g., a goal or intermediate state) at a given time t_g. ProMPs enable this by conditioning the weight distribution on the desired configuration.

5.3.1 Conditioning on a Specific Configuration

To enforce a specific configuration y_g at time t_g, we impose a probabilistic constraint within the ProMP framework. The conditioned distribution over the weights w is computed as (see also Section 5.3.4)

p(w \,|\, y(t_g) = y_g) \propto p(y(t_g) = y_g \,|\, w)\, p(w).   (5.38)

The likelihood of the desired configuration given the weights is Gaussian:

p(y(t_g) = y_g \,|\, w) = \mathcal{N}(y_g; \Phi(t_g) w, \Sigma_\epsilon),   (5.39)

where \Phi(t_g) is the basis function matrix evaluated at t_g, and \Sigma_\epsilon represents the observation noise.
Combining this with the prior distribution over weights, p(w) = N(w; µ_w, Σ_w), the conditioned distribution p(w|y(t_g) = y_g) is Gaussian with:

µ_w^cond = µ_w + G(y_g − Φ(t_g)µ_w), (5.40)

Σ_w^cond = Σ_w − GΦ(t_g)Σ_w, (5.41)

where the gain matrix G is defined as:

G = Σ_w Φ(t_g)^⊤ (Φ(t_g) Σ_w Φ(t_g)^⊤ + Σ_ϵ)^{−1}. (5.42)

5.3.2 Detailed Derivation of Conditioned Mean and Covariance

The conditioning process in ProMPs leverages properties of multivariate Gaussian distributions. Given a prior Gaussian distribution over weights:

p(w) = N(w; µ_w, Σ_w), (5.43)

and a probabilistic observation model:

p(y(t_g) = y_g|w) = N(y_g; Φ(t_g)w, Σ_ϵ), (5.44)

the posterior distribution p(w|y(t_g) = y_g) can be derived using the Gaussian conditioning formula. To compute the posterior distribution p(w|y(t_g) = y_g), we use the fact that the product of two Gaussian distributions is another Gaussian, up to a normalization factor. Specifically:

p(w|y(t_g) = y_g) ∝ p(y(t_g) = y_g|w) p(w). (5.45)

Substituting the Gaussian forms:

p(y(t_g) = y_g|w) = N(y_g; Φ(t_g)w, Σ_ϵ), (5.46)

p(w) = N(w; µ_w, Σ_w), (5.47)

the product of these distributions yields:

p(w|y(t_g) = y_g) = N(w; µ_w^cond, Σ_w^cond), (5.48)

where the conditioned mean and covariance are given by:

µ_w^cond = µ_w + G(y_g − Φ(t_g)µ_w), (5.49)

Σ_w^cond = Σ_w − GΦ(t_g)Σ_w. (5.50)

The conditioned mean µ_w^cond is a combination of the prior mean µ_w and a correction term. This correction term adjusts the prior mean based on the discrepancy between the desired configuration y_g and the predicted configuration Φ(t_g)µ_w, weighted by the uncertainties in the prior and observation models. The conditioned covariance Σ_w^cond is the prior covariance Σ_w reduced by a term proportional to the information gain from the observation at t_g. This reflects the fact that the uncertainty in w decreases as more constraints are added.
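The conditioning equations (5.40)–(5.42) translate directly into a short NumPy routine. This is a sketch under assumed shapes (a weight vector of length 10 and a scalar observation); the function name and the test values are hypothetical:

```python
import numpy as np

def condition_promp(mu_w, Sigma_w, Phi_g, y_g, Sigma_eps):
    """Condition the weight distribution on y(t_g) = y_g, Eqs. (5.40)-(5.42)."""
    S = Phi_g @ Sigma_w @ Phi_g.T + Sigma_eps       # innovation covariance
    G = Sigma_w @ Phi_g.T @ np.linalg.inv(S)        # gain matrix, Eq. (5.42)
    mu_cond = mu_w + G @ (y_g - Phi_g @ mu_w)       # conditioned mean, Eq. (5.40)
    Sigma_cond = Sigma_w - G @ Phi_g @ Sigma_w      # conditioned covariance, Eq. (5.41)
    return mu_cond, Sigma_cond

# With near-zero observation noise, the conditioned mean trajectory
# passes (almost) exactly through the desired configuration y_g.
mu_w = np.zeros(10)
Sigma_w = 0.01 * np.eye(10)
Phi_g = np.full((1, 10), 0.1)      # basis row evaluated at t_g (illustrative)
y_g = np.array([1.0])
mu_c, Sigma_c = condition_promp(mu_w, Sigma_w, Phi_g, y_g, 1e-10 * np.eye(1))
```

The trace of Sigma_c comes out strictly smaller than that of Sigma_w, reflecting the information gained from the constraint.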
5.3.3 Reproducing the Conditioned Trajectory

Once the conditioned distribution p(w|y(t_g) = y_g) is computed, a new trajectory can be generated. A sample of weights w_sampled is drawn from the conditioned distribution:

w_sampled ∼ N(µ_w^cond, Σ_w^cond). (5.51)

The trajectory is then computed as:

y(t) = Φ(t)w_sampled, (5.52)

where the reproduced trajectory respects the specified configuration y_g at time t_g while maintaining the natural variability of the learned ProMP.

Fig. 5.3 shows trajectory generation with ProMPs given different initial conditions for the beginning of motion. On the left, the training data consists of multiple trajectory demonstrations highlighting the variability in execution. The demonstrations exhibit significant variance, particularly in the middle and end segments of the trajectories, illustrating the inherent variability in human or robot motion. On the right, the generated trajectories represent the mean trajectories computed by conditioning ProMPs on the specified initial points. ProMPs enable such conditioning by leveraging their probabilistic nature, where the underlying distribution of trajectory weights is updated based on the given initial conditions. This process ensures that the generated trajectories not only start from the specified points but also conform to the variability and patterns observed in the training data. The conditioned trajectories smoothly transition from the specified initial states, reflecting the probabilistic correlations encoded in the learned model. The ability to condition trajectories makes ProMPs particularly useful for tasks requiring adaptability to changes in the starting configuration.
For example, in applications like pick-and-place operations, where the robot's initial position might vary due to dynamic workspace constraints, ProMPs can generate appropriate motion plans that adhere to the statistical properties of the demonstrated trajectories. This adaptability is achieved without the need for re-learning or extensive recalibration, which is a significant advantage over deterministic approaches. Moreover, the right-hand plot demonstrates the smoothness and consistency of the conditioned trajectories, ensuring that the robot's motion remains natural and feasible. This smooth adaptation is crucial in collaborative scenarios, such as human-robot interaction, where the robot must seamlessly adjust its actions based on the human partner's behavior. The probabilistic framework of ProMPs enables such adaptability while preserving the natural variability and robustness of the learned motion. Overall, Fig. 5.3 illustrates the core strengths of ProMPs: their ability to capture the variability of demonstrated motions, adapt to task-specific constraints, and generate trajectories that balance naturalness and task requirements. This capability underscores ProMPs' suitability for dynamic and uncertain environments where flexibility and robustness are paramount.

Figure 5.3: The left graph shows the training data, the mean trajectory and the variance in the training data. The right graph shows the computed mean trajectories given the initial condition and the associated variances.

5.3.4 Bayesian Basis for Conditioning

The conditioning operation described above is rooted in Bayes' theorem. Bayes' theorem states that:

p(w|y(t_g) = y_g) ∝ p(y(t_g) = y_g|w) p(w). (5.53)

Here, p(w) is the prior distribution over the weights, representing what has been learned from demonstrations, and p(y(t_g) = y_g|w) is the likelihood of the desired configuration y_g given these weights.
The posterior distribution p(w|y(t_g) = y_g) combines these terms to yield a new distribution over weights that respects both the prior knowledge and the imposed constraint. This Bayesian formulation ensures consistency between the learned ProMP and the specified configuration, allowing the system to balance prior variability with task-specific requirements.

5.4 Summary and Discussion

Probabilistic Movement Primitives (ProMPs) offer a versatile and robust framework for learning, modeling, and reproducing motion trajectories in robotics. By employing a probabilistic representation, ProMPs capture the variability and uncertainty inherent in motion demonstrations, making them particularly suitable for tasks requiring adaptability and multimodal trajectory representations. The foundational work by Paraschos et al. [24, 25] laid the theoretical groundwork for ProMPs, introducing a probabilistic extension of traditional movement primitives that enables both deterministic reproduction of mean trajectories and stochastic sampling for diverse trajectory generation. Compared to other frameworks such as Dynamic Movement Primitives (DMPs) and the Stable Estimator of Dynamical Systems (SEDS), ProMPs excel in capturing probabilistic variations and ensuring adaptability. DMPs combine accurate trajectory reproduction with attractor dynamics, ensuring convergence to a desired goal even in the presence of perturbations, but rely on fixed attractor points and deterministic formulations. SEDS emphasizes stability and robustness through Lyapunov functions, guaranteeing global convergence to attractors, but often lacks explicit modeling of variability. ProMPs, on the other hand, prioritize capturing the probabilistic nature of motion, making them ideal for tasks requiring flexibility and the ability to encode multimodal trajectory distributions. However, ProMPs do not inherently ensure stability, which might require additional constraints or mechanisms in specific applications.
The probabilistic framework of ProMPs leverages a Gaussian weight distribution to encode variability observed in demonstrations. Using a weighted combination of basis functions parameterized by either time or a phase variable, ProMPs provide a robust mechanism for modeling trajectories of varying durations while maintaining consistency. Learning involves estimating the mean and covariance of the weight distribution from multiple demonstrations, capturing both the central tendency and variability in the observed data. The use of Bayesian conditioning enables task-specific constraints, such as achieving specific configurations, while preserving the natural variability of motion. ProMPs have demonstrated significant potential in human-robot interaction and collaborative tasks, as highlighted in the works of Maeda et al. [21, 20]. These extensions, referred to as interaction primitives, leverage the probabilistic framework to model and predict the motion of interacting agents. ProMPs have been successfully applied to coordinate multiple human-robot tasks, enabling robots to anticipate human actions and adjust their behavior in real time. Similarly, they have shown promise in robot-directed imitation learning tasks [19], reproducing complex behaviors demonstrated by humans and generalizing from limited examples. The choice between ProMPs, DMPs, and SEDS depends on the specific requirements of the task. For applications demanding deterministic and stable motion generation, DMPs or SEDS may be preferred. However, for tasks requiring adaptability, probabilistic reasoning, and the ability to model multimodal distributions, ProMPs provide a compelling solution. Furthermore, ProMPs' extension to interaction primitives highlights their versatility in multi-agent and collaborative scenarios, enabling coordinated motion and real-time adaptability.
In summary, ProMPs provide a powerful tool for modern robotic motion generation, combining flexibility, adaptability, and the ability to represent probabilistic variations. Their applicability extends to various robotics tasks, including human-robot collaboration, manipulation, and complex task planning, making them a cornerstone in robotic motion representation methodologies.

Chapter 6 Motion Representations and Robot Learning

Robot motion representation stands at the core of both control and learning in robotics. In conclusion, we reflect on the major representations discussed in this textbook – splines, Dynamic Movement Primitives (DMPs), the Stable Estimator of Dynamical Systems (SEDS), and Probabilistic Movement Primitives (ProMPs) – and examine their significance in imitation learning and robot learning, especially reinforcement learning (RL) and deep reinforcement learning (DRL). Modern robotics demands flexible, adaptive, and robust representations that can accommodate the diverse requirements of robot motion. Polynomials and splines, as discussed, form the foundation of trajectory planning. Their ability to model smooth, continuous trajectories with explicit control over derivatives makes them an excellent tool for precision tasks like pick-and-place operations or industrial machining. However, their static nature poses challenges in dynamic and uncertain environments. While spline-based representations excel in predefined settings, they lack the adaptability required for tasks that involve external perturbations or unstructured interactions. The advancements in robot control and machine learning have necessitated motion representations that go beyond such static paradigms. DMPs emerged to address challenges in robot motion generation by introducing a dynamic, goal-oriented framework that encodes trajectories as solutions to differential equations with attractor dynamics.
This ensures robust convergence to a desired goal, making DMPs ideal for tasks requiring resilience to disturbances or variations in starting and end conditions. Their simplicity and scalability to high-dimensional tasks have solidified their role in robotic control. However, their deterministic formulation, while advantageous for precise goal-oriented applications, limits their capacity to represent and generalize the variability inherent in real-world demonstrations. To address this limitation, extensions to DMPs have integrated statistical learning techniques, enabling them to capture trajectory variability and generalize across the training data. Using statistical learning techniques such as locally weighted regression, DMPs can interpolate or extrapolate to generate trajectories for unseen queries, thus adapting to novel configurations or task conditions. This approach also supports probabilistic reasoning, allowing robots to account for uncertainty in trajectory execution. Such capabilities are particularly beneficial in dynamic environments, including human-robot collaboration, where DMPs enhanced by statistical learning can adaptively and robustly respond to variability in human actions or task requirements, bridging classical control with modern data-driven frameworks [2, 17]. SEDS offers an alternative by emphasizing global stability through Lyapunov functions. Rooted in the mathematical rigor of stability theory, this framework ensures convergence to attractors across a wide range of initial conditions. SEDS is particularly well-suited for tasks where safety and reliability are paramount, such as human-robot interaction or autonomous navigation in cluttered environments. The use of Gaussian Mixture Models (GMMs) enables SEDS to capture the variability of demonstrated motions while maintaining stability.
However, the trade-off between stability and motion reproduction fidelity can constrain its use in tasks requiring precise trajectory matching. Furthermore, the computational complexity associated with ensuring Lyapunov stability can be a limiting factor in high-dimensional settings. ProMPs represent a significant evolution in motion representation, prioritizing the probabilistic nature of trajectories. By modeling motion as a distribution over trajectories, ProMPs capture both the central tendency and the variability of demonstrations. This capability makes them particularly suitable for tasks requiring adaptability, such as human-robot collaboration or manipulation under uncertainty. The use of Bayesian inference allows ProMPs to incorporate task-specific constraints seamlessly, enabling applications in dynamic and multimodal environments. Their flexibility and probabilistic foundation position them as a versatile tool in modern robotics. The broader context of robot control and learning places these representations within the frameworks of reinforcement learning (RL) and deep reinforcement learning (DRL). RL, with its emphasis on optimizing policies through trial-and-error interactions, benefits significantly from representations like DMPs and ProMPs. DMPs provide a structured way to initialize RL policies, leveraging their dynamic attractor properties to explore goal-oriented behaviors effectively. ProMPs, on the other hand, enable RL agents to model uncertainty and adapt policies to stochastic environments. Their probabilistic nature aligns well with the exploration-exploitation trade-offs inherent in RL. For example, the structured representations provided by ProMPs can reduce the sample complexity of learning policies in tasks requiring generalization across varying conditions [28]. In DRL, where the focus shifts to high-dimensional state and action spaces, motion representations play an even more crucial role.
DMPs and SEDS, with their ability to encode complex trajectories and ensure stability, respectively, serve as valuable tools for embedding domain knowledge into neural network architectures. Moreover, incorporating ProMP-based priors into DRL models can accelerate convergence by guiding policy exploration towards regions of the state space with high demonstration density. Similarly, SEDS-inspired stability constraints can enhance the robustness of DRL policies, ensuring safe interactions in physical environments [18, 10]. Furthermore, motion representations can serve as a bridge between model-based and model-free approaches in RL. While model-free methods rely on data-driven exploration, model-based methods benefit from the structured representations offered by DMPs, SEDS, and ProMPs. These representations provide a way to encode prior knowledge about dynamics, reducing the sample complexity of learning. For example, in robot manipulation tasks, ProMPs can represent a family of feasible trajectories, allowing RL algorithms to focus on fine-tuning policies rather than discovering them from scratch. The interplay between motion representations and learning extends to imitation learning and policy transfer. Techniques like guided policy search and behavior cloning rely on theoretically sound motion representations to encode demonstration data. The ability to transfer knowledge across tasks and environments depends on the expressiveness and adaptability of the chosen representation. In conclusion, the trajectory representations discussed in this textbook offer a wide spectrum of tools for addressing the diverse challenges of robot motion. From the precision of splines to the adaptability of ProMPs, each framework contributes unique strengths to the broader landscape of robotics. The integration of these representations with imitation learning, reinforcement learning, and deep learning paradigms underscores their relevance in advancing robot autonomy.
As robotics continues to evolve, the synergy between control, learning, and motion representation will play a pivotal role in shaping the capabilities of future robotic systems.

Bibliography

[1] C. G. Atkeson, A. W. Moore, and S. Schaal. Locally weighted learning. AI Review, 11:11–73, 1997.

[2] S. Chernova and A. Thomaz. Robot learning from human teachers. Synthesis Lectures on Artificial Intelligence and Machine Learning, 8(3):1–121, 2014.

[3] S. Chiaverini and B. Siciliano. The unit quaternion: A useful tool for inverse kinematics of robot manipulators. System Analysis, Modelling and Simulation, 35:45–60, 1999.

[4] A. Coulombe and H.-C. Lin. Generating stable and collision-free policies through Lyapunov function learning. In IEEE International Conference on Robotics and Automation (ICRA), pages 3037–3043, London, UK, 2023.

[5] C. de Boor. A Practical Guide to Splines; Revised Edition. Applied Mathematical Sciences 27. Springer, New York, 2001.

[6] R. L. Eubank. Spline Smoothing and Nonparametric Regression. Marcel Dekker, New York, 1988.

[7] A. Fabisch. movement_primitives: Imitation learning of cartesian motion with movement primitives. Journal of Open Source Software, 9(97):6695, 2024.

[8] A. Gams, A. J. Ijspeert, S. Schaal, and J. Lenarčič. On-line learning and modulation of periodic movements with nonlinear dynamical systems. Autonomous Robots, 27(1):3–23, 2009.

[9] E. Gribovskaya, S. M. Khansari-Zadeh, and A. Billard. Learning non-linear multivariate dynamics of motion in robotic manipulators. The International Journal of Robotics Research, 30(1):80–117, 2011.

[10] A. Gupta, K. Zhang, and H.-C. Lin. Stability-aware reinforcement learning for robotic systems. IEEE Transactions on Robotics, 39(3):567–589, 2023.

[11] C. L. Hu and L. L. Schumaker. Univariate complete smoothing. Numerische Mathematik, 49(1):1–10, 1986.

[12] A. J. Ijspeert, J. Nakanishi, H. Hoffmann, P.
Pastor, and S. Schaal. Dynamical movement primitives: learning attractor models for motor behaviors. Neural Computation, 25(2):328–373, 2013.

[13] A. J. Ijspeert, J. Nakanishi, and S. Schaal. Learning rhythmic movements by demonstration using nonlinear oscillators. In Proc. IEEE/RSJ Int. Conf. Intelligent Robots and Systems, pages 958–963, Lausanne, Switzerland, 2002.

[14] A. J. Ijspeert, J. Nakanishi, and S. Schaal. Movement imitation with nonlinear dynamical systems in humanoid robots. In Proc. IEEE Int. Conf. Robotics and Automation, pages 1398–1403, Washington, DC, 2002.

[15] S. M. Khansari-Zadeh and A. Billard. Learning stable non-linear dynamical systems with Gaussian mixture models. IEEE Transactions on Robotics, 27(5):943–957, 2011.

[16] S. M. Khansari-Zadeh and A. Billard. Learning control Lyapunov function to ensure stability of dynamical system-based robot reaching motions. Robotics and Autonomous Systems, 62(6):752–765, 2014.

[17] J. Kober, J. A. Bagnell, and J. Peters. Reinforcement learning in robotics: A survey. The International Journal of Robotics Research, 32(11):1238–1274, 2013.

[18] P. Kormushev, S. Calinon, and D. G. Caldwell. Learning adaptive motor skills: A survey of techniques for robotics. Advanced Robotics, 34(9):555–573, 2020.

[19] G. J. Maeda, M. Ewerton, M. Goldhoorn, J. Krüger, and J. Peters. Probabilistic movement primitives for robot-directed imitation learning. Autonomous Robots, 41(3):593–612, 2017.

[20] G. J. Maeda, M. Ewerton, D. Koert, and J. Peters. Probabilistic movement primitives for coordination of multiple human-robot collaborative tasks. In ACM/IEEE International Conference on Human-Robot Interaction (HRI), pages 276–284, 2017.

[21] G. J. Maeda, M. Ewerton, and J. Peters. Learning interaction for collaborative tasks with probabilistic movement primitives. In Robotics: Science and Systems (RSS), 2016.

[22] R. M. Murray, Z. Li, and S. S. Sastry. A Mathematical Introduction to Robotic Manipulation.
CRC Press, Boca Raton, New York, 1994.

[23] D. Nguyen-Tuong, M. Seeger, and J. Peters. Model learning with local Gaussian process regression. Advanced Robotics, 23:2015–2034, 2009.

[24] A. Paraschos, C. Daniel, J. Peters, and G. Neumann. Probabilistic movement primitives. In Advances in Neural Information Processing Systems (NeurIPS), pages 2616–2624. Curran Associates, Inc., 2013.

[25] A. Paraschos, C. Daniel, J. Peters, and G. Neumann. Using probabilistic movement primitives in robotics. Autonomous Robots, 42(3):529–551, 2018.

[26] P. Pastor, L. Righetti, M. Kalakrishnan, and S. Schaal. Probabilistic movement primitives. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 365–371, San Francisco, California, 2013.

[27] T. Petrič, A. Gams, A. J. Ijspeert, and L. Žlajpah. On-line frequency adaptation and movement imitation for rhythmic robotic tasks. The International Journal of Robotics Research, 30(14):1775–1788, 2011.

[28] H. Ravichandar, A. S. Polydoros, S. Chernova, and A. Billard. Recent advances in robot learning from demonstration. Annual Review of Control, Robotics, and Autonomous Systems, 3(1):297–330, 2020.

[29] L. Righetti, J. Buchli, and A. J. Ijspeert. Dynamic Hebbian learning in adaptive frequency oscillators. Physica D, 216:269–281, 2006.

[30] L. Rozo, J. Silvério, S. Calinon, and D. G. Caldwell. Learning controllers for reactive and proactive behaviors in human-robot collaboration. Frontiers in Robotics and AI, 3:30, 2016.

[31] S. Schaal, P. Mohajerian, and A. Ijspeert. Dynamics systems vs. optimal control – a unifying view. Progress in Brain Research, 165(6):425–445, 2007.

[32] S. Schaal, J. Peters, J. Nakanishi, and A. Ijspeert. Learning movement primitives. In P. Dario and R. Chatila, editors, Robotics Research: The Eleventh International Symposium, pages 561–572, Berlin, Heidelberg, 2005. Springer.

[33] S. E. Thompson and R. V. Patel.
Formulation of joint trajectories for industrial robots using B-splines. IEEE Trans. Industrial Electronics, 34(2):192–199, May 1987.

[34] A. Ude. Filtering in a unit quaternion space for model-based object tracking. Robotics and Autonomous Systems, 28(2-3):163–172, 1999.

[35] A. Ude, A. Gams, T. Asfour, and J. Morimoto. Task-specific generalization of discrete and periodic dynamic movement primitives. IEEE Transactions on Robotics, 26(5):800–815, 2010.

[36] A. Ude, B. Nemec, T. Petrič, and J. Morimoto. Orientation in Cartesian space dynamic movement primitives. In IEEE International Conference on Robotics and Automation (ICRA), pages 2997–3004, Hong Kong, 2014.

[37] G. Wahba. Spline Models for Observational Data. SIAM, Philadelphia, 1990.

[38] J. Yuan. Closed-loop manipulator control using quaternion feedback. IEEE Journal of Robotics and Automation, 4(4):434–440, 1988.