Informatica 40 (2016) 311–316

Using Mixed Reality and Natural Interaction in Cultural Heritage Applications

Raffaello Brondi, Marcello Carrozzino, Cristian Lorenzini and Franco Tecchia
Laboratorio Percro of Scuola Superiore Sant'Anna
Via Alamanni 13, San Giuliano Terme (PI) - Italy
E-mail: r.brondi, m.carrozzino, c.lorenzini, f.tecchia@sssup.it

Keywords: natural interaction, mixed reality, virtual reality, cultural heritage, museum application

Received: June 27, 2016

In this paper, we present a general architecture for Mixed Reality applications. The proposed solution has been developed in order to provide a useful instrument for building Cultural Heritage applications. During the design of the system, particular attention was given to intangible knowledge, such as manual activities, performing arts and the habits of lost civilizations, which represent a kind of heritage poorly addressed by previous studies. The system aims at providing an easy and engaging infrastructure for developing immersive applications to be used for communication/dissemination and education purposes. The proposed architecture exploits Natural User Interface solutions as the interaction metaphor between the Virtual Environment and the user. Natural interaction in fact provides the user with a high sense of presence and immersion, improving engagement and fostering the learning process. The paper also presents two case studies, in which two different applications aimed at teaching and disseminating craft knowledge, in particular printmaking and weaving, have been developed on top of the presented architecture.

Povzetek: Mixed reality for the purposes of electronic cultural heritage is presented.

1 Introduction

Virtual Reality (VR) applications create interactive environments in which the observer feels totally immersed: users can move and interact in a completely synthetic world [1]. Differently, in Augmented Reality (AR) applications the digital content is integrated into the real environment [2]. VR and AR technologies are becoming extremely popular and are nowadays used to implement many kinds of applications in several different fields: military, medicine, education, visualization, entertainment, etc. These two technologies represent two different expressions of a common family of technologies and applications falling under the definition of Mixed Reality (MR). Milgram and Kishino [3] theorized the concept of a "virtuality continuum" in order to create a classification able to describe it. They placed the real world at one extreme of the continuum and a completely virtual world at the other. All the technologies and applications lying between the two extremes represent different flavors of Mixed Reality (MR), including AR (real environments augmented with virtual contents) and Augmented Virtuality (virtual environments augmented with real contents). Several studies have focused on Mixed (rather than purely Virtual or Augmented) Environments, on how their adoption can affect several aspects of the user experience, and on how they are in turn affected by newly arising technologies. VR and MR applications in the context of Cultural Heritage (CH) are nowadays gaining increasing acceptance for a variety of purposes, including digital conservation (reconstructing damaged or destroyed artworks) [4][5], the validation of scientific hypotheses in archaeological reconstructions [6] and education [7].
At the same time, the recent spread of depth sensors, together with sensorised controllers, is nowadays shaping the way we interact with Virtual Environments (VEs). Natural User Interfaces¹ (NUIs) are becoming more and more popular, and new, richer interaction metaphors can be designed in order to improve the engagement and sense of presence of the users, providing a completely new experience [8]. NUIs enabling visitors to be physically and emotionally involved during a virtual experience are becoming popular also in the Cultural Heritage context [9].

¹ Natural User Interface is a term used to identify human-computer interactions based on typical inter-human communication. These interfaces allow computers to understand the innate human means of interaction (e.g. voice and gestures) and do not require humans to "learn" the language of computers (e.g. keyboard and mouse).

In this paper we present a general architecture for Mixed Reality systems that can be used to provide an immersive experience to Cultural Heritage visitors. The proposed solution can be used both for dissemination purposes, as it enhances the engagement of the user, and for training/teaching activities. In particular, such a system proves extremely effective when trying to transmit intangible knowledge, such as craftsmanship, since it allows visitors to physically emulate the proposed actions. Moreover, the system can be used both with static pre-recorded material and with real-time captures of the real context.

2 State of the art

As mentioned above, differently from VR, MR mixes "synthetic" and "real" information, making them coexist in the same environment. Most of the applications developed in the context of CH are based on Augmented Reality. Among the first AR applications developed in the CH field is ARCHEOGUIDE [10]. Using a Head Mounted Display (HMD), visitors of archaeological sites can see virtual reconstructions of temples and other monuments directly superimposed on the real ruins. Nowadays, thanks to ubiquitous networking availability and the technological progress in mobile computing, the ARCHEOGUIDE concept has been further developed. New research is focusing on mobile devices as a gateway to provide augmented cultural content everywhere [11]. Other AR applications developed in the same field aim at providing new ways of interaction between visitors and artworks inside museums. This "augmentation" of the real-world environment can lead to intuitive access to museum information and enhances the impact of the museum exhibition on virtual visitors [12]. Wojciechowski et al. [13] developed an AR system composed of an authoring tool and an AR browser. Using the former instrument, museum superintendents can design Virtual and Augmented Reality exhibitions. Through the AR browser, installed for example in a kiosk, visitors can see representations of cultural objects overlaid on the video captured by a camera. Similarly, Chen et al. [14] proposed a new AR guidance system for museums based on markers. ARCube [15] exploits a 3D marker to enable 360° interaction with fully reconstructed three-dimensional archaeological artefacts in real-world contexts. Debenham et al. [16] developed an AR system used inside the Natural History Museum in London which provides visitors with augmented contents through hand-held displays, enabling an exciting new way to present the evolutionary history of our planet.
In [17] Augmented Reality has instead been used to improve the work of restorers and to promote communication and cooperation between them. The great success of AR in the Cultural Heritage context is mainly related to the fact that it provides an easy, engaging and friendly way to access information related to a particular asset, commonly by keeping the cultural asset in the foreground and enriching its images with digital content. When dealing with intangible assets, such as performing arts, manual activities or the habits of lost civilizations, there is no concrete object to augment. This kind of evanescent knowledge requires a deeper usage of virtual components [18], because the real part is not physically present or is not always available. All MR solutions usually need to merge 3D (live or recorded) information coming from the real environment with a 3D synthetic environment as smoothly as possible. By using immersive displays like HMDs, the user experience can be further enhanced. Tecchia et al. [19] proposed an HMD visualization system including the real-time stream of 3D images of the user's hands recorded with a depth camera. The system makes use of two colored markers placed on top of the user's hands to enable a basic interaction with the VE. The depth sensor mounted on top of the HMD was in charge of recording the peri-personal space, and in particular the user's hands. The acquired stream was used to recreate a representation of the user inside the VE. Moreover, using the combination of RGB and depth information coming from the camera, the system recognizes finger movements, allowing the user to interact with the environment (e.g. virtual object pinching). The interaction metaphors enabled inside MR applications, from completely Virtual to Augmented ones, represent another extremely important design aspect. Specific solutions impact differently on the sense of Presence, Immersion and Engagement of the user. In the context of CH applications, improving these factors can enhance the impact of a dissemination application. This becomes even more important when the aim of the application is learning or training. Safeguarding and passing on skills and intangible cultural heritage features is the subject of several experiments, as well as of large research projects [20][21]. Carrozzino et al. [9] argued that Immersive VEs combined with natural interaction would provide a powerful solution for developing a system to transfer practical skills. In recent years several researchers and technological industry leaders have focused on the development of different solutions enabling a smooth and simple natural interaction of the user inside the VE. Most of the efforts so far have focused on hand tracking solutions [22][23][24]. The Leap Motion Controller represents one of the latest technological products created to enable natural user interaction inside the VE based on hand tracking/gestures. It is gaining a lot of popularity due to its ease of use and the tracking performance achieved with the latest updates. This device can easily be integrated in any VR/MR application in order to allow users to see and interact with the VE with their own hands. Coupling the Leap Motion controller with an HMD (e.g. the Oculus Rift or HTC Vive) provides developers with an extremely powerful and relatively cheap VR/MR solution that can be used in many Cultural Heritage contexts to provide extremely engaging experiences to the users.
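To give a concrete idea of this kind of processing, the following minimal Python sketch back-projects a depth frame into a camera-space point cloud using the pinhole camera model, which is the standard first step for recreating a depth-camera capture of the user inside a VE. The intrinsic parameters (fx, fy, cx, cy) and the near/far bounds used to isolate the peri-personal space are hypothetical placeholders, not values taken from the systems cited above.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy, near=0.2, far=1.2):
    """Back-project a depth image (meters) into camera-space 3D points.

    Points outside [near, far] are discarded, which crudely restricts
    the reconstruction to the peri-personal space in front of the HMD.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = (depth > near) & (depth < far)
    z = depth[valid]
    x = (u[valid] - cx) * z / fx   # pinhole model: X = (u - cx) * Z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=-1)  # (N, 3) array of points

# Usage with a synthetic frame; a real system would instead read frames
# from a depth camera and render the resulting points every frame.
depth_frame = np.full((480, 640), 0.6)          # flat surface 0.6 m away
points = depth_to_point_cloud(depth_frame, fx=525.0, fy=525.0,
                              cx=319.5, cy=239.5)
print(points.shape)
```

A production system would additionally color the points with the registered RGB image and, as done in the case studies below, connect neighboring points into a triangle mesh rather than rendering a raw point cloud.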
Given these premises, the presented architecture aims at providing an easy way to develop MR applications, exploiting the capabilities of HMDs and of immersive displays in general, coupled with devices, like the Leap Motion, able to track the user's hands, in order to create engaging interactive applications in the context of CH.

3 The architecture

The proposed system has been designed and realized on top of the XVR technology [25], an in-house VR-oriented framework offering a graphics engine for the real-time visualization of complex three-dimensional models and support for a wide range of VR devices (such as trackers, motion capture devices, stereo projection systems and HMDs). XVR applications are developed using a dedicated scripting language whose constructs and commands are targeted to VR, including support for 3D animation, 3D sound effects, audio and video streaming, and advanced user interaction. This choice has allowed good flexibility in terms of supported hardware devices and ease of developing dedicated software add-ons able to expand the capabilities of the framework.

Figure 1 gives an overview of the developed system, which can be divided into three main parts.

Figure 1: Architecture overview.

At the lowest level sits an infrastructure based on XVR, responsible for the direct interaction with the visualization system. This component is in charge of managing not only the stereoscopic rendering on the immersive displays but also the VE update according to the tracking technologies of the visualization system used. This element allows applications developed on the proposed system to run on the latest HMDs, the Oculus Rift and HTC Vive, and also on projection-based systems like CAVE displays.

On top of this low-level component, the system comprises a Mixed Reality module which is in charge of merging the real content coming from various streams (audio, images or video, either 2D or 3D) with the virtual contents. The resources coming from the real world can be either pre-recorded or real-time captures of the world. The system takes care of handling the different streams, registering them inside the VE and rendering them to the user. Dedicated tools have been developed in order to handle 3D video (RGBD streams) acquired with depth cameras like the Microsoft Kinect, since existing tools lack the features needed to properly post-process this kind of data. In particular, the suite of developed tools allows trimming the stream (in order to select specific portions of the stored data) and cleaning the video data in order to make it easier to seamlessly mix it with the virtual environment (see Figure 2). The use of these tools allows separating, with good precision, the desired content of the stream from the unwanted background/noise.

Figure 2: Cleaning 3D video streams acquired with depth cameras using the developed tools.

The virtual content consists of 3D models and Virtual Storyboards (VS). The system interprets the VS, a sequence of instructions defining the application storyboard, in real time. The VS, defined in text files in order to be easily authored with any text tool, provides the possibility of defining custom key-points that can alter the flow of the application. The VS also allows specifying how the real-world resources and other elements (e.g. camera animations, interactive elements, movements and dialogues) are displayed in the VE and how the user can interact with them. All the resources, and the relationships between the resources, the actions of the users, the timing and the environment, are defined in these configuration files. This allows the same functionalities to be easily replicated in different contexts, loading custom resources and developing different applications.
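To make the Virtual Storyboard concept concrete, the following is a minimal sketch of what such a text-based storyboard and its loader might look like in Python. The format shown here (key-points holding resources, dialogue lines and gesture triggers) is purely illustrative: the actual syntax of the system's VS files is not documented in this paper, so every keyword below is a hypothetical placeholder.

```python
# Minimal, hypothetical loader for a Virtual Storyboard (VS) text file.
# The format below is an illustrative guess, not the system's real syntax.
from dataclasses import dataclass, field

VS_TEXT = """
keypoint intro
  play depth_movie artisan_inking.rgbd at bench_01
  say agent "The artisan spreads the ink on the matrix."
  on gesture:tap(next_button) goto pressing

keypoint pressing
  play depth_movie artisan_press.rgbd at press_01
  on gesture:pinch(handle) goto intro
"""

@dataclass
class KeyPoint:
    name: str
    commands: list = field(default_factory=list)   # (verb, args) tuples

def parse_vs(text):
    """Parse the storyboard into key-points the runtime can interpret."""
    keypoints, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("keypoint "):
            current = KeyPoint(line.split()[1])
            keypoints[current.name] = current
        elif current is not None:
            verb, _, args = line.partition(" ")
            current.commands.append((verb, args.strip()))
    return keypoints

story = parse_vs(VS_TEXT)
print(list(story))                  # ['intro', 'pressing']
print(story["intro"].commands[0])   # ('play', 'depth_movie ...')
```

At run time, an interpreter would execute each key-point's commands in order and jump between key-points when the associated triggers (gestures, timers, GUI events) fire, which is how key-points can alter the flow of the application.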
3.1 Interaction in the VE

The developed architecture contains a hand interaction module dedicated to the management of the user's interaction with the VE. The module is in charge of tracking the user's hands and detecting gestures. Using the tracking information, a virtual representation of the user's hands is provided in the VE. The system animates the virtual hands by interpreting the information coming from the sensors used to capture the user. For each hand, it first uses the position of the palm in order to evaluate the position of the user's virtual representation. Then, for each finger, it evaluates the angles to be applied to each phalange. The detected gestures are used to enable actions to be undertaken in the Virtual Environment. Currently, pointing, tapping and pinching gestures have been implemented in the system. These actions can be used to develop the user's interaction with the environment (e.g. selection/pressing of GUI elements like buttons, object selection and/or transformation). The module offers an abstraction layer to the infrastructure above, allowing the use of different input systems. The architecture has been tested with the Leap Motion Controller and with the CyberGlove II. Other hand tracking systems will be included in the future.
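The following Python sketch illustrates, under simplifying assumptions, the two steps described above: placing the virtual hand at the tracked palm position and deriving a bend angle per phalange from consecutive joint positions, plus a naive pinch detector based on the thumb-index fingertip distance. The data structures and the 3 cm threshold are hypothetical and do not reproduce the Leap Motion or CyberGlove APIs.

```python
import numpy as np

def phalange_angle(a, b, c):
    """Bend angle (degrees) at joint b, given three consecutive joints."""
    u, v = a - b, c - b
    cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

def update_virtual_hand(palm_pos, fingers):
    """fingers: {name: list of joint positions, proximal to tip}."""
    pose = {"palm": palm_pos, "angles": {}}
    for name, joints in fingers.items():
        # one bend angle per inner joint of the finger (each articulation)
        pose["angles"][name] = [phalange_angle(joints[i - 1], joints[i],
                                               joints[i + 1])
                                for i in range(1, len(joints) - 1)]
    return pose

def is_pinching(fingers, threshold=0.03):
    """True when thumb and index fingertips are closer than ~3 cm."""
    return np.linalg.norm(fingers["thumb"][-1] -
                          fingers["index"][-1]) < threshold

# Toy frame: a straight index finger and a thumb tip near the index tip.
fingers = {
    "index": [np.array([0.00, 0.0, 0.40]), np.array([0.04, 0.0, 0.40]),
              np.array([0.07, 0.0, 0.40]), np.array([0.09, 0.0, 0.40])],
    "thumb": [np.array([0.00, -0.03, 0.40]), np.array([0.08, -0.01, 0.40])],
}
pose = update_virtual_hand(np.array([0.0, 0.0, 0.42]), fingers)
print(pose["angles"]["index"], is_pinching(fingers))
```

An abstraction layer like the module described above would only require each input device (Leap Motion, data glove, or future trackers) to fill the same palm/finger structure, leaving the animation and gesture logic unchanged.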
3.2 Case Studies

Using our architecture, two different case study applications have been developed, addressing two different kinds of intangible Cultural Heritage dealing with craftsmanship.

The subject of the first application is the work of printmakers. The VE replicates a structured print house featuring different locations where artisans show their work, retracing all the steps involved in the process of making a print. An approach similar to "I'm in VR" [19] has been used to stream pre-recorded depth-movies inside the 3D VE [21], in order to reproduce the real artisans' movements. Depth movies make it possible to observe human motion with a high level of detail; more complex systems using computer graphics animations would require very expensive motion tracking systems that can also hinder the artisan's work (see Figure 3, right).

Figure 3: The pedagogical agent inside the VE (left) and the artisan's pre-recorded depth-movies (right).

Artisans are visualized as pre-recorded depth-movies tessellated in real time, rendered as polygonal meshes and merged inside the VE. Users can also explore the environment using natural movements or by teleporting themselves to different places; they can also interact with the environment by selecting 3D objects and GUI buttons using the provided NUI. A virtual character, a pedagogical agent, guides the user through the environment in order to explain the actions of the artisans and to provide information about their work (see Figure 3, left). Visitors, beyond observing the artisans from a classic "third-person view" (see Figure 3, right), can observe the manual activities from a "first-person point of view" (see Figure 4), as if they were watching their own hands.

Figure 4: Watching one's own hands and the artisan's hands at the same time.

Furthermore, when in first-person view, users can try to emulate the movements of the artisan, as they see both their own hands, captured by the hand tracking device, and the artisan's ones.

The second application developed with the proposed architecture is related to the work of weavers. Weaving is a repetitive manual job performed on a loom. The developed application recreates an artisan workshop where different weavers are working. Users can freely explore the space around the artisan and, as in the previously described application, see their own hands and overlap them with the "ghost" hands of the artisan in order to learn how to perform some of the actions needed during the work of weavers. The artisans' hands can be visualized both as pre-recorded depth-movies and as computer-graphics animated "avatars". The 3D avatar animations have been recorded using the CyberGlove II. Using the data gloves to capture the artisans' hand movements has been possible in this case because weavers commonly wear gloves in their work, and therefore this did not hinder the artisans' activities. This also allowed comparing depth videos against avatar animations (see Figure 5) in terms of information delivery and perceived quality.

Figure 5: Hand motion recorded with CyberGloves vs. depth-stream recordings.

4 Acknowledgments

The design and implementation of the proposed architecture have been carried out in the context of the AMICA project, funded by Fondazione TIM under the "Beni Invisibili" financing program. The study of the related work, the setup of the test methodology and the design of future expansions of the described methodology have been carried out in the context of the EU H2020-TWINN-2015 eHERITAGE project (grant number 692103).

5 References

[1] Riva, Giuseppe. "Virtual reality as communication tool: A sociocognitive analysis." Presence: Teleoperators and Virtual Environments 8.4 (1999): 462-468.
[2] Azuma, Ronald T. "A survey of augmented reality." Presence: Teleoperators and Virtual Environments 6.4 (1997): 355-385.
[3] Milgram, Paul, and Fumio Kishino. "A taxonomy of mixed reality visual displays." IEICE Transactions on Information and Systems 77.12 (1994): 1321-1329.
[4] Carrozzino, Marcello, et al. "The virtual museum of sculpture." Proceedings of the 3rd International Conference on Digital Interactive Media in Entertainment and Arts. ACM, 2008.
[5] Brondi, Raffaello, and Marcello Carrozzino. "ARTworks: An Augmented Reality Interface as an Aid for Restoration Professionals." International Conference on Augmented and Virtual Reality. Springer International Publishing, 2015.
[6] Barceló, J. A., Forte, M., & Sanders, D. H. (Eds.). (2000). Virtual reality in archaeology. Oxford, UK: ArchaeoPress.
[7] Economou, Maria, and L. Pujol. "Educational tool or expensive toy? Evaluating VR evaluation and its relevance for virtual heritage." In New Heritage: New Media and Cultural Heritage. Oxon: Routledge (2006).
[8] Brondi, R., Alem, L., Avveduto, G., Faita, C., Carrozzino, M., Tecchia, F., & Bergamasco, M. (2015, September). Evaluating the impact of highly immersive technologies and natural interaction on player engagement and flow experience in games. In International Conference on Entertainment Computing (pp. 169-181). Springer International Publishing.
[9] Carrozzino, M., Lorenzini, C., Duguleana, M., Evangelista, C., Brondi, R., Tecchia, F., & Bergamasco, M. (2016, June). An Immersive VR Experience to Learn the Craft of Printmaking. In International Conference on Augmented Reality, Virtual Reality and Computer Graphics (pp. 378-389). Springer International Publishing.
[10] Vlahakis, V., Karigiannis, J., Tsotros, M., Gounaris, M., Almeida, L., Stricker, D., ... & Ioannidis, N. (2001, November). Archeoguide: first results of an augmented reality, mobile computing system in cultural heritage sites. In Virtual Reality, Archeology, and Cultural Heritage (pp. 131-140).
[11] Brondi, R., Carrozzino, M., Tecchia, F., & Bergamasco, M. (2012, May). Mobile augmented reality for cultural dissemination. In Proceedings of the 1st International Conference on Information Technologies for Performing Arts, Media Access and Entertainment, Firenze, Italy (pp. 113-117).
[12] Sylaiou, Styliani, Liarokapis, Fotis, Kotsakis, Kostas, and Patias, Petros. Virtual museums, a survey and some issues for consideration. Journal of Cultural Heritage, 10(4):520-528, 2009, Elsevier.
[13] Wojciechowski, Rafal, Walczak, Krzysztof, White, Martin, and Cellary, Wojciech. Building virtual and augmented reality museum exhibitions. In Proceedings of the Ninth International Conference on 3D Web Technology (pp. 135-144). ACM, 2004.
[14] Chen, Chia-Yen, Chang, Bao Rong, and Huang, Po-Sen. Multimedia augmented reality information system for museum guidance. Personal and Ubiquitous Computing, 18(2):315-322, 2014, Springer.
[15] Jiménez Fernández-Palacios, B., Nex, F., Rizzi, A., and Remondino, F. ARCube – the augmented reality cube for archaeology. Archaeometry, 2014, Wiley Online Library.
[16] Debenham, Paul, Thomas, Graham, and Trout, Jonathan. Evolutionary augmented reality at the Natural History Museum. In Mixed and Augmented Reality (ISMAR), 2011 10th IEEE International Symposium on (pp. 249-250). IEEE, 2011.
[17] Brondi, Raffaello, and Marcello Carrozzino. "Fostering collaboration among restoration professionals using augmented reality." 2014 IEEE 23rd International WETICE Conference. IEEE, 2014.
[18] Papagiannakis, G., Ponder, M., Molet, T., Kshirsagar, S., Cordier, F., Magnenat-Thalmann, N., & Thalmann, D. (2002). LIFEPLUS: revival of life in ancient Pompeii, virtual systems and multimedia. In Proceedings of VSMM 2002 (No. VRLAB-CONF-2007-038).
[19] Tecchia, Franco, et al. "I'm in VR!: using your own hands in a fully immersive MR system." Proceedings of the 20th ACM Symposium on Virtual Reality Software and Technology. ACM, 2014.
[20] SKILLS project, CORDIS EU, available at: http://cordis.europa.eu/project/rcn/103956_en.html
[21] Carrozzino, M., et al. (2015). AMICA - Virtual Reality as a Tool for Learning and Communicating the Craftsmanship of Engraving. In Proceedings of the 2015 Digital Heritage International Congress (pp. 187-188).
[22] Piumsomboon, T., Altimira, D., Kim, H., Clark, A., Lee, G., & Billinghurst, M. (2014, September). Grasp-Shell vs gesture-speech: A comparison of direct and indirect natural interaction techniques in augmented reality. In Mixed and Augmented Reality (ISMAR), 2014 IEEE International Symposium on (pp. 73-82). IEEE.
[23] Rautaray, S. S., & Agrawal, A. (2015). Vision based hand gesture recognition for human computer interaction: a survey. Artificial Intelligence Review, 43(1), 1-54.
[24] Ding, W. and Marchionini, G. (1997). A Study on Video Browsing Strategies. Technical Report. University of Maryland at College Park.
[25] Tecchia, Franco, et al. "A Flexible Framework for Wide-Spectrum VR Development." Presence 19.4 (2010): 302-312.