Proceedings of Audio Mostly 2015: A Conference on Interaction with Sound

Fullname:AudioMostly 2015: Conference on Interaction with Sound
Editors:George Kalliris; Charalampos Dimoulas
Location:Thessaloniki, Greece
Dates:2015-Oct-04 to 2015-Oct-06
Publisher:ACM
Standard No:ISBN: 978-1-4503-3896-7; ACM DL: Table of Contents; hcibib: AM15
Papers:39
Rhythmic Persuasion Model: Shifting from Phatic to Persuasion
  Hanif Baharin; Nadiah Zin
We argue that, as an aspect of phatic communication, rhythm entrainment may be used to enhance persuasive technology through inducing mimicry of desired behaviours. This paper delineates theoretical justifications for our argument. Based on the ambient persuasion model and the rhythm entrainment framework, and following the results of our previous experiment, we propose a rhythmic persuasion model and hypothesise that rhythmic auditory icons are more likely to induce mimicry. In the future, we will conduct experiments to test this hypothesis.
Multilayer Formats and the Semantic Web: a Music Case Study
  Adriano Baratè; Goffredo Haus; Luca A. Ludovico
The advent of the so-called Semantic Web led to the transformation of the World Wide Web into an environment where documents are associated with data and metadata. The latter kind of information specifies the semantic context of data in a format suitable to be queried and interpreted in an automatic way. Extensible Markup Language (XML) is extensively used in the Semantic Web, since this format supports not only human- but also machine-readable tags. On the one hand, the Semantic Web aims to create a set of automatically-detectable relationships among data, thus providing users with a number of non-trivial paths to navigate information in a geographically distributed framework. On the other hand, multilayer formats typically operate in a similar way, but at a "local" level. In this case, information is contained, hierarchically structured and interconnected within a single document. XML is extensively adopted in this context as well. The goal of the present work is to discuss the possibilities emerging from a combined approach, namely by adopting multilayer formats in the Semantic Web, addressing in particular augmented-reality applications. From this point of view, an XML-based international standard known as IEEE 1599 will be employed to show a number of innovative applications in music.
Moodplay: an interactive mood-based musical experience
  Mathieu Barthet; György Fazekas; Alo Allik; Mark Sandler
Moodplay is a system that allows users to collectively control music and lighting effects to express desired emotions. The interaction is based on the Mood Conductor participatory performance system that uses web, data visualisation and affective computing technologies. We explore how artificial intelligence, semantic web and audio synthesis can be combined to provide new personalised and immersive musical experiences. Participants can choose degrees of energy and pleasantness to shape the music played using a web interface. Semantic Web technologies have been embedded in the system to query mood coordinates from a triple store using a SPARQL endpoint and to connect to external linked data sources for metadata.
TouchNoise: A New Multitouch Interface for Creative Work with Noise
  Axel Berndt; Nadia Al-Kassab; Raimund Dachselt
TouchNoise is a multitouch noise modulation interface designed for musical live performance. It allows the direct and indirect manipulation of sound particles in the stereophonic frequency spectrum. In order to increase TouchNoise's playability we conducted a comprehensive interface revision retaining only its core interaction concept. New interaction techniques and gestures for radial menus, effect range settings, and frequency band effects are introduced. The revision paved the way for a series of new functionalities, such as flocking, flow fields, and MIDI connectivity, making TouchNoise a fully-fledged, powerful interface for creative work with noise. This paper introduces the new TouchNoise interface and functionalities through a discussion of the revision process and derives interaction principles and design recommendations for musical multitouch interfaces in general.
Machine Learning Algorithms for Environmental Sound Recognition: Towards Soundscape Semantics
  Vasileios Bountourakis; Lazaros Vrysis; George Papanikolaou
This paper investigates methods aiming at the automatic recognition and classification of discrete environmental sounds, for the purpose of subsequently applying these methods to the recognition of soundscapes. Research in audio recognition has traditionally focused on the domains of speech and music. Comparatively little research has been done towards recognizing non-speech environmental sounds. For this reason, in this paper, we apply existing techniques that have been proved efficient in the other two domains. These techniques are comprehensively compared to determine the most appropriate one for addressing the problem of environmental sound recognition.
Tonic: Combining Ranking and Clustering Dynamics for Music Discovery
  Dimitrios Bountouridis; Jan Van Balen; Marcelo Rodríguez-López; Anna Aljanaki; Frans Wiering; Remco C. Veltkamp
This paper describes the design of Tonic, a novel web interface for music discovery and playlist creation. Tonic maps songs into a two dimensional space using a combination of free tags, metadata, and audio-derived features. Search results are presented in this two dimensional space using a combination of clustering and ranking visualization strategies. Tonic was ranked first in the 2014 MIREX User Experience Grand Challenge, where it was evaluated in terms of learnability, robustness and overall user satisfaction, amongst others.
Pyc2Sound: a Python tool to convert images into sound
  Vincent Bragard; Thomas Pellegrini; Julien Pinquier
This article reports ongoing work on a user interface dedicated to generate sound from pictures and hand drawings. If we imagine what sound would correspond to a given image, on what parameters of the image do we focus and what would the result sound like? In this paper, we try to answer this question by giving a model transforming images into sound based on chosen parameters extracted from the image. For this, an input image is first binarized, then its skeleton is extracted and 'tracks' are identified and used to generate chirps in an additive synthesis approach.
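As a rough illustration of the last step named in this abstract (tracks used to generate chirps, summed in an additive synthesis approach), a minimal sketch follows. It is not the authors' implementation: the track representation (a time-sorted list of (time in seconds, frequency in Hz) breakpoints), the function names and the sample rate are assumptions made for this example, and the binarization and skeleton-extraction steps are left out.

```python
import math

SAMPLE_RATE = 8000  # Hz; assumed value chosen for illustration only


def track_frequency(track, t):
    """Linearly interpolate the frequency of a track at time t (seconds).

    `track` is a hypothetical representation: a time-sorted list of
    (time_s, freq_hz) breakpoints. Returns None outside the track's span.
    """
    if t < track[0][0] or t > track[-1][0]:
        return None
    for (t0, f0), (t1, f1) in zip(track, track[1:]):
        if t0 <= t <= t1:
            if t1 == t0:
                return f1
            return f0 + (f1 - f0) * (t - t0) / (t1 - t0)
    return track[-1][1]


def synthesize(tracks, duration_s, sample_rate=SAMPLE_RATE):
    """Sum one chirp per track and return samples scaled into [-1, 1]."""
    n = int(duration_s * sample_rate)
    out = [0.0] * n
    for track in tracks:
        phase = 0.0
        for i in range(n):
            freq = track_frequency(track, i / sample_rate)
            if freq is None:
                continue  # this track is silent at this instant
            # Accumulate phase so the frequency can change smoothly.
            phase += 2.0 * math.pi * freq / sample_rate
            out[i] += math.sin(phase)
    peak = max((abs(s) for s in out), default=0.0)
    return [s / peak for s in out] if peak > 0 else out


# Two made-up tracks: a rising chirp and a falling one.
tracks = [
    [(0.0, 200.0), (0.5, 800.0)],
    [(0.1, 1000.0), (0.4, 400.0)],
]
samples = synthesize(tracks, duration_s=0.5)
```

Accumulating phase sample by sample, rather than evaluating sin(2*pi*f*t) directly, keeps the waveform continuous while the frequency sweeps along a track.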
Digital synthesis of impact sounds
  Vasileios Chatziioannou
Nonlinear interactions may impose several restrictions to numerical simulation attempts, in particular concerning the design of unconditionally stable algorithms. Such algorithms are required for real-time synthesis applications where virtual instruments may be modified on-line. One characteristic case of nonlinear interactions is that of impact sounds. This paper gives a brief summary of recently developed numerical techniques that are suitable for the simulation of systems involving collisions. A simple numerical example, involving a lumped collision model, is used to demonstrate the methodology, and further perspectives concerning the applicability of the presented family of schemes to real-time synthesis applications are discussed.
Emotional cues, emotional signals, and their contrasting effects on listener valence
  Justin Christensen
Smith and Harper [17] documented a considerable amount of animal sounds, and they found that the signals that had intentions to communicate (e.g. courtship display or defensive threatening posturing) were most often multimodal signals. This is because these signals are meant to be reliable and of benefit to both the sender and the receiver of the signal; otherwise they would cease to have the intended effect of communication. In contrast with signals, animal cues are much more commonly unimodal as they are unintentional by the sender.
   In my research, I investigate whether subjects exhibit different magnitudes of emotional valence responses between multimodal haptic and audio sources to unimodal audio sources, and whether there are divergences between the different types of emotions presented. Some emotions are more strongly intentional signals (e.g. happiness, fear or grief) while others are more emotional cues (e.g. sadness or calmness). My hypothesis is that musical and sound stimuli that are mimetic of emotional signals should combine to elicit a stronger response when presented as a multimodal stimulus as opposed to as a unimodal stimulus, whereas musical or sound stimuli that are mimetic of emotional cues interact in less clear and less cohesive manners with their corresponding haptic signals.
   For my investigations, subjects listen to samples from the International Affective Digital Sounds Library[2] and selected musical works on speakers in combination with a tactile transducer attached to their chair. The listening sessions are recorded on EEG supported by SCR, respiratory rate, heart rate and subject feedback responses.
Product Sound Design: Form, Function, and Experience
  Cumhur Erkut; Stefania Serafin; Michael Hoby; Jonniy Sårde
Current interactive products, services, and environments are appraised by their sensory attributes, in addition to their form and function. Sound is an important factor in these multisensory product appraisals. Integrating this sound opportunity into the design and development of interactive products, which are fit for real-world, yet constitute a strong brand identity, remains a challenge. We address this challenge by applying the research know-how of an academic institution and business practices of a sound agency SME within the core R&D and production process of the third industrial partner. Our approach has clear application scenarios in, e.g., extended wireless headsets, car audio appliances, and portable entertainment devices. We describe the prototypes developed during the project life span, and the activities and outcomes of a half-day workshop designed to disseminate the project results.
Combined Auditory Warnings For Driving-Related Information
  Johan Fagerlönn; Stefan Lindberg; Anna Sirkka
Designing appropriate auditory warnings is a well-known challenge. The present work focuses on a new type of auditory warning for within-vehicle use, combining a signal that conveys urgency information with a signal that conveys more detailed information about the urgent event. In the study, three concepts of "combined warnings" are compared. The concepts differ in terms of the sound type used to convey event information. The results support the usefulness and potential of combined warnings. However, using information sounds that are too abstract can have a severe degrading effect on warning efficiency and cognitive effort. Interestingly, these abstract sounds may also negatively impact the user's ability to respond accurately to the urgency level of the warning.
Automatic Recognition of Eventfulness and Pleasantness of Soundscape
  Jianyu Fan; Miles Thorogood; Bernhard E. Riecke; Philippe Pasquier
A soundscape is the sound environment perceived by a given listener at a given time and space. An automatic soundscape affect recognition system will be beneficial for composers, sound designers, and audio researchers. Previous work on an automatic soundscape affect recognition system has demonstrated the effectiveness of predicting valence and arousal on responses from one expert user. Thus, further validations of multi-users' data are necessary for testing the generalizability of the system. We generated a gold standard by averaging participants' responses, provided that they agreed with each other sufficiently. Here, we model a set of common audio features extracted from a corpus of 120 soundscape recording samples that were labeled for valence and arousal in an online study with human subjects. The contribution of this manuscript is threefold: (1) study the inter-rater agreement, showing the high level of agreement between participants' responses regarding valence and arousal, (2) train stepwise linear regression models with the average responses of participants for soundscape affect recognition, which obtains better results than the previous study, (3) test the correlation between the level of pleasantness and the level of eventfulness based upon the gold standard.
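The gold-standard step mentioned in this abstract (averaging responses only where raters agree) could be sketched roughly as follows. The abstract does not state the agreement criterion, so the population standard deviation threshold used here is an assumed stand-in; the clip names, ratings, threshold and function name are all hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical ratings: clip id -> valence scores from several raters.
ratings = {
    "clip_a": [0.6, 0.7, 0.65, 0.7],
    "clip_b": [-0.8, 0.9, 0.1, -0.5],   # raters disagree strongly
    "clip_c": [-0.3, -0.2, -0.25, -0.35],
}

MAX_SPREAD = 0.2  # assumed agreement threshold, not taken from the paper


def gold_standard(ratings, max_spread=MAX_SPREAD):
    """Average each clip's ratings, keeping only clips whose raters agree.

    Agreement is approximated here by the population standard deviation
    of the ratings; the original study may use a different criterion.
    """
    gold = {}
    for clip, scores in ratings.items():
        if len(scores) >= 2 and pstdev(scores) <= max_spread:
            gold[clip] = mean(scores)
    return gold


labels = gold_standard(ratings)
```

With this made-up data, `clip_b` is dropped because its raters disagree, while the other two clips receive their mean rating as a label.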
The Role of Agency in Ludoacoustic Immersion: Experiencing Recorded Sound and Music in Situational Context
  Hans-Peter Gasselseder
In sound-fx and music, expressive artifacts of the recording contribute to forming expectations on how an object (sound source) is interacting with an environment (room) while also accounting for the associated resonances occurring in that object. By switching attentional focus between action and resonance in object and environment as a result of comparing expectations to incoming stimuli, the context of a virtual situation is simulated by referencing to syntax of body-object-environment interaction. This virtual syntax may be partially projected onto the situational context of the user, leading to antecedents of immersion that depend on emotional arousal and personality traits of the listener. After having outlined a conceptual framework describing the mediation and agency detection of sonic expression within the acoustic properties of situational contexts, the paper provides an outlook on how these agents may be translated to meaningful structures that are yet to be studied in video games.
Haptic and Visual feedback in 3D Audio Mixing Interfaces
  Steven Gelineck; Dan Overholt
This paper describes the implementation and informal evaluation of a user interface that explores haptic feedback for 3D audio mixing. The implementation compares different approaches using either the LEAP Motion for mid-air hand gesture control, or the Novint Falcon for active haptic feedback in order to augment the perception of the 3D space. We compare different interaction paradigms implemented using these interfaces, aiming to increase speed and accuracy and reduce the need for constant visual feedback. While the LEAP Motion relies upon visual perception and proprioception, users can forego visual feedback with interfaces such as the Novint Falcon and rely primarily on haptic cues, allowing more focus on the spatial sound elements. Results of the evaluation support this claim, as users preferred the interaction paradigm using the Falcon with no visual feedback. Furthermore, users disliked active haptic feedback for augmented perception of 3D space or for snapping to objects.
Echoes of reverb: from cave acoustics to sound design
  Christos A. Goussios; Nikolaos Tsinikas; Niovi Kitsiou
This work attempts to highlight some specific interconnections between certain human artistic expressions and their relation with sonic environments and architectural soundscapes, especially connections with some acoustic qualities of spaces, such as reverberation. The main focus is the effort to discover applications of these acoustic qualities, as identifying characteristics of certain spaces, in order to serve the narration in fiction films through sound design. The use of reverberation as a storytelling tool in films is highlighted. The difficulty of defining and expressing our sonic environment is noted through the partial absence of a sonic vocabulary. In the first part, the unintended effect and reflection of the acoustic characteristics of spaces on certain human artistic and generally creative procedures is presented, and part of a related research-in-progress is introduced. The second part of this work focuses on the intentional use of acoustic characteristics in film sound in order to imitate certain spaces and, furthermore, to underline and multiply feelings, psychological situations, emotional experiences and more in audiovisual narration.
The Impact of Sound Design on the Interpretation of Archive Silent Films. Meteora, (1924): A case study
  Christos A. Goussios; Eleni Gkolfinopoulou; Dimitra Margaritidou; Ioannis Sykovaris; Konstantinos Stathis
This work focuses on the potential and impact that sound design can have or add on the interpretation of Archive Silent Films. An endeavor to discover linkage and possibilities between sound elements and emotions is attempted. Moreover, sound design is presented as an expressive means in cinema nowadays. The Greek silent film Meteora (Dorizas, 1924) was used as the canvas for different approaches and applications of sound design, highlighting the unlimited possibilities of the image/sound narrative interconnections. It was also used for the performance of a multifaceted experiment among film school students, where there were two major fields of interest: memory-imagination-creation using sound elements and persuasion-impact-potential of an existing soundtrack. The students acted in two ways: as sound designers, sharing their imagination, and as an audience, criticizing, arguing about and discussing an already-applied sound design. These aforementioned actions, attempts and applications were part of the educational procedure for the teaching of Film Sound & Music in the School of Film Studies, Faculty of Fine Arts, Aristotle University of Thessaloniki (AUTh), Greece in the spring semester 2014-2015.
Spatial Sound and Multimodal Interaction in Immersive Environments
  Francesco Grani; Dan Overholt; Cumhur Erkut; Steven Gelineck; Georgios Triantafyllidis; Rolf Nordahl; Stefania Serafin
Spatial sound and interactivity are key elements of investigation at the Sound And Music Computing master program at Aalborg University Copenhagen.
   We present a collection of research directions and recent results from work in these areas, with the focus on our multifaceted approaches to two primary problem areas: 1) creation of interactive spatial audio experiences for immersive virtual and augmented reality scenarios, and 2) production and mixing of spatial audio for cinema, music, and other artistic contexts. Several ongoing research projects are described, wherein the latest developments are discussed.
   These include elements in which we have provided sonic interaction in virtual environments, interactivity with volumetric sound sources using VBAP and Wave Field Synthesis (WFS), and binaural sound for virtual environments and spatial audio mixing. We show that the variety of approaches presented here are necessary in order to optimize interactivity with spatial audio for each particular type of task.
The Sound of the Smell of my Shoes
  Mark Grimshaw; Mads Walther-Hansen
Given the sensory poverty of virtual environments, such as those found in computer games that rely, in the main, solely on audio-visual interfaces, how best do we attain the experience of presence in those environments when presence requires the construction of a coherent (in the sense of realism) place in which to be and in which to act? The paper explores this question through an investigation of the senses of hearing and smell and suggests the possibility of introducing the experience of odours into such environments through the use of sound.
Learning Visual Programming by Creating a Walkable Interactive Installation
  Aristotelis Hadjakos; Heizo Schulze; André Düchting; Christian Metzger; Marc Ottensmann; Friederike Riechmann; Anna-Maria Schneider; Michael Trappmann
In this paper we explore the question of how to teach visual programming languages, such as Max/MSP, Pure Data, vvvv, to experienced media and sound artists. A pedagogical challenge with these students is to keep them motivated after the initial phase of learning the core concepts and creating first simple programs. This is the case because they already have their means for artistic expression and experience a mismatch between their current skill and what they would need to successfully apply visual programming in their art. Our approach consists of letting them develop an interactive installation based on the principles "media first", liveliness, design patterns and technical framework. The core idea is to let the students create and bring their own media, in whose creation they are experts, and use this as the backbone of an interactive installation. Our students developed a walkable interactive installation bang!, which serves as a case study for this approach.
Creating a Super Instrument: Composers and Pianists Reaching Beyond Their Technical and Expressive Capabilities
  Maria Kallionpää; Hans-Peter Gasselseder
Thanks to the development of new technology, musical instruments are no longer tied to their existing acoustic or technical limitations, as almost all parameters can be augmented or modified in real time. An increasing number of composers, performers, and computer programmers have thus become interested in different ways of "supersizing" acoustic instruments in order to open up previously-unheard instrumental sounds. This leads us to the question of what constitutes a super instrument and what challenges it poses aesthetically and technically. Although classical music performers have traditionally been dependent on their existing instrumental skills, various technological solutions can be used to reach beyond them. This paper focuses on the possibilities of enhancing composers' and performing pianists' technical and expressive vocabulary in the context of electroacoustic super instrument compositions. The discussion will be illustrated by two compositional case studies.
PC-based room correction for audio
  Fotios Kontomichos; Nicolas-Alexander Tatlas; Panagiotis Hatziantoniou; Charalampos Papadakos
The characteristics and implementation of a complete software bundle consisting of a room acoustics measurement tool, an application for the calculation of the room correction filters and a plugin for real time audio filtering is described. Via this plug-in a non-experienced user should be able to follow the documentation, measure the impulse response of his listening room, calculate the appropriate stereo inverse filters and utilize them in order to equalize the audio input in real time. The proposed room equalization solution was evaluated in terms of subjective and objective performance.
Feature-Based Language Discrimination in Radio Productions via Artificial Neural Training
  R. Kotsakis; A. Mislow; G. Kalliris; M. Matsiola
The current paper focuses on the discrimination of audio content, deriving from radio productions, based on the spoken language. During the implementation several audio features were extracted and subsequently evaluated, containing the spectral, timbre and tempo properties of the implicated voice signals. In this process, the differentiated patterns that appear in radio productions, such as speech signals, phone conversations and music interferences, had to be initially detected and classified, leading to the employment of a preliminary generic classification scheme. The hierarchical structure of discrimination integrated parametric segmentation with various window lengths, in order to detect the most efficient ones. The conducted experiments were supported by machine learning approaches, and more specifically by artificial neural network topologies, which demonstrate increased discrimination potential when they are applied to audio semantic analysis problems. The achieved overall and partial classification performances were high, revealing the saliency of the selected parameters and the efficiency of the whole implemented methodology.
Analyzing and organizing the sonic space of vocal imitations
  D. A. Mauro; D. Rocchesso
The sonic space that can be spanned with the voice is vast and complex and, therefore, it is difficult to organize and explore. In order to devise tools that facilitate sound design by vocal sketching, we attempt to organize a database of short excerpts of vocal imitations. By clustering the sound samples on a space whose dimensionality has been reduced to the two principal components, it is experimentally checked how meaningful the resulting clusters are for humans. Eventually, a representative of each cluster, chosen to be close to its centroid, may serve as a landmark in the exploration of the sound space, and vocal imitations may serve as proxies for synthetic sounds.
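The landmark selection described in this abstract (one representative per cluster, chosen close to the centroid) can be sketched as below. This is only an illustration under stated assumptions: the 2-D coordinates and cluster labels are made up and would in practice come from the dimensionality reduction and clustering steps, which are not shown, and the function name is hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical 2-D coordinates (e.g. two principal components) and
# cluster labels; in practice both would come from an earlier analysis.
points = {
    "s1": (0.0, 0.0), "s2": (1.0, 0.0), "s3": (0.4, 0.1),
    "s4": (5.0, 5.0), "s5": (6.0, 5.0), "s6": (5.0, 7.0),
}
labels = {"s1": 0, "s2": 0, "s3": 0, "s4": 1, "s5": 1, "s6": 1}


def cluster_landmarks(points, labels):
    """Return, per cluster, the sample closest to the cluster centroid."""
    members = defaultdict(list)
    for name, cluster in labels.items():
        members[cluster].append(name)

    landmarks = {}
    for cluster, names in members.items():
        # Centroid = mean of the member coordinates.
        cx = sum(points[n][0] for n in names) / len(names)
        cy = sum(points[n][1] for n in names) / len(names)
        landmarks[cluster] = min(
            names, key=lambda n: math.dist(points[n], (cx, cy))
        )
    return landmarks


landmarks = cluster_landmarks(points, labels)
```

Choosing an actual sample rather than the centroid itself means each landmark is a real recording that can be played back.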
Chromatic Reconstruction and Musical Aberration: a survey into the Brain Synapses of Musicality
  Dionysios Politis; Dimitrios Margounakis; Miltiadis Tsaligopoulos; Georgios Kyriafinis
A long-standing issue in music representation is the transforming mechanism between music and colors. As music promotes itself as the basis on which audiovisual arts are fostered, a practical method for correlating music performance with its visual surroundings is needed. This paper describes how our brain detects and allures music in a two dimensional visual space, and how the color superimposition may reversely lead to music production.
Obtaining General Chord Types from Chroma Vectors
  Marcelo Queiroz; Maximos Kaliakatsos-Papakostas; Emilios Cambouropoulos
This paper presents two novel strategies for processing chroma vectors corresponding to polyphonic audio, and producing a symbolic representation known as GCT (General Chord Type). This corresponds to a fundamental step in the conversion of general polyphonic audio files to this symbolic representation, which is required for enlarging the current corpus of harmonic idioms used for conceptual blending in the context of the COINVENT project. Preliminary results show that the strategies proposed produce correct results, even though harmonic ambiguities (e.g. between a major chord with added major 6th and a minor chord with minor 7th) might be resolved differently according to each strategy.
Gestural control in electronic music performance: sound design based on the 'striking' and 'bowing' movement metaphors
  Frederic Robinson; Cedric Spindler; Volker Böhm; Erik Oña
Following a call for clear movement-sound relationships in motion-controlled digital musical instruments (DMIs), we developed a sound design concept and a DMI implementation with a focus on transparency through intuitive control metaphors. In order to benefit from the listener's and performer's natural understanding of physical processes around them, we use gestures with strong physical associations as control metaphors, which are then mapped to sound modules specifically designed to represent these associations sonically. The required motion data can be captured by any low-latency sensor device worn on the hand or wrist, that has an inertial measurement unit with six degrees of freedom. A dimension space analysis was applied on the current implementation in order to compare it to existing DMIs and illustrate its characteristics. In conclusion, our approach resulted in a DMI with strong results in transparency, intuitive control metaphors, and a coherent audio-visual link.
Towards an Enactive Swimming Sonification: Exploring Multisensory Design and Musical Interpretation
  Gabriela Seibert; Daniel Hug; Markus Cslovjecsek
In this paper we present a design method that integrates the exploration of visual representations and musical expertise in the process of creating a swimming sonification, and initial results of the method's application in an explorative study. Our focus lies on the creation of a sonic representation that facilitates the affective, intuitive reproduction of the crawl swim movement. The method integrates artistic creativity and a systematic design process. By combining the linguistic-conceptual, visual and auditory representation of the (imagined) movement, we aim to advance the expressive quality of the sonic representation as well as the design method in a crossmodal, holistic way. Finally we report on a qualitative evaluation of the potential of this approach to support the affective, intuitive re-enactment of the swimming movement.
RecApp: A mobile application for ubiquitous and collaborative audio processing
  Efstathios A. Sidiropoulos; Evdokimos I. Konstantinidis; Rigas G. Kotsakis; Andreas A. Veglis
The present paper provides a methodological framework for the design of a novel intelligent audio content processing cloud model. Motivated by previous experience with "desktop" audio content analysis and processing, a device-independent cloud model is thoroughly analyzed and designed, aiming at serving demanding new media services for users who are not technology experts (e.g. journalists, media professionals, etc.). The RecApp Android mobile application helps journalists easily edit and share their audio material so they can better evaluate and disseminate it in working groups.
The Delayed Medium: Hidden Matrices of Unheard Sound in Some Artistic Data-visualisation Experiments
  Morten Søndergaard
This paper investigates how unheard data and sound (i.e. sounds from data sources not accessible to the human perception system) emerge in artistic data-experiments as what I term 'the delayed medium' of the data archive. The paper's argument begins by quoting Walter Benjamin's notion of a latent archive 'under' the archive and discusses this latency in view of that which Katherine Hayles has named 'technogenesis'. It is in this framing that the 'delayed medium' emerges, and I am revisiting Arthur Koestler, Lev Manovich, Andrew Pickering, and Ulrich Neisser to further understand and describe this new mode of cultural production. My argument is, however, primarily building on an analysis of the Danish media artist Thorbjørn Lausten's artistic experiments in visualizing unheard data and sound. It proposes that the delayed medium is pivotal in the cognitive processing of the creative matrices emerging as an 'opsis' derived from the artistic experiments and pointing towards latency as a fundamental condition (and possible key to the cognitive processing) of cultural knowledge in an age of technogenesis.
A Wireless Acoustic Sensor Network for environmental monitoring based on flexible hardware nodes
  N. A. Tatlas; S. M. Potirakis; S. A. Mitilineos; S. Despotopoulos; D. Nicolaidis; M. Rangoussi
Monitoring areas of environmental interest by employing a wireless acoustic sensor network has been recently investigated. Although feature-rich sensor solutions have surfaced, the requirements for a flexible node from a networking and processing segmentation point of view cannot be met by commercially available systems. The hardware development presented is based on a single design that can support multiple subsystems, with the actual node operation being defined by the embedded software. The system's performance was investigated through a six-month pilot operation; for that duration, data for anthropogenic, geophysical and biophysical sounds were logged.
The StringPhone: a novel voice driven physically based synthesizer
  David Stubbe Teglbjærg; Jesper S. Andersen; Stefania Serafin
This paper describes the development of TheStringPhone, a physical modeling based polyphonic digital musical instrument that uses the human voice as input excitation. The core parts of the instrument include digital filters, waveguide sections and feedback delay networks for reverberation. We describe the components of the instrument and the results of an informal evaluation with different musicians.
BF-Classifier: Background/Foreground Classification and Segmentation of Soundscape Recordings
  Miles Thorogood; Jianyu Fan; Philippe Pasquier
Segmentation and classification are an important but time-consuming part of the process of using soundscape recordings in sound design and research. Background and foreground are general classes referring to a signal's perceptual attributes, and are used as criteria by sound designers when segmenting sound files. We establish the background / foreground classification task within a musicological and production-related context, and present a method for automatic segmentation of soundscape recordings based on this task. We created a soundscape corpus with ground truth data obtained from a human perception study. An analysis of the corpus showed an average agreement of each class -- background 92.5%, foreground 80.8%, and background with foreground 75.3%. We then used the corpus to train a machine learning technique using a Support Vector Machines classifier. An analysis of the classifier demonstrated similar results to the average human performance (background 96.7%, foreground 80%, and background with foreground 86.7%). We then report an experiment evaluating the classifier with different analysis window sizes, which demonstrates how smaller window sizes result in a diminishing performance of the classifier.
An automatic speech detection architecture for social robot oral interaction BIBAFull-Text 33
  E. G. Tsardoulias; A. L. Symeonidis; P. A. Mitkas
Social robotics has become a trend in contemporary robotics research, since social robots can be successfully used in a wide range of applications. One of the most fundamental communication skills a robot must have is oral interaction with a human, in order to provide feedback or accept commands. Although text-to-speech is an almost solved problem, this isn't the case for speech detection, since it includes a large number of different conditions, many of which are literally unpredictable. There are quite a few well-established ASR (Automatic Speech Recognition) tools, but they do not provide efficient results, especially for less popular languages. The current paper investigates different speech detection strategies via the utilization of the Sphinx-4 open-source library. The first is a way to incorporate languages for which no acoustic or language model exists (Greek in our case), following the grapheme-to-phoneme concept. The speech detection model is evaluated using audio captured from a NAO v4 robot, a difficult task due to the high levels of included noise; thus denoising techniques are investigated as well.
Augmenting Social Multimedia Semantic Interaction through Audio-Enhanced Web-TV Services BIBAFull-Text 34
  Nikolaos Tsipas; Panagiotis Zapartas; Lazaros Vrysis; Charalampos Dimoulas
Multimedia semantic analysis is a key element in managing the exponentially growing amount of produced multimedia content, available on the web and the social media. Towards this direction, a semantically enhanced Web-TV environment providing video-on-demand and simulcast streaming services, is proposed. The system offers content management and analysis automation capabilities by exploiting information derived from the semantic analysis of the user uploaded content and the social interaction of its users through the processes of annotation and tagging. A fusion based approach is employed for the categorization of content, enabling users to combine heterogeneous semantic information, thus enhancing content exploration with rich media experience. The paper focuses on the analysis of the system's architecture, the applied methodologies for incorporating user generated classification schemes and annotations, and the evaluation of machine learning algorithms to provide innovative multimedia content exploration methods.
Die Neukoms. Local streamed live-performance with mobile devices BIBAFull-Text 35
  Jeroen Visser; Raimund Vogtenhuber
This paper is about the streaming of a live-performance to a local audience. The audience is invited to connect to a wireless stream using their personal mobile devices like smartphones and tablets as remote loudspeakers. In this way the streamed audio will be spatialized by the mobile devices and the participating audience thus become part of the performance. Such a setup was explored in several live-performances of a group of composers-musicians called Die Neukoms. The performers strive to augment the impression of liveness when performing electro-acoustic music, and at the same time probe into alternative performer-listener relationships. Therefore they explore the effect of spatialization and the diffusion in time of the audio stream using personal mobile devices. By involving the audience in the process of the performance, this also ruptures the classical performers-listeners paradigm.
Embedding sound localization and spatial audio interaction through coincident microphones arrays BIBAFull-Text 36
  N. Vryzas; C. A. Dimoulas; G. V. Papanikolaou
This paper discusses a methodology for embedding sound localization techniques for spatial audio interaction, aiming at matching the low computing capabilities of mobile and embedded systems. The main goal is to implement a sound localization system, using a microphone array that combines increased accuracy with compromised computational load and applicable layout size. In particular, four cardioid microphones are placed in a cross-shape arrangement, thus forming a planar coincident microphone array for horizontal direction of arrival estimation. The incorporation of two additional microphones at the perpendicular plane is also considered for 3D audio localization. The implemented system is evaluated through simulation experiments and real-world field measurements in comparison to B-Format based localization. Joint time-frequency analysis is considered for improving the localization accuracy in poor SNR conditions. The utilization of multiple arrays is also discussed for 2D and 3D position estimations, as well as signal enhancement by means of time-delay compensation.
Mobile Audio Intelligence: From Real Time Segmentation to Crowd Sourced Semantics BIBAFull-Text 37
  Lazaros Vrysis; Nikolaos Tsipas; Charalampos Dimoulas; George Papanikolaou
The task of general audio detection and segmentation by means of machine learning is a very popular and highly demanding procedure nowadays. Most relevant works in the last decade aim at modelling audio in order to conduct a semantic analysis and a high-level categorization. A generic strategy that would detect audio events by means of transitions from one audio state to another is considered interesting and would support the whole classification workflow. This work investigates the possibilities in designing a robust bimodal segmentation algorithm for audio that would perform well in different conditions without relying on complicated machine learning schemes, by minimizing prior knowledge for the detection model and thus delivering consistent performance for any input signal and computing environment. Additionally, a modern user-generated content approach for populating and updating ground truth databases is presented. Both techniques are implemented and embedded as upgrades in a mobile software environment for smartphones.
Audio Feedback Design Principles for Hand Gestures in Audio-Only Games BIBAFull-Text 38
  Wenjie Wu; Stefan Rank
The design of audio feedback for touch-less gesture interaction with audio-only environments requires a structured approach. We present design principles for responsive audio feedback geared towards hand gestures in audio-only games, focusing on diegetic environmental feedback before, during, and after gestures. Illustrations for the principles are drawn from a project investigating the usefulness of different designs for encouraging user participation and maintaining immersion in audio-only games. The findings indicate that replacing explicit audio instructions for hand positions and movements with responsive audio feedback for suggesting interaction methods using environmental story-related audio cues leads to measurably higher immersion. Principles for designing immersive audio feedback responsive to hand gesture interaction in a 3-dimensional space also have implications for virtual reality and the blind community's access to motion-controlled games.
A confirmatory approach of the Luminance-Texture-Mass model for musical timbre semantics BIBAFull-Text 39
  Asterios Zacharakis; Konstantinos Pastiadis
This study presents a listening experiment designed to further examine the previously proposed luminance-texture-mass (LTM) model for timbral semantics. Thirty-two musically trained listeners rated twenty-four instrument tones on six predefined semantic scales, namely, brilliance, depth, roundness, warmth, fullness and richness. The selection of this limited set of descriptors was based on previous exploratory work. These six semantic scales were analysed through Principal Component Analysis (PCA) and Multidimensional Scaling (MDS) to produce two different timbre spaces. These timbre spaces were subsequently compared for configurational and dimensional similarity with the LTM semantic space and the direct MDS perceptual space obtained with the same stimuli. The results showed that the selected six semantic scales adequately represent the LTM model and are fair at predicting the configurations of the sounds that result from pairwise dissimilarity ratings.