News and Events

Juan J. Bosch defends his PhD thesis
27 Jun 2017

Date: Tuesday, June 27th, 2017 at 11:30h in room 55.309 (Tanger Building, UPF Communication Campus)

Title: From Heuristics-Based to Data-Driven Audio Melody Extraction.

Supervisor: Dr. Emilia Gómez.

President: Dr. Xavier Serra (DTIC - UPF)
Secretary: Dr. Juan Pablo Bello (New York Univ.)
Member: Dr. Maarten Grachten (Johannes Kepler Univ.)


20 Jun 2017 - 20:02
Seminar by Juan P. Bello on audio source identification
27 Jun 2017

In the context of the PhD defense by Juan J. Bosch, the MTG organizes a seminar by Juan P. Bello, NYU. 

Date: Tuesday, June 27th, 2017 at 15:30h in room 55.410

Title: Towards Multiple Source Identification in Environmental Audio Streams 

Abstract: Automatic sound source identification is a fundamental task in machine listening with a wide range of applications in environmental sound analysis, including the monitoring of urban noise and bird migrations. In this talk I will discuss our efforts at addressing this problem, including data collection, annotation and the systematic exploration of a variety of methods for robust classification. I will discuss how simple feature learning approaches such as spherical k-means significantly outperform off-the-shelf methods based on MFCCs, given large codebooks trained with every possible shift of the input representation. I will show how the size of codebooks, and the need for shifting data, can be reduced by using convolutional filters, first by means of the deep scattering spectrum, and then as part of deep convolutional neural networks. As model complexity increases, however, performance is impeded by the scarcity of labeled data, a limitation that we partially overcome with a new framework for audio data augmentation. While promising, these solutions only address simplified versions of the real-world problems we wish to tackle. At the end of the talk, I'll discuss various steps we're currently undertaking to close that gap.
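The spherical k-means feature learning mentioned in the abstract can be sketched in a few lines: centroids are constrained to the unit sphere, assignment uses cosine similarity, and a clip is encoded by max-pooling its frames' similarities to the learned codebook. This is an illustrative toy on random data with hypothetical function names, not the speaker's implementation:

```python
import numpy as np

def spherical_kmeans(X, k, n_iter=20, seed=0):
    """Spherical k-means: centroids live on the unit sphere and
    assignment uses cosine similarity (dot product of unit vectors)."""
    rng = np.random.default_rng(seed)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)   # unit-normalize frames
    C = X[rng.choice(len(X), k, replace=False)]        # init centroids from data
    for _ in range(n_iter):
        labels = np.argmax(X @ C.T, axis=1)            # nearest centroid by cosine
        for j in range(k):
            members = X[labels == j]
            if len(members):
                c = members.sum(axis=0)
                C[j] = c / np.linalg.norm(c)           # re-project onto the sphere
    return C

def encode(X, C):
    """Codebook features: max-pooled similarities over a clip's frames."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    return (X @ C.T).max(axis=0)                       # one k-dim vector per clip

# toy usage: 500 random "frames" of a 40-dim time-frequency patch
frames = np.random.default_rng(1).normal(size=(500, 40))
codebook = spherical_kmeans(frames, k=8)
clip_feature = encode(frames, codebook)
```

In the work described, codebooks are much larger and the frames are shifted time-frequency patches rather than random vectors, but the encode-and-pool pattern is the same.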

Bio: Juan Pablo Bello is Associate Professor of Music Technology, and Computer Science & Engineering, at New York University, with a courtesy appointment at NYU's Center for Data Science. In 1998 he received a BEng in Electronics from the Universidad Simón Bolívar in Caracas, Venezuela, and in 2003 he earned a doctorate in Electronic Engineering at Queen Mary, University of London. Juan's expertise is in digital signal processing, computer audition and music information retrieval, topics that he teaches and in which he has published more than 80 papers and articles in books, journals and conference proceedings. He is director of the Music and Audio Research Lab (MARL), where he leads research on music and sound informatics. His work has been supported by public and private institutions in Venezuela, the UK, and the US, including a CAREER award from the National Science Foundation and a Fulbright scholar grant for multidisciplinary studies in France. For a complete list of publications and other activities, please visit:


20 Jun 2017 - 19:59
Article published in Acta Musicologica related to CompMusic

In order to reach the musicological community and explain the work done within CompMusic, Xavier Serra wrote an article that has just been published in Acta Musicologica, the official peer-reviewed journal of the International Musicological Society.

Serra X. "The computational study of a musical culture through its digital traces". Acta Musicologica. 2017;89(1):24-44.
postprint version:

Abstract: From most musical cultures there are digital traces, digital artifacts, that can be processed and studied computationally, and this has been the focus of computational musicology for several decades already. This type of research requires clear formalizations and some simplifications, for example, by considering that a musical culture can be conceptualized as a system of interconnected entities. A musician, an instrument, a performance, or a melodic motive are examples of entities, and they are linked by various types of relationships. We then need adequate digital traces of the entities; for example, a textual description can be a useful trace of a musician, and a recording one of a performance. The analytical study of these entities and of their interactions is accomplished by processing the digital traces and by generating mathematical representations and models of them. A more ambitious goal, however, is to go beyond the study of individual artifacts and analyze the overall system of interconnected entities in order to model a musical culture as a whole. The reader might think that this is science fiction, and he or she might be right, but there is research trying to make advances in this direction. In this article I undertake an overview of the state of the art related to this type of research, identifying current challenges, describing computational methodologies being developed, and summarizing musicologically relevant results of such research. In particular, I review the work done within CompMusic, a project in which my colleagues and I have developed audio signal processing, machine learning, and semantic web methodologies to study several musical cultures.
20 Jun 2017 - 15:31
Emilia Gómez receives an award of the Premios Andaluces de las Telecomunicaciones

Emilia Gómez receives an award of the Premios Andaluces de las Telecomunicaciones for her professional career. These awards are granted by the Asociación Andaluza de Ingenieros de Telecomunicaciones de Andalucía Occidental (Asitano) and aim to highlight the work done by companies or professionals in the ICT sector to promote innovation and technological development.


20 Jun 2017 - 14:16
Seminars by Axel Roebel and Matthias Mauch
28 Jun 2017

Axel Roebel, from IRCAM, and Matthias Mauch, from Apple and Queen Mary University of London, will give research seminars on Wednesday, June 28th at 15:30h in room 55.410.

15:30 - Axel Roebel: "Analysis/Re-Synthesis of Singing and Texture Sounds"

This presentation will discuss recent research exploring different approaches to sound synthesis using analysis/re-synthesis methods for singing and sound textures. We will describe IRCAM's singing synthesis system ISiS, which integrates two synthesis approaches: a classical phase-vocoder-based approach and a more innovative deterministic-plus-stochastic decomposition (PaN) based on a pulse and noise model. We will notably discuss the underlying analysis of the glottal pulse parameters, as well as some recent approaches to establishing high-level control of the singing voice quality (intensity changes, mouth opening, roughness of the voice). Concerning sound texture synthesis, we will describe a recent signal representation using perceptually motivated parameters, namely envelope statistics in the perceptual bands (McDermott, 2009, 2011, 2013), discuss synthesis methods that allow producing sound signals from these statistical descriptors, and demonstrate some synthesis results, not only for the analysis/synthesis of textures but also for their use as an effect that transforms arbitrary sounds by manipulating these descriptors.
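The envelope statistics mentioned above can be illustrated with a simplified sketch: band-pass each signal, take the amplitude envelope, and summarize it with a few moments. The texture models cited use cochlear (e.g. gammatone) filterbanks, compressive nonlinearities and richer statistics; this toy substitutes plain FFT band masks and three moments per band, and all names are hypothetical:

```python
import numpy as np

def analytic_envelope(x):
    """Amplitude envelope via the analytic signal (FFT-based Hilbert transform)."""
    N = len(x)
    X = np.fft.fft(x)
    h = np.zeros(N)
    h[0] = 1
    h[1:(N + 1) // 2] = 2          # double positive frequencies
    if N % 2 == 0:
        h[N // 2] = 1              # Nyquist bin for even N
    return np.abs(np.fft.ifft(X * h))

def band_envelope_stats(x, sr, bands):
    """For each (lo, hi) band: band-pass by FFT masking, take the envelope,
    and summarize it by mean, coefficient of variation, and skewness."""
    N = len(x)
    freqs = np.abs(np.fft.fftfreq(N, d=1.0 / sr))
    stats = []
    for lo, hi in bands:
        mask = (freqs >= lo) & (freqs < hi)
        sub = np.real(np.fft.ifft(np.fft.fft(x) * mask))   # crude band-pass
        env = analytic_envelope(sub)
        m, s = env.mean(), env.std()
        skew = np.mean(((env - m) / s) ** 3)
        stats.extend([m, s / m, skew])                     # 3 stats per band
    return np.array(stats)

# toy usage: 1 s of white noise, three octave-ish bands
sr = 16000
x = np.random.default_rng(0).normal(size=sr)
features = band_envelope_stats(x, sr, bands=[(200, 400), (400, 800), (800, 1600)])
```

Synthesis then amounts to the inverse problem: iteratively adjusting a noise signal until its band-envelope statistics match a target vector like `features`.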

16:30 - Matthias Mauch: "Evolving Music in the Lab and in the Wild"

Let's revisit music culture through the eye of an evolutionary biologist. Can we evolve music in the lab, like bacteria in a Petri dish? Can we observe how music changes in the wild? I'll be reporting on two data-driven studies I did in collaboration with actual biologists to answer just these questions. On the way I'll be introducing my own background in music informatics and the tools we needed to analyse the audio.
20 Jun 2017 - 11:27
Georgi Dzhambazov defends his PhD thesis
28 Jun 2017

Wednesday, June 28th, 2017 at 11:30h in room 55.309 (Tanger Building, UPF Communication Campus)

Georgi Dzhambazov: “Knowledge-based Probabilistic Modeling for Tracking Lyrics in Music Audio Signals”
Thesis Director: Xavier Serra
Thesis Committee: Emilia Gómez (UPF), Axel Roebel (IRCAM) and Matthias Mauch (Apple & Queen Mary University of London)
[Full thesis document and accompanying materials]

Abstract: This thesis proposes specific signal processing and machine learning methodologies for automatically aligning the lyrics of a song to its corresponding audio recording. The research carried out falls in the broader field of music information retrieval (MIR), and in this respect we aim at improving some existing state-of-the-art methodologies by introducing domain-specific knowledge. The goal of this work is to devise models capable of tracking in the music audio signal the sequential aspect of one particular element of lyrics: the phonemes. Music can be understood as comprising different facets, one of which is lyrics. The models we build take into account the complementary context that exists around lyrics, which is any musical facet complementary to lyrics. The facets used in this thesis include the structure of the music composition, the structure of a melodic phrase, and the structure of a metrical cycle. From this perspective, we analyse not only the low-level acoustic characteristics, representing the timbre of the phonemes, but also higher-level characteristics, in which the complementary context manifests. We propose specific probabilistic models to represent how the transitions between consecutive sung phonemes are conditioned by different facets of complementary context. The complementary context which we address unfolds in time according to principles that are particular to a music tradition. To capture these, we created corpora and datasets for two music traditions which have a rich set of such principles: Ottoman Turkish makam and Beijing opera. The datasets and the corpora comprise different data types: audio recordings, music scores, and metadata. From this perspective, the proposed models can take advantage both of the data and of the music-domain knowledge of particular musical styles to improve existing baseline approaches.
As a baseline, we choose a phonetic recognizer based on hidden Markov models (HMM): a widely used methodology for tracking phonemes in both singing and speech processing problems. We present refinements in the typical steps of existing phonetic recognizer approaches, tailored towards the characteristics of the studied music traditions. On top of the refined baseline, we devise probabilistic models, based on dynamic Bayesian networks (DBN), that represent the relation of phoneme transitions to their complementary context. Two separate models are built for two granularities of complementary context: the structure of a melodic phrase (higher level) and the structure of the metrical cycle (finer level). In one model we exploit the fact that syllable durations depend on their position within a melodic phrase. Information about the melodic phrases is obtained from the score, as well as from music-specific knowledge. Then, in another model, we analyse how vocal note onsets, estimated from audio recordings, influence the transitions between consecutive vowels and consonants. We also propose how to detect the time positions of vocal note onsets in melodic phrases by simultaneously tracking the positions in a metrical cycle (i.e. metrical accents). In order to evaluate the potential of the proposed models, we use lyrics-to-audio alignment as a concrete task. Each model improves the alignment accuracy, compared to the baseline, which is based solely on the acoustics of the phonetic timbre. This validates our hypothesis that knowledge of complementary context is an important stepping stone for computationally tracking lyrics, especially in the challenging case of singing with instrumental accompaniment. The outcomes of this study are not only theoretic methodologies and data, but also specific software tools that have been integrated into Dunya, a suite of tools built in the context of CompMusic, a project for advancing the computational analysis of the world's music.
With this application, we have also shown that the developed methodologies are useful not only for tracking lyrics, but also for other use cases, such as enriched music listening and appreciation, or for educational purposes.
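As a rough illustration of the kind of HMM baseline the abstract describes, here is a minimal Viterbi forced-alignment sketch through a left-to-right chain of phoneme states. The observation likelihoods are toy values and the function name is hypothetical; the thesis builds far richer models on top of this idea:

```python
import numpy as np

def force_align(log_obs, self_loop=0.7):
    """Viterbi forced alignment through a left-to-right chain of phoneme states.
    log_obs[t, s] is the log-likelihood of frame t under phoneme state s;
    the path may only stay in a state or advance to the next one."""
    T, S = log_obs.shape
    log_stay, log_next = np.log(self_loop), np.log(1.0 - self_loop)
    delta = np.full((T, S), -np.inf)     # best log-score ending in state s at frame t
    back = np.zeros((T, S), dtype=int)   # backpointers
    delta[0, 0] = log_obs[0, 0]          # alignment must start in the first phoneme
    for t in range(1, T):
        for s in range(S):
            stay = delta[t - 1, s] + log_stay
            move = delta[t - 1, s - 1] + log_next if s > 0 else -np.inf
            back[t, s] = s if stay >= move else s - 1
            delta[t, s] = max(stay, move) + log_obs[t, s]
    path = [S - 1]                       # alignment must end in the last phoneme
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    return [int(s) for s in path[::-1]]

# toy example: 6 frames, 3 phonemes, likelihoods favoring two frames per phoneme
good, bad = 0.0, -5.0
log_obs = np.full((6, 3), bad)
log_obs[0:2, 0] = log_obs[2:4, 1] = log_obs[4:6, 2] = good
print(force_align(log_obs))  # [0, 0, 1, 1, 2, 2]
```

The thesis's DBN models can be seen as replacing the fixed self-loop probability with transition probabilities conditioned on complementary context such as note onsets and metrical position.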
20 Jun 2017 - 11:17
SMC Master Students' Project Defenses
29 Jun 2017

The oral presentations of the SMC Master Thesis will take place on Thursday, June 29th from 9:30h to 19:30h in room 55.309. The defenses are public.

Student | Project Title | Supervisor(s)
Kushagra Sharma | Automatic analysis of time varying metrical structures in music | Ajay Srinivasamurthy & Xavier Serra
Meghana Sudhindra | Open tool and corpora for the study of Carnatic music | Xavier Serra
Nestor Napoles Lopez | Automatic Harmonic Analysis of Classical String Quartets From Symbolic Score | Xavier Serra & Rafael Caro Repetto
Albin Andrew Correya | Navigating source-ambiguous sounds using user-tailored perceptual attributes | Frederic Font & Xavier Favory
Felipe Loaiciga Espeleta | High Resolution Audio Analysis From A Music Information Retrieval Perspective | Frederic Font
Javier Arredondo Garrido | Retrieval of Drum Samples by High-Level Descriptors | Frederic Font
Pedro José Gonzalez Gonzalez | Dataset for the classification of perceptual attributes of sound samples | Xavier Serra
Deniz Saglam | Exploring Consistency in Genre Annotation and Genre Classification | Dmitry Bogdanov & Alastair Porter
Minz Sanghee Won | Understanding latent semantics of deep learning models for electronic music | Dmitry Bogdanov & Jordi Pons
Daniel Balcells Eichenberger | Interactive Visual Maps for Music Browsing | Dmitry Bogdanov & Perfecto Herrera
Siddharth Bhardwaj | Audio data augmentation for musical instrument recognition | Olga Slizovskaia & Emilia Gómez & Gloria Haro
Helena Cuesta i Mussarra | Automatic transcription from vocal music: the choir case | Emilia Gómez
Gerard Erruz Lopez | Sound source separation techniques for 3D audio scenes | Marius Miron
Tomas Jozef Gajecki Somervail | Enhanced Mixes with Source Separation for Cochlear Implant users | Jordi Janer
Laia Ermi i Carbonell | Non-invasive prediction of intubation difficulties in general anesthesia | Jordi Bonada
Pablo Alonso Jimenez | Cross-lingual voice conversion with non-parallel data | Merlijn Blaauw
Germán Ruiz Marcos | Analysis of the skills acquisition process in musical improvisation: an approach based on creativity | Josep M. Comajuncosas & Enric Guaus
Tessy Anne Vera Troes | Measuring groove: A Computational Analysis of Timing and Dynamics in Drum Recordings | Cárthach Ó Nuanáin & Daniel Gómez
Joseph Matthew Munday | Contextually relevant note suggestion system for touch screen keyboards | Ángel Faraldo & Patricia Santos & Perfecto Herrera
Marc Siquier Peñafort | Computational modelling of expressive music performance in hexaphonic guitar | Sergio Giraldo
Natalia Delgado Galan | Neural correlates of music and emotion in autistic and non autistic children | Rafael Ramirez
Jimmy Jarjoura | Fusion of musical contents, brain activity and short term physiological signals for music-emotion recognition | Sergio Giraldo & Rafael Ramirez
Pablo Fernandez Blanco | Study of biomechanics in violin performances with Kinect and its relationship with sound | Alfonso Pérez

20 Jun 2017 - 10:13
The EyeHarp receives an award of the Fundación Caser

The EyeHarp project receives an award of the Fundación Caser in the category Dependencia y Sociedad. These awards are granted to projects that help to integrate and improve the quality of life of people with disabilities.

The EyeHarp is a gaze-controlled music interface that aims to allow people with physical disabilities to learn and play music. It is open source and free to download and use.

15 Jun 2017 - 13:12
Perfecto Herrera gives a keynote at ISMIS 2017

In June, Perfecto Herrera will give a keynote titled "Elements of musical intelligence for the next generation of digital musical tools" at the 23rd International Symposium on Methodologies for Intelligent Systems (ISMIS-2017), which takes place at the Warsaw University of Technology, Poland. In this talk he will address different aspects of what could be considered "musical intelligence" in humans, and how our current music technologies mimic some of them, fail at simulating others, or (sometimes) do something different but interesting. In this context, some of the outcomes of GiantSteps, a recently finished EU project aimed at studying and developing intelligent software components for music creation scenarios, will be presented.


14 Jun 2017 - 10:26
Seminar by Douglas Eck on Magenta
15 Jun 2017

Douglas Eck, from Google Brain, gives a talk about "Generative Models of Drawing and Sound" on Thursday, June 15th 2017, at 12:30 in room 55.410.

Abstract: I'll give an overview talk about Magenta, a project investigating music and art generation using deep learning and reinforcement learning. I'll discuss some of the goals of Magenta and how it fits into the general trend of AI moving into our daily lives. I'll talk about two specific recent projects. First I'll discuss our research on Teaching Machines to Draw with SketchRNN, an LSTM recurrent neural network able to construct stroke-based drawings of common objects. SketchRNN is trained on thousands of crude human-drawn images representing hundreds of classes. Second I'll talk about NSynth, a deep neural network that learns to make new musical instruments via a WaveNet-style temporal autoencoder. Trained on hundreds of thousands of musical notes, the model learns to generalize in the space of musical timbres, allowing musicians to explore new sonic spaces such as sounds that exist somewhere between a bass guitar and a flute. This will be a high-level overview talk with no need for prior knowledge of machine learning models such as LSTM or WaveNet.

Short bio: Doug is a Research Scientist at Google leading Magenta, a Google Brain project working to generate music, video, image and text using deep learning and reinforcement learning. A main goal of Magenta is to better understand how AI can enable artists and musicians to express themselves in innovative new ways. Before Magenta, Doug led the Google Play Music search and recommendation team. From 2003 to 2010 Doug was an Associate Professor in Computer Science at the University of Montreal's MILA Machine Learning lab, where he worked on expressive music performance and automatic tagging of music audio.
10 Jun 2017 - 18:56