Multimodal Resources

UPC-TALP database of isolated meeting-room acoustic events

This database was produced within the CHIL Project (Computers in the Human Interaction Loop), an Integrated Project (IP 506909) under the European Commission's Sixth Framework Programme. It contains a set of isolated acoustic events that occur in a meeting-room environment and that were recorded for the CHIL Acoustic Event Detection (AED) task. The recorded sounds do not overlap in time. The database can be used as training material for AED technologies, as well as for testing AED algorithms in quiet environments without temporally overlapping sounds.


The database contains the signals of 23 audio channels, with corresponding labels, out of the 84 channels used in the whole CHIL task. The 23 audio channels are: 12 microphones of the 3 T-shaped clusters, 4 tabletop omnidirectional microphones, and 7 channels of the Mark III array.

Data were recorded at 44.1 kHz with 24-bit precision and then converted to 16-bit raw little-endian format. All channels were synchronized. During all recordings, two or three additional people were inside the room to make the scenario more realistic.
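As a minimal sketch, 16-bit raw little-endian audio of this kind could be decoded in Python as shown below. The sketch assumes signed samples, headerless files holding a single channel each, and an unchanged 44.1 kHz sampling rate after conversion. None of these layout details are stated in this description, and the function names are illustrative only.

```python
import array
import sys

# Assumption: the converted files keep the 44.1 kHz recording rate.
SAMPLE_RATE_HZ = 44100
BYTES_PER_SAMPLE = 2  # 16-bit samples


def decode_pcm16le(raw: bytes) -> array.array:
    """Decode headerless 16-bit little-endian PCM bytes into signed integers.

    Assumption: samples are signed and the data holds a single channel.
    """
    if len(raw) % BYTES_PER_SAMPLE != 0:
        raise ValueError("byte count is not a multiple of the sample size")
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":
        # The data is little-endian; swap bytes on big-endian machines.
        samples.byteswap()
    return samples


def duration_seconds(samples: array.array) -> float:
    """Duration of a decoded channel at the assumed sampling rate."""
    return len(samples) / SAMPLE_RATE_HZ
```

A file could then be read with `decode_pcm16le(open(path, "rb").read())`, where `path` is a hypothetical location of one channel file on the DVDs.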

Approximately 60 sounds per sound class were recorded. The same ten people (5 men and 5 women) took part in every session. There are 3 sessions per participant. In each session, the participant took a different place in the room, chosen from 7 fixed positions. During each session, each participant had to produce the complete set of sounds twice. A script indicating the order of the events to be produced was given to each participant. Almost every event was preceded and followed by a pause of several seconds. All sounds were produced individually, except “applause” and several “laugh” events, which were produced together by all the people inside the room. The annotation was done manually.

The database is stored on 3 DVDs (one session per DVD).

The following table summarizes the content of the DVDs and shows the number of annotated acoustic events in each session:

Event type                        Session 1   Session 2   Session 3
Knock (door, table)                      15          18          17
Door open                                20          20          20
Door close                               20          21          20
Steps                                    28          24          21
Chair moving                             23          28          25
Spoon (cup jingle)                       23          21          24
Paper work (listing, wrapping)           31          29          24
Key jingle                               21          21          23
Keyboard typing                          21          25          20
Phone ringing/Music                      37          36          43
Applause                                 20          20          20
Cough                                    22          22          21
Laugh                                    22          21          21
Unknown                                  38          46          42
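The figures in the table above can be cross-checked with a short script. The following sketch copies the counts from the table and sums them per session and per event type; the variable and function names are illustrative and not part of the database.

```python
# Annotated event counts per session, copied from the table above.
COUNTS = {
    "Knock (door, table)": (15, 18, 17),
    "Door open": (20, 20, 20),
    "Door close": (20, 21, 20),
    "Steps": (28, 24, 21),
    "Chair moving": (23, 28, 25),
    "Spoon (cup jingle)": (23, 21, 24),
    "Paper work (listing, wrapping)": (31, 29, 24),
    "Key jingle": (21, 21, 23),
    "Keyboard typing": (21, 25, 20),
    "Phone ringing/Music": (37, 36, 43),
    "Applause": (20, 20, 20),
    "Cough": (22, 22, 21),
    "Laugh": (22, 21, 21),
    "Unknown": (38, 46, 42),
}


def session_totals(counts):
    """Sum the annotated events in each session (column sums)."""
    return tuple(sum(column) for column in zip(*counts.values()))


def class_totals(counts):
    """Sum the annotated events of each type over all sessions (row sums)."""
    return {name: sum(per_session) for name, per_session in counts.items()}


if __name__ == "__main__":
    print("Per session:", session_totals(COUNTS))
    for name, total in class_totals(COUNTS).items():
        print(f"{name:32s}{total:5d}")
```

The session totals come to 341, 352 and 341 events, or 1034 in all. Most event types total between 50 and 84 instances, in line with the figure of approximately 60 per class, while "Phone ringing/Music" (116) and "Unknown" (126) are notably larger.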

Additional information