Detecting speech and music in audio content