Speaker Odyssey 2018 keynote

Slides, video and bibliography for my invited keynote at Odyssey 2018, Les Sables d’Olonne, France, June 2018.

Speaking naturally? It depends who is listening…

Presented at Speaker Odyssey 2018.

PDF slides


Links to the videos and sounds included in this presentation:

  • Audio for Carlini and Wagner “Audio Adversarial Examples: Targeted Attacks on Speech-to-Text”, eprint arXiv:1801.01944
  • Video for Carlini, Mishra, Vaidya, Zhang, Sherr, Shields, Wagner and Zhou “Hidden Voice Commands”, USENIX Security Symposium (Security), August 2016 (PDF)
  • Video for Athalye, Engstrom, Ilyas and Kwok “Synthesizing Robust Adversarial Examples”, eprint arXiv:1707.07397

Readings

Speech quality
Adversarial methods
Attacks on speech recognition
  • Nicholas Carlini, Pratyush Mishra, Tavish Vaidya, Yuankai Zhang, Micah Sherr, Clay Shields, David Wagner, Wenchao Zhou. Hidden Voice Commands. 25th USENIX Security Symposium, Austin, TX, USA, Aug 2016.
  • Guoming Zhang, Chen Yan, Xiaoyu Ji, Tianchen Zhang, Taimin Zhang, Wenyuan Xu. DolphinAttack: Inaudible Voice Commands. In Proc. 2017 ACM SIGSAC Conference on Computer and Communications Security (CCS), Dallas, TX, USA, Oct-Nov 2017.
Speaker verification, spoofing and anti-spoofing
Speech synthesis
  • Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, Yonghui Wu. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions. Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, Apr 2018.
  • Cassia Valentini-Botinhao, Zhizheng Wu, Simon King. Towards minimum perceptual error training for DNN-based speech synthesis. Proc. 16th Annual Conference of the International Speech Communication Association (Interspeech), Dresden, Germany, Sep 2015.
  • Shan Yang, Lei Xie, Xiao Chen, Xiaoyan Lou, Xuan Zhu, Dongyan Huang, Haizhou Li. Statistical parametric speech synthesis using generative adversarial networks under a multi-task learning framework. Proc. ASRU 2017.
  • Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari. Training algorithm to deceive anti-spoofing verification for DNN-based speech synthesis. Proc. ICASSP 2017.
  • Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari. Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(1), Jan 2018. DOI 10.1109/TASLP.2017.2761547
Miscellaneous: liveness detection; privacy
  • Linghan Zhang, Sheng Tan, Jie Yang, Yingying Chen. VoiceLive: A Phoneme Localization based Liveness Detection for Voice Authentication on Smartphones. CCS’16, October 24-28, 2016, Vienna, Austria.
  • Linghan Zhang, Sheng Tan, Jie Yang. Hearing Your Voice is Not Enough: An Articulatory Gesture Based Liveness Detection for Voice Authentication. CCS’17, October 30-November 3, 2017, Dallas, TX, USA.
  • Sree Hari Krishnan Parthasarathi. Privacy-Sensitive Audio Features for Conversational Speech Processing. PhD thesis no. 5234, EPFL, Switzerland, 2011.