Evaluations of Deletion-Based Method
     and Mixing-Based Method
         for Audio CAPTCHAs


 Takuya NISHIMOTO (Univ. Tokyo, Japan)
  Takayuki WATANABE (TWCU, Japan)
              @nishimotz

                                         1
CAPTCHA
   Completely Automated Public Turing test
    to tell Computers and Humans Apart
       a popular security technique on the Web
            prevents automated programs from abusing web services
       image-based CAPTCHAs
            images containing distorted characters
            prevent use by persons with visual disabilities
       audio CAPTCHAs were created as an alternative
   goal: create better audio CAPTCHA tasks
       safeness: the difference in recognition performance between humans and machines
       usability: the mental workload of humans listening to speech



                                                                  2
Performance gap model
   the performance of the machine should be lower
       than the intelligibility for humans
   gap: safeness
       should be large
   exposed ratio (ER)
       0%: random answer
            chance level; no gap
       100%: best guess
            easy for both; no gap
   practical condition
       0 < ER < 100
   [Figure: intelligibility (%) of Human and ASR vs. Exposed Ratio (%) (provided information), both axes 0-100]
                                                                                           3
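As a minimal sketch of the performance gap model, assuming intelligibility is given in percent for each exposed ratio, the gap (safeness) is the human score minus the machine score, and only exposed ratios strictly between 0% and 100% are practical. The function names, the `min_gap` threshold, and the sample numbers are hypothetical illustrations, not data from this study.

```python
from typing import Dict, List, Tuple


def performance_gap(human: float, machine: float) -> float:
    """Gap (safeness) in percentage points: human intelligibility minus machine accuracy."""
    return human - machine


def safe_conditions(scores: Dict[float, Tuple[float, float]],
                    min_gap: float) -> List[float]:
    """Return the exposed ratios (%) whose human-machine gap is at least min_gap.

    scores maps an exposed ratio (%) to a (human %, machine %) pair.
    0% (chance level) and 100% (fully exposed) are skipped because the
    model expects no usable gap at either end.
    """
    selected = []
    for er, (human, machine) in sorted(scores.items()):
        if 0.0 < er < 100.0 and performance_gap(human, machine) >= min_gap:
            selected.append(er)
    return selected


# Hypothetical scores, for illustration only (not measured results).
example = {0.0: (10.0, 10.0), 30.0: (60.0, 20.0), 50.0: (80.0, 55.0),
           70.0: (90.0, 85.0), 100.0: (100.0, 100.0)}
print(safe_conditions(example, min_gap=20.0))  # [30.0, 50.0]
```

Under these made-up numbers, 70% exposes so much that the machine nearly catches up, so only 30% and 50% keep a gap of at least 20 points.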
Safeness: ER control
   machines are becoming stronger
       statistical ASR methods are the mainstream
       supervised machine learning (Hidden Markov Models)
       techniques to cope with noise
   CAPTCHA tasks should be created systematically
       they should not be created by trial and error
       controllability of the Exposed Ratio is essential
   Mixing-based method: the best way to control ER?
       mixing noises / distorting signals
            can hide a portion of the information, however...
            the ER is difficult to measure, so performance is not easy to predict
       alternatives must be investigated
                                                                               4
Usability: Mental workload
   CAPTCHAs should not increase mental workload
   the workload may increase if the task is
       difficult to listen to or to memorize
   long tasks (many characters)
       difficult to remember
       safer, but higher mental workload
   requirements
       information can be obtained easily, in a short time
   investigation required
       human auditory sensation
       language cognition

                                                            5
Top-down knowledge
   incomplete stimulus
       knowledge helps to guess the missing information
   visual sensation:
       if part of an image is missing, or part of a word is hidden,
       common knowledge about characters and vocabulary
            can complement the image
   speech perception:
       if "word familiarity" is high, the word is easy to guess
   phonemic restoration
       may help human listening



                                                                     6
Deletion-based method
   delete short parts along the temporal axis at regular intervals
       if 30 ms out of every 100 ms period is replaced
        with silence, 30% of the information is deleted
       as the ratio of remaining sections goes down, the
        listening difficulty may increase
   the Exposed Ratio can be controlled easily
   however, the result is not easy to understand...
   [Figure: deletion (original); Festival engine, KAL (HMM-based)]
                                                                   7
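The deletion step can be sketched as follows, assuming a mono signal stored as a list of float samples. The function names and the choice to silence the end of each period are hypothetical; the slide specifies only how much of each period is deleted. With the defaults, 30 ms of every 100 ms is silenced, so 70% of the signal stays audible.

```python
from typing import List, Sequence


def delete_periodically(samples: Sequence[float], sample_rate: int,
                        period_ms: float = 100.0,
                        deleted_ms: float = 30.0) -> List[float]:
    """Replace deleted_ms out of every period_ms of the signal with silence.

    The silent part is placed at the end of each period (an assumption;
    the slide does not say where within the period it falls).
    """
    period = round(sample_rate * period_ms / 1000.0)
    deleted = round(sample_rate * deleted_ms / 1000.0)
    if period <= 0 or not 0 <= deleted <= period:
        raise ValueError("need a positive period and 0 <= deleted_ms <= period_ms")
    kept = period - deleted
    # The sample's position inside its period decides keep vs. silence.
    return [x if (i % period) < kept else 0.0 for i, x in enumerate(samples)]


def exposed_ratio(period_ms: float, deleted_ms: float) -> float:
    """Fraction of the signal left audible (0.0 to 1.0)."""
    return (period_ms - deleted_ms) / period_ms


# Usage: one 100 ms period of a constant test signal at 16 kHz.
signal = [1.0] * 1600
out = delete_periodically(signal, 16000)
print(sum(out), exposed_ratio(100.0, 30.0))  # 1120.0 0.7
```

Parameterizing by period and deleted length, rather than by an arbitrary noise level, is what makes the exposed ratio directly controllable, which is the advantage the slide claims for DBM.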
Phonemic restoration
   interrupted speech combined with noise maskers
       the fence effect
       the speech signal is perceived as continuous
       may help human listening
       does not affect machine performance
   expected to enlarge the gap
       the performance difference between human and machine
   [Figure: deletion + phonemic restoration]




                                                                 8
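A sketch of deletion combined with a noise masker, assuming the masker is Gaussian white noise at a fixed RMS level placed in every deleted gap. The noise type, level, and function name are hypothetical; the slide says only that interrupted speech and noise maskers are combined.

```python
import random
from typing import List, Sequence


def delete_and_fill_with_noise(samples: Sequence[float], sample_rate: int,
                               period_ms: float = 100.0, deleted_ms: float = 30.0,
                               noise_rms: float = 0.1, seed: int = 0) -> List[float]:
    """Delete deleted_ms of every period_ms and fill each gap with white noise."""
    period = round(sample_rate * period_ms / 1000.0)
    deleted = round(sample_rate * deleted_ms / 1000.0)
    if period <= 0 or not 0 <= deleted <= period:
        raise ValueError("need a positive period and 0 <= deleted_ms <= period_ms")
    kept = period - deleted
    rng = random.Random(seed)  # seeded so the stimulus is reproducible
    out = []
    for i, x in enumerate(samples):
        if i % period < kept:
            out.append(x)  # speech passes through unchanged
        else:
            out.append(rng.gauss(0.0, noise_rms))  # masker replaces the deleted speech
    return out


# Usage: two 100 ms periods of a constant test signal at 16 kHz.
stimulus = delete_and_fill_with_noise([0.5] * 3200, 16000)
```

The speech portions are identical to the plain deletion case, so any change in human performance can be attributed to the masker filling the gaps.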
NASA-TLX evaluation
   mental workload
       rating on 6 subscales
            Mental Demand, Physical Demand, Temporal Demand,
             Frustration, Effort, and Performance
       range: 0-100
   weights of subscales (6 to 1)
       for each participant
       obtained by ranking the 6 dimensions
        by how they relate to the participant's
        personal definition of workload
   weighted workload (WWL)
       weighted mean of the 6 ratings

                                                 9
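A minimal sketch of the weighted workload, assuming WWL is the mean of the six 0-100 ratings weighted by the per-participant ranks 6 to 1 described on the slide (the original NASA-TLX procedure derives weights from 15 pairwise comparisons instead). The function name and the example ratings are hypothetical.

```python
from typing import Dict

SUBSCALES = ("Mental Demand", "Physical Demand", "Temporal Demand",
             "Performance", "Effort", "Frustration")


def weighted_workload(ratings: Dict[str, float], ranks: Dict[str, int]) -> float:
    """Rank-weighted mean of the six subscale ratings.

    ratings: 0-100 rating per subscale.
    ranks:   weight per subscale, 6 (most related to workload) down to 1.
    """
    if set(ratings) != set(SUBSCALES) or set(ranks) != set(SUBSCALES):
        raise ValueError("ratings and ranks must cover all six subscales")
    if sorted(ranks.values()) != [1, 2, 3, 4, 5, 6]:
        raise ValueError("ranks must be a permutation of 1..6")
    total_weight = sum(ranks.values())  # always 21 for ranks 1..6
    return sum(ratings[s] * ranks[s] for s in SUBSCALES) / total_weight


# Hypothetical participant, for illustration only.
ratings = {"Mental Demand": 80, "Physical Demand": 20, "Temporal Demand": 60,
           "Performance": 40, "Effort": 70, "Frustration": 30}
ranks = {"Mental Demand": 6, "Physical Demand": 1, "Temporal Demand": 4,
         "Performance": 3, "Effort": 5, "Frustration": 2}
wwl = weighted_workload(ratings, ranks)  # 1270 / 21, about 60.48
```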
Deletion vs Mixing (Exp1)
   objective: compare intelligibility and mental workload of
       the Deletion-Based Method (DBM)
       the Mixing-Based Method (MBM)
            and the effect of SNR (signal-to-noise ratio) in MBM
   human intelligibility test
       75 utterances: 3-, 4-, and 5-digit numbers (3 x 25)
            Japanese recorded speech
       subjects: 15 undergraduate students (3 groups x 5)
       mental workload (WWL) by NASA-TLX
            normalized within each subject
            so that the mean and SD become 50 and 10, respectively



                                                                  10
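The per-subject normalization (mean 50, SD 10) is a T-score transform. A sketch, assuming the population SD of each subject's own scores is used; the slide does not say whether the sample or population SD was applied, and the function name is hypothetical.

```python
import statistics
from typing import List, Sequence


def normalize_within_subject(scores: Sequence[float]) -> List[float]:
    """Rescale one subject's WWL scores to mean 50 and SD 10."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)  # population SD (assumption)
    if sd == 0:
        raise ValueError("scores must not all be identical")
    return [50.0 + 10.0 * (x - mean) / sd for x in scores]


# Usage with hypothetical raw WWL scores from one subject.
t_scores = normalize_within_subject([40.0, 55.0, 70.0, 85.0])
```

Normalizing within each subject removes differences in how individuals use the 0-100 rating range, so conditions can be compared across subjects.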
Setup (Exp1)
   DBM and MBM compared within each subject
       acoustic presentation: through headphones
            at the subject's preferred reference loudness level
   MBM disturbing signals
       utterances of Japanese sentences
        fragmented into short segments, shuffled, and combined

    Group   Trial 1 (D30)   Trial 2 (M0, Mm10, Mm20)
    G1      DBM 30%         MBM SNR 0 dB
    G2      DBM 30%         MBM SNR -10 dB
    G3      DBM 30%         MBM SNR -20 dB

                                                                   11
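A sketch of the MBM stimulus, assuming the disturbing signal is built by cutting utterances into fixed-length fragments, shuffling them, and concatenating them (the slide says only "shuffled and combined"), and that SNR is the usual ratio of mean signal power to mean noise power in dB. Fragment length and function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def make_babble(utterances: Sequence[Sequence[float]], fragment_len: int,
                seed: int = 0) -> List[float]:
    """Cut utterances into fragments of fragment_len samples, shuffle, concatenate."""
    fragments = [list(u[i:i + fragment_len])
                 for u in utterances
                 for i in range(0, len(u), fragment_len)]
    random.Random(seed).shuffle(fragments)
    return [x for frag in fragments for x in frag]


def power(signal: Sequence[float]) -> float:
    """Mean power of a signal."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech: Sequence[float], noise: Sequence[float],
               snr_db: float) -> List[float]:
    """Add noise scaled so that 10*log10(P_speech / P_noise) equals snr_db."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_s, p_n = power(speech), power(noise)
    if p_n == 0:
        raise ValueError("noise must not be silent")
    # gain**2 * p_n must equal p_s / 10**(snr_db / 10)
    gain = math.sqrt(p_s / (p_n * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]
```

For example, `mix_at_snr(speech, make_babble(sentences, 800), -10.0)` would produce a G2-style stimulus, where `sentences` is a hypothetical list of sample lists. Scaling only the noise keeps the digit speech at its original level.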
Performance (Exp1)
      DBM (T1): marginally significant difference (p < 0.1) (G1 > G2)
      the DBM 30% task is harder than MBM at 0 dB, -10 dB, and -20 dB
      MBM (T2): the effect of SNR condition is significant, but
      only between 0 dB and -10 dB (p < 0.05) (G1 > G2)
      [Figure: per-subject scores (%) in T1 and T2, y-axis 30-100; panels: DBM 30% vs MBM 0 dB (s101-s105), DBM 30% vs MBM -10 dB (s201-s205), DBM 30% vs MBM -20 dB (s301-s305)]
                                                                                                                   12
Workload (Exp1)
    WWL: individual differences cancelled
            by subtracting the DBM (D30) score
             from the MBM (M0, Mm10, Mm20) score
    [Figure: per-subject WWL difference (MBM minus DBM), y-axis -60 to 20; panels: DBM 30% vs MBM 0 dB (s101-s105), DBM 30% vs MBM -10 dB (s201-s205), DBM 30% vs MBM -20 dB (s301-s305)]
    WWL: MBM 0 dB < DBM 30%?
    no significant difference (ANOVA)
    MBM: task difficulty is not easy to control


                                                                                                                        13
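The slide reports no significant difference by ANOVA but does not state the design. As a sketch, assuming a one-way ANOVA on the per-subject WWL differences (MBM minus DBM) grouped by SNR condition, the F statistic can be computed as below; the p-value would then come from the F distribution with (df_between, df_within) degrees of freedom, for example via `scipy.stats.f.sf`. The function name and the example values are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def one_way_anova_f(*groups: Sequence[float]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA over independent groups."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_values)
    # Between-group variation: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: spread of values around their own group mean.
    ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    if df_within <= 0 or ss_within == 0:
        raise ValueError("not enough within-group variation to compute F")
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical WWL differences per group (G1: 0 dB, G2: -10 dB, G3: -20 dB).
f, df_b, df_w = one_way_anova_f([-12.0, 3.0, -5.0, 8.0, -20.0],
                                [-4.0, 10.0, -15.0, 2.0, 6.0],
                                [5.0, -8.0, 12.0, -3.0, 1.0])
```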
Human vs Machine (Exp2)
   the deletion-based method (DBM) is evaluated
   automatic speech recognition using HMMs
      task: numbers (1-7 digits) in Japanese
      training: 8440 utterances, 18 states, 20 mixtures
      evaluation: 1001 utterances, sentence recognition
   human intelligibility test
      75 utterances: 3-, 4-, and 5-digit numbers (3 x 25)
      subjects: 17 undergraduate students
      mental workload (WWL) by NASA-TLX
          normalized within each subject




                                                           14
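Sentence recognition on digit strings can be scored as the percentage of utterances whose whole number is recognized exactly. A sketch, assuming that scoring rule and whitespace-insensitive comparison of digit strings; the function name is hypothetical, and the slide does not give the exact scoring used for either the ASR or the human test.

```python
from typing import Sequence


def sentence_accuracy(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Percentage of utterances whose recognized digit string matches the reference."""
    if len(references) != len(hypotheses) or not references:
        raise ValueError("need equal-length, non-empty reference and hypothesis lists")
    # Ignore whitespace so "4 5 6" and "456" compare equal.
    correct = sum(
        "".join(ref.split()) == "".join(hyp.split())
        for ref, hyp in zip(references, hypotheses)
    )
    return 100.0 * correct / len(references)


# Usage: one digit wrong in the third utterance makes the whole utterance wrong.
acc = sentence_accuracy(["123", "4 5 6", "7890"], ["123", "456", "7891"])
```

Scoring whole utterances matches the CAPTCHA setting, where a single wrong digit fails the test.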
Results (Exp2)
   DBM: the Exposed Ratio can control the gap size
   [Figure, left: Human Ave. (%) and Machine (%) at DBM 30%, 50%, 70%; y-axis 30-100]
   [Figure, right: Workload at DBM 30%, 50%, 70%; y-axis 30-70]
   significant difference (p < 0.05)
   DBM 30%: the gap is very large; however, the workload is very high
                                                                                15
Conclusion
   an audio CAPTCHA task using phonemic restoration
       the deletion-based method (DBM)
   evaluation of CAPTCHA tasks
       performance + mental workload (NASA-TLX)
   comparison between DBM and MBM
       DBM: the task is easier to control
   future work
       ASR evaluation of the mixing-based method
       improve the noise
       investigation of phonemic restoration
            does it really improve performance, or only decrease workload?
       word familiarity, speech rate, synthesized speech, ...
                                                                       16
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 

Nishimoto icchp2010

  • 1. Evaluations of Deletion-Based Method and Mixing-Based Method for Audio CAPTCHAs
    Takuya NISHIMOTO (Univ. Tokyo, Japan), Takayuki WATANABE (TWCU, Japan), @nishimotz
  • 2. CAPTCHA
    - Completely Automated Public Turing test to tell Computers and Humans Apart
      - a popular security technique on the Web
        - prevents automated programs from abusing services
      - image-based CAPTCHAs
        - an image containing distorted characters
        - prevents persons with visual disabilities from using the service
      - audio CAPTCHAs were created as an alternative
    - goal: create better audio CAPTCHA tasks
      - safeness: the difference in recognition performance between humans and machines
      - usability: the mental workload of a human listening to the speech
  • 3. Performance gap model
    - the performance of the machine should be lower than the intelligibility for humans
    - gap: safeness
      - should be large
    - exposed ratio (ER)
      - 0%: random answer
        - chance level; no gap
      - 100%: best guess
        - easy for both; no gap
    - practical condition: 0 < ER < 100
    [Figure: intelligibility (%) of Human and ASR versus Exposed Ratio (%), i.e. the provided information, from 0 to 100]
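The gap model on this slide can be written as a single expression. The symbols are our own notation, not the authors': $I_H$ is human intelligibility and $A_M$ is machine (ASR) accuracy, both in percent and both treated as functions of the exposed ratio.

```latex
\begin{aligned}
G(\mathrm{ER}) &= I_H(\mathrm{ER}) - A_M(\mathrm{ER}), \qquad 0 \le \mathrm{ER} \le 100,\\
G(0) &\approx 0 \quad \text{(random answers: both at chance level)},\\
G(100) &\approx 0 \quad \text{(best guess: easy for both)}.
\end{aligned}
```

Under this reading, a safe task picks an ER strictly between the endpoints where $G$ is large.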
  • 4. Safeness: ER control
    - machines are becoming stronger
      - statistical ASR methods are the mainstream
      - supervised machine learning (Hidden Markov Models)
      - techniques to cope with noise
    - CAPTCHA tasks should be created systematically
      - not by trial and error
      - controllability of the Exposed Ratio is essential
    - Mixing-based method: the best way to control ER?
      - mixing noises / distorting signals
      - can hide a portion of the information, however...
      - the ER is difficult to measure, and performance is not easy to predict
      - alternatives must be investigated
  • 5. Usability: Mental workload
    - CAPTCHAs should not increase mental workload
    - the workload may increase if the task is...
      - difficult to listen to or memorize
      - long (many characters)
      - difficult to remember
    - safer, but higher mental workload
    - requirements
      - information can be obtained easily, in a short time
    - investigation required
      - human auditory sensation
      - language cognition
  • 6. Top-down knowledge
    - with an incomplete stimulus, knowledge helps to guess the missing information
    - visual sensation
      - if part of an image is missing, or part of a word is hidden, common knowledge about characters and vocabulary can complement the image
    - speech perception
      - if "word familiarity" is high, the word is easy to guess
      - phonemic restoration may help human listening
  • 7. Deletion-based method
    - delete some parts on the temporal axis, little by little
      - if 30 msec of every 100-msec period is replaced with silence, 30% of the information is deleted
      - as the ratio of remaining sections goes down, listening difficulty may increase
    - the Exposed Ratio can be controlled easily
    - however, the result is not easy to understand...
    [Figure: original and deleted speech waveforms, synthesized with the Festival engine, KAL voice (HMM-based)]
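A minimal sketch of the deletion step, assuming the signal is a plain list of float samples and that the silenced stretch sits at the start of each period (the slide does not say where inside the period it falls). The function names `delete_segments` and `exposed_ratio` are hypothetical, not from the authors' implementation.

```python
def delete_segments(samples, sample_rate, period_ms=100.0, deleted_ms=30.0):
    """Return a copy of `samples` in which the first `deleted_ms` of every
    `period_ms` window is replaced with silence (0.0).

    Assumption: the silence is placed at the start of each period.
    """
    if period_ms <= 0 or not 0.0 <= deleted_ms <= period_ms:
        raise ValueError("need period_ms > 0 and 0 <= deleted_ms <= period_ms")
    period = max(1, int(round(sample_rate * period_ms / 1000.0)))  # samples per period
    gap = int(round(sample_rate * deleted_ms / 1000.0))             # silenced samples per period
    out = list(samples)
    for start in range(0, len(out), period):
        for i in range(start, min(start + gap, len(out))):
            out[i] = 0.0
    return out


def exposed_ratio(period_ms, deleted_ms):
    """Fraction of each period that stays audible (1 - deletion ratio)."""
    return 1.0 - deleted_ms / period_ms


# Usage: 1 s of dummy "speech" at 16 kHz, 30 ms silenced per 100 ms.
speech = [0.5] * 16000
masked = delete_segments(speech, 16000)
print(masked.count(0.0))       # 4800 samples silenced (10 periods x 480)
print(exposed_ratio(100, 30))  # about 0.7
```

Because the deleted fraction is just `deleted_ms / period_ms`, the exposed ratio is set exactly by two parameters, which is the controllability the slide points to.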
  • 8. Phonemic restoration
    - interrupted speech combined with noise maskers
      - the "fence effect": continuity of the speech signal is perceived
    - may help human listening
    - does not affect machine performance
    - expected to enlarge the gap, i.e. the performance difference between human and machine
    [Figure: deletion + phonemic restoration]
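A sketch of the phonemic-restoration variant under stated assumptions: the deleted stretches are filled with uniform white noise rather than silence. The slide does not specify the masker type, level, or position, so the noise shape, `noise_level`, and the function name `fill_gaps_with_noise` are hypothetical.

```python
import random


def fill_gaps_with_noise(samples, sample_rate, period_ms=100.0, deleted_ms=30.0,
                         noise_level=0.3, seed=0):
    """Return a copy of `samples` in which the first `deleted_ms` of every
    `period_ms` window is replaced by uniform noise in [-noise_level, noise_level].

    Assumptions: white noise masker, gap at the start of each period.
    """
    if period_ms <= 0 or not 0.0 <= deleted_ms <= period_ms:
        raise ValueError("need period_ms > 0 and 0 <= deleted_ms <= period_ms")
    rng = random.Random(seed)  # fixed seed makes the stimulus reproducible
    period = max(1, int(round(sample_rate * period_ms / 1000.0)))
    gap = int(round(sample_rate * deleted_ms / 1000.0))
    out = list(samples)
    for start in range(0, len(out), period):
        for i in range(start, min(start + gap, len(out))):
            # The original speech sample is discarded; only noise remains here,
            # so no speech information is added back for a recognizer.
            out[i] = rng.uniform(-noise_level, noise_level)
    return out
```

The exposed ratio is identical to plain deletion; only what fills the gaps changes, which is why the masker is expected to help listeners without giving a recognizer more signal.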
  • 9. NASA-TLX evaluation
    - measures mental workload
    - ratings of 6 subscales
      - Mental, Physical, and Temporal Demands, Frustration, Effort, and Performance
      - range: 0-100
    - weights of subscales (6-1)
      - for each participant, the 6 dimensions are ranked by how closely they relate to his or her personal definition of workload
    - weighted workload (WWL)
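A sketch of the WWL computation as this slide describes it: rank weights from 6 (most relevant) down to 1, applied to the six 0-100 ratings. Dividing by the weight total (21) to keep the result on a 0-100 scale is our assumption; the slide does not give the normalizer. Note that the standard NASA-TLX procedure derives weights from 15 pairwise comparisons instead of a rank order. `weighted_workload` is a hypothetical name.

```python
SUBSCALES = ("Mental", "Physical", "Temporal", "Frustration", "Effort", "Performance")


def weighted_workload(ratings, ranking):
    """Weighted workload (WWL) from six 0-100 ratings and a rank order.

    ratings: dict mapping each subscale name to a rating in [0, 100].
    ranking: the six subscale names ordered from most to least relevant to
             the participant's own definition of workload.
    The top-ranked subscale gets weight 6 and the last gets weight 1.
    Assumption: the result is divided by the weight sum, so it stays in [0, 100].
    """
    if set(ratings) != set(SUBSCALES) or sorted(ranking) != sorted(SUBSCALES):
        raise ValueError("ratings and ranking must cover all six subscales exactly once")
    if any(not 0 <= r <= 100 for r in ratings.values()):
        raise ValueError("ratings must lie in [0, 100]")
    weights = {name: len(ranking) - i for i, name in enumerate(ranking)}  # 6, 5, ..., 1
    total = sum(weights.values())  # 21 for six subscales
    return sum(ratings[name] * weights[name] for name in SUBSCALES) / total
```

With this scheme a subscale the participant ranks first counts six times as much as the one ranked last.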
  • 10. Deletion vs Mixing (Exp1)
    - objective: compare intelligibility and mental workload of
      - Deletion-Based Method (DBM)
      - Mixing-Based Method (MBM)
      - effect of SNR (signal-to-noise ratio) in MBM
    - human intelligibility test
      - 75 utterances: 3-, 4-, and 5-digit numbers (3 x 25)
      - Japanese recorded speech
      - subjects: 15 undergraduate students (5 x 3)
    - mental workload (WWL) by NASA-TLX
      - normalized within each subject so that the average and SD become 50 and 10, respectively
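The within-subject normalization on this slide is a linear rescaling to mean 50 and SD 10 (a T-score). A minimal sketch, assuming the input is one subject's list of scores (for example WWL values across trials) and that the population SD is used; the slide does not say which SD. `normalize_within_subject` is a hypothetical name.

```python
from statistics import mean, pstdev


def normalize_within_subject(scores, target_mean=50.0, target_sd=10.0):
    """Linearly rescale one subject's scores to the given mean and SD.

    Assumption: population SD (pstdev); the sample SD would be equally plausible.
    """
    if not scores:
        raise ValueError("need at least one score")
    m = mean(scores)
    sd = pstdev(scores)
    if sd == 0:
        return [target_mean for _ in scores]  # all scores equal: nothing to spread
    return [target_mean + target_sd * (s - m) / sd for s in scores]
```

Rescaling each subject separately removes differences in how generously individuals use the 0-100 scale, so conditions can be compared across subjects.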
  • 11. Setup (Exp1)
    - DBM and MBM are compared within each person
    - acoustic presentation: via headphones, at the subject's preferred reference loudness level
    - MBM disturbing signals
      - utterances of Japanese sentences, fragmented into short periods, shuffled, and combined

      Group | Trial 1 (D30) | Trial 2 (M0, Mm10, Mm20)
      G1    | DBM 30%       | MBM SNR 0 dB
      G2    | DBM 30%       | MBM SNR -10 dB
      G3    | DBM 30%       | MBM SNR -20 dB
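A sketch of how an MBM stimulus might be built, under assumptions the slide leaves open: sentence recordings are cut into fixed-length fragments, shuffled, and concatenated end to end (the slide says only "combined"), and the result is added to the target speech with a gain chosen so that $10\log_{10}(P_\text{speech}/P_\text{noise})$ equals the requested SNR. All function names and the fragment length are hypothetical.

```python
import math
import random


def make_babble(sentences, fragment_len, length, seed=0):
    """Cut each sentence (a list of samples) into `fragment_len`-sample pieces,
    shuffle them, and concatenate until `length` samples are produced.

    Assumption: fragments are concatenated, not overlapped.
    """
    if fragment_len <= 0 or length < 0:
        raise ValueError("need fragment_len > 0 and length >= 0")
    fragments = [s[i:i + fragment_len]
                 for s in sentences for i in range(0, len(s), fragment_len)]
    if not fragments:
        raise ValueError("no audio to fragment")
    rng = random.Random(seed)
    out = []
    while len(out) < length:
        rng.shuffle(fragments)  # a new random order on every pass
        for frag in fragments:
            out.extend(frag)
    return out[:length]


def mean_power(x):
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so that 10*log10(P_speech / P_noise) == snr_db."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech, p_noise = mean_power(speech), mean_power(noise)
    if p_noise == 0:
        raise ValueError("noise is silent")
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]
```

At -10 dB or -20 dB the scaled babble carries 10x or 100x the power of the digits. Unlike deletion, SNR does not translate directly into how much of the target is exposed.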
  • 12. Performance (Exp1)
    - DBM (T1): marginally significant (p < 0.1) (G1 > G2)
      - the DBM 30% task is harder than MBM 0 dB, -10 dB, and -20 dB
    - MBM (T2): the effect of SNR condition is significant, but only between 0 dB and -10 dB (p < 0.05) (G1 > G2)
    [Figure: per-subject performance (%) in T1 and T2 for subjects s101-s305; groups DBM 30% vs MBM 0 dB, DBM 30% vs MBM -10 dB, DBM 30% vs MBM -20 dB]
  • 13. Workload (Exp1)
    - WWL: individual differences cancelled
      - the DBM (D30) score was subtracted from the MBM (M0, Mm10, Mm20) score
    - WWL: MBM 0 dB < DBM 30%?
      - no significant difference (ANOVA)
    - MBM: task difficulty is not easy to control
    [Figure: per-subject WWL difference (MBM minus DBM) for subjects s101-s305; groups DBM 30% vs MBM 0 dB, DBM 30% vs MBM -10 dB, DBM 30% vs MBM -20 dB]
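The subtraction on this slide pairs each subject's two trials. A small sketch, assuming normalized WWL scores keyed by subject ID; the function name and the numbers in the usage lines are illustrative only, not the experiment's data.

```python
def workload_differences(mbm_wwl, dbm_wwl):
    """Per-subject WWL difference (MBM minus DBM).

    Both arguments map subject IDs to normalized WWL scores. A negative value
    means the subject rated the MBM trial as less demanding than DBM 30%.
    """
    if mbm_wwl.keys() != dbm_wwl.keys():
        raise ValueError("both conditions must cover the same subjects")
    return {sid: mbm_wwl[sid] - dbm_wwl[sid] for sid in sorted(mbm_wwl)}


# Illustrative (made-up) scores for two subjects:
print(workload_differences({"s101": 45.0, "s102": 60.0},
                           {"s101": 55.0, "s102": 40.0}))
```

Working with within-subject differences is what lets the group comparison ignore each person's baseline level of perceived workload.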
  • 14. Human vs Machine (Exp2)
    - the deletion-based method (DBM) is evaluated
    - automatic speech recognition using HMMs
      - task: numbers (1-7 digits) in Japanese
      - training: 8440 utterances, 18 states, 20 mixtures
      - evaluation: 1001 utterances, sentence recognition
    - human intelligibility test
      - 75 utterances: 3-, 4-, and 5-digit numbers (3 x 25)
      - subjects: 17 undergraduate students
    - mental workload (WWL) by NASA-TLX
      - normalized within each subject
  • 15. Results (Exp2)
    - DBM: the Exposed Ratio can control the gap size
    - DBM 30%: the gap is very large; however, the workload is very high
    - significant difference (p < 0.05)
    [Figure: left, Human Ave. (%) and Machine (%) for DBM 30%, 50%, 70%; right, Workload for DBM 30%, 50%, 70%]
  • 16. Conclusion
    - audio CAPTCHA task using phonemic restoration
      - deletion-based method (DBM)
    - evaluation of CAPTCHA tasks
      - performance + mental workload (NASA-TLX)
    - comparison between DBM and MBM
      - DBM: the task is easier to control
    - future work
      - ASR evaluation of the mixing-based method
      - improve the noise
      - investigation of phonemic restoration
        - does it really improve performance, or only decrease workload?
        - word familiarity, speech rate, synthesized speech, ...