Results 1 - 20 of 55
1.
Neural Netw ; 139: 105-117, 2021 Jul.
Article in English | MEDLINE | ID: mdl-33684609

ABSTRACT

Recently, Deep Learning methodologies have gained significant attention for severity-based classification of dysarthric speech. Detecting dysarthria and quantifying its severity are of paramount importance in various real-life applications, such as assessing patients' progression in treatment, which includes adequate planning of their therapy, and improving speech-based interactive systems so that they handle pathologically affected voices automatically. Notably, current speech-powered tools often deal with short-duration speech segments and, consequently, are less effective with impaired speech, even when using Convolutional Neural Networks (CNNs). Thus, detecting dysarthria severity level from short speech segments could improve the performance and applicability of those systems. To achieve this goal, we propose a novel Residual Network (ResNet)-based technique that receives short-duration speech segments as input. Statistically meaningful objective analysis of our experiments, reported on the standard Universal Access corpus, shows average improvements of 21.35% in classification accuracy and 22.48% in F1-score over the baseline CNN. For additional comparison, tests with Gaussian Mixture Models and Light CNNs were also performed. Overall, the proposed ResNet approach reached 98.90% classification accuracy and a 98.00% F1-score, confirming its efficacy and supporting its practical applicability.
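The abstract does not spell out the architecture; below is a minimal sketch of the core idea — a residual (identity-shortcut) convolutional classifier over short spectrogram segments — written in PyTorch, with input shape, channel counts, and a four-level severity labeling all assumed purely for illustration, not taken from the paper.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with an identity shortcut."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1, self.bn2 = nn.BatchNorm2d(channels), nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # the shortcut is what makes the block "residual"

class SeverityNet(nn.Module):
    def __init__(self, n_classes=4):   # hypothetical: very low/low/medium/high
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        self.blocks = nn.Sequential(ResidualBlock(16), ResidualBlock(16))
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(16, n_classes))

    def forward(self, x):   # x: (batch, 1, mel_bins, frames)
        return self.head(self.blocks(self.stem(x)))

# A "short-duration segment" is simply a spectrogram patch with few time frames:
logits = SeverityNet()(torch.randn(8, 1, 64, 25))
```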


Subjects
Dysarthria/classification; Dysarthria/diagnosis; Neural Networks, Computer; Severity of Illness Index; Speech Recognition Software; Humans; Normal Distribution; Speech/physiology; Speech Recognition Software/standards; Time Factors
3.
Rev. Investig. Innov. Cienc. Salud ; 3(2): 98-118, 2021. ilus
Article in Spanish | LILACS, COLNAL | ID: biblio-1392911

ABSTRACT

Forensic acoustics is a criminalistics discipline that has reached an analytical maturity requiring the voice-analysis expert to acquire specialized knowledge in phonetics, sound technologies, speech, voice, language, speech and voice pathologies, and sound signal processing. When an expert opinion must be produced by a health professional unfamiliar with forensic technique, that professional encounters a lack of protocols, methods, and work procedures that would allow them to deliver a technically valid, validated report for conducting an interview and the subsequent comparative analysis of voices. This highlights the need to develop a methodological route or guide, through physical or electronic academic media, for the development of this knowledge and its professional and scientific dissemination.


Subjects
Speech Recognition Software; Voice Recognition; Voice; Voice Quality/physiology; Speech Recognition Software/standards; Dysarthria; Voice Recognition/physiology
4.
Curr Alzheimer Res ; 17(7): 658-666, 2020.
Article in English | MEDLINE | ID: mdl-33032509

ABSTRACT

BACKGROUND: Current conventional cognitive assessments are limited in their efficiency and sensitivity, often relying on a single score such as the total of correct items; multiple features of the response typically go uncaptured. OBJECTIVES: We aim to explore a new set of automatically derived features from the Digit Span (DS) task that address some of the drawbacks of conventional scoring and are also useful for distinguishing subjects with Mild Cognitive Impairment (MCI) from those with intact cognition. METHODS: Audio recordings of the DS tests administered to 85 subjects (22 MCI and 63 healthy controls, mean age 90.2 years) were transcribed using an Automatic Speech Recognition (ASR) system. Next, five correctness measures were generated from Levenshtein distance analysis of the responses: the numbers of correct, incorrect, deleted, inserted, and substituted words compared with the test item. These per-item features were aggregated across all test items for both the Forward Digit Span (FDS) and Backward Digit Span (BDS) tasks using summary statistical functions, constructing a global feature vector representing the detailed assessment of each subject's response. A support vector machine classifier distinguished MCI from cognitively intact participants. RESULTS: Conventional DS scores did not differentiate MCI participants from controls. The automated multi-feature DS-derived metric achieved an AUC-ROC of 73% with the SVM classifier, independent of additional clinical features (77% when combined with subjects' demographic features), well above the 50% chance level. CONCLUSION: Our analysis verifies the effectiveness of the introduced measures, derived solely from the DS task, in differentiating subjects with MCI from those with intact cognition.
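A sketch of the per-item correctness features described under METHODS, assuming test items and responses are plain digit strings; the backtrace of a standard Levenshtein alignment yields counts of correct, substituted, deleted, and inserted tokens. In the study, such per-item counts are then aggregated across FDS and BDS items into the feature vector given to the SVM (e.g., sklearn.svm.SVC). The helper name is hypothetical.

```python
# Count edit operations aligning a transcribed response against the target item.
def edit_op_counts(target, response):
    m, n = len(target), len(response)
    # dp[i][j] = edit distance between target[:i] and response[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1): dp[i][0] = i
    for j in range(n + 1): dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if target[i-1] == response[j-1] else 1
            dp[i][j] = min(dp[i-1][j] + 1,        # deletion
                           dp[i][j-1] + 1,        # insertion
                           dp[i-1][j-1] + cost)   # match / substitution
    # Backtrace to classify each operation.
    correct = subs = dels = ins = 0
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i-1][j-1] + (target[i-1] != response[j-1]):
            correct += target[i-1] == response[j-1]
            subs += target[i-1] != response[j-1]
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i-1][j] + 1:
            dels += 1; i -= 1
        else:
            ins += 1; j -= 1
    return correct, subs, dels, ins

print(edit_op_counts("851429", "85429"))   # -> (5, 0, 1, 0): one digit deleted
```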


Subjects
Cognitive Dysfunction/diagnosis; Cognitive Dysfunction/psychology; Diagnosis, Computer-Assisted/methods; Neuropsychological Tests; Proof of Concept Study; Speech Recognition Software; Aged; Aged, 80 and over; Cognitive Dysfunction/physiopathology; Diagnosis, Computer-Assisted/standards; Diagnosis, Differential; Female; Humans; Male; Neuropsychological Tests/standards; Speech Recognition Software/standards; Tape Recording/methods; Tape Recording/standards
5.
J Med Internet Res ; 22(6): e14827, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32442129

ABSTRACT

BACKGROUND: Recent advances in natural language processing and artificial intelligence have led to widespread adoption of speech recognition technologies. In consumer health applications, speech recognition is usually applied to support interactions with conversational agents for data collection, decision support, and patient monitoring. However, little is known about the use of speech recognition in consumer health applications, and few studies have evaluated the efficacy of conversational agents in the hands of consumers. In other consumer-facing tools, cognitive load has been observed to be an important factor affecting the use of speech recognition technologies in tasks involving problem solving and recall: users find it more difficult to think and speak at the same time than to type, point, and click. However, the effects of speech recognition on cognitive load when performing health tasks have not yet been explored. OBJECTIVE: The aim of this study was to evaluate the use of speech recognition for documentation in consumer digital health tasks involving problem solving and recall. METHODS: Fifty university staff and students were recruited to undertake four documentation tasks with a simulated conversational agent in a computer laboratory. The tasks varied in complexity, determined by the amount of problem solving and recall required (simple vs complex), and in input modality (speech recognition vs keyboard and mouse). Cognitive load, task completion time, error rate, and usability were measured. RESULTS: Compared with a keyboard and mouse, speech recognition significantly increased cognitive load for complex tasks (Z=-4.08, P<.001) and simple tasks (Z=-2.24, P=.03). Complex tasks took significantly longer to complete (Z=-2.52, P=.01), and speech recognition was found to be less usable overall than a keyboard and mouse (Z=-3.30, P=.001). However, there was no effect on errors. CONCLUSIONS: A keyboard and mouse was preferable to speech recognition for complex tasks involving problem solving and recall. Further studies using a broader variety of consumer digital health tasks of varying complexity are needed to identify the contexts in which speech recognition is most appropriate. The effects of cognitive load on task performance and its significance also need to be investigated.
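The Z values quoted in RESULTS are consistent with paired non-parametric comparisons; a minimal sketch of such a test on hypothetical per-participant load ratings follows (the study's actual instrument and data are not reproduced here).

```python
# Paired Wilcoxon signed-rank test on invented per-participant load scores.
from scipy import stats

speech   = [7.2, 6.8, 8.1, 5.9, 7.5, 6.4]   # load rating, speech modality
keyboard = [5.1, 5.6, 6.0, 4.8, 6.2, 5.0]   # same participants, keyboard/mouse

stat, p = stats.wilcoxon(speech, keyboard)
print(f"W={stat}, p={p:.3f}")   # small p suggests the modalities differ in load
```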


Subjects
Consumer Health Informatics/methods; Laboratories/standards; Problem Solving/physiology; Speech Recognition Software/standards; Adolescent; Adult; Female; Humans; Male; Middle Aged; Young Adult
6.
J Acoust Soc Am ; 146(3): 1615, 2019 09.
Article in English | MEDLINE | ID: mdl-31590492

ABSTRACT

Speech (syllable) rate estimation typically involves computing a feature contour from sub-band energies that has strong local maxima (peaks) at syllable nuclei, which are detected with the help of voicing decisions (VDs). While such a two-stage scheme works well in clean conditions, the estimated speech rate becomes less accurate in noisy conditions, particularly because of erroneous VDs and non-informative sub-bands at low signal-to-noise ratios (SNR). This work proposes a technique that uses VDs in the peak-detection strategy in an SNR-dependent manner. It also proposes a data-driven sub-band pruning technique to improve the syllabic peaks of the feature contour in the presence of noise. Further, the paper generalizes both the peak detection and the sub-band pruning for unknown noise and/or unknown SNR conditions. Experiments are performed in clean and 20, 10, and 0 dB SNR conditions using the Switchboard, TIMIT, and CTIMIT corpora under five additive noises: white, car, high-frequency-channel, cockpit, and babble. Experiments are also carried out at unseen SNRs of -5 and 5 dB with four unseen additive noises: factory, subway, street, and exhibition. The proposed method outperforms the best existing techniques in clean and noisy conditions on all three corpora.
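A sketch of the generic two-stage scheme the abstract starts from: a smoothed sub-band energy contour whose peaks approximate syllable nuclei. The band limits, smoothing window, and prominence threshold are illustrative assumptions; the paper's actual contributions (SNR-dependent use of voicing decisions and data-driven sub-band pruning) are not reproduced.

```python
import numpy as np
from scipy.signal import find_peaks, spectrogram

def syllable_rate(x, fs, band=(300, 2500)):
    f, t, S = spectrogram(x, fs, nperseg=int(0.025 * fs),
                          noverlap=int(0.015 * fs))
    sel = (f >= band[0]) & (f <= band[1])          # energy in a vowel-heavy band
    contour = S[sel].sum(axis=0)
    win = np.hanning(15)
    contour = np.convolve(contour, win / win.sum(), mode="same")  # smoothing
    peaks, _ = find_peaks(contour, prominence=0.1 * contour.max())
    return len(peaks) / (x.size / fs)              # estimated syllables per second

fs = 16000
x = np.random.randn(3 * fs)                        # stand-in for real speech
print(syllable_rate(x, fs))
```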


Subjects
Speech Recognition Software/standards; Signal-To-Noise Ratio; Speech Acoustics; Voice
7.
J Speech Lang Hear Res ; 62(7): 2203-2212, 2019 07 15.
Article in English | MEDLINE | ID: mdl-31200617

ABSTRACT

Purpose: The application of Mandarin Chinese electrolaryngeal (EL) speech for laryngectomees has been limited by drawbacks such as a single fundamental frequency, mechanical sound, and large radiation noise. To improve the intelligibility of Mandarin EL speech, a new perspective using an automatic speech recognition (ASR) system was proposed, which can convert EL speech into healthy speech when combined with text-to-speech. Method: An ASR system was designed to recognize EL speech based on a deep learning model, WaveNet, and the connectionist temporal classification (WaveNet-CTC). The system mainly consists of three parts: the acoustic model, the language model, and the decoding model. Acoustic features are extracted during speech preprocessing, and 3,230 utterances of EL speech mixed with 10,000 utterances of healthy speech were used to train the system. A comparative experiment was designed to evaluate the performance of the proposed method. Results: The proposed ASR system has higher stability and generalizability than traditional methods, performing better on Chinese characters, Chinese words, short sentences, and long sentences. Phoneme confusion occurs more easily in the stops and affricates of EL speech than in healthy speech. The highest accuracy of the ASR reached 83.24% when the 3,230 EL utterances were used for training. Conclusions: This study indicates that EL speech can be recognized effectively by an ASR system based on WaveNet-CTC, which has higher generalization performance and better stability than traditional methods, meaning that EL speech can be converted into healthy speech. Supplemental Material: https://doi.org/10.23641/asha.8250830.
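A sketch of the CTC objective that WaveNet-CTC relies on, which lets frame-level acoustic outputs be trained against unsegmented transcripts. A stand-in GRU encoder and hypothetical feature/label dimensions replace the paper's WaveNet; only the loss wiring is the point here.

```python
import torch
import torch.nn as nn

vocab_size = 30                        # hypothetical label set, blank at index 0
encoder = nn.GRU(input_size=40, hidden_size=64, batch_first=True)
proj = nn.Linear(64, vocab_size)
ctc = nn.CTCLoss(blank=0)

feats = torch.randn(4, 100, 40)        # (batch, frames, acoustic features)
out, _ = encoder(feats)
log_probs = proj(out).log_softmax(-1).transpose(0, 1)  # (frames, batch, vocab)

targets = torch.randint(1, vocab_size, (4, 12))        # label ids, no blanks
input_lens = torch.full((4,), 100, dtype=torch.long)
target_lens = torch.full((4,), 12, dtype=torch.long)

loss = ctc(log_probs, targets, input_lens, target_lens)
loss.backward()                        # gradients flow back into the encoder
print(float(loss))
```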


Subjects
Speech Intelligibility/physiology; Speech Recognition Software/standards; Speech, Alaryngeal; China; Deep Learning; Humans; Larynx, Artificial; Models, Theoretical; Phonetics; Speech Acoustics
8.
J Acoust Soc Am ; 145(3): 1493, 2019 03.
Article in English | MEDLINE | ID: mdl-31067946

ABSTRACT

The effects on speech intelligibility and sound quality of two noise-reduction algorithms were compared: a deep recurrent neural network (RNN) and spectral subtraction (SS). The RNN was trained using sentences spoken by a large number of talkers with a variety of accents, presented in babble. Different talkers were used for testing. Participants with mild-to-moderate hearing loss were tested. Stimuli were given frequency-dependent linear amplification to compensate for the individual hearing losses. A paired-comparison procedure was used to compare all possible combinations of three conditions. The conditions were: speech in babble with no processing (NP) or processed using the RNN or SS. In each trial, the same sentence was played twice using two different conditions. The participants indicated which one was better and by how much in terms of speech intelligibility and (in separate blocks) sound quality. Processing using the RNN was significantly preferred over NP and over SS processing for both subjective intelligibility and sound quality, although the magnitude of the preferences was small. SS processing was not significantly preferred over NP for either subjective intelligibility or sound quality. Objective computational measures of speech intelligibility predicted better intelligibility for RNN than for SS or NP.
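For reference, a minimal sketch of the SS baseline compared in this study: estimate the noise magnitude spectrum from frames assumed to be speech-free, subtract it from every frame, and floor the result to avoid negative magnitudes. The frame length, noise-frame count, and floor are illustrative, not the study's parameters.

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(x, fs, noise_frames=10, floor=0.02):
    f, t, X = stft(x, fs, nperseg=512)
    mag, phase = np.abs(X), np.angle(X)
    noise = mag[:, :noise_frames].mean(axis=1, keepdims=True)  # noise estimate
    clean = np.maximum(mag - noise, floor * mag)               # subtract, floor
    _, y = istft(clean * np.exp(1j * phase), fs, nperseg=512)  # keep noisy phase
    return y

fs = 16000
noisy = np.random.randn(2 * fs)        # stand-in for speech in babble
enhanced = spectral_subtraction(noisy, fs)
```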


Subjects
Speech Intelligibility; Speech Recognition Software/standards; Aged; Female; Hearing Aids/standards; Humans; Male; Middle Aged; Neural Networks, Computer; Speech Perception
9.
J Acoust Soc Am ; 145(3): 1640, 2019 03.
Article in English | MEDLINE | ID: mdl-31067961

ABSTRACT

Hearing-impaired persons, and particularly hearing-aid and cochlear implant (CI) users, often have difficulty communicating over the telephone. The intelligibility of classical, so-called narrowband telephone speech is considerably lower than that of face-to-face speech, partly because of the lack of visual cues, the limited telephone bandwidth, and background noise. This work proposes to artificially extend the standard bandwidth of telephone speech to improve its intelligibility for CI users. Artificial speech bandwidth extension (ABE) is obtained through a front-end signal processing algorithm that estimates missing speech components in the high-frequency spectrum from learned data. A state-of-the-art ABE approach, which has already led to superior speech quality for people with normal hearing, is used to process telephone speech for CI users. Two parameterizations are evaluated, one more aggressive than the other. Nine CI users were tested with and without the proposed ABE algorithm. The experimental evaluation shows a significant improvement in speech intelligibility and speech quality over the phone for both versions of the algorithm. These promising results support the potential of ABE, which could be incorporated into a commercial speech processor or a smartphone-based pre-processor that streams the telephone speech to the CI.
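A deliberately naive illustration of what "extending the bandwidth" means at signal level: spectral folding, which mirrors the narrowband spectrum into the 4-8 kHz band. The paper's ABE instead estimates the missing band from learned data; this sketch (function name and gain are invented) only shows the crude baseline idea of regenerating energy above the telephone band.

```python
import numpy as np

def spectral_folding_abe(x_nb, gain=0.25):
    """8 kHz narrowband input -> 16 kHz output with a mirrored high band."""
    n = x_nb.size
    X = np.fft.rfft(x_nb)                      # original 0-4 kHz band
    hi = gain * np.conj(X[:0:-1])              # attenuated mirror for 4-8 kHz
    X_wb = np.concatenate([X, hi])[: n + 1]    # rfft bins of a 2n-point signal
    return np.fft.irfft(X_wb, 2 * n)

x_nb = np.random.randn(8000)                   # stand-in: 1 s of 8 kHz speech
x_wb = spectral_folding_abe(x_nb)              # 16 kHz signal, synthetic highs
```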


Subjects
Cochlear Implants/standards; Speech Acoustics; Speech Intelligibility; Speech Recognition Software/standards; Telephone; Adult; Aged; Aged, 80 and over; Female; Humans; Male; Middle Aged
10.
J Acoust Soc Am ; 145(1): 338, 2019 01.
Article in English | MEDLINE | ID: mdl-30710939

ABSTRACT

This paper describes vision-referential speech enhancement of an audio signal using mask information captured as visual data. Smartphones and tablet devices have become popular in recent years, and most have not only a microphone but also a camera. Although the frame rate of the camera in such devices is very low compared with the sampling rate of the microphone's audio signal, the visual channel can still help enhance the speech signal if both signals are used adequately. In the proposed method, the speaker broadcasts not only his or her speech signal through a loudspeaker but also its mask information through a display. The receiver can enhance the speech by combining the speech signal captured by the microphone with the reference signal captured by the camera. Experiments evaluating the proposed method against a typical sparse approach confirmed that speech could be enhanced even with different kinds of noise and high levels of real noise in the environment. Further experiments assessed the sound quality of the proposed method against clean audio compressed in MP3 format at various bit rates; the sound quality was sufficient for practical application.
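A sketch of the final enhancement step the method implies: once mask information has been recovered from the camera, it is applied to the time-frequency representation of the microphone signal. The binary mask below is a stand-in built from the signal itself; recovering the real mask from the display via video is not shown.

```python
import numpy as np
from scipy.signal import stft, istft

fs = 16000
noisy = np.random.randn(fs)                    # stand-in microphone capture
f, t, X = stft(noisy, fs, nperseg=256)

# Hypothetical binary time-frequency mask (would come from the visual channel).
mask = (np.abs(X) > np.median(np.abs(X))).astype(float)
_, enhanced = istft(X * mask, fs, nperseg=256)
```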


Subjects
Image Processing, Computer-Assisted/methods; Natural Language Processing; Speech Recognition Software/standards; Adult; Female; Humans; Image Processing, Computer-Assisted/standards; Male; Signal-To-Noise Ratio; Speech Perception
11.
J Acoust Soc Am ; 145(1): 131, 2019 01.
Article in English | MEDLINE | ID: mdl-30710945

ABSTRACT

The automatic analysis of conversational audio remains difficult, in part, due to the presence of multiple talkers speaking in turns, often with significant intonation variations and overlapping speech. The majority of prior work on psychoacoustic speech analysis and system design has focused on single-talker speech or multi-talker speech with overlapping talkers (for example, the cocktail party effect). There has been much less focus on how listeners detect a change in talker or in probing the acoustic features significant in characterizing a talker's voice in conversational speech. This study examines human talker change detection (TCD) in multi-party speech utterances using a behavioral paradigm in which listeners indicate the moment of perceived talker change. Human reaction times in this task can be well-estimated by a model of the acoustic feature distance among speech segments before and after a change in talker, with estimation improving for models incorporating longer durations of speech prior to a talker change. Further, human performance is superior to several online and offline state-of-the-art machine TCD systems.
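A sketch of the distance model the abstract describes: summarize acoustic features over windows before and after each instant, and look for peaks in the distance between adjacent summaries. MFCCs and the Euclidean distance are assumed stand-ins for the study's feature set; the function name is invented.

```python
import numpy as np
import librosa

def change_curve(y, sr, win_frames=50):
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # (13, frames)
    d = []
    for i in range(win_frames, mfcc.shape[1] - win_frames):
        before = mfcc[:, i - win_frames:i].mean(axis=1)  # summary before i
        after = mfcc[:, i:i + win_frames].mean(axis=1)   # summary after i
        d.append(np.linalg.norm(after - before))
    return np.array(d)      # peaks suggest talker-change instants

y, sr = np.random.randn(4 * 16000), 16000    # stand-in for a two-talker clip
curve = change_curve(y, sr)
print(curve.argmax())                        # frame index of strongest change
```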


Subjects
Natural Language Processing; Speech Perception; Adult; Female; Humans; Male; Psychoacoustics; Speech Intelligibility; Speech Recognition Software/standards
12.
IEEE Trans Neural Netw Learn Syst ; 30(1): 138-150, 2019 01.
Article in English | MEDLINE | ID: mdl-29993561

ABSTRACT

Inspired by the behavior of humans talking in noisy environments, we propose an embodied embedded cognition approach to improve automatic speech recognition (ASR) systems for robots in challenging environments, such as with ego noise, using binaural sound source localization (SSL). The approach is verified by measuring the impact of SSL with a humanoid robot head on the performance of an ASR system. More specifically, a robot orients itself toward the angle where the signal-to-noise ratio (SNR) of speech is maximized for one microphone before doing an ASR task. First, a spiking neural network inspired by the midbrain auditory system based on our previous work is applied to calculate the sound signal angle. Then, a feedforward neural network is used to handle high levels of ego noise and reverberation in the signal. Finally, the sound signal is fed into an ASR system. For ASR, we use a system developed by our group and compare its performance with and without the support from SSL. We test our SSL and ASR systems on two humanoid platforms with different structural and material properties. With our approach we halve the sentence error rate with respect to the common downmixing of both channels. Surprisingly, the ASR performance is more than two times better when the angle between the humanoid head and the sound source allows sound waves to be reflected most intensely from the pinna to the ear microphone, rather than when sound waves arrive perpendicularly to the membrane.
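For context, a sketch of a classical binaural cue such a robot could exploit: the interaural time difference estimated by GCC-PHAT between the two ear microphones, mapped to an azimuth under a free-field approximation. The paper itself uses a spiking neural network for SSL; this only illustrates the underlying geometry, and the microphone spacing is an assumed value.

```python
import numpy as np

def gcc_phat_azimuth(left, right, fs, mic_dist=0.15, c=343.0):
    n = left.size + right.size
    X = np.fft.rfft(left, n) * np.conj(np.fft.rfft(right, n))
    cc = np.fft.irfft(X / (np.abs(X) + 1e-12), n)        # PHAT weighting
    max_shift = int(fs * mic_dist / c)                   # physically possible lags
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    tau = (np.argmax(cc) - max_shift) / fs               # ITD in seconds
    return np.degrees(np.arcsin(np.clip(tau * c / mic_dist, -1, 1)))

fs = 16000
sig = np.random.randn(fs)
print(gcc_phat_azimuth(sig, np.roll(sig, 3), fs))        # ~ a 3-sample lag
```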


Subjects
Biomimetics/methods; Robotics/methods; Sound Localization; Speech Perception; Speech Recognition Software; Biomimetics/standards; Humans; Robotics/standards; Sound Localization/physiology; Speech Perception/physiology; Speech Recognition Software/standards; Virtual Reality
13.
Int J Med Inform ; 121: 39-52, 2019 01.
Article in English | MEDLINE | ID: mdl-30545488

ABSTRACT

The overall purpose of automatic speech recognition systems is to enable interaction between humans and electronic devices through speech; for example, content captured from a user's speech with a microphone can be transcribed into text. In general, such systems should be able to overcome adversities such as noise, communication-channel variability, speaker age and accent, speech rate, concurrent speech from other speakers, and spontaneous speech. Against this challenging background, this study aims to develop a Web System Prototype to generate medical reports through automatic speech recognition in Brazilian Portuguese. The prototype was developed by applying a software engineering technique named Delivery in Stages; during its application, we integrated the Google Web Speech API and the Microsoft Bing Speech API into the prototype to increase the number of compatible platforms. These automatic speech recognition systems were individually evaluated on the task of transcribing a medical text dictated by 30 volunteers, with recognition performance measured by Word Error Rate. The Google system achieved an error rate of 12.30%, statistically significantly better (p-value <0.0001) than Microsoft's 17.68%. This work allowed us to conclude that both automatic speech recognition systems are compatible with the prototype and can be used in the medical field. The findings also suggest that, besides supporting the construction of medical reports, the Web System Prototype can be useful for purposes such as recording physicians' notes during a clinical procedure.
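A sketch of the Word Error Rate measure used to compare the two engines: the word-level Levenshtein distance between reference and hypothesis, normalized by the reference length. The example strings are invented.

```python
def wer(reference, hypothesis):
    r, h = reference.split(), hypothesis.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1): d[i][0] = i
    for j in range(len(h) + 1): d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            d[i][j] = min(d[i-1][j] + 1,                      # deletion
                          d[i][j-1] + 1,                      # insertion
                          d[i-1][j-1] + (r[i-1] != h[j-1]))   # substitution
    return d[len(r)][len(h)] / len(r)

# 1 substitution + 1 insertion over 4 reference words -> WER 0.5
print(wer("paciente apresenta dor abdominal",
          "paciente apresenta dor nas costas"))
```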


Subjects
Documentation/methods; Internet/statistics & numerical data; Medical Errors/prevention & control; Medical Records Systems, Computerized/standards; Software; Speech Recognition Software/standards; Speech/physiology; Adult; Brazil; Female; Humans; Male; Middle Aged; Young Adult
14.
JAMA Netw Open ; 1(3): e180530, 2018 07.
Article in English | MEDLINE | ID: mdl-30370424

ABSTRACT

IMPORTANCE: Accurate clinical documentation is critical to health care quality and safety. Dictation services supported by speech recognition (SR) technology and professional medical transcriptionists are widely used by US clinicians. However, the quality of SR-assisted documentation has not been thoroughly studied. OBJECTIVE: To identify and analyze errors at each stage of the SR-assisted dictation process. DESIGN, SETTING, AND PARTICIPANTS: This cross-sectional study collected a stratified random sample of 217 notes (83 office notes, 75 discharge summaries, and 59 operative notes) dictated by 144 physicians between January 1 and December 31, 2016, at 2 health care organizations using Dragon Medical 360 | eScription (Nuance). Errors were annotated in the SR engine-generated document (SR), the medical transcriptionist-edited document (MT), and the physician's signed note (SN). Each document was compared with a criterion standard created from the original audio recordings and medical record review. MAIN OUTCOMES AND MEASURES: Error rate; mean errors per document; error frequency by general type (eg, deletion), semantic type (eg, medication), and clinical significance; and variations by physician characteristics, note type, and institution. RESULTS: Among the 217 notes, there were 144 unique dictating physicians: 44 female (30.6%) and 10 of unknown sex (6.9%). Mean (SD) physician age was 52 (12.5) years (median [range] age, 54 [28-80] years). Among 121 physicians for whom specialty information was available (84.0%), 35 specialties were represented, including 45 surgeons (37.2%), 30 internists (24.8%), and 46 others (38.0%). The error rate in SR notes was 7.4% (ie, 7.4 errors per 100 words). It decreased to 0.4% after transcriptionist review and 0.3% in SNs. Overall, 96.3% of SR notes, 58.1% of MT notes, and 42.4% of SNs contained errors. Deletions were most common (34.7%), followed by insertions (27.0%). Among errors at the SR, MT, and SN stages, 15.8%, 26.9%, and 25.9%, respectively, involved clinical information, and 5.7%, 8.9%, and 6.4%, respectively, were clinically significant. Discharge summaries had higher mean SR error rates than other types (8.9% vs 6.6%; difference, 2.3%; 95% CI, 1.0%-3.6%; P < .001). Surgeons' SR notes had lower mean error rates than other physicians' (6.0% vs 8.1%; difference, 2.2%; 95% CI, 0.8%-3.5%; P = .002). One institution had a higher mean SR error rate (7.6% vs 6.6%; difference, 1.0%; 95% CI, -0.2% to 2.8%; P = .10) but lower mean MT and SN error rates (0.3% vs 0.7%; difference, -0.3%; 95% CI, -0.63% to -0.04%; P = .03 and 0.2% vs 0.6%; difference, -0.4%; 95% CI, -0.7% to -0.2%; P = .003). CONCLUSIONS AND RELEVANCE: Seven in 100 words in SR-generated documents contain errors, and many errors involve clinical information. That most errors are corrected before notes are signed demonstrates the importance of manual review, quality assurance, and auditing.


Subjects
Medical Errors/statistics & numerical data; Medical Records/statistics & numerical data; Medical Records/standards; Speech Recognition Software/statistics & numerical data; Speech Recognition Software/standards; Adult; Aged; Aged, 80 and over; Boston; Clinical Audit; Colorado; Cross-Sectional Studies; Female; Humans; Male; Medical Records Systems, Computerized; Middle Aged; Physicians
15.
J Digit Imaging ; 31(5): 615-621, 2018 10.
Article in English | MEDLINE | ID: mdl-29713836

ABSTRACT

The aim of this study was to retrospectively analyze the influence of different acoustic and language models in order to determine the effects most important to the clinical performance of a non-commercial, radiology-oriented automatic speech recognition (ASR) system for the Estonian language. The ASR system was developed using open-source software components (the Kaldi toolkit and Thrax) and trained with real radiology text reports and dictations collected during the development phases. The final version of the system was tested by 11 radiologists who dictated 219 reports in total, in a spontaneous manner in a real clinical environment. The audio files collected in this final phase were used to measure the performance of earlier system versions retrospectively. Versions were evaluated by word error rate (WER) for each speaker and modality, and by the WER difference between the first and last versions. The total average WER across all material improved from 18.4% for the first version (v1) to 5.8% for the last version (v8), a relative improvement of 68.5%. WER improvement was strongly related to modality and radiologist. In summary, the performance of the final version was close to optimal, delivering similar results across all modalities and being independent of the user, the complexity of the radiology reports, user experience, and speech characteristics.
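A worked check of the reported relative improvement between the first and last system versions:

```python
# Relative WER improvement = (WER_v1 - WER_v8) / WER_v1
wer_v1, wer_v8 = 18.4, 5.8
print(f"{(wer_v1 - wer_v8) / wer_v1 * 100:.1f}%")   # 68.5%, matching the abstract
```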


Subjects
Language; Radiology; Speech Recognition Software/standards; Estonia; Humans; Reproducibility of Results; Retrospective Studies
16.
Behav Res Methods ; 50(6): 2597-2605, 2018 12.
Article in English | MEDLINE | ID: mdl-29687235

ABSTRACT

Verbal responses are a convenient and naturalistic way for participants to provide data in psychological experiments (Salzinger, The Journal of General Psychology, 61(1), 65-94, 1959). However, audio recordings of verbal responses typically require additional processing, such as transcription into text, compared with other behavioral response modalities (e.g., typed responses or button presses). The transcription process is often tedious and time-intensive, requiring human listeners to manually examine each moment of recorded speech. Here we evaluate the performance of a state-of-the-art speech recognition algorithm (Halpern et al., 2016) in transcribing audio data into text during a list-learning experiment. Transcripts made by human annotators and the computer-generated transcripts matched to a high degree and exhibited similar statistical properties in terms of the recall performance and recall dynamics they captured. This proof-of-concept study suggests that speech-to-text engines could provide a cheap, reliable, and rapid means of automatically transcribing speech data in psychological experiments. Our findings also open the door to verbal-response experiments that scale to thousands of participants (e.g., administered online), as well as a new generation of experiments that decode speech on the fly and adapt experimental parameters based on participants' prior responses.
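A sketch of how recall performance might be scored from either transcript, with an invented study list and transcript; real pipelines would also have to handle ASR confusions, repetitions, and intrusions.

```python
study_list = ["apple", "river", "candle", "tiger", "violin"]
transcript = "ok i remember apple and um violin also tiger"

# Recalled items are transcript tokens matching the study list;
# their order in the transcript preserves the recall dynamics.
recalled = [w for w in transcript.split() if w in study_list]
print(recalled)                          # ['apple', 'violin', 'tiger']
print(len(recalled) / len(study_list))   # proportion recalled: 0.6
```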


Subjects
Behavioral Research/methods; Behavioral Research/standards; Mental Recall; Speech Recognition Software/standards; Speech; Adolescent; Female; Humans; Male; Young Adult
17.
J Med Syst ; 42(5): 89, 2018 Apr 03.
Article in English | MEDLINE | ID: mdl-29610981

ABSTRACT

Speech recognition is increasingly used in medical reporting. The aim of this article is to identify, from the literature, the strengths and weaknesses of this technology, as well as barriers to and facilitators of its implementation. A systematic review of systematic reviews was performed using PubMed, Scopus, the Cochrane Library, and the Centre for Reviews and Dissemination through August 2017; gray literature was also consulted. The quality of the systematic reviews was assessed with the AMSTAR checklist. The main inclusion criterion was use of speech recognition for medical reporting (front-end or back-end). A survey was also conducted in Quebec, Canada, to identify the dissemination of this technology in the province, as well as the factors leading to the success or failure of its implementation. Five systematic reviews were identified; they indicated a high level of heterogeneity across studies, and the quality of the reported studies was generally poor. Speech recognition is not as accurate as human transcription, but it can dramatically reduce turnaround times for reporting. In front-end use, physicians need to spend more time on dictation and correction than with human transcription, and major errors occur up to three times more frequently with speech recognition. In back-end use, a potential increase in transcriptionist productivity was noted. In conclusion, speech recognition offers several advantages for medical reporting, but these advantages are countered by an increased burden on physicians and by risks of additional errors in medical reports. It also remains hard to identify the medical specialties and clinical activities for which speech recognition will be most beneficial.


Subjects
Medical Records/standards; Speech Recognition Software/standards; Humans; Quebec
19.
Health Informatics J ; 23(1): 3-13, 2017 03.
Article in English | MEDLINE | ID: mdl-26635322

ABSTRACT

Speech recognition software can increase the frequency of errors in radiology reports, which may affect patient care. We retrieved 213,977 speech recognition software-generated reports from 147 different radiologists and proofread them for errors. Errors were classified as "material" if they were believed to alter interpretation of the report; "immaterial" errors were subclassified as intrusion/omission or spelling errors. The proportion of errors and error type were compared among individual radiologists, imaging subspecialties, and time periods. In all, 20,759 reports (9.7%) contained errors, of which 3992 (1.9%) were material errors. Among immaterial errors, spelling errors were more common than intrusion/omission errors (p < .001). The proportion of errors and the fraction of material errors varied significantly among radiologists and between imaging subspecialties (p < .001). Errors were more common in cross-sectional reports, reports reinterpreting results of outside examinations, and procedural studies (all p < .001). The error rate decreased over time (p < .001), which suggests that a quality control program with regular feedback may reduce errors.
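A sketch of the kind of proportion comparison the study reports, as a chi-square test on hypothetical error counts for two imaging subspecialties (the counts below are invented, not the study's data).

```python
from scipy.stats import chi2_contingency

#        with errors, without errors
table = [[1200,  9800],    # subspecialty A reports
         [ 900, 12100]]    # subspecialty B reports

chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2={chi2:.1f}, p={p:.2g}")   # small p => error proportions differ
```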


Subjects
Radiology Information Systems/standards; Research Design/statistics & numerical data; Research Report/standards; Semantics; Speech Recognition Software/standards; Cross-Sectional Studies; Documentation/methods; Documentation/standards; Documentation/statistics & numerical data; Humans; Radiologists/standards; Radiologists/statistics & numerical data; Radiology Information Systems/statistics & numerical data; Retrospective Studies; Speech Recognition Software/statistics & numerical data
20.
BMC Med Inform Decis Mak ; 16(1): 132, 2016 10 18.
Article in English | MEDLINE | ID: mdl-27756284

ABSTRACT

BACKGROUND: Speech recognition software might increase productivity in clinical documentation; however, low user satisfaction with such software has been observed. In this case study, an approach for implementing a speech recognition software package at a university-based outpatient department is presented. METHODS: Methods to create a dictionary specific to the context of sports medicine and a shared vocabulary-learning function are demonstrated. The approach is evaluated for user satisfaction (using a questionnaire before and 10 weeks after software implementation) and for its impact on the time until the final medical document was saved into the system. RESULTS: User satisfaction was not substantially impaired by the implementation, and the median time until the final medical document was saved fell from 8 to 4 days. CONCLUSION: This case study illustrates how speech recognition can be implemented successfully when the user experience is emphasised.


Subjects
Hospital Departments/methods; Medical Informatics Applications; Outpatients; Speech Recognition Software/standards; Humans