Search | Global Index Medicus

Analysis on Applicability of Common Chinese Word Segmentation Software in Literature Study of Traditional Chinese Medicine Text / 世界科学技术-中医药现代化

Haifeng YANG; Mingliang CHEN; Zhen ZHAO.

World Science and Technology-Modernization of Traditional Chinese Medicine ; (12): 536-541, 2017.

Article in Chinese | WPRIM | ID: wpr-609189

ABSTRACT

This study aimed to evaluate the applicability of common Chinese word segmentation software to the literature study of traditional Chinese medicine (TCM) texts, in order to propose ideas for developing specialized TCM text word segmentation software. Chinese word segmentation software was installed and run, and a text segmentation experiment was conducted on TCM text samples. The programs were compared on segmentation accuracy, speed, operability, reliability, extendibility, portability and other characteristics. The results showed differences among the programs in accuracy, speed, operability, reliability, extendibility and portability, and no single program achieved the best performance in every aspect. Among the programs compared, Pan-Gu Segment showed the best accuracy, with good operability and high segmentation efficiency, making it the most suitable for segmenting TCM text. It was concluded that developing specialized TCM text segmentation software may be the best way to meet the text segmentation requirements of TCM literature study. Basic research should be strengthened in areas such as the construction of a standard TCM corpus, the completion of a TCM dictionary base, the introduction, optimization and innovation of word segmentation algorithms, and the development of word segmentation software for TCM text.
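The abstract compares segmentation accuracy across programs but does not state the metric used. A common choice in Chinese word segmentation evaluation is word-level precision, recall and F1, where a predicted word counts as correct only if both of its boundaries match a word in a hand-segmented gold standard. The sketch below assumes that metric; the function names and the sample phrase are hypothetical illustrations, not taken from the study.

```python
from typing import List, Set, Tuple


def to_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Convert a list of words into a set of (start, end) character offsets."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def segmentation_prf(gold: List[str], predicted: List[str]) -> Tuple[float, float, float]:
    """Word-level precision, recall and F1 of a predicted segmentation.

    A predicted word is correct only if its (start, end) span also appears
    in the gold segmentation. Both segmentations must cover the same text.
    """
    if "".join(gold) != "".join(predicted):
        raise ValueError("gold and predicted segmentations must cover the same text")
    gold_spans = to_spans(gold)
    pred_spans = to_spans(predicted)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical example: the gold standard keeps 气虚 (qi deficiency) as one word,
    # while a general-purpose segmenter splits it into 气 / 虚.
    gold = ["脾胃", "气虚"]
    predicted = ["脾胃", "气", "虚"]
    print([round(x, 3) for x in segmentation_prf(gold, predicted)])  # [0.333, 0.5, 0.4]
```

Comparing character-offset spans rather than word strings keeps a repeated word at two different positions from being counted as correct twice.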

Study on Automatic Word Segmentation for Traditional Chinese Medical Record Literature / 中国中医药信息杂志

Fan ZHANG; Xiaofeng LIU; Yan SUN.

Chinese Journal of Information on Traditional Chinese Medicine ; (12): 38-41, 2015.

Article in Chinese | WPRIM | ID: wpr-462563

ABSTRACT

Objective To study an automatic word segmentation scheme suitable for traditional Chinese medical record literature. Methods A Hierarchical Hidden Markov Model was used as the segmentation model. A total of 300 ancient and 300 modern medical records were used as experimental subjects to build a traditional Chinese medicine dictionary and a test corpus, which were then used to segment the texts and evaluate the results. Results Without the traditional Chinese medicine dictionary, word segmentation accuracy was about 75% for both kinds of medical record literature, and part-of-speech tagging accuracy was 56.74% for ancient and 64.81% for modern medical record literature. With the traditional Chinese medicine dictionary, word segmentation accuracy was 90.73% for ancient and 95.66% for modern medical record literature; part-of-speech tagging accuracy was 78.47% for ancient and 91.45% for modern medical record literature, the latter being clearly higher. Conclusion The current scheme has preliminarily solved word segmentation for traditional Chinese medical record literature and part-of-speech tagging for modern medical record literature, where tagging is basically correct. Part-of-speech tagging of ancient medical record literature is affected by many factors and needs further study.
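The study's Hierarchical Hidden Markov Model is not reproduced here. To illustrate only why adding a domain dictionary can change segmentation so sharply, the sketch below uses a different and much simpler technique, forward maximum matching (FMM): at each position it greedily takes the longest dictionary word, falling back to a single character. The two word lists are small hypothetical examples, and the sketch does not attempt to reproduce the accuracies reported above.

```python
from typing import List, Set


def forward_max_match(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Greedy left-to-right segmentation (forward maximum matching).

    At each position, take the longest substring of up to max_len characters
    that is in the dictionary; if none matches, emit a single character.
    """
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


if __name__ == "__main__":
    # Hypothetical word lists: a tiny "general" dictionary and a few TCM terms.
    general = {"主治", "脾胃"}
    tcm_terms = {"四君子汤", "气虚"}
    text = "四君子汤主治脾胃气虚"
    print(forward_max_match(text, general))
    # ['四', '君', '子', '汤', '主治', '脾胃', '气', '虚']
    print(forward_max_match(text, general | tcm_terms))
    # ['四君子汤', '主治', '脾胃', '气虚']
```

Without the TCM terms, the formula name and 气虚 fall apart into single characters; with them, both are kept whole. Statistical models such as the one in the study also benefit from a domain dictionary, though through a different mechanism.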

La separación entre palabras en la escritura de niños que inician la escolaridad primaria / Word separation in the writing of children beginning primary school

Querejeta, Maira; Piacente, Telma; Guerrero Ortiz-Hernán, Bárbara; Alva Canto, Elda Alicia.

Interdisciplinaria ; 30(1): 45-64, Jan.-Jul. 2013. tab

Article in Spanish | LILACS | ID: lil-708511

ABSTRACT

In conventional writing of sentences and texts, words are separated by blank spaces. However, hypo-segmentations and hyper-segmentations between words are often observed at the beginning of literacy. Research on the topic is scarce and has been limited to recording their occurrence, without establishing all the types of joinings or separations. Accordingly, using a descriptive, cross-sectional, correlational design, the purpose of this work was to characterize the type and frequency of unconventional separations between words that appear during the first year of schooling in Argentinian and Mexican children facing different achievement expectations. A second purpose was to examine the possible relationships between these separations and reading performance and vocabulary level. To this end, 30 Argentinian and Mexican children aged 6 to 7 years were assessed with writing, reading and vocabulary tests. The linguistic analysis of the children's productions used the grammatical notion of morphological formant, which made it possible to characterize the full set and distribution of unconventional segmentations. The most significant results provide empirical evidence of such segmentations at the initial levels of learning and of a similar distribution of error types, regardless of nationality and literacy requirements. It is concluded, within the limitations imposed by the number of children included, that these segmentations appear to constitute a developmental stage, to be considered in future research and in strategies for teaching the writing of sentences and texts.

In conventional writing of sentences and texts, words are separated by blank spaces. However, hypo-segmentations and hyper-segmentations between words are often observed at the beginning of literacy. Hypo-segmentation occurs when the separation between words is missing, whereas hyper-segmentation occurs when a word is incorrectly split by a space inserted between two of its elements. Research on this topic is scarce; it has recorded arbitrary joinings and separations in both content and function words, but has failed to account for all of the phenomena observed in children's writing. This paper has two aims. The first is to characterize the type and frequency of unconventional separations between words written at the end of the first year of primary school by Mexican and Argentinian children, given the different literacy approaches of the two countries. The expectations prescribed in the two countries' curricula differ: Mexico expected children to learn to read and write words and sentences by the end of preschool, whereas in Argentina those expectations belonged to the first cycle of primary school. The teaching methods to which the screened children were exposed cannot be characterized exhaustively, since teachers often introduce variants, but they were predominantly phonics-based in Mexico and derived from the psychogenesis of written language in Argentina. The second aim is to identify possible connections between these phenomena and variables such as reading, writing and vocabulary performance. To this end, thirty Argentinian and Mexican children aged six to seven years (mean age: six years and nine months) were assessed with writing (both spontaneous and dictated), reading and vocabulary tests.
The notion of morphological formants was used for the linguistic analysis of the children's productions, which allowed every unconventional segmentation that came up to be categorized. Among the results, we would first highlight that the proposed classification proved extremely useful, and more adequate than classifications based on function words versus content words. Second, unconventional separations were observed more frequently in dictation. Third, Mexican children obtained higher scores when reading words and pseudo-words, which indicates better control of the alphabetic principle than the Argentinian children. Fourth, in agreement with other research, hypo-segmentation appeared more frequently than hyper-segmentation, with both groups displaying similar characteristics. On the one hand, the average error scores showed no significant differences by nationality. On the other hand, although most hypo-segmentations joined a lexical and a grammatical formant, in other cases only lexical or only grammatical formants were joined, which again shows how useful the categorization was. Finally, correlations with word reading and writing, reading comprehension and vocabulary were low. Within the limits of the sample size, this suggests that incorrect segmentation goes beyond the children's performance on these variables and appears to be a developmental stage in the progressive control of reflection upon, and conscious manipulation of, lexical items. It would be worthwhile to extend the study to a larger number of participants and to carry out longitudinal studies that would clarify how the phenomena examined evolve throughout the schooling period.
It should be pointed out that unconventional word separations are not always detected at school and that, unless appropriate intervention strategies are used, their persistence may later affect text production at more advanced schooling levels.
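The definitions above can be made concrete by comparing word-boundary positions in a child's text with those in the conventional target: a conventional boundary the child omitted is a hypo-segmentation, and an extra boundary inside a word is a hyper-segmentation. The sketch below is a hypothetical illustration that simply counts boundaries; it is not the study's morphological-formant classification, and it assumes the child's spelling matches the target apart from spacing.

```python
from typing import Set, Tuple


def boundary_offsets(sentence: str) -> Tuple[str, Set[int]]:
    """Return the letters of a sentence (spaces removed) and the letter
    offsets at which a word boundary (space) occurs."""
    words = sentence.split()
    letters = "".join(words)
    offsets = set()
    position = 0
    for word in words[:-1]:
        position += len(word)
        offsets.add(position)
    return letters, offsets


def count_unconventional_separations(target: str, written: str) -> Tuple[int, int]:
    """Count (hypo, hyper) segmentations of `written` relative to `target`.

    hypo  = conventional boundaries the writer omitted (words joined)
    hyper = boundaries the writer inserted inside a word (word split)
    Assumes both texts have the same letters apart from spacing.
    """
    target_letters, target_bounds = boundary_offsets(target)
    written_letters, written_bounds = boundary_offsets(written)
    if target_letters != written_letters:
        raise ValueError("texts differ in more than spacing")
    hypo = len(target_bounds - written_bounds)
    hyper = len(written_bounds - target_bounds)
    return hypo, hyper


if __name__ == "__main__":
    # Hypothetical dictation: "la casa" written joined, "grande" written split.
    print(count_unconventional_separations("la casa es grande", "lacasa es gran de"))  # (1, 1)
```

Because this counts boundaries rather than words, joining three words counts as two hypo-segmentations; a formant-based analysis like the study's would additionally record which kinds of formants were joined or split.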

Research of Hospital Infection Control Information System based on Word Segmentation / 中华医院感染学杂志

Jinchang LENG; Kun PENG; Ming WU; Yubin XING.

Chinese Journal of Nosocomiology ; (24)2009.

Article in Chinese | WPRIM | ID: wpr-596341

ABSTRACT

OBJECTIVE To further study the principles and risk factors of infection outbreaks in hospitals, in order to reduce infection rates and provide all levels of management with infection information. METHODS After collecting hospital infection data and analyzing the data structure of the hospital information system, we developed a nosocomial infection information management system using word segmentation. RESULTS The subsystem could monitor infection in all inpatients and collect and analyze data on infection examinations, infected patients, antibiotic usage and interventional procedures, thereby relieving information administrators of data collection work and allowing them to focus on analysis, guidance and problem solving. CONCLUSIONS As part of the hospital information system, the nosocomial infection information management system can improve the effectiveness and efficiency of hospital infection administrators owing to its accuracy, timeliness and wide coverage.
