1.
Heliyon ; 10(7): e27886, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38590855

ABSTRACT

Intuitionistic fuzzy hypersoft sets (IFHSSs) are a novel model designed to address a limitation of intuitionistic fuzzy soft sets (IFSSs): they lack a multi-argument domain for approximating the parameters under consideration. IFHSSs are more flexible and reliable because they further classify parameters into their relevant parametric-valued sets. In this paper, we propose trigonometric (cosine and cotangent) similarity measures (SMs) for IFHSSs, together with their weighted versions. These measures quantify the similarity between different factors. To evaluate the validity of the study and apply the results to a daily-life problem, we use them to solve a problem involving the selection of renewable energy sources. Based on several technical contributing factors, the analysis identifies the ideal location for implementing the energy production units. Future case studies with many features, additional bifurcation, and multiple decision-makers can use the suggested methodologies. The suggested method can also be used with several existing structures, such as fuzzy, Pythagorean fuzzy, and neutrosophic theories.
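
The abstract does not give the IFHSS formulas, so the following is only a minimal sketch of a cosine-type similarity for ordinary intuitionistic fuzzy sets, where each element is a (membership, non-membership) pair. The function name, the optional weights, and the treatment of all-zero pairs are assumptions made for illustration, not the authors' definitions.

```python
import math


def ifs_cosine_similarity(a, b, weights=None):
    """Cosine-type similarity between two intuitionistic fuzzy sets.

    a, b: equal-length lists of (membership, non_membership) pairs.
    weights: optional non-negative weights that sum to 1 (assumption);
             uniform weights are used when omitted.
    """
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    if weights is None:
        weights = [1.0 / len(a)] * len(a)
    total = 0.0
    for w, (mu_a, nu_a), (mu_b, nu_b) in zip(weights, a, b):
        norm_a = math.hypot(mu_a, nu_a)
        norm_b = math.hypot(mu_b, nu_b)
        if norm_a == 0.0 or norm_b == 0.0:
            continue  # term is undefined; counted as 0 here (assumption)
        total += w * (mu_a * mu_b + nu_a * nu_b) / (norm_a * norm_b)
    return total


# Two hypothetical candidate sites rated on two criteria.
site_1 = [(0.6, 0.3), (0.2, 0.7)]
site_2 = [(0.5, 0.4), (0.3, 0.6)]
score = ifs_cosine_similarity(site_1, site_2)
```

In a selection problem of the kind described, each alternative would be compared against an ideal profile and the alternative with the highest score preferred.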

2.
Brief Bioinform ; 24(5)2023 09 20.
Article in English | MEDLINE | ID: mdl-37544658

ABSTRACT

MOTIVATION: Recent advances in spatially resolved transcriptomics (ST) technologies enable the measurement of gene expression profiles while preserving cellular spatial context. Linking gene expression of cells with their spatial distribution is essential for better understanding of tissue microenvironment and biological progress. However, effectively combining gene expression data with spatial information to identify spatial domains remains challenging. RESULTS: To deal with the above issue, in this paper, we propose a novel unsupervised learning framework named STMGCN for identifying spatial domains using multi-view graph convolution networks (MGCNs). Specifically, to fully exploit spatial information, we first construct multiple neighbor graphs (views) with different similarity measures based on the spatial coordinates. Then, STMGCN learns multiple view-specific embeddings by combining gene expressions with each neighbor graph through graph convolution networks. Finally, to capture the importance of different graphs, we further introduce an attention mechanism to adaptively fuse view-specific embeddings and thus derive the final spot embedding. STMGCN allows for the effective utilization of spatial context to enhance the expressive power of the latent embeddings with multiple graph convolutions. We apply STMGCN on two simulation datasets and five real spatial transcriptomics datasets with different resolutions across distinct platforms. The experimental results demonstrate that STMGCN obtains competitive results in spatial domain identification compared with five state-of-the-art methods, including spatial and non-spatial alternatives. Besides, STMGCN can detect spatially variable genes with enriched expression patterns in the identified domains. Overall, STMGCN is a powerful and efficient computational framework for identifying spatial domains in spatial transcriptomics data.


Subjects
Gene Expression Profiling , Transcriptome , Computer Simulation
3.
Soft comput ; : 1-27, 2023 Apr 04.
Article in English | MEDLINE | ID: mdl-37362267

ABSTRACT

Locating the propagation source is one of the most important strategies for controlling harmful diffusion processes on complex networks. Most existing methods consider only the infection time information of the observers and ignore their diffusion direction information, which is also helpful for locating the source. In this paper, we consider both the diffusion direction information and the infection time information to locate the source. We introduce a relaxed direction-induced search (DIS) that uses the diffusion direction information of the observers to approximate the actual diffusion tree on a network. Based on the relaxed DIS, we further use the infection time information of the observers to define two kinds of observer-based similarity measures: the Infection Time Similarity and the Infection Time Order Similarity. With these two similarity measures and the relaxed DIS, a novel source locating method is proposed. We validate the performance of the proposed method on a series of synthetic and real networks. The experimental results show that the proposed method is feasible and effective in accurately locating the propagation source.

4.
Med Biol Eng Comput ; 61(7): 1723-1744, 2023 Jul.
Article in English | MEDLINE | ID: mdl-36884143

ABSTRACT

PURPOSE: Fetal echocardiography is widely used to assess fetal heart development and detect congenital heart disease (CHD). Preliminary examination of the fetal heart involves the four-chamber view, which indicates the presence of all four chambers and their structural symmetry. Various cardiac parameters are generally examined using a clinically selected diastole frame. This selection depends largely on the expertise of the sonographer and is prone to intra- and interobserver errors. To overcome this, an automated frame selection technique is proposed for recognizing the fetal cardiac chambers from fetal echocardiography. METHODS: Three techniques are proposed in this study to automate the determination of the frame, referred to as the "Master Frame", that can be used for measuring the cardiac parameters. The first method uses frame similarity measures (FSM) to determine the master frame from the given cine loop ultrasonic sequences. FSM uses similarity measures such as correlation, structural similarity index (SSIM), peak signal-to-noise ratio (PSNR), and mean square error (MSE) to identify the cardiac cycle, and all the frames in one cardiac cycle are superimposed to form the master frame. The final master frame is obtained by averaging the master frames obtained using each similarity measure. The second method averages the frames within ± 20% of the midframes (AMF). The third method averages all the frames (AAF) of the cine loop sequence. Both diastole and master frames were annotated by clinical experts, and their ground truths are compared for validation. No segmentation techniques were used, to avoid the variability in performance of different segmentation techniques. All the proposed schemes were evaluated using six fidelity metrics: Dice coefficient, Jaccard ratio, Hausdorff distance, structural similarity index, mean absolute error, and Pratt figure of merit. RESULTS: The three proposed techniques were tested on frames extracted from 95 ultrasound cine loop sequences acquired between 19 and 32 weeks of gestation. The feasibility of the techniques was determined by computing fidelity metrics between the derived master frame and the diastole frame chosen by the clinical experts. The FSM-based master frame was found to closely match the manually chosen diastole frame, and the match was statistically significant. The method also detects the cardiac cycle automatically. Although the master frame obtained through AMF was found to be identical to the diastole frame, the chambers were reduced in size, which can lead to inaccurate chamber measurement. The master frame obtained through AAF was not found to be identical to the clinical diastole frame. CONCLUSION: The frame similarity measure (FSM)-based master frame can be introduced into the clinical routine for segmentation followed by cardiac chamber measurements. Such automated master frame selection also removes the manual intervention required by techniques reported earlier in the literature. The fidelity metrics assessment further confirms the suitability of the proposed master frame for automated fetal chamber recognition.
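
As a rough illustration of two of the frame similarity measures named above (MSE and PSNR) and of plain frame averaging, the sketch below treats each frame as a flat list of pixel intensities. The function names and the 8-bit intensity range are assumptions; the abstract does not describe the authors' implementation or how the cardiac cycle is located.

```python
import math


def mse(frame_a, frame_b):
    """Mean square error between two equally sized frames."""
    if len(frame_a) != len(frame_b) or not frame_a:
        raise ValueError("frames must be non-empty and of equal size")
    return sum((x - y) ** 2 for x, y in zip(frame_a, frame_b)) / len(frame_a)


def psnr(frame_a, frame_b, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical frames."""
    error = mse(frame_a, frame_b)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / error)


def average_frames(frames):
    """Pixel-wise average of a list of frames."""
    count = len(frames)
    return [sum(pixels) / count for pixels in zip(*frames)]


# Tiny hypothetical 2-pixel frames, for illustration only.
frames = [[0, 10], [10, 30]]
averaged = average_frames(frames)      # [5.0, 20.0]
distance = mse(frames[0], frames[1])   # (100 + 400) / 2 = 250.0
```

Lower MSE and higher PSNR both indicate more similar frames, so a periodic pattern in either value across a cine loop is one plausible cue for repeating cardiac phases.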


Subjects
Echocardiography , Fetal Heart , Fetal Heart/diagnostic imaging , Diastole , Signal-To-Noise Ratio , Computers
5.
J Supercomput ; 79(8): 9127-9156, 2023.
Article in English | MEDLINE | ID: mdl-36644509

ABSTRACT

Information dissemination occurs through the 'word of media' in the digital world. Fraudulent and deceitful content, such as misinformation, has detrimental effects on people. An implicit fact-based automated fact-checking technique comprising information retrieval, natural language processing, and machine learning techniques assists in assessing the credibility of content and detecting misinformation. Previous studies focused on linguistic and textual features and on approaches based on similarity measures. However, these studies lack knowledge of facts, and similarity measures are less accurate when dealing with sparse or zero data. To fill these gaps, we propose a 'Content Similarity Measure (CSM)' algorithm that can perform automated fact-checking of URLs in the healthcare domain. The authors introduce a novel set of content similarity, domain-specific, and sentiment polarity score features to achieve journalistic fact-checking. An extensive analysis of the proposed algorithm compared with standard similarity measures and machine learning classifiers showed that the 'content similarity score' feature outperformed other features with an accuracy of 88.26%. In the algorithmic approach, CSM showed improved accuracy of 91.06% compared to the Jaccard similarity measure with 74.26% accuracy. Another observation is that the algorithmic approach outperformed the feature-based method. To check the robustness of the algorithms, the authors tested the model on three state-of-the-art datasets, viz. CoAID, FakeHealth, and ReCOVery. With the algorithmic approach, CSM showed the highest accuracy of 87.30%, 89.30%, 85.26%, and 88.83% on CoAID, ReCOVery, FakeHealth (Story), and FakeHealth (Release) datasets, respectively. With a feature-based approach, the proposed CSM showed the highest accuracy of 85.93%, 87.97%, 83.92%, and 86.80%, respectively.
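
For reference, the Jaccard similarity used above as a baseline can be sketched on word sets as follows. This is the standard set-overlap definition, not the proposed CSM algorithm, whose details are not given in the abstract; the whitespace tokenization and the value returned for two empty texts are assumptions.

```python
def jaccard_similarity(text_a, text_b):
    """Jaccard similarity between the lowercase word sets of two texts."""
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    union = tokens_a | tokens_b
    if not union:
        return 0.0  # both texts empty: defined as 0 here (assumption)
    return len(tokens_a & tokens_b) / len(union)


# Hypothetical claim and reference sentences.
claim = "vitamin c cures the flu"
reference = "vitamin c does not cure the flu"
overlap = jaccard_similarity(claim, reference)
```

The example also hints at why a pure overlap score can mislead: the two sentences share most of their words yet state opposite facts.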

6.
Diagnostics (Basel) ; 13(2)2023 Jan 05.
Article in English | MEDLINE | ID: mdl-36673005

ABSTRACT

PROBLEM: Similarity measures are widely used as an approved method for spectral discrimination or identification with their applications in different areas of scientific research. Even though a range of works have been presented, only a few showed slightly promising results for human tissue, and these were mostly focused on pathological and non-pathological tissue classification. METHODS: In this work, several spectral similarity measures on hyperspectral (HS) images of in vivo human tissue were evaluated for tissue discrimination purposes. Moreover, we introduced two new hybrid spectral measures, called SID-JM-TAN(SAM) and SID-JM-TAN(SCA). We analyzed spectral signatures obtained from 13 different human tissue types and two different materials (gauze, instruments), collected from HS images of 100 patients during surgeries. RESULTS: The quantitative results showed the reliable performance of the different similarity measures and the proposed hybrid measures for tissue discrimination purposes. The latter produced higher discrimination values, up to 6.7 times more than the classical spectral similarity measures. Moreover, an application of the similarity measures was presented to support the annotations of the HS images. We showed that the automatic checking of tissue-annotated thyroid and colon tissues was successful in 73% and 60% of the total spectra, respectively. The hybrid measures showed the highest performance. Furthermore, the automatic labeling of wrongly annotated tissues was similar for all measures, with an accuracy of up to 90%. CONCLUSION: In future work, the proposed spectral similarity measures will be integrated with tools to support physicians in annotations and tissue labeling of HS images.
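
Two classical spectral similarity measures of the kind evaluated in such studies are the spectral angle and the spectral information divergence, which the acronyms SAM and SID in the hybrid names presumably refer to. The sketch below implements only these two textbook measures for non-negative spectra; it does not attempt the proposed hybrid measures, and the small `eps` smoothing term is an assumption added to avoid taking the logarithm of zero.

```python
import math


def spectral_angle(x, y):
    """Angle in radians between two spectra; 0 means identical shape."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    cosine = dot / (norm_x * norm_y)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cosine)))


def _to_distribution(spectrum, eps):
    """Shift by eps and normalize so the values sum to 1."""
    shifted = [v + eps for v in spectrum]
    total = sum(shifted)
    return [v / total for v in shifted]


def spectral_information_divergence(x, y, eps=1e-12):
    """Symmetric Kullback-Leibler divergence between normalized spectra."""
    p = _to_distribution(x, eps)
    q = _to_distribution(y, eps)
    forward = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    backward = sum(qi * math.log(qi / pi) for pi, qi in zip(p, q))
    return forward + backward


# Hypothetical 3-band spectra: the second is a brighter copy of the first.
angle = spectral_angle([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Both measures are insensitive to overall brightness, which is why a scaled copy of a spectrum yields an angle and a divergence close to zero.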

7.
Complex Intell Systems ; 9(3): 3333-3354, 2023.
Article in English | MEDLINE | ID: mdl-36530758

ABSTRACT

Pythagorean fuzzy sets (PFSs) proved to be powerful for handling uncertainty and vagueness in multi-criteria group decision-making (MCGDM). To make a compromise decision, comparing PFSs is essential. Several approaches were introduced for comparison, e.g., distance measures and similarity measures. Nevertheless, extant measures have several defects that can produce counter-intuitive results, since they treat any increase or decrease in the membership degree the same as the non-membership degree; although each parameter has a different implication. This study introduces the differential measure (DFM) as a new approach for comparing PFSs. The main purpose of the DFM is to eliminate the unfair arguments resulting from the equal treatment of the contradicting parameters of a PFS. It is a preference relation between two PFSs by virtue of position in the attribute space and according to the closeness of their membership and non-membership degrees. Two PFSs are classified as identical, equivalent, superior, or inferior to one another giving the degree of superiority or inferiority. The basic properties of the proposed DFM are given. A novel method for multiple criteria group decision-making is proposed based on the introduced DFM. A new technique for computing the weights of the experts is developed. The proposed method is applied to solve two applications, the evaluation of solid-state drives and the selection of the best photovoltaic cell. The results are compared with the results of some extant methods to illustrate the applicability and validity of the method. A sensitivity analysis is conducted to examine its stability and practicality.

8.
PeerJ Comput Sci ; 8: e1124, 2022.
Article in English | MEDLINE | ID: mdl-36262151

ABSTRACT

Identification of drug-target interaction (DTI) is a crucial step to reduce time and cost in the drug discovery and development process. Since various biological data are publicly available, DTIs have been identified computationally. To predict DTIs, most existing methods focus on a single similarity measure of drugs and target proteins, whereas some recent methods integrate a particular set of drug and target similarity measures by a single integration function. Therefore, many DTIs are still missing. In this study, we propose heterogeneous network propagation with the forward similarity integration (FSI) algorithm, which systematically selects the optimal integration of multiple similarity measures of drugs and target proteins. Seven drug-drug and nine target-target similarity measures are applied with four distinct integration methods to finally create an optimal heterogeneous network model. Consequently, the optimal model uses the target similarity based on protein sequences and the fused drug similarity, which combines the similarity measures based on chemical structures, the Jaccard scores of drug-disease associations, and the cosine scores of drug-drug interactions. With an accuracy of 99.8%, this model significantly outperforms others that utilize different similarity measures of drugs and target proteins. In addition, the validation of the DTI predictions of this model demonstrates the ability of our method to discover missing potential DTIs.

9.
Int J Inf Technol ; 14(2): 607-618, 2022.
Article in English | MEDLINE | ID: mdl-35106437

ABSTRACT

Identification of sub-networks within a network is essential to understand the functionality of a network. This process is called 'community detection'. There are various existing community detection algorithms, and their performance can vary with the network structure. In this paper, we introduce a novel random graph generator using a mixture of Gaussian distributions. The community sizes of the generated network depend on the given Gaussian distributions. We then develop simulation studies to understand the impact of density and sparsity of the network on community detection. We use the Infomap, Label Propagation, Spinglass, and Louvain algorithms to detect communities. The similarity between true communities and detected communities is evaluated using the Adjusted Rand Index, Adjusted Mutual Information, and Normalized Mutual Information similarity scores. We also develop a method to generate heatmaps to compare those similarity score values. The results indicate that the Louvain algorithm has the highest capacity to detect perfect communities, while Label Propagation has the lowest capacity.
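
One of the scores mentioned above, the Adjusted Rand Index, can be computed from pair counts as in the following minimal sketch. It follows the standard definition and is independent of any particular community detection library; the function name and the value returned for degenerate inputs are assumptions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same nodes."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two nodes: treated as agreement (assumption)
    # Pairs of nodes that are grouped together in both labelings.
    together_both = sum(
        comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values()
    )
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = together_true * together_pred / total_pairs
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        return 1.0  # both labelings are trivial and identical in structure
    return (together_both - expected) / (maximum - expected)


# Hypothetical ground-truth and detected communities for six nodes.
truth = [0, 0, 0, 1, 1, 1]
detected = ["a", "a", "a", "b", "b", "b"]
ari = adjusted_rand_index(truth, detected)
```

The index is 1 for identical partitions regardless of how the communities are named, and it is close to 0 or negative when agreement is no better than chance.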

10.
BMC Bioinformatics ; 23(1): 23, 2022 Jan 06.
Article in English | MEDLINE | ID: mdl-34991460

ABSTRACT

BACKGROUND: Ontology-based semantic similarity measures based on SNOMED-CT, MeSH, and Gene Ontology are being extensively used in many applications in biomedical text mining and genomics respectively, which has encouraged the development of semantic measures libraries based on the aforementioned ontologies. However, current state-of-the-art semantic measures libraries have some performance and scalability drawbacks derived from their ontology representations based on relational databases, or naive in-memory graph representations. Likewise, a recent reproducible survey on word similarity shows that one hybrid IC-based measure which integrates a shortest-path computation sets the state of the art in the family of ontology-based semantic measures. However, the lack of an efficient shortest-path algorithm for their real-time computation prevents both their practical use in any application and the use of any other path-based semantic similarity measure. RESULTS: To bridge the two aforementioned gaps, this work introduces for the first time an updated version of the HESML Java software library especially designed for the biomedical domain, which implements the most efficient and scalable ontology representation reported in the literature, together with a new method for the approximation of the Dijkstra's algorithm for taxonomies, called Ancestors-based Shortest-Path Length (AncSPL), which allows the real-time computation of any path-based semantic similarity measure. CONCLUSIONS: We introduce a set of reproducible benchmarks showing that HESML outperforms by several orders of magnitude the current state-of-the-art libraries in the three aforementioned biomedical ontologies, as well as the real-time performance and approximation quality of the new AncSPL shortest-path algorithm. Likewise, we show that AncSPL linearly scales regarding the dimension of the common ancestor subgraph regardless of the ontology size. 
Path-based measures based on the new AncSPL algorithm are up to six orders of magnitude faster than their exact implementation in large ontologies like SNOMED-CT and GO. Finally, we provide a detailed reproducibility protocol and dataset as supplementary material to allow the exact replication of all our experiments and results.


Subjects
Biological Ontologies , Semantics , Medical Subject Headings , Reproducibility of Results , Systematized Nomenclature of Medicine
11.
Proc Inst Mech Eng H ; 236(1): 3-11, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34405750

ABSTRACT

Computer-aided diagnostic systems (CADS) assist radiologists in classifying liver cancer using computed tomography (CT) images. To enhance diagnostic performance, image sequences are recorded at various time points in a single/multi-view format. Mutual information (MI) is a widely used medical image registration metric with a high rate of success, but it can result in misregistration due to a lack of spatial details. To address this issue and to establish anatomical correspondence between multi-phase CT images of the liver, a feature-based technique is developed in this article. The proposed model uses fixed and moving images as inputs, with both images having the same dimensions. The registered images are the two images that differ in terms of their combinations/colors. In the output registered images, the tumor in the liver portion has classes with viewpoints. There is an appropriate way to view the tumor, and the output registered images should permit concluding that the registered image of the delayed phases, with a longer delay time, contains the most region portion within the output registered image. The detected and matched values are greater than the values of the feature outcomes. Having a large tumor provides valuable information in the presenting form for discussing the variation of the various phases and delayed testing results. This will aid the radiologist in making an accurate diagnosis.


Subjects
Liver Neoplasms , Tomography, X-Ray Computed , Humans , Liver Neoplasms/diagnostic imaging
12.
Neuroinformatics ; 20(3): 665-675, 2022 07.
Article in English | MEDLINE | ID: mdl-34716564

ABSTRACT

Despite huge advances in neuroimaging techniques and the growing importance of inter-personal brain research, few studies assess the most appropriate computational methods to measure brain-brain coupling. Here, we focus on the signal processing methods used to detect brain coupling in dyads. From a public dataset of functional Near Infra-Red Spectroscopy signals (N=24 dyads), we derived a synthetic control condition by randomization and investigated the effectiveness of the four most used signal similarity metrics: Cross Correlation, Mutual Information, Wavelet Coherence and Dynamic Time Warping. We also accounted for temporal variations between signals by allowing for misalignments up to a maximum lag. Starting from the observed effect sizes, computed in terms of Cohen's d, the power analysis indicated that a high sample size ([Formula: see text]) would be required to detect significant brain coupling. We therefore discuss the need for specialized statistical approaches and propose bootstrap as an alternative method to avoid over-penalizing the results. In our settings, and based on bootstrap analyses, Cross Correlation and Dynamic Time Warping outperform Mutual Information and Wavelet Coherence for all considered maximum lags, with reproducible results. These results highlight the need to set specific guidelines, as the high degree of customization of the signal processing procedures prevents comparability between studies, hinders their reproducibility and, ultimately, undermines the possibility of extracting new knowledge.
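
Two of the four metrics named above, cross-correlation with a maximum lag and Dynamic Time Warping, can be sketched in plain Python as follows. This is a generic illustration on equal-length signals, not the study's pipeline; the function names, the use of Pearson correlation at each lag, and the absolute-difference cost in the warping are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either signal is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def max_lagged_correlation(x, y, max_lag):
    """Highest Pearson correlation over shifts in [-max_lag, max_lag]."""
    if len(x) != len(y):
        raise ValueError("signals must have the same length")
    n = len(x)
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[lag:], y[:n - lag]
        else:
            a, b = x[:n + lag], y[-lag:]
        if len(a) >= 2:
            best = max(best, pearson(a, b))
    return best


def dtw_distance(x, y):
    """Classic dynamic-programming DTW with absolute-difference cost."""
    inf = math.inf
    cost = [[inf] * (len(y) + 1) for _ in range(len(x) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            step = abs(x[i - 1] - y[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[len(x)][len(y)]


# Hypothetical signals: the second leads the first by one sample.
signal_a = [0, 1, 0, -1, 0, 1, 0, -1]
signal_b = [1, 0, -1, 0, 1, 0, -1, 0]
coupling = max_lagged_correlation(signal_a, signal_b, max_lag=2)
```

Allowing a lag recovers the perfect correlation that a zero-lag comparison of these two shifted signals would miss.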


Subjects
Brain , Signal Processing, Computer-Assisted , Brain/diagnostic imaging , Neuroimaging , Reproducibility of Results , Spectrum Analysis
13.
Math Biosci Eng ; 19(1): 855-872, 2022 01.
Article in English | MEDLINE | ID: mdl-34903016

ABSTRACT

The PHF setting is one of the most dominant and feasible techniques in fuzzy set theory for handling intricate and vague data in real-life scenarios. The PHF setting is considerably more general than existing notions, which cope with two or three sorts of data in the form of a singleton element. Within the PHF setting, we develop several similarity measures (SMs), namely PHFDSM, PHFWDSM, PHFJSM, PHFWJSM, PHFCSM, PHFWCSM, PHFHVSM, and PHFWHVSM, and demonstrate their flexibility. Likewise, several examples based on PHF data in the context of medical diagnosis are presented under the proposed measures to demonstrate the stability and flexibility of the work. Finally, a sensitivity analysis of the presented work is carried out and illustrated graphically.


Subjects
Decision Making , Fuzzy Logic , Algorithms
14.
Rev. cuba. inform. méd ; 13(2): e446, 2021. tab, graf
Article in Spanish | LILACS, CUMED | ID: biblio-1357281

ABSTRACT

One goal of the health system is disease prevention, which is why studying the relationship between diseases and space is especially important. There is evidence of the use of Geographic Information Systems in studies of the spatial distribution of health problems. Despite this, the works reported in the consulted literature do not exploit the spatial component of the data, which limits their comprehensiveness. Moreover, the methodologies, tools, and techniques used for studies of this type are scattered. This research presents a method for stratifying territories based on Geographic Information Systems and geometric similarity measures defined from three criteria: distance, size, and connectivity. The proposal allows stratified studies to be carried out in accordance with the first law of geography and guarantees more compact strata. The proposed method has five stages: selection of indicators and territories, preprocessing of indicators, clustering, postprocessing, and visualization. It is supported by a computer solution based on free software. As part of the validation, the method is applied in a case study, and an analysis of validation indices supports the effectiveness and competitiveness of the proposal (AU)


Subjects
Humans , Male , Female , Software Design , Health Systems , Geographic Information Systems/standards , Disease Prevention
15.
PeerJ Comput Sci ; 7: e740, 2021.
Article in English | MEDLINE | ID: mdl-34722873

ABSTRACT

Different fields such as linguistics, teaching, and computing have demonstrated special interest in the study of sign languages (SL). However, the processes of teaching and learning these languages turn complex since it is unusual to find people teaching these languages that are fluent in both SL and the native language of the students. The teachings from deaf individuals become unique. Nonetheless, it is important for the student to lean on supportive mechanisms while being in the process of learning an SL. Bidirectional communication between deaf and hearing people through SL is a hot topic to achieve a higher level of inclusion. However, all the processes that convey teaching and learning SL turn difficult and complex since it is unusual to find SL teachers that are fluent also in the native language of the students, making it harder to provide computer teaching tools for different SL. Moreover, the main aspects that a second language learner of an SL finds difficult are phonology, non-manual components, and the use of space (the latter two are specific to SL, not to spoken languages). This proposal appears to be the first of the kind to favor the Costa Rican Sign Language (LESCO, for its Spanish acronym), as well as any other SL. Our research focus stands on reinforcing the learning process of final-user hearing people through a modular architectural design of a learning environment, relying on the concept of phonological proximity within a graphical tool with a high degree of usability. The aim of incorporating phonological proximity is to assist individuals in learning signs with similar handshapes. This architecture separates the logic and processing aspects from those associated with the access and generation of data, which makes it portable to other SL in the future. The methodology used consisted of defining 26 phonological parameters (13 for each hand), thus characterizing each sign appropriately. 
Then, a similarity formula was applied to compare each pair of signs. With these pre-calculations, the tool displays each sign and its top ten most similar signs. A SUS usability test and an open qualitative question were applied, as well as a numerical evaluation to a group of learners, to validate the proposal. In order to reach our research aims, we have analyzed previous work on proposals for teaching tools meant for the student to practice SL, as well as previous work on the importance of phonological proximity in this teaching process. This previous work justifies the necessity of our proposal, whose benefits have been proved through the experimentation conducted by different users on the usability and usefulness of the tool. To meet these needs, homonymous words (signs with the same starting handshape) and paronyms (signs with highly similar handshape), have been included to explore their impact on learning. It allows the possibility to apply the same perspective of our existing line of research to other SL in the future.

16.
Front Public Health ; 9: 695141, 2021.
Article in English | MEDLINE | ID: mdl-34631642

ABSTRACT

The COVID-19 pandemic has taken more than 1.78 million lives across the globe. Identifying the underlying evolutive patterns between different countries would help us single out the mutated paths and behavior of this virus. I devise an orthonormal basis that serves as the features relating the evolution of one country's cases and deaths to another's via coefficients from the inner product. Then I rank the coefficients measured by the inner product via the featured frequencies. The distances between these ranked vectors are evaluated with the Manhattan metric. Afterwards, I associate each country with its nearest neighbor, which shares the evolutive pattern, via the distance matrix. The research shows that such patterns are not random at all, i.e., some factors could contribute to the underlying pattern. In the end, I perform the typical cosine similarity on the time-series data. The comparison shows that the proposed mechanism differs from the typical one but is also related to it in some way. These findings reveal the underlying interaction between countries with respect to cases and deaths of COVID-19.
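
The last two steps described above, nearest-neighbor assignment under the Manhattan metric and a cosine-similarity comparison, can be sketched generically as follows. The orthonormal basis and the coefficient ranking are not specified in the abstract, so the input vectors here are simply hypothetical feature vectors keyed by placeholder country labels; all names are illustrative.

```python
import math


def manhattan(u, v):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(vectors):
    """Map each label to the other label at the smallest L1 distance."""
    result = {}
    for name, vec in vectors.items():
        candidates = [
            (manhattan(vec, other_vec), other)
            for other, other_vec in vectors.items()
            if other != name
        ]
        result[name] = min(candidates)[1]  # ties broken by label order
    return result


# Hypothetical ranked-coefficient vectors for three placeholder countries.
features = {"A": [1, 2, 3], "B": [1, 2, 4], "C": [10, 10, 10]}
neighbors = nearest_neighbors(features)
```

Because the Manhattan metric depends on magnitudes while cosine similarity depends only on direction, the two can pair countries differently, which is consistent with the abstract's remark that the two approaches differ yet are related.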


Subjects
COVID-19 , Cluster Analysis , Humans , Pandemics , SARS-CoV-2
17.
PeerJ ; 9: e11927, 2021.
Article in English | MEDLINE | ID: mdl-34589292

ABSTRACT

Phenotypic characteristics of a plant species refer to its physical properties as cataloged by plant biologists at different research centers around the world. Clustering species based upon their phenotypic characteristics is used to obtain diverse sets of parents that are useful in breeding programs. The Hierarchical Clustering (HC) algorithm is the current standard for clustering phenotypic data. This algorithm suffers from low accuracy and high computational complexity. To address the accuracy challenge, we propose the use of the Spectral Clustering (SC) algorithm. To make the algorithm computationally cheap, we propose using sampling, specifically Pivotal Sampling, which is probability based. Since the application of sampling to phenotypic data has not been explored much, another sampling technique called Vector Quantization (VQ) is also adapted for this data to allow an effective comparison. VQ has recently generated promising results for genotypic data. The novelty of our SC with Pivotal Sampling algorithm lies in constructing the crucial similarity matrix for the clustering algorithm and defining probabilities for the sampling technique. Although our algorithm can be applied to any plant species, we tested it on the phenotypic data obtained from about 2,400 Soybean species. SC with Pivotal Sampling achieves substantially higher accuracy (in terms of Silhouette Values) than all the other proposed competitive clustering-with-sampling algorithms (i.e., SC with VQ, HC with Pivotal Sampling, and HC with VQ). The complexities of our SC with Pivotal Sampling algorithm and these three variants are almost the same because of the sampling involved. In addition, SC with Pivotal Sampling outperforms the standard HC algorithm in both accuracy and computational complexity. We experimentally show that our algorithm is up to 45% more accurate than HC in terms of clustering accuracy. The computational complexity of our algorithm is more than an order of magnitude lower than that of HC.

18.
Front Genet ; 12: 702259, 2021.
Article in English | MEDLINE | ID: mdl-34504515

ABSTRACT

Drug repositioning is a method of systematically identifying potential molecular targets that known drugs may act on. Compared with traditional methods, drug repositioning has been extensively studied owing to the development of multi-omics technology and systems biology methods. Because of the biological network properties involved, machine learning algorithms can be applied for prediction. Based on various heterogeneous network models, this paper proposes a method named THNCDF for predicting drug-target interactions. Various heterogeneous networks are integrated to build a tripartite network, and similarity calculation methods are used to obtain a similarity matrix. Then the cascade deep forest method is used to make predictions. Results indicate that THNCDF outperforms previously reported methods under 10-fold cross-validation on the benchmark data sets proposed by Y. Yamanishi. The area under the precision-recall curve (AUPR) on the Enzyme, GPCR, Ion Channel, and Nuclear Receptor data sets is 0.988, 0.980, 0.938, and 0.906, respectively. The experimental results illustrate the feasibility of this method.
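The abstract reports AUPR but does not say how the area is estimated. One common estimator is average precision, which averages the precision measured at the rank of each true positive. The sketch below shows that estimator under the assumption that it is an acceptable stand-in; the function name `average_precision` and the toy scores are hypothetical and do not come from THNCDF.

```python
def average_precision(scores, labels):
    """Average precision, one common estimate of the area under the PR curve.

    scores: predicted interaction scores (higher means more likely).
    labels: 1 for a known drug-target interaction, 0 otherwise.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank pairs from highest to lowest score (ties keep input order).
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)

    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


# Toy example (assumed data): positives are ranked 1st and 3rd.
print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))  # ≈ 0.833
```

Precision-recall summaries are often preferred over ROC summaries when positives are rare, as is typical for drug-target pairs, because they are not inflated by the large number of true negatives.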

19.
J Appl Crystallogr ; 54(Pt 3): 776-786, 2021 Jun 01.
Article in English | MEDLINE | ID: mdl-34188612

ABSTRACT

A method for the ab initio crystal structure determination of organic compounds by a fit to the pair distribution function (PDF), without prior knowledge of lattice parameters and space group, has been developed. The method is called 'PDF-Global-Fit' and is implemented as an extension of the program FIDEL (fit with deviating lattice parameters). The structure solution is based on a global optimization approach starting from random structural models in selected space groups. No prior indexing of the powder data is needed. The new method requires only the molecular geometry and a carefully determined PDF. The generated random structures are compared with the experimental PDF and ranked by a similarity measure based on cross-correlation functions. The most promising structure candidates are fitted to the experimental PDF data using a restricted simulated annealing structure solution approach within the program TOPAS, followed by a structure refinement against the PDF to identify the correct crystal structure. With the PDF-Global-Fit it is possible to determine, by a fit to the PDF, the local structure of crystalline and disordered organic materials, as well as that of samples with unindexable powder patterns, such as nanocrystalline samples. The success of the method is demonstrated using barbituric acid as an example. The crystal structure of barbituric acid form IV solved and refined by the PDF-Global-Fit is in excellent agreement with the published crystal structure data.

20.
J Appl Crystallogr ; 54(Pt 2): 612-623, 2021 Apr 01.
Article in English | MEDLINE | ID: mdl-33953658

ABSTRACT

An approach for the comparison of pair distribution functions (PDFs) has been developed using a similarity measure based on cross-correlation functions. The PDF is very sensitive to changes in the local structure, i.e. small deviations in the structure can cause large signal shifts and significant discrepancies between the PDFs. Therefore, a comparison based on pointwise differences (e.g. R values and difference curves) may lead to the assumption that the investigated PDFs, as well as the corresponding structural models, are not in agreement at all, whereas a careful visual inspection of the structural models and corresponding PDFs may reveal a relatively good match. To quantify the agreement of different PDFs in those cases, an alternative approach is introduced: the similarity measure based on cross-correlation functions. In this paper, the power of this similarity measure for the analysis of PDFs is highlighted. The similarity measure is compared with the classical Rwp values, as representative of the comparison based on pointwise differences, as well as with the Pearson product-moment correlation coefficient, using polymorph IV of barbituric acid as an example.
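The abstract does not give the formula of the similarity measure. One form often used for comparing shifted signals is a normalized cross-correlation weighted by a triangular window, so that overlap at small shifts still counts, only with less weight. The sketch below assumes that form; the function names, the window half-width `width`, and the toy signals are illustrative choices, not the published implementation.

```python
from math import sqrt


def cross_correlation(f, g, shift):
    """Sum of f[i] * g[i + shift] over all indices where both exist."""
    n = len(f)
    return sum(f[i] * g[i + shift] for i in range(n) if 0 <= i + shift < n)


def weighted_overlap(f, g, width):
    """Cross-correlation integrated under a triangle weight 1 - |s| / width."""
    return sum(
        (1 - abs(s) / width) * cross_correlation(f, g, s)
        for s in range(-width + 1, width)
    )


def similarity(f, g, width):
    """Normalized similarity; identical non-zero signals give 1."""
    if len(f) != len(g):
        raise ValueError("signals must be sampled on the same grid")
    if width < 1:
        raise ValueError("width must be at least 1")
    denom = sqrt(weighted_overlap(f, f, width) * weighted_overlap(g, g, width))
    if denom == 0:
        raise ValueError("signals must not be identically zero")
    return weighted_overlap(f, g, width) / denom


# Toy signals (assumed): the same sharp peak, shifted by two grid points.
f = [0.0] * 30
g = [0.0] * 30
f[10] = 1.0
g[12] = 1.0

print(similarity(f, g, width=1))  # pointwise overlap only: 0.0
print(similarity(f, g, width=5))  # shift tolerated by the window: ≈ 0.6
```

With `width=1` the measure reduces to a purely pointwise comparison and reports no agreement for the shifted peak, which mirrors the failure of pointwise criteria described above. A wider window credits the near match.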
