Results 1 - 20 of 71
1.
J Am Stat Assoc ; 119(546): 1297-1308, 2024.
Article in English | MEDLINE | ID: mdl-38984070

ABSTRACT

Extreme environmental events frequently exhibit spatial and temporal dependence. These data are often modeled using max-stable processes (MSPs) that are computationally prohibitive to fit for as few as a dozen observations. Supposedly computationally efficient approaches like the composite likelihood remain computationally burdensome with a few hundred observations. In this paper, we propose a spatial partitioning approach based on local modeling of subsets of the spatial domain that delivers computationally and statistically efficient inference. Marginal and dependence parameters of the MSP are estimated locally on subsets of observations using censored pairwise composite likelihood, and combined using a modified generalized method of moments procedure. The proposed distributed approach is extended to estimate inverted MSP models, and to estimate spatially varying coefficient models to deliver computationally efficient modeling of spatial variation in marginal parameters. We demonstrate consistency and asymptotic normality of estimators, and show empirically that our approach leads to statistically efficient estimation of model parameters. We illustrate the flexibility and practicability of our approach through simulations and the analysis of streamflow data from the U.S. Geological Survey.
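
As a rough illustration of the combine step described above, the sketch below merges per-subset parameter estimates with inverse-variance weights; the function and weighting are stand-in assumptions, not the authors' censored pairwise likelihood or modified generalized method of moments machinery.

```python
import numpy as np

def combine_local_estimates(estimates, covariances):
    """Combine per-subset parameter estimates using inverse-variance weights.

    estimates:   list of length-p arrays, one per spatial subset
    covariances: list of (p, p) covariance matrices of those estimates
    """
    precisions = [np.linalg.inv(c) for c in covariances]
    total_precision = sum(precisions)
    weighted_sum = sum(prec @ est for prec, est in zip(precisions, estimates))
    combined = np.linalg.solve(total_precision, weighted_sum)
    return combined, np.linalg.inv(total_precision)

# Toy usage: three subsets, two dependence parameters.
rng = np.random.default_rng(0)
truth = np.array([1.0, 0.5])
ests = [truth + rng.normal(scale=0.1, size=2) for _ in range(3)]
covs = [np.diag([0.01, 0.01]) for _ in range(3)]
print(combine_local_estimates(ests, covs)[0])
```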

2.
Clin Ophthalmol ; 18: 1535-1546, 2024.
Article in English | MEDLINE | ID: mdl-38827775

ABSTRACT

Background: Cataract surgery is one of the most frequently performed eye surgeries worldwide, and among several techniques, phacoemulsification has become the standard of care due to its safety and efficiency. We evaluated the advantages and disadvantages of two phacoemulsification techniques: phaco-chop and divide-and-conquer. Methods: PubMed, Cochrane, Embase, and Web of Science databases were queried for randomized controlled trials (RCTs), prospective and retrospective studies that compared the phaco-chop technique with the divide-and-conquer technique and reported the outcomes of (1) endothelial cell count (ECC) change; (2) ultrasound time (UST); (3) cumulated dissipated energy (CDE); (4) surgery time; and (5) phacoemulsification time (PT). Heterogeneity was examined with I2 statistics. A random-effects model was used for outcomes with high heterogeneity. Results: Nine studies (6 prospective RCTs and 3 observational) were included, comprising 837 patients undergoing phacoemulsification; 435 (51.9%) underwent the phaco-chop technique, and 405 (48.1%) underwent divide-and-conquer. Overall, the phaco-chop technique was associated with several advantages: a significant difference in ECC change postoperatively (Mean Difference [MD] -221.67 Cell/mm2; 95% Confidence Interval [CI] -401.68 to -41.66; p < 0.02; I2=73%); a shorter UST (MD -51.16 sec; 95% CI -99.4 to -2.79; p = 0.04; I2=98%); reduced CDE (MD -8.68 units; 95% CI -12.76 to -4.60; p < 0.01; I2=84%); and a lower PT (MD -55.09 sec; 95% CI -99.29 to -12.90; p = 0.01; I2=100%). There were no significant differences in surgery time (MD -3.86 min; 95% CI -9.55 to 1.83; p = 0.18; I2=99%). Conclusion: The phaco-chop technique caused less harm to the corneal endothelium, with less delivered intraocular ultrasound energy, when compared to the divide-and-conquer technique.

3.
Psychometrika ; 2024 May 30.
Article in English | MEDLINE | ID: mdl-38814412

ABSTRACT

With the growing attention on large-scale educational testing and assessment, the ability to process substantial volumes of response data becomes crucial. Current estimation methods within item response theory (IRT), despite their high precision, often pose considerable computational burdens with large-scale data, leading to reduced computational speed. This study introduces a novel "divide-and-conquer" parallel algorithm built on the Wasserstein posterior approximation concept, aiming to enhance computational speed while maintaining accurate parameter estimation. This algorithm enables drawing parameters from segmented data subsets in parallel, followed by an amalgamation of these parameters via Wasserstein posterior approximation. Theoretical support for the algorithm is established through asymptotic optimality under certain regularity assumptions. Practical validation is demonstrated using real-world data from the Programme for International Student Assessment. Ultimately, this research proposes a transformative approach to managing educational big data, offering a scalable, efficient, and precise alternative that promises to redefine traditional practices in educational assessments.
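
For a single scalar parameter, the Wasserstein combination step can be illustrated with the standard fact that the 1-D Wasserstein-2 barycenter averages quantile functions; the sketch below is a generic illustration under that simplification, not the paper's algorithm.

```python
import numpy as np

def wasserstein_barycenter_1d(subset_draws, n_grid=1000):
    """Approximate the 1-D Wasserstein-2 barycenter of several subset posteriors
    by averaging their empirical quantile functions on a common probability grid."""
    probs = (np.arange(n_grid) + 0.5) / n_grid
    quantiles = np.mean(
        [np.quantile(draws, probs) for draws in subset_draws], axis=0
    )
    return quantiles  # grid of values representing the combined posterior

# Toy usage: three subset posteriors for a single item parameter.
rng = np.random.default_rng(1)
subsets = [rng.normal(loc=0.8 + 0.05 * k, scale=0.2, size=5000) for k in range(3)]
combined = wasserstein_barycenter_1d(subsets)
print(combined.mean(), combined.std())
```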

4.
Front Genet ; 15: 1226228, 2024.
Article in English | MEDLINE | ID: mdl-38384715

ABSTRACT

Introduction: The likelihood ratio (LR) can be an efficient means of distinguishing various relationships in forensic fields. However, traditional list-based methods for the derivation and presentation of LRs in distant or complex relationships hinder code editing and software programming. This paper proposes an approach to a unified formula for LRs, in which differences in participants' genotype combinations can be ignored for a specific identification. This formula could reduce the difficulty of by-hand coding, as well as the running time of large-sample-size simulations. Methods: The approach is first applied to a problem of kinship identification in which at least one of the participants is alleged to be inbred. This can be divided into two parts: i) the probability of different identical-by-descent (IBD) states according to the alleged kinship; and ii) for each state, the ratio of the probability that a specific genotype combination is detected assuming the alleged kinship exists between the two participants to the corresponding probability assuming that they are unrelated. For the probability, there are generally recognized results for common identification purposes. For the ratio, subscript letters representing IBD alleles of individual A's alleles are used to eliminate differences in genotype combinations between the two individuals and to obtain a unified formula for the ratio in each state. The unification is further simplified for identification cases in which both participants are alleged to be outbred. Verification shows that the results obtained with the unified and list-form formulae are equivalent. Results: A series of unified formulae are derived for different identification purposes, based on which an R package named KINSIMU has been developed and evaluated for use in large-size simulations for kinship analysis. Comparison of the package with two existing tools indicated that the unified approach presented here is more convenient and time-saving with respect to the coding process for computer applications than the list-based approach, despite appearing more complicated. Moreover, the method of derivation could be extended to other identification problems, such as those with different hypothesis sets or those involving multiple individuals. Conclusion: The unified approach to LR calculation can be beneficial in the field of kinship identification.

5.
Sensors (Basel) ; 24(2)2024 Jan 05.
Article in English | MEDLINE | ID: mdl-38257415

ABSTRACT

Fiber optic gyroscope (FOG)-based north finding is extensively applied in navigation, positioning, and various other fields. In dynamic north finding, an increased turntable speed shortens the time required for north finding, yielding a faster north-finding response. However, as the turntable speed increases, the turntable's jitter contaminates the FOG signal, degrading north-finding accuracy. This paper introduces a divide-and-conquer algorithm, the segmented cross-correlation algorithm, designed to mitigate the impact of turntable speed jitter. A north-finding error model is established and analyzed, incorporating the FOG's self-noise and the turntable's speed jitter. To validate the feasibility of our method, we implemented the algorithm on a FOG. The simulation and experimental results exhibited strong concordance, affirming the validity of our proposed north-finding error model. The experimental findings indicate that, at a turntable speed of 180°/s, the north-finding bias error within a 360 s duration is 0.052°, representing a 64% improvement over the traditional algorithm. These results demonstrate the effectiveness of the proposed algorithm in mitigating the impact of unstable turntable speeds, offering a north-finding solution with both prompt response and enhanced accuracy.
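
The segmentation idea can be pictured with a generic sketch: estimate the phase of the rotation-frequency component segment by segment and combine the per-segment estimates with a circular mean. The reference signal, segment lengths, and combination rule here are assumptions for illustration; the paper's segmented cross-correlation algorithm and its jitter model are more specific.

```python
import numpy as np

def segmented_phase_estimate(signal, fs, rotation_hz, n_segments):
    """Estimate the phase of the rotation-frequency component per segment and
    return the circular mean of the per-segment estimates. Segments are assumed
    to span an integer number of turntable revolutions."""
    seg_len = len(signal) // n_segments
    t = np.arange(seg_len) / fs
    ref = np.exp(-2j * np.pi * rotation_hz * t)        # reference at the rotation rate
    phases = []
    for k in range(n_segments):
        seg = signal[k * seg_len:(k + 1) * seg_len]
        phases.append(np.angle(np.mean(seg * ref)))    # phase of this segment
    return np.angle(np.mean(np.exp(1j * np.array(phases))))

# Toy usage: noisy 0.5 Hz "rotation" signal with true phase 0.3 rad.
fs, f_rot = 1000.0, 0.5
t = np.arange(0, 60, 1 / fs)
rng = np.random.default_rng(2)
sig = np.cos(2 * np.pi * f_rot * t + 0.3) + 0.1 * rng.normal(size=t.size)
print(segmented_phase_estimate(sig, fs, f_rot, n_segments=6))   # ~0.3
```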

6.
Inf inference ; 12(3): iaad032, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37593361

ABSTRACT

Modeling the distribution of high-dimensional data by a latent tree graphical model is a prevalent approach in multiple scientific domains. A common task is to infer the underlying tree structure, given only observations of its terminal nodes. Many algorithms for tree recovery are computationally intensive, which limits their applicability to trees of moderate size. For large trees, a common approach, termed divide-and-conquer, is to recover the tree structure in two steps. First, separately recover the structure of multiple, possibly random subsets of the terminal nodes. Second, merge the resulting subtrees to form a full tree. Here, we develop spectral top-down recovery (STDR), a deterministic divide-and-conquer approach to infer large latent tree models. Unlike previous methods, STDR partitions the terminal nodes in a non-random way, based on the Fiedler vector of a suitable Laplacian matrix related to the observed nodes. We prove that under certain conditions, this partitioning is consistent with the tree structure. This, in turn, leads to a significantly simpler merging procedure of the small subtrees. We prove that STDR is statistically consistent and bound the number of samples required to accurately recover the tree with high probability. Using simulated data from several common tree models in phylogenetics, we demonstrate that STDR has a significant advantage in terms of runtime, with improved or similar accuracy.
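
The Fiedler-vector partitioning step has a compact generic form: build a Laplacian from a similarity matrix over the terminal nodes and split them by the sign of the eigenvector of the second-smallest eigenvalue. The similarity matrix used below is a toy assumption; STDR's Laplacian is constructed from the observed nodes as described in the paper.

```python
import numpy as np

def fiedler_split(similarity):
    """Split terminal nodes into two groups by the sign of the Fiedler vector
    (the eigenvector of the second-smallest Laplacian eigenvalue)."""
    degree = np.diag(similarity.sum(axis=1))
    laplacian = degree - similarity
    eigvals, eigvecs = np.linalg.eigh(laplacian)   # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]
    return np.where(fiedler >= 0)[0], np.where(fiedler < 0)[0]

# Toy usage: two loosely connected clusters of terminal nodes.
block = np.ones((4, 4))
similarity = np.block([[block, 0.05 * np.ones((4, 4))],
                       [0.05 * np.ones((4, 4)), block]])
np.fill_diagonal(similarity, 0.0)
print(fiedler_split(similarity))
```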

7.
Clin Ophthalmol ; 17: 2405-2412, 2023.
Article in English | MEDLINE | ID: mdl-37605764

ABSTRACT

Purpose: To determine the energy expenditure in phacoemulsification surgery, expressed as cumulative dissipated energy (CDE), among the divide-and-conquer, ultrachopper-assisted divide-and-conquer, and phaco-chop techniques for dense cataract removal. Patients and Methods: Clinical data were obtained from the medical charts of dense cataract patients undergoing routine phacoemulsification with any of three phaco-fragmentation techniques: divide and conquer using the Kelman 0.9 mm tip, divide and conquer using the ultrachopper tip, and the phaco-chop technique using the Kelman 0.9 mm tip. Cumulative dissipated energy (CDE), longitudinal ultrasound time (UST), and endothelial cell loss were compared among groups at one month postoperatively. Results: Surgeries from 90 eyes were analyzed: 30 patients in the conventional divide-and-conquer group, 32 in the ultrachopper group, and 28 in the phaco-chop group. The average CDE was 44.52 ± 23.00 in the conventional divide-and-conquer group, 43.27 ± 23.18 with the ultrachopper technique, and 20.11 ± 11.06 in the phaco-chop group. The phaco-chop technique showed a statistically significantly lower CDE than the other two groups (p < 0.0001), with 93.96 ± 39.71 seconds. There were no statistically significant differences in postoperative endothelial cell density between groups (p = 0.4916). Conclusion: The phaco-chop technique in hard cataract phacoemulsification entails lower energy expenditure than the divide-and-conquer and ultrachopper techniques; nevertheless, no differences in endothelial cell density loss were evident.

8.
MethodsX ; 10: 101968, 2023.
Article in English | MEDLINE | ID: mdl-36582480

ABSTRACT

Nowadays, molecular dynamics (MD) simulations of proteins with hundreds of thousands of snapshots are commonly produced using modern GPUs. However, due to the abundance of data, analyzing the transport tunnels present in the internal voids of these molecules across all generated snapshots has become challenging. Here, we propose to combine CAVER3, the most popular tool for tunnel calculation, with the TransportTools Python3 library in a divide-and-conquer approach to speed up tunnel calculation and reduce the hardware resources required to analyze long MD simulations in detail. Slicing an MD trajectory into smaller pieces and performing tunnel analysis on these pieces with CAVER3 considerably reduces the runtime and resources required. The TransportTools library then merges the results from the smaller pieces and gives an overall view of the tunnel network for the complete trajectory without quality loss.
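
The orchestration pattern (slice the trajectory, analyze slices in parallel, then merge) can be sketched as below; `analyze_slice` is a hypothetical placeholder, not CAVER3's or TransportTools' actual interface.

```python
from multiprocessing import Pool

def analyze_slice(slice_range):
    """Placeholder for running a tunnel calculation (e.g. CAVER3) on one
    trajectory slice; returns whatever per-slice summary the real tool emits."""
    start, stop = slice_range
    return {"frames": (start, stop), "tunnels": []}    # hypothetical result

def split_frames(n_frames, n_slices):
    """Return (start, stop) frame ranges covering the full trajectory."""
    step = -(-n_frames // n_slices)                    # ceiling division
    return [(i, min(i + step, n_frames)) for i in range(0, n_frames, step)]

if __name__ == "__main__":
    slices = split_frames(n_frames=100_000, n_slices=10)
    with Pool(processes=4) as pool:
        per_slice = pool.map(analyze_slice, slices)    # analyze slices in parallel
    # A real workflow would now merge the per-slice tunnel networks,
    # which is the role the TransportTools library plays for CAVER3 output.
    print(len(per_slice), "slices analyzed")
```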

9.
Clin Ophthalmol ; 16: 3283-3287, 2022.
Article in English | MEDLINE | ID: mdl-36237494

ABSTRACT

The stop-and-chop technique, which involves occlusion and chopping using vacuum to stabilize the nucleus, is an excellent combination of the divide-and-conquer and phaco-chop techniques. However, effectively chopping an un-solid (soft to moderate) nucleus is not easy, since the optimal vacuum to hold an un-solid nucleus is often associated with breaking of occlusion and aspiration of the nucleus. We modified the stop-and-chop technique such that occlusion and tight nucleus holding using ultrasound (US) power are not necessary. After completing the central groove and cracking the nucleus into two hemi-sections, the right nucleus half is chopped without nucleus rotation or occlusion. The right hemi-nucleus is stabilized by pressing it against the right sac with the US tip, without occlusion. Since this technique can reduce the risk of nucleus perforation and posterior capsular rupture, the surgeon can place the US tip firmly in a deep position, which provides safe and efficient nucleus division.

10.
Int J Mol Sci ; 23(19)2022 Sep 29.
Article in English | MEDLINE | ID: mdl-36232786

ABSTRACT

ApoB-100 is a member of a large lipid transfer protein superfamily and is one of the main apolipoproteins found on low-density lipoprotein (LDL) and very low-density lipoprotein (VLDL) particles. Despite its clinical significance for the development of cardiovascular disease, there is limited information on apoB-100 structure. We have developed a novel method based on the "divide and conquer" algorithm, using PSIPRED software, by dividing apoB-100 into five subunits and 11 domains. Models of each domain were prepared using I-TASSER, DEMO, RoseTTAFold, Phyre2, and MODELLER. Subsequently, we used disuccinimidyl sulfoxide (DSSO), a new mass spectrometry cleavable cross-linker, and the known position of disulfide bonds to experimentally validate each model. We obtained 65 unique DSSO cross-links, of which 87.5% were within a 26 Å threshold in the final model. We also evaluated the positions of cysteine residues involved in the eight known disulfide bonds in apoB-100, and each pair was measured within the expected 5.6 Å constraint. Finally, multiple domains were combined by applying constraints based on detected long-range DSSO cross-links to generate five subunits, which were subsequently merged to achieve an uninterrupted architecture for apoB-100 around a lipoprotein particle. Moreover, the dynamics of apoB-100 during particle size transitions was examined by comparing VLDL and LDL computational models and using experimental cross-linking data. In addition, the proposed model of receptor ligand binding of apoB-100 provides new insights into some of its functions.


Subject(s)
Apolipoproteins B , Cysteine , Apolipoprotein B-100 , Apolipoproteins B/metabolism , Computer Simulation , Disulfides , Ligands , Lipoproteins, LDL/chemistry , Lipoproteins, VLDL , Models, Structural , Sulfoxides
11.
Stat Med ; 41(25): 5113-5133, 2022 11 10.
Article in English | MEDLINE | ID: mdl-35983945

ABSTRACT

In this article, we tackle the estimation and inference problem of analyzing distributed streaming data that is collected continuously over multiple data sites. We propose an online two-way approach via linear mixed-effects models. We explicitly model the site-specific effects as random-effect terms, and tackle both between-site heterogeneity and within-site correlation. We develop an online updating procedure that does not need to re-access the previous data and can efficiently update the parameter estimate, when either new data sites, or new streams of sample observations of the existing data sites, become available. We derive the non-asymptotic error bound for our proposed online estimator, and show that it is asymptotically equivalent to the offline counterpart based on all the raw data. We compare with some key alternative solutions both analytically and numerically, and demonstrate the advantages of our proposal. We further illustrate our method with two data applications.
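
The "no re-access to previous data" idea can be illustrated with a deliberately simplified fixed-effects sketch that accumulates sufficient statistics as streams arrive; the authors' estimator additionally handles random site effects and within-site correlation, which this does not.

```python
import numpy as np

class OnlineLS:
    """Online least squares via accumulated sufficient statistics: each new
    data stream updates X'X and X'y, so earlier raw data never needs re-access.
    (A deliberate simplification: fixed effects only, no random site effects.)"""

    def __init__(self, n_features):
        self.xtx = np.zeros((n_features, n_features))
        self.xty = np.zeros(n_features)

    def update(self, X, y):
        self.xtx += X.T @ X
        self.xty += X.T @ y

    def estimate(self):
        return np.linalg.solve(self.xtx, self.xty)

# Toy usage: two streams arriving from (possibly different) sites.
rng = np.random.default_rng(3)
beta = np.array([2.0, -1.0])
model = OnlineLS(2)
for _ in range(2):
    X = rng.normal(size=(500, 2))
    y = X @ beta + rng.normal(scale=0.5, size=500)
    model.update(X, y)
print(model.estimate())   # close to [2, -1], matching the offline fit
```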


Subject(s)
Research Design , Humans , Computer Simulation , Linear Models
12.
Brief Bioinform ; 23(4)2022 07 18.
Article in English | MEDLINE | ID: mdl-35696639

ABSTRACT

With the development of high-throughput genotyping technology, detection of single nucleotide polymorphism (SNP)-SNP interactions (SSIs) has become an essential way of understanding disease susceptibility. Various methods have been proposed to detect SSIs. However, given the complexity of disease and the bias of individual SSI detectors, these single-detector-based methods generally do not scale to real genome-wide data and often yield unfavorable results. We propose a novel ensemble learning-based approach (ELSSI) that can significantly reduce the bias of individual detectors and their computational load. ELSSI randomly divides SNPs into different subsets and evaluates them with multiple types of detectors in parallel. In particular, ELSSI introduces a four-stage pipeline (generate, score, switch and filter) to iteratively generate new SNP combination subsets from SNP subsets, score each combination subset with individual detectors, switch high-scoring combinations to other detectors for re-scoring, and then filter out combinations with low scores. This pipeline enables ELSSI to detect high-order SSIs from large genome-wide datasets. Experimental results on various simulated and real genome-wide datasets show the superior efficacy of ELSSI over state-of-the-art methods in detecting SSIs, especially high-order ones. ELSSI can be run over the Internet on moderately equipped PCs and is flexible in assembling new detectors. The code of ELSSI is available at https://www.sdu-idea.cn/codes.php?name=ELSSI.
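
A bare-bones skeleton of one generate/score/switch/filter round is sketched below; the `detectors` here are random-scoring stand-ins, and the real ELSSI pipeline's switching and filtering rules are more elaborate.

```python
import itertools
import random

def elssi_like_round(pool, detectors, keep=50, switch_top=200):
    """One generate/score/switch/filter round (a skeleton, not ELSSI itself):
    score candidates with a first detector, re-score ('switch') the best ones
    with the remaining detectors, then keep only the top combinations."""
    scored = sorted(pool, key=detectors[0], reverse=True)               # score
    switched = scored[:switch_top]
    rescored = sorted(
        switched,
        key=lambda c: sum(d(c) for d in detectors) / len(detectors),    # switch
        reverse=True,
    )
    return rescored[:keep]                                              # filter

# Toy usage with stand-in "detectors" that just return random scores.
random.seed(0)
snps = [f"rs{i}" for i in range(100)]
pool = list(itertools.combinations(snps, 2))                            # generate pairs
detectors = [lambda c: random.random() for _ in range(3)]
print(elssi_like_round(pool, detectors, keep=5))
```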


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Genome , Genome-Wide Association Study/methods
13.
J Comput Biol ; 29(8): 782-801, 2022 08.
Article in English | MEDLINE | ID: mdl-35575747

ABSTRACT

Accurate multiple sequence alignment is challenging on many data sets, including those that are large, evolve under high rates of evolution, or have sequence length heterogeneity. While substantial progress has been made over the last decade in addressing the first two challenges, sequence length heterogeneity remains a significant issue for many data sets. Sequence length heterogeneity occurs for biological and technological reasons, including large insertions or deletions (indels) that occurred in the evolutionary history relating the sequences, or the inclusion of sequences that are not fully assembled. Ultra-large alignments using Phylogeny-Aware Profiles (UPP) (Nguyen et al. 2015) is one of the most accurate approaches for aligning data sets that exhibit sequence length heterogeneity: it constructs an alignment on the subset of sequences it considers "full-length," represents this "backbone alignment" using an ensemble of hidden Markov models (HMMs), and then adds each remaining sequence into the backbone alignment based on an HMM selected for that sequence from the ensemble. Our new method, WeIghTed Consensus Hmm alignment (WITCH), improves on UPP in three important ways: first, it uses a statistically principled technique to weight and rank the HMMs; second, it uses k > 1 HMMs from the ensemble rather than a single HMM; and third, it combines the alignments for each of the selected HMMs using a consensus algorithm that takes the weights into account. We show that this approach provides improved alignment accuracy compared with UPP and other leading alignment methods, as well as improved accuracy for maximum likelihood trees based on these alignments.
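
The weighting-and-consensus idea can be illustrated with a toy sketch: turn per-HMM bit scores into normalized weights, keep the top-k HMMs, and take a column-wise weighted vote over their candidate alignment rows. The softmax-style weighting and the simple vote are assumptions standing in for WITCH's statistically principled weights and its more involved consensus merging.

```python
import numpy as np

def weight_and_select(bit_scores, k=3):
    """Turn per-HMM bit scores for one query into normalized weights and keep
    the top-k HMMs (a simplified stand-in for WITCH's statistical weighting)."""
    scores = np.asarray(bit_scores, dtype=float)
    weights = np.exp(scores - scores.max())      # softmax-style, overflow-safe
    weights /= weights.sum()
    top = np.argsort(weights)[::-1][:k]
    return top, weights[top] / weights[top].sum()

def weighted_consensus(candidate_rows, weights):
    """Column-wise weighted vote over equal-length candidate alignment rows."""
    out = []
    for col in range(len(candidate_rows[0])):
        votes = {}
        for row, w in zip(candidate_rows, weights):
            votes[row[col]] = votes.get(row[col], 0.0) + w
        out.append(max(votes, key=votes.get))
    return "".join(out)

scores = [120.5, 118.2, 90.1, 40.0]              # bit scores from 4 HMMs
top, w = weight_and_select(scores, k=3)
rows = ["AC-GT", "ACGGT", "AC-GT", "A--GT"]      # toy per-HMM alignments of one query
print(top, weighted_consensus([rows[i] for i in top], w))
```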


Subject(s)
Algorithms , Consensus , Markov Chains , Phylogeny , Sequence Alignment
14.
Stat Med ; 41(15): 2840-2853, 2022 07 10.
Article in English | MEDLINE | ID: mdl-35318706

ABSTRACT

Provider profiling has been recognized as a useful tool in monitoring health care quality, facilitating inter-provider care coordination, and improving medical cost-effectiveness. Existing methods often use generalized linear models with fixed provider effects, especially when profiling dialysis facilities. As the number of providers under evaluation escalates, the computational burden becomes formidable even for specially designed workstations. To address this challenge, we introduce a serial blockwise inversion Newton algorithm exploiting the block structure of the information matrix. A shared-memory divide-and-conquer algorithm is proposed to further boost computational efficiency. In addition to the computational challenge, the current literature lacks an appropriate inferential approach to detecting providers with outlying performance especially when small providers with extreme outcomes are present. In this context, traditional score and Wald tests relying on large-sample distributions of the test statistics lead to inaccurate approximations of the small-sample properties. In light of the inferential issue, we develop an exact test of provider effects using exact finite-sample distributions, with the Poisson-binomial distribution as a special case when the outcome is binary. Simulation analyses demonstrate improved estimation and inference over existing methods. The proposed methods are applied to profiling dialysis facilities based on emergency department encounters using a dialysis patient database from the Centers for Medicare & Medicaid Services.
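
If the information matrix has the usual arrowhead structure (one small block per provider plus a shared covariate block, which is an assumption here rather than the paper's exact formulation), the blockwise idea can be sketched with a Schur-complement solve that never forms the full matrix:

```python
import numpy as np

def arrowhead_solve(provider_blocks, cross_blocks, shared_block,
                    rhs_provider, rhs_shared):
    """Solve an 'arrowhead' system [[diag(A_i), B_i], [B_i^T, D]] x = r via the
    Schur complement, inverting only the small per-provider blocks A_i."""
    schur = shared_block.copy()
    rhs_s = rhs_shared.copy()
    a_inv = []
    for A, B, r in zip(provider_blocks, cross_blocks, rhs_provider):
        Ai = np.linalg.inv(A)
        a_inv.append(Ai)
        schur -= B.T @ Ai @ B
        rhs_s -= B.T @ Ai @ r
    x_shared = np.linalg.solve(schur, rhs_s)
    x_provider = [Ai @ (r - B @ x_shared)
                  for Ai, B, r in zip(a_inv, cross_blocks, rhs_provider)]
    return x_provider, x_shared

# Toy check against a dense solve of the full system.
rng = np.random.default_rng(6)
m, q = 50, 3                      # 50 providers (1 effect each), 3 shared covariates
A = [np.array([[2.0 + rng.random()]]) for _ in range(m)]
B = [0.1 * rng.normal(size=(1, q)) for _ in range(m)]
D = m * np.eye(q)
rp = [rng.normal(size=1) for _ in range(m)]
rs = rng.normal(size=q)
xp, xs = arrowhead_solve(A, B, D, rp, rs)

full = np.zeros((m + q, m + q))
rhs = np.zeros(m + q)
for i in range(m):
    full[i, i] = A[i][0, 0]
    full[i, m:] = B[i][0]
    full[m:, i] = B[i][0]
    rhs[i] = rp[i][0]
full[m:, m:] = D
rhs[m:] = rs
print(np.allclose(np.concatenate([np.concatenate(xp), xs]),
                  np.linalg.solve(full, rhs)))
```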


Subject(s)
Medicare , Quality of Health Care , Aged , Health Personnel , Humans , United States
15.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35212357

ABSTRACT

Structural information for chemical compounds is often described by pictorial images in most scientific documents, which cannot be easily understood and manipulated by computers. This dilemma makes optical chemical structure recognition (OCSR) an essential tool for automatically mining knowledge from an enormous amount of literature. However, existing OCSR methods fall far short of realistic requirements due to their poor recovery accuracy. In this paper, we developed a deep neural network model named ABC-Net (Atom and Bond Center Network) to predict graph structures directly. Based on the divide-and-conquer principle, we propose to model an atom or a bond as a single point at its center. In this way, we can leverage a fully convolutional neural network (CNN) to generate a series of heat-maps to identify these points and predict relevant properties, such as atom types, atom charges, bond types and other properties. Thus, the molecular structure can be recovered by assembling the detected atoms and bonds. Our approach integrates all the detection and property prediction tasks into a single fully convolutional network, which is scalable and capable of processing molecular images quite efficiently. Experimental results demonstrate that our method could achieve a significant improvement in recognition performance compared with publicly available tools. The proposed method could be considered as a promising solution to OCSR problems and a starting point for the acquisition of molecular information in the literature.
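
The "atoms and bonds as center points" decoding step can be illustrated generically: find local maxima above a threshold in a predicted heat-map. This post-processing sketch (using scipy's maximum filter) is an assumption about a typical center-point decoder, not ABC-Net's exact one.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def heatmap_peaks(heatmap, threshold=0.5, size=3):
    """Extract center points from a heat-map: keep pixels that are local maxima
    within a `size` window and exceed `threshold`."""
    local_max = maximum_filter(heatmap, size=size) == heatmap
    ys, xs = np.where(local_max & (heatmap > threshold))
    return list(zip(ys.tolist(), xs.tolist()))

# Toy heat-map with two bright "atom centers".
hm = np.zeros((8, 8))
hm[2, 3] = 0.9
hm[6, 5] = 0.8
print(heatmap_peaks(hm))   # [(2, 3), (6, 5)]
```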


Subject(s)
Deep Learning , Molecular Structure , Neural Networks, Computer
16.
Biostatistics ; 23(2): 397-411, 2022 04 13.
Article in English | MEDLINE | ID: mdl-32909599

ABSTRACT

Divide-and-conquer (DAC) is a commonly used strategy to overcome the challenges of extraordinarily large data, by first breaking the dataset into a series of data blocks and then combining results from individual data blocks to obtain a final estimation. Various DAC algorithms have been proposed to fit a sparse predictive regression model in the $L_1$ regularization setting. However, many existing DAC algorithms remain computationally intensive when the sample size and number of candidate predictors are both large. In addition, no existing DAC procedures provide inference for quantifying the accuracy of risk prediction models. In this article, we propose a screening and one-step linearization infused DAC (SOLID) algorithm to fit sparse logistic regression to massive datasets, by integrating the DAC strategy with a screening step and sequences of linearization. This enables us to maximize the likelihood with only selected covariates and perform penalized estimation via a fast approximation to the likelihood. To assess the accuracy of a predictive regression model, we develop a modified cross-validation (MCV) that utilizes the side products of SOLID, substantially reducing the computational burden. Compared with existing DAC methods, the MCV procedure is the first to make inference on accuracy. Extensive simulation studies suggest that the proposed SOLID and MCV procedures substantially outperform the existing methods with respect to computational speed and achieve similar statistical efficiency to the full sample-based estimator. We also demonstrate that the proposed inference procedure provides valid interval estimators. We apply the proposed SOLID procedure to develop and validate a classification model for disease diagnosis using narrative clinical notes based on electronic medical record data from Partners HealthCare.
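
A much-simplified sketch of the screen-then-divide idea is given below: marginal screening keeps a subset of covariates, an L1-penalized logistic model is fit on each data block using only those covariates, and the block estimates are averaged. The real SOLID procedure uses one-step linearization and yields valid inference; this sketch (with an assumed scikit-learn fit) does neither.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def screen_then_divide_fit(X, y, n_blocks=5, n_keep=20):
    """(1) Keep the covariates most correlated with the outcome, (2) fit an
    L1-penalized logistic model on each data block with those covariates only,
    (3) average the block estimates."""
    corr = np.abs(np.corrcoef(X, y, rowvar=False)[-1, :-1])       # screening
    keep = np.argsort(corr)[-n_keep:]
    blocks = np.array_split(np.arange(len(y)), n_blocks)
    coefs = []
    for idx in blocks:
        fit = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
        fit.fit(X[idx][:, keep], y[idx])
        coefs.append(fit.coef_.ravel())
    return keep, np.mean(coefs, axis=0)                           # combine blocks

# Toy usage: 10 000 observations, 200 candidate predictors, 3 true signals.
rng = np.random.default_rng(4)
X = rng.normal(size=(10_000, 200))
logits = X[:, 0] - X[:, 1] + 0.5 * X[:, 2]
y = rng.binomial(1, 1 / (1 + np.exp(-logits)))
kept, beta = screen_then_divide_fit(X, y)
print(kept[-3:], beta[-3:])
```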


Subject(s)
Algorithms , Research Design , Computer Simulation , Humans , Logistic Models
17.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34619757

ABSTRACT

Long-read sequencing technology enables significant progress in de novo genome assembly. However, the high error rate and wide error distribution of raw reads result in a large number of errors in the assembly. Polishing is a procedure that fixes errors in the draft assembly and improves the reliability of genomic analysis. However, existing methods treat all regions of the assembly equally, even though there are fundamental differences between the error distributions of these regions. How to achieve very high accuracy in genome assembly is still a challenging problem. Motivated by the uneven errors in different regions of the assembly, we propose a novel polishing workflow named BlockPolish. In this method, we divide contigs into blocks with low complexity and high complexity according to statistics of aligned nucleotide bases. Multiple sequence alignment is applied to realign raw reads in complex blocks and optimize the alignment result. Due to the different error-rate distributions in trivial and complex blocks, two multitask bidirectional long short-term memory (LSTM) networks are proposed to predict the consensus sequences. In whole-genome assemblies of NA12878 assembled by Wtdbg2 and Flye using Nanopore data, BlockPolish achieves higher polishing accuracy than other state-of-the-art tools, including Racon, Medaka and MarginPolish & HELEN. In all assemblies, errors are predominantly indels, and BlockPolish performs well in correcting them. In addition to the Nanopore assemblies, we further demonstrate that BlockPolish can also reduce the errors in PacBio assemblies. The source code of BlockPolish is freely available on GitHub (https://github.com/huangnengCSU/BlockPolish).
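
The block-division step can be pictured with a toy classifier over fixed-size windows of a per-position agreement track; the window size, threshold, and the agreement statistic itself are assumptions, not BlockPolish's actual division criteria.

```python
import numpy as np

def split_into_blocks(agreement, window=100, threshold=0.9):
    """Classify fixed-size contig windows as 'trivial' or 'complex' from the
    fraction of aligned reads agreeing with the draft base at each position."""
    blocks = []
    for start in range(0, len(agreement), window):
        chunk = agreement[start:start + window]
        label = "trivial" if float(np.mean(chunk)) >= threshold else "complex"
        blocks.append((start, min(start + window, len(agreement)), label))
    return blocks

# Toy usage: high agreement except for a noisy region around 250-350 bp.
rng = np.random.default_rng(7)
agreement = np.clip(rng.normal(0.97, 0.02, size=600), 0, 1)
agreement[250:350] = rng.uniform(0.5, 0.8, size=100)
print(split_into_blocks(agreement))
```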


Subject(s)
High-Throughput Nucleotide Sequencing , Software , High-Throughput Nucleotide Sequencing/methods , Reproducibility of Results , Sequence Alignment , Sequence Analysis, DNA/methods
18.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34893794

ABSTRACT

Multiple sequence alignment (MSA) is fundamental to many biological applications, but most classical MSA algorithms struggle to handle large-scale sequence sets, especially long sequences. Therefore, some recent aligners adopt an efficient divide-and-conquer strategy that divides long sequences into several short sub-sequences. Selecting the common segments (i.e. anchors) at which to divide the sequences is critical, as it directly affects both accuracy and time cost. We therefore propose a novel algorithm, FMAlign, to improve the performance of multiple nucleotide sequence alignment. We use an FM-index to extract long common segments at low cost rather than using a space-consuming hash table. After the optimal long common segments are found, the sequences are divided at these segments. FMAlign has been tested on viral and bacterial genomes and human mitochondrial genome datasets, and compared with existing MSA methods such as MAFFT, HAlign and FAME. The experiments show that our method outperforms the existing methods in terms of running time and has high accuracy on long sequence sets. All the results demonstrate that our method is applicable to large-scale nucleotide sequence sets in terms of both sequence length and sequence number. The source code and related data are accessible at https://github.com/iliuh/FMAlign.
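
The anchor-based division can be illustrated with a deliberately naive sketch: find a long segment shared by all sequences (here by brute force rather than an FM-index), split every sequence at that anchor, "align" each chunk set with a trivial padding placeholder, and re-concatenate. A real pipeline would substitute an FM-index search and a proper MSA tool for the two placeholders.

```python
def longest_common_anchor(seqs, min_len=8):
    """Naive stand-in for an FM-index search: longest substring of the first
    sequence that occurs in every sequence."""
    ref = seqs[0]
    for length in range(len(ref), min_len - 1, -1):
        for start in range(len(ref) - length + 1):
            anchor = ref[start:start + length]
            if all(anchor in s for s in seqs[1:]):
                return anchor
    return None

def pad_align(chunks):
    """Placeholder 'aligner': right-pad chunks with gaps to equal length."""
    width = max(len(c) for c in chunks)
    return [c.ljust(width, "-") for c in chunks]

def anchor_split_align(seqs):
    anchor = longest_common_anchor(seqs)
    if anchor is None:                      # no anchor found: align as one chunk
        return pad_align(seqs)
    lefts, rights = zip(*(s.split(anchor, 1) for s in seqs))
    return [l + anchor + r
            for l, r in zip(pad_align(list(lefts)), pad_align(list(rights)))]

for row in anchor_split_align(["AAATTTGGGCCCAT", "TTTGGGCCCATAA", "CGTTTGGGCCCA"]):
    print(row)
```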


Subject(s)
Base Sequence , Sequence Alignment , Sequence Analysis, DNA/methods , Algorithms , Databases, Factual , Genome, Bacterial , Genome, Human , Humans , Research Design , Software
19.
Bioinform Biol Insights ; 15: 11779322211059238, 2021.
Article in English | MEDLINE | ID: mdl-34866905

ABSTRACT

Multilocus Sequence Typing (MLST) is a precise microbial typing approach at the intra-species level for epidemiologic and evolutionary purposes. It operates by assigning a sequence type (ST) identifier to each specimen, based on a combination of alleles of multiple housekeeping genes included in a defined scheme. The use of MLST has multiplied due to the availability of large numbers of genomic sequences and epidemiologic data in public repositories. However, data processing speed has become problematic due to the massive size of modern datasets. Here, we present FastMLST, a tool that is designed to perform PubMLST searches using BLASTn and a divide-and-conquer approach that processes each genome assembly in parallel. The output offered by FastMLST includes a table with the ST, allelic profile, and clonal complex or clade (when available), detected for a query, as well as a multi-FASTA file or a series of FASTA files with the concatenated or single allele sequences detected, respectively. FastMLST was validated with 91 different species, with a wide range of guanine-cytosine content (%GC), genome sizes, and fragmentation levels, and a speed test was performed on 3 datasets with varying genome sizes. Compared with other tools such as mlst, CGE/MLST, MLSTar, and PubMLST, FastMLST takes advantage of multiple processors to simultaneously type up to 28 000 genomes in less than 10 minutes, reducing processing times by at least 3-fold with 100% concordance to PubMLST, if contaminated genomes are excluded from the analysis. The source code, installation instructions, and documentation of FastMLST are available at https://github.com/EnzoAndree/FastMLST.

20.
Front Robot AI ; 8: 689908, 2021.
Article in English | MEDLINE | ID: mdl-34671647

ABSTRACT

The scalability of traveling salesperson problem (TSP) algorithms to large problem instances has been an open problem for a long time. We arranged a so-called Santa Claus challenge and invited people to submit their algorithms to solve a TSP instance larger than 1 M nodes given only 1 h of computing time. In this article, we analyze the results and show which design choices are decisive in providing the best solution to the problem under the given constraints. There were three valid submissions, all based on local search, including k-opt moves up to k = 5. The most important design choice turned out to be the localization of the operator using a neighborhood graph. The divide-and-merge strategy suffers a 2% loss of quality. However, via parallelization, the result can be obtained in less than 2 min, which can make a key difference in real-life applications.
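
As a small-scale illustration of localizing the operator with a neighborhood graph, the sketch below runs 2-opt restricted to each city's k nearest neighbors; the actual submissions used k-opt moves up to k = 5, better data structures, and (in one case) parallel divide-and-merge, none of which is attempted here.

```python
import math
import random

def tour_length(tour, pts):
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def two_opt_neighborhood(pts, k=8, sweeps=5):
    """2-opt local search where candidate moves come only from each city's
    k-nearest-neighbor list (the localization idea, scaled down)."""
    n = len(pts)
    tour = list(range(n))
    random.shuffle(tour)
    neighbors = [sorted(range(n), key=lambda j: math.dist(pts[i], pts[j]))[1:k + 1]
                 for i in range(n)]
    pos = {city: idx for idx, city in enumerate(tour)}
    for _ in range(sweeps):
        improved = False
        for a in range(n):
            for c in neighbors[tour[a]]:
                i, j = sorted((a, pos[c]))
                if j - i < 2:
                    continue
                # Gain from swapping edges (i, i+1) and (j, j+1) for (i, j) and (i+1, j+1).
                old = (math.dist(pts[tour[i]], pts[tour[i + 1]])
                       + math.dist(pts[tour[j]], pts[tour[(j + 1) % n]]))
                new = (math.dist(pts[tour[i]], pts[tour[j]])
                       + math.dist(pts[tour[i + 1]], pts[tour[(j + 1) % n]]))
                if new < old - 1e-12:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    pos = {city: idx for idx, city in enumerate(tour)}
                    improved = True
        if not improved:
            break
    return tour

random.seed(5)
pts = [(random.random(), random.random()) for _ in range(300)]
print(round(tour_length(two_opt_neighborhood(pts), pts), 3))
```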
