1.
J Supercomput ; 78(2): 2556-2579, 2022.
Article in English | MEDLINE | ID: mdl-34226796

ABSTRACT

With increasing numbers of GPS-equipped mobile devices, we are witnessing a deluge of spatial information that needs to be effectively and efficiently managed. Even though there are several distributed spatial data processing systems such as GeoSpark (Apache Sedona), the effects of the underlying storage engines on spatial data processing have not been well studied. In this paper, we evaluate the performance of various distributed storage engines for processing large-scale spatial data using GeoSpark, a state-of-the-art distributed spatial data processing system running on top of Apache Spark. For our performance evaluation, we choose three distributed storage engines with different characteristics: (1) HDFS, (2) MongoDB, and (3) Amazon S3. To conduct our experimental study in a real cloud computing environment, we use Amazon EMR instances (up to 6 instances) for distributed spatial data processing. To evaluate big spatial data processing, we generate data sets with four different data distributions and various sizes of up to one billion point records (38.5 GB raw size). Through extensive experiments, we measure the processing time of the storage engines under the following variations: (1) sharding strategies in MongoDB, (2) caching effects, (3) data distributions, (4) data set sizes, (5) the number of running executors and storage nodes, and (6) the selectivity of queries. The major observations from the experiments are summarized as follows. (1) The overall performance of MongoDB-based GeoSpark is lower than that of HDFS- and S3-based GeoSpark in our experimental settings. (2) The performance of MongoDB-based GeoSpark improves relative to the others on large-scale data sets. (3) HDFS- and S3-based GeoSpark scale better with the number of running executors and storage nodes than MongoDB-based GeoSpark. (4) A sharding strategy based on spatial proximity significantly improves the performance of MongoDB-based GeoSpark. (5) S3- and HDFS-based GeoSpark show similar performance in all the environmental settings. (6) Caching in distributed environments improves the overall performance of spatial data processing. These results can inform the choice of the most adequate storage engine for big spatial data processing in a target distributed environment.

2.
FEMS Microbiol Ecol ; 95(3)2019 03 01.
Article in English | MEDLINE | ID: mdl-30753635

ABSTRACT

A horizontal, fluorophore-enhanced, repetitive extragenic palindromic-polymerase chain reaction (rep-PCR) DNA fingerprinting technique was adapted to examine the genotypic richness and source differentiation of Vibrio parahaemolyticus (n = 1749) isolated from tidal water and mud on the southern coast of South Korea. The numbers of unique genotypes observed in June (163, 51.9%), September (307, 63.9%), December (205, 73.8%) and February (136, 74.7%) indicated a high degree of genetic diversity. In contrast, lower genetic diversity was detected in April (99, 46.8%), including predominant genotypes that comprised >30 V. parahaemolyticus isolates. Jackknife analysis indicated that 65.1% of tidal water isolates and 87.1% of mud isolates were correctly assigned to their source groups. Sixty-nine isolates of pathogenic V. parahaemolyticus were clustered into two groups, separated by sampling month, source of isolation and serogroup. Serotypes O1, O4, O5, O10/O12 and O11 were the dominant serovariants, while serotypes O3/O13 were frequently detected in April, when no pathogenic V. parahaemolyticus isolates were found. Most of the V. parahaemolyticus isolates were resistant to ampicillin, ceftazidime and sulfamethoxazole. Interestingly, four V. parahaemolyticus isolates resistant to carbapenem did not contain the known carbapenemase-encoding gene, but possessed an extended-spectrum β-lactamase, blaTEM.


Subjects
Seawater/microbiology; Vibrio parahaemolyticus/genetics; Anti-Bacterial Agents/pharmacology; Drug Resistance, Bacterial/drug effects; Drug Resistance, Bacterial/genetics; Genetic Variation; Genotype; Phylogeny; Republic of Korea; Seasons; Serogroup; Vibrio parahaemolyticus/classification; Vibrio parahaemolyticus/drug effects