Results 1 - 20 of 29
1.
Sensors (Basel) ; 24(4)2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38400233

ABSTRACT

The unconsolidated near surface and large daily temperature variations in the desert environment degrade vertical seismic profiling (VSP) data, making rigorous quality control necessary. Distributed acoustic sensing (DAS) VSP data are often benchmarked against geophone surveys as a gold standard. This study showcases a new simulation-based approach for assessing the quality of DAS VSP acquired in the desert without geophone data. The depth uncertainty of the DAS channels in the wellbore is assessed by calibrating against formation depth based on the concept of conservation of the energy flux. Using the 1D velocity model derived from checkshot data, we simulate both DAS and geophone VSP data via an elastic pseudo-spectral finite difference method, and estimate the source and receiver signatures using matching filters. The field geophone data show high amplitude variations between channels that cannot be replicated in the simulation. In contrast, the DAS simulation shows high visual similarity with the field DAS first-arrival waveforms. The simulated source and receiver signatures are visually indistinguishable from those of the field DAS data in this study. Since under perfect conditions the receiver signatures should be invariant with depth, we propose a new DAS data quality control metric based on local variations of the receiver signatures that does not require geophone measurements.
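
To make the idea of a local-variation metric concrete, the following is a minimal, hypothetical Python sketch: it scores each channel by how much its estimated receiver signature deviates from the median signature of its neighbouring channels. The window size, the correlation-based distance, and the flagging rule are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def local_signature_variation(signatures: np.ndarray, half_window: int = 5) -> np.ndarray:
    """For each channel, 1 minus the correlation with the median signature of
    its neighbouring channels; large values flag locally deviating signatures."""
    n_ch = signatures.shape[0]
    metric = np.zeros(n_ch)
    for i in range(n_ch):
        lo, hi = max(0, i - half_window), min(n_ch, i + half_window + 1)
        neighbours = [j for j in range(lo, hi) if j != i]
        reference = np.median(signatures[neighbours], axis=0)
        a = signatures[i] - signatures[i].mean()
        b = reference - reference.mean()
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        metric[i] = (1.0 - (a @ b) / denom) if denom > 0 else 1.0
    return metric

# Example: flag channels whose metric exceeds the median by 3 robust standard deviations.
rng = np.random.default_rng(0)
sig = np.tile(np.sin(np.linspace(0, 4 * np.pi, 200)), (50, 1)) + 0.05 * rng.standard_normal((50, 200))
sig[20] = rng.standard_normal(200)                      # one corrupted channel
m = local_signature_variation(sig)
mad = 1.4826 * np.median(np.abs(m - np.median(m)))
flagged = np.where(m > np.median(m) + 3 * mad)[0]
```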

2.
Environ Sci Technol ; 57(46): 18058-18066, 2023 Nov 21.
Article in English | MEDLINE | ID: mdl-37582237

ABSTRACT

Machine learning (ML) techniques promise to revolutionize environmental research and management, but collecting the necessary volumes of high-quality data remains challenging. Environmental sensors are often deployed under harsh conditions, requiring labor-intensive quality assurance and control (QAQC) processes. The need for manual QAQC is a major impediment to the scalability of these sensor networks. However, existing techniques for automated QAQC make strong assumptions about noise profiles in the data they filter that do not necessarily hold for broadly deployed environmental sensors. Toward the goal of increasing the volume of high-quality environmental data, we introduce an ML-assisted QAQC methodology that is robust to data with low signal-to-noise ratios. Our approach embeds sensor measurements into a dynamical feature space and trains a binary classification algorithm (a support vector machine) to detect deviation from expected process dynamics, indicating whether a sensor has become compromised and requires maintenance. This strategy enables the automated detection of a wide variety of nonphysical signals. We apply the methodology to three novel data sets produced by 136 low-cost environmental sensors (stream level, drinking water pH, and drinking water electroconductivity) deployed by our group across 250,000 km2 in Michigan, USA. The proposed methodology achieved accuracy scores of up to 0.97 and consistently outperformed state-of-the-art anomaly detection techniques.
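
A minimal sketch of the embed-then-classify idea with scikit-learn, assuming labelled training windows are available; the embedding dimension, delay, SVM settings, and toy data are illustrative, not the authors' configuration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def delay_embed(series: np.ndarray, dim: int = 5, delay: int = 3) -> np.ndarray:
    """Time-delay embedding: each row stacks `dim` samples spaced `delay` steps apart."""
    n = len(series) - (dim - 1) * delay
    return np.column_stack([series[i * delay : i * delay + n] for i in range(dim)])

# Toy data: a clean periodic 'process' and a flat-lined (compromised) segment.
rng = np.random.default_rng(1)
t = np.linspace(0, 20 * np.pi, 4000)
normal = np.sin(t) + 0.05 * rng.standard_normal(t.size)
compromised = np.full(1000, normal.mean()) + 0.01 * rng.standard_normal(1000)

X_n, X_c = delay_embed(normal), delay_embed(compromised)
X = np.vstack([X_n, X_c])
y = np.concatenate([np.zeros(len(X_n)), np.ones(len(X_c))])

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X, y)
# New embedded windows are flagged for maintenance when the classifier predicts class 1.
```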


Subject(s)
Drinking Water , Machine Learning , Algorithms , Michigan
3.
Ther Innov Regul Sci ; 57(6): 1217-1228, 2023 11.
Article in English | MEDLINE | ID: mdl-37450198

ABSTRACT

Monitoring of clinical trials is a fundamental process required by regulatory agencies. It assures the compliance of a center with the required regulations and the trial protocol. Traditionally, monitoring teams have relied on extensive on-site visits and source data verification; however, this is costly and yields limited benefit. Central statistical monitoring (CSM) is therefore an additional approach, recently embraced by the International Council for Harmonisation (ICH), to detect problematic or erroneous data using visualizations and statistical control measures. Existing implementations have focused primarily on detecting inlier and outlier data. Other approaches include principal component analysis and examination of the data's distribution. Here we focus on comparing centers to the grand mean under different model types and assumptions for common data types, such as binomial, ordinal, and continuous response variables. We implement multiple comparisons of single centers to the grand mean of all centers. This approach is also available for the various non-normal data types that are abundant in clinical trials. Further, using confidence intervals, an assessment of equivalence to the grand mean can be applied. In a Monte Carlo simulation study, the applied statistical approaches were investigated for their ability to control the type I error rate and for their power under balanced and unbalanced designs, which are common in registry data and clinical trials. Data from the German Multiple Sclerosis Registry (GMSR), including proportions of missing data, adverse events, and disease severity scores, were used to verify the results on real-world data (RWD).
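
A naive sketch of the center-versus-grand-mean comparison for a continuous endpoint, using Bonferroni-adjusted confidence intervals and a simple equivalence check. This is a simplification of the paper's approach: it ignores the correlation between a center and the grand mean it contributes to, and the equivalence margin and data are hypothetical.

```python
import numpy as np
from scipy import stats

def center_vs_grand_mean(values_by_center: dict, alpha: float = 0.05, margin: float = 1.0) -> dict:
    all_values = np.concatenate([np.asarray(v, dtype=float) for v in values_by_center.values()])
    grand_mean = all_values.mean()
    adj_alpha = alpha / len(values_by_center)          # Bonferroni adjustment for multiplicity
    report = {}
    for center, vals in values_by_center.items():
        vals = np.asarray(vals, dtype=float)
        diff = vals.mean() - grand_mean
        se = vals.std(ddof=1) / np.sqrt(len(vals))
        tcrit = stats.t.ppf(1 - adj_alpha / 2, df=len(vals) - 1)
        lo, hi = diff - tcrit * se, diff + tcrit * se
        report[center] = {
            "diff": round(diff, 3),
            "ci": (round(lo, 3), round(hi, 3)),
            "flagged": lo > 0 or hi < 0,                     # CI excludes zero
            "equivalent": (-margin < lo) and (hi < margin),  # CI lies inside the margin
        }
    return report

rng = np.random.default_rng(2)
data = {f"center_{i}": rng.normal(10 + (2 if i == 3 else 0), 2, size=40) for i in range(5)}
print(center_vs_grand_mean(data))
```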


Subject(s)
Multiple Sclerosis , Humans , Multiple Sclerosis/drug therapy , Computer Simulation
4.
BMC Bioinformatics ; 24(1): 77, 2023 Mar 03.
Article in English | MEDLINE | ID: mdl-36869285

ABSTRACT

BACKGROUND: Data archiving and distribution are essential to scientific rigor and the reproducibility of research. The National Center for Biotechnology Information's Database of Genotypes and Phenotypes (dbGaP) is a public repository for scientific data sharing. To support the curation of thousands of complex data sets, dbGaP has detailed submission instructions that investigators must follow when archiving their data. RESULTS: We developed dbGaPCheckup, an R package that implements a series of check, awareness, reporting, and utility functions to support data integrity and proper formatting of the subject phenotype data set and data dictionary prior to dbGaP submission. For example, dbGaPCheckup ensures that the data dictionary contains all fields required by dbGaP, and additional fields required by dbGaPCheckup; that the number and names of variables match between the data set and data dictionary; that there are no duplicated variable names or descriptions; that observed data values are not more extreme than the logical minimum and maximum values stated in the data dictionary; and more. The package also includes functions that implement a series of minor/scalable fixes when errors are detected (e.g., a function to reorder the variables in the data dictionary to match the order listed in the data set). Finally, we include reporting functions that produce graphical and textual summaries of the data to further reduce the likelihood of data integrity issues. The dbGaPCheckup R package is available on CRAN ( https://CRAN.R-project.org/package=dbGaPCheckup ) and developed on GitHub ( https://github.com/lwheinsberg/dbGaPCheckup ). CONCLUSION: dbGaPCheckup is an innovative, assistive, and timesaving tool that fills an important gap for researchers by making dbGaP submission of large and complex data sets less error prone.
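
dbGaPCheckup itself is an R package; the following generic Python sketch re-implements a few of the checks described above so the logic is visible. The data dictionary column names (VARNAME, VARDESC, MIN, MAX) are hypothetical placeholders, not dbGaP's or dbGaPCheckup's exact field names.

```python
import pandas as pd

def check_submission(data: pd.DataFrame, dictionary: pd.DataFrame) -> list:
    """Return a list of human-readable problems found in the data set / dictionary pair."""
    problems = []
    # 1. Variable names must match between data set and dictionary (same order).
    if list(data.columns) != list(dictionary["VARNAME"]):
        problems.append("Variable names/order differ between data set and dictionary.")
    # 2. No duplicated variable names or descriptions.
    if dictionary["VARNAME"].duplicated().any():
        problems.append("Duplicated variable names in dictionary.")
    if dictionary["VARDESC"].duplicated().any():
        problems.append("Duplicated variable descriptions in dictionary.")
    # 3. Observed values must stay within the declared logical MIN/MAX.
    for _, row in dictionary.dropna(subset=["MIN", "MAX"]).iterrows():
        col = row["VARNAME"]
        if col in data and pd.api.types.is_numeric_dtype(data[col]):
            observed = data[col].dropna()
            if (observed < row["MIN"]).any() or (observed > row["MAX"]).any():
                problems.append(f"{col}: values outside declared range [{row['MIN']}, {row['MAX']}].")
    return problems
```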


Subject(s)
Biotechnology , Information Dissemination , Reproducibility of Results , Databases, Factual , Phenotype
5.
Front Plant Sci ; 14: 1077196, 2023.
Article in English | MEDLINE | ID: mdl-36760650

ABSTRACT

Variety testing is an indispensable step in the process of creating new improved varieties, from breeding to adoption. The performance of varieties can be compared and evaluated based on multi-trait data from multi-location variety tests conducted over multiple years. Although high-throughput phenotyping platforms have been used for observing some specific traits, manual phenotyping is still widely used. The efficient management of large amounts of data remains a significant problem for crop variety testing. This study reports a variety test platform (VTP) created to manage the whole workflow for the standardization and data quality improvement of crop variety testing. Through the VTP, the phenotype data of varieties can be integrated and reused based on standardized data elements and datasets. Moreover, the information support and automated functions for the whole testing workflow help users conduct tests efficiently through a series of functions such as test design, data acquisition and processing, and statistical analyses. The VTP has been applied to regional variety tests covering more than seven thousand locations across the country, and a standardized and authoritative phenotypic database covering five crops has been generated from it. In addition, the VTP can be deployed on either private or publicly available high-performance computing nodes so that test management and data analysis can be done conveniently through a web-based interface or mobile application. In this way, the system can provide variety test management services to more small and medium-sized breeding organizations while ensuring the mutual independence and security of test data. The application of the VTP shows that the platform can make variety testing more efficient and can be used to generate a reliable database suitable for meta-analysis in multi-omics breeding and variety development projects.

6.
Methods Mol Biol ; 2426: 267-302, 2023.
Article in English | MEDLINE | ID: mdl-36308693

ABSTRACT

Protein post-translational modifications (PTMs) are essential elements of cellular communication. Variations in their abundance can affect cellular pathways, leading to cellular disorders and diseases. A widely used method for revealing PTM-mediated regulatory networks is label-free quantitation (LFQ) by high-resolution mass spectrometry. The raw data resulting from such experiments are generally interpreted using dedicated software, such as MaxQuant, MassChroQ, or Proline, which provide data matrices containing quantified intensities for each modified peptide identified. Statistical analyses are then necessary (1) to ensure that the quantified data are of sufficient quality and reproducibility, and (2) to highlight the modified peptides that are differentially abundant between the biological conditions under study. The objective of this chapter is therefore to provide a complete data analysis pipeline for analyzing the quantified values of modified peptides in the presence of two or more biological conditions using the R software. We illustrate our pipeline starting from MaxQuant outputs, using the analysis of A549-ACE2 cells infected by SARS-CoV-2 at different time points, freely available on PRIDE (PXD020019).
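
The pipeline in this chapter is R-based; as a language-agnostic illustration of the differential-abundance step only, here is a compact Python sketch for a two-condition design. It assumes `intensities` is a modified-peptide-by-sample matrix already parsed from the quantification software, and it omits the normalization and missing-value handling a real analysis would need; the sample groupings are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.stats.multitest import multipletests

def differential_abundance(intensities: pd.DataFrame, group_a: list, group_b: list) -> pd.DataFrame:
    """intensities: modified peptides x samples; group_a/group_b: sample column names."""
    log_int = np.log2(intensities.replace(0, np.nan)).dropna()   # keep fully observed peptides
    a, b = log_int[group_a], log_int[group_b]
    log2_fc = a.mean(axis=1) - b.mean(axis=1)
    _, p = stats.ttest_ind(a, b, axis=1, equal_var=False)        # Welch t-test per peptide
    reject, p_adj, _, _ = multipletests(p, alpha=0.05, method="fdr_bh")
    return pd.DataFrame({"log2FC": log2_fc, "p": p, "p_adj": p_adj, "significant": reject},
                        index=log_int.index)
```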


Subject(s)
COVID-19 , Proteomics , Humans , Proteomics/methods , SARS-CoV-2 , Protein Processing, Post-Translational , Software , Peptides/metabolism
7.
Mar Pollut Bull ; 185(Pt A): 114181, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36308819

ABSTRACT

Assessing the status of marine pollution at regional and sub-regional scales requires the use of comparable and harmonized data provided by multiple institutions, located in several countries. Standardized data management and quality control are crucial for supporting a coherent evaluation of marine pollution. Taking the Eastern Mediterranean Sea as a case study, we propose an approach to improve the quality control procedures used for sediment pollution data, thus supporting a harmonized environmental assessment. The regional ranges of contaminant concentrations in sediments were identified based on an in-depth literature review, and the lowest measured concentrations were evaluated to determine the "background concentrations" of chemical substances not yet targeted in the Mediterranean Sea. In addition, to verify the suitability of the approach for validating large data collections provided by multiple sources, the determined ranges were used to validate a regional dataset available through EMODnet data infrastructure.
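
A toy illustration of the validation logic described above: flag sediment concentrations that fall outside regional ranges compiled from the literature. The substances, ranges, and column names are placeholders, not the values derived in the study.

```python
import pandas as pd

REGIONAL_RANGES = {          # hypothetical (min, max) in mg/kg dry weight
    "Cd": (0.01, 1.5),
    "Pb": (1.0, 120.0),
    "Hg": (0.005, 0.6),
}

def validate_sediment_data(df: pd.DataFrame) -> pd.DataFrame:
    """Attach a QC flag per record: 'good' if inside the regional range, else 'suspect'."""
    nan_pair = (float("nan"), float("nan"))
    lo = df["parameter"].map(lambda p: REGIONAL_RANGES.get(p, nan_pair)[0])
    hi = df["parameter"].map(lambda p: REGIONAL_RANGES.get(p, nan_pair)[1])
    inside = (df["value"] >= lo) & (df["value"] <= hi)
    return df.assign(qc_flag=inside.map({True: "good", False: "suspect"}))

records = pd.DataFrame({"parameter": ["Cd", "Pb", "Hg", "Pb"],
                        "value": [0.2, 450.0, 0.03, 15.0]})
print(validate_sediment_data(records))
```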


Subject(s)
Polycyclic Aromatic Hydrocarbons , Trace Elements , Water Pollutants, Chemical , Polycyclic Aromatic Hydrocarbons/analysis , Geologic Sediments/chemistry , Environmental Monitoring/methods , Water Pollutants, Chemical/analysis , Data Collection , Quality Control
8.
JMIR Med Inform ; 10(4): e36481, 2022 Apr 13.
Article in English | MEDLINE | ID: mdl-35416792

ABSTRACT

BACKGROUND: With the advent of data-intensive science, a full integration of big data science and health care will bring a cross-field revolution to the medical community in China. The concept of big data represents not only a technology but also a resource and a method. Big data are regarded as an important strategic resource at both the national and the medical-institution level, so great importance has been attached to the construction of big data platforms for health care. OBJECTIVE: We aimed to develop and implement a big data platform for a large hospital in order to overcome the difficulties of integrating, calculating, storing, and governing multisource heterogeneous data in a standardized way, as well as to ensure health care data security. METHODS: The project to build a big data platform at West China Hospital of Sichuan University was launched in 2017. The platform has extracted, integrated, and governed data from different departments and sections of the hospital dating back to January 2008. A master-slave mode was implemented to realize the real-time integration of massive multisource heterogeneous data, and an environment that separates the storage and calculation of heterogeneous characteristic data was built. A business-based metadata model was improved for data quality control, and a standardized health care data governance system and a scientific closed-loop data security ecology were established. RESULTS: After 3 years of design, development, and testing, the West China Hospital of Sichuan University big data platform was formally brought online in November 2020. It has formed a massive multidimensional data resource database, with more than 12.49 million patients, 75.67 million visits, and 8475 data variables. Along with hospital operations data, newly generated data are entered into the platform in real time. Since its launch, the platform has supported more than 20 major projects and has provided data services, storage, and computing power to many scientific teams, facilitating a shift in the data support model from conventional manual extraction to self-service retrieval (which has reached 8561 retrievals per month). CONCLUSIONS: The platform combines operational data from all departments and sections of the hospital into a massive, high-dimensional, high-quality health care database that allows electronic medical records to be used effectively and taps into the value of data to fully support clinical services, scientific research, and operations management. The platform successfully provides storage and computing power for multisource heterogeneous data. By effectively governing massive multidimensional data gathered from multiple sources, the West China Hospital of Sichuan University big data platform provides highly available data assets and thus has high application value in the health care field. It also makes the use of electronic medical record data for real-world research simpler and more efficient.

9.
Sensors (Basel) ; 22(4)2022 Feb 18.
Article in English | MEDLINE | ID: mdl-35214486

ABSTRACT

The rapid evolution of sensors and communication technologies has led to the production and transfer of massive data streams from vehicles, either within their electronic units or to the outside world over the internet infrastructure. The "outside world", in most cases, consists of third-party applications, such as fleet or traffic management control centers, which use vehicular data for reporting and monitoring. To meet their needs, such applications typically require the exchange and processing of vast amounts of data, which can be handled by so-called Big Data technologies. The purpose of this study is to present a hybrid platform suitable for data collection, storage, and analysis, enhanced with quality control actions. In particular, the collected data arrive in various formats originating from different vehicle sensors and are stored in the aforementioned platform continuously. The data stored in this platform must then be checked to validate their quality. To do so, certain actions, such as missing-value checks, format checks, and range checks, must be carried out. The results of the quality control functions are presented herein, and useful conclusions are drawn in order to avoid data quality problems that may occur in further analysis and use of the data, e.g., for training artificial intelligence models.
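
A simplified pandas sketch of the missing-value, format, and range checks mentioned above, applied to one batch of vehicle records; the column names and valid ranges are hypothetical, not the platform's actual schema.

```python
import pandas as pd

VALID_RANGES = {"speed_kmh": (0, 250), "fuel_level_pct": (0, 100), "engine_temp_c": (-40, 150)}

def quality_report(batch: pd.DataFrame) -> dict:
    report = {"n_records": len(batch)}
    # Missing-value check: fraction of missing entries per column.
    report["missing_ratio"] = batch.isna().mean().to_dict()
    # Format check: timestamps must parse; non-parsable rows are counted.
    ts = pd.to_datetime(batch["timestamp"], errors="coerce")
    report["bad_timestamps"] = int(ts.isna().sum())
    # Range check: count values outside the physically plausible range.
    out_of_range = {}
    for col, (lo, hi) in VALID_RANGES.items():
        vals = pd.to_numeric(batch[col], errors="coerce")
        out_of_range[col] = int(((vals < lo) | (vals > hi)).sum())
    report["out_of_range"] = out_of_range
    return report
```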

10.
Article in Chinese | WPRIM (Western Pacific) | ID: wpr-996019

ABSTRACT

Quality management and control of single diseases is a means of continuously improving medical quality and safety by building a set of quality control indicators and evaluation systems covering the whole process of disease diagnosis and treatment. In actual single-disease management, reporting for each disease involves data from various systems such as electronic medical records, making data integration difficult, while the traditional manual reporting method is time-consuming and cannot guarantee data accuracy. In the course of its informatization efforts, a hospital designed an intelligent, fully closed-loop single-disease management platform based on its hospital information system by integrating the hospital's existing human and information data resources. This platform integrates single-disease intranet reporting, in-depth capture of reporting elements, single-disease quality indicator management, and real-time intelligent control of single diseases, in order to promote more refined and intelligent disease management and thus steadily improve medical quality and safety.

11.
Comput Methods Programs Biomed ; 211: 106394, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34560604

ABSTRACT

BACKGROUND AND OBJECTIVE: In response to the ongoing COVID-19 pandemic, several prediction models were rapidly developed in the existing literature, with the aim of providing evidence-based guidance. However, none of these COVID-19 prediction models have been found to be reliable. Models are commonly assessed as having a risk of bias, often due to insufficient reporting, use of non-representative data, and lack of large-scale external validation. In this paper, we present the Observational Health Data Sciences and Informatics (OHDSI) analytics pipeline for patient-level prediction modeling as a standardized approach for rapid yet reliable development and validation of prediction models. We demonstrate how our analytics pipeline and open-source software tools can be used to answer important prediction questions while limiting potential causes of bias (e.g., by validating phenotypes, specifying the target population, performing large-scale external validation, and publicly providing all analytical source code). METHODS: We show step by step how to implement the analytics pipeline for the question: 'In patients hospitalized with COVID-19, what is the risk of death 0 to 30 days after hospitalization?'. We develop models using six different machine learning methods in a USA claims database containing over 20,000 COVID-19 hospitalizations and externally validate the models using data containing over 45,000 COVID-19 hospitalizations from South Korea, Spain, and the USA. RESULTS: Our open-source software tools enabled us to go efficiently end-to-end from problem design to reliable model development and evaluation. When predicting death in patients hospitalized with COVID-19, AdaBoost, random forest, gradient boosting machine, and decision tree yielded similar or lower internal and external validation discrimination performance compared to L1-regularized logistic regression, whereas the MLP neural network consistently resulted in lower discrimination. The L1-regularized logistic regression models were well calibrated. CONCLUSION: Our results show that following the OHDSI analytics pipeline for patient-level prediction modeling enables the rapid development of reliable prediction models. The OHDSI software tools and pipeline are open source and available to researchers around the world.
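
This is not the OHDSI pipeline itself (which is distributed as R packages); the following is a generic scikit-learn sketch of the best-calibrated model class reported above, L1-regularized logistic regression, together with the internal-versus-external discrimination comparison. The feature matrices are synthetic placeholders for the covariate data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
X_dev, y_dev = rng.random((2000, 50)), rng.integers(0, 2, 2000)      # development database
X_ext, y_ext = rng.random((800, 50)), rng.integers(0, 2, 800)        # external database

model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1, max_iter=1000)
model.fit(X_dev, y_dev)

internal_auc = roc_auc_score(y_dev, model.predict_proba(X_dev)[:, 1])
external_auc = roc_auc_score(y_ext, model.predict_proba(X_ext)[:, 1])
print(f"internal AUC={internal_auc:.3f}, external AUC={external_auc:.3f}")
```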


Subject(s)
COVID-19 , Pandemics , Humans , Logistic Models , Machine Learning , SARS-CoV-2
12.
BMC Res Notes ; 14(1): 366, 2021 Sep 20.
Article in English | MEDLINE | ID: mdl-34544495

ABSTRACT

OBJECTIVE: Among the different methods for profiling the genome-wide patterns of transcription factor binding and histone modifications in cells and tissues, CUT&RUN has emerged as an efficient approach that achieves a higher signal-to-noise ratio using fewer cells than ChIP-seq. The results from CUT&RUN and other related sequence-enrichment assays require comprehensive quality control (QC) and comparative analysis of data quality across replicates. While several computational tools currently exist for read mapping and analysis, systematic reporting of data quality is lacking. Our aims were (1) to compare methods for using frozen versus fresh cells for CUT&RUN and (2) to develop an easy-to-use pipeline for assessing data quality. RESULTS: We compared a CUT&RUN workflow with fresh and frozen samples, and present an R package called ssvQC for quality control and comparison of data quality derived from CUT&RUN and other enrichment-based sequence data. Using ssvQC, we evaluate results from different CUT&RUN protocols for transcription factors and histone modifications from fresh and frozen tissue samples. Overall, this process facilitates the evaluation of data quality across datasets and permits inspection of peak calling and replicate analysis for different data types. The package ssvQC is readily available at https://github.com/FrietzeLabUVM/ssvQC .


Subject(s)
Histone Code , Transcription Factors , Chromatin Immunoprecipitation , High-Throughput Nucleotide Sequencing , Quality Control , Workflow
13.
Mycorrhiza ; 31(6): 671-683, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34508280

ABSTRACT

Nearly 150 years of research has accumulated large amounts of data on mycorrhizal association types in plants. However, this important resource includes unreliably allocated traits for some species. An audit of six commonly used data sources revealed a high degree of consistency in the mycorrhizal status of most species, genera, and families of vascular plants, but some records contradicted the majority of other data (~10% of data overall). Careful analysis of contradictory records using rigorous definitions of association types revealed that the majority were diagnosis errors, which often stem from references predating modern knowledge of mycorrhiza types. Other errors are linked to inadequate microscopic examination of roots or to plants with complex root anatomy, such as phi thickenings or beaded roots. Errors consistently occurred at much lower frequencies than correct records but have accumulated in uncorrected databases. This results in less accurate knowledge about dominant plants in some ecosystems because they were sampled more often. Errors have also propagated from one database to another over decades when data were amalgamated without checking their suitability. Due to these errors, it is often incorrect to designate plants reported to have inconsistent mycorrhizas as "facultatively mycorrhizal". Updated protocols for resolving conflicting mycorrhizal data are provided here. These are based on standard morphological definitions of association types, which are the foundations of mycorrhizal science. This analysis also identifies the need for adequate training and mentoring of researchers to maintain the quality of mycorrhizal research.


Subject(s)
Magnoliopsida , Mycorrhizae , Databases, Factual , Ecosystem , Plants
14.
BMC Bioinformatics ; 22(Suppl 6): 396, 2021 Aug 06.
Article in English | MEDLINE | ID: mdl-34362304

ABSTRACT

BACKGROUND: Meiotic recombination is a vital biological process that plays an essential role in the structural and functional dynamics of genomes. Genomes exhibit highly variable recombination profiles along chromosomes, associated with several chromatin states. However, eu-heterochromatin boundaries are neither available nor easily obtained for non-model organisms, especially newly sequenced ones. Hence, the accurate local recombination rates needed to address evolutionary questions are missing. RESULTS: Here, we propose an automated computational tool, based on the Marey map method, that identifies heterochromatin boundaries along chromosomes and estimates local recombination rates. Our method, called BREC (heterochromatin Boundaries and RECombination rate estimates), is not genome-specific and runs even on non-model genomes as long as genetic and physical maps are available. BREC is purely statistical and data-driven, so good input data quality remains a strong requirement. Therefore, a data pre-processing module (data quality control and cleaning) is provided. Experiments show that BREC handles differing marker densities and distribution issues. CONCLUSIONS: BREC's heterochromatin boundaries have been validated against cytological equivalents experimentally generated for the fruit fly Drosophila melanogaster genome, for which BREC returns congruent values. BREC's recombination rates have also been compared with previously reported estimates. Based on these promising results, we believe our tool has the potential to help bring data science into the service of genome biology and evolution. We provide BREC as an R package and a Shiny web-based, user-friendly application, yielding a fast, easy-to-use, and broadly accessible resource. The BREC R package is available at the GitHub repository https://github.com/GenomeStructureOrganization .
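
BREC is an R package; as a bare-bones, language-neutral illustration of the Marey map idea it builds on, the Python sketch below fits a smooth curve of genetic position (cM) against physical position (Mb) and takes its derivative as the local recombination rate. The marker data are simulated and the spline smoothing and low-rate threshold are illustrative, not BREC's implementation.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(4)
physical_mb = np.sort(rng.uniform(0, 25, 200))                  # marker physical positions
genetic_cm = 2.0 * physical_mb + 3 * np.sin(physical_mb / 4) + rng.normal(0, 0.5, 200)

spline = UnivariateSpline(physical_mb, genetic_cm, k=3, s=len(physical_mb))
grid = np.linspace(physical_mb.min(), physical_mb.max(), 500)
local_rate = np.clip(spline.derivative()(grid), 0, None)        # cM/Mb, clipped at zero

# Regions with very low rates relative to the chromosome maximum hint at heterochromatin.
low_rate_regions = grid[local_rate < 0.1 * local_rate.max()]
```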


Subject(s)
Heterochromatin , Mobile Applications , Animals , Chromosome Mapping , Drosophila melanogaster/genetics , Heterochromatin/genetics , Recombination, Genetic
15.
Sensors (Basel) ; 21(10)2021 May 14.
Article in English | MEDLINE | ID: mdl-34069085

ABSTRACT

In seismology, recent decades have witnessed an increased effort to observe all 12 degrees of freedom of seismic ground motion by complementing translational ground motion observations with measurements of strain and rotational motions, aiming at enhanced probing and understanding of Earth and other planetary bodies. The evolution of optical instrumentation, in particular large-scale ring laser installations such as G-ring and ROMY (ROtational Motion in seismologY), and their geoscientific application have contributed significantly to the emergence of this scientific field. The currently most advanced large-scale ring laser array is ROMY, which is unprecedented in scale and design. As a heterolithic structure, ROMY's ring laser components are subject to optical frequency drifts. Such Sagnac interferometers require new considerations and approaches to data acquisition, processing, and quality assessment compared to conventional, mechanical instrumentation. We present an automated approach to assessing the data quality and performance of a ring laser, based on characteristics of the interferometric Sagnac signal. The developed scheme is applied to ROMY data to detect compromised operation states and assign quality flags. When ROMY's database becomes publicly accessible, this assessment will be employed to provide a quality control feature for data requests.
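
A toy sketch of the kind of automated check described above: estimate the Sagnac beat frequency in sliding windows via an FFT peak and flag windows that drift too far from the nominal value or lose amplitude. The nominal frequency, thresholds, and synthetic signal are illustrative assumptions, not ROMY's actual processing.

```python
import numpy as np

FS = 5000.0            # sampling rate [Hz]
F_NOMINAL = 553.5      # assumed nominal Sagnac beat frequency [Hz]

def flag_windows(signal: np.ndarray, win: int = 5000, df_max: float = 1.0, amp_min: float = 0.1):
    flags = []
    for start in range(0, len(signal) - win + 1, win):
        seg = signal[start:start + win]
        spectrum = np.abs(np.fft.rfft(seg * np.hanning(win)))
        freqs = np.fft.rfftfreq(win, d=1.0 / FS)
        peak = np.argmax(spectrum[1:]) + 1                      # skip the DC bin
        ok = abs(freqs[peak] - F_NOMINAL) < df_max and seg.std() > amp_min
        flags.append((start / FS, "normal" if ok else "compromised"))
    return flags

t = np.arange(0, 60, 1.0 / FS)
sagnac = np.sin(2 * np.pi * F_NOMINAL * t)
sagnac[100000:150000] *= 0.01                                   # simulated signal drop-out
print(flag_windows(sagnac)[:5])
```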

16.
Int J Med Inform ; 150: 104454, 2021 06.
Article in English | MEDLINE | ID: mdl-33866231

ABSTRACT

OBJECTIVE: This study compares seven machine learning models developed to predict childhood obesity from age >2 to ≤7 years using electronic health record (EHR) data up to age 2 years. MATERIALS AND METHODS: EHR data from 860,510 patients with 11,194,579 healthcare encounters were obtained from the Children's Hospital of Philadelphia. After applying stringent quality control to remove implausible growth values and including only individuals with all recommended wellness visits by age 7 years, 27,203 patients (50.78% male) remained for model development. Seven machine learning models were developed to predict obesity incidence as defined by the Centers for Disease Control and Prevention (age/sex-adjusted BMI > 95th percentile). Model performance was evaluated by multiple standard classifier metrics, and the differences among the seven models were compared using Cochran's Q test and post-hoc pairwise testing. RESULTS: XGBoost yielded an AUC of 0.81 (0.001), outperforming all other models. It also achieved statistically significantly better performance than all other models on standard classifier metrics (sensitivity fixed at 80%): precision 30.90% (0.22%), F1-score 44.60% (0.26%), accuracy 66.14% (0.41%), and specificity 63.27% (0.41%). DISCUSSION AND CONCLUSION: Early childhood obesity prediction models were developed from the largest cohort reported to date. Relative to prior research, our models generalize to include males and females in a single model and extend the time frame for obesity incidence prediction to 7 years of age. The presented machine learning model development workflow can be adapted to various EHR-based studies and may be valuable for developing other clinical prediction models.
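
A sketch of evaluating a boosted-tree classifier at a fixed 80% sensitivity, as done above. scikit-learn's GradientBoostingClassifier stands in for XGBoost here, and the data are synthetic placeholders for the EHR features; choosing the threshold directly on the test set is for illustration only.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(5)
X = rng.random((5000, 30))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.3, 5000) > 1.0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = GradientBoostingClassifier().fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]

# Choose the decision threshold that yields ~80% sensitivity, then report the other metrics.
threshold = np.quantile(scores[y_te == 1], 0.20)
pred = scores >= threshold
tp = np.sum(pred & (y_te == 1)); fp = np.sum(pred & (y_te == 0))
tn = np.sum(~pred & (y_te == 0)); fn = np.sum(~pred & (y_te == 1))
print(f"AUC={roc_auc_score(y_te, scores):.3f}, precision={tp/(tp+fp):.3f}, "
      f"specificity={tn/(tn+fp):.3f}, sensitivity={tp/(tp+fn):.3f}")
```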


Subject(s)
Electronic Health Records , Pediatric Obesity , Child , Child, Preschool , Cohort Studies , Female , Humans , Incidence , Machine Learning , Male , Pediatric Obesity/epidemiology
17.
Sci Total Environ ; 779: 146381, 2021 Jul 20.
Article in English | MEDLINE | ID: mdl-33743460

ABSTRACT

Low-cost air quality sensor networks have been increasingly used for high-spatial-resolution air quality monitoring in recent years. Ensuring data reliability during continuous operation is critical for these sensor networks. Using particulate matter sensors as an example, this study reports a data quality control method comprising sensor selection, pre-calibration, and online inspection, which was used in developing and operating dense low-cost particle sensor networks in two Chinese cities. First, seven mainstream sensors were tested, and one particle sensor model was selected for its better linearity and stability. For a batch of sensors of the same model, even though they are calibrated after manufacture, responses to the same pollutant concentration still differ; this systematic variation was corrected and unified through pre-calibration. After field deployment, a data analysis method was established to inspect the sensors' working status online. Using data from the sensors, it evaluates parameters such as the intraclass correlation coefficient and the normalized root mean square error. These two metrics form a two-dimensional coordinate system used to classify sensors into four statuses: normal, fluctuation, hotspot, and malfunction. During one month of operation in the two cities, 8 (out of 82) and 10 (out of 59) sensors with suspected malfunctions were screened out for further on-site inspection. Moreover, the sensor networks show potential for identifying illegal emission sources that cannot typically be detected by sparse regulatory air quality monitoring stations.
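
A rough sketch of the two-metric idea: score each sensor against the network median series with a correlation measure (standing in for the intraclass correlation coefficient used above) and a normalized RMSE, then map the pair onto four statuses. The thresholds, status assignment, and data are illustrative only.

```python
import numpy as np

def classify_sensors(readings: np.ndarray, corr_thr: float = 0.8, nrmse_thr: float = 0.3):
    """readings: (n_sensors, n_times) PM concentrations from co-deployed sensors."""
    reference = np.median(readings, axis=0)
    statuses = []
    for series in readings:
        corr = np.corrcoef(series, reference)[0, 1]
        nrmse = np.sqrt(np.mean((series - reference) ** 2)) / (reference.mean() + 1e-9)
        if corr >= corr_thr and nrmse <= nrmse_thr:
            status = "normal"
        elif corr >= corr_thr:
            status = "hotspot"       # tracks the network but at offset levels
        elif nrmse <= nrmse_thr:
            status = "fluctuation"   # right level on average but unstable
        else:
            status = "malfunction"
        statuses.append((round(corr, 2), round(nrmse, 2), status))
    return statuses

rng = np.random.default_rng(6)
base = 35 + 10 * np.sin(np.linspace(0, 6 * np.pi, 720))
network = np.vstack([base + rng.normal(0, 2, 720) for _ in range(10)])
network[3] = rng.normal(35, 15, 720)                  # simulated malfunctioning unit
print(classify_sensors(network))
```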

18.
Sci Total Environ ; 728: 138844, 2020 Aug 01.
Article in English | MEDLINE | ID: mdl-32361361

ABSTRACT

Mie-scattering lidar can capture the vertical distribution of aerosols, and highly quantitative lidar data could be coupled with a chemical transport model (CTM). We therefore develop a data quality assurance and control scheme for aerosol lidar (TRANSFER) that mainly comprises a Monte Carlo uncertainty analysis (MCA) and bilateral filtering (BF). The AErosol RObotic NETwork (AERONET) aerosol optical depth (AOD) is used as the ground truth to evaluate the validity of TRANSFER, and the result exhibits a sharp 41% (0.36) decrease in root mean square error (RMSE), indicating acceptable overall performance of TRANSFER. The largest removal of uncertainty occurs in the MCA step, with an RMSE of 0.08 km-1, followed by denoising (DN) with 50% of the MCA value in RMSE. BF can smooth interior data without destroying structural edges. The most noteworthy correction occurs in summer, with an RMSE of 0.15 km-1 and a Pearson correlation coefficient of 0.8, and the least correction occurs in winter, with values of 0.07 km-1 and 0.93, respectively. Overestimations of the raw data are mostly identified, and representative cases occur with weak southerly winds, low visibility, high relative humidity (RH), and high concentrations of both ground-level fine particulate matter (PM2.5) and ozone. Apart from long-term variations, the variation in a typical overestimated pollution episode, especially as represented by vertical profiles, shows the favorable performance of TRANSFER during stages of transport and local accumulation, as verified by backward trajectories. The few underestimation cases are mainly attributed to BF smoothing data that exhibit a sudden decrease. The main limitation of TRANSFER is the zigzag profiles found in a few cases with very small extinction coefficients. As a supplement to the aerosol lidar research community and an exploration under the complicated pollution conditions in China, TRANSFER can aid in the preprocessing of lidar-data-powered applications.
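
A self-contained sketch of the bilateral-filtering step: smooth a noisy extinction-coefficient profile while preserving sharp layer edges. The kernel widths and synthetic profile are illustrative, not TRANSFER's tuning.

```python
import numpy as np

def bilateral_filter_1d(profile: np.ndarray, sigma_s: float = 3.0, sigma_r: float = 0.05,
                        radius: int = 9) -> np.ndarray:
    smoothed = np.empty_like(profile, dtype=float)
    offsets = np.arange(-radius, radius + 1)
    spatial_w = np.exp(-(offsets ** 2) / (2 * sigma_s ** 2))
    for i in range(len(profile)):
        idx = np.clip(i + offsets, 0, len(profile) - 1)
        range_w = np.exp(-((profile[idx] - profile[i]) ** 2) / (2 * sigma_r ** 2))
        w = spatial_w * range_w
        smoothed[i] = np.sum(w * profile[idx]) / np.sum(w)
    return smoothed

# Toy profile [km-1]: an aerosol layer with a sharp top, plus noise.
z = np.linspace(0, 3, 300)
profile = np.where(z < 1.5, 0.4, 0.05) + np.random.default_rng(7).normal(0, 0.03, 300)
clean = bilateral_filter_1d(profile)
```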

19.
Sensors (Basel) ; 20(3)2020 Feb 10.
Article in English | MEDLINE | ID: mdl-32050581

ABSTRACT

Maritime surveillance videos provide crucial on-the-spot kinematic traffic information (traffic volume, ship speeds, headings, etc.) for various traffic participants (maritime regulation departments, ship crews, ship owners, etc.), which greatly benefits automated maritime situational awareness and maritime safety improvement. Conventional models rely heavily on visual ship features to track ships in maritime image sequences, which may result in arbitrary tracking oscillations. To address this issue, we propose an ensemble ship tracking framework combining a multi-view learning algorithm with a wavelet filter model. First, the proposed model samples ship candidates with a particle filter following the sequential importance sampling rule. Second, we propose a multi-view learning algorithm to obtain raw ship tracking results in two steps: extracting a group of distinct ship-contour-relevant features (i.e., Laplacian of Gaussian, local binary pattern, Gabor filter, histogram of oriented gradients, and Canny descriptors) and learning high-level intrinsic ship features by jointly exploiting the underlying relationships shared by each type of ship contour feature. Third, with the help of the wavelet filter, we perform a data quality control procedure to identify abnormal oscillations in the ship positions, which are further corrected to generate the final ship tracking results. We demonstrate the proposed ship tracker's performance on typical maritime traffic scenarios using four maritime surveillance videos.
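
A brief sketch of the final quality-control step: smooth a ship-position track with wavelet thresholding so abnormal oscillations are suppressed. It assumes the PyWavelets package is available; the wavelet, decomposition level, and threshold rule are illustrative, not the exact filter used in the paper.

```python
import numpy as np
import pywt

def wavelet_smooth(track: np.ndarray, wavelet: str = "db4", level: int = 4) -> np.ndarray:
    coeffs = pywt.wavedec(track, wavelet, level=level)
    # Universal threshold estimated from the finest detail coefficients.
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2 * np.log(len(track)))
    coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(track)]

t = np.linspace(0, 1, 512)
true_x = 100 * t                                               # steady ship motion along x [pixels]
raw_x = true_x + np.random.default_rng(8).normal(0, 3, 512)    # oscillating raw tracker output
corrected_x = wavelet_smooth(raw_x)
```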

20.
Methods Mol Biol ; 2117: 93-108, 2020.
Article in English | MEDLINE | ID: mdl-31960374

ABSTRACT

Chromatin organization and epigenetic marks play a critical role in stem cell pluripotency and differentiation. Chromatin digestion by micrococcal nuclease (MNase) followed by high-throughput sequencing (MNase-seq) is the most widely used genome-wide method for studying nucleosome organization, that is, the first level of DNA packaging into chromatin. Combined with chromatin immunoprecipitation (ChIP), MNase-ChIP-seq represents a high-resolution method for investigating both chromatin organization and the distribution of epigenetic marks and histone variants. The plot2DO package presented here is a flexible tool for evaluating the quality of MNase-seq and MNase-ChIP-seq data, and for visualizing the distribution of nucleosomes near the functional regions of the genome. The plot2DO package is open-source software, and it is freely available from https://github.com/rchereji/plot2DO under the MIT license.


Subject(s)
Computational Biology/methods , Nucleosomes/genetics , Nucleosomes/metabolism , Animals , Chromatin Immunoprecipitation , Computer Simulation , Epigenesis, Genetic , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis, DNA , Software