1.
Commun Biol ; 7(1): 553, 2024 May 09.
Article in English | MEDLINE | ID: mdl-38724695

ABSTRACT

For the last two decades, the amount of genomic data produced by scientific and medical applications has been growing at a rapid pace. To enable software solutions that analyze, process, and transmit these data in an efficient and interoperable way, ISO and IEC released the first version of the compression standard MPEG-G in 2019. However, non-proprietary implementations of the standard have so far not been openly available, limiting fair scientific assessment of the standard and therefore hindering its broad adoption. In this paper, we present Genie, to the best of our knowledge the first open-source encoder that compresses genomic data according to the MPEG-G standard. We demonstrate that Genie reaches state-of-the-art compression ratios while offering interoperability with any other standard-compliant decoder, independent of its manufacturer. Finally, the ISO/IEC ecosystem ensures the long-term sustainability and decodability of the compressed data through the ISO/IEC-supported reference decoder.


Subject(s)
Data Compression , Genomics , Software , Genomics/methods , Data Compression/methods , Humans
2.
BMC Bioinformatics ; 24(1): 121, 2023 Mar 28.
Article in English | MEDLINE | ID: mdl-36978010

ABSTRACT

BACKGROUND: In recent years, advances in high-throughput sequencing technologies have enabled the use of genomic information in many fields, such as precision medicine, oncology, and food quality control. The amount of genomic data being generated is growing rapidly and is expected to soon surpass the amount of video data. The majority of sequencing experiments, such as genome-wide association studies, have the goal of identifying variations in the gene sequence to better understand phenotypic variations. We present a novel approach for compressing gene sequence variations with random access capability: the Genomic Variant Codec (GVC). We use techniques such as binarization, joint row- and column-wise sorting of blocks of variations, as well as the image compression standard JBIG for efficient entropy coding. RESULTS: Our results show that GVC provides the best trade-off between compression and random access compared to the state of the art: it reduces the genotype information size from 758 GiB down to 890 MiB on the publicly available 1000 Genomes Project (phase 3) data, which is 21% less than the state of the art in random-access capable methods. CONCLUSIONS: By providing the best results in terms of combined random access and compression, GVC facilitates the efficient storage of large collections of gene sequence variations. In particular, the random access capability of GVC enables seamless remote data access and application integration. The software is open source and available at https://github.com/sXperfect/gvc/ .
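The compression pipeline described above (binarization into bit planes, joint row- and column-wise reordering, bi-level entropy coding) can be illustrated with a small sketch. The following Python snippet is not part of GVC; the toy genotype matrix, the greedy sorting criterion, and the transition count used as a stand-in for JBIG coding cost are all illustrative assumptions.

```python
import numpy as np

# Toy genotype matrix: rows = variant sites, columns = samples,
# entries 0/1/2 = number of alternate alleles (illustrative data only).
rng = np.random.default_rng(0)
gt = rng.choice([0, 1, 2], size=(8, 12), p=[0.7, 0.2, 0.1])

# Binarization: split the matrix into two bit planes so that each plane
# is a binary image suitable for a bi-level coder such as JBIG.
planes = [(gt >> b) & 1 for b in range(2)]

def transitions(m):
    """Number of horizontal 0/1 transitions; a rough proxy for how well
    a bi-level entropy coder will compress the plane."""
    return int(np.sum(m[:, 1:] != m[:, :-1]))

for b, plane in enumerate(planes):
    # Joint reordering: sort rows lexicographically, then order columns
    # by their column sums, to cluster similar values into larger blocks.
    row_order = np.lexsort(plane.T[::-1])
    sorted_plane = plane[row_order]
    col_order = np.argsort(sorted_plane.sum(axis=0))
    sorted_plane = sorted_plane[:, col_order]

    print(f"plane {b}: transitions before={transitions(plane)}, "
          f"after sorting={transitions(sorted_plane)}")
```

A real encoder would operate on large blocks of the genotype matrix and feed the reordered planes to an actual bi-level codec; the transition count here only hints at why reordering helps.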


Subject(s)
Data Compression , Data Compression/methods , Algorithms , Genome-Wide Association Study , Genomics/methods , Software , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods
3.
Folia Phoniatr Logop ; 75(1): 1-12, 2023.
Article in English | MEDLINE | ID: mdl-36209730

ABSTRACT

BACKGROUND: Language sample analysis (LSA) is invaluable for describing and understanding child language use and development, both for clinical purposes and for research. Digital tools supporting LSA are available, but many of the LSA steps have not been automated. Nevertheless, programs that include automatic speech recognition (ASR), the first step of LSA, have already reached mainstream applicability. SUMMARY: To better understand the complexity, challenges, and future needs of automatic LSA from a technological perspective, including the tasks of transcribing, annotating, and analysing natural child language samples, this article adopts a multidisciplinary view. Requirements of a fully automated LSA process are characterized, features of existing LSA software tools are compared, and prior work from the disciplines of information science and computational linguistics is reviewed. KEY MESSAGES: Existing tools vary in the extent of automation they provide across the LSA process. Advances in machine learning for speech recognition and processing have the potential to facilitate LSA, but the specifics of child speech and language, as well as the lack of child data, complicate software design. A transdisciplinary approach is recommended as feasible to support future software development for LSA.


Subject(s)
Language Disorders , Language , Child , Humans , Speech , Software , Child Language
4.
Sci Rep ; 11(1): 16047, 2021 08 06.
Article in English | MEDLINE | ID: mdl-34362967

ABSTRACT

High-throughput root phenotyping in soil has become an indispensable quantitative tool for assessing the effects of climatic factors and molecular perturbation on plant root morphology, development, and function. To efficiently analyse large numbers of structurally complex soil-root images, advanced methods for automated image segmentation are required. Due to the often unavoidable overlap between the intensities of foreground and background regions, simple thresholding methods are generally not suitable for the segmentation of root regions. Higher-level cognitive models such as convolutional neural networks (CNNs) provide capabilities for segmenting roots from heterogeneous and noisy background structures; however, they require a representative set of manually segmented (ground truth) images. Here, we present a GUI-based tool for fully automated quantitative analysis of root images using a pre-trained CNN model, which relies on an extension of the U-Net architecture. The developed CNN framework was designed to efficiently segment root structures of different size, shape, and optical contrast on low-budget hardware systems. The CNN model was trained on a set of 6465 masks derived from 182 manually segmented near-infrared (NIR) maize root images. Our experimental results show that the proposed approach achieves a Dice coefficient of 0.87 and outperforms existing tools such as SegRoot (Dice coefficient 0.67), not only on NIR images but also on other imaging modalities and plant species, such as barley and arabidopsis soil-root images from LED-rhizotron and UV imaging systems, respectively. In summary, the developed software framework enables users to efficiently analyse soil-root images in an automated manner (i.e., without manual interaction with data or parameter tuning), providing quantitative plant scientists with a powerful analytical tool.
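As a rough illustration of the evaluation metric mentioned above, the following sketch computes the Dice coefficient between two binary masks in plain numpy. The masks are synthetic stand-ins; the authors' actual evaluation code and data are not reproduced here.

```python
import numpy as np

def dice_coefficient(pred, truth, eps=1e-8):
    """Dice similarity between two binary masks: 2 * |P & T| / (|P| + |T|)."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return (2.0 * intersection + eps) / (pred.sum() + truth.sum() + eps)

# Illustrative masks standing in for a predicted and a ground-truth root
# segmentation (real masks would come from the CNN and manual labelling).
rng = np.random.default_rng(1)
truth = rng.random((128, 128)) > 0.9
pred = truth.copy()
pred[:16] = rng.random((16, 128)) > 0.9   # perturb a stripe of the prediction

print(f"Dice = {dice_coefficient(pred, truth):.3f}")
```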

5.
J Acoust Soc Am ; 149(2): 1324, 2021 02.
Article in English | MEDLINE | ID: mdl-33639785

ABSTRACT

Wireless transmission of audio from or to the signal processors of cochlear implants (CIs) is used to improve speech understanding of CI users. This transmission requires wireless communication to exchange the necessary data. Because CIs are battery-powered devices, their energy consumption needs to be kept low, making bitrate reduction of the audio signals necessary. Additionally, low latency is essential. Previously, a codec for the electrodograms of CIs, called the Electrocodec, was proposed. In this work, a subjective evaluation of the Electrocodec is presented, which investigates the impact of the codec on monaural speech performance. The Electrocodec is evaluated with respect to speech recognition and quality in ten CI users and compared to the Opus audio codec. Opus is a low-latency, low-bitrate audio codec that best met the CI requirements in terms of bandwidth, bitrate, and latency. While achieving speech recognition and quality equal to Opus, the Electrocodec reaches lower mean bitrates. Actual rates vary from 24.3 up to 53.5 kbit/s, depending on the codec settings. While Opus has a minimum algorithmic latency of 5 ms, the Electrocodec has an algorithmic latency of 0 ms.


Subject(s)
Cochlear Implantation , Cochlear Implants , Speech Perception , Electric Stimulation , Noise
6.
IEEE Trans Pattern Anal Mach Intell ; 43(10): 3540-3554, 2021 10.
Article in English | MEDLINE | ID: mdl-32305893

ABSTRACT

In this work, we present a new versatile 3D multilinear statistical face model, based on a tensor factorisation of 3D face scans, that decomposes the shapes into person and expression subspaces. Investigation of the expression subspace reveals an inherent low-dimensional substructure and, further, a star-shaped structure. This is due to two novel findings. (1) Increasing the strength of one emotion approximately forms a linear trajectory in the subspace. (2) All these trajectories intersect at a single point, not at the neutral expression as assumed by almost all prior works, but at an apathetic expression. We utilise these structural findings by reparameterising the expression subspace by the fourth-order moment tensor centred at the point of apathy. We propose a 3D face reconstruction method from single or multiple 2D projections by assuming an uncalibrated projective camera model. The non-linearity caused by the perspective projection can be neatly included into the model. The proposed algorithm separates person and expression subspaces convincingly, and enables flexible, natural modelling of expressions for a wide variety of human faces. Applying the method to independent faces showed that morphing between different persons and expressions can be performed without strong deformations.
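To give a concrete sense of what a multilinear (tensor) factorisation into person and expression subspaces looks like, here is a minimal truncated higher-order SVD sketch on synthetic data. It is not the authors' model; the tensor dimensions, the truncation ranks, and the random data are assumptions for illustration only.

```python
import numpy as np

# Synthetic stand-in for a registered 3D face data set:
# shape (persons, expressions, flattened vertex coordinates). Real models
# are built from aligned scans; random data only illustrates the algebra.
rng = np.random.default_rng(2)
P, E, V = 10, 7, 300
data = rng.standard_normal((P, E, V))

def unfold(tensor, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

# Truncated higher-order SVD: orthonormal person and expression subspaces.
k_person, k_expr = 5, 3
U_person = np.linalg.svd(unfold(data, 0), full_matrices=False)[0][:, :k_person]
U_expr = np.linalg.svd(unfold(data, 1), full_matrices=False)[0][:, :k_expr]

# Core tensor obtained by projecting onto the two subspaces
# (the vertex mode is left uncompressed, as in multilinear face models).
core = np.einsum('pev,pi,ej->ijv', data, U_person, U_expr)
print('person subspace:', U_person.shape,
      'expression subspace:', U_expr.shape, 'core:', core.shape)
```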


Subject(s)
Algorithms , Pattern Recognition, Automated , Artificial Intelligence , Face , Humans , Models, Statistical
7.
Bioinformatics ; 36(7): 2275-2277, 2020 04 01.
Article in English | MEDLINE | ID: mdl-31830243

ABSTRACT

MOTIVATION: In an effort to provide a response to the ever-expanding generation of genomic data, the International Organization for Standardization (ISO) is designing a new solution for the representation, compression and management of genomic sequencing data: the Moving Picture Experts Group (MPEG)-G standard. This paper discusses the first implementation of an MPEG-G compliant entropy codec: GABAC. GABAC combines proven coding technologies, such as context-adaptive binary arithmetic coding, binarization schemes and transformations, into a straightforward solution for the compression of sequencing data. RESULTS: We demonstrate that GABAC outperforms well-established (entropy) codecs in a significant set of cases and thus can serve as an extension for existing genomic compression solutions, such as CRAM. AVAILABILITY AND IMPLEMENTATION: The GABAC library is written in C++. We also provide a command line application which exercises all features provided by the library. GABAC can be downloaded from https://github.com/mitogen/gabac. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
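As a hedged illustration of the kind of building blocks GABAC combines, the sketch below implements truncated-unary binarization and a simple adaptive binary probability model that accumulates ideal (Shannon) code lengths instead of running a full arithmetic coder. It is not the GABAC API; the symbol stream, the cmax value, and the single-context model are illustrative assumptions.

```python
import math

def truncated_unary(value, cmax):
    """Truncated-unary binarization as used by CABAC-style coders:
    `value` ones followed by a terminating zero (omitted at cmax)."""
    bits = [1] * value
    if value < cmax:
        bits.append(0)
    return bits

class AdaptiveBinaryModel:
    """Minimal adaptive probability model: the ideal code length of each
    bin is accumulated instead of driving a real arithmetic coder."""
    def __init__(self):
        self.ones, self.total = 1, 2      # Laplace-smoothed counts

    def code(self, bit):
        p_one = self.ones / self.total
        p = p_one if bit == 1 else 1.0 - p_one
        self.ones += bit
        self.total += 1
        return -math.log2(p)

# Illustrative symbol stream standing in for a transformed descriptor stream.
symbols = [0, 0, 1, 0, 2, 0, 0, 1, 0, 0, 3, 0]
model = AdaptiveBinaryModel()
bits = sum(model.code(b) for s in symbols for b in truncated_unary(s, cmax=7))
print(f"estimated size: {bits:.1f} bits for {len(symbols)} symbols")
```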


Subject(s)
Data Compression , High-Throughput Nucleotide Sequencing , Genome , Genomics , Software
8.
Annu Int Conf IEEE Eng Med Biol Soc ; 2019: 4168-4172, 2019 Jul.
Article in English | MEDLINE | ID: mdl-31946788

ABSTRACT

Binaural sound coding strategies can improve speech intelligibility for cochlear implant (CI) users. These strategies require signal transmission between two CIs. As power consumption needs to be kept low in CIs, efficient coding or bit-rate reduction of the signals is necessary. In this work, it is proposed to code the electrical signals or excitation patterns (EP) of the CI instead of the audio signals captured by the microphones. For this purpose, we designed a differential pulse code modulation (DPCM)-based codec with zero algorithmic delay to code the EP of the advanced combination encoder (ACE) sound coding strategy for CIs. Our EP codec was compared to the G.722 64 kbit/s audio codec using the signal-to-noise ratio (SNR) as an objective measure of quality. On two audio sets, the mean SNR was 0.5 to 13.9 dB higher when coding the EP with the proposed method, while achieving a mean bit-rate between 34.1 and 40.3 kbit/s.
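A minimal sketch of the two ingredients named above, a sample-by-sample DPCM loop with zero algorithmic delay and the SNR used as the objective quality measure, is given below. It is not the proposed EP codec; the synthetic envelope and the quantizer step size are assumptions for illustration.

```python
import numpy as np

def dpcm_encode_decode(signal, step):
    """Minimal first-order DPCM: quantize the difference between each sample
    and the previous reconstruction, one sample at a time (zero delay)."""
    recon = np.zeros_like(signal, dtype=float)
    prev = 0.0
    indices = []
    for i, x in enumerate(signal):
        q = int(round((x - prev) / step))   # quantized prediction residual
        indices.append(q)
        prev = prev + q * step              # decoder-side reconstruction
        recon[i] = prev
    return indices, recon

def snr_db(original, reconstructed):
    """Signal-to-noise ratio in dB, the objective quality measure used above."""
    noise = original - reconstructed
    return 10 * np.log10(np.sum(original**2) / np.sum(noise**2))

# Illustrative excitation-pattern-like envelope (real EPs come from the
# ACE strategy; this is only synthetic data for the sketch).
t = np.linspace(0, 1, 1000)
ep = np.abs(np.sin(2 * np.pi * 5 * t)) * np.exp(-t)

_, recon = dpcm_encode_decode(ep, step=0.02)
print(f"SNR = {snr_db(ep, recon):.1f} dB")
```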


Subject(s)
Cochlear Implantation , Cochlear Implants , Speech Perception , Electric Stimulation , Humans , Speech Intelligibility
9.
J Comput Biol ; 25(10): 1141-1151, 2018 10.
Article in English | MEDLINE | ID: mdl-30059248

ABSTRACT

Previous studies on quality score compression can be classified into two main lines: lossy schemes and lossless schemes. Lossy schemes enable a better management of computational resources. Thus, in practice, and for preliminary analyses, bioinformaticians may prefer to work with a lossy quality score representation. However, the original quality scores might be required for a deeper analysis of the data. Hence, it might be necessary to keep them, which requires lossless compression in addition to lossy compression. We developed a space-efficient hierarchical representation of quality scores, QScomp, which allows users to work with lossy quality scores in routine analysis without sacrificing the ability to recover the original quality scores when further investigation is required. Each quality score is represented by a tuple through a novel decomposition. The first and second dimensions of these tuples are compressed separately, such that the first-level compression is a lossy scheme. The compressed information of the second dimension allows users to recover the original quality scores. Experiments on real data reveal that downstream analysis with the lossy part, which spends only 0.49 bits per quality score on average, shows competitive performance, and that the total space usage, including the compressed second dimension, is comparable to that of competing lossless schemes.
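The hierarchical tuple idea, a coarse first dimension that is usable on its own plus a refinement that restores the exact score, can be sketched as follows. The modulo-based decomposition and the bin width are illustrative assumptions, not QScomp's actual novel decomposition.

```python
import numpy as np

# Illustrative Phred quality scores (0-41, as in typical Illumina data).
rng = np.random.default_rng(3)
qual = rng.integers(0, 42, size=20)

BIN = 8  # coarseness of the first (lossy) dimension; an illustrative choice

# Two-dimensional representation: a coarse value that can be used on its
# own for routine analysis, plus a refinement that restores the original
# score exactly when combined with the coarse part.
coarse = qual // BIN          # first dimension, lossy on its own
refine = qual % BIN           # second dimension, kept losslessly

lossy_view = coarse * BIN + BIN // 2   # what a lossy-only user would see
restored = coarse * BIN + refine       # coarse + refinement = original

assert np.array_equal(restored, qual)
print("original:  ", qual[:10])
print("lossy view:", lossy_view[:10])
```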


Subject(s)
Algorithms , Data Compression/methods , Data Compression/standards , Genetic Variation , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/standards , Genomics , Humans
10.
Bioinformatics ; 34(10): 1650-1658, 2018 05 15.
Article in English | MEDLINE | ID: mdl-29186284

ABSTRACT

Motivation: Recent advancements in high-throughput sequencing technology have led to a rapid growth of genomic data. Several lossless compression schemes have been proposed for the coding of such data present in the form of raw FASTQ files and aligned SAM/BAM files. However, due to their high entropy, losslessly compressed quality values account for about 80% of the size of compressed files. For the quality values, we present a novel lossy compression scheme named CALQ. By controlling the coarseness of quality value quantization with a statistical genotyping model, we minimize the impact of the introduced distortion on downstream analyses. Results: We analyze the performance of several lossy compressors for quality values in terms of the trade-off between the achieved compressed size (in bits per quality value) and the precision and recall achieved after running a variant calling pipeline over sequencing data of the well-known NA12878 individual. By compressing and reconstructing quality values with CALQ, we observe a better average variant calling performance than with the original data, while achieving a size reduction of about one order of magnitude with respect to state-of-the-art lossless compressors. Furthermore, we show that CALQ performs as well as or better than state-of-the-art lossy compressors in terms of variant calling recall and precision for most of the analyzed datasets. Availability and implementation: CALQ is written in C++ and can be downloaded from https://github.com/voges/calq. Contact: voges@tnt.uni-hannover.de or mhernaez@illinois.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
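The core idea, quantizing quality values more coarsely where the genotype is already certain and more finely where it is not, can be sketched as follows. This is not CALQ's statistical genotyping model; the confidence values, the threshold, and the step sizes are placeholders for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
qual = rng.integers(0, 42, size=30)             # Phred-like quality values

# Stand-in for a per-position genotype confidence in [0, 1]; CALQ derives
# this from a statistical genotyping model, here it is just random data.
confidence = rng.random(30)

def quantize(q, conf, fine_step=2, coarse_step=8):
    """Quantize a quality value coarsely where the genotype is already
    certain and finely where it is uncertain (the general idea above)."""
    step = coarse_step if conf > 0.8 else fine_step
    return int(round(q / step)) * step

recon = np.array([quantize(q, c) for q, c in zip(qual, confidence)])
print("mean absolute distortion:", np.mean(np.abs(recon - qual)))
```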


Subject(s)
Data Compression/methods , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Software , Algorithms , Humans , Models, Statistical , Sequence Alignment , Sequence Analysis, DNA/methods
11.
Nat Methods ; 13(12): 1005-1008, 2016 Dec.
Article in English | MEDLINE | ID: mdl-27776113

ABSTRACT

High-throughput sequencing (HTS) data are commonly stored as raw sequencing reads in FASTQ format or as reads mapped to a reference, in SAM format, both with large memory footprints. Worldwide growth of HTS data has prompted the development of compression methods that aim to significantly reduce HTS data size. Here we report on a benchmarking study of available compression methods on a comprehensive set of HTS data using an automated framework.
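A minimal version of such a benchmark, measuring compression ratio and run time for several general-purpose codecs on FASTQ-like data, might look as follows. The synthetic record and the choice of standard-library codecs are stand-ins for the actual HTS data sets and specialized tools evaluated in the study.

```python
import bz2, gzip, lzma, time

# Synthetic FASTQ-like record block standing in for real HTS data; a real
# benchmark would stream actual FASTQ/SAM files through each tool.
record = b"@read1\nACGTACGTACGTTTGACCA\n+\nIIIIIIHHHHGGGGFFFFEE\n"
data = record * 50_000

codecs = {"gzip": gzip.compress, "bz2": bz2.compress, "lzma": lzma.compress}
for name, compress in codecs.items():
    start = time.perf_counter()
    size = len(compress(data))
    elapsed = time.perf_counter() - start
    print(f"{name:5s}: ratio {len(data) / size:5.2f}, {elapsed:.2f} s")
```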


Subject(s)
Computational Biology/methods , Data Compression/methods , High-Throughput Nucleotide Sequencing/methods , Animals , Cacao/genetics , Drosophila melanogaster/genetics , Escherichia coli/genetics , Humans , Pseudomonas aeruginosa/genetics
12.
Sensors (Basel) ; 13(1): 137-51, 2012 Dec 21.
Article in English | MEDLINE | ID: mdl-23344378

ABSTRACT

An important part of computed tomography is the calculation of a three-dimensional reconstruction of an object from a series of X-ray images. Unfortunately, some applications do not provide sufficient X-ray images. The reconstructed objects then no longer truly represent the original, and inside the volumes the accuracy seems to vary unpredictably. In this paper, we introduce a novel method to evaluate any reconstruction, voxel by voxel. The evaluation is based on a sophisticated probabilistic handling of the measured X-rays, as well as the inclusion of a priori knowledge about the materials that the examined object consists of. For each voxel, the proposed method outputs a numerical value that represents the probability that a predefined material exists at the position of the voxel at the time of the X-ray acquisition. Such a probabilistic quality measure has been lacking so far. In our experiment, falsely reconstructed areas are detected by their low probability, while a high probability predominates in exactly reconstructed areas. Receiver operating characteristics not only confirm the reliability of our quality measure but also demonstrate that existing methods are less suitable for evaluating a reconstruction.
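The ROC-based validation of a per-voxel probability measure can be sketched with synthetic scores and labels. The rank-sum AUC below is a generic implementation, and the simulated probabilities are not related to the proposed method.

```python
import numpy as np

# Synthetic per-voxel probabilities of material presence, plus ground truth
# indicating whether the voxel really contains the material (illustrative).
rng = np.random.default_rng(5)
truth = rng.random(5000) > 0.5
prob = np.clip(truth * 0.6 + rng.random(5000) * 0.5, 0, 1)

def roc_auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) formulation."""
    order = np.argsort(scores)
    ranks = np.empty_like(order, dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

print(f"AUC = {roc_auc(prob, truth):.3f}")
```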


Subject(s)
Imaging, Three-Dimensional/methods , Radiographic Image Interpretation, Computer-Assisted/methods , Tomography, X-Ray Computed/methods , Algorithms , Models, Theoretical , Probability , Quality Control , ROC Curve , Reproducibility of Results , X-Rays
13.
Med Image Comput Comput Assist Interv ; 13(Pt 3): 327-34, 2010.
Article in English | MEDLINE | ID: mdl-20879416

ABSTRACT

Breathing motion complicates many image-guided interventions in the thorax or upper abdomen. However, prior knowledge provided by a statistical breathing model can reduce the uncertainties of organ location. In this paper, a prediction framework for statistical motion modeling is presented, and different representations of the dynamic data for building a motion model of the lungs are investigated. Evaluation carried out on 4D-CT data sets of 10 patients showed that a displacement vector-based representation can reduce most of the respiratory motion, with a prediction error of about 2 mm when the diaphragm motion is assumed to be known.
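A minimal sketch of a surrogate-driven prediction, fitting a linear model from diaphragm displacement to landmark displacement vectors, is shown below. The synthetic phases, landmarks, and the simple least-squares model are assumptions; the paper's statistical motion model is more elaborate.

```python
import numpy as np

# Synthetic training data: diaphragm displacement (surrogate signal) and the
# corresponding displacement vectors of a few lung landmarks across breathing
# phases. Real data would come from registered 4D-CT volumes.
rng = np.random.default_rng(6)
phases = 10
diaphragm = np.linspace(0.0, 20.0, phases)                 # mm, surrogate
coeff = rng.standard_normal(3 * 5)                         # 5 landmarks, 3D
landmarks = np.outer(diaphragm, coeff) + rng.normal(0, 0.5, (phases, 15))

# Least-squares fit of a per-coordinate linear model: displacement = a*d + b.
A = np.column_stack([diaphragm, np.ones(phases)])
params, *_ = np.linalg.lstsq(A, landmarks, rcond=None)

# Predict the landmark displacements for an unseen diaphragm position.
d_new = 12.5
pred = np.array([d_new, 1.0]) @ params
residual = landmarks - A @ params
print("prediction shape:", pred.shape,
      "training RMS error (mm): %.2f" % np.sqrt(np.mean(residual**2)))
```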


Subject(s)
Artifacts , Imaging, Three-Dimensional/methods , Radiographic Image Interpretation, Computer-Assisted/methods , Radiography, Thoracic/methods , Respiratory Mechanics , Respiratory-Gated Imaging Techniques/methods , Tomography, X-Ray Computed/methods , Computer Simulation , Data Interpretation, Statistical , Humans , Models, Biological , Models, Statistical , Movement , Reproducibility of Results , Sensitivity and Specificity
14.
Med Image Anal ; 13(3): 471-82, 2009 Jun.
Article in English | MEDLINE | ID: mdl-19285910

ABSTRACT

For many orthopaedic, neurological, and oncological applications, an exact segmentation of the vertebral column, including an identification of each vertebra, is essential. However, although bony structures show high contrast in CT images, the segmentation and labelling of individual vertebrae is challenging. In this paper, we present a comprehensive solution for automatically detecting, identifying, and segmenting vertebrae in CT images. A framework has been designed that takes an arbitrary CT image, e.g., head-neck, thorax, lumbar, or whole spine, as input and provides a segmentation in the form of labelled triangulated vertebra surface models. In order to obtain a robust processing chain, profound prior knowledge is applied through the use of various kinds of models covering shape, gradient, and appearance information. The framework has been tested on 64 CT images, including cases with pathologies. In 56 cases, it was successfully applied, resulting in a final mean point-to-surface segmentation error of 1.12 +/- 1.04 mm. One key issue is a reliable identification of vertebrae. For a single vertebra, we achieve an identification success rate of more than 70%. Increasing the number of available vertebrae leads to an increase in the identification rate, reaching 100% if 16 or more vertebrae are shown in the image.
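The point-to-surface error reported above can be approximated, for illustration, by nearest-neighbour distances between point samples. The sketch below uses synthetic point clouds and a brute-force search rather than the triangulated surfaces of the actual evaluation.

```python
import numpy as np

def mean_point_to_surface(points, surface_points):
    """Approximate point-to-surface error by the distance from each segmented
    point to the nearest ground-truth surface sample (brute force; real
    evaluations measure distance to the triangulated surface itself)."""
    diff = points[:, None, :] - surface_points[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)
    return dists.mean(), dists.std()

# Illustrative point clouds standing in for a segmented vertebra surface and
# its ground-truth counterpart.
rng = np.random.default_rng(7)
truth_surface = rng.standard_normal((2000, 3)) * 10
segmented = truth_surface[:500] + rng.normal(0, 1.0, (500, 3))

mean_err, std_err = mean_point_to_surface(segmented, truth_surface)
print(f"point-to-surface error: {mean_err:.2f} +/- {std_err:.2f} mm")
```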


Subject(s)
Algorithms , Artificial Intelligence , Pattern Recognition, Automated/methods , Radiographic Image Interpretation, Computer-Assisted/methods , Spine/diagnostic imaging , Tomography, X-Ray Computed/methods , Humans , Radiographic Image Enhancement/methods , Reproducibility of Results , Sensitivity and Specificity
15.
Med Image Comput Comput Assist Interv ; 11(Pt 1): 227-34, 2008.
Article in English | MEDLINE | ID: mdl-18979752

ABSTRACT

Including prior shape in the form of anatomical models is a well-known approach for improving segmentation results in medical images. Currently, most approaches focus on the modeling and segmentation of individual objects. In the case of object constellations, a simultaneous segmentation of the ensemble that uses not only prior knowledge of individual shapes but also additional information about spatial relations between the objects is often beneficial. In this paper, we present a two-scale framework for the modeling and segmentation of the spine as an example of an object constellation. The global spine shape is expressed as a sequence of local vertebra coordinate systems, while individual vertebrae are modeled as triangulated surface meshes. Adaptation is performed by attracting the model to image features while restricting the attraction to a previously learned shape. With the developed approach, we obtained a segmentation accuracy of 1.0 mm on average for ten thoracic CT images, improving on former results.


Subject(s)
Algorithms , Artificial Intelligence , Pattern Recognition, Automated/methods , Radiographic Image Enhancement/methods , Radiographic Image Interpretation, Computer-Assisted/methods , Spine/diagnostic imaging , Subtraction Technique , Tomography, X-Ray Computed/methods , Reproducibility of Results , Sensitivity and Specificity
16.
Med Image Comput Comput Assist Interv ; 10(Pt 2): 195-202, 2007.
Article in English | MEDLINE | ID: mdl-18044569

ABSTRACT

We present a new model-based approach for automated labeling and segmentation of the rib cage in chest CT scans. A mean rib cage model, including a complete vertebral column, was created from 29 data sets. We developed a ray-search-based procedure for rib cage detection and initial model pose. After positioning, the model was adapted to 18 unseen CT data sets. In 16 of the 18 data sets, detection, labeling, and segmentation succeeded with a mean segmentation error of less than 1.3 mm between the true and detected object surface. In one case the rib cage detection failed; in another, the automated labeling.


Subject(s)
Algorithms , Artificial Intelligence , Imaging, Three-Dimensional/methods , Pattern Recognition, Automated/methods , Radiographic Image Interpretation, Computer-Assisted/methods , Ribs/diagnostic imaging , Tomography, X-Ray Computed/methods , Computer Simulation , Humans , Models, Biological , Radiographic Image Enhancement/methods , Reproducibility of Results , Sensitivity and Specificity