Pesquisa | Portal Regional da BVS (teste)

1.

Genie: the first open-source ISO/IEC encoder for genomic data.

Müntefering, Fabian; Adhisantoso, Yeremia Gunawan; Chandak, Shubham; Ostermann, Jörn; Hernaez, Mikel; Voges, Jan.

Commun Biol ; 7(1): 553, 2024 May 09.

Artigo em Inglês | MEDLINE | ID: mdl-38724695

RESUMO

For the last two decades, the amount of genomic data produced by scientific and medical applications has been growing at a rapid pace. To enable software solutions that analyze, process, and transmit these data in an efficient and interoperable way, ISO and IEC released the first version of the compression standard MPEG-G in 2019. However, non-proprietary implementations of the standard are not openly available so far, limiting fair scientific assessment of the standard and, therefore, hindering its broad adoption. In this paper, we present Genie, to the best of our knowledge the first open-source encoder that compresses genomic data according to the MPEG-G standard. We demonstrate that Genie reaches state-of-the-art compression ratios while offering interoperability with any other standard-compliant decoder independent from its manufacturer. Finally, the ISO/IEC ecosystem ensures the long-term sustainability and decodability of the compressed data through the ISO/IEC-supported reference decoder.

Assuntos

Compressão de Dados , Genômica , Software , Genômica/métodos , Compressão de Dados/métodos , Humanos

2.

GVC: efficient random access compression for gene sequence variations.

Adhisantoso, Yeremia Gunawan; Voges, Jan; Rohlfing, Christian; Tunev, Viktor; Ohm, Jens-Rainer; Ostermann, Jörn.

BMC Bioinformatics ; 24(1): 121, 2023 Mar 28.

Artigo em Inglês | MEDLINE | ID: mdl-36978010

RESUMO

BACKGROUND: In recent years, advances in high-throughput sequencing technologies have enabled the use of genomic information in many fields, such as precision medicine, oncology, and food quality control. The amount of genomic data being generated is growing rapidly and is expected to soon surpass the amount of video data. The majority of sequencing experiments, such as genome-wide association studies, have the goal of identifying variations in the gene sequence to better understand phenotypic variations. We present a novel approach for compressing gene sequence variations with random access capability: the Genomic Variant Codec (GVC). We use techniques such as binarization, joint row- and column-wise sorting of blocks of variations, as well as the image compression standard JBIG for efficient entropy coding. RESULTS: Our results show that GVC provides the best trade-off between compression and random access compared to the state of the art: it reduces the genotype information size from 758 GiB down to 890 MiB on the publicly available 1000 Genomes Project (phase 3) data, which is 21% less than the state of the art in random-access capable methods. CONCLUSIONS: By providing the best results in terms of combined random access and compression, GVC facilitates the efficient storage of large collections of gene sequence variations. In particular, the random access capability of GVC enables seamless remote data access and application integration. The software is open source and available at https://github.com/sXperfect/gvc/ .

Assuntos

Compressão de Dados , Compressão de Dados/métodos , Algoritmos , Estudo de Associação Genômica Ampla , Genômica/métodos , Software , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Análise de Sequência de DNA/métodos

3.

Multidisciplinary Perspectives on Automatic Analysis of Children's Language Samples: Where Do We Go from Here?

Lüdtke, Ulrike; Bornman, Juan; de Wet, Febe; Heid, Ulrich; Ostermann, Jörn; Rumberg, Lars; van der Linde, Jeannie; Ehlert, Hanna.

Folia Phoniatr Logop ; 75(1): 1-12, 2023.

Artigo em Inglês | MEDLINE | ID: mdl-36209730

RESUMO

BACKGROUND: Language sample analysis (LSA) is invaluable to describe and understand child language use and development for clinical purposes and research. Digital tools supporting LSA are available, but many of the LSA steps have not been automated. Nevertheless, programs that include automatic speech recognition (ASR), the first step of LSA, have already reached mainstream applicability. SUMMARY: To better understand the complexity, challenges, and future needs of automatic LSA from a technological perspective, including the tasks of transcribing, annotating, and analysing natural child language samples, this article takes on a multidisciplinary view. Requirements of a fully automated LSA process are characterized, features of existing LSA software tools compared, and prior work from the disciplines of information science and computational linguistics reviewed. KEY MESSAGES: Existing tools vary in their extent of automation provided across the process of LSA. Advances in machine learning for speech recognition and processing have potential to facilitate LSA, but the specifics of child speech and language as well as the lack of child data complicate software design. A transdisciplinary approach is recommended as feasible to support future software development for LSA.

Assuntos

Transtornos da Linguagem , Idioma , Criança , Humanos , Fala , Software , Linguagem Infantil

4.

Fully-automated root image analysis (faRIA).

Narisetti, Narendra; Henke, Michael; Seiler, Christiane; Junker, Astrid; Ostermann, Jörn; Altmann, Thomas; Gladilin, Evgeny.

Sci Rep ; 11(1): 16047, 2021 08 06.

Artigo em Inglês | MEDLINE | ID: mdl-34362967

RESUMO

High-throughput root phenotyping in the soil became an indispensable quantitative tool for the assessment of effects of climatic factors and molecular perturbation on plant root morphology, development and function. To efficiently analyse a large amount of structurally complex soil-root images advanced methods for automated image segmentation are required. Due to often unavoidable overlap between the intensity of fore- and background regions simple thresholding methods are, generally, not suitable for the segmentation of root regions. Higher-level cognitive models such as convolutional neural networks (CNN) provide capabilities for segmenting roots from heterogeneous and noisy background structures, however, they require a representative set of manually segmented (ground truth) images. Here, we present a GUI-based tool for fully automated quantitative analysis of root images using a pre-trained CNN model, which relies on an extension of the U-Net architecture. The developed CNN framework was designed to efficiently segment root structures of different size, shape and optical contrast using low budget hardware systems. The CNN model was trained on a set of 6465 masks derived from 182 manually segmented near-infrared (NIR) maize root images. Our experimental results show that the proposed approach achieves a Dice coefficient of 0.87 and outperforms existing tools (e.g., SegRoot) with Dice coefficient of 0.67 by application not only to NIR but also to other imaging modalities and plant species such as barley and arabidopsis soil-root images from LED-rhizotron and UV imaging systems, respectively. In summary, the developed software framework enables users to efficiently analyse soil-root images in an automated manner (i.e. without manual interaction with data and/or parameter tuning) providing quantitative plant scientists with a powerful analytical tool.

5.

A subjective and objective evaluation of a codec for the electrical stimulation patterns of cochlear implants.

Hinrichs, Reemt; Gajecki, Tom; Ostermann, Jörn; Nogueira, Waldo.

J Acoust Soc Am ; 149(2): 1324, 2021 02.

Artigo em Inglês | MEDLINE | ID: mdl-33639785

RESUMO

Wireless transmission of audio from or to signal processors of cochlear implants (CIs) is used to improve speech understanding of CI users. This transmission requires wireless communication to exchange the necessary data. Because they are battery powered devices, energy consumption needs to be kept low in CIs, therefore making bitrate reduction of the audio signals necessary. Additionally, low latency is essential. Previously, a codec for the electrodograms of CIs, called the Electrocodec, was proposed. In this work, a subjective evaluation of the Electrocodec is presented, which investigates the impact of the codec on monaural speech performance. The Electrocodec is evaluated with respect to speech recognition and quality in ten CI users and compared to the Opus audio codec. Opus is a low latency and low bitrate audio codec that best met the CI requirements in terms of bandwidth, bitrate, and latency. Achieving equal speech recognition and quality as Opus, the Electrocodec achieves lower mean bitrates than Opus. Actual rates vary from 24.3 up to 53.5 kbit/s, depending on the codec settings. While Opus has a minimum algorithmic latency of 5 ms, the Electrocodec has an algorithmic latency of 0 ms.

Assuntos

Implante Coclear , Implantes Cocleares , Percepção da Fala , Estimulação Elétrica , Ruído

6.

Multilinear Modelling of Faces and Expressions.

Grasshof, Stella; Ackermann, Hanno; Brandt, Sami Sebastian; Ostermann, Jorn.

IEEE Trans Pattern Anal Mach Intell ; 43(10): 3540-3554, 2021 10.

Artigo em Inglês | MEDLINE | ID: mdl-32305893

RESUMO

In this work, we present a new versatile 3D multilinear statistical face model, based on a tensor factorisation of 3D face scans, that decomposes the shapes into person and expression subspaces. Investigation of the expression subspace reveals an inherent low-dimensional substructure, and further, a star-shaped structure. This is due to two novel findings. (1) Increasing the strength of one emotion approximately forms a linear trajectory in the subspace. (2) All these trajectories intersect at a single point - not at the neutral expression as assumed by almost all prior works-but at an apathetic expression. We utilise these structural findings by reparameterising the expression subspace by the fourth-order moment tensor centred at the point of apathy. We propose a 3D face reconstruction method from single or multiple 2D projections by assuming an uncalibrated projective camera model. The non-linearity caused by the perspective projection can be neatly included into the model. The proposed algorithm separates person and expression subspaces convincingly, and enables flexible, natural modelling of expressions for a wide variety of human faces. Applying the method on independent faces showed that morphing between different persons and expressions can be performed without strong deformations.

Assuntos

Algoritmos , Reconhecimento Automatizado de Padrão , Inteligência Artificial , Face , Humanos , Modelos Estatísticos

7.

GABAC: an arithmetic coding solution for genomic data.

Voges, Jan; Paridaens, Tom; Müntefering, Fabian; Mainzer, Liudmila S; Bliss, Brian; Yang, Mingyu; Ochoa, Idoia; Fostier, Jan; Ostermann, Jörn; Hernaez, Mikel.

Bioinformatics ; 36(7): 2275-2277, 2020 04 01.

Artigo em Inglês | MEDLINE | ID: mdl-31830243

RESUMO

MOTIVATION: In an effort to provide a response to the ever-expanding generation of genomic data, the International Organization for Standardization (ISO) is designing a new solution for the representation, compression and management of genomic sequencing data: the Moving Picture Experts Group (MPEG)-G standard. This paper discusses the first implementation of an MPEG-G compliant entropy codec: GABAC. GABAC combines proven coding technologies, such as context-adaptive binary arithmetic coding, binarization schemes and transformations, into a straightforward solution for the compression of sequencing data. RESULTS: We demonstrate that GABAC outperforms well-established (entropy) codecs in a significant set of cases and thus can serve as an extension for existing genomic compression solutions, such as CRAM. AVAILABILITY AND IMPLEMENTATION: The GABAC library is written in C++. We also provide a command line application which exercises all features provided by the library. GABAC can be downloaded from https://github.com/mitogen/gabac. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Assuntos

Compressão de Dados , Sequenciamento de Nucleotídeos em Larga Escala , Genoma , Genômica , Software

8.

Coding of Electrical Stimulation Patterns for Binaural Sound Coding Strategies for Cochlear Implants.

Hinrichs, Reemt; Gajecki, Tom; Ostermann, Jorn; Nogueira, Waldo.

Annu Int Conf IEEE Eng Med Biol Soc ; 2019: 4168-4172, 2019 Jul.

Artigo em Inglês | MEDLINE | ID: mdl-31946788

RESUMO

Binaural sound coding strategies can improve speech intelligibility for cochlear implant (CI) users. These require a signal transmission between two CIs. As power consumption needs to be kept low in CIs, efficient coding or bit-rate reduction of the signals is necessary. In this work, it is proposed to code the electrical signals or excitation patterns (EP) of the CI instead of the audio signals captured by the microphones. For this purpose we designed a differential pulse code modulation based codec with zero algorithmic delay to code the EP of the advanced combination encoder (ACE) sound coding strategy for CIs. Our EP codec was compared to the G.722 64 kbit/s audio codec using the signal-to-noise ratio (SNR) as objective measure of quality. On two audio-sets the mean SNR was 0.5 to 13.9 dB higher when coding the EP with the proposed coding method while achieving a mean bit-rate between 34.1 and 40.3 kbit/s.

Assuntos

Implante Coclear , Implantes Cocleares , Percepção da Fala , Estimulação Elétrica , Humanos , Inteligibilidade da Fala

9.

A Two-Level Scheme for Quality Score Compression.

Voges, Jan; Fotouhi, Ali; Ostermann, Jörn; Külekci, Muhammed Oguzhan.

J Comput Biol ; 25(10): 1141-1151, 2018 10.

Artigo em Inglês | MEDLINE | ID: mdl-30059248

RESUMO

Previous studies on quality score compression can be classified into two main lines: lossy schemes and lossless schemes. Lossy schemes enable a better management of computational resources. Thus, in practice, and for preliminary analyses, bioinformaticians may prefer to work with a lossy quality score representation. However, the original quality scores might be required for a deeper analysis of the data. Hence, it might be necessary to keep them; in addition to lossy compression this requires lossless compression as well. We developed a space-efficient hierarchical representation of quality scores, QScomp, which allows the users to work with lossy quality scores in routine analysis, without sacrificing the capability of reaching the original quality scores when further investigations are required. Each quality score is represented by a tuple through a novel decomposition. The first and second dimensions of these tuples are separately compressed such that the first-level compression is a lossy scheme. The compressed information of the second dimension allows the users to extract the original quality scores. Experiments on real data reveal that the downstream analysis with the lossy part-spending only 0.49 bits per quality score on average-shows a competitive performance, and that the total space usage with the inclusion of the compressed second dimension is comparable to the performance of competing lossless schemes.

Assuntos

Algoritmos , Compressão de Dados/métodos , Compressão de Dados/normas , Variação Genética , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Análise de Sequência de DNA/normas , Genômica , Humanos

10.

CALQ: compression of quality values of aligned sequencing data.

Voges, Jan; Ostermann, Jörn; Hernaez, Mikel.

Bioinformatics ; 34(10): 1650-1658, 2018 05 15.

Artigo em Inglês | MEDLINE | ID: mdl-29186284

RESUMO

Motivation: Recent advancements in high-throughput sequencing technology have led to a rapid growth of genomic data. Several lossless compression schemes have been proposed for the coding of such data present in the form of raw FASTQ files and aligned SAM/BAM files. However, due to their high entropy, losslessly compressed quality values account for about 80% of the size of compressed files. For the quality values, we present a novel lossy compression scheme named CALQ. By controlling the coarseness of quality value quantization with a statistical genotyping model, we minimize the impact of the introduced distortion on downstream analyses. Results: We analyze the performance of several lossy compressors for quality values in terms of trade-off between the achieved compressed size (in bits per quality value) and the Precision and Recall achieved after running a variant calling pipeline over sequencing data of the well-known NA12878 individual. By compressing and reconstructing quality values with CALQ, we observe a better average variant calling performance than with the original data while achieving a size reduction of about one order of magnitude with respect to the state-of-the-art lossless compressors. Furthermore, we show that CALQ performs as good as or better than the state-of-the-art lossy compressors in terms of variant calling Recall and Precision for most of the analyzed datasets. Availability and implementation: CALQ is written in C ++ and can be downloaded from https://github.com/voges/calq. Contact: voges@tnt.uni-hannover.de or mhernaez@illinois.edu. Supplementary information: Supplementary data are available at Bioinformatics online.

Assuntos

Compressão de Dados/métodos , Genômica/métodos , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Software , Algoritmos , Humanos , Modelos Estatísticos , Alinhamento de Sequência , Análise de Sequência de DNA/métodos

11.

Comparison of high-throughput sequencing data compression tools.

Numanagic, Ibrahim; Bonfield, James K; Hach, Faraz; Voges, Jan; Ostermann, Jörn; Alberti, Claudio; Mattavelli, Marco; Sahinalp, S Cenk.

Nat Methods ; 13(12): 1005-1008, 2016 Dec.

Artigo em Inglês | MEDLINE | ID: mdl-27776113

RESUMO

High-throughput sequencing (HTS) data are commonly stored as raw sequencing reads in FASTQ format or as reads mapped to a reference, in SAM format, both with large memory footprints. Worldwide growth of HTS data has prompted the development of compression methods that aim to significantly reduce HTS data size. Here we report on a benchmarking study of available compression methods on a comprehensive set of HTS data using an automated framework.

Assuntos

Biologia Computacional/métodos , Compressão de Dados/métodos , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Animais , Cacau/genética , Drosophila melanogaster/genética , Escherichia coli/genética , Humanos , Pseudomonas aeruginosa/genética

12.

Probabilistic evaluation of three-dimensional reconstructions from X-ray images spanning a limited angle.

Frost, Anja; Renners, Eike; Hötter, Michael; Ostermann, Jörn.

Sensors (Basel) ; 13(1): 137-51, 2012 Dec 21.

Artigo em Inglês | MEDLINE | ID: mdl-23344378

RESUMO

An important part of computed tomography is the calculation of a three-dimensional reconstruction of an object from series of X-ray images. Unfortunately, some applications do not provide sufficient X-ray images. Then, the reconstructed objects no longer truly represent the original. Inside of the volumes, the accuracy seems to vary unpredictably. In this paper, we introduce a novel method to evaluate any reconstruction, voxel by voxel. The evaluation is based on a sophisticated probabilistic handling of the measured X-rays, as well as the inclusion of a priori knowledge about the materials that the object receiving the X-ray examination consists of. For each voxel, the proposed method outputs a numerical value that represents the probability of existence of a predefined material at the position of the voxel while doing X-ray. Such a probabilistic quality measure was lacking so far. In our experiment, false reconstructed areas get detected by their low probability. In exact reconstructed areas, a high probability predominates. Receiver Operating Characteristics not only confirm the reliability of our quality measure but also demonstrate that existing methods are less suitable for evaluating a reconstruction.

Assuntos

Imageamento Tridimensional/métodos , Interpretação de Imagem Radiográfica Assistida por Computador/métodos , Tomografia Computadorizada por Raios X/métodos , Algoritmos , Modelos Teóricos , Probabilidade , Controle de Qualidade , Curva ROC , Reprodutibilidade dos Testes , Raios X

13.

Prediction framework for statistical respiratory motion modeling.

Klinder, Tobias; Lorenz, Cristian; Ostermann, Jörn.

Med Image Comput Comput Assist Interv ; 13(Pt 3): 327-34, 2010.

Artigo em Inglês | MEDLINE | ID: mdl-20879416

RESUMO

Breathing motion complicates many image-guided interventions working on the thorax or upper abdomen. However, prior knowledge provided by a statistical breathing model, can reduce the uncertainties of organ location. In this paper, a prediction framework for statistical motion modeling is presented and different representations of the dynamic data for motion model building of the lungs are investigated. Evaluation carried out on 4D-CT data sets of 10 patients showed that a displacement vector-based representation can reduce most of the respiratory motion with a prediction error of about 2 mm, when assuming the diaphragm motion to be known.

Assuntos

Artefatos , Imageamento Tridimensional/métodos , Interpretação de Imagem Radiográfica Assistida por Computador/métodos , Radiografia Torácica/métodos , Mecânica Respiratória , Técnicas de Imagem de Sincronização Respiratória/métodos , Tomografia Computadorizada por Raios X/métodos , Simulação por Computador , Interpretação Estatística de Dados , Humanos , Modelos Biológicos , Modelos Estatísticos , Movimento , Reprodutibilidade dos Testes , Sensibilidade e Especificidade

14.

Automated model-based vertebra detection, identification, and segmentation in CT images.

Klinder, Tobias; Ostermann, Jörn; Ehm, Matthias; Franz, Astrid; Kneser, Reinhard; Lorenz, Cristian.

Med Image Anal ; 13(3): 471-82, 2009 Jun.

Artigo em Inglês | MEDLINE | ID: mdl-19285910

RESUMO

For many orthopaedic, neurological, and oncological applications, an exact segmentation of the vertebral column including an identification of each vertebra is essential. However, although bony structures show high contrast in CT images, the segmentation and labelling of individual vertebrae is challenging. In this paper, we present a comprehensive solution for automatically detecting, identifying, and segmenting vertebrae in CT images. A framework has been designed that takes an arbitrary CT image, e.g., head-neck, thorax, lumbar, or whole spine, as input and provides a segmentation in form of labelled triangulated vertebra surface models. In order to obtain a robust processing chain, profound prior knowledge is applied through the use of various kinds of models covering shape, gradient, and appearance information. The framework has been tested on 64 CT images even including pathologies. In 56 cases, it was successfully applied resulting in a final mean point-to-surface segmentation error of 1.12+/-1.04mm. One key issue is a reliable identification of vertebrae. For a single vertebra, we achieve an identification success of more than 70%. Increasing the number of available vertebrae leads to an increase in the identification rate reaching 100% if 16 or more vertebrae are shown in the image.

Assuntos

Algoritmos , Inteligência Artificial , Reconhecimento Automatizado de Padrão/métodos , Interpretação de Imagem Radiográfica Assistida por Computador/métodos , Coluna Vertebral/diagnóstico por imagem , Tomografia Computadorizada por Raios X/métodos , Humanos , Intensificação de Imagem Radiográfica/métodos , Reprodutibilidade dos Testes , Sensibilidade e Especificidade

15.

Spine segmentation using articulated shape models.

Klinder, Tobias; Wolz, Robin; Lorenz, Cristian; Franz, Astrid; Ostermann, Jörn.

Med Image Comput Comput Assist Interv ; 11(Pt 1): 227-34, 2008.

Artigo em Inglês | MEDLINE | ID: mdl-18979752

RESUMO

Including prior shape in the form of anatomical models is a well-known approach for improving segmentation results in medical images. Currently, most approaches are focused on the modeling and segmentation of individual objects. In case of object constellations, a simultaneous segmentation of the ensemble that uses not only prior knowledge of individual shapes but also additional information about spatial relations between the objects is often beneficial. In this paper, we present a two-scale framework for the modeling and segmentation of the spine as an example for object constellations. The global spine shape is expressed as a consecution of local vertebra coordinate systems while individual vertebrae are modeled as triangulated surface meshes. Adaptation is performed by attracting the model to image features but restricting the attraction to a former learned shape. With the developed approach, we obtained a segmentation accuracy of 1.0 mm in average for ten thoracic CT images improving former results.

Assuntos

Algoritmos , Inteligência Artificial , Reconhecimento Automatizado de Padrão/métodos , Intensificação de Imagem Radiográfica/métodos , Interpretação de Imagem Radiográfica Assistida por Computador/métodos , Coluna Vertebral/diagnóstico por imagem , Técnica de Subtração , Tomografia Computadorizada por Raios X/métodos , Reprodutibilidade dos Testes , Sensibilidade e Especificidade

16.

Automated model-based rib cage segmentation and labeling in CT images.

Klinder, Tobias; Lorenz, Cristian; von Berg, Jens; Dries, Sebastian P M; Bülow, Thomas; Ostermann, Jörn.

Med Image Comput Comput Assist Interv ; 10(Pt 2): 195-202, 2007.

Artigo em Inglês | MEDLINE | ID: mdl-18044569

RESUMO

We present a new model-based approach for an automated labeling and segmentation of the rib cage in chest CT scans. A mean rib cage model including a complete vertebral column is created out of 29 data sets. We developed a ray search based procedure for rib cage detection and initial model pose. After positioning the model, it was adapted to 18 unseen CT data. In 16 out of 18 data sets, detection, labeling, and segmentation succeeded with a mean segmentation error of less than 1.3 mm between true and detected object surface. In one case the rib cage detection failed, in another case the automated labeling.

Assuntos

Algoritmos , Inteligência Artificial , Imageamento Tridimensional/métodos , Reconhecimento Automatizado de Padrão/métodos , Interpretação de Imagem Radiográfica Assistida por Computador/métodos , Costelas/diagnóstico por imagem , Tomografia Computadorizada por Raios X/métodos , Simulação por Computador , Humanos , Modelos Biológicos , Intensificação de Imagem Radiográfica/métodos , Reprodutibilidade dos Testes , Sensibilidade e Especificidade

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA