Search | BVS Regional Portal (test)

1.

Making Mathematical Research Data FAIR: Pathways to Improved Data Sharing.

Conrad, Tim O F; Ferrer, Eloi; Mietchen, Daniel; Pusch, Larissa; Stegmüller, Johannes; Schubotz, Moritz.

Sci Data ; 11(1): 676, 2024 Jun 22.

Article in English | MEDLINE | ID: mdl-38909043

ABSTRACT

The sharing and citation of research data is becoming increasingly recognized as an essential building block in scientific research across various fields and disciplines. Sharing research data allows other researchers to reproduce results, replicate findings, and build on them. Ultimately, this will foster faster cycles in knowledge generation. Some disciplines, such as astronomy or bioinformatics, already have a long history of sharing data; many others do not. The current landscape of available systems for sharing research data is diverse. In this article, we conduct a detailed analysis of existing web-based systems, specifically focusing on mathematical research data.

2.

Do the Math: Making Mathematics in Wikipedia Computable.

Greiner-Petter, André; Schubotz, Moritz; Breitinger, Corinna; Scharpf, Philipp; Aizawa, Akiko; Gipp, Bela.

IEEE Trans Pattern Anal Mach Intell ; 45(4): 4384-4395, 2023 Apr.

Article in English | MEDLINE | ID: mdl-35914035

ABSTRACT

Wikipedia combines the power of AI solutions and human reviewers to safeguard article quality. Quality control objectives include detecting malicious edits, fixing typos, and spotting inconsistent formatting. However, no automated quality control mechanisms currently exist for mathematical formulae. Spell checkers are widely used to highlight textual errors, yet no equivalent tool exists to detect algebraically incorrect formulae. Our paper addresses this shortcoming by making mathematical formulae computable. We present a method that (1) gathers the semantic information from the context surrounding each mathematical formula, (2) provides access to that information in a graph-structured dependency hierarchy, and (3) performs automatic plausibility checks on equations. We evaluate the performance of our approach on 6,337 mathematical expressions contained in 104 Wikipedia articles on the topic of orthogonal polynomials and special functions. Our system, [Formula: see text], verified 358 out of 1,516 equations as error-free. [Formula: see text] successfully translated 27% of the mathematical expressions and outperformed existing translation approaches by 16%. Additionally, [Formula: see text] achieved an F1 score of 0.495 for annotating mathematical expressions with relevant textual descriptions, which is a significant step towards advancing the searchability, readability, and accessibility of mathematical formulae in Wikipedia. A prototype of [Formula: see text] and the semantically enhanced Wikipedia articles are available at: https://tpami.wmflabs.org.
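The idea of an automatic plausibility check can be illustrated, under strong simplifying assumptions, by evaluating both sides of an equation numerically at a few test points and flagging any disagreement. This is a minimal sketch, not the authors' system (which works on expressions translated from Wikipedia's LaTeX); the function `plausible`, the chosen test points, and the tolerance are hypothetical choices for illustration, and the example equation is the Chebyshev polynomial identity T_2(x) = 2x^2 - 1 = cos(2 arccos x).

```python
import math
from typing import Callable, Iterable


def plausible(lhs: Callable[[float], float],
              rhs: Callable[[float], float],
              points: Iterable[float],
              tol: float = 1e-9) -> bool:
    """Return True if lhs(x) and rhs(x) agree within tol at every test point.

    Passing is evidence, not proof: a numeric check can flag an equation
    as implausible, but it cannot verify the equation symbolically.
    """
    for x in points:
        if not math.isclose(lhs(x), rhs(x), rel_tol=tol, abs_tol=tol):
            return False
    return True


# Hypothetical test points inside [-1, 1], where acos is defined.
test_points = [-0.9, -0.5, 0.0, 0.3, 0.75]


def t2_explicit(x: float) -> float:
    # Chebyshev polynomial T_2 in explicit polynomial form.
    return 2 * x**2 - 1


def t2_trig(x: float) -> float:
    # The same polynomial in trigonometric form: T_n(x) = cos(n * acos(x)).
    return math.cos(2 * math.acos(x))


def t2_typo(x: float) -> float:
    # A deliberately wrong version with a sign error, as a typo might introduce.
    return 2 * x**2 + 1


print(plausible(t2_explicit, t2_trig, test_points))  # True
print(plausible(t2_typo, t2_trig, test_points))      # False
```

Random or carefully chosen points are a cheap first filter; an equation that passes still needs symbolic checking to be called verified.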

3.

Caching and Reproducibility: Making Data Science Experiments Faster and FAIRer.

Schubotz, Moritz; Satpute, Ankit; Greiner-Petter, André; Aizawa, Akiko; Gipp, Bela.

Front Res Metr Anal ; 7: 861944, 2022.

Article in English | MEDLINE | ID: mdl-35531060

ABSTRACT

Small to medium-scale data science experiments often rely on research software developed ad hoc by individual scientists or small teams. Often there is no time to make the research software fast, reusable, and open access. The consequence is twofold. First, subsequent researchers must spend significant work hours building upon the proposed hypotheses or experimental framework; in the worst case, others cannot reproduce the experiment and reuse the findings for subsequent research. Second, if the ad-hoc research software fails during an often long-running, computationally expensive experiment, the overall effort to iteratively improve the software and rerun the experiments creates significant time pressure on the researchers. We suggest making caching an integral part of the research software development process, even before the first line of code is written. This article outlines caching recommendations for developing research software in data science projects. Our recommendations offer a way to circumvent common problems such as dependence on proprietary software and execution speed. At the same time, caching contributes to the reproducibility of experiments in the open science workflow. With respect to the four FAIR guiding principles, i.e., Findability, Accessibility, Interoperability, and Reusability, we expect that incorporating the proposed recommendations into research software development will make the data related to that software FAIRer for both machines and humans. We demonstrate the usefulness of some of the proposed recommendations on our recently completed research software project in mathematical information retrieval.
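As a rough illustration of caching intermediate results so that a crashed, long-running experiment can resume without recomputing finished steps, here is a minimal Python sketch. It assumes results are JSON-serializable (note that tuples come back as lists); the decorator name `disk_cache`, the SHA-256 keying scheme, and the stand-in function `expensive_step` are hypothetical choices, not the recommendations or code from the article.

```python
import functools
import hashlib
import json
import os
import tempfile


def disk_cache(cache_dir: str):
    """Cache a function's JSON-serializable results as files in cache_dir.

    The cache key is a SHA-256 hash of the function name and its arguments,
    so rerunning a crashed experiment reuses every result stored before the
    failure instead of recomputing it.
    """
    os.makedirs(cache_dir, exist_ok=True)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            key_src = json.dumps([func.__name__, args, kwargs], sort_keys=True)
            key = hashlib.sha256(key_src.encode("utf-8")).hexdigest()
            path = os.path.join(cache_dir, key + ".json")
            if os.path.exists(path):  # cache hit: load the stored result
                with open(path, "r", encoding="utf-8") as fh:
                    return json.load(fh)
            result = func(*args, **kwargs)  # cache miss: compute the result
            tmp_path = path + ".tmp"
            with open(tmp_path, "w", encoding="utf-8") as fh:
                json.dump(result, fh)
            # Rename into place so an interrupted write never leaves a
            # half-written entry (atomic on POSIX within one filesystem).
            os.replace(tmp_path, path)
            return result

        return wrapper

    return decorator


# Usage: a deliberately simple stand-in for an expensive experiment step.
calls = []


@disk_cache(tempfile.mkdtemp())
def expensive_step(n: int) -> int:
    calls.append(n)  # records each real (uncached) execution
    return sum(i * i for i in range(n))


first = expensive_step(1000)
second = expensive_step(1000)  # served from the cache file
print(first, second, len(calls))  # 332833500 332833500 1
```

Keying on the function name alone means a changed implementation reuses stale entries; a fuller version might also hash the code version or input data files.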

4.

Semantic preserving bijective mappings for expressions involving special functions between computer algebra systems and document preparation systems.

Greiner-Petter, André; Schubotz, Moritz; Cohl, Howard S; Gipp, Bela.

ASLIB J Inf Manag ; 71(3), 2019.

Article in English | MEDLINE | ID: mdl-34603731

ABSTRACT

PURPOSE: Modern mathematicians and scientists in math-related disciplines often use Document Preparation Systems (DPS) to write mathematical expressions and Computer Algebra Systems (CAS) to compute with them. Usually, they translate the expressions manually between DPS and CAS, a process that is time-consuming and error-prone. The purpose of this paper is to automate this translation. This paper uses Maple and Mathematica as the CAS, and LaTeX as the DPS. DESIGN/METHODOLOGY/APPROACH: Bruce Miller at the National Institute of Standards and Technology (NIST) developed a collection of special LaTeX macros that link mathematical symbols to their definitions in the NIST Digital Library of Mathematical Functions (DLMF). The authors use these macros to perform rule-based translations between the formulae in the DLMF and CAS. Moreover, the authors develop software to ease the creation of new rules and to discover inconsistencies. FINDINGS: The authors created 396 mappings and successfully translated 58.8 percent of DLMF formulae (2,405 expressions) between Maple and DLMF. For a significant percentage, the special function definitions in Maple and the DLMF differ: an atomic symbol in one system maps to a composite expression in the other. The translator was also successfully used for automatic verification of mathematical online compendia and CAS. The evaluation techniques discovered two errors in the DLMF and one defect in Maple. ORIGINALITY/VALUE: This paper introduces the first translation tool for special functions between LaTeX and CAS. The approach improves on error-prone manual translations and can be used to verify mathematical online compendia and CAS.
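The core of a rule-based translation from semantic LaTeX macros to CAS syntax can be sketched as a lookup table plus argument reordering. This is a minimal sketch under stated assumptions, not the authors' translator: the simplified macro form `\Name{param}...@{variable}`, the entries in the hypothetical `RULES` table, and their argument orders are illustrative, and the output uses Maple-style function-call syntax.

```python
import re

# Hypothetical rule table: macro name -> (CAS function name, argument order).
# Arguments are numbered in the order they appear in the LaTeX macro:
# the {param} groups first, then the variable after "@".
RULES = {
    "BesselJ": ("BesselJ", [0, 1]),        # \BesselJ{n}@{z}       -> BesselJ(n, z)
    "JacobiP": ("JacobiP", [2, 0, 1, 3]),  # \JacobiP{a}{b}{n}@{x} -> JacobiP(n, a, b, x)
}

# One macro: \Name, one or more {param} groups, then "@" and one {variable}.
# Nested braces are not supported in this sketch.
MACRO = re.compile(r"\\([A-Za-z]+)((?:\{[^{}]*\})+)@\{([^{}]*)\}")


def translate(latex: str) -> str:
    """Replace every known semantic macro in latex with a CAS function call."""

    def rewrite(match: re.Match) -> str:
        name, params, var = match.group(1), match.group(2), match.group(3)
        if name not in RULES:
            raise KeyError(f"no translation rule for \\{name}")
        cas_name, order = RULES[name]
        args = re.findall(r"\{([^{}]*)\}", params) + [var]
        return f"{cas_name}({', '.join(args[i] for i in order)})"

    return MACRO.sub(rewrite, latex)


print(translate(r"\BesselJ{n}@{z}"))            # BesselJ(n, z)
print(translate(r"\JacobiP{a}{b}{n}@{x} + 1"))  # JacobiP(n, a, b, x) + 1
```

A table like this only covers one-to-one renaming and reordering; the definition mismatches reported in the findings, where an atomic symbol maps to a composite expression, would need richer rules than simple argument permutation.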

5.

Improving the representation and conversion of mathematical formulae by considering their textual context.

Schubotz, Moritz; Greiner-Petter, André; Scharpf, Philipp; Meuschke, Norman; Cohl, Howard S; Gipp, Bela.

TUGboat (Provid) ; 39(3), 2018 May.

Article in English | MEDLINE | ID: mdl-34584342

ABSTRACT

Mathematical formulae represent complex semantic information in a concise form. Especially in Science, Technology, Engineering, and Mathematics, mathematical formulae are crucial for communicating information, e.g., in scientific papers, and for performing computations using computer algebra systems. Enabling computers to access the information encoded in mathematical formulae requires machine-readable formats that can represent both the presentation and the content, i.e., the semantics, of formulae. Exchanging such information between systems additionally requires conversion methods for mathematical representation formats. We analyze how the semantic enrichment of formulae improves the format conversion process and show that considering the textual context of formulae reduces the error rate of such conversions. Our main contributions are: (1) providing an openly available benchmark dataset for the mathematical format conversion task, consisting of a newly created test collection, an extensive, manually curated gold standard, and task-specific evaluation metrics; (2) performing a quantitative evaluation of state-of-the-art tools for mathematical format conversions; (3) presenting a new approach that considers the textual context of formulae to reduce the error rate for mathematical format conversions. Our benchmark dataset facilitates future research on mathematical format conversions as well as research on many problems in mathematical information retrieval. Because we annotated and linked all components of formulae, e.g., identifiers, operators, and other entities, to Wikidata entries, the gold standard can, for instance, be used to train methods for formula concept discovery and recognition. Such methods can then be applied to improve mathematical information retrieval systems, e.g., for semantic formula search, recommendation of mathematical content, or detection of mathematical plagiarism.
