1.
Z Erziehwiss ; : 1-39, 2023 Jun 12.
Article in German | MEDLINE | ID: mdl-37359181

ABSTRACT

In light of the Covid-19-related school lockdowns in Germany in 2020, schools, families and students faced the major challenge of continuing instruction at home. This paper examines parents' expectations that their children will experience school-related problems within the next six months as a result of the lockdown-induced homeschooling. For our explorative analysis, we choose a nonlinear regression approach. In the course of this, we introduce nonlinear models and highlight their added value compared to methods commonly used in empirical educational research. For the analysis, we combine data from the National Educational Panel Study (NEPS) with additional data sources such as the COVID-19-Dashboard of the Robert-Koch-Institut (RKI). Our results show that parental expectations of future school problems were particularly prevalent among parents whose children had low reading competencies and low diligence as an aspect of school effort. In addition, we find a relationship between a lower occupational status (ISEI) and higher parental expectations of school-related problems. Furthermore, parents' short-term and long-term concerns about Covid-19 show a positive association with these expectations, making school problems more likely in the eyes of the parents. The purpose of this paper, in addition to applying and explaining nonlinear models for the first time in empirical educational research, is to analyze expectations regarding problems of homeschooling in the first lockdown from the parents' perspective and to explore variables that influence these parental expectations.

2.
Comput Stat ; 38(2): 647-674, 2023.
Article in English | MEDLINE | ID: mdl-37223721

ABSTRACT

Topic models are a useful and popular method to find latent topics of documents. However, the short and sparse texts in social media micro-blogs such as Twitter are challenging for the most commonly used Latent Dirichlet Allocation (LDA) topic model. We compare the performance of the standard LDA topic model with the Gibbs Sampler Dirichlet Multinomial Model (GSDMM) and the Gamma Poisson Mixture Model (GPM), which are specifically designed for sparse data. To compare the performance of the three models, we propose the simulation of pseudo-documents as a novel evaluation method. In a case study with short and sparse text, the models are evaluated on tweets filtered by keywords relating to the Covid-19 pandemic. We find that standard coherence scores that are often used for the evaluation of topic models perform poorly as an evaluation metric. The results of our simulation-based approach suggest that the GSDMM and GPM topic models may generate better topics than the standard LDA model.
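The abstract does not spell out how the pseudo-documents are simulated. As a rough illustration only, the sketch below draws short documents from a Dirichlet multinomial mixture, the one-topic-per-document scheme that GSDMM assumes, so that every simulated document carries a known topic label against which a fitted model could be scored. The function names, the vocabulary and all parameter values are assumptions made for this example, not taken from the paper.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet distribution via normalised gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_short_documents(n_docs, n_topics, vocab, doc_len, beta=0.1, seed=0):
    """Simulate short documents in which each document belongs to exactly one topic.

    Returns the documents (lists of words) and the true topic label of each one.
    All arguments are illustrative assumptions, not settings from the paper.
    """
    rng = random.Random(seed)
    # Mixture weights over topics and one word distribution per topic.
    topic_weights = sample_dirichlet([1.0] * n_topics, rng)
    topic_word = [sample_dirichlet([beta] * len(vocab), rng) for _ in range(n_topics)]

    docs, labels = [], []
    for _ in range(n_docs):
        # One topic per document, then every word is drawn from that topic.
        k = rng.choices(range(n_topics), weights=topic_weights)[0]
        words = rng.choices(vocab, weights=topic_word[k], k=doc_len)
        docs.append(words)
        labels.append(k)
    return docs, labels


# Hypothetical toy vocabulary, chosen only to make the example readable.
vocab = ["mask", "vaccine", "lockdown", "school", "vote", "ballot", "travel", "test"]
docs, labels = simulate_short_documents(n_docs=50, n_topics=3, vocab=vocab, doc_len=8)
```

Because the true labels are known, topics recovered by LDA, GSDMM or GPM on such data can be compared with the ground truth rather than judged by coherence scores alone.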

3.
J Appl Stat ; 50(3): 574-591, 2023.
Article in English | MEDLINE | ID: mdl-36819086

ABSTRACT

Unsupervised document classification for imbalanced data sets poses a major challenge. To obtain accurate classification results, training data sets are often created manually by humans, which requires expert knowledge, time and money. Depending on the imbalance of the data set, this approach also either requires human labelling of all of the data or fails to adequately recognize underrepresented categories. We propose an integration of web scraping, one-class Support Vector Machines (SVM) and Latent Dirichlet Allocation (LDA) topic modelling as a multi-step classification rule that circumvents manual labelling. Unsupervised one-class document classification with the integration of out-of-domain training data is achieved, and more than 80% of the target data is correctly classified. The proposed method thus even outperforms common machine learning classifiers and is validated on multiple data sets.

4.
Int J Data Sci Anal ; : 1-21, 2022 May 06.
Article in English | MEDLINE | ID: mdl-35542313

ABSTRACT

Conspiracy theories have seen a rise in popularity in recent years. Spreading quickly through social media, their disruptive effect can lead to a biased public view on policy decisions and events. We present a novel approach for LDA pre-processing called Iterative Filtering to study such phenomena based on Twitter data. In combination with Hashtag Pooling as an additional pre-processing step, we are able to achieve a coherent framing of the discussion and topics of interest, despite the inherent noisiness and sparseness of Twitter data. Our novel approach enables researchers to gain detailed insights into discourses of interest on Twitter, allowing them to iteratively identify tweets that are related to an investigated topic of interest. As an application, we study the dynamics of conspiracy-related topics on US Twitter during the last four months of 2020, which were dominated by the US-Presidential Elections and Covid-19. We monitor the public discourse in the USA with geo-spatial Twitter data to identify conspiracy-related content by estimating Latent Dirichlet Allocation (LDA) Topic Models. We find that in this period, usual conspiracy-related topics played a marginal role in comparison with dominating topics, such as the US-Presidential Elections or the general discussions about Covid-19. The main conspiracy theories in this period were the ones linked to "Election Fraud" and the "Covid-19-hoax." Conspiracy-related keywords tended to appear together with Trump-related words and words related to his presidential campaign.
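Hashtag pooling is commonly understood as merging all tweets that share a hashtag into one longer pseudo-document before fitting LDA, which eases the sparseness of single tweets. The sketch below shows that general idea under simple assumptions (case-insensitive hashtags, a tweet with several hashtags joins several pools). The function name and the example tweets are made up for illustration, and the authors' exact pooling rules and their Iterative Filtering step are not reproduced here.

```python
import re
from collections import defaultdict

# Matches a hashtag and captures the tag text without the leading '#'.
HASHTAG = re.compile(r"#(\w+)")


def pool_by_hashtag(tweets):
    """Merge tweets sharing a hashtag into one pseudo-document per hashtag.

    Returns a dict mapping each lower-cased hashtag to its pooled text, plus
    the list of tweets that contain no hashtag and were therefore not pooled.
    """
    pools = defaultdict(list)
    unpooled = []
    for text in tweets:
        tags = {tag.lower() for tag in HASHTAG.findall(text)}
        if not tags:
            unpooled.append(text)
        for tag in sorted(tags):
            pools[tag].append(text)
    documents = {tag: " ".join(texts) for tag, texts in pools.items()}
    return documents, unpooled


# Made-up example tweets, not data from the study.
tweets = [
    "Counting continues #Election2020",
    "Long lines at the polls #election2020 #vote",
    "Stay home this weekend #covid19",
    "No hashtag here",
]
documents, unpooled = pool_by_hashtag(tweets)
```

The pooled texts in `documents` would then be tokenised and passed to the topic model in place of the individual tweets.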

5.
Int J Biostat ; 17(2): 317-329, 2021 Jan 13.
Article in English | MEDLINE | ID: mdl-34826371

ABSTRACT

Gradient boosting from the field of statistical learning is widely known as a powerful framework for estimation and selection of predictor effects in various regression models by adapting concepts from classification theory. Current boosting approaches also offer methods accounting for random effects and thus enable prediction of mixed models for longitudinal and clustered data. However, these approaches include several flaws resulting in unbalanced effect selection with falsely induced shrinkage and a low convergence rate on the one hand and biased estimates of the random effects on the other hand. We therefore propose a new boosting algorithm which explicitly accounts for the random structure by excluding it from the selection procedure, properly correcting the random effects estimates and in addition providing likelihood-based estimation of the random effects variance structure. The new algorithm offers an organic and unbiased fitting approach, which is shown via simulations and data examples.
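To make the terms "effect selection" and "shrinkage" concrete, the sketch below implements plain component-wise L2 gradient boosting for a linear model with fixed effects only. It is not the algorithm proposed in the paper: random effects and their variance structure are left out entirely. The function name, the step length `nu`, the number of steps and the toy data are assumptions for this example, and the predictors are assumed to be centred.

```python
def componentwise_l2_boost(X, y, n_steps=200, nu=0.1):
    """Component-wise L2 boosting for a linear model (fixed effects only).

    In every step each predictor is fitted on its own to the current
    residuals by least squares; only the best-fitting predictor is updated,
    and only by a fraction nu of its estimate. Predictors that are never
    selected keep a coefficient of exactly zero.
    """
    n, p = len(X), len(X[0])
    intercept = sum(y) / n
    coef = [0.0] * p
    resid = [yi - intercept for yi in y]
    columns = [[row[j] for row in X] for j in range(p)]

    for _ in range(n_steps):
        best_j, best_b, best_rss = None, 0.0, float("inf")
        for j, col in enumerate(columns):
            sxx = sum(x * x for x in col)
            if sxx == 0.0:
                continue  # a constant-zero column cannot explain anything
            b = sum(x * r for x, r in zip(col, resid)) / sxx
            rss = sum((r - b * x) ** 2 for x, r in zip(col, resid))
            if rss < best_rss:
                best_j, best_b, best_rss = j, b, rss
        if best_j is None:
            break
        # Weak update: move only a fraction nu towards the least-squares fit.
        coef[best_j] += nu * best_b
        resid = [r - nu * best_b * x for x, r in zip(columns[best_j], resid)]
    return intercept, coef


# Toy data: y depends only on the first predictor (y = 3 + 2 * x0).
X = [[-2, 1, 1], [-1, -1, -2], [0, 0, 2], [1, -1, -2], [2, 1, 1]]
y = [-1, 1, 3, 5, 7]
intercept, coef = componentwise_l2_boost(X, y)
```

In this toy setting only the first predictor is ever selected, so the other two coefficients stay at zero, which is the selection behaviour the abstract refers to. Stopping the loop early would leave the first coefficient shrunken towards zero.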


Subject(s)
Algorithms , Models, Statistical , Likelihood Functions , Linear Models