Search | VHL Regional Portal

HPOSS: A hierarchical portfolio optimization stacking strategy to reduce the generalization error of ensembles of models.

Ozelim, Luan Carlos de Sena Monteiro; Ribeiro, Dimas Betioli; Schiavon, José Antonio; Domingues, Vinicius Resende; Queiroz, Paulo Ivo Braga de.

PLoS One ; 18(8): e0290331, 2023.

Article in English | MEDLINE | ID: mdl-37651433

ABSTRACT

Surrogate models are frequently used to replace costly engineering simulations. A single surrogate is usually chosen based on previous experience or by fitting multiple surrogates and selecting one based on mean cross-validation errors. This paper presents a novel stacking strategy, which results from reinterpreting the model selection process in terms of the generalization error. For the first time, this problem is translated into a well-studied financial problem: portfolio management and optimization. In short, it is demonstrated that the individual residuals calculated by leave-one-out procedures are samples from a given random variable ϵ_i, whose second non-central moment is the i-th model's generalization error. Thus, a stacking methodology based solely on evaluating the behavior of the linear combination of the random variables ϵ_i is proposed. At first, several surrogate models are calibrated. The Directed Bubble Hierarchical Tree (DBHT) clustering algorithm is then used to determine which models are worth stacking. The stacking weights can be calculated using any financial approach to the portfolio optimization problem. This alternative understanding of the problem enables practitioners to use established financial methodologies to calculate the models' weights, significantly improving the out-of-sample performance of the ensemble of models. A case study is carried out to demonstrate the applicability of the new methodology. Overall, a total of 124 models were trained using a specific dataset: 40 Machine Learning models and 84 Polynomial Chaos Expansion models (which considered 3 types of base random variables and 7 least-squares algorithms for fitting the coefficients of expansions of up to fourth order). Among those, 99 models could be fitted without convergence or other numerical issues.
The DBHT algorithm with Pearson correlation distance and generalization error similarity was able to select a subgroup of 23 models from the 99 fitted ones, a reduction of about 77% in the total number of models, which represents a good filtering scheme that still preserves diversity. Finally, it has been demonstrated that the weights obtained by building a Hierarchical Risk Parity (HRP) portfolio perform better for various input random variables, indicating better out-of-sample performance. In this way, an economic stacking strategy has demonstrated its worth in improving the out-of-sample capabilities of stacked models, which illustrates how the new understanding of model stacking methodologies may be useful.
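The core idea in the abstract (leave-one-out residuals treated as samples of ϵ_i, with stacking weights chosen from the behavior of their linear combination) can be illustrated with a minimal Python sketch. This is not the authors' HPOSS pipeline: it omits the DBHT clustering and the Hierarchical Risk Parity step, and instead uses the closed-form analogue of a global minimum-variance portfolio, applied to the matrix of second non-central moments. The three toy surrogates, the synthetic data, and all function names are assumptions made for illustration only.

```python
import random

def loo_residuals(xs, ys, fit):
    """Leave-one-out residuals: refit without point k, then predict point k."""
    res = []
    for k in range(len(xs)):
        predict = fit(xs[:k] + xs[k + 1:], ys[:k] + ys[k + 1:])
        res.append(ys[k] - predict(xs[k]))
    return res

# Three hypothetical toy surrogates (stand-ins for the paper's ML/PCE models).
def fit_mean(xs, ys):
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def fit_square(xs, ys):
    g = fit_line([x * x for x in xs], ys)  # linear fit in the feature x^2
    return lambda x: g(x * x)

def second_moment_matrix(residuals):
    """M[i][j] = mean of r_i * r_j; M[i][i] estimates model i's generalization error."""
    n = len(residuals[0])
    return [[sum(a * b for a, b in zip(ri, rj)) / n for rj in residuals]
            for ri in residuals]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def min_second_moment_weights(M):
    """Weights summing to 1 that minimize w^T M w: w = M^{-1} 1 / (1^T M^{-1} 1)."""
    x = solve(M, [1.0] * len(M))
    total = sum(x)
    return [xi / total for xi in x]

def ensemble_second_moment(M, w):
    """Estimated generalization error of the stacked model, w^T M w."""
    return sum(w[i] * M[i][j] * w[j] for i in range(len(w)) for j in range(len(w)))

# Synthetic data, for illustration only.
random.seed(0)
xs = [i / 10 for i in range(20)]
ys = [1 + 2 * x + 0.5 * x * x + random.gauss(0, 0.1) for x in xs]

residuals = [loo_residuals(xs, ys, f) for f in (fit_mean, fit_line, fit_square)]
M = second_moment_matrix(residuals)
w = min_second_moment_weights(M)
print("weights:", w)
print("single-model errors:", [M[i][i] for i in range(len(M))])
print("stacked error:", ensemble_second_moment(M, w))
```

Because each single model corresponds to a feasible weight vector, the stacked estimate can never exceed the best single-model estimate on the same residuals. The weights here are unconstrained in sign (the analogue of allowing short positions), which is one reason a practitioner might prefer a more robust allocation such as the HRP portfolio evaluated in the paper.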

Subject(s)

Algorithms , Engineering , Female , Pregnancy , Humans , Cluster Analysis , Generalization, Psychological , Machine Learning

Um método de ensaio para determinação da concentração de óleo em amostras de águas contaminadas com óleos e graxas / A test method for determining the oil concentration in water samples contaminated with oil and grease

Carvalho, Roberto Gonçalves de; Kruk, Nadiane Smaha; Kawachi, Elizabete Yoshie; Queiroz, Paulo Ivo Braga de.

Eng. sanit. ambient ; 24(3): 515-523, May-Jun. 2019. tab, graph

Article in Portuguese | LILACS-Express | LILACS | ID: biblio-1012048

ABSTRACT

Oil and grease concentration in water samples contaminated by oily residues can be determined by the procedures established in the Standard Methods for the Examination of Water and Wastewater. However, applying these procedures does not always yield adequate values or satisfactory precision for meeting regulatory standards. This paper therefore presents a proposed test method for determining mineral oil concentration in water from paved areas subject to oil and grease spillages. The method is based on the partition-gravimetric method (5520 B) established by the Standard Methods. In the new procedure, the step that separates the extraction solvent containing the residues from the remainder of the aqueous phase is replaced by evaporation of all the water in the sample in an oven at 85 °C. To assess the efficiency of the proposed method, samples with known oil concentrations of 200, 100, 50, 25 and 15 mg·L⁻¹ were prepared in distilled water, and laboratory tests were performed to determine the oil content according to the new procedure. The oil concentrations obtained are quite satisfactory and show linear behavior in relation to the reference concentrations. This supports the reliability of the proposed method and its applicability in determining the oil concentration in contaminated water samples from surface runoff on pavements.
