ABSTRACT
Reinforcement learning algorithms are typically limited to learning a single solution for a specified task, even though diverse solutions often exist. Recent studies have shown that learning a set of diverse solutions is beneficial because diversity enables robust few-shot adaptation. Existing methods learn diverse solutions by using the mutual information as an unsupervised reward, but this approach often suffers from bias in the gradient estimator induced by value function approximation. In this study, we propose a novel method that learns diverse solutions without suffering from this bias. In our method, a policy conditioned on a continuous or discrete latent variable is trained by directly maximizing the variational lower bound of the mutual information, instead of using the mutual information as an unsupervised reward as in previous studies. Through extensive experiments on robot locomotion tasks, we demonstrate that the proposed method successfully learns an infinite set of diverse solutions by learning continuous latent variables, which is more challenging than learning a finite number of solutions. We also show that our method enables more effective few-shot adaptation than existing methods.
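For reference, the variational lower bound mentioned above is usually written in the following standard form. This is a sketch under assumptions: the abstract does not say which quantity is paired with the latent variable, so the symbols s (a state or trajectory visited by the policy), z (the latent variable), and q_phi (a learned approximate posterior) are illustrative choices and not necessarily the authors' notation.

```latex
\begin{aligned}
I(s; z) &= H(z) - H(z \mid s) \\
        &= H(z) + \mathbb{E}_{z \sim p(z),\, s \sim \pi(\cdot \mid z)}\big[\log p(z \mid s)\big] \\
        &\geq H(z) + \mathbb{E}_{z \sim p(z),\, s \sim \pi(\cdot \mid z)}\big[\log q_\phi(z \mid s)\big]
\end{aligned}
```

The gap between the last two lines is the expected Kullback-Leibler divergence between p(z | s) and q_phi(z | s), which is non-negative, so the bound is tight when q_phi matches the true posterior.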
Subject(s)
Algorithms, Reinforcement, Psychology, Reward
ABSTRACT
Parameter optimization is a long-standing challenge in various production processes. In particular, powder film forming processes entail multiscale and multiphysical phenomena, each of which is usually controlled by a combination of several parameters. It is therefore difficult to optimize the parameters either by numerical-model-based analysis or by "brute force" experiment-based exploration. In this study, we focus on Bayesian optimization, a method that has led to breakthroughs in materials informatics. Specifically, we apply this method to the exploration of production process parameters for the powder film forming process. To this end, a slurry containing a powder, polymer, and solvent was dropped, the drying temperature and time were controlled as the parameters to be explored, and the uniformity of the fabricated film was evaluated. Using this experiment-based Bayesian optimization system, we searched for the optimal parameters among 32,768 (8^5) parameter sets to minimize defects. The optimization converged after 40 experiments, substantially fewer than are required by brute-force exploration and traditional design-of-experiments methods. Furthermore, we inferred the mechanism by which the previously unknown drying conditions discovered in the parameter exploration resulted in uniform film formation. This demonstrates that a data-driven approach leads to high-throughput exploration and the discovery of novel parameters, which can inspire further research.
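To put the reported numbers in proportion, the arithmetic can be checked in a few lines. This is a minimal sketch that assumes the count 32,768 corresponds to 8 to the power 5, read here as five explored parameters with eight levels each; the abstract does not list the individual parameters or their levels, and the function names below are hypothetical helpers introduced only for illustration.

```python
def grid_size(levels_per_parameter: int, num_parameters: int) -> int:
    """Number of parameter sets in a full factorial grid."""
    return levels_per_parameter ** num_parameters


def fraction_explored(num_experiments: int, total_sets: int) -> float:
    """Share of the grid that was actually evaluated experimentally."""
    return num_experiments / total_sets


# Assumption: 5 parameters with 8 levels each (8^5), inferred from the count.
total = grid_size(levels_per_parameter=8, num_parameters=5)
share = fraction_explored(num_experiments=40, total_sets=total)

print(total)           # 32768
print(f"{share:.4%}")  # 0.1221%
```

Under this reading, the 40 experiments cover roughly 0.12% of the candidate parameter sets.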
ABSTRACT
Living bone must be cut before performing arthroplasty. For example, the distal part of the femur and the proximal part of the tibia must be cut to perform total knee arthroplasty. Osteocytes begin to necrose when the cutting temperature during such procedures exceeds 50 degrees C. In this study, the temperature distribution inside bone was calculated theoretically using a linear heat source moving on a semi-infinite plane. In addition, the temperature distribution on the surface layer of the cutting edge was measured using thermography. The analytical and experimental results showed that the cutting temperature could be estimated for cortical bone and that the cutting temperature increased sharply near the bone surface. Consequently, if the cutting environment is not cooled, the temperature of bone tissue within 0.15 mm of the surface layer may exceed 35 degrees C, and the bone surface is heated to more than 55 degrees C.
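For orientation, the classical quasi-steady solution for a line heat source moving along the insulated surface of a semi-infinite body has the form below. This is a sketch of the textbook result, not necessarily the exact model used in the study: the abstract gives no equations, so all symbols are assumptions. Here q' is the heat input per unit length of the source entering the bone, k the thermal conductivity, alpha the thermal diffusivity, v the speed of the source, x the distance ahead of the source in the frame moving with it, z the depth below the surface, and K_0 the modified Bessel function of the second kind of order zero.

```latex
\Delta T(x, z) = \frac{q'}{\pi k}
  \exp\!\left(-\frac{v x}{2\alpha}\right)
  K_0\!\left(\frac{v \sqrt{x^{2} + z^{2}}}{2\alpha}\right)
```

Because K_0 grows without bound as its argument approaches zero, the predicted temperature rise climbs steeply close to the source, which is consistent with the sharp increase near the bone surface reported above.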