1.
J Am Chem Soc ; 146(23): 16052-16061, 2024 Jun 12.
Article in English | MEDLINE | ID: mdl-38822795

ABSTRACT

The application of machine learning models to the prediction of reaction outcomes currently needs large and/or highly featurized data sets. We show that a chemistry-aware model, NERF, which mimics the bonding changes that occur during reactions, allows for highly accurate predictions of the outcomes of Diels-Alder reactions using a relatively small training set, with no pretraining and no additional features. We establish a diverse data set of 9537 intramolecular, hetero-, aromatic, and inverse electron demand Diels-Alder reactions. This data set is used to train a NERF model, and the performance is compared against state-of-the-art classification and generative machine learning models across low- and high-data regimes, with and without pretraining. The predictive accuracy (regio- and site selectivity in the major product) achieved by NERF exceeds 90% when as little as 40% of the data set is used for training. Another high-performing model, Chemformer, requires a larger training data set (>45%) and pretraining to reach 90% Top-1 accuracy. Accurate predictions of less-represented reaction subclasses, such as those involving heteroatomic or aromatic substrates, require higher percentages of training data. We also show how NERF can use small amounts of additional training data to quickly learn new systems and improve its overall understanding of reactivity. Synthetic chemists stand to benefit as this model can be rapidly expanded and tailored to areas of chemistry corresponding to the low-data regime.

2.
IEEE Trans Cybern ; 54(1): 486-495, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37022240

ABSTRACT

Finding the causal structure from a set of variables given observational data is a crucial task in many scientific areas. Most algorithms focus on discovering the global causal graph, but few efforts have been made toward the local causal structure (LCS), which is of wide practical significance and easier to obtain. LCS learning faces the challenges of neighborhood determination and edge orientation. Available LCS algorithms build on conditional independence (CI) tests and suffer from poor accuracy due to noise, varied data generation mechanisms, and the small sample sizes of real-world applications, where CI tests do not work. In addition, they can only find the Markov equivalence class, leaving some edges undirected. In this article, we propose a GradieNt-based LCS learning approach (GraN-LCS) to determine neighbors and orient edges simultaneously in a gradient-descent way and, thus, to explore LCS more accurately. GraN-LCS formulates the causal graph search as minimizing an acyclicity regularized score function, which can be optimized by efficient gradient-based solvers. GraN-LCS constructs a multilayer perceptron (MLP) to simultaneously fit all other variables with respect to a target variable and defines an acyclicity-constrained local recovery loss to promote the exploration of local graphs and to find the direct causes and effects of the target variable. To improve the efficacy, it applies preliminary neighborhood selection (PNS) to sketch the raw causal structure and further incorporates an l1-norm-based feature selection on the first layer of the MLP to reduce the scale of candidate variables and to pursue a sparse weight matrix. GraN-LCS finally outputs the LCS based on the sparse weighted adjacency matrix learned from the MLPs. We conduct experiments on both synthetic and real-world datasets and verify its efficacy by comparing against state-of-the-art baselines. A detailed ablation study investigates the impact of key components of GraN-LCS, and the results prove their contribution.
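
The abstract does not say which acyclicity term GraN-LCS uses. As a rough illustration only, the sketch below computes one widely used differentiable acyclicity measure, h(W) = tr(exp(W∘W)) − d, which is zero exactly when the weighted graph has no directed cycle. The function names and the truncated power series are assumptions made for this example, not the authors' implementation.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def acyclicity_penalty(w, terms=20):
    """Return tr(exp(W∘W)) - d using a truncated power series.

    Assumption: this is a generic acyclicity measure, not necessarily
    the one used in GraN-LCS. The value is 0 for a DAG and > 0 otherwise.
    """
    d = len(w)
    a = [[x * x for x in row] for row in w]  # elementwise square, W∘W
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # A^0
    trace_exp = 0.0
    factorial = 1.0
    for k in range(terms):
        if k > 0:
            power = matmul(power, a)  # A^k
            factorial *= k            # k!
        trace_exp += sum(power[i][i] for i in range(d)) / factorial
    return trace_exp - d


# Hypothetical 3-node chain (acyclic) and 2-node loop (cyclic).
dag = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
cycle = [[0.0, 1.0], [1.0, 0.0]]
```

Adding such a term to a fitting loss, scaled by a penalty weight, is what lets a gradient-based solver push the learned adjacency matrix toward an acyclic graph.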

3.
Gene ; 898: 148111, 2024 Mar 10.
Article in English | MEDLINE | ID: mdl-38147897

ABSTRACT

BACKGROUND: Hyperthermia is used as an adjunctive treatment for gastric cancer; however, the corresponding antitumor mechanism remains unclear. OBJECTIVE: To investigate the expression of PLEK2 in gastric cancer and the mechanism by which hyperthermia inhibits gastric cancer progression and participates in immunomodulation. METHODS: PLEK2 was screened by combining microarray analysis with gene knockdown and proliferation assays. Analysis based on the TCGA database and the GEPIA website, together with detection in clinical samples, was employed to investigate the expression and correlation of PLEK2 and PD-L1. After knockdown of PLEK2 expression, western blotting, RT-qPCR, cell functional assays, and flow cytometry were used to assess the effects on cell migration, invasion, viability, and apoptosis. Hyperthermia was then applied to explore its effects. To evaluate the impact on immunity, activated T cells were co-cultured with the target cells, and T cell proliferation and the release of IFNγ were measured. RESULTS: Hyperthermia significantly reduced the expression of PLEK2 and PD-L1, both of which were increased in gastric cancer. Knockdown of PLEK2 inhibited PD-L1 expression and significantly inhibited the proliferation, invasion, migration, and viability of gastric cancer cells. A decrease in PLEK2 expression promoted cell apoptosis. Although it did not affect the proliferation of activated T cells, it partially reversed IFNγ suppression. CONCLUSION: PLEK2 plays a promoting role in gastric cancer, and hyperthermia downregulates PLEK2/PD-L1, which further inhibits cell proliferation, invasion, and migration, promotes cell apoptosis, and possibly participates in immune regulation.


Subject(s)
Hyperthermia, Induced , Stomach Neoplasms , Humans , Stomach Neoplasms/pathology , B7-H1 Antigen/genetics , Cell Proliferation , Immunomodulation , Cell Line, Tumor , Gene Expression Regulation, Neoplastic , Membrane Proteins/genetics
4.
Article in English | MEDLINE | ID: mdl-38153831

ABSTRACT

Walking is one of the most common daily movements of the human body. Therefore, quantitative evaluation of human walking has been commonly used to assist doctors in grasping the disease degree and rehabilitation process of patients in the clinic. Compared with the kinematic characteristics, the ground reaction force (GRF) during walking can directly reflect the dynamic characteristics of human walking. It can further help doctors understand the degree of muscle recovery and joint coordination of patients. This paper proposes a GRF estimation method driven by a hybrid of elastic elements and the Newton-Euler equation. Compared with the existing research, the innovations are as follows. 1) The hardware system consists of only two inertial measurement units (IMUs) placed on the shanks. The acquisition of the overall motion characteristics of human walking is realized through the simplified four-link walking model and the thigh prediction method. 2) The method was validated not only on 10 healthy subjects but also on 11 Parkinson's patients and 10 stroke patients, with normalized mean absolute errors (NMAEs) of 5.95%±1.32%, 6.09%±2.00%, and 5.87%±1.59%, respectively. 3) This paper proposes a dynamic balance assessment method based on the acquired motion data and the estimated GRF. It evaluates the overall balance ability and fall risk at four key time points for all subjects recruited. Because of the low-cost system, ease of use, low motion interference and environmental constraints, and high estimation accuracy, the proposed GRF estimation method and automatic walking balance assessment have broad clinical value.


Subject(s)
Foot , Gait , Humans , Gait/physiology , Foot/physiology , Walking/physiology , Mechanical Phenomena , Leg , Biomechanical Phenomena
5.
BMC Med Res Methodol ; 23(1): 277, 2023 11 24.
Article in English | MEDLINE | ID: mdl-38001462

ABSTRACT

The interrupted time series (ITS) design is widely used to examine the effects of large-scale public health interventions and has the highest level of evidence validity. However, there is a notable gap regarding methods that account for lag effects of interventions. To address this, we introduced activation functions (ReLU and Sigmoid) into the classic segmented regression (CSR) of the ITS design during the lag period. This led to the proposal of optimized segmented regression (OSR) models, namely OSR-ReLU and OSR-Sig. To compare the performance of the models, we simulated data under multiple scenarios, including positive or negative impacts of interventions, linear or nonlinear lag patterns, different lag lengths, and different fluctuation degrees of the outcome time series. Based on the simulated data, we examined the bias, mean relative error (MRE), mean square error (MSE), mean width of the 95% confidence interval (CI), and coverage rate of the 95% CI for the long-term impact estimates of interventions among different models. OSR-ReLU and OSR-Sig yielded approximately unbiased estimates of the long-term impacts across all scenarios, whereas CSR did not. In terms of accuracy, OSR-ReLU and OSR-Sig outperformed CSR, exhibiting lower values in MRE and MSE. With increasing lag length, the optimized models provided robust estimates of long-term impacts. Regarding precision, OSR-ReLU and OSR-Sig surpassed CSR, demonstrating narrower mean widths of 95% CI and higher coverage rates. Our optimized models are powerful tools, as they can model the lag effects of interventions and provide more accurate and precise estimates of the long-term impact of interventions. The introduction of an activation function provides new ideas for improving the CSR model.
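
The abstract does not give the exact OSR-ReLU and OSR-Sig equations. As a minimal sketch of the general idea, the functions below contrast the abrupt step used in classic segmented regression with two ramp-like weights that phase the intervention in over a lag period. The function names, the clipping, and the `steepness` parameter are assumptions made for illustration, not the authors' formulation.

```python
import math


def step_weight(t, t0):
    """Classic segmented regression: the intervention switches on fully at t0."""
    return 1.0 if t >= t0 else 0.0


def relu_ramp_weight(t, t0, lag):
    """ReLU-style ramp: 0 before t0, linear rise, clipped at 1 from t0 + lag on."""
    return min(max((t - t0) / lag, 0.0), 1.0)


def sigmoid_weight(t, t0, lag, steepness=10.0):
    """Sigmoid-style ramp centred on the middle of the lag period.

    `steepness` is a hypothetical tuning parameter controlling how sharply
    the effect phases in.
    """
    midpoint = t0 + lag / 2.0
    return 1.0 / (1.0 + math.exp(-steepness * (t - midpoint) / lag))


# Hypothetical intervention at month 10 with a 4-month lag.
weights = [(t, step_weight(t, 10), relu_ramp_weight(t, 10, 4)) for t in range(8, 16)]
```

In a regression, such a weight would multiply the level-change and slope-change terms, so that the estimated long-term impact is not distorted by observations from the months in which the intervention was only partly in effect.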


Subject(s)
Aortic Aneurysm, Abdominal , Humans , Time Factors , Interrupted Time Series Analysis , Treatment Outcome
6.
Glob Health Res Policy ; 8(1): 29, 2023 07 24.
Article in English | MEDLINE | ID: mdl-37482607

ABSTRACT

BACKGROUND: The interrupted time series (ITS) design is a widely used approach to examine the effects of interventions. However, the classic segmented regression (CSR) method, the most popular statistical technique for analyzing ITS data, may not be adequate when there is a transitional period between the pre- and post-intervention phases. METHODS: To address this issue and better capture the distribution patterns of intervention effects during the transition period, we propose using different cumulative distribution functions in the CSR model and developing corresponding optimized segmented regression (OSR) models. This study illustrates the application of OSR models to estimate the long-term impact of a national free delivery service policy intervention in Ethiopia. RESULTS: Regardless of the choice of transition length (L) and distribution patterns of intervention effects, the OSR models outperformed the CSR model in terms of mean square error (MSE), indicating the existence of a transition period and the validity of our model's assumptions. However, the estimates of long-term impacts using OSR models are sensitive to the selection of L, highlighting the importance of reasonable parameter specification. We propose a data-driven approach to select the transition period length to address this issue. CONCLUSIONS: Overall, our OSR models provide a powerful tool for modeling intervention effects during the transition period, with a superior model fit and more accurate estimates of long-term impacts. Our study highlights the importance of appropriate statistical methods for analyzing ITS data and provides a useful framework for future research.


Subject(s)
Research Design , Regression Analysis , Interrupted Time Series Analysis , Ethiopia
7.
Nat Biotechnol ; 41(9): 1208-1220, 2023 09.
Article in English | MEDLINE | ID: mdl-37365259

ABSTRACT

Human societies depend on marine ecosystems, but their degradation continues. Toward mitigating this decline, new and more effective ways to precisely measure the status and condition of marine environments are needed alongside existing rebuilding strategies. Here, we provide an overview of how sensors and wearable technology developed for humans could be adapted to improve marine monitoring. We describe barriers that have slowed the transition of this technology from land to sea, update on the developments in sensors to advance ocean observation and advocate for more widespread use of wearables on marine organisms in the wild and in aquaculture. We propose that large-scale use of wearables could facilitate the concept of an 'internet of marine life' that might contribute to a more robust and effective observation system for the oceans and commercial aquaculture operations. These observations may aid in rationalizing strategies toward conservation and restoration of marine communities and habitats.


Subject(s)
Ecosystem , Wearable Electronic Devices , Humans , Aquatic Organisms , Oceans and Seas , Technology
8.
IEEE Trans Med Imaging ; 42(11): 3179-3193, 2023 11.
Article in English | MEDLINE | ID: mdl-37027573

ABSTRACT

Pathology images contain rich information of cell appearance, microenvironment, and topology features for cancer analysis and diagnosis. Among such features, topology becomes increasingly important in analysis for cancer immunotherapy. By analyzing geometric and hierarchically structured cell distribution topology, oncologists can identify densely-packed and cancer-relevant cell communities (CCs) for making decisions. Compared to commonly-used pixel-level Convolution Neural Network (CNN) features and cell-instance-level Graph Neural Network (GNN) features, CC topology features are at a higher level of granularity and geometry. However, topological features have not been well exploited by recent deep learning (DL) methods for pathology image classification due to lack of effective topological descriptors for cell distribution and gathering patterns. In this paper, inspired by clinical practice, we analyze and classify pathology images by comprehensively learning cell appearance, microenvironment, and topology in a fine-to-coarse manner. To describe and exploit topology, we design Cell Community Forest (CCF), a novel graph that represents the hierarchical formulation process of big-sparse CCs from small-dense CCs. Using CCF as a new geometric topological descriptor of tumor cells in pathology images, we propose CCF-GNN, a GNN model that successively aggregates heterogeneous features (e.g., appearance, microenvironment) from cell-instance-level, cell-community-level, into image-level for pathology image classification. Extensive cross-validation experiments show that our method significantly outperforms alternative methods on H&E-stained and immunofluorescence images for disease grading tasks with multiple cancer types. Our proposed CCF-GNN establishes a new topological data analysis (TDA) based method, which facilitates integrating multi-level heterogeneous features of point clouds (e.g., for cells) into a unified DL framework.


Subject(s)
Decision Making , Neoplasms , Humans , Forests , Neural Networks, Computer , Neoplasms/diagnostic imaging , Tumor Microenvironment
9.
PLOS Glob Public Health ; 3(1): e0001044, 2023.
Article in English | MEDLINE | ID: mdl-36962843

ABSTRACT

Global health services have been disrupted by the COVID-19 pandemic. We evaluated the extent and duration of the impacts of the pandemic on health services utilization in different economically developed regions of mainland China. Based on monthly health services utilization data in China, we used Seasonal Autoregressive Integrated Moving Average (S-ARIMA) models to predict outpatient and emergency department visits to hospitals (OEH visits) per capita in the absence of the pandemic. The impacts were evaluated along three dimensions: 1) absolute instant impacts were evaluated by the difference between predicted and actual OEH visits per capita in February 2020, and relative instant impacts were the ratio of absolute impacts to baseline OEH visits per capita; 2) absolute and relative accumulative impacts from February 2020 to March 2021; 3) duration of impacts was estimated by the time at which actual OEH visits per capita returned to the predicted value. From February 2020 to March 2021, the COVID-19 pandemic reduced OEH visits by 0.4676 per capita, equivalent to 659,453,647 visits, corresponding to a decrease of 15.52% relative to the pre-pandemic average annual level in mainland China. The instant impacts in central, northeast, east and west China were 0.1279, 0.1265, 0.1215, and 0.0986 visits per capita, respectively; and corresponding relative impacts were 77.63%, 66.16%, 44.39%, and 50.57%, respectively. The accumulative impacts in northeast, east, west and central China were up to 0.5898, 0.4459, 0.3523, and 0.3324 visits per capita, respectively; and corresponding relative impacts were 23.72%, 12.53%, 13.91%, and 16.48%, respectively. The OEH visits per capita returned to predicted values within the first 2, 6, 9, and 9 months for east, central, west and northeast China, respectively. Less economically developed areas were affected for a longer time. Safe and equitable access to health services needs close attention, especially in less developed areas.
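
The three impact measures described in this abstract reduce to simple arithmetic once the forecast is available. The sketch below shows that arithmetic on made-up numbers; the function names, the recovery rule (first month in which actual visits reach the forecast), and all values are assumptions for illustration and are not taken from the study.

```python
def absolute_impact(predicted, actual):
    """Shortfall in visits per capita: forecast minus observed."""
    return predicted - actual


def relative_impact(predicted, actual, baseline):
    """Shortfall expressed as a fraction of a baseline level."""
    return absolute_impact(predicted, actual) / baseline


def months_until_recovery(predicted_series, actual_series):
    """First month (1-based) in which observed visits reach the forecast.

    Returns None if the observed series never recovers within the window.
    """
    pairs = zip(predicted_series, actual_series)
    for month, (predicted, actual) in enumerate(pairs, start=1):
        if actual >= predicted:
            return month
    return None


# Hypothetical monthly visits per capita (not data from the study).
predicted = [0.20, 0.21, 0.22]
actual = [0.10, 0.18, 0.22]
instant = absolute_impact(predicted[0], actual[0])
accumulated = sum(absolute_impact(p, a) for p, a in zip(predicted, actual))
```

The accumulative impact is the sum of the monthly shortfalls over the evaluation window, which is why it can keep growing after the instant impact has passed.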

10.
IEEE Trans Neural Netw Learn Syst ; 34(8): 4345-4358, 2023 Aug.
Article in English | MEDLINE | ID: mdl-34665744

ABSTRACT

Due to the sparsity of available features in web-scale predictive analytics, combinatorial features become a crucial means for deriving accurate predictions. As a well-established approach, a factorization machine (FM) is capable of automatically learning high-order interactions among features to make predictions without the need for manual feature engineering. With the prominent development of deep neural networks (DNNs), there is a recent and ongoing trend of enhancing the expressiveness of FM-based models with DNNs. However, though better results are obtained with DNN-based FM variants, such performance gain comes at the cost of an enormous number (usually millions) of additional model parameters on top of the plain FM. Consequently, the heavy parameterization impedes the real-life practicality of those deep models, especially efficient deployment on resource-constrained Internet of Things (IoT) and edge devices. In this article, we move beyond the traditional real space where most deep FM-based models are defined and seek solutions from quaternion representations within the hypercomplex space. Specifically, we propose the quaternion factorization machine (QFM) and quaternion neural factorization machine (QNFM), which are two novel lightweight and memory-efficient quaternion-valued models for sparse predictive analytics. By introducing a brand new take on FM-based models with the notion of quaternion algebra, our models not only enable expressive inter-component feature interactions but also significantly reduce the parameter size due to lower degrees of freedom in the hypercomplex Hamilton product compared with real-valued matrix multiplication. Extensive experimental results on three large-scale datasets demonstrate that QFM achieves a 4.36% performance improvement over the plain FM without introducing any extra parameters, while QNFM outperforms all baselines with up to two orders of magnitude reduction in parameter size in comparison to state-of-the-art peer methods.
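
The parameter saving mentioned above comes from the Hamilton product itself: one quaternion weight has four real components yet mixes all four components of its input, where a general real-valued 4×4 matrix would need sixteen. The sketch below implements the standard Hamilton product on plain tuples; the tuple representation and function name are choices made for this example and say nothing about how QFM or QNFM are implemented.

```python
def hamilton_product(p, q):
    """Hamilton product of quaternions p and q, each given as (a, b, c, d)
    for a + b*i + c*j + d*k. The product is not commutative."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i component
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j component
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k component
    )


# Basis elements used to check the defining identities.
i = (0, 1, 0, 0)
j = (0, 0, 1, 0)
k = (0, 0, 0, 1)
```

Each output component depends on every input component, which is the inter-component interaction the abstract refers to, while the weight still has only four free parameters.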

12.
Adv Sci (Weinh) ; 9(36): e2202922, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36372546

ABSTRACT

Topological phases of matter are conventionally characterized by the bulk-boundary correspondence in Hermitian systems. The topological invariant of the bulk in d dimensions corresponds to the number of (d - 1)-dimensional boundary states. By extension, higher-order topological insulators reveal a bulk-edge-corner correspondence, such that nth order topological phases feature (d - n)-dimensional boundary states. The advent of non-Hermitian topological systems sheds new light on the emergence of the non-Hermitian skin effect (NHSE) with an extensive number of boundary modes under open boundary conditions. Still, the higher-order NHSE remains largely unexplored, particularly in the experiment. An unsupervised approach-physics-graph-informed machine learning (PGIML)-to enhance the data mining ability of machine learning with limited domain knowledge is introduced. Through PGIML, the second-order NHSE in a 2D non-Hermitian topoelectrical circuit is experimentally demonstrated. The admittance spectra of the circuit exhibit an extensive number of corner skin modes and extreme sensitivity of the spectral flow to the boundary conditions. The violation of the conventional bulk-boundary correspondence in the second-order NHSE implies that modification of the topological band theory is inevitable in higher dimensional non-Hermitian systems.

13.
Article in English | MEDLINE | ID: mdl-36293836

ABSTRACT

To date, there is a lack of comprehensive understanding regarding the effect of coronavirus disease 2019 (COVID-19) on the healthcare-seeking behavior and utilization of health services in rural areas where healthcare resources are scarce. We aimed to quantify the long-term impact of COVID-19 on hospital visits of rural residents in China. We collected data on the hospitalization of all residents covered by national health insurance schemes in a county in southern China from April 2017 to March 2021. We analyzed changes in residents' hospitalization visits in different areas, i.e., within-county, out-of-county but within-city, and out-of-city, via a controlled interrupted time series approach. Subgroup analyses based on gender, age, hospital levels, and ICD-10 classifications for hospital visits were examined. After experiencing a significant decline in hospitalization cases after the COVID-19 outbreak in early 2020, the pattern of rural residents' hospitalization utilization differed markedly by disease classification. Notably, we found that the overall demand for hospitalization utilization of mental and neurological illness among rural residents in China has been suppressed during the pandemic, while the utilization of inpatient services for other common chronic diseases was redistributed across regions. Our findings suggest that in resource-poor areas, focused strategies are urgently needed to ensure that people have access to adequate healthcare services, particularly mental and neurological healthcare, during the COVID-19 pandemic.


Subject(s)
COVID-19 , Humans , COVID-19/epidemiology , Interrupted Time Series Analysis , Pandemics , Rural Population , China/epidemiology , Hospitals
14.
Mol Ther Nucleic Acids ; 29: 217, 2022 Sep 13.
Article in English | MEDLINE | ID: mdl-35892092

ABSTRACT

[This retracts the article DOI: 10.1016/j.omtn.2019.12.029.].

15.
Article in English | MEDLINE | ID: mdl-35862332

ABSTRACT

Personalized federated learning (PFL) learns a personalized model for each client in a decentralized manner, where each client owns private data that are not shared and data among clients are not independent and identically distributed (non-i.i.d.). However, existing PFL solutions assume that clients have sufficient training samples to jointly induce personalized models. Thus, existing PFL solutions cannot perform well in a few-shot scenario, where most or all clients only have a handful of samples for training. Furthermore, existing few-shot learning (FSL) approaches typically need centralized training data; as such, these FSL methods are not applicable in decentralized scenarios. How to enable PFL with limited training samples per client is a practical but understudied problem. In this article, we propose a solution called personalized federated few-shot learning (pFedFSL) to tackle this problem. Specifically, pFedFSL learns a personalized and discriminative feature space for each client by identifying which models perform well on which clients, without exposing local data of clients to the server and other clients, and which clients should be selected for collaboration with the target client. In the learned feature spaces, each sample is made closer to samples of the same category and farther away from samples of different categories. Experimental results on four benchmark datasets demonstrate that pFedFSL outperforms competitive baselines across different settings.

17.
IEEE Trans Neural Netw Learn Syst ; 33(1): 304-314, 2022 Jan.
Article in English | MEDLINE | ID: mdl-33052870

ABSTRACT

Hashing has been widely adopted for large-scale data retrieval in many domains due to its low storage cost and high retrieval speed. Existing cross-modal hashing methods optimistically assume that the correspondence between training samples across modalities is readily available. This assumption is unrealistic in practical applications. In addition, existing methods generally require the same number of samples across different modalities, which restricts their flexibility. We propose a flexible cross-modal hashing approach (FlexCMH) to learn effective hashing codes from weakly paired data, whose correspondence across modalities is partially (or even totally) unknown. FlexCMH first introduces a clustering-based matching strategy to explore the structure of each cluster and, thus, to find the potential correspondence between clusters (and samples therein) across modalities. To reduce the impact of an incomplete correspondence, it jointly optimizes the potential correspondence, the cross-modal hashing functions derived from the correspondence, and a hashing quantitative loss in a unified objective function. An alternative optimization technique is also proposed to coordinate the correspondence and hash functions and reinforce the reciprocal effects of the two objectives. Experiments on public multimodal data sets show that FlexCMH achieves significantly better results than state-of-the-art methods, and it, indeed, offers a high degree of flexibility for practical cross-modal hashing tasks.

18.
IEEE/ACM Trans Comput Biol Bioinform ; 19(4): 2166-2176, 2022.
Article in English | MEDLINE | ID: mdl-33571094

ABSTRACT

Alternative splicing produces different isoforms from the same gene locus; it is an important mechanism for regulating gene expression and proteome diversity. Although the prediction of gene(ncRNA)-disease associations has been extensively studied, few (or no) computational solutions have been proposed for the prediction of isoform-disease association (IDA) at a large scale, mainly due to the lack of disease annotations of isoforms. However, increasing evidence confirms the associations between diseases and isoforms, which can more precisely uncover the pathology of complex diseases. Therefore, it is highly desirable to predict IDAs. To bridge this gap, we propose a deep neural network based solution (DeepIDA) to fuse multi-type genomics and transcriptomics data to predict IDAs. Particularly, DeepIDA uses gene-isoform relations to dispatch gene-disease associations to isoforms. In addition, it utilizes two DNN sub-networks with different structures to capture nucleotide and expression features of isoforms, Gene Ontology data and miRNA target data, respectively. After that, these two sub-networks are merged in a dense layer to predict IDAs. The experimental results on public datasets show that DeepIDA can effectively predict IDAs with AUPRC (area under the precision-recall curve) of 0.9141, macro F-measure of 0.9155, G-mean of 0.9278 and balanced accuracy of 0.9303 across 732 diseases, which are much higher than those of competitive methods. Further study on sixteen isoform-disease association cases again corroborates the superiority of DeepIDA. The code of DeepIDA is available at http://mlda.swu.edu.cn/codes.php?name=DeepIDA.


Subject(s)
Alternative Splicing , Neural Networks, Computer , Alternative Splicing/genetics , Computational Biology/methods , Gene Ontology , Protein Isoforms/genetics , Proteome/metabolism
19.
IEEE Trans Neural Netw Learn Syst ; 33(9): 4311-4321, 2022 Sep.
Article in English | MEDLINE | ID: mdl-33577462

ABSTRACT

Multiview multi-instance multilabel learning (M3L) is a framework for modeling complex objects. In this framework, each object (or bag) contains one or more instances, is represented with different feature views, and is simultaneously annotated with a set of nonexclusive semantic labels. Given the multiplicity of the studied objects, traditional M3L methods generally demand a large number of labeled bags to train a predictive model to annotate bags (or instances) with semantic labels. However, annotating sufficient bags is very expensive and often impractical. In this article, we present an active learning-based M3L approach (M3AL) to reduce the labeling costs of bags and to improve the performance as much as possible. M3AL first adapts the multiview self-representation learning to extract the shared and individual information of bags and to learn the shared/individual similarities between bags across/within views. Next, to avoid scrutinizing all the possible labels, M3AL introduces a new query strategy that leverages the shared and individual information, and the diverse instance distribution of bags across views, to select the most informative bag-label pair for the query. Experimental studies on benchmark data sets show that M3AL can significantly reduce the query costs while achieving a better performance than other related competitive methods at the same cost.

20.
IEEE Trans Pattern Anal Mach Intell ; 44(2): 770-782, 2022 Feb.
Article in English | MEDLINE | ID: mdl-33621166

ABSTRACT

Graph node embedding aims at learning a vector representation for all nodes given a graph. It is a central problem in many machine learning tasks (e.g., node classification, recommendation, community detection). The key problem in graph node embedding lies in how to define the dependence on neighbors. Existing approaches specify (either explicitly or implicitly) certain dependencies on neighbors, which may lead to loss of subtle but important structural information within the graph and other dependencies among neighbors. This prompts us to ask the question: can we design a model that gives adaptive flexibility of dependencies to each node's neighborhood? In this paper, we propose a novel graph node embedding method (named PINE) via a novel notion of partial permutation invariant set function, to capture any possible dependence. Our method 1) can learn an arbitrary form of the representation function from the neighborhood, without losing any potential dependence structures, and 2) is applicable to both homogeneous and heterogeneous graph embedding, the latter of which is challenged by the diversity of node types. Furthermore, we provide a theoretical guarantee for the representation capability of our method for general homogeneous and heterogeneous graphs. Empirical evaluation results on benchmark data sets show that our proposed PINE method outperforms the state-of-the-art approaches in producing node vectors for various learning tasks on both homogeneous and heterogeneous graphs.
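
To make the invariance property concrete, the sketch below aggregates a node's neighbors by summing feature vectors separately for each node type and concatenating the per-type sums. The result does not change when neighbors are reordered, but it does change when a neighbor's type changes. This reading of "partial permutation invariance", the per-type sum pooling, and the function name are assumptions made for illustration; PINE learns a far more expressive function than this fixed sum.

```python
def aggregate_neighbors(neighbors):
    """Aggregate (node_type, feature_vector) pairs into one vector.

    Features are summed within each node type, and the per-type sums are
    concatenated in sorted type order, so the output ignores neighbor order
    but still distinguishes node types.
    """
    sums = {}
    for node_type, features in neighbors:
        acc = sums.setdefault(node_type, [0.0] * len(features))
        for index, value in enumerate(features):
            acc[index] += value
    output = []
    for node_type in sorted(sums):
        output.extend(sums[node_type])
    return output


# Hypothetical neighborhood in a heterogeneous graph with two node types.
neighborhood = [("user", [1.0, 2.0]), ("item", [3.0, 4.0]), ("user", [5.0, 6.0])]
reordered = list(reversed(neighborhood))
retyped = [("item", [1.0, 2.0]), ("user", [3.0, 4.0]), ("user", [5.0, 6.0])]
```

For a homogeneous graph there is a single node type, and the function reduces to ordinary sum pooling over the neighborhood.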
