Browsing by Author "Shen, Cencheng"
Now showing 1 - 3 of 3
Results Per Page
Sort Options
Item The Chi-Square Test of Distance Correlation(Journal of Computational and Graphical Statistics, 2021-07-19) Shen, Cencheng; Panda, Sambit; Vogelstein, Joshua T.Distance correlation has gained much recent attention in the data science community: the sample statistic is straightforward to compute and asymptotically equals zero if and only if independence, making it an ideal choice to discover any type of dependency structure given sufficient sample size. One major bottleneck is the testing process: because the null distribution of distance correlation depends on the underlying random variables and metric choice, it typically requires a permutation test to estimate the null and compute the p-value, which is very costly for large amount of data. To overcome the difficulty, in this article, we propose a chi-squared test for distance correlation. Method-wise, the chi-squared test is nonparametric, extremely fast, and applicable to bias-corrected distance correlation using any strong negative type metric or characteristic kernel. The test exhibits a similar testing power as the standard permutation test, and can be used for K-sample and partial testing. Theory-wise, we show that the underlying chi-squared distribution well approximates and dominates the limiting null distribution in upper tail, prove the chi-squared test can be valid and universally consistent for testing independence, and establish a testing power inequality with respect to the permutation test. Supplementary files for this article are available online.Item Discovering Communication Pattern Shifts in Large-Scale Labeled Networks Using Encoder Embedding and Vertex Dynamics(IEEE Transactions on Network Science and Engineering, 2023-11-29) Shen, Cencheng; Larson, Jonathan; Trinh, Ha; Qin, Xihan; Park, Youngser; Priebe, Carey E.Analyzing large-scale time-series network data, such as social media and email communications, poses a significant challenge in understanding social dynamics, detecting anomalies, and predicting trends. In particular, the scalability of graph analysis is a critical hurdle impeding progress in large-scale downstream inference. To address this challenge, we introduce a temporal encoder embedding method. This approach leverages ground-truth or estimated vertex labels, enabling an efficient embedding of large-scale graph data and the processing of billions of edges within minutes. Furthermore, this embedding unveils a temporal dynamic statistic capable of detecting communication pattern shifts across all levels, ranging from individual vertices to vertex communities and the overall graph structure. We provide theoretical support to confirm its soundness under random graph models, and demonstrate its numerical advantages in capturing evolving communities and identifying outliers. Finally, we showcase the practical application of our approach by analyzing an anonymized time-series communication network from a large organization spanning 2019–2020, enabling us to assess the impact of Covid-19 on workplace communication patterns.Item One-Hot Graph Encoder Embedding(IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022-11-28) Shen, Cencheng; Wang, Qizhe; Priebe, Carey E.In this paper we propose a lightning fast graph embedding method called one-hot graph encoder embedding. It has a linear computational complexity and the capacity to process billions of edges within minutes on standard PC — making it an ideal candidate for huge graph processing. It is applicable to either adjacency matrix or graph Laplacian, and can be viewed as a transformation of the spectral embedding. Under random graph models, the graph encoder embedding is approximately normally distributed per vertex, and asymptotically converges to its mean. We showcase three applications: vertex classification, vertex clustering, and graph bootstrap. In every case, the graph encoder embedding exhibits unrivalled computational advantages.