

Short Courses

Announcement: The Second JCSDS 2024 Short Course Series and the 2024 Summer School of the Yunnan Key Laboratory of Statistical Modeling and Data Analysis

To provide a broader platform for academic exchange among professionals working in data science, statistics, and related fields, the "Second JCSDS 2024 Short Courses and 2024 Summer School of the Yunnan Key Laboratory of Statistical Modeling and Data Analysis" will be held July 6-11, 2024 (registration on July 5) at the Donglu Campus of Yunnan University. Senior undergraduates, graduate students, and young faculty from home and abroad who work in data science and statistics are warmly invited to register.


Intended audience: senior undergraduates, graduate students, and young faculty conducting research in data science, statistics, and related fields. The summer school plans to enroll approximately 100 participants.


Course 1: Statistical Hypothesis Testing: From Fundamental Principles to Frontier Research

Instructor: Zheyang Wu, Professor, Mathematical Sciences, Worcester Polytechnic Institute (WPI)

Course 2: Penalized Least Squares: Theory and Algorithms

Instructor: Weichen Wang, Assistant Professor, Business School, The University of Hong Kong

Course 3: Statistical and Algorithmic Foundations of Reinforcement Learning

Instructor: Yuting Wei, Assistant Professor, Statistics and Data Science Department, the Wharton School, University of Pennsylvania

Course 4: Statistical Foundations of Deep Neural Network Models

Instructor: Lizhen Lin, Professor, Department of Mathematics, University of Maryland

Course 5: An Introduction to Selected Classification Algorithms

Instructor: Yang Feng, Professor, New York University

Course 6: An Introduction to Transfer Learning

Instructor: Yang Feng, Professor, New York University

Course 7: Statistical Perspectives on Clustering and PCA

Instructor: Wen Zhou, Associate Professor, Colorado State University

Course 8: Statistical Models for Text Analysis

Instructor: Tracy Ke, Associate Professor, Harvard University

Course 9: Analysis of Large Social Networks

Instructor: Jiashun Jin, Professor, Carnegie Mellon University


Course 1: Statistical Hypothesis Testing: From Fundamental Principles to Frontier Research

Course description: This short course is designed to introduce foundational concepts and techniques, as well as recent cutting-edge research, in statistical hypothesis testing with a focus on high-dimensional data analysis. It covers topics in multiple-hypothesis testing and global testing related to vectors, matrices, and networks. Relevant computational tools and application scenarios, including meta-analysis, data integration, and signal detection, will be presented to facilitate a better understanding of statistical hypothesis testing.
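For readers who would like a concrete preview, below is a minimal Python sketch of the Benjamini-Hochberg step-up procedure, a standard multiple-testing method of the kind this course covers; the function and example p-values are our own illustration, not course material.

```python
# Benjamini-Hochberg step-up procedure for false discovery rate (FDR) control.
def benjamini_hochberg(pvals, alpha=0.05):
    m = len(pvals)
    # Sort hypothesis indices by their p-values, ascending.
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k with p_(k) <= (k/m) * alpha (ranks are 1-indexed).
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k_max = rank
    # Reject the hypotheses carrying the k_max smallest p-values.
    return sorted(order[:k_max])

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(pvals, alpha=0.05))  # rejects hypotheses 0 and 1
```

Note the step-up structure: a p-value can be rejected even if it exceeds its own threshold, as long as some larger p-value passes its threshold.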


Prof. Zheyang Wu is a Professor of Mathematical Sciences at Worcester Polytechnic Institute (WPI). His expertise lies in biostatistics, particularly in statistical learning in genetics and genomics, such as employing statistical testing-based signal detection methods for identifying novel disease genes using whole genome sequencing data. He obtained his PhD in Biostatistics from Yale University before joining WPI in 2009.


Course 2: Penalized Least Squares: Theory and Algorithms

Course description: In this short course, we will introduce the high-dimensional linear regression model and the problem of variable selection. To accurately estimate the coefficients with sparsity, we use a powerful method called penalized least squares (PLS). We are going to study its theoretical framework, leading to discussions on the pros and cons of different choices of penalties. We will further discuss computational algorithms for PLS, especially when the objective function is non-convex.
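As a concrete preview of the penalized least squares idea, here is a minimal coordinate-descent sketch of the lasso (l1-penalized least squares); the implementation and toy data are illustrative only, not course material.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for min_b 0.5*||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with feature j removed from the fit.
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r
            # Soft-thresholding update: this is where sparsity comes from.
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / (X[:, j] @ X[:, j])
    return beta

# Orthogonal design: the lasso solution is soft-thresholded least squares.
X = np.eye(4)
y = np.array([3.0, 0.5, -2.0, 0.1])
print(lasso_cd(X, y, lam=1.0))  # expect [2, 0, -1, 0]
```

The soft-thresholding step shows why the l1 penalty sets small coefficients exactly to zero, which an l2 (ridge) penalty never does.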


Prof. Weichen Wang joined The University of Hong Kong in 2021 as an Assistant Professor. He obtained his PhD in Operations Research and Financial Engineering from Princeton University in 2016. After graduation, he joined Two Sigma Investments as a quantitative researcher. Before his PhD, he received his bachelor's degree in Mathematics and Physics from Tsinghua University in 2011. Prof. Wang's research areas include big data analysis, econometrics, robust statistics, and machine learning, and he is particularly interested in the factor structure of the financial market and real-world applications of machine learning. His works have been published in top journals including the Annals of Statistics, the Journal of Machine Learning Research, and the Journal of Econometrics.


Course 3: Statistical and Algorithmic Foundations of Reinforcement Learning

Course description: As a paradigm for sequential decision-making in unknown environments, reinforcement learning (RL) has received a flurry of attention in recent years. However, the explosion of model complexity in emerging applications and the presence of nonconvexity exacerbate the challenge of achieving efficient RL in sample-starved situations, where data collection is expensive, time-consuming, or even high-stakes (e.g., in clinical trials, autonomous systems, and online advertising). How to understand and enhance the sample and computational efficiencies of RL algorithms is thus of great interest and imminent need. In this short course, we aim to present a coherent framework that covers important algorithmic and theoretical developments in RL, highlighting the connections between new ideas and classical topics. Employing Markov Decision Processes as the central mathematical model, we start by introducing classical dynamic programming algorithms when precise descriptions of the environments are available. Equipped with this preliminary background, we introduce four distinctive RL scenarios (i.e., RL with a generative model, offline RL, online RL, and multi-agent RL), and present three mainstream RL paradigms (i.e., the model-based approach, the model-free approach, and policy optimization). Our discussions gravitate around the issues of sample complexity and computational efficiency, as well as algorithm-dependent and information-theoretic lower bounds in the non-asymptotic regime. We will systematically introduce several effective algorithmic ideas (e.g., stochastic approximation, variance reduction, optimism in the face of uncertainty for online RL, pessimism in the face of uncertainty for offline RL) that permeate the design of efficient RL.
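As a preview of the classical dynamic programming background mentioned above, here is a minimal value iteration sketch on an invented two-state, two-action Markov Decision Process (all numbers are illustrative, not from the course).

```python
import numpy as np

# Tiny MDP: P[a][s, s'] = transition probability, R[s, a] = expected reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],    # transitions under action 0
              [[0.5, 0.5], [0.0, 1.0]]])   # transitions under action 1
R = np.array([[1.0, 0.0],                  # rewards in state 0 for actions 0, 1
              [0.0, 2.0]])                 # rewards in state 1 for actions 0, 1
gamma = 0.9                                # discount factor

V = np.zeros(2)
for _ in range(500):
    # Bellman optimality update: V(s) = max_a [R(s,a) + gamma * sum_s' P(s'|s,a) V(s')]
    Q = R + gamma * np.einsum('aij,j->ia', P, V)
    V = Q.max(axis=1)

policy = Q.argmax(axis=1)   # greedy policy with respect to the converged values
print(V, policy)
```

Because the update is a gamma-contraction, the value estimates converge geometrically to the unique fixed point of the Bellman optimality equation.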


Yuting Wei is currently an assistant professor in the Statistics and Data Science Department at the Wharton School, University of Pennsylvania. Prior to that, Yuting spent two years at Carnegie Mellon University as an assistant professor and one year at Stanford University as a Stein Fellow. She received her Ph.D. in statistics at the University of California, Berkeley. She was the recipient of the 2023 Google Research Scholar Award, the NSF CAREER Award, and the Erich L. Lehmann Citation from the Berkeley statistics department. Her research interests include high-dimensional and non-parametric statistics, statistical machine learning, and reinforcement learning.


Course 4: Statistical Foundations of Deep Neural Network Models

Course description: As deep learning has achieved breakthrough performance in a variety of application domains such as image recognition, speech recognition, natural language processing, and healthcare, a significant effort has also been made to understand the theoretical foundations of such models. This short course will in particular focus on understanding the statistical foundations of deep neural network models. From a statistical viewpoint, a deep learning model can largely be considered as nonparametric function or distribution estimation in which the underlying function or distribution is parametrized by a deep neural network (DNN). In supervised settings such as regression and classification analysis, the underlying regression function or classification map can be modeled using a deep neural network such as a feedforward DNN. For distribution estimation based on a DNN, a popular model is the so-called deep generative model. In understanding the theoretical foundations of DNN models, statisticians have devoted effort, for example, to understanding why deep neural network models outperform classical nonparametric models or estimates. Characterizing the statistical foundations of DNNs allows us to explain, through the lens of statistical theory, why deep neural networks perform well in practice.

More specifically, the tutorial will focus on the following subthemes:

(1) Approximation theory of DNNs 

(2) Statistical properties of DNNs (e.g., a feedforward deep neural network model) for regression and classification learning

(3) Statistical properties of deep generative models 

(4) Bayes and Variational Bayes learning of DNNs. 
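To make the "DNN as nonparametric regression" viewpoint above concrete, here is a minimal illustrative sketch (our own example, not course material): a one-hidden-layer feedforward network fit by gradient descent to a simple nonlinear regression function.

```python
import numpy as np

rng = np.random.default_rng(0)

# Nonparametric regression target: y = sin(2*pi*x) on a grid of 64 points.
x = np.linspace(0.0, 1.0, 64).reshape(-1, 1)
y = np.sin(2 * np.pi * x)

# One-hidden-layer feedforward network with tanh activation, width 16.
h = 16
W1 = rng.normal(0.0, 1.0, (1, h)); b1 = np.zeros(h)
W2 = rng.normal(0.0, 1.0, (h, 1)); b2 = np.zeros(1)
lr = 0.05

def forward(x):
    z = np.tanh(x @ W1 + b1)
    return z, z @ W2 + b2

_, pred0 = forward(x)
loss0 = np.mean((pred0 - y) ** 2)   # mean squared error before training

for _ in range(2000):
    z, pred = forward(x)
    err = 2.0 * (pred - y) / len(x)          # gradient of MSE w.r.t. predictions
    gW2 = z.T @ err; gb2 = err.sum(axis=0)
    dz = (err @ W2.T) * (1.0 - z ** 2)       # backpropagate through tanh
    gW1 = x.T @ dz; gb1 = dz.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

_, pred = forward(x)
loss = np.mean((pred - y) ** 2)              # training error after fitting
print(loss0, loss)
```

The network is playing exactly the role of a nonparametric regression estimator here: a flexible function class fit by empirical risk minimization.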


Lizhen Lin is a professor of statistics in the Department of Mathematics at the University of Maryland, where she currently also serves as the director of the statistics program. Her areas of expertise are in Bayesian modeling and theory for high-dimensional and infinite-dimensional models, statistics on manifolds, statistical network analysis and statistical properties of deep generative models. 


Course 5: An Introduction to Selected Classification Algorithms

Course description: Dive into the world of data classification with this in-depth course, designed to provide a thorough understanding of key methods in the field. Explore a wide range of techniques such as Linear and Quadratic Discriminant Analysis, K-Nearest Neighbors, Decision Trees, Support Vector Machines, Random Forests, Boosting, and Neural Networks. This course is crafted to equip you with the knowledge and skills necessary to effectively apply these methods in various data analysis scenarios.
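As a preview of one of the listed techniques, here is a minimal K-Nearest Neighbors classifier in plain NumPy; the function and toy data are our illustration, not course code.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Classify each test point by majority vote among its k nearest training points."""
    preds = []
    for x in X_test:
        # Euclidean distances from this test point to every training point.
        d = np.linalg.norm(X_train - x, axis=1)
        nearest = np.argsort(d)[:k]
        # Majority vote among the k nearest labels.
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)

# Two well-separated classes in the plane.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [2.0, 2.0], [2.1, 1.9], [1.9, 2.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[0.05, 0.05], [2.0, 2.05]])
print(knn_predict(X_train, y_train, X_test, k=3))  # expect [0 1]
```

KNN makes no parametric assumptions on the decision boundary, which is what distinguishes it from model-based classifiers such as LDA and QDA.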

Course 6: An Introduction to Transfer Learning

Course description: This course offers a comprehensive introduction to the statistical foundations underpinning a prevalent machine learning technique: transfer learning. We delve into how transfer learning effectively transfers knowledge from one task to another in an adaptive and robust fashion, thereby enhancing model performance across both supervised and unsupervised learning frameworks. Various transfer learning frameworks will be discussed, including covariate shift and posterior drift, under different assumptions such as model sparsity and low-rank structure. Participants will gain the essential knowledge required to skillfully implement these techniques in diverse supervised and unsupervised learning scenarios.
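As a concrete preview of the covariate shift idea mentioned above, here is a toy importance-weighting sketch, in which source-domain samples are reweighted by the target-to-source density ratio; the distributions and outcome below are invented so that the true ratio is known exactly.

```python
import numpy as np

def weighted_mean(values, weights):
    """Self-normalized importance-weighted estimate of a target-domain mean."""
    weights = np.asarray(weights, dtype=float)
    return np.sum(weights * values) / np.sum(weights)

# Source x drawn uniformly from {0, 1, 2}; target puts mass (0.1, 0.3, 0.6).
source_probs = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}
target_probs = {0: 0.1, 1: 0.3, 2: 0.6}

rng = np.random.default_rng(0)
x = rng.integers(0, 3, size=20000)   # samples from the source domain
y = x.astype(float) ** 2             # outcome of interest, y = x^2
# Density ratio w(x) = p_target(x) / p_source(x) corrects for covariate shift.
w = np.array([target_probs[v] / source_probs[v] for v in x])

est = weighted_mean(y, w)
target_truth = sum(target_probs[v] * v**2 for v in range(3))  # exact target mean
print(est, target_truth)
```

The same reweighting idea extends to empirical risk minimization: weighting each source-domain loss by the density ratio yields an unbiased estimate of the target-domain risk.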


Yang Feng is a Professor of Biostatistics at New York University. He obtained his Ph.D. in Operations Research at Princeton University in 2010. Feng's research interests encompass the theoretical and methodological aspects of machine learning, high-dimensional statistics, network models, and nonparametric statistics, leading to a wealth of practical applications. He has published more than 70 papers in statistical and machine-learning journals. His research has been funded by multiple grants from the National Institutes of Health (NIH) and the National Science Foundation (NSF), notably the NSF CAREER Award. He is currently an Associate Editor for the Journal of the American Statistical Association (JASA), the Journal of Business & Economic Statistics (JBES), and the Annals of Applied Statistics (AoAS). His professional recognition includes being named a fellow of the American Statistical Association (ASA) and the Institute of Mathematical Statistics (IMS), as well as an elected member of the International Statistical Institute (ISI).


Course 7: Statistical Perspectives on Clustering and PCA

Course description: Recent years have witnessed an explosion of data that is ultra high-dimensional, complexly structured or unstructured, dynamic, and even derived from heterogeneous sources. These data are being produced, collected, stored, and made increasingly accessible to a wide range of stakeholders, including industrial institutions, academic researchers, investors, and individuals. However, learning from this vast trove of information and making accurate predictions poses significant challenges for both algorithm-driven machine learning methods and traditional statistical approaches. Key among these challenges are learning from data heterogeneity and implementing effective dimension reduction techniques without compromising the integrity of the data. This short course aims to lay the groundwork for understanding cluster analysis, an essential unsupervised statistical learning technique for uncovering data heterogeneity. It also covers Principal Component Analysis (PCA) and its variants, which are among the most prevalent tools for dimension reduction.
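As a concrete preview, the following toy sketch combines the two tools this course covers: PCA for dimension reduction, followed by k-means (Lloyd's algorithm) for clustering. The synthetic data and fixed initialization are our own illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two well-separated clusters of 30 points each in 5 dimensions.
A = rng.normal(0.0, 0.3, (30, 5))
B = rng.normal(4.0, 0.3, (30, 5))
X = np.vstack([A, B])

# PCA via SVD of the centered data matrix.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T          # project onto the top 2 principal components

# Lloyd's k-means on the PCA scores, initialized at two extreme points.
centroids = scores[[0, -1]].copy()
for _ in range(20):
    # Assign each point to its nearest centroid.
    d = np.linalg.norm(scores[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    # Recompute each centroid as the mean of its assigned points.
    for k in range(2):
        centroids[k] = scores[labels == k].mean(axis=0)

print(labels)
```

Running clustering on the low-dimensional PCA scores, rather than on the raw 5-dimensional data, is the standard "reduce, then cluster" pipeline the course examines from a statistical perspective.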


Dr. Zhou is an Associate Professor in the Department of Statistics at Colorado State University and the Department of Biostatistics and Informatics at the Colorado School of Public Health. Before joining CSU, he received his Ph.D. in Statistics from Iowa State University, where he also earned a Ph.D. in Applied Mathematics. His research is focused on developing theory and methods for high-dimensional inference, network modeling, machine learning, statistical genomics and genetics, and causal inference. He is currently serving as Co-Editor-in-Chief of the Journal of Biopharmaceutical Statistics and as an associate editor for Biometrics, Statistica Sinica, and the Journal of Multivariate Analysis, in addition to serving on the editorial advisory board of New Phytologist. Starting in 2024, he serves as the elected WNAR program coordinator.


Course 8: Statistical Models for Text Analysis

Course description: This short course introduces some recent developments in statistical methods for text analysis. After an overview of the history, mainstream methods, and available data sets, I will delve into the problem of topic modeling. This part covers Hofmann's topic model, the anchor word assumption, a fast spectral method, and minimax results for topic modeling. Next, I will discuss some inference problems related to text analysis, including but not limited to global detection of topics, authorship attribution, and detection of variability in online text reviews. The third part of the course covers applications of these methods to the MADStat data set (text abstracts of 83K statistical papers) and the Dow Jones Newswire (2.5M financial news articles). Finally, I will review recent developments in large language models (LLMs), especially the transformer architecture, compare statistical models with these LLMs, and discuss how to combine the advantages of both.
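As a concrete preview of the spectral viewpoint, the first step of many spectral approaches to topic modeling is a singular value decomposition of the word-document frequency matrix; the toy corpus below is invented purely for illustration.

```python
import numpy as np

# Toy word-document count matrix: rows = 6 vocabulary words, columns = 4 documents.
# Documents 0-1 lean toward one topic, documents 2-3 toward another.
counts = np.array([[5, 4, 0, 0],
                   [4, 5, 1, 0],
                   [3, 4, 0, 1],
                   [0, 0, 5, 4],
                   [1, 0, 4, 5],
                   [0, 1, 3, 4]], dtype=float)

# Column-normalize to word frequencies within each document.
freq = counts / counts.sum(axis=0)

# Spectral step: the top singular vectors capture the low-rank topic structure.
U, S, Vt = np.linalg.svd(freq, full_matrices=False)
rank2 = (U[:, :2] * S[:2]) @ Vt[:2]   # rank-2 (two-topic) reconstruction

print(S[:3])                          # singular values, largest first
print(np.max(np.abs(freq - rank2)))   # entrywise error of the rank-2 fit
```

With two topics, the frequency matrix is approximately rank 2, so the leading singular subspace already separates the two document groups; methods built on anchor words refine this subspace into interpretable topic estimates.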


Tracy Ke is currently Associate Professor of Statistics at Harvard University. She received her PhD from Princeton University in 2014, advised by Professor Jianqing Fan. Prior to joining Harvard, she was Assistant Professor of Statistics at The University of Chicago from 2014 to 2018. Her research interests include network data analysis, high-dimensional statistics, text mining, and machine learning; she is currently especially interested in developing optimal spectral methods for network and text data. She is a recipient of the NSF CAREER Award, the ASA Gottfried E. Noether Young Scholar Award, and the IMS Peter Gavin Hall Early Career Prize, and is currently a Sloan Research Fellow.


Course 9: Analysis of Large Social Networks

Course description: In this part of the short course, I will discuss several research topics on large-scale social networks, including network data resources, network modeling, recent approaches to network community detection, nonnegative matrix factorization, degree-matching techniques, and recent approaches to network testing. The course will cover a variety of methods and theory (e.g., SCORE, Mixed-SCORE, and the Self-normalizing Cycle Count (SCC) statistic), with applications especially to the MADStat data set. MADStat is a recent data set on the publications of statisticians; it contains the BibTeX and citation information of 83,331 papers published in 36 journals in statistics, probability, and machine learning between 1975 and 2015.
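As a concrete preview of community detection, here is a generic spectral sketch based on the sign of the Fiedler vector of the graph Laplacian (a textbook method, not the SCORE methodology itself); the toy graph is invented.

```python
import numpy as np

# Two triangle communities {0, 1, 2} and {3, 4, 5} joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Graph Laplacian L = D - A; its second-smallest eigenvector (the Fiedler
# vector) separates the two communities by sign.
L = np.diag(A.sum(axis=1)) - A
eigvals, eigvecs = np.linalg.eigh(L)   # eigenvalues returned in ascending order
fiedler = eigvecs[:, 1]
labels = (fiedler > 0).astype(int)
print(labels)
```

Methods such as SCORE build on the same spectral idea but normalize eigenvector entries by ratios to remove degree heterogeneity, which plain sign-splitting does not handle.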


Jiashun Jin is Professor of Statistics and Affiliated Professor of Machine Learning at Carnegie Mellon University. His expertise is in statistical inference for Rare and Weak signals in Big Data. His earlier work was on large-scale multiple testing, focusing on the development of (Tukey's) Higher Criticism and practical False Discovery Rate (FDR) controlling methods. His more recent interest is in the analysis of social networks and text mining. Jin is an elected IMS Fellow and an elected ASA Fellow. He delivered the highly selective IMS Medallion Lecture in 2015 and the IMS AoAS (Annals of Applied Statistics) Lecture in 2016. Jin has co-authored three Editor's Invited Review papers and three Editor's Invited Discussion papers. He has served as Associate Editor for several statistical journals, including the Annals of Statistics and JASA, and gained valuable experience in the financial industry by doing research for two years at Two Sigma Investments, from 2016 to 2017.

















January 25, 2024