短期课程

Short Courses

**第二届****JCSDS 2024****系列短课暨云南省统计建模与数据分析重点实验室****2024****年暑期学校****通知**

为了更好地为从事数据科学与统计学相关领域研究的专业人士提供更为广阔的学术交流平台，2024年7月6日-11日（7月5日报到）在云南大学东陆校区举办“第二届 JCSDS 2024短期课程暨云南省统计建模与数据分析重点实验室2024年暑期学校”。热忱欢迎国内外从事数据科学与统计学研究的高年级本科生、研究生、青年教师报名参加。

**一、参加培训的对象及规模**

从事数据科学与统计学相关领域研究的高年级本科生、研究生及青年教师。本期暑期学校计划招收学员 100 名左右。

**二、课程内容**

**课程****1****：**Statistical Hypothesis Testing: From Fundamental Principles to Frontier Research

授课嘉宾：Zheyang Wu, Professor, Mathematical Sciences, Worcester Polytechnic Institute(WPI)

**课程****2****：****Penalized Least Squares: Theory and Algorithms**

授课嘉宾：Weichen Wang, Assistant Professor, Business School, Hong Kong University

**课程****3****：****Statistical and Algorithmic Foundations of Reinforcement Learning**

授课嘉宾：Yuting Wei, Assistant Professor, Statistics and Data Science Department at the Wharton School, University of Pennsylvania

**课程4：Statistical foundations of Deep Neural Network Models **

授课嘉宾：Lizhen Lin, Professor, Department of Mathematics , University of Maryland

**课程5： An Introduction to Selected Classification Algorithm**

授课嘉宾：Yang Feng, Professor, New York University

**课程6：**An Introduction to Transfer Learning

授课嘉宾：Yang Feng, Professor, New York University

**课程7：****Statistical perspectives on ****Clustering and PCA **

授课嘉宾：Wen Zhou, Associate Professor, Colorado State University

**课程8：****Statistical Models for Text Analysis**** **

授课嘉宾：Tracy Ke,Associate Professor , Harvard University

**课程9：****Analysis of large social networks**

授课嘉宾：Jiashun Jin,Professor, Carnegie Mellon University

**三、课程摘要及授课嘉宾简介**

**课程1：**Statistical Hypothesis Testing: From Fundamental Principles to Frontier Research

**课程简介：**This short course is designed to introduce foundational concepts and techniques, as well as recent cutting-edge research, in statistical hypothesis testing with a focus on high-dimensional data analysis. It covers topics in multiple-hypothesis testing and global testing related to vectors, matrices, and networks. Relevant computational tools and application scenarios, including meta-analysis, data integration, and signal detection, will be presented to facilitate a better understanding of statistical hypothesis testing.

**授课嘉宾简介：**

Prof Zheyang Wu is a Professor of Mathematical Sciences at Worcester Polytechnic Institute (WPI). His expertise lies in biostatistics, particularly in statistical learning in genetics and genomics, such as employing statistical testing-based signal detection methods for identifying novel disease genes using whole genome sequencing data. He obtained his PhD in Biostatistics from Yale University before joining WPI in 2009.

课程2：Penalized Least Squares: Theory and Algorithms

**课程简介：**In this short course, we will introduce the high-dimensional linear regression model and the problem of variable selection. To accurately estimate the coefficients with sparsity, we use a powerful method called penalized least squares (PLS). We are going to study its theoretical framework, leading to discussions on pros and cons of different choices of penalties. We will further discuss the computational algorithms for PLS, especially when the objective function is non-convex.

**授课嘉宾简介：**

Prof. Weichen Wang joined The University of Hong Kong in 2021 as an Assistant Professor. He obtained his PhD in Operations Research and Financial Engineering from Princeton University in 2016. After graduation, he joined Two Sigma Investments as a quantitative researcher. Before his PhD, he received his bachelor’s degree in Mathematics and Physics from Tsinghua University in 2011. Prof. Wang’s research areas include big data analysis, econometrics, robust statistics and machine learning, and he is particularly interested in the factor structure of the financial market and real-world applications of machine learning. His works have been published in top journals including Annals of Statistics, Journal of Machine Learning Research, Journal of Econometrics etc.

**课程3：**Statistical and Algorithmic Foundations of Reinforcement Learning

**课程简介：**As a paradigm for sequential decision-making in unknown environments, reinforcement learning (RL) has received a flurry of attention in recent years. However, the explosion of model complexity in emerging applications and the presence of nonconvexity exacerbate the challenge of achieving efficient RL in sample-starved situations, where data collection is expensive, time-consuming, or even high-stakes (e.g., in clinical trials, autonomous systems, and online advertising). How to understand and enhance the sample and computational efficiencies of RL algorithms is thus of great interest and imminent need. In this short course, we aim to present a coherent framework that covers important algorithmic and theoretical developments in RL, highlighting the connections between new ideas and classical topics. Employing Markov Decision Processes as the central mathematical model, we start by introducing classical dynamic programming algorithms when precise descriptions of the environments are available. Equipped with this preliminary background, we introduce four distinctive RL scenarios (i.e., RL with a generative model, offline RL, online RL, and multi-agent RL), and present three mainstream RL paradigms (i.e., model-based approach, model-free approach, and policy optimization). Our discussions gravitate around the issues of sample complexity and computational efficiency, as well as algorithm-dependent and information-theoretic lower bounds in the non-asymptotic regime. We will systematically introduce several effective algorithmic ideas (e.g., stochastic approximation, variance reduction, optimism in the face of uncertainty for online RL, pessimism in the face of uncertainty for offline RL) that permeate the design of efficient RL.

**授课嘉宾简介：**

Yuting Wei is currently an assistant professor in the Statistics and Data Science Department at the Wharton School, University of Pennsylvania. Prior to that, Yuting spent two years at Carnegie Mellon University as an assistant professor and one year at Stanford University as a Stein Fellow. She received her Ph.D. in statistics at the University of California, Berkeley. She was the recipient of the 2023 Google Research Scholar Award, NSF Career Award, and the Erich L. Lehmann Citation from the Berkeley statistics department. Her research interests include high-dimensional and non-parametric statistics, statistical machine learning, and reinforcement learning.

课程4：Statistical foundations of Deep Neural Network Models

**课程简介：**As deep learning has achieved breakthrough performance in a variety of application domains such as image recognition , speech recognition natural language processing, and healthcare, a significant effort has also been made to understand theoretical foundations of such models. This short course will in particular focus on understanding statistical foundations of deep neural network models. From a statistical viewpoint, a deep learning model can largely be considered as a nonparametric function or distribution estimation where the underlying function or distribution can be parametrized by a deep neural network (DNN). In the supervised setting such as regression and classification analysis, the underlying regression function or classification map can be modeled using a deep neural network such a feedforward DNN. For distribution estimation based on a DNN, a popular model is the so-called deep generative model. In understanding the theoretical foundations of DNN models, statisticians have devoted for example to understanding why deep neural network models outperform classical nonparametric models or estimates. Characterizing the statistical foundations of DNNs would allow us to explain why deep neural networks can perform well in practice from the lens of statistical theory.

More specifically, the tutorial will focus on the following sub themes:

(1) Approximation theory of DNNs

(2) Statistical properties of DNNS (e.g, a feedforward deep neural network model) for regression and classification learning

(3) Statistical properties of deep generative models

(4) Bayes and Variational Bayes learning of DNNs.

**授课嘉宾简介：**

Lizhen Lin is a professor of statistics in the Department of Mathematics at the University of Maryland, where she currently also serves as the director of the statistics program. Her areas of expertise are in Bayesian modeling and theory for high-dimensional and infinite-dimensional models, statistics on manifolds, statistical network analysis and statistical properties of deep generative models.

An Introduction to Selected Classification Algorithm课程5：

** 课程简介：**Dive into the world of data classification with our in-depth course, designed to provide a thorough understanding of key methods in this field. Explore a wide range of techniques such as Linear and Quadratic Discriminant Analysis, K-Nearest Neighbor, Decision Trees,Support Vector Machines, Random Forest, Boosting, and Neural Networks.This course is crafted to equip you with the knowledge and skills necessary to effectively apply these methods in various data analysis scenarios.

**课程 ****6****：**An Introduction to Transfer Learning

** 课程简介****：**This course offers a comprehensive introduction to the statistical foundations underpinning a prevalent machine learning technique: transfer learning. We delve into how transfer learning effectively transfers knowledge from one task to another in an adaptive and robust fashion, thereby enhancing model performance across both supervised and unsupervised learning frameworks. Various transfer learning frameworks will be discussed, including covariate shift and posterior drift, under different assumptions such as model sparsity and low-rank structure. Participants will gain the essential knowledge required to skillfully implement these techniques in diverse supervised and unsupervised learning scenarios.

** ****授课嘉宾简介：****
**

Yang Feng is a Professor of Biostatistics at New York University.He obtained his Ph.D. in Operations Research at Princeton University in 2010. Feng’s research interests encompass the theoretical and methodological aspects of machine learning, high-dimensional statistics, network models, and nonparametric statistics, leading to a wealth of practical applications. He has published more than 70 papers in statistical and machine-learning journals. His research has been funded by multiple grants from the National Institutes of Health (NIH) and the National Science Foundation (NSF), notably the NSF CAREER Award. He is currently an Associate Editor for the Journal of the American Statistical Association (JASA), the Journal of Business & Economic Statistics (JBES), and the Annals of Applied Statistics(AoAS). His professional recognition includes being named a fellow of the American Statistical Association (ASA) and the Institute of Mathematical Statistics (IMS), as well as an elected member of the International Statistical Institute (ISI).

课程7：Statistical perspectives onClustering and PCA

授课嘉宾简介：

Dr. Zhou is an Associate Professor in the Department of Statistics at Colorado State University and the Department of Biostatistics and Informatics at the Colorado School of Public Health. Before joining CSU, he received his Ph.D. in Statistics at Iowa State University, where he also received his Ph.D. in Applied Mathematics. His research is focused on developing theory and methods for high dimensional inference, network modeling, machine learning, statistical genomics and genetics, and causal inference. He is currently serving as the Co-Editor in Chief for Journal of Biopharmaceutical Statistics, as well as an associate editor for Biometrics, Statistica Sinica, and Journal of Multivariate Analysis, in addition to the editorial advisory board for New Phytologist. Starting in 2024, he has been elected as the WNAR program coordinator.

**课程8：****Statistical Models for Text Analysis**

**课程简介：**This short course aims to introduce some recent developments of statistical methods for text analysis. After an overview of the history, mainstream methods, and available data sets, I will delve into the problem of Topic Modeling. This part covers the Hoffman’s topic model, the anchor word assumption, a fast spectral method, and the minimax results for topic modeling. Next, I will discuss some inference problems related to text analysis, including but not limited global detection of topics, authorship attribution, and detection of variability of online text reviews. The third part of this course will cover applications of these methods on the MADStat data sets (text abstracts of 83K statistical papers) and Dow Jones Newswire (2.5M financial news articles). Finally, I will review the recent developments of large language models (LLMs), especially the transformer architecture. I will compare statistical models with these LLMs and discuss how to combine the advantages of both.

**授课嘉宾简介：**

Tracy Ke is currently Associate Professor of Statistics at Harvard University. She received her PhD from Princeton University in 2014, advised by Professor Jianqing Fan. Prior to joining Harvard, she was Assistant Professor of Statistics at The University of Chicago from 2014 to 2018. Her research interests include network data analysis, high-dimensional statistics, text mining, and machine learning. Recently, she is especially interested in developing optimal spectral methods for network and text data. She was the recipient of NSF CAREER award, ASA Gottfried E. Noether Young Scholar Award, IMS Peter Gavin Hall Early Career Prize, and is currently a Sloan Research Fellow.

**课程9：****Analysis of large social networks**

**课程简介：**In this part of the short course, I will discuss several research topics on large-scale social networks, including network data resources, network modeling, recent approaches in network community detection, nonnegative matrix factorization, degree matching techniques and also recent approaches in network testing. The course will cover a variety of methods and theory (eg., SCORE, Mixed-SCORE, a Self-normalizing Cycle Count (SCC) statistics) and especially with applications to the MADStat dataset. MADStat is a recent data set on the publications of statisticians. It contains the bibtex and citation information of 83,331 papers published in 36 journals in statistics, probability, and machine learning, between 1975 and 2015.

**授课嘉宾简介：**

Jiashun Jin is Professor in Statistics and Affiliated Professor in Machine Learning at Carnegie Mellon University. His expertise is in statistical inference for Rare and Weak signals in Big Data. His earlier work was on large-scale multiple testing, focusing on the development of (Tukey's) Higher Criticism and practical False Discovery Rate (FDR) controlling methods. His more recent interest is on the analysis of social networks and text mining. Jin is an elected IMS fellow and an elected ASA fellow. He has also delivered the highly selective IMS Medallion Lecture in 2015 and IMS AoAS (Annals of Applied Statistics) Lecture in 2016. Jin has co-authored three Editor's Invited Review papers and three Editor's Invited Discussion papers. He has served as Associate Editor for several statistical journals including Annals of Statistics and JASA, and has also gained valuable experience in financial industry by doing research for two years at Two-Sigma Investment from 2016 to 2017.

**四、时间安排**

报到时间：2024年7月5日

培训时间：2024年7月6日-7月11日

**五、报名及录取**

请扫描下方的二维码报名，报名截止日期：2024年4月30日。

根据各校报名实际情况对学员名额进行调整，正式学员以邮件形式通知参训。

**六****、****培训费用**

学员收到录取通知后，应在2024年5月30日之前缴纳培训费用1000元，由云南省应用统计学会收取。

**七、联系方式**

联系人：周老师

电子邮箱：jcsds2024@163.com

通讯地址：云南大学呈贡校区数学与统计学院邮编：650504

END

第二届全国统计与数据科学联合会议组委会

云南省统计建模与数据分析重点实验室

云南大学数学与统计学院

2024年 1 月 25 日