**Our team will develop fundamental and generalizable computational advances.**

**Key objectives:**

- Develop a computational theory of local-to-global dynamics over multi-scale multi-layer (MSML) networks.
- Develop computational foundations for forecasting, control, and optimization problems in epidemiology.
- Discover fundamental limits to forecasting and inference.
- Develop new statistical and machine learning techniques for ensemble modeling, spatial detection of weak signals, and change detection.
- Explore the use of crowdsourced and active learning methods for inferring individual and community level awareness and behavioral changes.
- Characterize the joint effects of network structure and local interactions on spreading processes.
- Develop HPC-enabled rigorous solutions for scalable validation, calibration, sensitivity analysis, and uncertainty quantification for spreading processes on MSML networks.

### Science and Engineering of MSML Networks: We have worked on various problems of analysis, control, and optimization for different kinds of spreading processes on networks arising in epidemiology, including SIS/SIR-type processes as well as complex contagion models.

Epidemic planning and response problems are naturally modeled as Markov Decision Processes (MDPs), since planners must make decisions over time in response to the spread of the disease and to how individuals respond. Such MDPs are very challenging to solve, especially on network models. We initiate work on MDPs for epidemic response in the context of efficient contact tracing and isolation, an important strategy for controlling epidemics. It was used effectively during the Ebola epidemic in 2014 and has also been used successfully in several parts of the world during the ongoing COVID-19 pandemic. An important consideration when implementing contact tracing is the number of contact tracers available, which is limited for economic reasons. In ongoing work, we formalize an MDP framework for efficient contact tracing that reduces the size of the outbreak while using a limited number of contact tracers. We formulate each step of the MDP as a combinatorial problem: given a budget B and a set of infections I in a network G = (V, E), which subset of nodes adjacent to I, of size at most B, should be targeted for isolation so that the expected number of exposed nodes in the second neighborhood of I is minimized? The isolated nodes are the ones reached by contact tracers. Nodes that are adjacent to I and not isolated are likely to be exposed and can spread the disease. We propose an integer linear programming-based algorithm and show rigorous bounds on its approximation factor relative to the optimal solution. Its analysis provides insights for a simpler, easily implementable greedy algorithm based on node degrees. We carry out detailed computational experiments and simulations of the MDP and both algorithms on real-world networks, and show how the algorithms can help bend the epidemic curve while limiting the number of isolated individuals.
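To make the single-step problem concrete, the following is a minimal sketch of one possible degree-based greedy rule. It is an illustration under stated assumptions, not the algorithm analyzed in this work: the function name `greedy_isolation`, the adjacency-set graph representation, and the choice to rank each candidate by its number of neighbors in the second neighborhood of I are all hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Set

# Undirected graph as adjacency sets (assumed representation).
Graph = Dict[Hashable, Set[Hashable]]


def greedy_isolation(graph: Graph, infected: Iterable[Hashable], budget: int) -> List[Hashable]:
    """Pick at most `budget` nodes adjacent to the infected set to isolate.

    Hypothetical rule: rank each candidate by how many of its neighbors lie
    in the second neighborhood of the infected set, and take the top ones.
    """
    infected_set = set(infected)
    # First neighborhood of I: uninfected nodes with an infected neighbor.
    frontier = {v for u in infected_set for v in graph[u]} - infected_set

    def outward_degree(v: Hashable) -> int:
        # Neighbors that are neither infected nor already in the frontier.
        return sum(1 for w in graph[v] if w not in infected_set and w not in frontier)

    # Sort by decreasing outward degree; break ties by node label for determinism.
    ranked = sorted(frontier, key=lambda v: (-outward_degree(v), str(v)))
    return ranked[: max(budget, 0)]


# Toy example: node 0 is infected; node 1 leads to three further nodes.
toy_graph: Graph = {
    0: {1, 2, 3},
    1: {0, 4, 5, 6},
    2: {0, 7},
    3: {0},
    4: {1}, 5: {1}, 6: {1}, 7: {2},
}
chosen = greedy_isolation(toy_graph, infected=[0], budget=2)
```

In this toy graph the rule isolates node 1 first, since it connects the infection to the largest part of the second neighborhood, and then node 2.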

Fairness constraints arise naturally in different kinds of epidemic planning and response problems. In problems of healthcare facility location in the face of an epidemic, we may not want to open too many facilities in one geographic region, or with certain types of equipment, or near only some groups of patients. Similarly, one aims to offer roughly equal vaccination opportunities to people with similar demographics in a community (e.g., during a pandemic).

In our ongoing work, we are exploring how fairness can be built into social-distancing measures so that no single community is unduly impacted by them [7, 22]. We have also studied the design of interventions to control epidemic spread modeled as an SIR process on a graph, in which an infection spreads from an infected node to each of its susceptible neighbors with probability p. Designing social-distancing interventions involves choosing a minimum-cost subset of edges to remove (which corresponds to social distancing) so that the expected number of infections stays below a given threshold. This is a fundamental problem in epidemiology and network science, and it has remained largely open. While a number of heuristic approaches have been proposed, the only rigorous prior work has been for the case p = 1. We present the first rigorous results for the case p < 1, which is a very challenging stochastic optimization problem. We also show how to incorporate demographic fairness constraints into the model, in the form of additional bounds on the number of infections in each demographic group.
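The objective in this problem, the expected number of infections after a set of edges is removed, can be estimated by simulation. The sketch below assumes a simplified SIR process in which each infected node gets exactly one chance to infect each susceptible neighbor, independently with probability p; under that assumption, one outbreak can be sampled by keeping each edge with probability p and counting the nodes reachable from the sources. The function name `expected_infections` and its parameters are illustrative, and this estimator is not the optimization method developed in this work.

```python
import random
from collections import deque


def expected_infections(edges, sources, p, removed=frozenset(), trials=2000, seed=0):
    """Monte Carlo estimate of the expected outbreak size.

    edges   : iterable of undirected edges (u, v)
    sources : initially infected nodes
    p       : per-edge transmission probability
    removed : edges deleted by the intervention (social distancing)
    """
    rng = random.Random(seed)
    removed_set = {frozenset(e) for e in removed}
    kept = [tuple(e) for e in edges if frozenset(e) not in removed_set]

    total = 0
    for _ in range(trials):
        # Sample which remaining edges transmit in this realization.
        adj = {}
        for u, v in kept:
            if rng.random() < p:
                adj.setdefault(u, []).append(v)
                adj.setdefault(v, []).append(u)
        # Breadth-first search from the sources over transmitting edges.
        seen = set(sources)
        queue = deque(seen)
        while queue:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        total += len(seen)
    return total / trials


# Path 0-1-2-3 with the infection starting at node 0.
path_edges = [(0, 1), (1, 2), (2, 3)]
baseline = expected_infections(path_edges, [0], p=1.0)
after_cut = expected_infections(path_edges, [0], p=1.0, removed=[(1, 2)])
```

With p = 1 the estimate is exact: the whole path is infected, and removing the edge (1, 2) confines the outbreak to nodes 0 and 1. For p < 1 the value is a noisy estimate whose accuracy depends on `trials`.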

In [8], we consider the simultaneous propagation of two contagions over a social network. We assume a threshold model for the propagation of the two contagions and use the formal framework of discrete dynamical systems. In particular, we study an optimization problem where the goal is to minimize the total number of infections, subject to a budget constraint on the total number of nodes that can be vaccinated. While this problem has been considered in the literature for a single contagion, our work considers the simultaneous propagation of two contagions. We propose a new model for the problem and show that the optimization problem remains NP-hard. We develop a heuristic based on a generalization of the set cover problem. Using experiments on several real-world networks, we compare the performance of the heuristic with some baseline methods. These results show that our heuristic algorithm has very good performance with respect to blocking and can be used even for reasonably large networks.

MIT team members are working on theoretical bounds for spread and herd immunity on graphs. They also participated in outreach activities aimed at the theoretical computer science community, including a talk in the Hot Topics in Computing series at MIT [37] and a panel at the Simons Institute for the Theory of Computing [39].

In [32], we study the minimum size of an initial adoption set (denoted by $IA_{\min}$) that ensures that all nodes get infected in a contagion model with threshold $q$ (in such a model, a node becomes infected if at least a fraction $q$ of its neighbors are infected); this problem is also referred to as the Target Set Selection problem. We observe that, in a large set of real-world networks, the fraction $f_{\min} = IA_{\min}/N$ varies as a step function of $q$, i.e., there is an abrupt increase in $f_{\min}$ beyond a certain value of $q$, denoted by $q_{\text{step}}$. The quantity $1 - q_{\text{step}}$ for a network can be viewed as a measure of the intra-cluster density of the blocking cluster of the network; this cluster cannot be penetrated by a contagion unless one or more of its nodes are included in the set of initial adopters. The intra-cluster density of a cluster is computed as the minimum of the intra-cluster densities of the bridge nodes of the cluster (i.e., nodes that have edges to nodes both inside and outside the cluster). The intra-cluster density of a bridge node is the ratio of the number of its neighbors inside the cluster to its total number of neighbors. We refer to $1 - q_{\text{step}}$ as the Cascade Blocking Index (CBI) of a network, a quantitative measure of the difficulty of penetrating the blocking cluster(s) of the network. The larger the CBI value for a network, the larger the intra-cluster density of its blocking clusters, and vice versa. While a lower CBI value is preferred for positive information to be adopted seamlessly by the nodes in the network, a larger CBI value is preferred for keeping an epidemic from spreading through the network.
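The two quantities described above, the outcome of a threshold cascade and the intra-cluster density of a cluster, can be computed directly on a small graph. The following is a minimal sketch; the function names `cascade` and `intra_cluster_density`, the adjacency-set representation, and the toy graph of two triangles joined by a single edge are assumptions made for illustration.

```python
def cascade(graph, initial, q):
    """Run a threshold-q cascade until no further node can be infected.

    A node becomes infected once at least a fraction q of its neighbors
    are infected; infected nodes stay infected.
    """
    infected = set(initial)
    changed = True
    while changed:
        changed = False
        for v, nbrs in graph.items():
            if v in infected or not nbrs:
                continue
            if sum(1 for u in nbrs if u in infected) >= q * len(nbrs):
                infected.add(v)
                changed = True
    return infected


def intra_cluster_density(graph, cluster):
    """Minimum, over bridge nodes, of (neighbors inside) / (all neighbors)."""
    cluster = set(cluster)
    densities = []
    for v in cluster:
        nbrs = graph[v]
        inside = sum(1 for u in nbrs if u in cluster)
        # Bridge node: has neighbors both inside and outside the cluster.
        if 0 < inside < len(nbrs):
            densities.append(inside / len(nbrs))
    return min(densities) if densities else None


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
two_triangles = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
blocked = cascade(two_triangles, {0, 1}, q=0.5)
spread = cascade(two_triangles, {0, 1}, q=0.3)
density = intra_cluster_density(two_triangles, {3, 4, 5})
```

In this toy graph the second triangle has intra-cluster density 2/3. With q = 0.5 that density exceeds 1 − q and the cascade stops at the first triangle, while with q = 0.3 it does not and every node becomes infected.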

There are a number of reasons for the fall in immunization rates, but chief among them are concerns about possible side effects, parents’ own religious and philosophical beliefs, and misperceptions about the risks. It has been observed that peer effects play a significant role in the spread of anti-vaccine sentiment: individuals with such sentiment tend to be in communities with similar sentiment. An individual incurs a disutility by not conforming to their social contacts and gains a utility by conforming to them. This phenomenon can be viewed as a coordination game, which has been very well studied. In its basic form, the utility of a node in a coordination game is a function of the number of neighbors having the same state as the node. In our recent work [21], we extend the framework of coordination games to incorporate vaccination decisions. This requires considering two important components, namely, the benefit a node derives from vaccination and the benefit of herd immunity that all individuals obtain if a large enough fraction of the population is vaccinated. We use Nash equilibria (NE) as the solution concept in such games and characterize their structure. We show that NE are closely related to the notion of strong communities. We also show a connection between NE and the dynamics of bootstrap percolation, and use it to find the “worst NE”, i.e., the one with the largest number of anti-vaccine nodes. We show that the social optimum (a strategy that maximizes the total utility) can be computed in polynomial time, and derive tight bounds on the Price of Anarchy (the maximum ratio of the total utility of the social optimum to that of any NE). We study the properties of NE in a diverse class of real-world and social networks and random graphs. We find that there is a threshold value $\theta_{\text{critical}}$ for the ratio $C/\alpha$, where $C$ and $\alpha$ are parameters associated with the benefit from vaccination and conformity with neighbors, respectively, such that the number of anti-vaccine nodes in the worst NE shows a dramatic change beyond this threshold.

Several recent papers study algorithmic and complexity aspects of diffusion problems for dynamical systems whose underlying graphs are directed and may contain directed cycles. In particular, [13] examined two problems related to convergence of opinions and showed that the problems are computationally intractable for dynamical systems whose underlying directed graphs may contain cycles. Such problems can be formally specified as reachability problems in the phase space of the corresponding dynamical system. We show that computational intractability results for reachability problems hold even for dynamical systems on directed acyclic graphs (DAGs). We also identify some versions of the reachability problem that are efficiently solvable for dynamical systems on DAGs, even though these problems are intractable when cycles are permitted. In establishing these results, we also identify several structural properties of the phase spaces of such dynamical systems. For example, we show that the length of each cycle in the phase space of any synchronous dynamical system on a DAG is a power of 2. These results are discussed in [53].
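The phase-space cycle property can be observed on a tiny example. The sketch below assumes a Boolean synchronous dynamical system in which each node's local function reads the node's own state and the states of its in-neighbors; the helper names `make_step` and `cycle_length` and the two-node system are hypothetical and serve only to illustrate the statement, not to reproduce the constructions in [53].

```python
def make_step(local_functions, in_neighbors):
    """Build the synchronous update map of a Boolean dynamical system.

    local_functions[v](own_state, in_states) returns the next state of v,
    where in_states lists the current states of v's in-neighbors.
    """
    def step(state):
        return tuple(
            f(state[v], tuple(state[u] for u in in_neighbors[v]))
            for v, f in enumerate(local_functions)
        )
    return step


def cycle_length(step, start):
    """Length of the phase-space cycle eventually reached from `start`."""
    first_seen = {}
    state, t = start, 0
    while state not in first_seen:
        first_seen[state] = t
        state = step(state)
        t += 1
    return t - first_seen[state]


# DAG with a single edge 0 -> 1.
# Node 0 flips its own state; node 1 XORs its own state with node 0's.
in_neighbors = [(), (0,)]
local_functions = [
    lambda own, ins: 1 - own,
    lambda own, ins: own ^ ins[0],
]
step = make_step(local_functions, in_neighbors)
length = cycle_length(step, (0, 0))
```

Starting from (0, 0), the trajectory is (0, 0), (1, 0), (0, 1), (1, 1), and then back to (0, 0), so the cycle has length 4, a power of 2.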

We say that two configurations of a dynamical system where each node has a state from {0, 1} are similar if the Hamming distance between them is small. Also, a predecessor of a configuration B is a configuration A such that B can be reached in one step from A. In [44], we study problems related to the similarity of predecessor configurations from which two similar configurations can be reached in one time step. We address these problems both analytically and experimentally. Our analytical results point out that the level of similarity between predecessors of two similar configurations depends strongly on the local functions of the dynamical system. Our experimental results considered one class of dynamical systems, namely those with threshold functions. We considered random graphs as well as small-world networks. The experimental procedure exploits the fact that the problem of finding predecessors can be reduced to the Boolean Satisfiability problem (SAT). This allows the use of public domain SAT solvers. In general, the experimental results indicate that for large threshold values, as the Hamming distance between a pair of configurations is increased, the average Hamming distance between predecessor sets remains more or less stable. These results suggest that the evolution of trajectories in systems with large thresholds exhibits a certain level of uniformity; that is, similar configurations result from similar predecessors.
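For very small systems, predecessors can be enumerated by brute force rather than through a SAT solver, which is enough to illustrate the definitions. The sketch below uses exhaustive search, not the SAT reduction used in the experiments, and assumes a threshold rule in which a node takes state 1 exactly when the number of 1s in its closed neighborhood (itself plus its neighbors) reaches its threshold. The names `step`, `predecessors`, and `hamming` are illustrative.

```python
from itertools import product


def step(graph, thresholds, config):
    """One synchronous update of a threshold system.

    graph[v] lists the neighbors of node v; node v becomes 1 exactly when
    its own state plus its neighbors' states sum to at least thresholds[v].
    """
    return tuple(
        1 if config[v] + sum(config[u] for u in graph[v]) >= thresholds[v] else 0
        for v in range(len(config))
    )


def predecessors(graph, thresholds, target):
    """All configurations that reach `target` in one step (exponential time)."""
    n = len(target)
    return [c for c in product((0, 1), repeat=n) if step(graph, thresholds, c) == target]


def hamming(a, b):
    """Number of positions at which two configurations differ."""
    return sum(x != y for x, y in zip(a, b))


# Path 0 - 1 - 2 with threshold 2 at every node.
path = [[1], [0, 2], [1]]
thresholds = [2, 2, 2]
preds_all_ones = predecessors(path, thresholds, (1, 1, 1))
preds_all_zeros = predecessors(path, thresholds, (0, 0, 0))
```

On this path, the all-ones configuration has a single predecessor (itself), while the all-zeros configuration has four: every configuration with at most one node in state 1.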

A large amount of network data is being used in epidemic analysis and response during the COVID-19 pandemic. Such data is very sensitive, and ensuring privacy is a crucial requirement. In ongoing work, we are studying different network analysis problems under differential privacy. One of the problems we study is densest subgraph detection, a fundamental graph mining problem with a large number of applications. Such subgraphs are also relevant for the analysis of epidemic dynamics, since they are well connected. We study this problem in the edge differential privacy model, in which the edges of the graph are private. We present the first sequential and parallel differentially private algorithms for this problem. We show that our algorithms have an additive approximation guarantee. We evaluate our algorithms on a large number of real-world networks and observe a good privacy-accuracy trade-off when the network has high density. We are also studying problems of epidemic control under differential privacy.
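As a small illustration of the edge differential privacy model, the sketch below releases the density of a fixed, publicly known node set using the standard Laplace mechanism. It is not the densest-subgraph algorithm described above. It assumes that the edges are distinct undirected pairs and that the node set is chosen independently of the private edges, so that adding or removing one edge changes the edge count by at most 1. The function names are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_subgraph_density(edges, nodes, epsilon, seed=None):
    """Release |E(S)| / |S| for a fixed node set S under edge privacy.

    The edge count inside S has sensitivity 1 with respect to a single
    edge, so Laplace noise with scale 1 / epsilon is added to the count
    before dividing by the (public) size of S.
    """
    rng = random.Random(seed)
    node_set = set(nodes)
    inside = sum(1 for u, v in edges if u in node_set and v in node_set)
    noisy_count = inside + laplace_noise(1.0 / epsilon, rng)
    return noisy_count / len(node_set)


# Triangle {0, 1, 2} plus a pendant edge; the triangle has density 3 / 3 = 1.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
almost_exact = private_subgraph_density(edges, {0, 1, 2}, epsilon=1e9, seed=1)
noisy = private_subgraph_density(edges, {0, 1, 2}, epsilon=0.5, seed=1)
```

A very large epsilon gives essentially no privacy and returns a value close to the true density, while a smaller epsilon adds more noise. Noise on the count is divided by the size of the node set, so larger sets yield more accurate density estimates at the same privacy level.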

[1] T. Abate. Crowdsourcing site collects county-level policy data to inform decisions about easing social-distancing, April 2020. https://news.stanford.edu/2020/04/13/stanford-crowdsources-county-level-covid-19-policy-data/.

[2] V. Abeykoon, N. Perera, C. Widanage, S. Kamburugamuve, T. A. Kanewala, H. Maithree, P. Wickramasinghe, A. Uyar, and G. Fox. Data engineering for hpc with python. In 2020 IEEE/ACM 9th Workshop on Python for High-Performance and Scientific Computing (PyHPC), pages 13–21. IEEE, 2020.

[3] A. Adiga, J. Chen, M. Marathe, H. Mortveit, S. Venkatramanan, and A. Vullikanti. Data-driven modeling for different stages of pandemic response. Journal of the Indian Institute of Science, pages 1–15, 2020.

[4] A. Adiga, D. Dubhashi, B. Lewis, M. Marathe, S. Venkatramanan, and A. Vullikanti. Mathematical models for covid-19 pandemic: a comparative analysis. Journal of the Indian Institute of Science, pages 1–15, 2020.

[5] A. Adiga, L. Wang, B. Hurt, A. Peddireddy, P. Porebski, S. Venkatramanan, B. Lewis, and M. Marathe. All models are useful: Bayesian ensembling for robust high resolution covid-19 forecasting, 2021. Submitted.

[6] E. Bradley, M. Marathe, M. Moses, W. D. Gropp, and D. Lopresti. Pandemic informatics: Preparation, robustness, and resilience. arXiv preprint arXiv:2012.09300, published as part of CCC Quadrennial papers, 2020. https://cra.org/ccc/resources/ccc-led-whitepapers/#2020-quadrennial-papers.

[7] B. Brubach, D. Chakrabarti, J. P. Dickerson, A. Srinivasan, and L. Tsepenekas. Fairness, semi-supervised learning, and more: A general framework for clustering with stochastic pairwise constraints. arXiv preprint arXiv:2103.02013, 2021.

[8] H. L. Carscadden, C. J. Kuhlman, M. V. Marathe, S. S. Ravi, and D. J. Rosenkrantz. Blocking the propagation of two simultaneous contagions over networks. In R. M. Benito, C. Cherifi, H. Cherifi, E. Moro, L. M. Rocha, and M. Sales-Pardo, editors, Proc. 9th International Conference on Complex Networks and Applications (Complex Networks), pages 455–468, Cham, Switzerland, 2020. Springer.

[9] S. Chang, E. Pierson, P. W. Koh, J. Gerardin, B. Redbird, D. Grusky, and J. Leskovec. Mobility network models of covid-19 explain inequities and inform reopening. Nature, 589(7840):82–87, 2020.

[10] J. Chen, S. Levin, S. Eubank, H. Mortveit, S. Venkatramanan, A. Vullikanti, and M. Marathe. Networked epidemiology for covid-19. Siam news, 2020.

[11] J. Chen, A. Vullikanti, S. Hoops, H. Mortveit, B. Lewis, S. Venkatramanan, W. You, S. Eubank, M. Marathe, C. Barrett, and A. Marathe. Medical costs of keeping the us economy open during covid-19. Scientific reports, Oct 2020.

[12] J. Chen, A. Vullikanti, J. Santos, S. Venkatramanan, S. Hoops, H. Mortveit, B. Lewis, W. You, S. Eubank, M. Marathe, C. Barrett, and A. Marathe. Epidemiological and economic impact of covid-19 in the us. medRxiv : the preprint server for health sciences, November 2020.

[13] D. Chistikov, G. Lisowski, M. Paterson, and P. Turrini. Convergence of opinion diffusion is pspace-complete. In Proc. AAAI, pages 7103–7110. AAAI Press, 2020.

[14] E. Y. Cramer, V. K. Lopez, J. Niemi, G. E. George, J. C. Cegan, I. D. Dettwiller, W. P. England, M. W. Farthing, R. H. Hunter, B. Lafferty, et al. Evaluation of individual and ensemble probabilistic forecasts of covid-19 mortality in the us. medRxiv, 2021.

[15] S. Eubank, I. Eckstrand, B. Lewis, S. Venkatramanan, M. Marathe, and C. Barrett. Commentary on ferguson, et al., “impact of non-pharmaceutical interventions (npis) to reduce covid-19 mortality and healthcare demand”. Bulletin of mathematical biology, 82(4):1–7, 2020.

[16] M. C. Fitzpatrick and A. P. Galvani. Optimizing age-specific vaccination. Science, 371(6532):890–891, 2021.

[17] G. Fox. Deep learning based time evolution, 2020. http://dsc.soic.indiana.edu/publications/Summary-DeepLearningBasedTimeEvolution.pdf.

[18] G. Fox. Deep learning for spatial time series, 2020. http://dsc.soic.indiana.edu/publications/Deep%20Learning%20for%20Spatial%20Time%20Series.pdf.

[19] G. C. Fox, G. von Laszewski, F. Wang, and S. Pyne. Aicov: An integrative deep learning framework for covid-19 forecasting with population covariates. arXiv preprint arXiv:2010.03757, 2020.

[20] F. Haghpanah, G. Lin, S. Levin, and E. Y. Klein. Analysis of the potential efficacy and timing of covid-19 vaccine on morbidity and mortality. Available at SSRN 3745195, 2021.

[21] A. Haque, M. Thakur, M. Bielskas, and A. V. A Marathe. Persistence of anti-vaccine sentiment in social networks through strategic interactions. In Thirty Fifth AAAI Conference on Artificial Intelligence (AAAI-21), 2021.

[22] D. Harris, T. Pensyl, A. Srinivasan, and K. Trinh. Dependent randomized rounding for clustering and partition systems with knapsack constraints. In International Conference on Artificial Intelligence and Statistics, pages 2273–2283. PMLR, 2020.

[23] P. J. Hotez, R. E. Cooney, R. M. Benjamin, N. T. Brewer, A. M. Buttenheim, T. Callaghan, A. Caplan, R. M. Carpiano, C. Clinton, R. DiResta, et al. Announcing the lancet commission on vaccine refusal, acceptance, and demand in the usa. The Lancet, 2021.

[24] K. Huang, T. Fu, W. Gao, Y. Zhao, Y. Roohani, J. Leskovec, C. W. Coley, C. Xiao, J. Sun, and M. Zitnik. Therapeutics data commons: Machine learning datasets and tasks for therapeutics. Nature Communications. Also as arXiv preprint arXiv:2102.09548, 2021.

[25] H. Kamarthi, L. Kong, A. Rodriguez, C. Zhang, and B. A. Prakash. When in doubt: Explainable and principled uncertainty quantification for epidemic forecasting. Submitted, 2021.

[26] X. Kang, S. Ranganathan, L. Kang, J. Gohlke, and X. Deng. Bayesian auxiliary variable model for birth records data with qualitative and quantitative responses. arXiv preprint arXiv:2008.06525, 2020.

[27] R. Laxminarayan, S. Jameel, and S. Sarkar. India’s battle against covid-19: Progress and challenges. The American Journal of Tropical Medicine and Hygiene, 103(4):1343, 2020.

[28] R. Laxminarayan, B. Wahl, S. R. Dudala, K. Gopal, S. Neelima, K. J. Reddy, J. Radhakrishnan, J. A. Lewnard, et al. Epidemiology and transmission dynamics of covid-19 in two indian states. Science, 370(6517):691–697, 2020.

[29] G. Lin, A. T. Strauss, M. Pinz, D. A. Martinez, K. K. Tseng, E. Schueller, O. Gatalo, Y. Yang, S. A. Levin, E. Y. Klein, et al. Explaining the “bomb-like” dynamics of covid-19 with modeling and the implications for policy. medRxiv, 2020.

[30] D. Machi, P. Bhattacharya, S. Hoops, J. Chen, H. Mortveit, S. Venkatramanan, B. Lewis, M. Wilson, A. Fadikar, T. Maiden, and C. L. Barrett. Scalable epidemiological workflows to support covid-19 planning and response. In Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021.

[31] M. Marathe, A. Vullikanti, D. Rosenkrantz, S. Ravi, R. Stearns, and S. Levin. Computational challenges and opportunities for forecasting epidemic dynamics using network models, 2020.

[32] N. Meghanathan. Exploring the step function distribution of the threshold fraction of adopted neighbors vs. minimum fraction of nodes as initial adopters to assess the cascade blocking intra-cluster density of complex real-world networks. Springer Applied Network Science (Published as part of Special issue on Epidemics Dynamics & Control on Networks), 5(97):1–33, December 2020.

[33] Z. Mehrab, A. G. Ranga, D. Sarkar, S. Venkatramanan, Y. C. Baek, S. Swarup, and M. V. Marathe. High resolution proximity statistics as early warning for us universities reopening during covid-19, 2020. medRxiv preprint 2020.11.21.20236042.

[34] M. Minutoli, P. Sambaturu, M. Halappanavar, A. Tumeo, A. Kalyanaraman, and A. Vullikanti. Preempt: Scalable epidemic interventions using submodular optimization on multi-gpu systems. In Proc. SC, SC ’20. IEEE Press, 2020.

[35] S. M. Moghadas, M. C. Fitzpatrick, P. Sah, A. Pandey, A. Shoukat, B. H. Singer, and A. P. Galvani. The implications of silent transmission for the control of covid-19 outbreaks. Proceedings of the National Academy of Sciences, 117(30):17513–17515, 2020.

[36] S. M. Moghadas, T. N. Vilches, K. Zhang, C. R. Wells, A. Shoukat, B. H. Singer, L. A. Meyers, K. M. Neuzil, J. M. Langley, M. C. Fitzpatrick, et al. The impact of vaccination on covid-19 outbreaks in the united states. medRxiv, 2020.

[37] A. Moitra and E. Mossel. Hot topics in computing: An invitation to computational epidemiology, March 2021.

[38] D. Morris, F. Rossine, J. Plotkin, and S. Levin. Optimal, near-optimal, and robust epidemic control. communication physics. OSF Preprints, 2020.

[39] Theoretically speaking — computational and statistical tools to control a pandemic: A panel discussion, simons institute, May 2020. https://simons.berkeley.edu/events/covid19.

[40] A. S. Peddireddy, D. Xie, P. Patil, M. L. Wilson, D. Machi, S. Venkatramanan, B. Klahn, P. Porebski, P. Bhattacharya, S. Dumbre, and M. Marathe. From 5vs to 6cs: Operationalizing epidemic data management with covid-19 surveillance. In Proceedings of the IEEE International Conference on Big Data (BigData), 2020.

[41] B. Peng, J. Li, S. Akkas, F. Wang, T. Araki, O. Yoshiyuki, and J. Qiu. Rank position forecasting in car racing. arXiv preprint arXiv:2010.01707, 2020.

[42] N. Perera, V. Abeykoon, C. Widanage, S. Kamburugamuve, T. A. Kanewala, P. Wickramasinghe, A. Uyar, H. Maithree, D. Lenadora, and G. Fox. A fast, scalable, universal approach for distributed data reductions. arXiv preprint arXiv:2010.14596, 2020.

[43] A. Pilehvari, W. You, J. Chen, S. Venkatramanan, J. Krulick, and A. Marathe. Differential impact of social distancing on covid-19 spread in the us: by rurality and social vulnerability, 2021. Submitted.

[44] J. D. Priest, M. V. Marathe, S. S. Ravi, D. J. Rosenkrantz, and R. E. Stearns. Evolution of similar configurations in graph dynamical systems. In R. M. Benito, C. Cherifi, H. Cherifi, E. Moro, L. M. Rocha, and M. Sales-Pardo, editors, Proc. 9th International Conference on Complex Networks and Applications (Complex Networks), pages 544–555, Cham, Switzerland, 2020. Springer.

[45] J. Quetzalcóatl Toledo-Marín, G. Fox, J. P. Sluka, and J. A. Glazier. Deep learning approaches to surrogates for solving the diffusion equation for mechanistic real-world simulations. arXiv e-prints, pages arXiv–2102, 2021.

[46] E. L. Ray, N. Wattanachit, J. Niemi, A. H. Kanji, K. House, E. Y. Cramer, J. Bracher, A. Zheng, T. K. Yamana, X. Xiong, et al. Ensemble forecasts of coronavirus disease 2019 (covid-19) in the us. MedRXiv, 2020.

[47] A. Rodríguez, B. Adhikari, A. D. González, C. Nicholson, A. Vullikanti, and B. A. Prakash. Mapping network states using connectivity queries. Proceedings of IEEE International Conference on Big Data 2020 (IEEE BigData 2020), 2020.

[48] A. Rodríguez, B. Adhikari, A. D. González, C. Nicholson, A. Vullikanti, and B. A. Prakash. Mapping network states using connectivity queries. NeurIPS 2020 Artificial Intelligence and Humanitarian and Disaster Relief (AI + HADR) Workshop, 2020.

[49] A. Rodríguez, B. Adhikari, N. Ramakrishnan, and B. A. Prakash. Incorporating expert guidance in epidemic forecasting. arXiv preprint arXiv:2101.10247, 2020.

[50] A. Rodriguez, N. Muralidhar, B. Adhikari, A. Tabassum, N. Ramakrishnan, and B. A. Prakash. Steering a historical disease forecasting model under a pandemic: Case of flu and covid-19. NeurIPS 2020 Machine Learning in Public Health (MLPH) Workshop, 2020.

[51] A. Rodriguez, N. Muralidhar, B. Adhikari, A. Tabassum, N. Ramakrishnan, and B. A. Prakash. Steering a historical disease forecasting model under a pandemic: Case of flu and covid-19. Proceedings of the AAAI Conference on Artificial Intelligence, 2021.

[52] A. Rodriguez, A. Tabassum, J. Cui, J. Xie, J. Ho, P. Agarwal, B. Adhikari, and B. A. Prakash. DeepCOVID: An operational deep learning-driven framework for explainable real-time COVID-19 forecasting. In Proc. AAAI Conference on Artificial Intelligence, 2021.

[53] D. J. Rosenkrantz, M. V. Marathe, S. S. Ravi, and R. E. Stearns. Synchronous dynamical systems on directed acyclic graphs: Complexity and algorithms. In Proc. AAAI-2021 (to appear), 2021. 9 pages.

[54] C. Ruiz, M. Zitnik, and J. Leskovec. Identification of disease treatment mechanisms through the multiscale interactome. bioRxiv, 2020.

[55] C. M. Saad-Roy, N. Arinaminpathy, N. S. Wingreen, S. A. Levin, J. M. Akey, and B. T. Grenfell. Implications of localized charge for human influenza a h1n1 hemagglutinin evolution: Insights from deep mutational scans. PLoS computational biology, 16(6):e1007892, 2020.

[56] C. M. Saad-Roy, B. T. Grenfell, S. A. Levin, L. Pellis, H. B. Stage, P. van den Driessche, and N. S. Wingreen. Superinfection and the evolution of an initial asymptomatic stage. Royal Society open science, 8(1):202212, 2021.

[57] C. M. Saad-Roy, S. A. Levin, C. J. E. Metcalf, and B. T. Grenfell. Trajectory of individual immunity and vaccination required for sars-cov-2 community immunity: a conceptual investigation. Journal of the Royal Society Interface, 18(175):20200683, 2021.

[58] C. M. Saad-Roy, S. E. Morris, C. J. E. Metcalf, M. J. Mina, R. E. Baker, J. Farrar, E. C. Holmes, O. Pybus, A. L. Graham, S. A. Levin, et al. Epidemiological and evolutionary considerations of sars-cov-2 vaccine dosing regimes. medRxiv, 2021. Science.

[59] C. M. Saad-Roy, C. E. Wagner, R. E. Baker, S. E. Morris, J. Farrar, A. L. Graham, S. A. Levin, M. J. Mina, C. J. E. Metcalf, and B. T. Grenfell. Immune life history, vaccination, and the dynamics of sars-cov-2 over the next 5 years. Science, 370(6518):811–818, 2020.

[60] Y. Serkez. The magic number for reducing infections and keeping businesses open, December 2020.

[61] A. Uyar, G. Gunduz, S. Kamburugamuve, P. Wickramasinghe, C. Widanage, K. Govindarajan, N. Perera, V. Abeykoon, S. Akkas, and G. Fox. Twister2 cross-platform resource scheduler for big data, 2021.

[62] J. Vekemans, M. Hasso-Agopsowicz, G. Kang, W. P. Hausdorff, A. Fiore, E. Tayler, E. J. Klemm, R. Laxminarayan, P. Srikantiah, M. Friede, et al. Leveraging vaccines to reduce antibiotic use and prevent antimicrobial resistance: a who action framework. Clinical Infectious Diseases, 2021.

[63] C. E. Wagner, J. A. Prentice, C. M. Saad-Roy, L. Yang, B. T. Grenfell, S. A. Levin, and R. Laxminarayan. Economic and behavioral influencers of vaccination and antimicrobial use. Frontiers in public health, 8:975, 2020.

[64] F. Wang, C. Widanage, W. Liu, J. Li, X. Wang, H. Tang, and J. Fox. Privacy-preserving genomic computing with sgx-based big-data analytics framework. 20th IEEE International Workshop on High Performance Computational Biology (HiCOMB) [under review], 2021.

[65] L. Wang, A. Adiga, J. Chen, A. Sadilek, S. Venkatramanan, and M. Marathe. CausalGNN: Causal-based graph neural networks for spatio-temporal epidemic forecasting, 2021. Submitted.

[66] L. Wang, A. Adiga, S. Venkatramanan, J. Chen, B. Lewis, and M. Marathe. Examining deep learning models with multiple data sources for covid-19 forecasting, 2020. arXiv preprint arXiv:2010.14491.

[67] L. Wang, X. Ben, A. Adiga, A. Sadilek, A. Tendulkar, S. Venkatramanan, A. Vullikanti, G. Aggarwal, A. Talekar, J. Chen, B. Lewis, S. Swarup, A. Kapoor, M. Tambe, and M. Marathe. Using mobility data to understand and forecast covid-19 dynamics. In Proceedings of the 29th International Joint Conference on Artificial Intelligence Workshop on AI for Social Good., 2021.

[68] L. Wang, D. Ghosh, M. T. G. Diaz, A. K. Farahat, M. Alam, C. Gupta, J. Chen, and M. Marathe. Wisdom of the ensemble: Improving consistency of deep learning models. In NeurIPS, 2020.

[69] S. Weekes, P. March, M. Berger, M. Marathe, J. Crawley, C. M. Lois, A. El Bakry, R. Renaut, A. Gelb, F. Santosa, T. Grandine, P. Seshaiyer, and J. Hestheven. Report on future research directions for the national science foundation in the era of covid-19. A Report by SIAM Task Force on COVID-19, 2020. https://www.siam.org/Portals/0/reports/Report%20on%20Future%20Research%20Directions%20for%20NSF.pdf?ver=2020-12-10-144744-750.

[70] Y. Wei and X. Deng. A simple gaussian process modeling approach for experiments with quantitative-sequence factors. Submitted, 2021.

[71] C. Widanage, N. Perera, V. Abeykoon, S. Kamburugamuve, T. A. Kanewala, H. Maithree, P. Wickramasinghe, A. Uyar, G. Gunduz, and G. Fox. High performance data engineering everywhere. In 2020 IEEE International Conference on Smart Data Services (SMDS), pages 122–132. IEEE, 2020.

[72] Q. Xiao, Y. Wang, A. Mandal, and X. Deng. Modeling and active learning for experiments with quantitative-sequence factors, revision for journal of the american statistical association. Submitted, 2021.

[73] O. Yagan, A. Sridhar, R. Eletreby, S. A. Levin, J. B. Plotkin, and H. V. Poor. Modeling and analysis of the spread of covid-19 under a multiple-strain model with mutations. Harvard Data Science Review, 2021.

[74] A. Yalaman, G. Basbug, C. Elgin, and A. P. Galvani. Cross-country evidence on the association between contact tracing and covid-19 case fatality rates. Scientific reports, 11(1):1–6, 2021.

### AI, Theory-Guided Machine Learning, and Massive Data Science: Our team members have worked on a diverse class of problems in this area, both in terms of the foundations as well as applications to problems in epidemiology and related areas.

As AI increasingly supports human decision making, there is more emphasis than ever on building trustworthy AI systems. Despite discrepancies in terminology, almost all recent research on trustworthy AI agrees on the need for AI models that produce consistently correct outputs for the same input. Although this seems like a straightforward requirement, when AI models are periodically retrained in the field there is no guarantee that different generations of a model will be consistently correct when presented with the same input. Consider the damage that can be caused by an AI agent for COVID-19 diagnosis that correctly recommends that a true patient self-isolate, then changes its recommendation after being retrained with more data. Producing consistent outputs is crucial for deep learning models in all domains, including epidemic forecasting. In [68], we introduce a new concept for learning models: consistency. We define consistency as the ability of a model to reproduce an output for the same input after retraining, irrespective of whether the output is correct, and correct-consistency as the ability of the model to reproduce a correct output for the same input. We provide a theoretical explanation of why and how ensemble learning can improve consistency and correct-consistency. We also prove that adding components with accuracy higher than the average accuracy of the ensemble's component learners can yield better consistency for correct predictions. Based on these theoretical findings, we propose a dynamic snapshot ensemble algorithm with pruning to boost predictive correct-consistency and accuracy efficiently. We evaluated the proposed theorems and method on multiple text and image classification datasets. The proposed method improved on state-of-the-art performance. The results demonstrate the effectiveness and efficiency of the proposed ensemble method and empirically support the theorems.
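One simple way to turn these two definitions into measurable quantities is to compare the predictions of two generations of a model on the same inputs. The sketch below is an assumed operationalization for illustration, not necessarily the exact metric used in [68]: consistency is taken as the fraction of inputs on which both generations agree, and correct-consistency as the fraction on which both agree and are correct. The function names are hypothetical.

```python
def consistency(preds_old, preds_new):
    """Fraction of inputs on which two model generations give the same output."""
    if len(preds_old) != len(preds_new) or not preds_old:
        raise ValueError("prediction lists must be non-empty and equally long")
    return sum(a == b for a, b in zip(preds_old, preds_new)) / len(preds_old)


def correct_consistency(preds_old, preds_new, labels):
    """Fraction of inputs on which both generations output the true label."""
    if not (len(preds_old) == len(preds_new) == len(labels)) or not labels:
        raise ValueError("inputs must be non-empty and equally long")
    agree_and_correct = sum(
        a == b == y for a, b, y in zip(preds_old, preds_new, labels)
    )
    return agree_and_correct / len(labels)


# Four inputs, true labels, and predictions before and after retraining.
labels = [1, 0, 1, 1]
before = [1, 0, 0, 1]
after = [1, 1, 0, 1]
c = consistency(before, after)
cc = correct_consistency(before, after, labels)
```

Here the two generations agree on three of the four inputs, but on one of those three they agree on a wrong answer, so consistency is 0.75 while correct-consistency is 0.5.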

In [2, 42, 61, 71], we study technologies to support deep learning, with emphasis on the efficient linking of C++, Java, and Python modules. In [64], we study a privacy-preserving, SGX-based Big Data framework for human genome analytics. In terms of applications, [45] presents deep learning approaches to surrogates for COVID computational biology simulations, addressing the large computational bottleneck of solving the diffusion equation. Our study of deep learning for time series, which started with work on COVID-19 data [17, 18], has extended to other areas (hydrology and earthquakes) with the identification of spatial time series [2], which are very common and important. This research examined recurrent and transformer neural network architectures.

The COVID-19 pandemic dramatically changed human mobility patterns, necessitating epidemiological models that capture the effects of changes in mobility on virus spread. We developed a novel metapopulation SEIR model, published in Nature [9], that integrates fine-grained, dynamic mobility networks to simulate the spread of SARS-CoV-2 in 10 of the largest US metropolitan statistical areas. Derived from cell phone data, our mobility networks map the hourly movements of 98 million people from neighborhoods (census block groups, or CBGs) to points of interest (POIs) such as restaurants and religious establishments, connecting 57k CBGs to 553k POIs with 5.4 billion hourly edges. We showed that by integrating these networks our model can accurately fit the real case trajectory, despite substantial changes in population behavior over time. Our model indicates that a small minority of “superspreader” POIs account for a large majority of infections and that restricting maximum occupancy at each POI is more effective than uniformly reducing mobility. Our model also correctly predicts higher infection rates among disadvantaged racial and socioeconomic groups solely from differences in mobility: we found that disadvantaged groups have not been able to reduce mobility as sharply, and that the POIs they visit are more crowded and therefore higher-risk. By capturing who is infected at which locations, our model supports detailed analyses that can inform more effective and equitable policy responses to COVID-19.

Building on and further refining the model above, we collaborated with the Virginia Department of Health on a decision-support tool that uses large-scale data and epidemiological modeling to quantify the impact of changes in mobility on infection rates [4]. Policy makers must weigh the reduction in infections against the cost of mobility restrictions, and to balance these competing demands they need analytical tools to assess the costs and benefits of different mobility reduction measures. Our model captures the spread of COVID-19 by using a fine-grained, dynamic mobility network that encodes the hourly movements of people from neighborhoods to individual places, with over 3 billion hourly edges. By perturbing the mobility network, we can simulate a wide variety of reopening plans and forecast their impact in terms of new infections and the loss in visits per sector. To deploy this model in practice, we built a robust computational infrastructure to support running millions of model realizations, and we worked with policy makers to develop an intuitive dashboard interface that communicates our model’s predictions for thousands of potential policies, tailored to their jurisdiction. The resulting decision-support environment provides policy makers with much-needed analytical machinery to assess the trade-offs between future infections and mobility restrictions.
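The structure of a metapopulation SEIR model driven by a CBG-to-POI visit matrix can be sketched as follows. This is a deliberately simplified, deterministic toy and not the model of [9]: the transmission rule (exposure proportional to the prevalence among a POI's visitors), the function name `seir_step`, and the parameters `beta`, `sigma`, and `gamma` are all assumptions made for illustration.

```python
import numpy as np


def seir_step(S, E, I, R, visits, beta, sigma, gamma):
    """Advance a toy CBG/POI metapopulation SEIR model by one time step.

    S, E, I, R : arrays of shape (n_cbg,), compartment sizes per neighborhood.
    visits     : array of shape (n_cbg, n_poi), visitors from each CBG to each POI.
    beta       : assumed transmission scaling per visit.
    sigma      : fraction of exposed becoming infectious per step.
    gamma      : fraction of infectious recovering per step.
    """
    S, E, I, R = (np.asarray(x, dtype=float) for x in (S, E, I, R))
    visits = np.asarray(visits, dtype=float)
    N = S + E + I + R

    # Expected number of infectious visitors and total visitors at each POI.
    infectious_at_poi = visits.T @ (I / N)
    total_at_poi = visits.sum(axis=0)
    prevalence = np.divide(
        infectious_at_poi,
        total_at_poi,
        out=np.zeros_like(infectious_at_poi),
        where=total_at_poi > 0,
    )

    # Susceptible visitors from each CBG are exposed in proportion to POI prevalence.
    new_exposed = np.minimum(S, beta * (S / N) * (visits @ prevalence))
    new_infectious = sigma * E
    new_recovered = gamma * I

    return (
        S - new_exposed,
        E + new_exposed - new_infectious,
        I + new_infectious - new_recovered,
        R + new_recovered,
    )


# Two neighborhoods sharing one POI; only the first starts with infections.
visits = np.array([[10.0, 0.0], [10.0, 5.0]])
state = seir_step([90, 100], [0, 0], [10, 0], [0, 0], visits, beta=0.5, sigma=0.2, gamma=0.1)
print([x.round(3) for x in state])
```

In a sketch like this, a reopening scenario corresponds to editing `visits` before the step, for example by scaling a column down to mimic an occupancy cap at one POI.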

The COVID-19 pandemic demonstrated the need for accelerated drug discovery pipelines. We have been developing novel machine learning-based approaches for this problem. Our work covers: (i) identification of disease treatment mechanisms, (ii) a machine learning platform for evaluating therapeutics, and (iii) improving drug safety.

Most diseases disrupt multiple proteins, and drugs treat such diseases by restoring the functions of the disrupted proteins. How drugs restore these functions, however, is often unknown, as a drug’s therapeutic effects are not limited to the proteins that the drug directly targets. We developed the multiscale interactome, a powerful approach to explain disease treatment [54]. We integrated disease-perturbed proteins, drug targets, and biological functions into a multiscale interactome network, which contains 478,728 interactions between 1,661 drugs, 840 diseases, 17,660 human proteins, and 9,798 biological functions. We found that a drug’s effectiveness can often be attributed to targeting proteins that are distinct from disease-associated proteins but that affect the same biological functions. We developed a random walk-based method that captures how drug effects propagate through a hierarchy of biological functions and are coordinated by the protein-protein interaction network in which drugs act. On three key pharmacological tasks, we found that the multiscale interactome predicts what drugs will treat a given disease more effectively than prior approaches, identifies proteins and biological functions related to treatment, and predicts genes that interfere with treatment to alter drug efficacy and cause serious adverse reactions. Our results indicate that physical interactions between proteins alone are unable to explain the therapeutic effects of drugs, as many drugs treat diseases by affecting the same biological functions disrupted by the disease rather than directly targeting disease proteins or their regulators. Our general framework allows for identification of proteins and biological functions relevant in treatment, even when drugs seem unrelated to the diseases they are recommended for.

Machine learning for therapeutics is an emerging field with incredible opportunities for innovation and expansion. We introduced Therapeutics Data Commons (TDC), the first unifying framework to systematically access and evaluate machine learning across the entire range of therapeutics [24]. At its core, TDC is a collection of curated datasets and learning tasks that can translate algorithmic innovation into biomedical and clinical implementation. To date, TDC includes 66 machine learning-ready datasets from 22 learning tasks, spanning the discovery and development of safe and effective medicines. TDC also provides an ecosystem of tools, libraries, leaderboards, and community resources, including data functions, strategies for systematic model evaluation, meaningful data splits, data processors, and molecule generation oracles. All datasets and learning tasks are integrated and accessible via an open-source library. We envision that TDC can facilitate algorithmic and scientific advances and accelerate development, validation, and transition into production and clinical implementation. TDC is a continuous, open-source initiative, and we invite contributions from the research community. TDC is publicly available at https://tdcommons.ai.

Improving drug safety is critical to mitigating the health burden of infectious diseases. Approximately 1 in 3 novel therapeutics approved by the FDA is associated with a severe post-market safety event, where new safety risks are identified after the initial regulatory approval. Understanding the root causes of these post-market safety events can inform clinical trial design and regulatory decision-making, which will ultimately improve drug safety. Using a dataset of 10,443,476 reports across 3,624 drugs and 19,193 adverse events, we have started to explore the relationship between post-market safety events and the diversity of the subject population in pre-market clinical trials across race, gender, age, and geography. Our initial findings in this ongoing study indicate that the racial representativeness of pre-market clinical trials is strongly associated with post-market safety events and that there is a negative association between international clinical trial sites and safety.
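The random walk-based propagation used for the multiscale interactome can be illustrated with a generic random walk with restart on a small graph. This is a sketch of the general technique only, not the method of [54]: the unweighted adjacency matrix, the restart probability, and the function name `random_walk_with_restart` are assumptions made for illustration.

```python
import numpy as np


def random_walk_with_restart(adj, seeds, restart=0.15, tol=1e-10, max_iter=1000):
    """Stationary visit probabilities of a walker that restarts at the seed nodes.

    adj     : (n, n) symmetric adjacency matrix; every node needs at least one edge.
    seeds   : list of node indices the walker restarts from (e.g. a drug's targets).
    restart : probability of jumping back to a seed at each step.
    """
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    if np.any(col_sums == 0):
        raise ValueError("every node must have at least one edge")
    transition = adj / col_sums  # column-stochastic transition matrix

    restart_vec = np.zeros(adj.shape[0])
    restart_vec[seeds] = 1.0 / len(seeds)

    p = restart_vec.copy()
    for _ in range(max_iter):
        nxt = (1.0 - restart) * (transition @ p) + restart * restart_vec
        if np.abs(nxt - p).sum() < tol:
            return nxt
        p = nxt
    return p


# Path graph 0 - 1 - 2 - 3 with the walker restarting at node 0.
path = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
)
profile = random_walk_with_restart(path, seeds=[0])
print(profile.round(4))
```

The resulting vector is a diffusion profile for the seed set. Comparing the profile seeded at a drug's targets with the profile seeded at a disease's proteins is one way such profiles can be used to score drug-disease pairs.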

Can we infer all the failed components of an infrastructure network, given a sample of the nodes reachable from supply nodes? One of the most critical post-disruption processes after a natural disaster is to quickly determine the damage or failure states of critical infrastructure components. However, this is non-trivial, since often only a fraction of components may be accessible or observable after a disruptive event. Past work has looked into inferring failed components from point probes, i.e., a direct sample of failed components. In contrast, we study the harder problem of inferring failed components given partial information about some ‘serviceable’ reachable nodes together with a small sample of point probes, the former often being more practical to obtain. We formulate this novel problem using the Minimum Description Length (MDL) principle, and then present a greedy algorithm that minimizes the MDL cost effectively. We evaluate our algorithm on domain-expert simulations of real networks in the aftermath of an earthquake. Our algorithm successfully identifies failed components, especially the critical ones affecting overall system performance. This work has been published in [47] (full version); a preliminary version appeared as [48].
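The observation model behind this problem can be sketched with a plain breadth-first search. The code below is not the MDL-based algorithm of [47]. It only shows, under our own simplifying assumptions (undirected network, failures on nodes), how a hypothesized set of failed components determines which nodes are reachable from supply nodes, and how that can be checked against observed serviceable nodes. The function names are hypothetical.

```python
from collections import deque


def reachable_from_supply(edges, supply, failed):
    """Nodes reachable from any working supply node without crossing failed nodes."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    failed = set(failed)
    seen = {s for s in supply if s not in failed}
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in failed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def is_consistent(edges, supply, failed, observed_serviceable):
    """True if every node observed as serviceable is reachable under `failed`."""
    return set(observed_serviceable) <= reachable_from_supply(edges, supply, failed)


# Toy network: supply node "s" feeds "b" through either "a" or "c".
edges = [("s", "a"), ("a", "b"), ("s", "c"), ("c", "b")]
print(is_consistent(edges, ["s"], failed={"a"}, observed_serviceable={"b"}))
print(is_consistent(edges, ["s"], failed={"a", "c"}, observed_serviceable={"b"}))
```

A hypothesis that fails this check can be ruled out from reachability information alone, which is why a sample of serviceable nodes carries information about failures even without direct probes.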

[1] T. Abate. Crowdsourcing site collects county-level policy data to inform decisions about easing social-distancing, April 2020. https://news.stanford.edu/2020/04/13/stanford-crowdsources-county-level-covid-19-policy-data/.

[2] V. Abeykoon, N. Perera, C. Widanage, S. Kamburugamuve, T. A. Kanewala, H. Maithree, P. Wickramasinghe, A. Uyar, and G. Fox. Data engineering for hpc with python. In 2020 IEEE/ACM 9th Workshop on Python for High-Performance and Scientific Computing (PyHPC), pages 13–21. IEEE, 2020.

[3] A. Adiga, J. Chen, M. Marathe, H. Mortveit, S. Venkatramanan, and A. Vullikanti. Data-driven modeling for different stages of pandemic response. Journal of the Indian Institute of Science, pages 1–15, 2020.

[4] A. Adiga, D. Dubhashi, B. Lewis, M. Marathe, S. Venkatramanan, and A. Vullikanti. Mathematical models for covid-19 pandemic: a comparative analysis. Journal of the Indian Institute of Science, pages 1–15, 2020.

[5] A. Adiga, L. Wang, B. Hurt, A. Peddireddy, P. Porebski, S. Venkatramanan, B. Lewis, and M. Marathe. All models are useful: Bayesian ensembling for robust high resolution covid-19 forecasting, 2021. Submitted.

[6] E. Bradley, M. Marathe, M. Moses, W. D. Gropp, and D. Lopresti. Pandemic informatics: Preparation, robustness, and resilience. arXiv preprint arXiv:2012.09300, published as part of CCC Quadrennial papers, 2020. https://cra.org/ccc/resources/ccc-led-whitepapers/#2020-quadrennial-papers.

[7] B. Brubach, D. Chakrabarti, J. P. Dickerson, A. Srinivasan, and L. Tsepenekas. Fairness, semi-supervised learning, and more: A general framework for clustering with stochastic pairwise constraints. arXiv preprint arXiv:2103.02013, 2021.

[8] H. L. Carscadden, C. J. Kuhlman, M. V. Marathe, S. S. Ravi, and D. J. Rosenkrantz. Blocking the propagation of two simultaneous contagions over networks. In R. M. Benito, C. Cherifi, H. Cherifi, E. Moro, L. M. Rocha, and M. Sales-Pardo, editors, Proc. 9th International Conference on Complex Networks and Applications (Complex Networks), pages 455–468, Cham, Switzerland, 2020. Springer.

[9] S. Chang, E. Pierson, P. W. Koh, J. Gerardin, B. Redbird, D. Grusky, and J. Leskovec. Mobility network models of covid-19 explain inequities and inform reopening. Nature, 589(7840):82–87, 2020.

[10] J. Chen, S. Levin, S. Eubank, H. Mortveit, S. Venkatramanan, A. Vullikanti, and M. Marathe. Networked epidemiology for covid-19. Siam news, 2020.

[11] J. Chen, A. Vullikanti, S. Hoops, H. Mortveit, B. Lewis, S. Venkatramanan, W. You, S. Eubank, M. Marathe, C. Barrett, and M. A. Medical costs of keeping the us economy open during covid-19. Scientific reports, Oct 2020.

[12] J. Chen, A. Vullikanti, J. Santos, S. Venkatramanan, S. Hoops, H. Mortveit, B. Lewis, W. You, S. Eubank, M. Marathe, C. Barrett, and A. Marathe. Epidemiological and economic impact of covid-19 in the us. medRxiv : the preprint server for health sciences, November 2020.

[13] D. Chistikov, G. Lisowski, M. Paterson, and P. Turrini. Convergence of opinion diffusion is pspace-complete. In Proc. AAAI, pages 7103–7110. AAAI Press, 2020.

[14] E. Y. Cramer, V. K. Lopez, J. Niemi, G. E. George, J. C. Cegan, I. D. Dettwiller, W. P. England, M. W. Farthing, R. H. Hunter, B. Lafferty, et al. Evaluation of individual and ensemble probabilistic forecasts of covid-19 mortality in the us. medRxiv, 2021.

[15] S. Eubank, I. Eckstrand, B. Lewis, S. Venkatramanan, M. Marathe, and C. Barrett. Commentary on ferguson, et al.,“impact of non-pharmaceutical interventions (npis) to reduce covid-19 mortality and healthcare demand”. Bulletin of mathematical biology, 82(4):1–7, 2020.

[16] M. C. Fitzpatrick and A. P. Galvani. Optimizing age-specific vaccination. Science, 371(6532):890–891, 2021.

[17] G. Fox. Deep learning based time evolution, 2020. http://dsc.soic.indiana.edu/publications/Summary-DeepLearningBasedTimeEvolution.pdf.

[18] G. Fox. Deep learning for spatial time series, 2020. http://dsc.soic.indiana.edu/publications/Deep%20Learning%20for%20Spatial%20Time%20Series.pdf.

[19] G. C. Fox, G. von Laszewski, F. Wang, and S. Pyne. Aicov: An integrative deep learning framework for covid-19 forecasting with population covariates. arXiv preprint arXiv:2010.03757, 2020.

[20] F. Haghpanah, G. Lin, S. Levin, and E. Y. Klein. Analysis of the potential efficacy and timing of covid-19 vaccine on morbidity and mortality. Available at SSRN 3745195, 2021.

[21] A. Haque, M. Thakur, M. Bielskas, and A. V. A Marathe. Persistence of anti-vaccine sentiment in social networks through strategic interactions. In Thirty Fifth AAAI Conference on Artificial Intelligence (AAAI-21), 2021.

[22] D. Harris, T. Pensyl, A. Srinivasan, and K. Trinh. Dependent randomized rounding for clustering and partition systems with knapsack constraints. In International Conference on Artificial Intelligence and Statistics, pages 2273–2283. PMLR, 2020.

[23] P. J. Hotez, R. E. Cooney, R. M. Benjamin, N. T. Brewer, A. M. Buttenheim, T. Callaghan, A. Caplan, R. M. Carpiano, C. Clinton, R. DiResta, et al. Announcing the lancet commission on vaccine refusal, acceptance, and demand in the usa. The Lancet, 2021.

[24] K. Huang, T. Fu, W. Gao, Y. Zhao, Y. Roohani, J. Leskovec, C. W. Coley, C. Xiao, J. Sun, and M. Zitnik. Therapeutics data commons: Machine learning datasets and tasks for therapeutics. Nature Communications. Also as arXiv preprint arXiv:2102.09548, 2021.

[25] H. Kamarthi, L. Kong, A. Rodriguez, C. Zhang, and B. A. Prakash. When in doubt: Explainable and principled uncertainty quantification for epidemic forecasting. Submitted, 2021.

[26] X. Kang, S. Ranganathan, L. Kang, J. Gohlke, and X. Deng. Bayesian auxiliary variable model for birth records data with qualitative and quantitative responses. arXiv preprint arXiv:2008.06525, 2020.

[27] R. Laxminarayan, S. Jameel, and S. Sarkar. India’s battle against covid-19: Progress and challenges. The American Journal of Tropical Medicine and Hygiene, 103(4):1343, 2020.

[28] R. Laxminarayan, B. Wahl, S. R. Dudala, K. Gopal, S. Neelima, K. J. Reddy, J. Radhakrishnan, J. A. Lewnard, et al. Epidemiology and transmission dynamics of covid-19 in two indian states. Science, 370(6517):691–697, 2020.

[29] G. Lin, A. T. Strauss, M. Pinz, D. A. Martinez, K. K. Tseng, E. Schueller, O. Gatalo, Y. Yang, S. A. Levin, E. Y. Klein, et al. Explaining the “bomb-like” dynamics of covid-19 with modeling and the implications for policy. medRxiv, 2020.

[30] D. Machi, P. Bhattacharya, S. Hoops, J. Chen, H. Mortveit, S. Venkatramanan, B. Lewis, M. Wilson, A. Fadikar, T. Maiden, and C. L. Barrett. Scalable epidemiological workflows to support covid-19 planning and response. In Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021.

[31] M. Marathe, A. Vullikanti, D. Rosenkrantz, S. Ravi, R. Stearns, and S. Levin. Computational challenges and opportunities for forecasting epidemic dynamics using network models, 2020.

[32] N. Meghanathan. Exploring the step function distribution of the threshold fraction of adopted neighbors vs. minimum fraction of nodes as initial adopters to assess the cascade blocking intra-cluster density of complex real-world networks. Springer Applied Network Science (Published as part of Special issue on Epidemics Dynamics & Control on Networks), 5(97):1–33, December 2020.

[33] Z. Mehrab, A. G. Ranga, D. Sarkar, S. Venkatramanan, Y. C. Baek, S. Swarup, and M. V. Marathe. High resolution proximity statistics as early warning for us universities reopening during covid-19, 2020. medRxiv preprint 2020.11.21.20236042.

[34] M. Minutoli, P. Sambaturu, M. Halappanavar, A. Tumeo, A. Kalyanaraman, and A. Vullikanti. Preempt: Scalable epidemic interventions using submodular optimization on multi-gpu systems. In Proc. SC, SC ’20. IEEE Press, 2020.

[35] S. M. Moghadas, M. C. Fitzpatrick, P. Sah, A. Pandey, A. Shoukat, B. H. Singer, and A. P. Galvani. The implications of silent transmission for the control of covid-19 outbreaks. Proceedings of the National Academy of Sciences, 117(30):17513–17515, 2020.

[36] S. M. Moghadas, T. N. Vilches, K. Zhang, C. R. Wells, A. Shoukat, B. H. Singer, L. A. Meyers, K. M. Neuzil, J. M. Langley, M. C. Fitzpatrick, et al. The impact of vaccination on covid-19 outbreaks in the united states. medRxiv, 2020.

[37] A. Moitra and E. Mossel. Hot topics in computing: An invitation to computational epidemiology, March 2021.

[38] D. Morris, F. Rossine, J. Plotkin, and S. Levin. Optimal, near-optimal, and robust epidemic control. communication physics. OSF Preprints, 2020.

[39] Theoretically speaking — computational and statistical tools to control a pandemic: A panel discussion, simons institute, May 2020. https://simons.berkeley.edu/events/covid19.

[40] A. S. Peddireddy, D. Xie, P. Patil, M. L. Wilson, D. Machi, S. Venkatramanan, B. Klahn, P. Porebski, P. Bhattacharya, S. Dumbre, and M. Marathe. From 5vs to 6cs: Operationalizing epidemic data management with covid-19 surveillance. In Proceedings of the IEEE International Conference on Big Data (BigData), 2020.

[41] B. Peng, J. Li, S. Akkas, F. Wang, T. Araki, O. Yoshiyuki, and J. Qiu. Rank position forecasting in car racing. arXiv preprint arXiv:2010.01707, 2020.

[42] N. Perera, V. Abeykoon, C. Widanage, S. Kamburugamuve, T. A. Kanewala, P. Wickramasinghe, A. Uyar, H. Maithree, D. Lenadora, and G. Fox. A fast, scalable, universal approach for distributed data reductions. arXiv preprint arXiv:2010.14596, 2020.

[43] A. Pilehvari, W. You, J. Chen, S. Venkatramanan, J. Krulick, and A. Marathe. Differential impact of social distancing on covid-19 spread in the us: by rurality and social vulnerability, 2021. Submitted.

[44] J. D. Priest, M. V. Marathe, S. S. Ravi, D. J. Rosenkrantz, and R. E. Stearns. Evolution of similar configurations in graph dynamical systems. In R. M. Benito, C. Cherifi, H. Cherifi, E. Moro, L. M. Rocha, and M. Sales-Pardo, editors, Proc. 9th International Conference on Complex Networks and Applications (Complex Networks), pages 544–555, Cham, Switzerland, 2020. Springer.

[45] J. Quetzalcóatl Toledo-Marín, G. Fox, J. P. Sluka, and J. A. Glazier. Deep learning approaches to surrogates for solving the diffusion equation for mechanistic real-world simulations. arXiv e-prints, pages arXiv–2102, 2021.

[46] E. L. Ray, N. Wattanachit, J. Niemi, A. H. Kanji, K. House, E. Y. Cramer, J. Bracher, A. Zheng, T. K. Yamana, X. Xiong, et al. Ensemble forecasts of coronavirus disease 2019 (covid-19) in the us. MedRXiv, 2020.

[47] A. Rodríguez, B. Adhikari, A. D. González, C. Nicholson, A. Vullikanti, and B. A. Prakash. Mapping network states using connectivity queries. Proceedings of IEEE International Conference on Big Data 2020 (IEEE BigData 2020), 2020.

[48] A. Rodríguez, B. Adhikari, A. D. González, C. Nicholson, A. Vullikanti, and B. A. Prakash. Mapping network states using connectivity queries. NeurIPS 2020 Artificial Intelligence and Humanitarian and Disaster Relief (AI + HADR) Workshop, 2020.

[49] A. Rodríguez, B. Adhikari, N. Ramakrishnan, and B. A. Prakash. Incorporating expert guidance in epidemic forecasting. arXiv preprint arXiv:2101.10247, 2020.

[50] A. Rodriguez, N. Muralidhar, B. Adhikari, A. Tabassum, N. Ramakrishnan, and B. A. Prakash. Steering a historical disease forecasting model under a pandemic: Case of flu and covid-19. NeurIPS 2020 Machine Learning in Public Health (MLPH) Workshop, 2020.

[51] A. Rodriguez, N. Muralidhar, B. Adhikari, A. Tabassum, N. Ramakrishnan, and B. A. Prakash. Steering a historical disease forecasting model under a pandemic: Case of flu and covid-19. Proceedings of the AAAI Conference on Artificial Intelligence, 2021.

[52] A. Rodriguez, A. Tabassum, J. Cui, J. Xie, J. Ho, P. Agarwal, B. Adhikari, and B. A. Prakash. DeepCOVID: An operational deep learning-driven framework for explainable real-time COVID-19 forecasting. In Proc. AAAI Conference on Artificial Intelligence, 2021.

[53] D. J. Rosenkrantz, M. V. Marathe, S. S. Ravi, and R. E. Stearns. Synchronous dynamical systems on directed acyclic graphs: Complexity and algorithms. In Proc. AAAI-2021 (to appear), 2021. 9 pages.

[54] C. Ruiz, M. Zitnik, and J. Leskovec. Identification of disease treatment mechanisms through the multiscale interactome. bioRxiv, 2020.

[55] C. M. Saad-Roy, N. Arinaminpathy, N. S. Wingreen, S. A. Levin, J. M. Akey, and B. T. Grenfell. Implications of localized charge for human influenza a h1n1 hemagglutinin evolution: Insights from deep mutational scans. PLoS computational biology, 16(6):e1007892, 2020.

[56] C. M. Saad-Roy, B. T. Grenfell, S. A. Levin, L. Pellis, H. B. Stage, P. van den Driessche, and N. S. Wingreen. Superinfection and the evolution of an initial asymptomatic stage. Royal Society open science, 8(1):202212, 2021.

[57] C. M. Saad-Roy, S. A. Levin, C. J. E. Metcalf, and B. T. Grenfell. Trajectory of individual immunity and vaccination required for sars-cov-2 community immunity: a conceptual investigation. Journal of the Royal Society Interface, 18(175):20200683, 2021.

[58] C. M. Saad-Roy, S. E. Morris, C. J. E. Metcalf, M. J. Mina, R. E. Baker, J. Farrar, E. C. Holmes, O. Pybus, A. L. Graham, S. A. Levin, et al. Epidemiological and evolutionary considerations of sars-cov-2 vaccine dosing regimes. medRxiv, 2021. Science.

[59] C. M. Saad-Roy, C. E. Wagner, R. E. Baker, S. E. Morris, J. Farrar, A. L. Graham, S. A. Levin, M. J. Mina, C. J. E. Metcalf, and B. T. Grenfell. Immune life history, vaccination, and the dynamics of sars-cov-2 over the next 5 years. Science, 370(6518):811–818, 2020.

[60] Y. Serkez. The magic number for reducing infections and keeping businesses open, December 2020.

[61] A. Uyar, G. Gunduz, S. Kamburugamuve, P. Wickramasinghe, C. Widanage, K. Govindarajan, N. Perera, V. Abeykoon, S. Akkas, and G. Fox. Twister2 cross-platform resource scheduler for big data, 2021.

[62] J. Vekemans, M. Hasso-Agopsowicz, G. Kang, W. P. Hausdorff, A. Fiore, E. Tayler, E. J. Klemm, R. Laxminarayan, P. Srikantiah, M. Friede, et al. Leveraging vaccines to reduce antibiotic use and prevent antimicrobial resistance: a who action framework. Clinical Infectious Diseases, 2021.

[63] C. E. Wagner, J. A. Prentice, C. M. Saad-Roy, L. Yang, B. T. Grenfell, S. A. Levin, and R. Laxminarayan. Economic and behavioral influencers of vaccination and antimicrobial use. Frontiers in public health, 8:975, 2020.

[64] F. Wang, C. Widanage, W. Liu, J. Li, X. Wang, H. Tang, and J. Fox. Privacy-preserving genomic computing with sgx-based big-data analytics framework. 20th IEEE International Workshop on High Performance Computational Biology (HiCOMB) [under review], 2021.

[65] L. Wang, A. Adiga, J. Chen, A. Sadilek, S. Venkatramanan, and M. Marathe. CausalGNN: Causal-based graph neural networks for spatio-temporal epidemic forecasting, 2021. Submitted.

[66] L. Wang, A. Adiga, S. Venkatramanan, J. Chen, B. Lewis, and M. Marathe. Examining deep learning models with multiple data sources for covid-19 forecasting, 2020. arXiv preprint arXiv:2010.14491.

[67] L. Wang, X. Ben, A. Adiga, A. Sadilek, A. Tendulkar, S. Venkatramanan, A. Vullikanti, G. Aggarwal, A. Talekar, J. Chen, B. Lewis, S. Swarup, A. Kapoor, M. Tambe, and M. Marathe. Using mobility data to understand and forecast covid-19 dynamics. In Proceedings of the 29th International Joint Conference on Artificial Intelligence Workshop on AI for Social Good., 2021.

[68] L. Wang, D. Ghosh, M. T. G. Diaz, A. K. Farahat, M. Alam, C. Gupta, J. Chen, and M. Marathe. Wisdom of the ensemble: Improving consistency of deep learning models. In NeurIPS, 2020.

[69] S. Weekes, P. March, M. Berger, M. Marathe, J. Crawley, C. M. Lois, A. El Bakry, R. Renaut, A. Gelb, F. Santosa, T. Grandine, P. Seshaiyer, and J. Hestheven. Report on future research directions for the national science foundation in the era of covid-19. A Report by SIAM Task Force on COVID-19, 2020. https://www.siam.org/Portals/0/reports/Report%20on%20Future%20Research%20Directions%20for%20NSF.pdf?ver=2020-12-10-144744-750.

[70] Y. Wei and X. Deng. A simple gaussian process modeling approach for experiments with quantitative-sequence factors. Submitted, 2021.

[71] C. Widanage, N. Perera, V. Abeykoon, S. Kamburugamuve, T. A. Kanewala, H. Maithree, P. Wickramasinghe, A. Uyar, G. Gunduz, and G. Fox. High performance data engineering everywhere. In 2020 IEEE International Conference on Smart Data Services (SMDS), pages 122–132. IEEE, 2020.

[72] Q. Xiao, Y. Wang, A. Mandal, and X. Deng. Modeling and active learning for experiments with quantitative-sequence factors, revision for journal of the american statistical association. Submitted, 2021.

[73] O. Yagan, A. Sridhar, R. Eletreby, S. A. Levin, J. B. Plotkin, and H. V. Poor. Modeling and analysis of the spread of covid-19 under a multiple-strain model with mutations. Harvard Data Science Review, 2021.

[74] A. Yalaman, G. Basbug, C. Elgin, and A. P. Galvani. Cross-country evidence on the association between contact tracing and covid-19 case fatality rates. Scientific reports, 11(1):1–6, 2021.

### Calibration, validation and uncertainty quantification.

For reliable forecasting, an accurate and well-calibrated probabilistic predictive distribution is usually more useful than a point estimate, yet most work disregards this aspect. We design probabilistic deep generative models that directly model the probability density of forecast values. We observe that our models provide accurate and well-calibrated forecasts and adapt to abnormal scenarios in which other models fail to capture novel patterns. This work is discussed in [25].
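One common way to check whether probabilistic forecasts are calibrated is to compare the nominal level of a central prediction interval with how often observations actually fall inside it. The sketch below illustrates that generic check and is not the evaluation procedure of [25]. It assumes each forecast is available as a set of samples from its predictive distribution, and the function name `interval_coverage` is ours.

```python
import numpy as np


def interval_coverage(samples, observed, level):
    """Empirical coverage of central prediction intervals.

    samples  : array of shape (n_forecasts, n_samples), draws from each
               forecast's predictive distribution.
    observed : array of shape (n_forecasts,), the values that materialized.
    level    : nominal interval level in (0, 1), e.g. 0.9 for a 90% interval.
    """
    samples = np.asarray(samples, dtype=float)
    observed = np.asarray(observed, dtype=float)
    tail = (1.0 - level) / 2.0
    lower = np.quantile(samples, tail, axis=1)
    upper = np.quantile(samples, 1.0 - tail, axis=1)
    inside = (observed >= lower) & (observed <= upper)
    return float(inside.mean())


# Two identical forecasts whose samples are 0, 1, ..., 100; the 50% interval is [25, 75].
samples = np.tile(np.arange(101.0), (2, 1))
print(interval_coverage(samples, observed=[50.0, 90.0], level=0.5))
```

For a well-calibrated forecaster, the empirical coverage over many forecasts should be close to the nominal level at every level checked. Coverage well below the nominal level indicates overconfident intervals.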

The Virginia Tech (VT) group has worked on uncertainty quantification (UQ) problems in epidemiology and other domains. We have investigated a Bayesian auxiliary variable model for analyzing birth records data with both continuous and discrete outcomes, since such data commonly arise in epidemiological studies. The Bayesian method not only achieves accurate prediction performance but also provides UQ for the predictions. UQ is very important when data contain non-ignorable noise, which is often the case for epidemiological data [26]. We also worked on a new type of data that involves quantitative-sequence (QS) factors. A QS factor contains both the sequence order and the quantities of multiple components in a system, as in drug combination experiments. Yanran Wei, a female Ph.D. student who participated in this research, was supported by this grant during Summer 2020 [72]. We have developed a simple Gaussian Process model for analyzing such data by transforming the QS factor into a generalized permutation matrix. The proposed method works well and supports uncertainty quantification. This work enabled Wei to obtain significant training in the statistical analysis of data in health-care related fields. A Python package for the proposed method is also under preparation [70].
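A generalized permutation matrix has exactly one nonzero entry in each row and each column. The sketch below shows one plausible way to encode a QS factor in that form. The specific encoding (rows index positions in the sequence, columns index components, the nonzero entry is the component's quantity) and the function name `qs_to_matrix` are our assumptions, and the encoding used in [70] may differ.

```python
import numpy as np


def qs_to_matrix(order, quantities):
    """Encode a quantitative-sequence factor as a generalized permutation matrix.

    order      : order[i] is the index of the component applied at position i;
                 must be a permutation of 0..k-1.
    quantities : quantities[j] is the (nonzero) amount of component j.
    """
    k = len(order)
    if sorted(order) != list(range(k)) or len(quantities) != k:
        raise ValueError("order must be a permutation matching quantities in length")
    matrix = np.zeros((k, k))
    for position, component in enumerate(order):
        matrix[position, component] = quantities[component]
    return matrix


# Three drugs given in the order 2, 0, 1 with doses 1.5, 0.5, and 2.0.
print(qs_to_matrix(order=[2, 0, 1], quantities=[1.5, 0.5, 2.0]))
```

Representing each experimental run as a matrix of this kind gives a fixed-size numeric input on which a Gaussian Process kernel can be defined, which is the role the transformation plays in the text above.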

[1] T. Abate. Crowdsourcing site collects county-level policy data to inform decisions about easing social- distancing, April 2020. https://news.stanford.edu/2020/04/13/stanford-crowdsources-county- level-covid-19-policy-data/.

[2] V. Abeykoon, N. Perera, C. Widanage, S. Kamburugamuve, T. A. Kanewala, H. Maithree, P. Wick- ramasinghe, A. Uyar, and G. Fox. Data engineering for hpc with python. In 2020 IEEE/ACM 9th Workshop on Python for High-Performance and Scientific Computing (PyHPC), pages 13–21. IEEE, 2020.

[3] A. Adiga, J. Chen, M. Marathe, H. Mortveit, S. Venkatramanan, and A. Vullikanti. Data-driven modeling for different stages of pandemic response. Journal of the Indian Institute of Science, pages 1–15, 2020.

[4] A. Adiga, D. Dubhashi, B. Lewis, M. Marathe, S. Venkatramanan, and A. Vullikanti. Mathematical models for covid-19 pandemic: a comparative analysis. Journal of the Indian Institute of Science, pages 1–15, 2020.

[5] A. Adiga, L. Wang, B. Hurt, A. Peddireddy, P. Porebski, S. Venkatramanan, B. Lewis, and M. Marathe. All models are useful: Bayesian ensembling for robust high resolution covid-19 forecasting, 2021. Sub- mitted.

[6] E. Bradley, M. Marathe, M. Moses, W. D. Gropp, and D. Lopresti. Pandemic informatics: Preparation, robustness, and resilience. arXiv preprint arXiv:2012.09300, published as part of CCC Quadrennial pa- pers, 2020. https://cra.org/ccc/resources/ccc-led-whitepapers/#2020-quadrennial-papers.

[7] B. Brubach, D. Chakrabarti, J. P. Dickerson, A. Srinivasan, and L. Tsepenekas. Fairness, semi- supervised learning, and more: A general framework for clustering with stochastic pairwise constraints. arXiv preprint arXiv:2103.02013, 2021.

[8] H. L. Carscadden, C. J. Kuhlman, M. V. Marathe, S. S. Ravi, and D. J. Rosenkrantz. Blocking the propagation of two simultaneous contagions over networks. In R. M. Benito, C. Cherifi, H. Cherifi, E. Moro, L. M. Rocha, and M. Sales-Pardo, editors, Proc. 9th International Conference on Complex Networks and Applications (Complex Networks), pages 455–468, Chan, Switzerland, 2020. Springer.

[9] S. Chang, E. Pierson, P. W. Koh, J. Gerardin, B. Redbird, D. Grusky, and J. Leskovec. Mobility network models of covid-19 explain inequities and inform reopening. Nature, 589(7840):82–87, 2020.

[10] J. Chen, S. Levin, S. Eubank, H. Mortveit, S. Venkatramanan, A. Vullikanti, and M. Marathe. Net- worked epidemiology for covid-19. Siam news, 2020.

[11] J. Chen, A. Vullikanti, S. Hoops, H. Mortveit, B. Lewis, S. Venkatramanan, W. You, S. Eubank, M. Marathe, C. Barrett, and M. A. Medical costs of keeping the us economy open during covid-19. Scientific reports, Oct 2020.

[12] J. Chen, A. Vullikanti, J. Santos, S. Venkatramanan, S. Hoops, H. Mortveit, B. Lewis, W. You, S. Eu- bank, M. Marathe, C. Barrett, and A. Marathe. Epidemiological and economic impact of covid-19 in the us. medRxiv : the preprint server for health sciences, November 2020.

[13] D. Chistikov, G. Lisowski, M. Paterson, and P. Turrini. Convergence of opinion diffusion is pspace- complete. In Proc. AAAI, pages 7103–7110. AAAI Press, 2020.

[14] E. Y. Cramer, V. K. Lopez, J. Niemi, G. E. George, J. C. Cegan, I. D. Dettwiller, W. P. England, M. W. Farthing, R. H. Hunter, B. Lafferty, et al. Evaluation of individual and ensemble probabilistic forecasts of covid-19 mortality in the us. medRxiv, 2021.

[15] S. Eubank, I. Eckstrand, B. Lewis, S. Venkatramanan, M. Marathe, and C. Barrett. Commentary on ferguson, et al.,“impact of non-pharmaceutical interventions (npis) to reduce covid-19 mortality and healthcare demand”. Bulletin of mathematical biology, 82(4):1–7, 2020.

[16] M. C. Fitzpatrick and A. P. Galvani. Optimizing age-specific vaccination. Science, 371(6532):890–891, 2021.

[17] G. Fox. Deep learning based time evolution, 2020. http://dsc.soic.indiana.edu/publications/Summary-DeepLearningBasedTimeEvolution.pdf.

[18] G. Fox. Deep learning for spatial time series, 2020. http://dsc.soic.indiana.edu/publications/Deep%20Learning%20for%20Spatial%20Time%20Series.pdf.

[19] G. C. Fox, G. von Laszewski, F. Wang, and S. Pyne. Aicov: An integrative deep learning framework for covid-19 forecasting with population covariates. arXiv preprint arXiv:2010.03757, 2020.

[20] F. Haghpanah, G. Lin, S. Levin, and E. Y. Klein. Analysis of the potential efficacy and timing of covid-19 vaccine on morbidity and mortality. Available at SSRN 3745195, 2021.

[21] A. Haque, M. Thakur, M. Bielskas, A. Marathe, and A. Vullikanti. Persistence of anti-vaccine sentiment in social networks through strategic interactions. In Thirty Fifth AAAI Conference on Artificial Intelligence (AAAI-21), 2021.

[22] D. Harris, T. Pensyl, A. Srinivasan, and K. Trinh. Dependent randomized rounding for clustering and partition systems with knapsack constraints. In International Conference on Artificial Intelligence and Statistics, pages 2273–2283. PMLR, 2020.

[23] P. J. Hotez, R. E. Cooney, R. M. Benjamin, N. T. Brewer, A. M. Buttenheim, T. Callaghan, A. Caplan, R. M. Carpiano, C. Clinton, R. DiResta, et al. Announcing the lancet commission on vaccine refusal, acceptance, and demand in the usa. The Lancet, 2021.

[24] K. Huang, T. Fu, W. Gao, Y. Zhao, Y. Roohani, J. Leskovec, C. W. Coley, C. Xiao, J. Sun, and M. Zitnik. Therapeutics data commons: Machine learning datasets and tasks for therapeutics. Nature Communications. Also as arXiv preprint arXiv:2102.09548, 2021.

[25] H. Kamarthi, L. Kong, A. Rodriguez, C. Zhang, and B. A. Prakash. When in doubt: Explainable and principled uncertainty quantification for epidemic forecasting. Submitted, 2021.

[26] X. Kang, S. Ranganathan, L. Kang, J. Gohlke, and X. Deng. Bayesian auxiliary variable model for birth records data with qualitative and quantitative responses. arXiv preprint arXiv:2008.06525, 2020.

[27] R. Laxminarayan, S. Jameel, and S. Sarkar. India’s battle against covid-19: Progress and challenges. The American Journal of Tropical Medicine and Hygiene, 103(4):1343, 2020.

[28] R. Laxminarayan, B. Wahl, S. R. Dudala, K. Gopal, S. Neelima, K. J. Reddy, J. Radhakrishnan, J. A. Lewnard, et al. Epidemiology and transmission dynamics of covid-19 in two indian states. Science, 370(6517):691–697, 2020.

[29] G. Lin, A. T. Strauss, M. Pinz, D. A. Martinez, K. K. Tseng, E. Schueller, O. Gatalo, Y. Yang, S. A. Levin, E. Y. Klein, et al. Explaining the “bomb-like” dynamics of covid-19 with modeling and the implications for policy. medRxiv, 2020.

[30] D. Machi, P. Bhattacharya, S. Hoops, J. Chen, H. Mortveit, S. Venkatramanan, B. Lewis, M. Wilson, A. Fadikar, T. Maiden, and C. L. Barrett. Scalable epidemiological workflows to support covid-19 planning and response. In Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021.

[31] M. Marathe, A. Vullikanti, D. Rosenkrantz, S. Ravi, R. Stearns, and S. Levin. Computational challenges and opportunities for forecasting epidemic dynamics using network models, 2020.

[32] N. Meghanathan. Exploring the step function distribution of the threshold fraction of adopted neighbors vs. minimum fraction of nodes as initial adopters to assess the cascade blocking intra-cluster density of complex real-world networks. Springer Applied Network Science (Published as part of Special issue on Epidemics Dynamics & Control on Networks), 5(97):1–33, December 2020.

[33] Z. Mehrab, A. G. Ranga, D. Sarkar, S. Venkatramanan, Y. C. Baek, S. Swarup, and M. V. Marathe. High resolution proximity statistics as early warning for us universities reopening during covid-19, 2020. medRxiv preprint 2020.11.21.20236042.

[34] M. Minutoli, P. Sambaturu, M. Halappanavar, A. Tumeo, A. Kalyanaraman, and A. Vullikanti. Preempt: Scalable epidemic interventions using submodular optimization on multi-gpu systems. In Proc. SC, SC ’20. IEEE Press, 2020.

[35] S. M. Moghadas, M. C. Fitzpatrick, P. Sah, A. Pandey, A. Shoukat, B. H. Singer, and A. P. Galvani. The implications of silent transmission for the control of covid-19 outbreaks. Proceedings of the National Academy of Sciences, 117(30):17513–17515, 2020.

[36] S. M. Moghadas, T. N. Vilches, K. Zhang, C. R. Wells, A. Shoukat, B. H. Singer, L. A. Meyers, K. M. Neuzil, J. M. Langley, M. C. Fitzpatrick, et al. The impact of vaccination on covid-19 outbreaks in the united states. medRxiv, 2020.

[37] A. Moitra and E. Mossel. Hot topics in computing: An invitation to computational epidemiology, March 2021.

[38] D. Morris, F. Rossine, J. Plotkin, and S. Levin. Optimal, near-optimal, and robust epidemic control. Communications Physics. OSF Preprints, 2020.

[39] Theoretically speaking: computational and statistical tools to control a pandemic: A panel discussion, Simons Institute, May 2020. https://simons.berkeley.edu/events/covid19.

[40] A. S. Peddireddy, D. Xie, P. Patil, M. L. Wilson, D. Machi, S. Venkatramanan, B. Klahn, P. Porebski, P. Bhattacharya, S. Dumbre, and M. Marathe. From 5vs to 6cs: Operationalizing epidemic data management with covid-19 surveillance. In Proceedings of the IEEE International Conference on Big Data (BigData), 2020.

[41] B. Peng, J. Li, S. Akkas, F. Wang, T. Araki, O. Yoshiyuki, and J. Qiu. Rank position forecasting in car racing. arXiv preprint arXiv:2010.01707, 2020.

[42] N. Perera, V. Abeykoon, C. Widanage, S. Kamburugamuve, T. A. Kanewala, P. Wickramasinghe, A. Uyar, H. Maithree, D. Lenadora, and G. Fox. A fast, scalable, universal approach for distributed data reductions. arXiv preprint arXiv:2010.14596, 2020.

[43] A. Pilehvari, W. You, J. Chen, S. Venkatramanan, J. Krulick, and A. Marathe. Differential impact of social distancing on covid-19 spread in the us: by rurality and social vulnerability, 2021. Submitted.

[44] J. D. Priest, M. V. Marathe, S. S. Ravi, D. J. Rosenkrantz, and R. E. Stearns. Evolution of similar configurations in graph dynamical systems. In R. M. Benito, C. Cherifi, H. Cherifi, E. Moro, L. M. Rocha, and M. Sales-Pardo, editors, Proc. 9th International Conference on Complex Networks and Applications (Complex Networks), pages 544–555, Cham, Switzerland, 2020. Springer.

[45] J. Quetzalcóatl Toledo-Marín, G. Fox, J. P. Sluka, and J. A. Glazier. Deep learning approaches to surrogates for solving the diffusion equation for mechanistic real-world simulations. arXiv e-prints, pages arXiv–2102, 2021.

[46] E. L. Ray, N. Wattanachit, J. Niemi, A. H. Kanji, K. House, E. Y. Cramer, J. Bracher, A. Zheng, T. K. Yamana, X. Xiong, et al. Ensemble forecasts of coronavirus disease 2019 (covid-19) in the us. medRxiv, 2020.

[47] A. Rodríguez, B. Adhikari, A. D. González, C. Nicholson, A. Vullikanti, and B. A. Prakash. Mapping network states using connectivity queries. Proceedings of IEEE International Conference on Big Data 2020 (IEEE BigData 2020), 2020.

[48] A. Rodríguez, B. Adhikari, A. D. González, C. Nicholson, A. Vullikanti, and B. A. Prakash. Mapping network states using connectivity queries. NeurIPS 2020 Artificial Intelligence and Humanitarian and Disaster Relief (AI + HADR) Workshop, 2020.

[49] A. Rodríguez, B. Adhikari, N. Ramakrishnan, and B. A. Prakash. Incorporating expert guidance in epidemic forecasting. arXiv preprint arXiv:2101.10247, 2020.

[50] A. Rodriguez, N. Muralidhar, B. Adhikari, A. Tabassum, N. Ramakrishnan, and B. A. Prakash. Steering a historical disease forecasting model under a pandemic: Case of flu and covid-19. NeurIPS 2020 Machine Learning in Public Health (MLPH) Workshop, 2020.

[51] A. Rodriguez, N. Muralidhar, B. Adhikari, A. Tabassum, N. Ramakrishnan, and B. A. Prakash. Steering a historical disease forecasting model under a pandemic: Case of flu and covid-19. Proceedings of the AAAI Conference on Artificial Intelligence, 2021.

[52] A. Rodriguez, A. Tabassum, J. Cui, J. Xie, J. Ho, P. Agarwal, B. Adhikari, and B. A. Prakash. DeepCOVID: An operational deep learning-driven framework for explainable real-time COVID-19 forecasting. In Proc. AAAI Conference on Artificial Intelligence, 2021.

[53] D. J. Rosenkrantz, M. V. Marathe, S. S. Ravi, and R. E. Stearns. Synchronous dynamical systems on directed acyclic graphs: Complexity and algorithms. In Proc. AAAI-2021 (to appear), 2021. 9 pages.

[54] C. Ruiz, M. Zitnik, and J. Leskovec. Identification of disease treatment mechanisms through the multiscale interactome. bioRxiv, 2020.

[55] C. M. Saad-Roy, N. Arinaminpathy, N. S. Wingreen, S. A. Levin, J. M. Akey, and B. T. Grenfell. Implications of localized charge for human influenza a h1n1 hemagglutinin evolution: Insights from deep mutational scans. PLoS computational biology, 16(6):e1007892, 2020.

[56] C. M. Saad-Roy, B. T. Grenfell, S. A. Levin, L. Pellis, H. B. Stage, P. van den Driessche, and N. S. Wingreen. Superinfection and the evolution of an initial asymptomatic stage. Royal Society open science, 8(1):202212, 2021.

[57] C. M. Saad-Roy, S. A. Levin, C. J. E. Metcalf, and B. T. Grenfell. Trajectory of individual immunity and vaccination required for sars-cov-2 community immunity: a conceptual investigation. Journal of the Royal Society Interface, 18(175):20200683, 2021.

[58] C. M. Saad-Roy, S. E. Morris, C. J. E. Metcalf, M. J. Mina, R. E. Baker, J. Farrar, E. C. Holmes, O. Pybus, A. L. Graham, S. A. Levin, et al. Epidemiological and evolutionary considerations of sars-cov-2 vaccine dosing regimes. medRxiv, 2021. Science.

[59] C. M. Saad-Roy, C. E. Wagner, R. E. Baker, S. E. Morris, J. Farrar, A. L. Graham, S. A. Levin, M. J. Mina, C. J. E. Metcalf, and B. T. Grenfell. Immune life history, vaccination, and the dynamics of sars-cov-2 over the next 5 years. Science, 370(6518):811–818, 2020.

[60] Y. Serkez. The magic number for reducing infections and keeping businesses open, December 2020.

[61] A. Uyar, G. Gunduz, S. Kamburugamuve, P. Wickramasinghe, C. Widanage, K. Govindarajan, N. Perera, V. Abeykoon, S. Akkas, and G. Fox. Twister2 cross-platform resource scheduler for big data, 2021.

[62] J. Vekemans, M. Hasso-Agopsowicz, G. Kang, W. P. Hausdorff, A. Fiore, E. Tayler, E. J. Klemm, R. Laxminarayan, P. Srikantiah, M. Friede, et al. Leveraging vaccines to reduce antibiotic use and prevent antimicrobial resistance: a who action framework. Clinical Infectious Diseases, 2021.

[63] C. E. Wagner, J. A. Prentice, C. M. Saad-Roy, L. Yang, B. T. Grenfell, S. A. Levin, and R. Laxminarayan. Economic and behavioral influencers of vaccination and antimicrobial use. Frontiers in public health, 8:975, 2020.

[64] F. Wang, C. Widanage, W. Liu, J. Li, X. Wang, H. Tang, and J. Fox. Privacy-preserving genomic computing with sgx-based big-data analytics framework. 20th IEEE International Workshop on High Performance Computational Biology (HiCOMB) [under review], 2021.

[65] L. Wang, A. Adiga, J. Chen, A. Sadilek, S. Venkatramanan, and M. Marathe. CausalGNN: Causal-based graph neural networks for spatio-temporal epidemic forecasting, 2021. Submitted.

[66] L. Wang, A. Adiga, S. Venkatramanan, J. Chen, B. Lewis, and M. Marathe. Examining deep learning models with multiple data sources for covid-19 forecasting, 2020. arXiv preprint arXiv:2010.14491.

[67] L. Wang, X. Ben, A. Adiga, A. Sadilek, A. Tendulkar, S. Venkatramanan, A. Vullikanti, G. Aggarwal, A. Talekar, J. Chen, B. Lewis, S. Swarup, A. Kapoor, M. Tambe, and M. Marathe. Using mobility data to understand and forecast covid-19 dynamics. In Proceedings of the 29th International Joint Conference on Artificial Intelligence Workshop on AI for Social Good, 2021.

[68] L. Wang, D. Ghosh, M. T. G. Diaz, A. K. Farahat, M. Alam, C. Gupta, J. Chen, and M. Marathe. Wisdom of the ensemble: Improving consistency of deep learning models. In NeurIPS, 2020.

[69] S. Weekes, P. March, M. Berger, M. Marathe, J. Crawley, C. M. Lois, A. El Bakry, R. Renaut, A. Gelb, F. Santosa, T. Grandine, P. Seshaiyer, and J. Hesthaven. Report on future research directions for the national science foundation in the era of covid-19. A Report by SIAM Task Force on COVID-19, 2020. https://www.siam.org/Portals/0/reports/Report%20on%20Future%20Research%20Directions%20for%20NSF.pdf?ver=2020-12-10-144744-750.

[70] Y. Wei and X. Deng. A simple gaussian process modeling approach for experiments with quantitative-sequence factors. Submitted, 2021.

[71] C. Widanage, N. Perera, V. Abeykoon, S. Kamburugamuve, T. A. Kanewala, H. Maithree, P. Wickramasinghe, A. Uyar, G. Gunduz, and G. Fox. High performance data engineering everywhere. In 2020 IEEE International Conference on Smart Data Services (SMDS), pages 122–132. IEEE, 2020.

[72] Q. Xiao, Y. Wang, A. Mandal, and X. Deng. Modeling and active learning for experiments with quantitative-sequence factors. Submitted; in revision for Journal of the American Statistical Association, 2021.

[73] O. Yagan, A. Sridhar, R. Eletreby, S. A. Levin, J. B. Plotkin, and H. V. Poor. Modeling and analysis of the spread of covid-19 under a multiple-strain model with mutations. Harvard Data Science Review, 2021.

[74] A. Yalaman, G. Basbug, C. Elgin, and A. P. Galvani. Cross-country evidence on the association between contact tracing and covid-19 case fatality rates. Scientific reports, 11(1):1–6, 2021.