Synthesizing MSML networks using massive social data, crowdsourcing, and active learning methods.


    Building on our long-standing work on synthesizing social networks, we will develop new methods for estimating MSML networks. This estimation will operate on diverse global data sets; the team has built strong relationships with multiple vendors to acquire the novel data sets it requires. We will create open-source synthetic data sets, applications, and tools.

    Additionally, we have developed methods for using traditional administrative data sets; our initial research has led to global synthetic contact networks. This research will be extended using new sources of anonymized mobility data, electronic health records, and smart-device data, as well as real-time data on individual- and community-level awareness and behavioral changes. Some of these signals are voluminous, and parsing all of them through crowdsourcing would be expensive and time-consuming. Carefully choosing a subset of the signals that can be parsed intelligently by the “crowd” is a novel stochastic-optimization problem that generalizes standard PAC (probably approximately correct) approaches. Validating such MSML networks is a challenge that we will address by extending our earlier work.

    Figure: Schematic diagram depicting the synthesis of MSML networks from multiple data sources.