Title: Contemporary Challenges in Applying Data Science to Social Sciences
Abstract: In this presentation, I examine how data science intersects with the social sciences, covering both statistical aspects and machine learning algorithms. I focus on four modern obstacles: dataset size, complex data structures, large numbers of attributes, and potential sample bias. To address the abundance of attributes, I introduce the Lasso technique for variable selection, which offers a route to dimension reduction that applies in both statistical and machine learning settings. I also address a central concern, uncertainty quantification, which is essential for interpreting data-driven results accurately and for making the resulting insights more reliable. On the methodological side, I contrast two prevalent approaches: the model-free strategy of nested cross-validation, widely adopted in machine learning, and model-based inference, which holds a significant place in statistics. Together, these challenges and methods offer a perspective on how data can be used to inform decision-making and to understand complex societal phenomena.