OERC in Action: Lina Osorio-Copete and Tian Lou and the Applied Data Analytics Training Program
October 9, 2020
Since 2019, the Ohio Education Research Center (OERC) has partnered with the Coleridge Initiative to host the Coleridge Initiative’s Applied Data Analytics Training program. The Applied Data Analytics Training program focuses on increasing public employees’ capacity to skillfully use workforce and postsecondary education administrative records, like those contained in the Ohio Longitudinal Data Archive, to solve public policy problems. The training program accomplishes this goal by assembling a diverse group of public agency personnel and data analysts and then teaching them how to apply data analytic techniques and tools to solve real problems with actual administrative data.
In the course of the partnership between the OERC and the Coleridge Initiative, several members of the OERC team have dedicated their time and expertise to administering the training programs. Two team members in particular have applied their coding skills, experience as data analysts, and subject matter expertise as instructors for classes delivered at OSU and in other states.
Meet the OERC’s Lina Osorio-Copete and Tian Lou
Lina Osorio-Copete (pictured to the left above) is a research associate with the Ohio Education Research Center. She currently oversees research and data analyses using the Ohio Longitudinal Data Archive. She works at the intersection of urban policy, housing, and workforce development programs to support OERC partners' policy-making discussions. Before joining the OERC, Lina worked at the World Bank implementing financial modeling to support renewable energy projects, and as a graduate consultant applying life cycle analysis to quantify the costs of pollution for the DCNR, an environmental agency located in the Pittsburgh area. She also served as a senior adviser to the Colombian Ministry of Finance. Ms. Osorio-Copete earned her MS in Public Policy and Management from Carnegie Mellon University in 2018 and holds an MA in Economics from Universidad de Los Andes in Bogota, Colombia.
Tian Lou (pictured on the right above) is a Post Doctoral Scholar at the John Glenn College of Public Affairs and the Center for Human Resource Research at The Ohio State University. Tian is a labor economist and has been doing research on unemployment, job training and welfare programs, education, immigrants, and social networks.
Tian’s job at OERC is to conduct research by using econometric models and administrative data to answer questions about the Ohio workforce. Currently, Tian is leading the analysis of Ohio UI claims data and has designed a dashboard to inform policymakers about the jobs lost in Ohio since the outbreak of COVID-19. Tian has also worked on the randomized encouragement experiment for the Comprehensive Case Management and Employment Program (CCMEP), evaluating whether the program improves participants’ labor market and education outcomes. Another program evaluation Tian has worked on is Wage Pathways, which aims to improve participants’ labor market outcomes by using cash incentives to encourage them to work consistently. In 2018, Tian received funding from the Department of Labor Scholar Program for her research on registered apprenticeships.
Lina and Tian first began working on the Applied Data Analytics Training program hosted by OSU in the spring of 2019 and have since participated as instructors for the OSU Spring 2020 class and the Kentucky Summer 2020 program. I had the pleasure of interviewing them to discuss their role as instructors and their experiences with the program. After reading the interview, if you’d like to learn more about the Applied Data Analytics Training program or apply to be a trainee in the program, you can visit the Applied Data Analytics Training website to find more information.
1. What is the Coleridge Initiative Applied Data Analytics Training?
Lina: The Coleridge training is a two-week technical training for data analysts who work with administrative data. During each training, around 30 trainees receive lectures on data analysis and machine learning techniques. In parallel with those lectures, students work in smaller teams on a research project that allows them to apply the methods they learned, using public administrative data on higher education and employment.
Tian: Coleridge Initiative training teaches people advanced data analytical skills in SQL, Python and R and provides them opportunities to apply these skills to administrative data. Trainees work in groups and collaborate with people from different workplaces (university, research center, federal, state or local government) and even different states. They come up with questions related to public policy and use the techniques they learn during the training to conduct analysis and answer their questions. Each training has different topics and uses different data. In the OSU class held this spring, we focused on using Ohio administrative data to analyze community college and technical school graduates’ labor market outcomes.
2. What was your role as a Coleridge Initiative Applied Data Analytics Training instructor?
Lina: As a Coleridge instructor, I contributed by preparing the training content and working closely with trainees' teams to guide them through their final research projects. The training materials I created are Jupyter notebooks written in Python. Some of the topics covered in those notebooks are data exploration, data visualization, machine learning applications, and inference and data imputation.
Tian: As an instructor, I wrote some of the class materials, gave presentations, and guided groups to finish their projects. During the training, we gave trainees a project template showing the steps they should follow to develop their research questions, conduct analysis, and plan their own projects. We also gave them Jupyter Notebooks with example code, detailed explanations of the data and analysis methods, and visualizations. For example, in the OSU class, we showed trainees how to use Ohio Higher Education data and Unemployment Insurance data to calculate community college graduates’ post-graduation earnings and how their earnings vary by degree, institution location, etc. We also showed them more advanced analysis methods, such as how to use unsupervised machine learning models to categorize Ohio employers.
Between the lectures, trainees can discuss their projects and explore data. When the class was held in person, instructors walked around the classroom and answered groups’ questions about data, code, and research. Since the class moved online, each instructor has been assigned to a group and leads that group’s discussions.
Each Coleridge training consists of two to three sessions. In between sessions, we meet with groups every one or two weeks to check on their project progress and answer their questions.
If you are interested in the materials that Lina and Tian produced, you can find them here: https://github.com/Coleridge-Initiative/ada-2020-osu
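For readers who want a concrete picture of the earnings calculation Tian describes, here is a minimal sketch in Python. It is not taken from the training notebooks. The records, field layouts, and function names are all hypothetical, and the four-quarter window after graduation is an assumption chosen for illustration. The idea is to link each graduate to their quarterly UI wage records, sum wages over the year following graduation, and summarize by degree.

```python
from collections import defaultdict
from statistics import median

# Hypothetical graduate records: (person_id, degree, graduation_year, graduation_quarter)
graduates = [
    ("p1", "Associate", 2018, 2),
    ("p2", "Associate", 2018, 2),
    ("p3", "Certificate", 2018, 2),
    ("p4", "Certificate", 2018, 4),  # no matching wage records below
]

# Hypothetical UI wage records: (person_id, year, quarter, wages)
wages = [
    ("p1", 2018, 3, 7000), ("p1", 2018, 4, 7500),
    ("p1", 2019, 1, 7500), ("p1", 2019, 2, 8000),
    ("p2", 2018, 3, 5000), ("p2", 2018, 4, 5000),
    ("p3", 2018, 3, 6000), ("p3", 2018, 4, 6000),
    ("p3", 2019, 1, 6000), ("p3", 2019, 2, 6000),
    ("p3", 2019, 3, 9000),  # falls outside the four-quarter window
]


def quarter_index(year, quarter):
    """Map (year, quarter) to a single integer so quarters can be compared."""
    return year * 4 + (quarter - 1)


def first_year_earnings(graduates, wages):
    """Sum each graduate's wages over the four quarters after graduation."""
    wages_by_person = defaultdict(dict)
    for person_id, year, quarter, amount in wages:
        idx = quarter_index(year, quarter)
        # A person may have several employers in one quarter, so accumulate.
        wages_by_person[person_id][idx] = wages_by_person[person_id].get(idx, 0) + amount

    earnings = {}
    for person_id, degree, grad_year, grad_quarter in graduates:
        start = quarter_index(grad_year, grad_quarter) + 1
        total = sum(wages_by_person[person_id].get(i, 0) for i in range(start, start + 4))
        earnings[person_id] = (degree, total)
    return earnings


def summarize_by_degree(graduates, wages):
    """Report graduates, graduates with any observed wages, and median earnings."""
    groups = defaultdict(list)
    for degree, total in first_year_earnings(graduates, wages).values():
        groups[degree].append(total)

    summary = {}
    for degree, totals in groups.items():
        observed = [t for t in totals if t > 0]
        summary[degree] = {
            "graduates": len(totals),
            "with_wages": len(observed),
            "median_earnings": median(observed) if observed else None,
        }
    return summary


for degree, stats in summarize_by_degree(graduates, wages).items():
    print(degree, stats)
```

The sketch reports the number of graduates with any observed wages separately from the median. A graduate with no match in the wage records is not necessarily unemployed, so counting them as zero earners could understate outcomes.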
3. What is the value of the Applied Data Analytics Training program for individuals and their organizations?
Lina: The Coleridge training brings together experts in different areas, giving trainees a unique opportunity to expand their network while upgrading their technical skills. Students learn a new programming language and data analysis techniques applied to their areas of expertise. They can bring those new skills to their organizations, given that all notebooks and class materials are shared in a freely accessible GitHub repository.
Tian: Administrative data is undoubtedly valuable for research and public policy, but it can be tricky to use if you don’t realize its potential and/or limitations. This training is not limited to data analytical skills. It provides a holistic view of administrative data and helps people in different positions think about how they can improve the quality of the data and of the products built from it. For example, the Ohio Unemployment Insurance (UI) wage records include employment information for most workers in Ohio. We usually use them to analyze different groups of workers’ earnings and employment. However, it’s important to keep in mind that the data we use does not cover out-of-state employment, self-employment, federal employees, and independent contractors. As another example, we usually use short-term post-graduation earnings to assess college graduates’ labor market outcomes. However, the quality of their employers could also affect their long-term career development and lifelong earnings. In the OSU training, we showed people an innovative way to assess employer quality. We used an unsupervised machine learning model to cluster Ohio employers and evaluated the employers in each cluster by looking at their average characteristics, such as average payroll, firm size, job separation rate, new hire rate, etc.
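The employer clustering Tian mentions can be illustrated with a small sketch. The interview does not say which unsupervised model the instructors used, so k-means is swapped in here as one common choice. The employer records, feature list, and function names are hypothetical. The sketch standardizes the features, groups employers into two clusters, and then describes each cluster by its average characteristics.

```python
from statistics import mean, pstdev

# Hypothetical employers:
# (employer_id, avg_quarterly_pay, firm_size, separation_rate, new_hire_rate)
employers = [
    ("e1", 15000, 500, 0.05, 0.06),
    ("e2", 14000, 450, 0.06, 0.05),
    ("e3", 16000, 600, 0.04, 0.05),
    ("e4", 6000, 20, 0.30, 0.35),
    ("e5", 5500, 15, 0.35, 0.33),
    ("e6", 6500, 25, 0.28, 0.30),
]


def standardize(rows):
    """Rescale each feature to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50):
    """Basic k-means with a deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Pick the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))

    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = [mean(col) for col in zip(*members)]
    return labels


def cluster_profiles(employers, labels):
    """Average the raw (unstandardized) characteristics within each cluster."""
    profiles = {}
    for label in sorted(set(labels)):
        members = [e[1:] for e, l in zip(employers, labels) if l == label]
        profiles[label] = [mean(col) for col in zip(*members)]
    return profiles


features = [list(e[1:]) for e in employers]
labels = kmeans(standardize(features), k=2)
profiles = cluster_profiles(employers, labels)

for label, profile in profiles.items():
    print(label, profile)
```

Standardizing first keeps features on large scales, such as pay and firm size, from dominating the distance calculation. Averaging the raw values afterward keeps the cluster descriptions in their original units.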
4. What did you learn through your involvement as an instructor?
Tian: I’ve gained new knowledge throughout the training. Professor Julia Lane’s lectures cover a wide range of topics, such as data management, data visualization, record linkage, missing data imputation, machine learning, confidentiality and privacy issues, etc. Trainees have also brought in different perspectives, research topics, and analytical methods. In some ways, they are helping us to improve too. For example, in the OSU class, one group looked at how earnings vary by a student’s degree and the industry he/she works in. They showed their results using a Sankey diagram in Tableau, which we didn’t cover in the training but is an effective and interactive way to deliver information to policymakers, researchers, and the public.
Lina: Coleridge training has been a very stimulating experience. The pace is fast, and preparing the training materials required a high level of collaborative work. I had to work closely with the other instructors, learning from them and sharing my own knowledge, to get the training materials done. The Coleridge Initiative facilitates this teamwork dynamic because it has a data sharing infrastructure called the Administrative Data Research Facility (ADRF), a secure environment to share code, data, and communications with all the other team members, following high-quality data management practices.
5. What is the long-term value of the Coleridge Initiative Applied Data Analytics Training program and the ADRF?
Tian: As I mentioned earlier, administrative data has its limitations and they may bias our research findings to some extent. The ADRF alleviates some of the limitations. For example, our labor market outcome analyses were limited to Ohio because we didn’t have access to other states’ UI wage records. The ADRF provides a secure platform to store data from different states and allows researchers to do cross-state analyses. This facilitates new research ideas and attenuates measurement errors.
Lina: The Coleridge Initiative training allows different states to share public administrative records in the same environment, making the information available for researchers to explore and to answer relevant policy questions using sophisticated data analysis techniques.
Visit the Coleridge Initiative’s training page to learn more about upcoming offerings of the Applied Data Analytics course.