The University of Tsukuba is a large comprehensive university that offers education in an extremely wide range of fields, including the humanities, life sciences, science and engineering, information science, medicine, physical education, and the arts. Each educational organization already offers subjects on probability and statistics, which are the foundation of data science. However, because data science skills are needed in all academic fields, and because the need for objective, data-based judgment and decision-making in all administrative and industrial fields will continue to grow after students graduate, we have decided to offer Data Science (2 credits) as a required first-year subject in all schools.
The Data Science Literacy Program consists of two subjects totaling three credits: Data Science (2 credits) and Information Literacy (lecture) (1 credit, required in the first year). Information Literacy (lecture) teaches the basics of computer science needed to complete the Data Science subject.
The specialized fields of the educational organizations to which the students of the University of Tsukuba belong are diverse, and the background knowledge and skills possessed by each student differ greatly. Taking this into consideration, the course curriculum of Data Science is designed with the following points in mind.
For detailed educational content of Data Science, please refer to the list of subjects offered.
In the first week, students will learn about the positioning of data science in society and its significance. Here, students will watch video lectures with practical exercises provided by researchers involved in data science in various fields at the University of Tsukuba, including the humanities, life sciences, science and engineering, information science, medicine, physical education, and the arts.
In addition, students will learn the basic knowledge of laws and regulations that they need to know when handling human-related data, such as the Act on the Protection of Personal Information and the Statistics Act, examples of human rights and privacy violations related to data use, and the concept and procedures of research ethics necessary for conducting research using human-related data.
In the second and third weeks, students will learn about data science as a whole and its lifecycle (data collection, management, and analysis). As an introduction, they will learn about types of data, data collection, data preprocessing, and data reusability.
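As a rough illustration of what data types and data preprocessing mean in practice, the following Python sketch cleans a few raw records. The records, field names, and the `preprocess` function are hypothetical and are not taken from the course materials; they only show the general idea of normalizing a qualitative variable and converting a quantitative variable while marking missing values.

```python
# Hypothetical raw survey records: every value arrives as text.
raw_rows = [
    {"student": "A", "faculty": "Arts", "height_cm": "162.5"},
    {"student": "B", "faculty": "Medicine", "height_cm": ""},
    {"student": "C", "faculty": "arts ", "height_cm": "171"},
]


def preprocess(rows):
    """Normalize category labels and convert numeric text to floats."""
    cleaned = []
    for row in rows:
        height_text = row["height_cm"].strip()
        cleaned.append({
            "student": row["student"],
            # Qualitative (categorical) variable: unify spacing and capitalization.
            "faculty": row["faculty"].strip().title(),
            # Quantitative variable: convert to float, keep missing values as None.
            "height_cm": float(height_text) if height_text else None,
        })
    return cleaned


cleaned_rows = preprocess(raw_rows)
for row in cleaned_rows:
    print(row)
```

Keeping missing values explicit (here as `None`) rather than silently replacing them with zero is one common choice, since it leaves the decision about how to handle them to the later analysis step.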
In Weeks 4-5, students will learn about the significance and purpose of data management, design of data collection information, separation of information structure and representation, and fundamentals of data engineering. Advanced topics in data management, such as IoT and CPS (cyber-physical systems), and the advanced utilization of big data will be taught through video lectures with practical exercises by instructors specializing in data engineering at the university.
In Week 6, students will learn the significance and purpose of data visualization, how to visualize data, and how to choose visual representations.
In Weeks 7-9, students will learn how to analyze data. Specifically, this includes understanding discrete variables and their statistics, understanding quantitative variables and their statistics, causation and correlation, and the analysis of complex data (time series data, network data). Depending on their field, students will also study statistical tests for discrete variables (chi-square test), statistical tests for quantitative variables (z-test, t-test), and linear regression, as needed.
In Week 10, students will learn about machine learning and artificial intelligence as advanced data analysis through video lectures with practical exercises by instructors specializing in the field of artificial intelligence at the university.
A total of 51 classes are offered: 50 in Japanese and 1 in English. Data Science is designed as a 10-week, 2-credit subject in which each class consists of two 75-minute sessions, and all classes are held in a computer lab with one computer available for each student. Each class therefore runs 150 minutes, and the standard schedule consists of 60 minutes of lecture, 15 minutes of quizzes using the class management system, and 75 minutes of practical exercises on a computer. Practical exercises are also assigned as post-class assignments. For 75 of the 150 minutes in each class, teaching assistants, who are graduate students, are assigned to help students with the practical exercises.
Keeping in mind that this is a required subject for students in different schools, we have created standard teaching materials (slides), practical exercises, and quizzes (short tests) that can be used in all undergraduate courses, not only in science and engineering but also in the humanities, physical education, the arts, and other non-technical fields. The subject provides multiple practical exercises ranging from introductory to advanced levels, giving students the flexibility to structure the course content according to their mathematical skills. Video lectures and practical exercises are created and provided by instructors specializing in areas such as the application of data science to specific fields, advanced data management, and data analysis. In addition, to support the English-language programs offered by the University of Tsukuba, English versions of the standard teaching materials and the video teaching materials have also been created. Video lectures are available to the public as OpenCourseWare.
We plan to provide the created teaching materials to the Japan Inter-University Consortium for Mathematics, Data Science and AI Education to promote mathematical and data science education.
A book summarizing the educational content of "Data Science" has been published by Kodansha under the title "The First Step in Data Science" (データサイエンスはじめの一歩).
Based on our experience of teaching "Data Science" as a mandatory subject for all students since 2019, this book includes improved versions of the teaching materials along with explanations of the video resources.
We encourage beginners in data science to make use of this textbook.