Education
Development and effectiveness verification of AI education data sets based on constructivist learning principles for enhancing AI literacy
Analysis of requirements for AI education datasets
To analyze requirements for AI education datasets, we first investigated current dataset usage trends. The UCI ML Repository, which provides various types of datasets for AI modeling research, offers 664 different datasets. As shown in Fig. 3, users can check information such as appropriate modeling algorithms, number of variables, and access frequency. Based on access frequency, the most frequently used datasets were identified as ‘Iris’, ‘Dry Bean Dataset’, ‘Heart Disease’, ‘Rice’, and ‘Adult’34.
To analyze the usage of specific datasets for educational purposes, we examined dataset usage in “Entry,” South Korea’s representative educational programming language, which provides practical functions for AI education. As shown in Fig. 4, Entry is a visual programming language, which helps reduce the difficulties associated with learning syntax and maintains students’ interest while they learn the basic concepts of AI35,36. Entry also provides basic resources for AI education and offers 19 datasets through its “Data Analysis” feature. On the Entry platform (https://playentry.org), the datasets can be viewed by selecting the following menu options in sequence: [Create] – [Analyze data] – [Load tables] – [Add tables] – [Select tables]. The platform also supports AI modeling practices such as data visualization, linear regression, binary classification, multi-class classification, and clustering, and provides datasets with specified purposes for AI modeling, such as ‘Iris’, ‘Boston Housing’, ‘Palmer Penguins’, and ‘Titanic’37.
We analyzed program outputs utilizing AI modeling features and datasets in Entry between December 31, 2020, and December 31, 2021, deriving the total dataset usage count from these artifacts. To analyze overall usage patterns, we visualized the utilization status of the top 10 most frequently used datasets as shown in Fig. 5.
The top 10 most frequently used datasets in Entry were led by ‘Iris’, ‘Population by City’, and ‘Boston Housing’, followed by ‘Consumer Price Index’, in descending order of usage. Among the AI modeling datasets, Iris and Boston Housing were used consistently throughout the study period: Iris 7499 times and Boston Housing 6619 times. These counts are significantly higher than those of other AI modeling datasets that did not rank within the top 10.
Both the UCI ML repository (a platform providing datasets for AI modeling) and Entry (an educational programming language platform) showed Iris as the most utilized dataset. The Iris dataset, composed of continuous independent variables and categorical dependent variables, is particularly suitable for multiclass classification tasks. Its high usage frequency suggests it serves as a representative dataset for AI modeling practice.
While widely-used datasets for AI modeling practice offer the advantage of easily accessible examples for various AI modeling and computing activities, they exhibit limitations including lack of relevance to students’ daily lives, difficulty in connecting to real-world contexts, and inability to provide authentic practical experiences. Notably, the Boston dataset has been discontinued in major machine learning libraries like scikit-learn due to ethical concerns, necessitating the development of alternative datasets37.
Design and implementation of AI education datasets
This study develops datasets for AI education by benchmarking Entry, a widely adopted educational programming platform with significant classroom impact. Through analysis of Entry’s technical specifications, we established requirements for datasets suitable for supervised and unsupervised learning implementations, as detailed in Table 10. The platform supports essential modeling algorithms including Linear Regression, Logistic Regression, k-Nearest Neighbors (kNN), Support Vector Machines (SVM), Classification and Regression Trees (CART), and Clustering. For model configuration, users can define up to 6 continuous variables as independent features, while dependent variables may incorporate either continuous or categorical data types depending on the learning task.
To provide AI educational datasets contextualized to students’ daily lives, we explored and structured preliminary dataset drafts as shown in Table 11, incorporating contextual frameworks from PISA 2022 Mathematics. We investigated public data platforms and diverse dataset sources while verifying appropriate variable inclusion for respective modeling methods. All datasets explicitly specify applicable licenses for educational purposes, with particular emphasis on exploring publicly available data centered around daily life topics likely to engage student interest. For certain datasets requiring specialized context, researchers directly collected and structured original data to complete the draft dataset compositions.
The draft datasets were systematically restructured according to AI modeling methodologies and educational objectives to facilitate effective utilization in AI education, as detailed in Table 12.
For Linear Regression datasets, it was necessary to ensure completeness by specifying independent and dependent variables. The Seoul Mosquito Activity Index and Synoptic Meteorological Observation datasets were joined by date after relationships between their variables were established from prior research on meteorological conditions and mosquito populations42,43. Key variables were then extracted and reorganized to enhance student comprehension and provide successful modeling experiences16.
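The date-based join described above can be sketched as follows. This is a minimal illustration in plain Python; the column names (`date`, `mosquito_index`, `avg_ground_temp`) and the sample values are hypothetical and are not taken from the published datasets.

```python
# Hypothetical sample rows; column names and values are illustrative only.
mosquito_rows = [
    {"date": "2021-07-01", "mosquito_index": 55.2},
    {"date": "2021-07-02", "mosquito_index": 61.0},
    {"date": "2021-07-03", "mosquito_index": 58.4},
]
weather_rows = [
    {"date": "2021-07-01", "avg_ground_temp": 26.1},
    {"date": "2021-07-02", "avg_ground_temp": 27.3},
    {"date": "2021-07-04", "avg_ground_temp": 25.0},
]


def inner_join_by_date(left, right):
    """Return rows whose date appears in both tables, merged into one dict."""
    right_by_date = {row["date"]: row for row in right}
    joined = []
    for row in left:
        match = right_by_date.get(row["date"])
        if match is not None:
            joined.append({**row, **match})
    return joined


merged = inner_join_by_date(mosquito_rows, weather_rows)
for row in merged:
    print(row)
```

Only dates present in both tables survive an inner join, so days with a missing observation in either source are dropped instead of being filled with guessed values.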
The baseball game results dataset was collected from sports information websites, then completely synthesized using statistical simulation methods based on original data to minimize team-specific bias and enhance objectivity44. The dataset was further restructured by extracting key variables influencing the dependent variable for student accessibility.
The body measurements and t-shirt size dataset, initially collected from students, was replaced with a fully synthetic version to address privacy concerns and improve size appropriateness through synthetic data generation techniques44. This approach enhanced objectivity while resolving issues with limited original data scale.
The earthquake location dataset was restructured by removing entries below magnitude 2, which are classified as non-impactful seismic events based on domain expertise, to improve size appropriateness41.
Testing AI education datasets
Expert review
The evaluation results of the draft datasets through data quality assessment and authentic activity characteristics analysis, along with group interview findings, are summarized in Table 13.
Regarding overall feedback on the datasets, experts frequently noted that the developed AI education datasets showed high applicability due to their relevance to students’ daily lives, while emphasizing the need to provide concrete usage examples. Several reviewers suggested intentionally incorporating elements like data preprocessing activities to encourage diverse approaches and outcome variations among students.
The detailed specifications of the finalized AI education datasets, reflecting expert interview outcomes, are presented in Table 14.
In the ‘Mosquito activity index’ dataset, some data fields were found to contain uniformly input values during the data collection process. While some experts recommended preprocessing these values before providing the dataset to students, we ultimately preserved the uniformly input values to facilitate practical data preprocessing exercises in educational settings44.
For the ‘Baseball game results’ dataset, we removed the ‘Team name’ column containing categorical information and revised variable names based on expert recommendations to enhance student comprehension.
The ‘T-shirt sizes’ dataset was recognized as particularly suitable for introductory AI education, especially for transparent understanding of decision tree models. Experts noted that variables like BMI showed high correlation with other factors, potentially causing multicollinearity issues if used as independent variables. Since addressing this through preprocessing might exceed students’ current capabilities, we simplified the dataset to essential ‘Height’ and ‘Weight’ variables. Additionally, we modified some data points to create overlapping size categories, addressing concerns about excessive model accuracy from overly distinct clusters.
For the ‘Earthquake information’ dataset, we addressed structural simplicity concerns by reintroducing preprocessed data points below magnitude 2 (previously excluded) and structuring the dataset to demonstrate clustering differences through preprocessing activities.
Review of AI modeling accuracy
The AI education datasets developed through a constructivist lens, which are closely connected to students’ daily lives, must be effectively utilized for their intended educational purposes and should ultimately lead to the development of integrated intelligent systems as tangible outcomes of learners’ computational activities22. Prior to implementation, it is essential to evaluate the accuracy and usability of outputs – key factors that often hinder effective education using real-world data19. To address this, we conducted comprehensive testing of the developed datasets through modeling and evaluation using appropriate performance metrics including accuracy measures.
The ‘Mosquito activity index’ dataset is designed for linear regression analysis using continuous dependent and independent variables. In the Entry programming environment, setting one dependent and one independent variable allows visual confirmation of results, significantly enhancing students’ understanding of AI modeling principles. We selected ‘average mosquito activity index’ as the dependent variable and ‘average ground temperature’ as the independent variable based on their statistically significant correlation, implementing the model using Scikit-Learn’s LinearRegression. To validate the modeling results, we visualized the data and regression line as shown in Fig. 6, reserving 20% of the data for testing. We employed Mean Squared Error (MSE) and R-squared (R²) values, standard metrics for linear regression accuracy assessment, with results detailed in Table 15. Notably, we compared model accuracy between the original dataset containing uniformly input values and its preprocessed version to validate our initial dataset construction rationale.
The comparative analysis revealed enhanced performance on test data after preprocessing, demonstrating that the refined linear model exhibits greater generalizability and explanatory power (R²-Test = 0.81). This dataset’s structure allows for modeling with various combinations of two or more independent variables, enabling comparative analysis of results and encouraging diverse student outcomes through multiple analytical approaches. These characteristics confirm the dataset’s effectiveness for both linear regression applications and comprehensive education about regression techniques, including preprocessing considerations.
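The evaluation described above (a single-feature least-squares fit, a 20% hold-out, and MSE and R² on the held-out rows) can be sketched as follows. The sketch uses plain Python with the closed-form least-squares solution rather than Scikit-Learn's LinearRegression, and it runs on synthetic stand-in data; the variable names and generated values are assumptions, not the study's dataset or results.

```python
import random


def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse_and_r2(xs, ys, slope, intercept):
    """Mean squared error and coefficient of determination on (xs, ys)."""
    preds = [slope * x + intercept for x in xs]
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return ss_res / len(ys), 1.0 - ss_res / ss_tot


# Synthetic stand-in data (NOT the study's data): index rises with temperature.
rng = random.Random(0)
temps = [rng.uniform(10.0, 30.0) for _ in range(200)]
index = [3.0 * t + 5.0 + rng.gauss(0.0, 4.0) for t in temps]

# Hold out 20% of the rows for testing, as in the text.
order = list(range(len(temps)))
rng.shuffle(order)
split = int(len(order) * 0.8)
train_idx, test_idx = order[:split], order[split:]
x_train = [temps[i] for i in train_idx]
y_train = [index[i] for i in train_idx]
x_test = [temps[i] for i in test_idx]
y_test = [index[i] for i in test_idx]

slope, intercept = fit_simple_linear_regression(x_train, y_train)
test_mse, test_r2 = mse_and_r2(x_test, y_test, slope, intercept)
print(f"slope={slope:.2f} intercept={intercept:.2f}")
print(f"test MSE={test_mse:.2f} test R2={test_r2:.2f}")
```

Running the same two functions on the raw table and on a preprocessed copy gives the kind of side-by-side comparison of test MSE and R² that the text reports.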
The ‘Baseball game results’ dataset proves suitable for binary classification tasks using various independent variables to predict game outcomes. Within the Entry programming environment, we implemented binary classification through TensorFlow, offering optional use of the Adam optimizer or the SGD (Stochastic Gradient Descent) optimizer. Through correlation analysis and variance inflation factor examination, we selected six independent variables (‘runs scored’, ‘triples’, ‘home runs’, ‘stolen bases’, ‘strikeouts’, and ‘double plays’) while excluding those showing multicollinearity. Using the Keras framework, we constructed a neural network comprising a single fully connected layer with 32 neurons and a Sigmoid activation function. We evaluated both optimization approaches by reserving 20% of data for testing, visualizing accuracy/loss trajectories in Fig. 7. To ensure rigorous validation, we employed comprehensive metrics including accuracy, precision, recall, and F1-Score, supplemented by averaged results from 1,000 iterative modeling trials as detailed in Table 16.
The analysis revealed consistently high accuracy across all available optimization functions in the Entry programming environment. The 1,000 iterative measurements demonstrated robust mean accuracy with low standard deviation, confirming the dataset’s effectiveness for teaching binary classification concepts while allowing students to freely configure independent variables and explore diverse modeling approaches.
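The four evaluation metrics named above can be computed directly from predicted and actual labels. The following is a minimal sketch in plain Python; the win/loss labels are hypothetical and do not come from the study, and the function name is illustrative.

```python
def binary_classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = win, 0 = loss.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
predicted = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]

metrics = binary_classification_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Averaging these values over repeated train/test splits, as the study does over 1,000 trials, shows how stable the scores are across runs.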
The ‘T-shirt sizes’ dataset is optimized for multiclass classification using categorical dependent variables. The Entry environment implements CART (Classification and Regression Tree) methodology for this purpose, where we designate the categorical ‘t-shirt size’ variable as the dependent feature and reserve 20% of data for testing. Using Scikit-Learn’s DecisionTreeClassifier, we established modeling parameters by setting the minimum leaf node count to 5 (matching the unique category count in the dependent variable) and systematically increasing maximum tree depth from 1 to 10. For each depth configuration, we performed 1,000 modeling iterations to calculate mean, maximum, and minimum accuracy values, as detailed in Table 17.
Analysis of the decision tree models revealed that maximum tree depth plateaued at 7, with no further depth increases observed beyond this threshold. Accuracy evaluation demonstrated two distinct patterns: shallow trees (depth = 1) showed limited classification capability across all dependent variables (accuracy = 0.42), while deeper configurations (depth ≥ 4) achieved peak performance (accuracy = 0.87). This progression confirms the ‘T-shirt sizes’ dataset’s effectiveness for teaching decision tree principles and implementing multiclass classification models.
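The depth sweep described above can be sketched with Scikit-Learn's DecisionTreeClassifier. This is an illustration under stated assumptions: the data are synthetic stand-ins (five hypothetical size labels with overlapping height and weight distributions), `min_samples_leaf=5` is only one possible reading of "minimum leaf node count", and a single run per depth replaces the study's 1,000 iterations.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
sizes = ["XS", "S", "M", "L", "XL"]

# Synthetic stand-in data (NOT the study's data): each size has a centre in
# (height cm, weight kg) space, with noise so neighbouring sizes overlap.
centres = np.array(
    [[150, 42], [158, 50], [166, 58], [174, 66], [182, 74]], dtype=float
)
X = np.vstack([c + rng.normal(0.0, 3.0, size=(100, 2)) for c in centres])
y = np.repeat(sizes, 100)

# Reserve 20% of the rows for testing, keeping class proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

accuracy_by_depth = {}
for depth in range(1, 11):
    # min_samples_leaf=5 is an assumption about the paper's leaf setting.
    tree = DecisionTreeClassifier(
        max_depth=depth, min_samples_leaf=5, random_state=0
    )
    tree.fit(X_train, y_train)
    accuracy_by_depth[depth] = (tree.score(X_test, y_test), tree.get_depth())

for depth, (acc, actual_depth) in accuracy_by_depth.items():
    print(f"max_depth={depth:2d} actual_depth={actual_depth:2d} accuracy={acc:.2f}")
```

A depth-1 tree has only two leaves, so it can predict at most two of the five sizes; that structural limit is why very shallow trees score poorly on a five-class target regardless of the data.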
The ‘Earthquake information’ dataset serves as an unsupervised learning resource featuring magnitude estimates for seismic intensity and geospatial coordinates (latitude/longitude) for cluster analysis. Using the Entry platform’s k-Means implementation with Scikit-Learn, we conducted cluster modeling experiments with varying group quantities (2–9 clusters). To objectively determine optimal clustering, we calculated inertia values—the sum of squared distances between cluster centers and their member points. We performed comparative analysis using both raw data and a preprocessed subset containing only seismically significant events (magnitude ≥ 2.0), with visualization results shown in Fig. 8.
Visual analysis of inertia values revealed distinct clustering patterns and centroid positions between preprocessed and raw data when using 5–7 clusters. This demonstrates the dataset’s educational value for implementing AI-driven decision-making processes in classroom settings, as students can critically compare different clustering outcomes. The dataset’s effectiveness for cluster modeling education was thereby confirmed.
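The comparison of inertia across cluster counts for raw and preprocessed data can be sketched with Scikit-Learn's KMeans. The catalogue below is synthetic, the three hotspot coordinates are hypothetical, and clustering on latitude and longitude only is an assumption; none of the values are the study's data or results.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic stand-in catalogue (NOT the study's data): latitude/longitude
# pairs drawn around three hypothetical hotspots, plus a random magnitude.
hotspots = np.array([[35.8, 129.2], [36.1, 127.5], [33.4, 126.5]])
coords = np.vstack([h + rng.normal(0.0, 0.3, size=(150, 2)) for h in hotspots])
magnitude = rng.uniform(0.5, 5.0, size=len(coords))


def inertia_by_k(points, k_values):
    """Fit k-means for each k and return {k: inertia}."""
    return {
        k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(points).inertia_
        for k in k_values
    }


k_values = range(2, 10)  # 2 to 9 clusters, as in the text
raw_inertia = inertia_by_k(coords, k_values)

# Preprocessed subset: keep only events of magnitude 2.0 or greater.
filtered_coords = coords[magnitude >= 2.0]
filtered_inertia = inertia_by_k(filtered_coords, k_values)

for k in k_values:
    print(f"k={k} raw={raw_inertia[k]:.1f} filtered={filtered_inertia[k]:.1f}")
```

Plotting both inertia series against k gives an elbow-style chart in which students can compare how filtering changes the apparent number of clusters.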
Maintenance of AI education datasets
To enhance accessibility and educational utility of the developed datasets, we implemented distribution through the Entry programming platform following standardized procedures. As shown in Fig. 9, educators and students can access datasets through Entry’s practice interface using the workflow: [Table] → [Load Data Table] → [Add Table], ensuring consistency with other educational datasets available on the platform.
The dataset interface incorporates expert recommendations from the testing phase, particularly addressing dataset quality assessment and practical application requirements. As demonstrated in Fig. 10, each dataset includes: (1) Basic description, (2) Key variable explanations, (3) Column/row metadata, and (4) Usage examples—implementing expert guidance that “datasets should be easily understandable from a quality assessment perspective” and “must enable creation of complete outputs reflecting real-world activities”36.
We established a maintenance framework featuring multiple feedback channels: an integrated bulletin board within Entry and a dedicated web portal with usage guides. This infrastructure allows users to submit improvement suggestions, which researchers can implement through collaborative review with the Connect Foundation (Entry’s governing organization). Approved modifications undergo immediate integration into the programming environment through automated deployment pipelines.
It is this government’s moral mission to give every child in Britain the best start in life | Bridget Phillipson
Like many young mothers, Jenna was unsure where to start. But that’s where her local family support service came in. Offering breastfeeding advice, a space to come together with other parents and for her son Billy to play with other babies, it reassured Jenna that she was on the right track – and crucially, that Billy was set up to achieve when he got to school.
Jenna’s service was the first of Labour’s renowned Sure Start centres in Washington, my home town in north-east England. I knew it well: before becoming an MP I ran a refuge nearby for women fleeing domestic violence. I linked up the women who used our refuge with Sure Start. It was a lifeline for those women who, despite everything, were determined to give their children the very best start in life.
But, sadly, after 14 years of Conservative government, stories like Jenna’s, and those of the many women who were offered that lifeline, are much less common. Funding was stripped out of Sure Start centres and services scrapped in rebranded family hubs. Today, 65 councils, and the children and families who live under their authority, have missed out on recent funding. Many more are lacking the childcare places that so many families in our country need.
For every Jenna, there are a host of other young mothers, and families, who missed out on crucial pillars of support, whose children have fallen behind before they have even started school.
One in three five-year-olds enters year 1 without the basic skills – like holding a pencil and writing their own name – that they need to make the most of what education has to offer them. Some haven’t reached essential milestones such as putting on a coat or going to the toilet by themselves.
For the most vulnerable children, the situation is graver. Just over half of those eligible for free school meals reach a good level of development at age five. For children in social care, it’s just over one in three. And for children with special educational needs, it’s one in five.
The gap in achievement we see between our poorest and most affluent children at 16 is baked in before they even start school, creating a vicious cycle of lost life chances that’s all too visible in the shameful number of young people not earning or learning.
It’s this government’s moral mission to bridge that gap, but to do it we must build an education system where all children can achieve and thrive, starting from day one.
That is why reforming the early years education system is my number one priority. And it’s why, just 12 months after Labour entered government, I am so proud to be setting out our strategy to give every child the best start in life.
Backed by £1.5bn over the next three years, it brings together the best of Sure Start, health services, community groups and the early years sector, with the shared goal of setting up children to succeed when they get to school.
We will create 1,000 Best Start Family Hubs, at least one in every council area, invest a record £9bn in funded childcare and early years places – and hundreds of millions to improve quality in early years settings and reception classes.
These hubs will bring disjointed support systems into one place, allowing thousands of families to access help with anything from birth registration to breastfeeding, from housing support to children’s speech and language development.
The strategy takes inspiration from around the world. I’ve been really impressed by what happens in countries I’ve visited, such as Estonia, where early education and family support are bound tightly together with stellar results. Its disadvantage gap is negligible because children get to school ready to learn. Its children outperform those from much larger, wealthier countries in international rankings. The country punches above its weight economically as a result.
At the heart of our strategy is the recognition that for our country to succeed in a fast-changing world, it is not enough for only some children to do well in education: every child must have the opportunity and the tools not just to get by, but to get on in life.
Working people have always known that education is the best way to break the link between their background and what they go on to achieve, the route to prosperity not just for individuals, but for all of society. It’s a common thread that runs through every Labour government: that we must use education to spread the freedoms that today too few enjoy, so that tomorrow they are common to us all.
It’s the essence of our politics, the socialism of extending freedom to allow working people to choose their own path to fulfilment: to get better employment, to achieve a better quality of life or even to start a family.
This strategy is a watershed moment for our government, but more importantly for every single family who needs our support. To make it a reality, we will begin unprecedented collaboration between parents, councils, nurseries, childminders, schools and government, enmeshing family support, early education and childcare so deeply that no rightwing government can ever unpick it, as the Tories did with Sure Start over 14 long years.
Our plan for change will ensure Jenna’s experience – and Billy’s future success – is shared by every family and every child in our country.
Labour vows to protect Sure Start-type system from any future Reform assault
Labour will aim to embed a Sure Start-type system of help for deprived children and families so deeply and completely into the state that a future Reform or Conservative government would not be able to dismantle it, Bridget Phillipson has pledged.
Arguing that efforts to close the attainment gap between poorer and richer children were the government’s “moral mission”, the education secretary promised to build on this weekend’s announcement of a new wave of family hubs across England, an effective successor to Sure Start.
Sure Start, a network of centres offering integrated services for the under-fives and their families, launched in 1998 under the last Labour government, and was seen as one of its major successes, with one study saying it generated longer-term savings worth twice the system’s cost.
But much of Sure Start was dismantled amid massive spending cuts by the Conservatives. The new policy of family hubs will commit £500m to opening 1,000 centres from April 2026.
In an article for the Guardian, Phillipson said the centres should become part of a wider network of help for families, one that would not just be impossible to take apart, but would become so popular that it would be an untouchable “third rail” of British politics.
The family hubs strategy was “a watershed moment” for both government and families, Phillipson wrote.
She went on: “To make it a reality we will begin unprecedented collaboration between parents, councils, nurseries, childminders, schools and government, enmeshing family support, early education, and childcare so deeply that no rightwing government can ever unpick it, as the Tories did with Sure Start over 14 long years.
“We will ensure any such assault on the system will become the new third rail of British politics.”
In a follow-up announcement to the plan for family hub centres, which are intended to be created in every council area in England by 2028, Phillipson’s department has also announced plans to pay qualified early years teachers to work in the most deprived areas, where their work could have the greatest impact.
Currently, the Department for Education says, just one in 10 nurseries have a qualified early years teacher. The incentive scheme will involve a tax-free payment of £4,500 to early years teachers who take a job in a nursery in one of the 20 most disadvantaged communities in England.
In another change, the education watchdog Ofsted will inspect any new early years providers within 18 months of opening, with subsequent inspections taking place at least once every four years, rather than the current six.
Sure Start and its successor programmes have a near-totemic role in the narrative of the modern Labour party, with Angela Rayner, its deputy leader, saying her life as a teenage mother and that of her son were turned around by her local centre, which offered her a parenting course.
In her Guardian article, Phillipson recounted working closely with the first-ever Sure Start centre in Washington, Tyne and Wear, when she ran a refuge for women fleeing domestic violence, before she entered politics.
“It was a lifeline for those women who, despite everything, were determined to give their children the very best start in life,” she wrote. “The gap in achievement we see between our poorest and most affluent children at 16 is baked in before they even start school, creating a vicious cycle of lost life chances that’s all too visible in the shameful number of young people not earning or learning.”
Speaking in interviews on Sunday morning, Phillipson said Labour was also committed to tackling child poverty, but said the fiscal cost of Downing Street’s U-turn on changes to welfare last week would make it harder to implement other policies such as potentially scrapping the two-child benefit cap.
America’s future depends on more first-generation students from underestimated communities earning an affordable bachelor’s degree
I recently stood before hundreds of young people in California’s Central Valley; more than 60 percent were on that day becoming the first in their family to earn a bachelor’s degree.
Their very presence at University of California, Merced’s spring commencement ceremony disrupted a major narrative in our nation about who college is for — and the value of a degree.
Many of these young people arrived already balancing jobs, caregiving responsibilities and family obligations. Many were Pell Grant-eligible and came from communities that are constantly underestimated and where a higher education experience is a rarity.
These students graduated college at a critical moment in American history: a time when the value of a bachelor’s degree is being called into question, when public trust in higher education is vulnerable and when supports for first-generation college students are eroding. Yet an affordable bachelor’s degree remains the No. 1 lever for financial, professional and social mobility in this country.
A recent Gallup poll showed that the number of Americans who have a great deal of confidence in higher education is dwindling, with a nearly equal number responding that they have little to none. In 2015, when Gallup first asked this question, those expressing confidence outnumbered those without by nearly six to one.
There is no doubt that higher education must continue to evolve — to be more accessible, more relevant and more affordable — but the impact of a bachelor’s degree remains undeniable.
And the bigger truth is this: America’s long-term strength — its economic competitiveness, its innovation pipeline, its social fabric — depends on whether we invest in the education of the young people who reflect the future of this country.
There are many challenges for today’s workforce, from a shrinking talent pipeline to growing demands in STEM, healthcare and the public sector. These challenges can’t be solved unless we ensure that more first-generation students and those from underserved communities earn their degrees in affordable ways and leverage their strengths in ways they feel have purpose.
Those of us in education must create conditions in which students’ talent is met with opportunity and higher education institutions demonstrate that they believe in the potential of every student who comes to their campuses to learn.
UC Merced is a fantastic example of what this can look like. The youngest institution in the University of California system, it was recently designated a top-tier “R1” research university. At the same time, it earned a spot on Carnegie’s list of “Opportunity Colleges and Universities,” a new classification that recognizes institutions based on the success of their students and alumni. It is one of only 21 institutions in the country to be nationally ranked for both elite research and student success and is proving that excellence and equity can, and must, go hand in hand.
In too many cases, students who make it to college campuses are asked to navigate an educational experience that wasn’t built with their lived experiences and dreams in mind. In fact, only 24 percent of first-generation college students earn a bachelor’s degree in six years, compared to nearly 59 percent of students who have a parent with a bachelor’s. This results in not just a missed opportunity for individual first-generation students — it’s a collective loss for our country.
The graduates I spoke to in the Central Valley that day will become future engineers, climate scientists, public health leaders, artists and educators. Their bachelor’s degrees equip them with critical thinking skills, confidence and the emotional intelligence needed to lead in an increasingly complex world.
Their future success will be an equal reflection of their education and the qualities they already possess as first-generation college graduates: persistence, focus and unwavering drive. Because of this combination, they will be the greatest contributors to the future of work in our nation.
This is a reality I know well. As the Brooklyn-born daughter of Dominican immigrants, I never planned to go away from home to a four-year college. My father drove a taxi, and my mother worked in a factory. I was the first in my family to earn a bachelor’s degree. I attended college as part of an experimental program to get kids from neighborhoods like mine into “top” schools. When it was time for me to leave for college, my mother and I boarded a bus with five other students and their moms for a 26-hour ride to Vanderbilt University in Nashville, Tennessee.
Like so many first-generation college students, I carried with me the dreams and sacrifices of my family and community. I had one suitcase, a box of belongings and no idea what to expect at a place I’d never been to before. That trip — and the bachelor’s degree I earned — changed the course of my life.
First-generation college students from underserved communities reflect the future of America. Their success is proof that the American Dream is not only alive but thriving. And right now, the stakes are national, and they are high.
That is why we must collectively remove the obstacles to first-generation students’ individual success and our collective success as a nation. That’s the narrative that we need to keep writing — together.
Shirley M. Collado is president emerita at Ithaca College and the president and CEO of College Track, a college completion program dedicated to democratizing potential among first-generation college students from underserved communities.
Contact the opinion editor at opinion@hechingerreport.org.
This story about first-generation students was produced by The Hechinger Report, a nonprofit, independent news organization focused on inequality and innovation in education.