
Education

Development and effectiveness verification of AI education data sets based on constructivist learning principles for enhancing AI literacy


Analysis of requirements for AI education datasets

To analyze requirements for AI education datasets, we first investigated current dataset usage trends. The UCI ML Repository, which provides various types of datasets for AI modeling research, offers 664 different datasets. As shown in Fig. 3, users can check information such as appropriate modeling algorithms, number of variables, and access frequency. Based on access frequency, the most frequently used datasets were identified as ‘Iris’, ‘Dry Bean Dataset’, ‘Heart Disease’, ‘Rice’, and ‘Adult’34.

Fig. 3

Usage of key datasets from the UCI ML repository.

To analyze the usage of specific datasets for educational purposes, we examined dataset usage in “Entry,” South Korea’s representative educational programming language, which provides practical functions for AI education. As shown in Fig. 4, Entry is a visual programming language, which reduces the difficulty of learning syntax and helps maintain students’ interest while they learn the basic concepts of AI35,36. Entry also provides basic resources for AI education and offers 19 datasets through its “Data Analysis” feature. On the Entry platform (https://playentry.org), the datasets can be viewed by selecting the following menu options in sequence: [Create] – [Analyze data] – [Load tables] – [Add tables] – [Select tables]. The platform supports AI modeling practices such as data visualization, linear regression, binary classification, multi-class classification, and clustering, and provides datasets with specified AI modeling purposes, such as ‘Iris’, ‘Boston Housing’, ‘Palmer Penguins’, and ‘Titanic’37.

Fig. 4

Educational programming language Entry and datasets in the visual environment.

We analyzed program outputs utilizing AI modeling features and datasets in Entry between December 31, 2020, and December 31, 2021, deriving the total dataset usage count from these artifacts. To analyze overall usage patterns, we visualized the utilization status of the top 10 most frequently used datasets as shown in Fig. 5.

Fig. 5

Usage of AI education datasets in Entry.

The top 10 most frequently used datasets in Entry were led, in descending order, by ‘Iris’, ‘Population by City’, ‘Boston Housing’, and ‘Consumer Price Index’. Among the datasets intended for AI modeling, the chart shows consistent use of Iris and Boston Housing, with Iris used 7499 times and Boston Housing 6619 times during the study period. These counts are far higher than those of the other AI modeling datasets, none of which ranked within the top 10.

Both the UCI ML repository (a platform providing datasets for AI modeling) and Entry (an educational programming language platform) showed Iris as the most utilized dataset. The Iris dataset, composed of continuous independent variables and a categorical dependent variable, is particularly suitable for multiclass classification tasks. Its high usage frequency suggests it serves as a representative dataset for AI modeling practice.

While widely used datasets for AI modeling practice offer easily accessible examples for various AI modeling and computing activities, they have limitations: they lack relevance to students’ daily lives, are difficult to connect to real-world contexts, and cannot provide authentic practical experiences. Notably, the Boston dataset has been discontinued in major machine learning libraries like scikit-learn due to ethical concerns, necessitating the development of alternative datasets37.

Design and implementation of AI education datasets

This study develops datasets for AI education by benchmarking Entry, a widely adopted educational programming platform with significant classroom impact. Through analysis of Entry’s technical specifications, we established requirements for datasets suitable for supervised and unsupervised learning implementations, as detailed in Table 10. The platform supports essential modeling algorithms including Linear Regression, Logistic Regression, k-Nearest Neighbors (kNN), Support Vector Machines (SVM), Classification and Regression Trees (CART), and Clustering. For model configuration, users can define up to 6 continuous variables as independent features, while dependent variables may incorporate either continuous or categorical data types depending on the learning task.

Table 10 Requirements of modeling methods and variables available in Entry.

To provide AI educational datasets contextualized to students’ daily lives, we explored and structured preliminary dataset drafts as shown in Table 11, incorporating contextual frameworks from PISA 2022 Mathematics. We investigated public data platforms and diverse dataset sources while verifying appropriate variable inclusion for respective modeling methods. All datasets explicitly specify applicable licenses for educational purposes, with particular emphasis on exploring publicly available data centered around daily life topics likely to engage student interest. For certain datasets requiring specialized context, researchers directly collected and structured original data to complete the draft dataset compositions.

Table 11 Details of dataset development draft.

The draft datasets were systematically restructured according to AI modeling methodologies and educational objectives to facilitate effective utilization in AI education, as detailed in Table 12.

For Linear Regression datasets, it was necessary to ensure completeness by specifying independent and dependent variables. The Seoul Mosquito Activity Index and Synoptic Meteorological Observation datasets were joined by date after establishing variable relationships through synthesis of prior research on meteorological environments and mosquito populations42,43. Key variables were extracted and reorganized to enhance student comprehension and provide successful modeling experiences16.
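The date-based join described above can be sketched as follows. This is only an illustration using the Python standard library: the column names (`mosquito_index`, `avg_ground_temp`, `rainfall_mm`) and all sample values are hypothetical and are not taken from the actual Seoul or meteorological data.

```python
import csv
import io

# Hypothetical sample rows; column names and values are illustrative only.
MOSQUITO_CSV = """date,mosquito_index
2021-07-01,55.2
2021-07-02,61.0
2021-07-03,58.4
"""
WEATHER_CSV = """date,avg_ground_temp,rainfall_mm
2021-07-01,24.1,0.0
2021-07-02,25.3,3.5
2021-07-04,23.8,12.0
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def inner_join_by_date(left, right):
    """Keep only dates present in both tables and merge their columns."""
    right_by_date = {row["date"]: row for row in right}
    joined = []
    for row in left:
        match = right_by_date.get(row["date"])
        if match is not None:
            joined.append({**row, **match})
    return joined


joined = inner_join_by_date(read_rows(MOSQUITO_CSV), read_rows(WEATHER_CSV))
for row in joined:
    print(row)
```

An inner join is used here so that only days with both a mosquito index and a weather observation remain, which keeps every row complete for regression. Whether the authors dropped or imputed unmatched days is not stated in the text.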

The baseball game results dataset was collected from sports information websites, then completely synthesized using statistical simulation methods based on original data to minimize team-specific bias and enhance objectivity44. The dataset was further restructured by extracting key variables influencing the dependent variable for student accessibility.

The body measurements and t-shirt size dataset, initially collected from students, was replaced with a fully synthetic version, produced with synthetic data generation techniques, to address privacy concerns and bring the dataset to an appropriate size44. This approach enhanced objectivity while resolving the limited scale of the original data.

The earthquake location dataset was restructured by removing entries below magnitude 2, which are classified as non-impactful seismic events based on domain expertise, to improve size appropriateness41.

Table 12 Dataset reconstruction metrics and methods.

Testing AI education datasets

Expert review

The evaluation results of the draft datasets through data quality assessment and authentic activity characteristics analysis, along with group interview findings, are summarized in Table 13.

Table 13 Expert interview key comments.

Regarding overall feedback on the datasets, experts frequently noted that the developed AI education datasets showed high applicability due to their relevance to students’ daily lives, while emphasizing the need to provide concrete usage examples. Several reviewers suggested intentionally incorporating elements like data preprocessing activities to encourage diverse approaches and outcome variations among students.

The detailed specifications of the finalized AI education datasets, reflecting expert interview outcomes, are presented in Table 14.

In the ‘Mosquito activity index’ dataset, some data fields were found to contain uniformly input values during the data collection process. While some experts recommended preprocessing these values before providing the dataset to students, we ultimately preserved the uniformly input values to facilitate practical data preprocessing exercises in educational settings44.

For the ‘Baseball game results’ dataset, we removed the ‘Team name’ column containing categorical information and revised variable names based on expert recommendations to enhance student comprehension.

The ‘T-shirt sizes’ dataset was recognized as particularly suitable for introductory AI education, especially for transparent understanding of decision tree models. Experts noted that variables like BMI showed high correlation with other factors, potentially causing multicollinearity issues if used as independent variables. Since addressing this through preprocessing might exceed students’ current capabilities, we simplified the dataset to essential ‘Height’ and ‘Weight’ variables. Additionally, we modified some data points to create overlapping size categories, addressing concerns about excessive model accuracy from overly distinct clusters.

For the ‘Earthquake information’ dataset, we addressed structural simplicity concerns by reintroducing preprocessed data points below magnitude 2 (previously excluded) and structuring the dataset to demonstrate clustering differences through preprocessing activities.

Table 14 Finalized AI education dataset.

Review AI modeling accuracy

The AI education datasets developed through a constructivist lens, which are closely connected to students’ daily lives, must be effectively utilized for their intended educational purposes and should ultimately lead to the development of integrated intelligent systems as tangible outcomes of learners’ computational activities22. Prior to implementation, it is essential to evaluate the accuracy and usability of outputs – key factors that often hinder effective education using real-world data19. To address this, we conducted comprehensive testing of the developed datasets through modeling and evaluation using appropriate performance metrics including accuracy measures.

The ‘Mosquito activity index’ dataset is designed for linear regression analysis using continuous dependent and independent variables. In the Entry programming environment, setting one dependent and one independent variable allows visual confirmation of results, significantly enhancing students’ understanding of AI modeling principles. We selected ‘average mosquito activity index’ as the dependent variable and ‘average ground temperature’ as the independent variable based on their statistically significant correlation, implementing the model using Scikit-Learn’s LinearRegression. To validate the modeling results, we visualized the data and regression line as shown in Fig. 6, reserving 20% of the data for testing. We employed Mean Squared Error (MSE) and R-squared (R²) values, standard metrics for linear regression accuracy assessment, with results detailed in Table 15. Notably, we compared model accuracy between the original dataset containing uniformly input values and its preprocessed version to validate our initial dataset construction rationale.

Fig. 6

Visualization results of linear regression of mosquito activity index dataset.

Table 15 Linear regression accuracy measurement results.

The comparative analysis revealed enhanced performance on test data after preprocessing, demonstrating that the refined linear model exhibits greater generalizability and explanatory power (R²-Test = 0.81). This dataset’s structure allows for modeling with various combinations of two or more independent variables, enabling comparative analysis of results and encouraging diverse student outcomes through multiple analytical approaches. These characteristics confirm the dataset’s effectiveness for both linear regression applications and comprehensive education about regression techniques, including preprocessing considerations.
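The evaluation procedure for the linear model (one independent variable, 20% of the data held out, MSE and R² on the test portion) can be sketched in plain Python. The data below is randomly generated and purely hypothetical, and the closed-form least-squares fit stands in for Scikit-Learn's LinearRegression, so none of the numbers correspond to Table 15.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(ys, preds):
    """Mean squared error between observed and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


rng = random.Random(0)
# Hypothetical data: ground temperature (x) and mosquito activity index (y).
temps = [rng.uniform(10.0, 30.0) for _ in range(200)]
activity = [3.0 * t + 5.0 + rng.gauss(0.0, 4.0) for t in temps]

# The samples are already in random order, so a simple slice gives an 80/20 split.
split = int(len(temps) * 0.8)
x_train, x_test = temps[:split], temps[split:]
y_train, y_test = activity[:split], activity[split:]

slope, intercept = fit_line(x_train, y_train)
test_pred = [slope * x + intercept for x in x_test]
test_mse = mse(y_test, test_pred)
test_r2 = r_squared(y_test, test_pred)
print(f"slope={slope:.2f} intercept={intercept:.2f} "
      f"MSE={test_mse:.2f} R2={test_r2:.2f}")
```

Running the same evaluation once on the raw table and once on a cleaned copy would give the kind of before/after comparison the paragraph describes.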

The ‘Baseball game results’ dataset is suitable for binary classification tasks that use various independent variables to predict game outcomes. Within the Entry programming environment, binary classification is implemented through TensorFlow, with a choice of the Adam optimizer or the SGD (Stochastic Gradient Descent) optimizer. Through correlation analysis and variance inflation factor examination, we selected six independent variables (‘runs scored’, ‘triples’, ‘home runs’, ‘stolen bases’, ‘strikeouts’, and ‘double plays’) while excluding those showing multicollinearity. Using the Keras framework, we constructed a neural network comprising a single fully connected layer with 32 neurons and a Sigmoid activation function. We evaluated both optimizers by reserving 20% of the data for testing, and visualized the accuracy and loss trajectories in Fig. 7. To ensure rigorous validation, we employed accuracy, precision, recall, and F1-Score, supplemented by averaged results from 1,000 iterative modeling trials, as detailed in Table 16.

Fig. 7

Accuracy and loss curves of the binary classification model by optimizer.

Table 16 Accuracy according to the optimization function of the binary classification model.

The analysis revealed consistently high accuracy across all available optimization functions in the Entry programming environment. The 1,000 iterative measurements demonstrated robust mean accuracy with low standard deviation, confirming the dataset’s effectiveness for teaching binary classification concepts while allowing students to freely configure independent variables and explore diverse modeling approaches.
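For readers unfamiliar with the four metrics reported for the binary classifier, the following sketch computes them from true and predicted labels. The label lists are made up for illustration (1 = win, 0 = loss) and are unrelated to the values in Table 16.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1, with zero-division guarded."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical game outcomes and model predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In a repeated-trial setup like the one described, these metrics would be collected once per trial and then averaged.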

The ‘T-shirt sizes’ dataset is optimized for multiclass classification using categorical dependent variables. The Entry environment implements CART (Classification and Regression Tree) methodology for this purpose, where we designate the categorical ‘t-shirt size’ variable as the dependent feature and reserve 20% of data for testing. Using Scikit-Learn’s DecisionTreeClassifier, we established modeling parameters by setting the minimum leaf node count to 5 (matching the unique category count in the dependent variable) and systematically increasing maximum tree depth from 1 to 10. For each depth configuration, we performed 1,000 modeling iterations to calculate mean, maximum, and minimum accuracy values, as detailed in Table 17.

Table 17 Accuracy according to the decision tree maximum depth hyperparameter.

Analysis of the decision tree models revealed that the actual tree depth plateaued at 7, with no further depth increases observed beyond this threshold. Accuracy evaluation demonstrated two distinct patterns: shallow trees (depth = 1) showed limited classification capability across the categories of the dependent variable (accuracy = 0.42), while deeper configurations (depth ≥ 4) achieved peak performance (accuracy = 0.87). This progression confirms the ‘T-shirt sizes’ dataset’s effectiveness for teaching decision tree principles and implementing multiclass classification models.
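The depth sweep can be illustrated with a small hand-written CART-style classifier. This is a sketch under several assumptions, not the authors' Scikit-Learn setup: the height/weight data is synthetic with three hypothetical size labels, the "minimum leaf" setting is interpreted as a minimum of 5 samples per leaf, and thresholds are taken at observed values rather than midpoints.

```python
import random
from collections import Counter

MIN_LEAF = 5  # assumed meaning: minimum number of samples in each leaf


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [lab for r, lab in zip(rows, labels) if r[f] <= threshold]
            right = [lab for r, lab in zip(rows, labels) if r[f] > threshold]
            if len(left) < MIN_LEAF or len(right) < MIN_LEAF:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, threshold), score
    return best


def build(rows, labels, depth, max_depth):
    """Grow a tree; leaves are label strings, inner nodes are 4-tuples."""
    majority = Counter(labels).most_common(1)[0][0]
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return majority
    f, t = split
    left = [(r, lab) for r, lab in zip(rows, labels) if r[f] <= t]
    right = [(r, lab) for r, lab in zip(rows, labels) if r[f] > t]
    return (f, t,
            build([r for r, _ in left], [lab for _, lab in left], depth + 1, max_depth),
            build([r for r, _ in right], [lab for _, lab in right], depth + 1, max_depth))


def predict(node, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


# Synthetic (height cm, weight kg) rows with noisy, overlapping size labels.
rng = random.Random(0)
rows, labels = [], []
for _ in range(300):
    height = round(rng.uniform(140.0, 190.0))
    weight = round(height - 105 + rng.gauss(0.0, 6.0))
    noisy = height + rng.gauss(0.0, 3.0)
    labels.append("S" if noisy < 155 else "M" if noisy < 170 else "L")
    rows.append((height, weight))

split_at = int(len(rows) * 0.8)  # 80/20 train/test split
accuracy_by_depth = {}
for max_depth in range(1, 7):
    tree = build(rows[:split_at], labels[:split_at], 0, max_depth)
    hits = sum(predict(tree, r) == lab
               for r, lab in zip(rows[split_at:], labels[split_at:]))
    accuracy_by_depth[max_depth] = hits / (len(rows) - split_at)
    print(max_depth, round(accuracy_by_depth[max_depth], 2))
```

A depth-1 tree has only two leaves, so it cannot name all size categories at once. That is the intuition behind the low accuracy reported for shallow trees.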

The ‘Earthquake information’ dataset serves as an unsupervised learning resource featuring magnitude estimates for seismic intensity and geospatial coordinates (latitude/longitude) for cluster analysis. Using the Entry platform’s k-Means implementation with Scikit-Learn, we conducted cluster modeling experiments with varying group quantities (2–9 clusters). To objectively determine optimal clustering, we calculated inertia values—the sum of squared distances between cluster centers and their member points. We performed comparative analysis using both raw data and a preprocessed subset containing only seismically significant events (magnitude ≥ 2.0), with visualization results shown in Fig. 8.

Fig. 8

Visualization of clusters and inertia values through k-Means before and after preprocessing.

Visual analysis of inertia values revealed distinct clustering patterns and centroid positions between preprocessed and raw data when using 5–7 clusters. This demonstrates the dataset’s educational value for implementing AI-driven decision-making processes in classroom settings, as students can critically compare different clustering outcomes. The dataset’s effectiveness for cluster modeling education was thereby confirmed.
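The inertia comparison can be sketched with a minimal k-means (Lloyd's algorithm) written in plain Python. Everything about the data is hypothetical: the three epicentre regions, the magnitudes, and the choice to cluster on latitude and longitude only are assumptions for illustration, and the sweep covers 1 to 5 clusters rather than the 2 to 9 used in the study.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_inertia(points, k, iterations=50, seed=0):
    """Run Lloyd's algorithm and return the final inertia.

    Inertia is the sum of squared distances from each point to its
    nearest cluster centre.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centre to the mean of its members; keep it if empty.
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)


# Hypothetical records: (latitude, longitude, magnitude) around three regions.
rng = random.Random(1)
regions = [(35.8, 129.2), (36.1, 127.5), (33.4, 126.5)]
records = [
    (rng.gauss(lat, 0.2), rng.gauss(lon, 0.2), rng.uniform(1.0, 4.0))
    for lat, lon in regions
    for _ in range(40)
]

raw_points = [(lat, lon) for lat, lon, _ in records]
# Preprocessed subset: keep only events of magnitude 2.0 or greater.
filtered_points = [(lat, lon) for lat, lon, mag in records if mag >= 2.0]

raw_inertia = {k: kmeans_inertia(raw_points, k) for k in range(1, 6)}
filtered_inertia = {k: kmeans_inertia(filtered_points, k) for k in range(1, 6)}
for k in range(1, 6):
    print(k, round(raw_inertia[k], 2), round(filtered_inertia[k], 2))
```

Plotting inertia against the number of clusters for both versions of the data gives the kind of elbow-style comparison shown in the figure.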

Maintenance of AI education datasets

To enhance accessibility and educational utility of the developed datasets, we implemented distribution through the Entry programming platform following standardized procedures. As shown in Fig. 9, educators and students can access datasets through Entry’s practice interface using the workflow: [Table] → [Load Data Table] → [Add Table], ensuring consistency with other educational datasets available on the platform.

Fig. 9

Publishing datasets via Entry.

The dataset interface incorporates expert recommendations from the testing phase, particularly addressing dataset quality assessment and practical application requirements. As demonstrated in Fig. 10, each dataset includes: (1) Basic description, (2) Key variable explanations, (3) Column/row metadata, and (4) Usage examples—implementing expert guidance that “datasets should be easily understandable from a quality assessment perspective” and “must enable creation of complete outputs reflecting real-world activities”36.

Fig. 10

Dataset presented via the [Table] menu of Entry.

We established a maintenance framework featuring multiple feedback channels: an integrated bulletin board within Entry and a dedicated web portal with usage guides. This infrastructure allows users to submit improvement suggestions, which researchers can implement through collaborative review with the Connect Foundation (Entry’s governing organization). Approved modifications undergo immediate integration into the programming environment through automated deployment pipelines.




Labour must keep EHCPs in Send system, says education committee chair | Special educational needs


Downing Street should commit to education, health and care plans (EHCPs) to keep the trust of families who have children with special educational needs, the Labour MP who chairs the education select committee has said.

A letter to the Guardian on Monday, signed by dozens of special needs and disability charities and campaigners, warned against government changes to the Send system that would restrict or abolish EHCPs. More than 600,000 children and young people rely on EHCPs for individual support in England.

Helen Hayes, who chairs the cross-party Commons education select committee, said mistrust among many families with Send children was so apparent that ministers should commit to keeping EHCPs.

“I think at this stage that would be the right thing to do,” she told BBC Radio 4’s Today programme. “We have been looking, as the education select committee, at the Send system for the last several months. We have heard extensive evidence from parents, from organisations that represent parents, from professionals and from others who are deeply involved in the system, which is failing so many children and families at the moment.

“One of the consequences of that failure is that parents really have so little trust and confidence in the Send system at the moment. And the government should take that very seriously as it charts a way forward for reform.

“It must be undertaking reform and setting out new proposals in a way that helps to build the trust and confidence of parents and which doesn’t make parents feel even more fearful than they do already about their children’s future.”

She added: “At the moment, we have a system where all of the accountability is loaded on to the statutory part of the process, the EHCP system, and I think it is understandable that many parents would feel very, very fearful when the government won’t confirm absolutely that EHCPs and all of the accountabilities that surround them will remain in place.”

The letter published in the Guardian is evidence of growing public concern, despite reassurances from the education secretary, Bridget Phillipson, that no decisions have yet been taken about the fate of EHCPs.

Labour MPs who spoke to the Guardian are worried ministers are unable to explain key details of the special educational needs shake-up being considered in the schools white paper to be published in October.

Stephen Morgan, a junior education minister, reiterated Phillipson’s refusal to say whether the white paper would include plans to change or abolish EHCPs, telling Sky News he could not “get into the mechanics” of the changes for now.

However, he said change was needed: “We inherited a Send system which was broken. The previous government described it as lose, lose, lose, and I want to make sure that children get the right support where they need it, across the country.”

Hayes reiterated this wider point, saying: “It is absolutely clear to us on the select committee that we have a system which is broken. It is failing families, and the government will be wanting to look at how that system can be made to work better.

“But I think they have to take this issue of the lack of trust and confidence, the fear that parents have, and the impact that it has on the daily lives of families. This is an everyday lived reality if you are battling a system that is failing your child, and the EHCPs provide statutory certainty for some parents. It isn’t a perfect system … but it does provide important statutory protection and accountability.”




The Trump administration pushed out a university president – its latest bid to close the American mind | Robert Reich


Under pressure from the Trump administration, the University of Virginia’s president of nearly seven years, James Ryan, stepped down on Friday, declaring that while he was committed to the university and inclined to fight, he could not in good conscience push back just to save his job.

The Department of Justice demanded that Ryan resign in order to resolve an investigation into whether UVA had sufficiently complied with Donald Trump’s orders banning diversity, equity and inclusion.

UVA dissolved its DEI office in March, though Trump’s lackeys claim the university didn’t go far enough in rooting out DEI.

This is the first time the Trump regime has pushed for the resignation of a university official. It’s unlikely to be the last.

On Monday, the Trump regime said Harvard University had violated federal civil rights law over the treatment of Jewish students on campus.

On Tuesday, the regime released $175m in previously frozen federal funding to the University of Pennsylvania, after the school agreed to bar transgender athletes from women’s teams and delete the swimmer Lia Thomas’s records.

Let’s be clear: DEI, antisemitism, and transgender athletes are not the real reasons for these attacks on higher education. They’re excuses to give the Trump regime power over America’s colleges and universities.

Why do Trump and his lackeys want this power?

They’re following Hungarian prime minister Viktor Orbán’s playbook for creating an “illiberal democracy” – an authoritarian state masquerading as a democracy. The playbook goes like this:

First, take over military and intelligence operations by purging career officers and substituting ones personally loyal to you. Check.

Next, intimidate legislators by warning that if they don’t bend to your wishes, you’ll run loyalists against them. (Make sure they also worry about what your violent supporters could do to them and their families.) Check.

Next, subdue the courts by ignoring or threatening to ignore court rulings you disagree with. Check in process.

Then focus on independent sources of information. Sue media that publish critical stories and block their access to news conferences and interviews. Check.

Then go after the universities.

Crapping on higher education is also good politics, as demonstrated by the congresswoman Elise Stefanik (Harvard 2006) who browbeat the presidents of Harvard, University of Pennsylvania and MIT over their responses to student protests against Israel’s bombardment of Gaza, leading to several of them being fired.

It’s good politics, because many of the 60% of adult Americans who lack college degrees are stuck in lousy jobs. Many resent the college-educated, who lord it over them economically and culturally.

But behind this cultural populism lies a deeper anti-intellectual, anti-Enlightenment ideology closer to fascism than authoritarianism.

JD Vance (Yale Law 2013) has called university professors “the enemy” and suggested using Orbán’s method for ending “leftwing domination” of universities. Vance laid it all out on CBS’s Face the Nation on 19 May 2024:

Universities are controlled by leftwing foundations. They’re not controlled by the American taxpayer and yet the American taxpayer is sending hundreds of billions of dollars to these universities every single year.

I’m not endorsing every single thing that Viktor Orbán has ever done [but] I do think that he’s made some smart decisions there that we could learn from.

His way has to be the model for us: not to eliminate universities, but to give them a choice between survival or taking a much less biased approach to teaching. [The government should be] aggressively reforming institutions … in a way to where they’re much more open to conservative ideas.

Yet what, exactly, constitutes a “conservative idea?” That dictatorship is preferable to democracy? That white Christian nationalism is better than tolerance and openness? That social Darwinism is superior to human decency?

The claim that higher education must be more open to such “conservative ideas” is dangerous drivel.

So what’s the real, underlying reason for the Trump regime’s attack on education?

Not incidentally, that attack extends to grade school. Trump’s education department announced on Tuesday it’s withholding $6.8bn in funding for schools, and Trump has promised to dismantle the department.

Why? Because the greatest obstacle to dictatorship is an educated populace. Ignorance is the handmaiden of tyranny.

That’s why enslavers prohibited enslaved people from learning to read. Fascists burn books. Tyrants close universities.

In their quest to destroy democracy, Trump, Vance and their cronies are intent on shutting the American mind.

  • Robert Reich, a former US secretary of labor, is a professor of public policy emeritus at the University of California, Berkeley. He is a Guardian US columnist. His newsletter is at robertreich.substack.com




Minister won’t rule out support cuts for children with EHCPs amid Send overhaul – UK politics live | Politics


Minister won’t rule out support cuts for children with EHCPs amid Send overhaul

Good morning. Less than a week after the government had to abandon the main pillar of its welfare reform plans 90 minutes before a vote it was otherwise likely to lose, the government is now facing another revolt over plans to scale back support available to disabled people. But this row affects children, not adults – specifically pupils with special educational needs who have education, health and care plans (EHCPs) that guarantee them extra help in schools.

As Richard Adams and Kiran Stacey report, although the plans have not been announced yet, campaigners are alarmed by reports that access to EHCPs is set to be restricted.


The Times has splashed on the same issue.


The Times quotes an unnamed senior Labour MP saying: “If they thought taking money away from disabled adults was bad, watch what happens when they try the same with disabled kids.”

Stephen Morgan, the early education minister, was giving interviews this morning. He was supposed to be talking about the government’s Giving Every Child the Best Start in Life strategy being announced today, but instead he mostly took questions on EHCPs.

On Times Radio, asked if he could guarantee that every child who currently has an EHCP would continue to keep the same provisions, Morgan would not confirm that. Instead he replied:

We absolutely want to make sure that we deliver better support for vulnerable children and their parents and we’re committed to absolutely getting that right. So it’s a real priority for us.

When it was put to him that he was not saying yes, he replied:

Well of course we want to make sure that every child gets the support that they need. That’s why we’re doing the wider reform and we’re publishing the white paper later this year.

Here is the agenda for the day.

Morning: Nigel Farage attends a meeting of Kent county council where his party, Reform UK, is in power.

11.30am: Downing Street holds a lobby briefing.

11.30am: Keir Starmer and other leaders attend a memorial service at St Paul’s Cathedral in London to commemorate the 20th anniversary of the 7/7 attacks.

2.30pm: Yvette Cooper, the home secretary, takes questions in the Commons.

If you want to contact me, please post a message below the line when comments are open (normally between 10am and 3pm at the moment), or message me on social media. I can’t read all the messages BTL, but if you put “Andrew” in a message aimed at me, I am more likely to see it because I search for posts containing that word.

If you want to flag something up urgently, it is best to use social media. You can reach me on Bluesky at @andrewsparrowgdn.bsky.social. The Guardian has given up posting from its official accounts on X, but individual Guardian journalists are there, I still have my account, and if you message me there at @AndrewSparrow, I will see it and respond if necessary.

I find it very helpful when readers point out mistakes, even minor typos. No error is too small to correct. And I find your questions very interesting too. I can’t promise to reply to them all, but I will try to reply to as many as I can, either BTL or sometimes in the blog.


Unison and Usdaw join other unions in urging Labour to consider introducing wealth tax

As Peter Walker reports, Neil Kinnock, the former Labour leader, said the government should consider a wealth tax, in an interview with Sky News.

Today the Daily Telegraph has splashed on the proposal.


In their story, Ben Riley-Smith, Dominic Penna and Hannah Boland quote five trade unions also supporting a wealth tax.

Some of them are leftwing unions long associated with calls for wealth taxes. Unite told the paper it had “led the campaign for a wealth tax inside and outside the Labour party”. Steve Wright, general secretary of the FBU, told the paper that “introducing a wealth tax to fund public services, a generous welfare state, and workers’ pay must be a priority in the second year of a Labour government”. And Matt Wrack, the former FBU general secretary who is now acting general secretary of Nasuwt, called for an “immediate introduction of a wealth tax”, which he said had “very significant public support”.

But two unions seen as less militant and more aligned with the Labour leadership (which is wary of ‘tax the rich’ rhetoric) have backed the idea. Christina McAnea, general secretary of Unison, told the Telegraph: “A wealth tax would be a much fairer way of raising revenue to invest in public services and grow the economy.”

And Paddy Lillis, the general secretary of Usdaw, said: “We know wealth in this country is with a small number of people. [A wealth tax] is one way of raising money quickly.”


