Learn how to get reliable, consistent and purposeful data in your school, with Caroline Keep.

Teachers are Excel spreadsheet experts; we all spend hours filling them in. How many SLTs, I wonder, download such a spreadsheet, use it to develop a detailed document and then present it as ‘What’s going on in our school’?

Does the data show what’s going on in our schools?

The question I’ve been trying to answer is: how are we using our data?

When I first came into teaching from geotechnical engineering, I quickly realised that how I used data in schools and how I used data as a scientist were very different. Early on, whilst teaching a GCSE cohort, I noticed how much I disliked the common misconception that ‘Excel is a database.’ However, I had to get over myself, as Excel is still the dominant tool we use to store and sort data for day-to-day assessments, formative assessments and anything else we collect. The problem is, there’s an Excel spreadsheet for everything and they all sit in a drive somewhere, labelled something like ‘YEAR9SET1.’

At some point, somebody must compile that data into a bigger spreadsheet. They fight for hours to match the column headings and student numbers, remove typos and change abbreviations to full words (or vice versa) for consistency, before finally trying to summarise it all in a report whose style and rigour depend entirely on the statistical nose and data-mindedness of the poor soul tasked with the job!
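Some of that compiling can be automated. Here is a minimal Python sketch, using only the standard library, that merges two class spreadsheets exported as CSV, maps their inconsistent column headings onto one agreed set and expands abbreviations. The file contents, heading variants and abbreviation table are hypothetical examples, not real school data.

```python
import csv
import io

# Hypothetical CSV exports from two class spreadsheets with inconsistent headings.
YEAR9SET1 = """Student No,Name,Sex,Score
1001,Asha,F,54
1002,Ben,M,61
"""
YEAR9SET2 = """student_number,Student Name,Gender,Mark
1003,Cara,Female,47
"""

# Hypothetical agreed headings, keyed by every variant we have seen (lower case).
HEADING_MAP = {
    "student no": "student_number",
    "student_number": "student_number",
    "name": "name",
    "student name": "name",
    "sex": "sex",
    "gender": "sex",
    "score": "score",
    "mark": "score",
}

# Hypothetical agreed full-word values for abbreviations.
SEX_MAP = {"f": "Female", "m": "Male", "female": "Female", "male": "Male"}


def standardise(text, source):
    """Read one CSV export and return its rows using the agreed headings.

    An unknown heading or value raises KeyError, flagging it for a person to add to the maps.
    """
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {HEADING_MAP[k.strip().lower()]: v.strip() for k, v in raw.items()}
        row["sex"] = SEX_MAP[row["sex"].lower()]
        row["source"] = source  # record which spreadsheet each row came from
        rows.append(row)
    return rows


combined = standardise(YEAR9SET1, "YEAR9SET1") + standardise(YEAR9SET2, "YEAR9SET2")
for row in combined:
    print(row)
```

Failing loudly on an unrecognised heading is a deliberate choice: it is better to stop and extend the map than to silently drop a column.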

Don’t get me wrong: some excellent schools go in for more sophisticated data management tools and use real-time dashboards to monitor data. However, that is not the norm. The norm, as Neil Selwyn pointed out in his 2020 paper ‘Just playing around with Excel and pivot tables’, involves just that: ‘playing around’ and, I might add, throwing in the odd ‘graph’, usually of some kind of ‘average’, and an aggregated (group) one at that!

We ask questions like “How do boys perform compared to girls?” or “How do SEND pupils compare to non-SEND pupils?” Rarely, however, do schools disaggregate (separate) the data beyond basic demographics. I’ve never seen a school drill down to try to answer questions like “How do year 6 non-SEND boys compare to year 6 SEND dyslexic girls who are all doing English, relative to their scores on entry to the school?”
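Disaggregating in this way is mostly a matter of grouping by several attributes at once instead of one. The Python sketch below, using only the standard library, computes mean progress (current score minus score on entry) first by sex alone and then by year, sex, SEND type and subject together. The pupil records are invented for illustration, and "progress = score minus entry score" is an assumed measure.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical pupil records: current score and score on entry to the school.
pupils = [
    {"year": 6, "sex": "M", "send": "None", "subject": "English", "entry": 92, "score": 101},
    {"year": 6, "sex": "M", "send": "None", "subject": "English", "entry": 98, "score": 104},
    {"year": 6, "sex": "F", "send": "Dyslexia", "subject": "English", "entry": 88, "score": 99},
    {"year": 6, "sex": "F", "send": "Dyslexia", "subject": "English", "entry": 90, "score": 97},
    {"year": 6, "sex": "F", "send": "None", "subject": "English", "entry": 95, "score": 100},
]


def progress_by_group(records, keys):
    """Mean progress (score minus entry score) for every combination of the given keys."""
    groups = defaultdict(list)
    for r in records:
        group = tuple(r[k] for k in keys)
        groups[group].append(r["score"] - r["entry"])
    return {g: mean(v) for g, v in groups.items()}


# Aggregated view: boys versus girls only.
print(progress_by_group(pupils, ["sex"]))

# Disaggregated view: year, sex, SEND type and subject together.
for group, avg in progress_by_group(pupils, ["year", "sex", "send", "subject"]).items():
    print(group, avg)
```

In this toy data the girls' single average sits between the dyslexic girls (mean progress 9) and the non-SEND girl (5), so the aggregated figure hides the difference. With real cohorts, disaggregated groups can be very small, so it is worth reporting the group size next to each average.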

Data is powerful when considered thoughtfully – but we tend to look at the surface of the educational onion to see if it’s mouldy or not, rather than peel away the layers to look at what’s underneath.

Part of the reason is a lack of time (teachers are busy), a lack of statistical understanding (who wants more CPD, and in maths at that?) and a lack of need, at least as far as the DfE is concerned. They want certain data, so we give them what they want, rather than using our data to support our students and predict which ones are going to struggle. We can do so much more. And that’s where machine learning and AI come in (more on that later).

Now, to clarify, I love Excel. It’s a valuable tool for transforming data quickly. Nevertheless, after several years, I wanted to take a more thorough look at our data. So, I did an MSc in Data Science alongside teaching during the pandemic; it was the most exhausting year of my life! The mathematics is challenging and the learning is hard. Still, there is something extraordinary about being able to build your own AI models and understand Random Forests. I decided to follow up my MSc with a PhD in Artificial Intelligence (AI) and Machine Learning. It’s early days, but based on my research so far, here are a few things we could learn from data scientists’ methodology.

Delving deeper

I’m not talking about how we assess students. That has long been documented and there is a wealth of research behind it. Data has come a long way, from your Amazon recommendations to the bedlinen at your hotel. As data scientists, we use machine learning models and artificial intelligence in sophisticated ways. I recently researched AI in assessment and how natural language processing models are being used to answer GCSE exam questions; one of them, GPT-3, is even being used to both write and mark essays! It’s scary how good it is.

I am talking about what we do with assessment data after we collect it. You see, in Data Science, we have a complete process and one which, as educators, we might wish to adopt to get the most out of our data.

Here are two such methods: the IBM Model and the CRISP-DM diagram.


The IBM Model


CRISP-DM diagram

And, in more detail, steps 2, 3 and 4 of CRISP-DM (data understanding, data preparation and modelling):

  1. Gather preliminary data: Obtain the required information. What exactly are we intending to gather?
  2. What is the purpose of gathering it? What is the aim? How will you know it’s been achieved?
  3. When? Do we collect it every year, every term or every couple of terms?
  4. What format? Describe the data: its format, the number of records and the field names.
  5. How will it be kept? Control versions: don’t build 12 separate sets; keep them all in one location.
  6. Verify data quality: Is the data of good quality? In the initial collection, make a note of any quality problems.
  7. Choose your data: Decide which data sets will be used and record why each was included or excluded.
  8. Clean data: This is often the most time-consuming operation. If you don’t agree on labels in advance, you’ll fall victim to garbage in, garbage out.
  9. Construct data: Create new attributes that will be useful. For example, if you wanted each person’s BMI, you would derive a new BMI field from the existing height and weight fields.
  10. Integrate data: Combine data from numerous sources to create new data sets.
  11. Format data: Re-format data as necessary; e.g., you might convert numbers stored as text (strings) into numeric values so you can perform mathematical operations on them.
  12. Modelling: Build your models, graphs, charts and other visualisations.
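Steps 9 to 11, with a little cleaning, can be sketched in Python using only the standard library. The example formats numbers stored as text, constructs the BMI field mentioned in step 9 and integrates the result with a second source on a shared student number. The records and field names are hypothetical, and the helper functions are illustrative rather than a standard API.

```python
# Hypothetical raw records from two sources, both keyed by student number.
measurements = [
    {"student_number": "1001", "height_m": "1.52", "weight_kg": "45.0"},
    {"student_number": "1002", "height_m": " 1.60 ", "weight_kg": "51.2"},
]
attendance = [
    {"student_number": "1001", "attendance_pct": "96"},
    {"student_number": "1002", "attendance_pct": "88"},
]


def format_numbers(row, fields):
    """Steps 8 and 11: strip stray spaces, then convert numbers stored as text to floats."""
    out = dict(row)
    for f in fields:
        out[f] = float(out[f].strip())
    return out


def construct_bmi(row):
    """Step 9: derive a new BMI attribute (weight in kg / height in m squared)."""
    out = dict(row)
    out["bmi"] = round(out["weight_kg"] / out["height_m"] ** 2, 1)
    return out


def integrate(left, right, key):
    """Step 10: join two data sets on a shared key, keeping only matched rows (inner join)."""
    lookup = {r[key]: r for r in right}
    return [{**l, **lookup[l[key]]} for l in left if l[key] in lookup]


prepared = [construct_bmi(format_numbers(r, ["height_m", "weight_kg"])) for r in measurements]
attendance_clean = [format_numbers(r, ["attendance_pct"]) for r in attendance]
merged = integrate(prepared, attendance_clean, "student_number")
for row in merged:
    print(row)
```

An inner join silently drops students who appear in only one source, so in practice it is worth listing those unmatched student numbers rather than losing them.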

This process makes data standardised so it can be compared year on year, disaggregated (separated into components) and used as a factual basis for understanding.

All about those gains

We can gain a lot just by looking at how we manage our data. If you collect data continuously, adopt a few very basic data science techniques and use them regularly to make your core data much stronger.

You can then easily compare year on year, cohort to cohort and SEND type to SEND type, across all subjects. Take tips from the ONS: agree on your codes (e.g., Z = Not applicable) and record them as a key on the first tab of your spreadsheet. On the second tab (and those that follow), put in your data with clear, meaningful column headings. You may wish to use underscores rather than spaces, as some systems don’t handle column headings with spaces. If you make graphs or tables, do them in a copy of your original spreadsheet. Keep the original unedited, in a safe place and clearly labelled.
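Once codes and headings are agreed, they can be checked automatically. The Python sketch below, using only the standard library, flags column headings containing anything other than letters, digits or underscores, and flags score cells that are neither numbers nor an agreed code. Only "Z = Not applicable" comes from the text; the "X = Not available" code, the headings and the rows are hypothetical.

```python
import re

# Hypothetical codes key, as it might appear on the first tab.
CODES = {
    "z": "Not applicable",
    "x": "Not available",  # assumed extra code, for illustration only
}

# Hypothetical data tab, with headings that use underscores rather than spaces.
headings = ["student_number", "ks2_reading_score", "send_type"]
rows = [
    ["1001", "104", "z"],
    ["1002", "x", "Dyslexia"],
    ["1003", "n/a", "z"],  # "n/a" is not an agreed code
]


def check_headings(names):
    """Return headings containing anything other than letters, digits or underscores."""
    return [n for n in names if not re.fullmatch(r"[A-Za-z0-9_]+", n)]


def check_scores(rows, column):
    """Return (row number, value) pairs that are neither numbers nor agreed codes."""
    problems = []
    for i, row in enumerate(rows, start=1):
        value = row[column].strip()
        if value.lower() in CODES:
            continue
        try:
            float(value)
        except ValueError:
            problems.append((i, value))
    return problems


print(check_headings(headings + ["KS2 Reading"]))  # flags the heading containing a space
print(check_scores(rows, headings.index("ks2_reading_score")))
```

Running a check like this before any graphs are made keeps stray entries such as "n/a" out of the averages.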

Data guidance - Example 1

Data guidance - Example 2

Data guidance - Example 3


Until your data can be verified and quality-assured as consistent year on year, you don’t know what you’re looking at or how to make the best use of it. At best, it’s a snapshot, because around the corner is…

Machine Learning

Machine learning is a branch of artificial intelligence that uses algorithms to learn from data and make predictions. It can be used for a variety of tasks, including classification (sorting objects into categories), prediction (forecasting future outcomes) and recommendation systems (recommending items based on past purchases). Machine learning applications in education are often used as tools to enhance student engagement with course material, by allowing students to practise concepts they have learned previously or by helping them identify what they don’t understand. For example, machine-learning algorithms can be used to automatically generate summaries of content during class time so students can review them at their own pace. These automated summaries could then be reviewed by teachers for additional feedback before being turned into actual assignments. This enables teachers to spend more time providing individualised feedback rather than spending hours manually reviewing each assignment. (Kučak et al. 2018)

Machine-learning algorithms can automatically grade assignments and other forms of student work. This allows teachers to spend more time providing individualised feedback instead of spending hours manually reviewing each assignment. It also helps students identify areas where they need additional help, so they can focus on those areas during class time rather than trying to catch up on everything at once. (Galhardi and Brancher 2018)

I used AI to write these last two paragraphs and then referenced them using the papers it had built them from.
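To make "classification" and "predicting which students are going to struggle" a little more concrete, here is a hypothetical Python sketch of one of the simplest classifiers, k-nearest neighbours, using only the standard library. The pupils, the two features (entry score and attendance percentage) and the labels are invented for illustration; the sketch is not taken from the cited papers.

```python
from collections import Counter
from math import dist

# Hypothetical training data: (entry score, attendance %) -> label.
training = [
    ((85, 82), "needs support"),
    ((88, 85), "needs support"),
    ((90, 80), "needs support"),
    ((102, 96), "on track"),
    ((105, 94), "on track"),
    ((99, 97), "on track"),
]


def knn_predict(point, data, k=3):
    """Label a pupil by majority vote among the k most similar training pupils."""
    # Euclidean distance between feature pairs decides how "similar" two pupils are.
    nearest = sorted(data, key=lambda item: dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_predict((87, 84), training))
print(knn_predict((101, 95), training))
```

Because both features sit on similar scales here, the sketch skips feature scaling; with real data you would usually standardise features first and test the model on pupils it has not seen before relying on its predictions.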

So, the future of data is here in education. But are we ready?

Change things up, don’t give up, refine, revise and enjoy your new beginnings this spring!


  1. Selwyn, N., 2020. ‘Just playing around with Excel and pivot tables’ – the realities of data-driven schooling. Available at: https://www.tandfonline.com/doi/full/10.1080/02671522.2020.1812107
  2. Kučak, D., Juričić, V. and Đambić, G., 2018. ‘Machine Learning in Education – a Survey of Current Research Trends’. Annals of DAAAM & Proceedings, 29.
  3. Galhardi, L.B. and Brancher, J.D., 2018, November. ‘Machine learning approach for automatic short answer grading: A systematic review’. In Ibero-American Conference on Artificial Intelligence (pp. 380-391). Springer, Cham.


  • Caroline Keep

    Caroline is a teacher, Maker educator and data scientist. She publishes on STEM, data science and STEAM learning to promote creative and hands-on learning through physical computing, digital fabrication and coding.