What Your Score Isn't Telling You

When I was preparing for the ESAT the second time, I tried to track my own performance properly. After each mock I'd go back through my answers, log the time I'd spent per question, note where I'd changed my mind, and flag the questions I'd got wrong without having felt any doubt. A spreadsheet, essentially. A manual attempt to find the pattern underneath the score.
It worked, in a limited way. But it took longer than the mock itself. And the insights I got were rough — directional at best, not specific enough to act on cleanly. I was doing by hand what should have been done by a system.
That experience shaped how we built the Arc dashboard. The goal was never to show students more numbers. It was to do the work that most students can't do for themselves: take the raw data from a mock attempt and process it into something specific enough to act on. One insight, clearly stated, every time you open the page.
Most dashboards are, in practice, information dumps. Everything is shown because nothing has been prioritised. The result is that students look at their data, feel a vague sense of things they should probably do, and close the tab. We wanted the opposite: less on the screen, more that's actually useful.
Here's what the system is actually measuring, and why each piece matters.
The Problem With Your Score
Raw accuracy — questions correct over questions attempted — is the number you see. It is not the thing the system weighs most heavily.
Two students can both score 18 out of 27. One finishes in 19 minutes. The other takes 28. On results day, they look identical. In terms of what happens to them on the real exam, they are very different students. The first is pacing correctly. The second is not, and won't know it until it's too late.
The diagnostic layer exists to tell those two students different things.
Efficiency
Every question records two things: whether you got it right, and how long it took. These are combined into an efficiency score for each module, scaled against your own historical range and cross-referenced against the platform cohort.
A student who is fast and correct scores higher than one who is slow and correct. This is intentional. On a 27-question paper with a fixed time limit, accuracy and pace are not separate skills. They are the same skill.
Efficiency scores are most useful as a direction signal. An absolute score of 72 tells you less than a 72 this week against a 59 last week. Movement is the thing to watch.
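To make the idea concrete, here is a minimal sketch of one way accuracy and time could be folded into a single number. The formula (correct answers per minute, min-max scaled to 0-100 against your own history) is an assumption for illustration, not the platform's actual calculation, and the cohort cross-reference is left out. The function names are hypothetical.

```python
def correct_per_minute(correct: int, minutes: float) -> float:
    """Assumed raw measure: correct answers per minute spent."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return correct / minutes


def scale_to_history(value: float, history: list[float]) -> float:
    """Min-max scale a raw value to 0-100 against the student's own past values."""
    values = history + [value]
    lo, hi = min(values), max(values)
    if hi == lo:
        return 50.0  # no spread yet, so report the midpoint
    return 100.0 * (value - lo) / (hi - lo)


# The two students who both score 18 out of 27:
fast_student = correct_per_minute(18, 19)  # roughly 0.95 correct per minute
slow_student = correct_per_minute(18, 28)  # roughly 0.64 correct per minute
```

Same score, different raw efficiency. Under these assumptions the scaled value only means something relative to your own earlier attempts, which is why movement matters more than the absolute figure.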
Time Pressure Analysis
Your attempts are divided into three time buckets: fast (under 60 seconds), medium (60 to 90 seconds), and slow (over 90 seconds). Accuracy is measured separately in each.
The pattern that surfaces most often in struggling students is not low overall accuracy. It is a wide gap between fast-question accuracy and slow-question accuracy. In our data, students with fast-question accuracy above 70% but slow-question accuracy below 45% are among the weakest performers on full mocks. Not because they lack knowledge, but because extended time on a question almost always means pursuing a wrong method, not finding a right one.
When this pattern is flagged, the recommendation is not to work faster. It is to get better at recognising when the current approach is failing, and at walking away from it cleanly.
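The bucketing described above can be sketched in a few lines. The bucket boundaries and the 70% / 45% thresholds come from the text. The data shape (a list of seconds-taken and correct-or-not pairs) and the function names are assumptions for illustration.

```python
def bucket(seconds: float) -> str:
    """Assign an attempt to a time bucket: under 60s, 60 to 90s, over 90s."""
    if seconds < 60:
        return "fast"
    if seconds <= 90:
        return "medium"
    return "slow"


def accuracy_by_bucket(attempts):
    """attempts: iterable of (seconds_taken, was_correct) pairs."""
    totals = {"fast": [0, 0], "medium": [0, 0], "slow": [0, 0]}
    for seconds, correct in attempts:
        counts = totals[bucket(seconds)]
        counts[0] += int(correct)  # correct answers in this bucket
        counts[1] += 1             # attempts in this bucket
    # None marks a bucket with no attempts, so it is never read as 0% accuracy
    return {name: (c / n if n else None) for name, (c, n) in totals.items()}


def has_time_pressure_pattern(acc, fast_min=0.70, slow_max=0.45):
    """True when fast accuracy is above 70% and slow accuracy is below 45%."""
    fast, slow = acc["fast"], acc["slow"]
    return fast is not None and slow is not None and fast > fast_min and slow < slow_max


sample = [(40, True), (35, True), (50, True), (55, False),
          (75, True), (120, False), (110, False), (95, True)]
acc = accuracy_by_bucket(sample)
flagged = has_time_pressure_pattern(acc)
```

In this made-up sample, three of four fast answers are right but only one of three slow answers is, so the pattern is flagged even though overall accuracy looks unremarkable.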
Hesitation Rate
Lab45 records whether you change your answer before submitting. The hesitation rate is the proportion of questions per module where your answer changed at least once.
A rate above 25% is flagged. Not because changing answers is always wrong (the broader MCQ research literature is mixed, and often finds that revisions help), but because in our data first instincts outperform revised answers in the majority of cases. High hesitation signals one of two things: the underlying concept is not solid, or the question format is unfamiliar enough that pattern recognition hasn't engaged. The two have different fixes, so separating them matters.
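As a rough sketch, the hesitation rate is a simple proportion. The 25% threshold is from the text. The input shape (a count of answer changes per question) and the names are assumptions.

```python
HESITATION_THRESHOLD = 0.25  # rates above this are flagged


def hesitation_rate(answer_changes):
    """answer_changes: number of times the answer changed, one entry per question."""
    if not answer_changes:
        return 0.0
    changed = sum(1 for n in answer_changes if n >= 1)
    return changed / len(answer_changes)


# Ten questions in a module, three of them changed at least once
changes = [0, 2, 0, 0, 1, 0, 0, 1, 0, 0]
rate = hesitation_rate(changes)        # 3 of 10 questions
flagged = rate > HESITATION_THRESHOLD
```

Note that a question changed twice counts once: the rate measures how many questions you wavered on, not how many times you wavered.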
Confident-Wrong Detection
The most expensive error is not the question you flagged as uncertain and got wrong. You already knew you weren't sure.
The expensive error is the one you got wrong without hesitation. No flag, no answer change, submitted quickly. Lab45 tracks these separately. In a module where 40% or more of your wrong answers arrive without any signal of uncertainty, your internal confidence signal is not calibrated correctly. You are reaching wrong answers with the same conviction you bring to right ones.
Your rate is benchmarked against platform-wide data for each question set. What gets flagged is the rate specific to you, not the rate that is typical for the material.
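A minimal sketch of the confident-wrong share, under stated assumptions: the record fields are hypothetical, "submitted quickly" is taken to mean under 60 seconds, and the platform-wide benchmarking step is omitted. Only the 40% threshold comes from the text.

```python
from dataclasses import dataclass

CONFIDENT_WRONG_THRESHOLD = 0.40  # 40% or more of wrong answers


@dataclass
class Attempt:
    correct: bool
    flagged_uncertain: bool  # student marked the question as unsure
    answer_changed: bool     # answer changed at least once before submitting
    seconds: float           # time spent on the question


def confident_wrong_share(attempts, quick_seconds=60.0):
    """Share of wrong answers given with no flag, no change, and submitted quickly."""
    wrong = [a for a in attempts if not a.correct]
    if not wrong:
        return 0.0
    confident = [
        a for a in wrong
        if not a.flagged_uncertain
        and not a.answer_changed
        and a.seconds < quick_seconds  # assumed cut-off for "quickly"
    ]
    return len(confident) / len(wrong)


attempts = [
    Attempt(True, False, False, 40),
    Attempt(False, False, False, 35),   # wrong with no sign of doubt
    Attempt(False, False, False, 50),   # wrong with no sign of doubt
    Attempt(False, True, False, 80),    # wrong, but flagged as uncertain
    Attempt(False, False, True, 70),    # wrong, but the answer was changed
    Attempt(False, False, False, 100),  # wrong, but slow
]
share = confident_wrong_share(attempts)  # 2 of 5 wrong answers
flagged = share >= CONFIDENT_WRONG_THRESHOLD
```

The denominator is wrong answers only. A student with very few errors can still have a high share, which is the point: it measures calibration, not accuracy.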
Positional Drop-Off
Lab45 tracks accuracy by question position across all your attempts. The number of interest is the gap between early-question accuracy (questions 1 to 18) and late-question accuracy (questions 19 to 27).
A gap of more than 20 percentage points is flagged. It almost always reflects time pressure compounding rather than a topic-specific weakness in later questions. Students who fall behind pace by question 15 carry that deficit forward. They arrive at harder questions with less time, higher stress, and less capacity for careful reasoning. They underperform on questions they would otherwise get right.
The fix is not more practice at the back half. It is cleaner execution early. Time banked in the first ten questions is what creates the conditions to think clearly in the last five.
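The positional gap described above can be sketched as follows. The split at question 18 and the 20-point threshold are from the text. The input shape and function name are assumptions.

```python
DROP_OFF_THRESHOLD = 20.0  # percentage points


def positional_gap(results, split=18):
    """results: iterable of (question_position, was_correct), positions starting at 1.

    Returns early accuracy minus late accuracy in percentage points,
    or None when either half has no data.
    """
    early = [correct for pos, correct in results if pos <= split]
    late = [correct for pos, correct in results if pos > split]
    if not early or not late:
        return None
    early_acc = sum(early) / len(early)
    late_acc = sum(late) / len(late)
    return 100.0 * (early_acc - late_acc)


# One made-up attempt: 14 of the first 18 correct, 4 of the last 9 correct
results = [(p, p <= 14) for p in range(1, 19)] + [(p, p <= 22) for p in range(19, 28)]
gap = positional_gap(results)  # about 33 percentage points
flagged = gap is not None and gap > DROP_OFF_THRESHOLD
```

Pooling results across several attempts works the same way: pass in every (position, correct) pair you have, and the early and late halves are averaged over all of them.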
Plateau Detection
Once you have completed at least three attempts in a module, Lab45 tracks whether your scores are moving. Plateau detection measures the variance of your last three module scores. When variance drops below a set threshold, the module is flagged.
The assumption is not that you have hit your ceiling. It is that the current approach is not generating the variability in exposure that drives improvement. Same inputs, same outputs. A plateau flag shifts the recommendation from more practice to different practice: drilling specific question types, spending more time with individual solutions, or addressing a weakness that the headline score has been obscuring.
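For illustration, a plateau check along these lines might look like the sketch below. The three-attempt window is from the text. The threshold value and the choice of population variance are assumptions, since the text only says "a set threshold".

```python
from statistics import pvariance


def is_plateau(scores, threshold=4.0, window=3):
    """True when the variance of the last `window` scores falls below `threshold`.

    threshold=4.0 is a placeholder, not the platform's actual setting.
    """
    if len(scores) < window:
        return False  # not enough attempts to judge
    return pvariance(scores[-window:]) < threshold


improving = [52, 61, 70]
stalled = [52, 61, 66, 67, 66]

improving_flag = is_plateau(improving)  # scores still spread out
stalled_flag = is_plateau(stalled)      # last three scores nearly identical
```

Variance alone does not distinguish a plateau at 66 from a plateau at 90. It only says the scores have stopped moving, which is all the flag claims.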
How It Becomes One Thing
All of these signals feed into a priority cascade. The system does not present everything at once.
Time pressure problems rank above hesitation problems. Plateaus rank above general momentum notes. The ordering reflects which problem is most likely to be the lever for improvement, not which number in your data is the most dramatic or the most recent.
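A priority cascade can be as simple as an ordered list that is walked from the top. In the sketch below, only two relationships come from the text: time pressure above hesitation, and plateau above momentum. The position of every other signal, and all of the names, are assumptions for illustration.

```python
# Assumed ordering, highest priority first. Only "time_pressure before hesitation"
# and "plateau before momentum" are stated in the article.
PRIORITY = [
    "time_pressure",
    "confident_wrong",
    "positional_drop_off",
    "hesitation",
    "plateau",
    "momentum",
]


def top_insight(active_flags):
    """Return the highest-priority flag that is currently active, or None."""
    for name in PRIORITY:
        if name in active_flags:
            return name
    return None


first = top_insight({"hesitation", "time_pressure"})
second = top_insight({"momentum", "plateau"})
nothing = top_insight(set())
```

Because the function returns only the first match, lower-priority flags stay hidden until the higher ones clear. That is what lets the surfaced problem change from one week to the next without anything else in the system changing.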
The insight card on your dashboard — the verdict, the evidence, the one action — is generated from this cascade every time you load the page. As your data accumulates, the insight updates. A problem that dominated last week may not be the priority this week. That is not a glitch. That is the system doing what a student with a spreadsheet was trying to do manually: cutting through the noise to the thing that is actually worth fixing right now.
The score is not a ranking. It is a direction. Every number on the dashboard is pointing at something specific. The job of the diagnostic layer is to make sure what it points at is worth your time.


