Recap

Establishing a Common Vocabulary
--Mathematics
--Learning
--Attention
--Expert
--Fluency
--Understanding
--Attainment
--Ability

Types of Professional Knowledge

The Importance of Effort

Blending Approaches

Phasing Teaching
--Moving from Current Practice to Mastery Approach
--Types of Questioning

Proportioning Content

The Importance of Knowledge
--Inflexible Knowledge
--Continuing our Shared Language

Human Cognitive Architecture
--Is mathematical problem solving biologically primary or secondary?

Key Principles in Cognitive Science for Learning
--Working memory
--Cognitive Load
--Anxiety and Cognitive Load
--The Difference Between Novice and Expert
--Generic Cognitive Skills and Domain Specific Skills
--Information Store Principle
--Relationships
--Teach Everything Correctly First Time
--Narrow Limits of Change Principle
--Worked Example Effect
--Split Attention Effect
--The Redundancy Effect

Variation Theory
--Mathematical Confidence

Storage and Retrieval
--Performance is not the same as Learning
--The Importance of Forgetting
--Desirable Difficulties
--The Testing Effect
--Testing Potentiates Learning
--Marking and Feedback
--The Hypercorrection Effect
--Better Multiple Choice Questions
--Massed vs Spaced Practice
--Implications for Overlearning
--Blocked vs Interleaved Practice
--The Generation Effect
--Performance is not a Good Proxy for Learning
--Teachers and pupils can be fooled
--The Teacher Parable

Moving from Propositional to Strategic Knowledge

Recap

Part 1 and Part 2 of this blog concerned the background efficacy of the approach, its history, evidence base and evolution over the past 100 years in particular, and how schools could go about adopting the approach in practice.

In this part, I will consider the key logistical and pedagogical considerations for ensuring a mastery approach has as great an impact as possible. The blog is broken into chapters, which have been written in a narrative but can also stand alone should you wish to dip in and out.

By taking Aristotle’s 1-to-1 model of educating, where a tutor works with his pupil, Carleton Washburne was able to create the mastery model of learning. Central to these ideas, as demonstrated by Burk and Ward in the 1910s, was that all pupils can learn all content if they are starting at the right point and given the right amount of time. Morrison further emphasized the importance of considering all pupils individually with his introduction of correctives to Carleton’s model in the 1920s.

It is critical that teaching should start from what the pupil already knows. The new learning will build on this and be just beyond already embedded knowledge and understanding. This makes the mastery model of schooling incompatible with non-homogenised groupings of pupils. That is, mixed ability and mixed attainment classes, where the gap between highest and lowest is large, are anathema to the mastery approach.

The model stood the test of time and was generally accepted as being impactful, but it was not until Bloom picked up the story that the rigour of research and evidence was able to confirm what Carleton had asserted.

Bloom was introduced to the idea of mastery and the potential impact by his friend, John B Carroll. Carroll argued that all pupils can learn well given the right conditions. In 1963, he embarked on a lifetime’s work to prove this was true. Carroll’s Model of Schooling showed that ability is an index of learning rate – all pupils can learn, but they require different amounts of time. He also emphasized the importance of instruction design and resource design, which needed careful thought and planning if a pupil’s attention was to be drawn to the relevant information and ideas.

In addition to the quality and type of the instruction and materials, Carroll highlighted the same key ingredient that both Aristotle and Washburne had insisted on as being essential for learning to take place: effort.

The results of successfully implementing a mastery approach are profound, with much greater numbers of pupils learning well than previously. Carroll, Bloom, Block, Guskey and others find similar results: a significant shifting of the distribution of those who reach particular levels of attainment.

The distribution of attainment in a traditional setting follows broadly a normal curve, but in mastery settings, the distribution is significantly skewed towards greater levels of attainment.

As discussed previously, we are interested in long term memory. The challenge is to ensure as many pupils as possible learn well. The mastery cycle (below) describes the logistical issues in running the approach, but inherent in the approach is that all teaching results in learning. That is to say, all teaching results in a change to the long term memory. This must be the case in all learning episodes.

The fabulous Oliver Caviglioli recently created a poster version of my mastery cycle, which can be downloaded from his website.

In this part of the blog, I explore the crucial approaches and pedagogies necessary to make this happen.

Establishing a Common Vocabulary

Many of the words we use in education are also commonly used in day to day language. This often results in confusion and mixed meanings. But the words we use in the science of education are well defined and have specific meaning. Before continuing, to avoid ambiguity, I will set out some key words and their definitions.

Mathematics

I take ‘mathematics’ to mean a way of existing in the universe. Mathematicians are curious in all aspects of their lives. Mathematicians, when faced with a problem, enjoy the state of not yet knowing the resolution (indeed, knowing there may not even be a resolution). Because they are curious, mathematicians, when faced with a problem, ask themselves questions of it. They can specialise, pattern spot, conjecture, generalise, try to disprove, argue with themselves, monitor their own thinking, reflect and notice how these new encounters have changed them as a human being. That is to say, mathematics is an epistemological model: a way of considering the very nature of knowledge.

Sadly, in many of the North-Western cultures, children have been conditioned to believe that mathematics is about wading through questions, getting ‘right’ or ‘wrong’ answers. This is a confusion to mathematicians, since it does not represent our domain at all. Mathematicians are not in the business of answering lists of questions. Rather, they meet scenarios and, driven by their curiosity, create their own questions and follow their own lines of enquiry. Many of these lines of enquiry result in unexpected results, but we do not consider these to be ‘wrong’, simply not what we thought would happen. Often, great discoveries in mathematics have resulted from lines of enquiry that lead to unexpected results.

Mathematicians enjoy being stuck. They revel in the initial apparent impenetrability of a scenario and understand that by attacking it in a structured way, enlightenment can arise.

Learning

Learning is the bringing about of some change in the long term memory. When faced with enabling a pupil to learn a novel idea, skill or information, as teachers we are concerned with changing their long term memory.

We do this first by embedding the novel idea in the long term memory in the form of some mental representation that can be thought about and, secondly, by assimilating this new mental representation into the schema of knowledge and ideas that already exists.

Attention

Dan Willingham’s phrase ‘memory is the residue of thought’ is a handy reminder to us that we are seeking to change the long term memory. But ‘thought’ is too loose a definition. When faced with a mathematical problem, scenario or task, a pupil may well be ‘thinking’ about it, but they may just be thinking, ‘this is crap’. Instead, it is a very specific aspect of thinking that results in a change in the long term memory: attention. Attention is focused and deliberate. We are interested in what pupils are attending to, not just what they are thinking about. When presented with a novel mathematical idea, we want pupils to be attending purely to the mathematics (or as pure as is achievable in reality) and the mathematical structure. We also want to draw attention to how the mathematical idea relates to knowledge they already have in their current schema.

Giving attention is difficult: it requires focus and a belief that what is being attended to is important. This hard, deliberate process is how the long term memory is changed.

Expert

Expertise relates to layers of attention. As one becomes more expert, one can attend to higher layers of attention. For example, as a child I learned how to play the piano. When learning to play the piano, one needs to give huge amounts of attention to the position of one’s hands, their movement, the pressure each finger is exerting, the meaning of musical symbols, and so on. As one becomes more expert at playing the piano, one can attend to higher level aspects. Nowadays, when I play the piano, I have absolutely no idea where my hands are or what they are doing. I can attend to higher levels such as melody, composition or beauty. The process of learning is the process of becoming more expert. It is never ending; there is always more one can attend to. What a sad state of affairs it would be if one day, one simply closed the lid of the piano and said, ‘well, that’s the piano finished!’

Mastery is about becoming more expert, not about ‘mastering’ things. Crucial to the mastery approach is the recognition that there is always more to learn.

Fluency

We consider someone to be fluent in a skill, idea, concept or facts at the point at which they no longer need to give attention. It is important to note that fluency is simply the state of attention not being necessary in order to perform, but this does not mean that one couldn’t, if one wanted to, choose to give attention. Considering the piano example again, although I don’t know where my hands are or what they are doing because I no longer need to give attention to that aspect of performing, I can choose to give attention to it. I might, for example, see another pianist do something with their hands and think, ‘gee, that’s interesting, how did she do that?’ I can then give deliberate attention to that lower level aspect. Quite often, when learning mathematics, great new insight comes from choosing to give deliberate attention to an area of mathematics one is already fluent in. So, fluency is when attention is no longer necessary. Attention is hard and effortful; fluency is effortless. When learning to manipulate algebraic expressions, for example, pupils need to give a lot of attention to the rules and conventions in order to carry out even simple rearrangements, but as they become fluent, this becomes effortless and they can attend to other, higher level aspects such as what the underlying relationships between variables are.

Understanding

Let’s imagine mathematics as a complex web of interconnected ideas (it is, of course, not a web, but the analogy holds well and is useful).

Often, understanding is described in quite a woolly way. People will say, for example, it is the number of connections or the ability to use the idea in another area of mathematics. But understanding has a much more precise meaning.

The mathematics is understood if its mental representation is part of a network of representations. The degree of understanding is determined by the number and strength of its connections. A mathematical idea, procedure, or fact is understood thoroughly if it is linked to existing networks with stronger or more numerous connections. – Hiebert and Carpenter, 1992.

Mathematical ideas are connected and, as pupils mature, they assimilate new ideas into their schema in the form of mental representations. These representations form a map of mathematics that can continue to grow – there is no limit to the number of connections we can make. Understanding is about the reasons why the connections are true. Again, there is no limit to the depth of reasoning one can make, so understanding can be thought of as infinite.

The depth and strength of the reasoning why connections are true is what we define as understanding. This has a beautiful corollary: understanding never ends, there is always more that we can understand about ideas.

Understanding is not a dichotomous state, but a continuum . . . Everyone understands to some degree anything that they know about. It also follows that understanding is never complete; for we can always add more knowledge, another episode, say, or refine an image, or see new links between things we know already. – White and Gunstone, 1992.

Attainment

Attainment is the point that a pupil has reached in learning a discipline. It can change; pupils can unlearn as well as learn. It is not precise. But it is very useful in determining appropriate points on a curriculum from which to springboard pupils to new learning. We, as educators, continually assess these attainment points so as to best ensure the curriculum we are following can adapt and flex to what has been understood or forgotten. Knowing the prior attainment of pupils (rather than what has been previously presented at them) is crucial if we are to ensure pupils are learning appropriate new ideas and concepts.

Ability

Ability is an index of learning rate. It is the readiness and speed at which a pupil can grip a new idea. It can change; as with all human beings, pupils will make meaning from some metaphors, models or examples more readily than they will of others. In mathematics, for example, we often see pupils quickly understanding some numerical pattern, say, who then take a long time to grip a geometrical relationship. An individual can have a high index of learning rate during some periods of their life and a low one at others. Again, as educators, we are continually assessing ability so that we are best able to judge the amount of time, additional practice, new explanations or support that a pupil needs in order to really grip an idea. Knowing the ability of a pupil (rather than woolly ideas of engagement or enjoyment) is crucial if we are to ensure that pupils are learning new ideas and concepts for the appropriate amount of time (rather than some arbitrary amount of time presented on a scheme of work).
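Carroll’s model, on which this definition of ability rests, expresses the degree of learning as a function of the time a pupil actually spends relative to the time that pupil needs. The sketch below is only an illustration of that ratio. The function name, the cap at 1 and the example hours are my own assumptions, not figures from Carroll.

```python
def degree_of_learning(time_spent: float, time_needed: float) -> float:
    """Illustrative ratio of time spent to time needed, capped at 1.0.

    Assumption: this is the simplest possible form of the relationship.
    The cap and the hours used below are hypothetical.
    """
    if time_needed <= 0:
        raise ValueError("time_needed must be positive")
    if time_spent < 0:
        raise ValueError("time_spent cannot be negative")
    return min(1.0, time_spent / time_needed)


# Two hypothetical pupils are given the same three hours on an idea.
# The first needs two hours to grip it; the second needs six.
quick_grip = degree_of_learning(time_spent=3, time_needed=2)
slow_grip = degree_of_learning(time_spent=3, time_needed=6)
```

On a fixed timetable the second pupil is only half way there. Given the six hours they need, both pupils reach the same point.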

A common misconception is that pupils who are low attaining are also low ability. This misconception arises when ‘conveyor belt’ approaches to curriculum are deployed rather than a mastery approach. In conveyor belt, coverage rather than learning is the focus. Teachers race through objectives and teach all pupils the same content as mandated by a scheme of work on any given week or day. This results in low attaining pupils being asked to learn material they are simply not ready for. The gap from their true starting point to what they are being asked to grip is a severe handicap, so they appear to be slow learners. But, in a mastery approach, where we ensure that all pupils are learning the right level of content for the right amount of time, low attaining pupils are being taught content just beyond their current understanding and so can assimilate and connect the new learning much more readily, leading to fluency and then understanding. When taught at the right level, all pupils can learn at pace.

Types of Professional Knowledge

Many teachers of mathematics enter the profession with high levels of mathematics content knowledge. This knowledge is connected to, but not the same as, mathematics pedagogical knowledge. Knowing how to bring about learning is complex and requires many years of professional learning to acquire. Some of this knowledge can be studied, reading the best evidence (propositional knowledge), some of it can be acquired through hearing about practice, perhaps a teacher giving a presentation at a CPD event (case knowledge), and, most importantly, some of this knowledge only comes about by teachers experiencing events themselves (strategic knowledge). This strategic knowledge involves teachers thinking about and considering propositional and case knowledge, which they then develop further based on actual practice in real classrooms.

What is set out in this blog has had to pass the test of the three types of knowledge. Propositional knowledge is incredibly useful and stimulates professional enquiry, but many aspects of education research cannot be replicated beyond laboratory conditions, so, although theoretically interesting, those ideas do not form part of the mastery approach. Only testable ideas that are able to be applied to real classrooms are considered here.

The Importance of Effort

Both conceptual understanding and procedural fluency are necessary in learning mathematics, but they are not sufficient. As Kilpatrick, Swafford and Findell (Adding It Up: Helping Children Learn Mathematics, 2001) remind us, pupils must also have strategic competence (the ability to solve problems), adaptive reasoning (the capacity for reflecting and reasoning, which leads to understanding) and, critically, productive disposition (a belief that one’s own effort matters).

These combined give the conditions for learning. Often the most important of these, effort, is shied away from to great detriment. Schools avoid honest conversations with pupils and parents, yet it is this honesty that can bring about huge gains in learning. Families and pupils need to understand that their success is a result of their effort and their failure is a result of their laziness. A pupil can have the worst teacher, be at the worst school, have shoddy books, yet still learn well because they put in great effort. Conversely, a pupil may attend the best school in the world and be under the instruction of an amazing teacher who uses the very best materials, yet completely fail to learn because they expend no effort. Effort matters. A lot.

Washburne’s mastery model was based on the teachings of Aristotle. Central to the model is the recognition that effort matters and, further, that pupils understand that it is their own effort that determines their success.

Where pupils recognise this, the impact is profound. Generally, in the North-Western cultures today, pupils and families have surrendered their agency. Pupils routinely blame their failure in a lesson or on a task or test on the perceived quality of their teacher, rather than realising they are the key driver.

Attitudes and beliefs around which factors influence success vary wildly around the world.

Where pupils understand that the main factor in success is their own effort, the impact on attainment is significant. A recent McKinsey analysis of attainment against self-efficacy showed that pupils in the most disadvantaged circumstances who believe their own effort is key outperform pupils in the most advantaged settings who believe success is a result of external factors.

By examining sub-sets of pupils who have undertaken PISA tests, John Jerrim was able to identify that East Asian children attending Australian schools far outperformed native Australian children. In his paper, “Why do East Asian children perform so well in PISA? An investigation of Western-born children of East Asian descent” (John Jerrim, 2014), Jerrim concludes that the hard work ethic of these children is a key factor in them outperforming native pupils by two and a half years of learning. This conclusion is similar to that of Feniger and Lefstein (2014).

The Nuffield Foundation research paper, Values and Variables – Mathematics Education in High Performing Jurisdictions (2010), again points to the importance and impact of a culture of self-efficacy.

This belief in hard work and the transformative impact that effort makes is central to the mastery approach. It is therefore incumbent upon educators to be direct and honest with pupils and their families that they play an active, not passive, role in their learning.

Blending Approaches

Mastery is an entire and complete model of schooling. There are many models that exist, having varying degrees of impact both in terms of the currency they give to pupils (school grades) and long term engagement in a subject or discipline (e.g. whether or not pupils pursue mathematics at higher education or enter mathematical careers later in life). Much debate occurs around which model to adopt.

Two models that might be seen as being at the extremes are Inquiry Learning and Teacher Directed Instruction. At the extremes of these lie Discovery Learning and Direct Instruction (here I take Direct Instruction to mean the scripted intervention programme arising from Project Follow Through). Advocates of both often take the view that the approaches are mutually exclusive. Washburne, rightly, understood that education is nuanced and rarely are such fanatical positions helpful.

It has long been an element of the mastery cycle that instruction is varied in order to allow as many opportunities for ‘meaning making’ as possible. The approach very much embraces teacher instruction, but also includes time for inquiry. It can be shown that, at the extremes, Direct Instruction does indeed lead to good outcomes in terms of pupil performance on tests, but not optimal performance. As models move towards teacher direction in all lessons, performance passes a plateau and begins to reduce.

Equally, by increasing the opportunity for pupils to undertake suitable inquiry, performance initially increases, but quickly begins to worsen.

By blending both direct teacher instruction and appropriate opportunities for inquiry, pupil performance increases.

We are seeking to strike a balance between teacher-directed methods and inquiry methods. Getting the recipe right consists of several key considerations, namely the type of instruction in each, the order in which the instruction happens, and the ratio of the methods used.

The pedagogic choices made when phasing teaching have a significant impact on pupil outcomes.

Phasing Teaching

I will suggest that effective teaching of a novel idea in mathematics passes through four phases as the pupil moves from novice to fluency to understanding.

Those phases are

· Teach

· Do

· Practise

· Behave

During the ‘Teach’ phase, the idea is entirely novel to pupils, though just beyond their current knowledge and understanding. The teacher will instruct the pupils, tell them key facts, pass on knowledge, show and describe, use metaphor and model, all in order to bring about connections in the pupil’s current schema so that they can ‘meaning make’. This phase is often described as explicit teaching. It is a crucial phase – after all, the teacher knows things and the pupil does not; so tell them!

The end of the ‘Teach’ phase does not result in learning. It is merely the first step. At this stage the new knowledge is ‘inflexible’, and it is our job as teachers to bring meaning and understanding to the knowledge so that it becomes ‘flexible’ (more on inflexible and flexible knowledge later).

We now ask pupils to ‘Do’. At this stage, they do not yet know or understand the new idea, they are replicating what the teacher has told or shown them. The ‘Do’ phase has two important purposes. Firstly, the teacher is able to observe whether or not the pupils have made meaning of the model, example, metaphor or information they have been given or shown. The teacher can see and act; are the pupils able to replicate what I have demonstrated? If not, the teacher can change their model, example or explanation, perhaps making stronger and more explicit connections to previous knowledge and understanding. The second reason for the ‘Do’ phase is to give pupils a sense that the idea or task is surmountable – that they, quite literally, can do what they are being asked. Well structured ‘Teach’ and ‘Do’ builds pupils’ confidence and shows them there is nothing to be afraid of, the new idea is within their reach.

Once both teacher and pupil are clear that the pupil is able to ‘Do’ – that is to say, they can perform – the teacher now segues the pupil to the ‘Practise’ phase.

During ‘Practise’, we wish to move beyond simply performing. We want the pupil to gain a confidence in working with the new idea, to see its underlying relationships and to assimilate the new idea into their schema of knowledge. In order to achieve these more meaningful goals, the pupil needs to be able to attend to a higher level. In other words, as described earlier, the pupil needs to have achieved fluency at the performing level first, so that they may attend to connections, relationships and a deeper conceptual appreciation.

So, we shall define the point at which the pupil moves from ‘Do’ to ‘Practise’ as the point at which they achieve fluency (as defined earlier in this blog).

The final phase, ‘Behave’, is the most important phase. This is the phase that brings about understanding.

At this stage, teachers create opportunities for pupils to behave mathematically.

I know of no better description of mathematical behaviour than the rubric included in John Mason’s 1982 book, Thinking Mathematically.

This simple flowchart perfectly captures how mathematicians actually behave.

Our assumption at this stage is the pupil has become fluent in the new idea or skill, is able to work confidently with the mathematics and has assimilated the idea into their schema of knowledge. It is tempting, then, to plan ‘Behave’ tasks that are based on the new mathematical idea, which pupils have just gripped, but in learning mathematics and, in particular, in thinking mathematically, maturation matters. The type of thinking and behaving we want pupils to do at this stage requires an embedded sense and understanding of the mathematical ideas that will arise.

When planning for the ‘Behave’ phase, therefore, we will not be asking the pupils to use the novel idea, but instead to be drawing on well embedded and matured mathematical ideas that connect to the new learning. The new learning that has occurred in this learning episode will mature over time as more connections are made and more opportunities are given to see the idea from different perspectives. Later in the journey of learning mathematics, the new idea will be used (many times) in the ‘Behave’ phase. It is incredibly difficult to determine how mature an idea needs to be before pupils can ‘Behave’ mathematically with that idea, but a good rule of thumb would be around 2 years.

As an example, suppose the new idea encountered in this learning episode had been working with fairly interesting 3D trigonometry. At the ‘Behave’ phase, we might be asking pupils to work with ideas of angle facts or simple Pythagoras, which they will have met much earlier on. They can see the connection to the new idea, but it won’t demand that they use it (though there is nothing wrong in scenarios that make it possible to use the new idea and ideas beyond!). Not only do pupils get an appreciation for how their ability to use earlier ideas, which seemed at the time to be complex and now appear simple and fluent, has become more embedded and eloquent, but they also benefit from meeting previous ideas again, bringing the benefits of ‘spacing’, which I discuss later in this blog.

Many teachers find it an uncomfortable – perhaps even illogical – process to plan the ‘Behave’ phase as one that relates to much earlier learning rather than the new idea, but it is crucial to do so if we want to bring about optimal gains in learning, understanding and long term recall.

Moving from Current Practice to Mastery Approach

It has been some time since mastery was the dominant model of schooling in the UK. Since the introduction of the National Curriculum in 1988, schools have almost unanimously adopted a conveyor belt approach (see Part 1 of this blog). This approach has resulted in an obsession with coverage rather than learning. Lessons ‘cover’ content and objectives, but tend not to be concerned with understanding and long term recall.

Another result of the conveyor belt is the wholly obtuse belief that learning happens in perfectly apportioned pockets of time. It is a common feature of schemes of work to assume that each mathematical idea will be learnt in precisely 1 hour. How serendipitous this would be!

Worse, we even hear apparently responsible educators, managers and inspectors talking about pupils ‘making progress in 20 minutes’. This is, of course, utter nonsense. Learning is not linear; it is highly complex and involves regressing as well as progressing.

I suggest here, as Washburne, Bloom, Carroll and many others have done before me, that a ‘learning episode’ (the amount of time required to grip a novel idea) has no fixed time period. Yes, some things can be learnt in an hour, but some may take weeks or years.

I take ‘learning episode’ to be my measure when talking about the four phases outlined above. The teacher will flow through the four phases during the ‘learning episode’, taking the right amount of time necessary (informed by their observations, discussions, questions and experience).

Let us consider the optimal phasing of a ‘learning episode’. I will use the following colour coding

When one travels around the UK today, the typical phasing of a learning episode looks something like

Generally, at the moment in conveyor belt approach, the teacher will spend a short amount of time demonstrating and instructing, then ask pupils to work on similar examples. They have to undertake a lot of ‘doing’ before the ideas start to become clear to them. Eventually, they find they no longer have to give great attention to the surface level and can begin to discern relationships and concepts. At this stage, the pupils are now practising, which they are given a large amount of time to do.

In current UK classrooms, most pupils only ever proceed to this ‘Practise’ phase and the ‘Behave’ phase is entirely absent. This makes coverage easier – teachers can ‘get through the curriculum’ – but misses the most important phase, which means pupils do not get the opportunity to reason, understand, embed and improve long term recall.

It is a common feature of the current UK education landscape to hear teachers lamenting the fact that pupils have forgotten what they have been taught previously. But without the ‘Behave’ phase, they have not been taught; they have just had presentation and practice. Yes, they can perform, but performance is not the same thing as learning at all. If learning did not occur, nor did teaching. Perhaps the lament should more accurately be the rather unsurprising statement: my pupils can’t recall something they were never taught!

I suggest that a more impactful phasing could look like this

Notice the increase in time spent on explicitly teaching the novel idea, through modelling, examples, metaphors, information, etc. With an increased amount of teaching time, pupils are able to move more quickly from the ‘Do’ to the ‘Practise’ phase. Now, a good amount of time is reserved for the ‘Behave’ phase. As discussed earlier, and demonstrated in the McKinsey data, increasing the amount of direct teaching results in greater gains, but only to an optimal proportion. In order to achieve the sweet spot between teacher-directed and pupil-inquiry, the ‘Behave’ phase gives opportunities for meaningful inquiry.

This model is more effective than the conveyor belt model, since it takes the pupils into the ‘Behave’ phase, which requires them to make deeper connections and reason and reflect. This time spent considering the ideas at a deep structure level, rather than just at surface level brings about gains in terms of long term memory.

However, the above suggested phasing can be improved further. I suggest that the following distribution of a ‘learning episode’ is even more powerful.

Here the ‘Teach’ and ‘Do’ phases are broken up and intertwined, which helps the teacher to hold their own teaching to account before progressing too far with an idea – a checking activity to ensure the intended meaning is being received by the pupils before attempting to build on it. It also helps to space out the learning of an idea and gives opportunity to disrupt the time spent thinking about one thing. This important aspect of learning is further explored later in this blog.

Our goal is to get the benefits of the ‘sweet spot’, the optimal balance between teacher-directed and pupil-inquiry. There is no hard and fast rule for the proportion of time spent on each, but a good rule of thumb would be an approximate 80:20 split between the combined ‘Teach’, ‘Do’ and ‘Practise’ phases and the ‘Behave’ phase.

With this phasing, teachers are carefully building up an appreciation of the novel idea, ensuring pupils become fluent in its use, then providing a reflective period in which pupils use earlier, but connected, ideas with which to undertake mathematical thinking.

Combined, this phasing pulls together several key benefits for learning that the field of cognitive science has been confirming over the last 50 years. Later, I outline the impact on memory that this approach can have.

Types of Questioning

Each phase uses carefully planned and deliberate types of questioning.

During the first phase, the teacher is teaching. This teaching is carefully considered, planned, well executed and explicit. During this phase, the teacher uses questions as stories, models and examples. These questions are ‘demonstrated’ – literally, the teacher is demonstrating what success looks like. They are to-the-point, accurate and efficient demonstrations of what pupils might encounter when working with the novel idea and how to resolve such problems. At this stage, the novel idea is not known or understood. The most efficient way to get a child to know a new piece of information or idea is simply to tell them. As teachers, we hold in ourselves a body of knowledge unknown to the pupils. We carefully reveal this knowledge at the right time, taking into account the maturity of their schema, gradually building up their appreciation of our domain.

The implication is clear; curriculum is the single most important tool we have at our disposal. A carefully planned route through our subject – which is not linear, but complex and takes into account forgetting and unlearning as well as learning – is vital if we are to know when and how to reveal the canon of our discipline. This journey through learning a subject spirals upwards as we mature. Ideas are met and then re-met as we grow older. Earlier ideas suddenly have new meaning as we can view them from the perspective of maturity, integrating them with latterly learnt material, shining a new light on them and revealing underlying relationships that did not seem apparent at an earlier stage. All mastery approaches adopt a spiral or staircase curriculum approach – it is vital in bringing about the gains of maturation and schema assimilation. At the end of this blog, I discuss curriculum design and optimal phasing in more detail.

Having demonstrated what we know pupils will be able to do, we then ask them to do so. During the next phase, pupils are doing. The questions at this stage still involve the teacher, since pupils have not yet gripped the novel idea. Pupils are replicating, being successful, performing, gaining confidence. The teacher is a crucial part of this stage, ensuring confidence is being built by continuing to guide pupils. At this stage, therefore, we call the question types ‘guided’.

This transition and mixing of the ‘demonstrated’ and ‘guided’ can be instant, for example, the teacher demonstrates a solution and then immediately asks the pupils to do a similar one (show – do), or can take place with greater explanation, for example, the teacher demonstrates a question, takes some questions from the pupils, addresses these in discussion, points out features, then demonstrates a few more examples before asking the pupils to have a go at a few. These pedagogic choices happen in real time – the teacher can judge the impact of their example (perhaps by surveying the class or asking pupils to show the response to a guided question on mini-whiteboards) and then decide the best course of action (more examples, different models or allowing the pupils to do some more of their own).

The teacher is continually monitoring the level of confidence, deftness, accuracy and insight their pupils are showing during the teach-do interchanges. As the pupils move from significant concentration on surface level issues such as process, the teacher is watching for the transition to procedural fluency. As this is attained, the pupils are slowly, purposefully segueing into practising.

As the pupils move to practising, the teacher delivers questions designed to reveal underlying relationships and deeper structure. These questions are well ordered, carefully planned, with deliberate and purposeful variation such that the novel idea is connected to previous learning and assimilated into the pupils’ schema because they are able to appreciate connections, logic and relationship. We call these questions ‘structured’.

In the final phase of the ‘learning episode’, our aim is to elicit mathematical thinking. We will call these questions ‘intelligent’.

Questions that elicit mathematical thinking can include scenarios where pupils must evaluate mathematical statements, classify mathematical objects, interpret multiple representations, create and solve problems, and analyse reasoning and solutions.

Crucially, we are seeking to take pupils from a point of specialising, through conjecturing and generalising, to the critical stage of reasoning and reflecting.

It is these ‘intelligent’ questions that bring about understanding and make our knowledge much more flexible and memorable.

Proportioning Content

Assimilating new ideas and information into an established complex schema is difficult. Before the moment of the new idea, the pupil has a perception of the universe – a series of held views, beliefs and truths. Asking the pupil to disrupt that view of the world is a significant burden on them. As discussed, connecting already established and understood knowledge and ideas to the new learning enables a pupil to ‘meaning make’ more readily – after all, if one can see a new idea from the perspective of already believed ideas and how it fits with one’s wider view of the universe, it is much easier to believe the new truth.

It is such a big ask of pupils to believe and grip novel ideas or knowledge that we should take steps to make this process as gentle and effective as possible. An important step is to not overwhelm the pupil with novel information. In a conveyor belt curriculum, the content of each lesson is almost entirely novel – this objective-led approach sees teachers racing through new mathematical ideas like a tick list. All of the questions, discussion and exploration in the lesson are concentrated on the new idea. In a mastery approach, a very different structure is used. In each learning episode (rather than lesson), only a small proportion of the content is novel. The majority of the content is drawn from previously encountered material, with links to the new idea at hand. A good rule of thumb for old:new content is approximately 80:20. So, in each learning episode, only around 20% of the content is focused on the new idea. This greatly improves assimilation and also brings about the gains of both spacing and interleaving content (more on that later).
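As a rough illustration of the 80:20 rule of thumb, the sketch below splits the questions in a learning episode into previously encountered and novel material. The function name `split_old_new` and the figure of 25 questions are assumptions made purely for illustration; the only number taken from the text is the approximate 80:20 ratio.

```python
def split_old_new(total_items, old_parts=80, new_parts=20):
    """Split a count of items into (old, new) using an old:new ratio.

    Illustrative helper only: the default 80:20 ratio is the rule of thumb
    from the text, everything else is an assumption.
    """
    if total_items < 0:
        raise ValueError("total_items must be non-negative")
    # Round the 'old' share to the nearest whole item; the remainder is 'new'.
    old = round(total_items * old_parts / (old_parts + new_parts))
    new = total_items - old
    return old, new


# Hypothetical learning episode containing 25 questions.
old, new = split_old_new(25)
print(f"{old} questions on earlier material, {new} on the new idea")
```

The ratio is a guide rather than a target to hit exactly, so the rounding here is only a convenience.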

The Importance of Knowledge

As discussed, inquiry is a critical element in learning mathematics. It is the stage where reasoning, conjecturing, generalising and reflecting occur. These are all important in revealing underlying relationships and bringing about understanding, which greatly increases likelihood of long term recall.

Unfortunately, inquiry is often conflated with ‘discovery learning’, where pupils are expected to discover and create their own new knowledge. Inquiry is not this. Inquiry is an intellectually demanding process, forcing pupils to give deliberate and sustained attention to ideas, concepts and connections. When carrying out inquiry, pupils draw on embedded knowledge and understanding. It is true that from this existing knowledge, with carefully constructed inquiry, pupils can and do construct new meaning and even new knowledge, but this is an incredibly inefficient process (see later). Rather, we improve the gains from inquiry when we first ensure that the required knowledge is already in place. After all, it is incredibly hard to think about something when one doesn’t know anything!

Phasing the teaching process such that prerequisite knowledge comes before inquiry is key. Furthermore, the required knowledge should be embedded through maturity, meaning the inquiry process may take a couple of years before it really draws on some information or idea being learnt today.

The disastrous meta-study effect sizes often quoted to diminish the importance of inquiry arise from meta-analyses conflating inquiry with discovery learning or from including studies that do not take account of the importance of correct phasing and maturation. It is easy to paint inquiry as having no impact if we ask pupils to carry out inquiry without having the prerequisite knowledge or to carry out inquiry with novel ideas that they have not yet assimilated into their schema. But, carried out well, inquiry does not only greatly advance understanding, it is also the key to improving long term memory of mathematical ideas.

It cannot be said too often: knowledge comes first. But inquiry must come too, if we are to move from knowing to understanding.

Inflexible Knowledge

Because of the polarised nature of education debate, radical advocates of knowledge often sneer at inquiry and radical advocates of inquiry sneer equally at knowledge. These two camps have established themselves as though never the twain shall meet. They position inquiry and knowledge as mutually exclusive. This is clearly moronic. Education is complex. The debate is never so black and white, there is always nuance. Both knowledge and inquiry matter. It is the phasing and proportion of each that needs to be got right.

The more extreme inquiry promoters paint a picture of knowledge as being about rote learning. In fact, very little knowledge is rote knowledge. Usually, when talking about rote knowledge, people are really describing inflexible knowledge.

Inflexible knowledge is a perfectly normal step in learning. Most of us, when learning something new, will acquire inflexible knowledge.

The oft quoted example of rote knowledge from Anguished English by Richard Lederer is the pupil who gives the response:

“A menagerie lion running about the earth through Africa.”

What question is the pupil responding to? The pupil has been asked to describe the equator!

Clearly, this knowledge is not useful. They have misheard the sentence “an imaginary line…” and have absolutely no concept of what the equator is. Furthermore, they have not tried to assimilate the sentence with known information and ideas – had they done so, they would surely have spotted the ridiculousness of the sentence. The pupil has simply remembered the line being said. This is rote knowledge. That is to say, this is memorising in the absence of meaning.

But rote knowledge is rare. Most things that are remembered do have meaning (even if that meaning is not yet understood). Consider, for example, the pupil who tells their teacher “Eight takeaway five is three. But you can’t do five takeaway eight.” Clearly, this has meaning. The model of subtraction they are using is one of removing, literally taking away. This knowledge is not rote. It is true that they don’t yet fully understand subtraction, but the knowledge they have is useful and is a perfectly natural step in learning about subtraction. This knowledge does fit with other learnt ideas (removing objects, say). It is connected, but it is not complete.

We want our pupils to become creative problem solvers, but we should not despair at inflexible knowledge. Our job as teachers is to schedule the learning of mathematics, such that the discipline is carefully revealed to pupils over time at the right stage of maturity.

Inflexible knowledge is very different to rote knowledge. It is meaningful. Inflexible knowledge is inflexible because the knowledge is tied to the surface structure – pupils can use it only in examples that are the same – and does not transfer to the deep structure of the idea. In other words, inflexible knowledge cannot transcend specific examples. In the above, the pupil is not able to say how the concept of subtraction could be applied to the case where the removing model breaks down.

Continuing our Shared Language

Surface structure: particular examples, designed to illustrate the deep structure

Deep structure: a principle that transcends specific examples

Rote knowledge: memorisation in the absence of meaning

Inflexible knowledge: has meaning, but limited to specific examples. A natural step to deep, flexible knowledge

From a teacher’s point of view, it is important to remember that knowledge tends to be inflexible when it is first learnt. This is a natural step. Don’t despair!

Continuing to work with this knowledge, assimilating into schema of established truths, leads to fluency and expertise. The knowledge gradually shifts from being organised around surface structure (examples) to deep structure (principles).

In order to help pupils in this shift, we must use carefully considered examples, showing not only when the learnt idea will apply, but also when it breaks down, alongside non-examples. Teachers should be explicit in telling pupils when they have acquired rote knowledge and also open about inflexible knowledge. Learning a discipline is a leap of faith for the pupil, so be honest with them when they have inflexible knowledge: tell them that it is not complete, but will be built upon later. This honesty avoids one of the most significant problems in a conveyor belt, objective-led curriculum, where teachers are racing through objectives and are dishonest about inflexible knowledge. For example, it is not unusual to hear a teacher telling a pupil that ‘multiplication makes things bigger’ or that ‘to multiply by 10, just add a zero’. These shortcuts enable a teacher to ‘get through’ an objective more quickly, but they embed serious misconceptions, which are very tricky to undo later. Instead, an honest approach is much more helpful. For example, when the pupil above says, “you can’t do five takeaway eight”, we tell them that we understand why the examples they are using at the moment would make that seem true, but in fact it is possible and we will teach them how later in the curriculum.

At any given time, all human beings, including our pupils, know only what they know. Our schema of knowledge and understanding is continually growing. Educators must appreciate that and celebrate that it is a natural step to deeper and deeper understanding of the universe.

Human Cognitive Architecture

The evolutionary psychologist, David Geary, has proposed a distinction between types of knowledge. He splits knowledge into Biologically Primary and Biologically Secondary knowledge.

Biologically Primary knowledge is knowledge that we have evolved to be able to acquire easily, without the need for thought or attention. For example, speaking. Although we need to think about words and vocabulary, the act of speaking itself is untaught.

Biologically Secondary knowledge is knowledge we, as a culture, have generated. This requires attention and is difficult to learn as described earlier.

A simple example of the distinction is the fact that it is easy to ‘learn’ how to speak, but difficult to learn how to read.

There is no need to teach Biologically Primary knowledge, so schools are concerned with the business of Biologically Secondary knowledge.

This secondary knowledge is the knowledge we, as a species and social collective, have created. It is our art and our music, our science and our literature, our pursuit of sport, our love of dance, our interest in history, our rich languages. Biologically secondary knowledge is our combined culture. I like to think of biologically primary knowledge as the knowledge that keeps us alive, but biologically secondary knowledge is the knowledge that makes it worth living.

A perhaps unexpected result is that problem solving is Biologically Primary. We have evolved to solve complex problems, particularly those that increase chances of survival. But as mathematics educators, we are interested not in generic problem solving, but specifically in mathematical problem solving.

Is mathematical problem solving biologically primary or secondary?

When human beings are unable to obtain knowledge from others, they use randomness as an action for generating new responses, which can then be tested and lead to hypotheses or conclusions. This is known as the ‘Randomness as Genesis Principle’. This way of creating new knowledge is incredibly inefficient and prone to significant misinterpretation.

When faced with a mathematical problem, the randomness as genesis principle could apply. That is to say, it is possible to consider mathematical problem solving as biologically primary. Pupils can learn mathematical ideas and mathematical truths without being taught. They can use randomness as an approach – brute force is the method most pupils will resort to when faced with a mathematical problem that requires prerequisite knowledge they have not been taught. Through trial, testing, errors, re-trial, drawing conclusions and iterating, it is possible for pupils to construct new mathematical meaning. But this approach is inefficient and pupils only have a finite time at school. Instead, it is far more efficient and impactful to simply teach the pupil the knowledge they require. The process of problem solving is also something that can be taught.

Pupils become significantly enhanced problem solvers if they are explicitly taught how to tackle problems. To achieve this, the teacher can:

Prepare problems and use them in whole-class instruction
Assist students in monitoring and reflecting on the problem-solving process
Teach students how to use visual representations
Expose students to multiple problem-solving strategies
Help students recognise and articulate mathematical concepts and notation

(Woodward, J., Beckmann, S., Driscoll, M., Franke, M., Herzig, P., Jitendra, A., Koedinger, K. R., & Ogbuehi, P. (2012). Improving mathematical problem solving in grades 4 through 8: A practice guide (NCEE 2012-4055). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education. Retrieved from http://ies.ed.gov/ncee/wwc/publications_reviews.aspx#pubsearch)

So, I suggest that mathematical problem solving is both Biologically Primary and Biologically Secondary.

Key Principles in Cognitive Science for Learning

Cognitive science is a cross-disciplinary study of the mind with contributions from fields such as linguistics, computer science, psychology, artificial intelligence, philosophy, neuroscience, and anthropology.

Earlier in this blog, I discussed the three types of professional knowledge: propositional, case and strategic knowledge. The field of cognitive science offers much of use to educators. John B Carroll set out in 1963 on a lifetime of work to uncover how an understanding of human cognitive architecture can help educators plan, design and deliver more effective learning episodes. Many have followed and added to the canon, with some remarkable and surprising results. Much of what is hypothesised in cognitive science remains at the propositional knowledge phase and has not been able to be replicated beyond laboratory conditions. In this section of the blog, I seek to highlight just a few areas of cognitive science that we can draw on for improving the single aim of mastery: learning.

Working memory

Carleton Washburne suggested in the 1920s that the human mind can only cope with thinking about so much at once. This ‘conscious thought’ was to be defined as what one is immediately concerned with. Nowadays, this aspect of short term memory is referred to as ‘working memory’. The working memory is responsible for temporarily holding information, so that it is available for processing. Most of us can cope with only a small number of pieces of information at any given time, typically 2 or 3, perhaps as many as 4 or 5.

Suppose you were asked to perform the following calculation in your head

287 x 34

This is a trivial problem to solve with pencil and paper, yet asked to perform the same task mentally, most people struggle. This is because one is being asked to process too many pieces of information at once. The working memory can’t cope.
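To make the point concrete, the sketch below breaks 287 x 34 into place-value partial products and counts how many intermediate results there are to keep track of. The helper names `place_values` and `partial_products` are illustrative assumptions, and counting partial products is only a crude stand-in for working memory demand, not a measure of it.

```python
def place_values(n):
    """Split a non-negative integer into non-zero place-value parts, e.g. 287 -> [200, 80, 7]."""
    digits = str(n)
    return [
        int(d) * 10 ** (len(digits) - i - 1)
        for i, d in enumerate(digits)
        if d != "0"
    ]


def partial_products(a, b):
    """Return every (part_of_a, part_of_b, product) triple needed to multiply a by b."""
    return [(x, y, x * y) for x in place_values(a) for y in place_values(b)]


steps = partial_products(287, 34)
running_total = 0
for x, y, product in steps:
    # Each line is one more intermediate result to hold on to mentally.
    running_total += product
    print(f"{x} x {y} = {product}  (running total {running_total})")

print(f"{len(steps)} partial products, answer {running_total}")
```

On paper each of the six partial products is written down and can be forgotten; done mentally, they must be held alongside a running total, which is already more than the 4 or 5 items suggested above.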

Working memory is a limited cognitive tool both in terms of capacity and duration. It can be thought of as

interconnected cognitive mechanisms that maintain newly acquired information and retrieve stored information to an active state for processing and manipulation

This limited tool plays a crucial role. It is where thinking takes place. It is in the working memory where complex cognitive tasks such as reasoning and problem solving occur.

The working memory acts as a bottleneck between the learning of a task and the long-term memory. In order to get into the long term memory, the idea or information must first be processed in the short term, working memory.

This makes placing things in the long term memory difficult, which is very important from an evolutionary point of view. Imagine if one remembered every single thing one ever encountered! The bottleneck ensures that only important information, that is information that one has given attention to, is able to pass into the long term memory. The working memory is playing an important role as a buffer between all of the nonsense we encounter and what we remember as truth.

Cognitive Load

Cognitive load is defined as the “total amount of mental energy imposed on working memory at an instance in time” (Cooper, 1998, p. 10).

Sweller et al. (1998) suggest that this overall cognitive load can be broken into three subcomponents:

Intrinsic Cognitive Load (ICL): the load imposed on the learner by the nature of the instructional material that must be processed and learned

Extraneous Cognitive Load (ECL): the load imposed by factors such as instructional strategies, message design, interface design, and the quality of instructional materials and learning environments

Germane Cognitive Load (GCL): the load imposed by cognitive processes directly relevant to learning

Clearly, from an educator’s point of view, we should seek to maximise the latter. GCL is the energy being used when attending. Since attention is the only known way of making information and ideas pass into the long term memory, this energy being exerted is desirable. Learning is hard.

Given the limited nature of short term, working memory, we should also seek to minimise both ICL and ECL. Using the following colour coding to consider learning episodes

we can describe much current classroom practice as often placing great demands in terms of ICL and ECL. Learning episodes often look like

where the very nature of the learning materials being used by the teacher creates unnecessary ICL burden. These resources may be muddled, verbose, contain unnecessary information or use confusing language or diagrams. As teachers, we can lessen the demand on the brain by presenting materials that are concise, accurate, clear and relevant.

Similarly, in this typical scenario, ECL is taking up lots of mental energy. ECL is demanding when the way in which information is being communicated is long winded or irrelevant, or when the learning environment is competing for attention by containing other stimuli or distracting features. Again, it is a simple problem to solve. Teachers can communicate precisely, use appropriate media, ensure that learning environments do not distract.

When ICL and ECL are taking up so much mental energy, there is less energy available for the desirable GCL. When we minimise both ICL and ECL, learning episodes can be more fruitful by giving greater energy to attending. A more appropriate load phasing could look like

It is worth noting that significant controversy surrounds the claim that ICL can be reduced. Mayer and Moreno (2010) outline the segmenting principle, which aims to reduce ICL by presenting information step by step. They claim that this helps pupils to better organise new information. Mayer (2005) suggests that ICL can be reduced by using the pretraining principle, where pupils are given information about the new content before starting the new learning unit. The intention is to increase the impact of a pupil’s prior learning on the new material.

I believe the controversy is warranted. Both of these approaches, which do appear to reduce ICL, might better be considered as simply changing the task that pupils are meeting and so not actually reducing the intrinsic load of learning the idea at all. For this reason, most instructional designers concentrate on reducing ECL only.
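One way to summarise the argument of this section is as a simple budget. This is only a sketch: it assumes, as the discussion above implies, that the three loads draw on a single fixed working memory capacity and simply add together. The symbol C for that capacity is introduced here for illustration and does not come from the cited authors.

```latex
\mathrm{ICL} + \mathrm{ECL} + \mathrm{GCL} \le C
\quad\Longrightarrow\quad
\mathrm{GCL} \le C - \mathrm{ICL} - \mathrm{ECL}
```

Read this way, every reduction in ECL frees capacity for GCL, and if ICL is fixed by the idea being learnt, ECL is the only term the teacher can shrink.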

Anxiety and Cognitive Load

It is worth considering the impact of anxiety, since there is good evidence to suggest anxiety takes up working memory and has a detrimental impact on cognitive load (Gerardo Ramirez, Elizabeth A. Gunderson, Susan C. Levine & Sian L. Beilock (2013): Math Anxiety, Working Memory, and Math Achievement in Early Elementary School, Journal of Cognition and Development, 14:2, 187-202).

Anxiety can arise in pupils when learning new mathematics if they have a poor grasp of earlier, pre-requisite mathematics. A mastery approach mitigates this, since, unlike conveyor belt approaches, in a mastery approach teachers homogenise pupil groups and choose appropriate starting points on the journey through mathematics such that all pupils are building on firm foundations. However, even with this approach, as mentioned earlier, pupils will forget or unlearn as a natural part of the non-linear journey through learning a discipline and, given teachers are human beings too(!), we are all fallible and will make mistakes in judging the correct starting points. It is, therefore, important to continually consider this aspect of anxiety and to minimise it by always testing for prerequisite knowledge as shown in the mastery cycle diagram earlier.

Social cues also play a role in bringing about a feeling of anxiety in pupils who are learning mathematics. All mathematics teachers are familiar with the experience of hearing other teachers, parents or the media condemn mathematics as intractable and to be feared.

These fears take up working memory – literally, the pupil is thinking about their fears rather than thinking about the mathematics – so they must be addressed.

When dealing with deep structure rather than surface structure, pupils must attend to higher order aspects such as underlying relationships and general principles. This requires more of the working memory. A result of this is that anxiety is disproportionately damaging to high performing pupils. Their working memory is more disrupted because they tend to work on mathematics using deeper problem solving approaches, rather than the simplistic, single step approaches that lower performing pupils tend to use.

Another aspect of anxiety that is crucial to understand is that of teacher anxiety. Many teachers who teach mathematics do, themselves, have underlying fears about the subject. In the UK, only around 24% of mathematics teachers have a post-school qualification in mathematics, so the vast majority of the workforce is non-specialist. Teacher anxiety is communicated to pupils and can lead them to embed those same anxieties. Studies show that teacher anxiety impacts on pupil performance, with a stronger impact on girls’ performance.

Reducing pupil anxiety is therefore a goal of the effective mathematics teacher. This requires sticking to the mastery cycle, which ensures that fundamental skills are secure and assimilated before moving on and that continual formative assessment monitors for when pre-requisite knowledge is forgotten or not fully secure.

Teacher anxiety can be reduced significantly through effective CPD. This CPD should focus on how to teach a concept, rather than the mathematical concept itself. When the focus is on how to teach, teacher anxiety lessens far more rapidly than when the CPD is really about teaching the teacher the mathematics.

Assessment types can be changed too. In mastery, as described in earlier parts of this blog, assessment is not about labelling pupils, it is about working out whether or not one’s teaching has been impactful yet. There is no need to time tests or to assign grades in a mastery approach. Removing both timing and grading significantly reduces pupil anxiety and has no detrimental impact on learning (quite the opposite, in fact!).

Finally, teachers should avoid consoling pupils. This may sound counterintuitive when talking about reducing anxiety, but consoling a pupil who has answered a question incorrectly is disingenuous and gives them no help to become secure in mathematics. Rather than saying, “Well done, you tried your best, that’s all that matters”, teachers should use responses such as “yes, the work is challenging, but I know, with hard work, you can do it!”

The Difference Between Novice and Expert

As discussed, the more expert a pupil becomes, the greater the impact of anxiety, because experts work at a deep structure level, whereas novices tend to work at the surface structure level.

So, when designing instructional materials and modalities, it is important that the teacher takes account not just of the ICL – ECL – GCL relative proportions, but also the type of audience they are instructing. Novices and experts learn differently and attack problems at different structural levels. The format of instructional materials suitable for an expert may not be appropriate for a novice and vice-versa.

We have seen that as a pupil becomes more expert, they tend to consider mathematical ideas as general principles, which they can work with across various problems and formats. But pupils do not begin with expertise, they begin with inflexible knowledge, which they can use in only restricted examples. Their knowledge is superficial at this stage.

This is true of all learning. We all move from the surface level, superficial knowledge to expertise as we continue to learn. Take, for example, the trainee teacher. We were all once in that position. When observing a trainee teacher, we can see that their attention is focused on the superficial: what am I saying? How long do I spend on this? Where should I be standing? What resources should be on the table? But the expert teacher is attending to much higher level principles, such as pedagogic choice. This expertise comes about by studying (propositional knowledge), networking and learning with others as well as articulating our own experiences for critique and development (case knowledge), and most importantly through actually teaching (strategic knowledge). This latter part is critical if one is to become an expert teacher. It takes a long time – perhaps around 10 years – to experience enough real classroom encounters for this strategic knowledge to develop.

A key weakness of education systems in many of the north-western cultures is the lack of honesty and clarity about how long it takes to become an expert teacher. In the UK, a single year of teacher training, followed by a probationary year, results in Qualified Teacher Status. The assumption of many is that this is the end of the training period and that the teacher is now a skilled educator. This is clearly idiotic. Teaching is an incredibly complex profession and is a skill that continues to develop throughout one’s career. The learning never ends, one never completes the journey. There is always more to learn, always more expertise to develop.

Strategic knowledge is the most important. Experiential learning is necessary for us all to notice our own practice. Take again, for example, the trainee teacher. We all, as maths teachers, have to go through the experience of finding out that it is a really bad idea to place compasses and glue sticks on the table before the start of a lesson on constructing 3D shapes, because the kids bloody stab each other and stick the glue to their foreheads! These are things we have to experience, not simply read.

As expertise develops, the way in which knowledge is organised in the mind moves from disconnected, inflexible knowledge to a problem based schema. Experts encounter problems and are able to connect both the content knowledge and the principles and procedures necessary for attacking the problem. In the novice mind, content information and problem solving knowledge are separate.

This is why, as discussed earlier, novices attack problems with brute force trial and error. The expert, on the other hand, recognises features in the new problem that they can connect to problems they have solved in the past. They work from the known to the unknown.

Because knowledge is organised in different ways, the expert has efficient ways of addressing new problems. Their knowledge is connected, making it easier to search their memory for similar situations and the resolutions that followed. The novice mind, with its disconnected storage, is inefficient.

Recognising the differences between novice and expert is extremely important if teaching is to be successful. Teachers, who are often expert in the mathematical ideas they wish their pupils to learn, will often forget the experience of being a novice and, in good – but misguided – faith, design instructional materials and learning modes suitable for experts (suitable for themselves!), leaving the novice pupil unable to access the meaning.

Generic Cognitive Skills and Domain Specific Skills

Human beings acquire generic skills without the need to give specific attention to the skill; they come automatically. Domain specific skills are not acquired automatically, so teachers must instruct pupils in domain specific skills if they are to be gained.

A common debate in education is whether or not skills should be the purpose of schooling. I suggest that, by making pupils bright – that is by building their schema of knowledge across multiple disciplines – they are able to think critically and creatively. There is no need to teach creative thinking – it is a byproduct of being learn’d!

Information Store Principle

Human long term memory is indescribably large – despite many efforts to determine the storage capacity (often in the language of computer science) no one has yet been able to find any limit to the long term memory. In practical terms, it appears to be insatiable. Who we are, as human beings, in every sense can be thought of as the record of our experiences, emotions, encounters, and living histories. In a real sense human beings are their long term memories.

Our long term memory is our aptitudes. The chess grandmaster is able to triumph not because of some generic problem solving skill, but because they recognise configurations and the possible futures of those configurations. They remember them. They have encountered them in the past and can call upon them. This is the only reason they are a grandmaster.

To build exceptional competence in any discipline means to build up an enormous knowledge base in the long term memory.

Using information from the long term memory takes up no mental energy. Unlike using the working memory to think about novel information, which is extremely limited, drawing on the long term memory appears to have no bounds on the number of pieces of information that can be utilised at once.

Building this knowledge base is generally achieved by obtaining that knowledge from other people through borrowing, imitating, reading and story telling.

For these processes to occur, the pupil must have a relationship with the teacher.

Relationships

Learning is a social endeavor.

Too often, this aspect of education is ignored, yet, without good relationships learning is unlikely to occur. Human beings have evolved over huge periods of time to borrow knowledge from those around them. For millennia, story telling has been the key mode of knowledge transfer, with one generation handing down a body of knowledge to the next. As described earlier, the working memory acts as a protective buffer to prevent unimportant information getting into the long term memory. So knowledge needs to be considered important by the pupil. Human relationships play an enormous part in bringing about this feeling of importance. The pupil will consider the information important when they have faith in the person telling them the new knowledge. The teacher must establish a relationship with the pupil such that the pupil trusts them and has belief in their assertions. In order to accept new knowledge as truth, the pupil first must believe that the teacher is a carrier of truth and is sincere in their desire for the pupil to become learn’d.

Too little emphasis is placed on the crucial role of human relationships between teacher and pupil (or, indeed, teacher and trainee teacher, mentor and mentee, head teacher and staff).

Teach Everything Correctly First Time

A reason pupils lose faith in a teacher stems from the common practice of teachers lying to pupils. It is a feature of conveyor belt approaches - where teachers are racing through objectives and are more concerned with coverage than learning - that teachers will conceal truth about a mathematical idea. This truth is later revealed, thus exposing the teacher as a liar. Faith falls apart.

For example, our pupil from earlier in this blog who says, “Eight takeaway five is three. But you can’t do five takeaway eight.” It can be tempting for the teacher, who simply wished to ‘get through the lesson objective’ to agree with the pupil, “that’s right, you can’t do five takeaway eight.” The pupil trusts the teacher and remembers this fact. Later, the same teacher will need to break this apparent truth. This happens continually throughout the pupil’s life at school. They are told lies such as:

“to multiply by 10, add a zero to the right hand side of the number”

“multiplication makes things bigger”

“it is not possible to find the root of a negative number”

The experience of the pupil is one of continual disappointment in the teacher.

Rather than adopting these approaches (in fact, scrap any aspects of conveyor belt in your practice!), be truthful at all times. Teach everything correctly first time. Do not use examples that are not generalisable or metaphors that break as the concept develops. Rather than responding, “that’s right, you can’t do five takeaway eight”, tell the pupil, “I can see why you think that at the moment, because we are looking at one type of subtraction, but, actually, it is possible! Isn’t that exciting! And later, as you learn more about subtraction, I will show you how.”

Narrow Limits of Change Principle

When dealing with novel information, the human mind can only process very limited amounts of information at any given time. For most of us, the working memory limit is around 3 or 4 items of new information. As described earlier, the working memory is not only limited in terms of number of pieces of information, but also in duration. Most of us can hold something in working memory for a maximum of around 20 seconds before it is lost or replaced. These two protective devices ensure the long term memory is not inundated with meaningless information. So, from an evolutionary point of view, the dramatic limits of working memory are necessary and helpful. However, from a learning point of view, these limits are inconvenient.

Working memory can also process information that is held in the long term memory. When carrying out processing of information already stored in long term memory, the operation of working memory is dramatically different; there are now no capacity or duration limits. The working memory can cope quite simply and without encumbrance with vast and varied pieces of information.

This can be utilised when learning new information. Take for example, the following list. Read the list of 20 letters and try to remember them:

This is quite a tricky thing to do. This list is new, so the working memory struggles to cope with 20 pieces of new information at once.

However, knowing that, if something is already embedded in the long term memory, we are able to work with any number of pieces of information and that the problem of duration goes away, as a teacher I can rearrange the information such that it draws upon already learnt knowledge.

Suppose we think of the domain of mathematics as a complex web of interconnected ideas

When learning a new idea, as teachers we know what previously learnt and understood ideas connect to the novel idea, so we can shine a light on the new idea from the perspective of established knowledge. This means the pupil can have far less demand on their working memory, since they are using information from their long term memory.

Here is the same list again, read it and remember it:

This list is much easier to learn. The information is the same, but the teacher has presented the information in such a way that it draws on already learnt knowledge. Because the entity ‘BBC’ is a known idea, we can think of this as one piece of information instead of three. This ‘chunking’ is a useful way of partially overcoming the limits of working memory. As teachers, we must therefore ensure the scheduling of our curriculum is such that we can allow pupils to encounter new knowledge and concepts from the view point of well-connected ideas that they have a good understanding of already.
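The effect of chunking can be sketched in a few lines of Python. This is only an illustration and not the list used above: the 20-letter string and the set of 'known' acronyms are hypothetical stand-ins for what a pupil already holds in long term memory, and the greedy matching rule is an assumption made for simplicity.

```python
# Hypothetical stand-in for ideas already held in long term memory.
KNOWN_CHUNKS = {"BBC", "ITV", "NHS", "USA", "PDF", "FBI", "UK"}


def chunk(letters, known=KNOWN_CHUNKS):
    """Split letters into items, treating any known group as a single item.

    Greedy rule (an assumption): at each position try the longest known
    group first; if none matches, the single letter counts as one item.
    """
    longest = max(len(k) for k in known)
    items = []
    i = 0
    while i < len(letters):
        for size in range(longest, 1, -1):
            piece = letters[i:i + size]
            if piece in known:
                items.append(piece)
                i += len(piece)
                break
        else:
            # No known group starts here, so the letter stands alone.
            items.append(letters[i])
            i += 1
    return items


ordered = "BBCITVNHSUSAPDFFBIUK"  # 20 letters arranged to match known ideas
scrambled = ordered[::-1]         # the same 20 letters in an arbitrary order

print(len(chunk(scrambled)), "items to hold:", chunk(scrambled))
print(len(chunk(ordered)), "items to hold:", chunk(ordered))
```

The letters are identical in both cases; only the arrangement changes how many separate items the reader has to hold at once.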

Worked Example Effect

During their time at school, pupils in the UK have approximately 1600 hours of mathematics lessons. In this time, they are to learn around 320 novel mathematical ideas. Of course, we will expect pupils to undertake a great many more hours of study and work outside of school, but the time they get to spend with an expert is limited by design. It is important, therefore, that the time pupils actually spend in the company of their teacher is used as effectively and efficiently as possible.

When asked to work on a problem, assuming the underlying knowledge is in place, pupils can go about addressing it. But, if the teacher first shows a worked example of such a problem, the pupil will then be able to address their problem far more readily. The time teachers invest in showing worked examples pays dividends.

Split Attention Effect

This view of lessons having to be efficient is often railed against by teachers – they argue that learning is not a factory process and not about efficiency. Well, duh, of course. But the reality is what it is – they only get so much time with you; you have a moral obligation to make that time as impactful as possible.

Continuing then with the theme of efficiency, we come to the Split Attention Effect. When teachers are demonstrating worked examples or preparing tasks or questions for pupils to work on, it is worth considering the limits of working memory and ensuring that – at the point of learning new material – the information is presented as clearly and with as little burden on the working memory as possible. One very simple example of this is to remove the need for pupils to split their attention between diagrams and information. So, for example, when working on a problem involving angle facts, say, rather than having a diagram on one part of the page and then a few sentences explaining the angles, we can make the information much more integrated by labelling the angles in the diagram. This physical integration of the information reduces the demand on working memory by removing the need to consider two separate sources of information.

For example

becomes

The Redundancy Effect

It should be noted, however, that it is not always necessary or desirable to integrate information into diagrams. Where the information is simply repeating what is on the diagram, there is no need to add it. That is, where the nature of the diagram itself already informs the reader, then adding information becomes redundant.

Mayer (2001) uses the term ‘coherence effect’ in reference to this situation.

Another aspect of the redundancy effect to consider is the gains that can occur in learning when, rather than using two modes to communicate information, one is eliminated. For example, if showing a PowerPoint slide with text, it is beneficial to avoid reading the text aloud to the audience – let the audience read it.

Variation Theory

The role of variation in learning mathematics has long since been established. Zoltan Dienes wrote on the impact that variance and invariance can have when encountering new mathematical ideas in his 1971 journal article ‘An Example of the Passage from the Concrete to the Manipulation of Formal Systems’, (Educational Studies in Mathematics Vol. 3, No. 3/4, Lectures of the Comprehensive School Mathematics Project (CSMP). Conference on the Teaching of Geometry (Jun., 1971), pp. 337-352).

In the Perceptual Variability Principle, Dienes prescribes the utilisation of a variety of contexts to maximize conceptual learning.

The Mathematical Variability Principle states that children need to experience many variations of “irrelevant attributes”. For example, there are irrelevant attributes inherent to the concept of like and unlike terms in algebra. Concepts of like terms do not depend, for instance, on the nature of the coefficients or signs. By varying the signs and the coefficients using whole numbers, decimals or fractions, and keeping constant the relevant attributes, pupils will become conscious of what happens to different numbers in similar situations while ensuring an understanding of like terms and unlike terms.

Dienes considers the learning of a mathematical concept to be difficult because it is a process involving abstraction and generalisation. He suggests that the two variability principles promote the complementary processes of abstraction and generalisation, both of which are crucial aspects of conceptual development.

The role of variation is therefore to reveal underlying relationships and principles, such that the journey to abstraction is both easier for a pupil to attain and one that they have faith in believing as truth.

Dienes continued to work on his theories of variation, with many others picking up the importance of variance and invariance for learning mathematics over the years and contributing to the evidence base.

Notably, Ference Marton in Sweden worked with colleagues in China and Hong Kong to further promote the importance of variation. This led to their work being translated for Western audiences (Gu, Huang & Marton 2004), which had a great influence on reigniting the discussion around variation theories.

Unfortunately, the translation of their work (or, more accurately, mistranslation) has led to a false distinction being made between procedural and so-called ‘conceptual variation’. Prima facie, this makes no sense. Concepts in mathematics do not vary!

This distinction has resulted in many muddled and damaging assertions being made in the UK about variation. In recent years, even national organisations have promoted the idea of conceptual variation in mathematics – arguing that the teaching of mathematics should include taking a concept and somehow varying it. This is clearly moronic. Mathematical concepts are not malleable.

However, this confusion should not distract from the important role that variation theory can play in learning mathematics: drawing attention to underlying relationships.

Other notable work on variation includes Mason and Watson, 2006. This important article highlights the issue in Marton’s theory. Marton suggests we learn what varies against an invariant background. But often what we hope pupils will learn in mathematics is a constant underlying dependency relationship.

Labels of ‘procedural’ and ‘conceptual’ variation do not get at the full range of the importance of variation in learning and doing mathematics, that is to draw attention to the underlying relationships.

Mathematical Confidence

A helpful result of careful and intelligent use of variance and invariance can be the building of mathematical confidence, which in turn lowers anxiety and decreases cognitive load.

In order for pupils to become creative mathematical problem solvers, it is necessary that they gain the motivation to want to pursue mathematics and persevere when faced with apparently intractable problems. Motivation – that is, the very desire to continue and go further – is greatly enhanced when pupils are successful and confident.

As described earlier, through examples we can demonstrate to pupils how to attack a type of question, scenario or problem. As teachers, we plan for the problems they will encounter and manipulate the way in which the problem will unfold before them, such that, when they are beginning to solve a problem, pattern emerges from the mist. When pupils notice pattern and relationships, they can begin to conjecture, “Ah! Look! The pattern is X, so when I do Y, what should happen is Z. Let me try!”

This builds an expectation in the pupil’s mind – they believe they have discerned relationship and can now continue to work on the problem, but now with an anticipation of what will happen and why. When these expectations are confirmed through experiment and result, pupils gain a sense of mathematical confidence.

Note: we will, of course, also manipulate problems such that the expectation a pupil has and the conjectures they make will not be confirmed. These unexpected results also play a key role in building a pupil’s ability to reflect and extend their reasoning.

There are many problems and tasks that mathematics teachers have in their canon that are designed to build such mathematical confidence. Suddenly, an apparently intractable problem becomes addressable and pupils can plot a path through.

Variance and invariance can play a powerful role in building mathematical confidence.

Consider the identity

(x – 2) (x + 1) ≡ x² – x – 2

We could demonstrate the truth of this identity in many ways to our pupils and then ask them to follow our examples to find other such identities. Often, text books will contain exercises with random questions for pupils to work through. But what if we used invariance to help build mathematical confidence?

Suppose as the next example, we looked at

(x – 3) (x + 1) ≡ x² – 2x – 3

Here, the x + 1 term has remained invariant. What do you notice?

And perhaps as the next,

(x – 4) (x + 1) ≡ x² – 3x – 4

At this point, pupils may spot pattern emerging and be able to conjecture what the next example would be.

(x – 2) (x + 1) ≡ x² – x – 2

(x – 3) (x + 1) ≡ x² – 2x – 3

(x – 4) (x + 1) ≡ x² – 3x – 4

Most pupils will look at the x – 5 example next and rightly conjecture that the coefficient of x will be – 4 and that the constant term will be – 5. This confirmation of their expectation builds confidence. As teachers, we would direct them to try ‘going backwards’ and find the result

(x – 1) (x + 1) ≡ x² – 0x – 1

and so on. The pattern is useful in bringing about confidence but also in revealing the nature of the relationships between the terms in the expressions.

We will see pupils confidently deal with the case where the varying term is x – 0

(x – 0) (x + 1) ≡ x² + x – 0

which then leads, by pattern, to the natural conclusion that the next example will be

(x – ⁻1) (x + 1) ≡ x² + 2x + 1

So, by keeping one aspect invariant, we are able to build mathematical confidence at the point where the task is novel and also begin to reveal underlying relationships.

(x – ⁻1) (x + 1) ≡ x² + 2x + 1

(x – 0) (x + 1) ≡ x² + x – 0

(x – 1) (x + 1) ≡ x² – 0x – 1

(x – 2) (x + 1) ≡ x² – x – 2

(x – 3) (x + 1) ≡ x² – 2x – 3

(x – 4) (x + 1) ≡ x² – 3x – 4

This systematic way of working – of specialising – is what allows the pupils to conjecture. We can then change features and build towards generalisation.
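For readers who want to check the pattern mechanically, here is a minimal Python sketch. It is not part of the original task: the helper name multiply and the range of values tried are illustrative assumptions. It multiplies out (x – a)(x + 1) for each value of a and compares the result with the form the pattern suggests, x² + (1 – a)x – a.

```python
def multiply(p, q):
    """Multiply two polynomials given as coefficient lists, highest power first."""
    result = [0] * (len(p) + len(q) - 1)
    for i, p_coeff in enumerate(p):
        for j, q_coeff in enumerate(q):
            result[i + j] += p_coeff * q_coeff
    return result


# Keep (x + 1) invariant and vary the other bracket, (x - a).
for a in range(-1, 6):
    coeffs = multiply([1, -a], [1, 1])
    # Conjecture drawn from the pattern: x² + (1 - a)x - a
    assert coeffs == [1, 1 - a, -a]
    print(f"(x - {a})(x + 1) has coefficients {coeffs}")
```

A check like this only confirms the cases tried. The generalisation itself still has to be argued from the structure of the multiplication.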

Note, the power of variation here is in revealing underlying relationships and building mathematical confidence at the point of first learning. Later, when the pupil passes through the ‘Doing’ phase to the ‘Practising’ phase (when they are fluent), it is no longer desirable to give such structure. We want the questions to become random so that the pupil has to decide when to use a principle or not.

As another example, take the following sets of subtraction questions, adapted from Transforming Primary Mathematics (Mike Askew)

Which set is the most helpful in building mathematical confidence and revealing underlying relationships?

Clearly, the sets are identical sets of questions, but arranged differently. Set A is more typical of what pupils will encounter in text books, the questions are arranged randomly, with no obvious pattern emerging. Set B, however, has been arranged in such an order that pupils will spot pattern and connections. They will notice that performing 122 – 92 is the same problem as performing 120 – 90 and begin to reason why this is the case. Working with Set B, at the point at which this idea is novel, gives pupils the chance for expectation, confirmation and confidence. The teacher may suggest, “show me more questions that are the same as 120 – 90. Tell me how you know you are correct.”
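The relationship pupils are noticing here is that adding the same amount to both numbers leaves the difference unchanged. A minimal Python sketch of the teacher's prompt follows; the function name and the particular shifts are illustrative assumptions and are not taken from Askew's sets.

```python
def same_difference_questions(minuend, subtrahend, shifts):
    """Generate subtraction questions with the same answer as minuend - subtrahend.

    Adding the same shift k to both numbers keeps the difference invariant:
    (minuend + k) - (subtrahend + k) == minuend - subtrahend.
    """
    return [(minuend + k, subtrahend + k) for k in shifts]


questions = same_difference_questions(120, 90, shifts=[2, 5, 10, -20])
for a, b in questions:
    print(f"{a} - {b} = {a - b}")
```

Every question generated this way has the same answer as 120 – 90, which is exactly the reasoning the pupil is asked to articulate when they "tell me how you know you are correct".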

As a teacher, what question would you choose to come next in Set B?

Perhaps 500 – 395 or 505 – 400, thus connecting this particular subtraction with other methods of subtraction and giving opportunities to explore relationships and connections.

Variation Theory is useful because it gives these opportunities. Variation Theory is not about extensive lists of questions where pupils stop expecting, testing and conjecturing and simply become passive in stating obvious next answers. Working with variance and invariance requires the teacher to carefully balance the benefits of confidence and relationships with the danger of long sets of questions that result in pupils no longer thinking about what they are doing.

Rohrer and Taylor (2007) found some interesting results when looking at how many questions pupils need to work on, which we shall come to later in this blog.

Again, I would suggest that Set B is a useful and powerful approach when the mathematical idea is novel, but, actually, Set A becomes the useful arrangement later once pupils have gripped the idea – the random nature of the questions forces pupils to attend to the principle as a whole and make decisions about how to work on the problems.

Using variance and invariance to reveal underlying relationships is the key purpose of Variation Theory. Another useful outcome of varying can be for pupils to discern commonalities and differences when working with examples and non-examples.

Ask a pupil to draw a triangle on a piece of paper. Almost all pupils will draw something like

This is because these are the triangles that pupils repeatedly encounter. Traveling around the UK over the past couple of decades, observing and inspecting mathematics, I have time and again seen teachers refer to triangles but only ever use these types. Pupils come to believe that ‘triangleness’ is like a ladder against a wall or the roof of a house. They believe triangles have one horizontal side. Pupils rarely draw

And for many pupils, the following is not a triangle at all

Instead, they will call this an ‘upside down triangle’.

Perhaps even more concerning is that many pupils believe that the following shape is a triangle

It is easy to see why; after all, this does look like the roof of a house.

I use this simple example of triangles to highlight the need, when introducing new ideas, to ensure that pupils encounter many examples and non-examples of the idea. We will return to examples and non-examples later.

Variation in procedure can usefully help pupils discern the key principles of a mathematical idea. We can ask pupils what is the same and what is different about procedures. For example, choose two three-digit numbers and subtract the smaller from the larger.

Use the same numbers and now, instead of how you might have gone about the original subtraction, perform the subtraction using the following rules

Each of the above procedures for performing subtraction is a formal, generalisable, recognised procedure. Each does, of course, give the same result. But why? What is the same? What is different? How are the procedures connected? What do these connections reveal about the nature of subtraction, place value, base systems and digits?

By working on this task (and please do, it’s wonderful!), we begin to see pattern in the process of subtraction, begin to discern, begin to generalise.

[Note: the above activity was handed to me on a piece of paper some years back by a maths teacher at a conference. I don’t know who that person was or who authored this task – if you do, please let me know so I can include a credit here]

I would suggest that this type of activity exemplifies effective use of Variation Theory in mathematics and is considerably more powerful in building confidence, reasoning and understanding than simply asking pupils to work through long lists of questions.

Here is another example of using variation in the mathematics classroom, which I would suggest you pause and try. I created this task a while back and have tried it with many pupils and teachers.

The initial problems are clearly trivial. But, when the base is changed, the task requires a completely different type of thinking. Working in these different bases, but in a systematic way, pattern emerges! This allows for conjecture and, finally, generalising. Importantly, working on these different bases allows pupils to have greater clarity about working in base 10.

A significant weakness in UK mathematics over the last 30 years is the absence of multi-base arithmetic. I would suggest all pupils learn multi-base arithmetic. After all, how can we, as teachers, be sure that pupils understand arithmetic if they only ever work in one base – all we have shown is that they can perform in one specialised case.
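For readers who want to generate or check multi-base questions, here is a minimal Python sketch. It is not the task described above: the function names are hypothetical, and it assumes bases from 2 to 10 so that the ordinary digits 0 to 9 suffice.

```python
def to_base(n, base):
    """Write a non-negative integer n as a digit string in the given base (2 to 10)."""
    if not 2 <= base <= 10:
        raise ValueError("this sketch only handles bases 2 to 10")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)
        digits.append(str(remainder))
    return "".join(reversed(digits))


def add_in_base(x, y, base):
    """Add two digit strings written in the same base; return the sum in that base."""
    return to_base(int(x, base) + int(y, base), base)


# The same written question, "12 + 12", asked in several bases.
for base in (10, 8, 5, 3):
    print(f"base {base}: 12 + 12 = {add_in_base('12', '12', base)}")
```

In bases 10, 8 and 5 the written answer is 24, but in base 3 a carry is forced and the answer is written 101. Noticing when and why the familiar answer breaks is the kind of attention to place value that working in one base alone cannot provoke.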

As a final example of variation, I include this question set as an exemplar of the need to show pupils the same idea in varied contexts.

Storage and Retrieval

By design, the mastery cycle seeks to optimise learning by

· ensuring all pupils are taught the right level of mathematics (just beyond what they already know), building on secure understanding of prerequisites

· giving all pupils varied experiences with mathematical ideas, that transition from doing, to practising to behaving mathematically

· ensuring that novel ideas are met carefully in such a way that they are seen as important and draw the pupil’s attention in order to pass the gatekeeper of working memory and enter the long term memory

· continually checking that ideas are being gripped by an ever present cycle of formative assessment and correctives

· never moving on to mathematical ideas that require a current idea if it has not been fully understood and embedded

All of this is with the intention of changing the pupil’s schema of knowledge and understanding, assimilating new truth in a logical way. With all this effort to ensure that a pupil’s long term memory is changed, we now face the next challenge: optimising the storage of that knowledge and making it readily retrievable.

It has long been known that memory is the key concern of the educator; Washburne, Ward, Burk and others were discussing the role of memory in the 1910s. In 1943, Hull wrote about memory from two points of view. Firstly, what he referred to as the pupil’s ‘momentary reaction potential’ – that is, the potential they have to use their prior knowledge in the moment through recalling learning. He noted that this varied from person to person. The second aspect of memory Hull identifies is what he calls ‘habit strength’. Hull knew that some actions required no thought; they could just be performed. Habit is a sensible view of this since it reflects a common view of what it means to be able to do something habitually. Later, Estes (1955) refines Hull’s work and talks about ‘response strength’ versus ‘habit strength’. This takes the idea of what one can think about in the moment further and starts to apply the notion of there being a strength to this ability, which can explain the differences various people display. The idea of response strength also takes the debate towards the idea that this aspect of memory is not a fixed potential, but can be improved (strengthened). The research into these two aspects of memory continued for several decades, with a large number of experiments being carried out in laboratory conditions.

Enter Robert and Elizabeth Bjork. The Bjorks have dedicated much of their professional working life to furthering the understanding of human memory. In 1992, they go further and redefine the two aspects as ‘retrieval strength’ versus ‘storage strength’.

We now have a view of the long term memory as being able to be improved both in terms of how readily one can recall knowledge and how well that knowledge is embedded in the long term memory.

Performance is not the same as Learning

In its current incarnation, the formal examination system in the UK measures whether or not a pupil can perform on a certain bank of questions, of certain types, at a certain point in time. Performance is easy to measure, which is why national systems often resort to simplistic, mechanistic approaches for benchmarking the success or otherwise of the system.

Unfortunately, performance is not the same as learning and, more critically, is not even a good predictor of learning. Being able to perform at any given time is heavily influenced by local conditions—cues, predictability, recency—which can serve as crutches that prop up performance, but will not be there later when the knowledge might actually be useful!

We have known for a long time that current performance is a very poor predictor of long term learning, yet schools are forced to operate in ways that reward pupil short term performance over meaningful, long term learning. This, of course, leads to poor design of learning episodes, which can be praised by an inspector or observer in the moment (all the kids had smiley faces, they all put their thumbs up at the end of the lesson, everyone could do the target question at the end, the pupils all made progress!), but are in fact not learning episodes at all – they are presentation and regurgitation.

The key driver of systems adopting poor assessment practices is that they are easier and cheaper to implement. But there is another, serious reason why assessment that actually measures learning is not routinely used by national systems: cheating.

Rather than terminal performance examinations, we could instead choose an approach of continual assessment, where pupils are working very closely with their teacher, who builds strong relationships with them and gets to know them inside out. Pupils can build portfolios of evidence throughout their time at school, demonstrating mathematical understanding on deep structure problems and over sustained periods of time, as we spiral through the curriculum and pupils grow their schema. This teacher assessment led approach could discern what pupils truly do know and understand. So, why did we abandon such approaches (note, for example, the ATM GCSE, which was abolished in the 1990s, had no terminal exam) and opt for systems that measure point in time performance only? Well, continual assessment is very hard to carry out and takes a great deal of time, it also requires teachers to have very high levels of professional knowledge around assessment and make accurate judgements over time that are free of bias, it comes with an enormous moderation burden and, finally, it relies on teachers maintaining their professional integrity and ethics whilst simultaneously working in a high stakes profession. Alas alas, no system has ever been able to achieve all of this!

At a local level, however, there have been many excellent examples of continuous assessment working, including – notably for this blog – Carleton Washburne’s own schools and pupils.

We work in a system that measures performance only and we need to be alert to the flaws of such a system and alive to the extremely weak practice it can drive. It can feel rather scary for the teacher in a high stakes system to change their lesson design to focus on long term learning rather than performance, but it is morally reprehensible not to do so.

Suppose a class has just had a one hour lesson on Pythagoras’ Theorem. During the lesson, the teacher has repeatedly emphasised that the lesson is about Pythagoras’ Theorem and shown multiple examples. The teacher then gives the pupils similar questions to work on. The teacher is then pleased that the pupils can perform.

Well of course they can perform! They have just been given all the cues to do just that. They are replicating.

But what we, as teachers, want to achieve is for pupils to be able to encounter problems in the future that may or may not require the use of Pythagoras’ Theorem and for them to be able to recognise appropriate scenarios and put their learning to good use.

In other words, as teachers we should focus more on getting pupils able to know when to use an approach, rather than simply how to use the approach that day. Again, as discussed earlier, a learning episode phasing that includes only 20% new content and 80% previously learnt content helps with this, since the lesson is not then just populated with questions like the examples just shown.

The Importance of Forgetting

“In the practical use of our intellect, forgetting is as important as remembering. If we remembered everything, we would on most occasions be as ill off as if we remembered nothing.” - William James, 1890

We encounter huge amounts of information in our everyday lives. It is important (for one’s own sanity!) that not all of this is remembered. Imagine if you could take a pill so that you never forgot anything. It would be awful! If every single thing you had ever been told was continually brought to mind, the impact would literally be maddening. So, forgetting is a really important evolutionary mechanism that protects the mind. Teachers should be alert to forgetting and phase their learning episodes such that important ideas and information are brought to mind again for the pupil at the point just before being forgotten.

Desirable Difficulties

Learning is difficult, but we want our children to become learn’d. So removing as many of these difficulties as possible is clearly a useful thing to do (e.g. lessening the load on the working memory by removing distractions or giving clear instruction). But not all difficulties are unhelpful to the process of learning.

Many cognitive scientists, and in particular the Bjorks, have explored the impact of introducing difficulties during learning. This has included work on asking participants to practise not at the criteria (e.g. throwing a ball five metres and three metres, when the test will be to throw it four metres), interrupting the learning through distraction (e.g. when learning about one idea, periodically diverting the learner to think about an entirely separate idea) and interrupting the learning episode (e.g. instead of asking a novice tennis player to learn everything about serving a ball first, the novice is asked to learn a myriad of skills, intertwined in the same learning episode).

For a long time, much of this work focused on physical activity such as sport, and much of it has not been replicable beyond laboratory conditions. However, some work in the last 30 years in particular has shown encouraging results, which bring interesting implications for the mathematics teacher.

These difficulties that increase long term learning are referred to by Robert Bjork as ‘desirable difficulties’. Bjork outlines four key desirable difficulties:

· Varying the conditions of learning

This could include varying the learning environment. Bjork looked at moving pupils between bright, clean, inspiring classrooms and dark, cramped, basement-like ones. There is propositional knowledge and case knowledge regarding this desirable difficulty. However, in this blog, I shall not be considering this area since I have never been able to find strategic knowledge of any impact (that is to say, I do not know of any real classroom examples).

· Distributing or spacing study or practice

Typically, pupils practise a topic in one period of time and then are tested on the topic. Spacing the topic over a longer period, with gaps in the practice has a significant impact on long term learning.

· Using tests (rather than presentations) as learning events

Rather than only presenting new ideas, asking the pupils to answer a question about that idea first has a significant impact on long term learning (even if they know nothing about it).

· Providing contextual interference during learning (interleaving rather than blocking)

Interrupting the learning of an idea with different ideas has a significant impact on long term learning.

I will expand on each of the three desirable difficulties – that have all three levels of professional knowledge to support them – throughout the rest of this section of the blog.

The Testing Effect

“Exercise in repeatedly recalling a thing strengthens the memory.” - Aristotle

Regular low stakes or no stakes quizzing is a key element of mastery approaches. Washburne (though really it was Ward and Burk’s work) outlined entire curriculum journeys through each subject, punctuating the journeys with quizzes and tests.

In conveyor belt approaches, testing is used to label pupils as those who can learn well and those who can’t. In a mastery approach, testing is used to enhance learning.

When faced with learning a novel idea, even when the learning episode is highly effective, pupils very quickly forget much of what was learnt. This is a protective mechanism for the human mind and evolutionarily important. The amount of content retained after a learning episode decays quickly. However, if that learning episode is brought to mind again, the rate of decay lessens and lessens. This is yet another reason why all mastery approaches embrace a spiral curriculum model.

On the whole, the way in which teachers bring learning to mind again is to review it – perhaps through a re-teaching process or asking pupils to read their notes. This is a useful activity and does indeed improve retention by lessening the rate of decay.

A perhaps surprising result, however, is that reviewing material in this way is less impactful than simply asking pupils to answer questions on the previously learnt content. Rather than studying an idea several times throughout the spiral, it is more beneficial to replace the repeated study with testing.

Here are some typical results from Roediger and Karpicke (2006), which is one of several studies to show this ‘testing effect’

As you can see, those students who did two periods of study immediately before a test performed well. Those who did just one period of study followed by a testing exercise, did not perform as well when the test was immediately afterwards (5 minutes gap). This is what we would expect. The first group was engaged in cramming.

But, when a longer period of time passes – 1 week in this case – the results are reversed. The crammers perform significantly worse than those who studied and were then tested.

On the right hand side, another experiment shows the impact of three models. The first group had four periods of study, the second had three periods of study followed by a test and the final group had just one period of study followed by three tests. The results are striking. The crammers perform well if the test is immediately afterwards, but their long term recall is much worse. Now the group that had just 25% of the study time of the crammers, followed by three tests, far outperforms all others.

The testing effect can feel counterintuitive – one would imagine that those who study for longer will have the greater long term recall, but this is not the case. Testing instead of reviewing brings much greater long term benefits. As discussed earlier, performance is not the same as learning. This is a clear example of that statement.

It is the act of asking a pupil to recall their learning (testing) that leads to greater retention.

Testing Potentiates Learning

Another powerful use of testing that the teaching for mastery teacher must be aware of is that testing potentiates learning. That is to say, testing a pupil before the teaching of an idea by asking them questions on what has not yet been learnt, alerts them to the fact that learning must happen. By considering the questions, even if they can’t do any of it, pupils become more ready to learn the new idea. They are getting a glimpse of what will be expected of them and are able to recall previously learnt material that may connect to the new problem they are seeing. This makes the pupil more alive to learning the new idea and increases their potential to learn.

Marking and Feedback

In my 2004 book, Chapter 18 is titled, ‘Marking Books’. The chapter in its entirety reads: ‘I wouldn’t bother’.

Few practices in teaching take up such enormous amounts of time and energy as marking. If we are going to dedicate such huge resource to an activity, we must be sure that there will be a significant impact on learning and that this impact is greater than if the time and energy had been invested in undertaking a different activity. Marking books, grading papers, writing comments and other common marking and feedback policies that schools deploy simply do not meet this test.

Marking and feedback can have an impact, but only if done extremely well and if, and only if, that marking and feedback is genuinely used to change the learning experience. In all practicality this is nigh on impossible for a teacher with 200 pupils and what we see instead is marking and feedback to tick a policy box rather than any meaningful attempt to change learning. The time wasted on such ineffective practice is vast. This time could be spent on planning learning, creating questions, developing subject knowledge and making pedagogic choices. All of these have a greater impact on learning than marking and feedback (even if done well).

The TALIS report gives us a view of the scale of the issue. Teachers in the UK spend around 10 hours per week on marking and administration related to assessment.

Furthermore, this wasted time is also a key factor in lowering professional satisfaction in teachers. Teachers regularly report marking, feedback and the recording of grades as a significant waste of their time.

Marking and feedback are a very poor use of a teacher’s time. Instead, use that time to think carefully about learning episodes and the materials and approaches you will use to communicate mathematical ideas.

If one must mark books, then finding time efficient ways to enhance learning is key. I rather like a suggestion I heard from Dylan Wiliam: instead of ticking and crossing questions, a statement on the page along the lines of “there are five wrong answers here, find them and correct them” can be a quick way of making the pupil undertake a useful activity. This creates a situation where the pupil, not the teacher, must locate and identify the incorrect responses they have given. When a pupil finds their own errors and corrects them, the gains are much greater than when they must correct an error their teacher has identified.

The Hypercorrection Effect

An area where feedback might be worth the time invested is to bring about a hypercorrection effect. Hypercorrection occurs when pupils have given a response to a question, which they feel highly confident is correct, but then receive feedback revealing their response was in fact wrong.

The feeling of surprise a pupil has when discovering something they firmly thought to be true was a misconception, leads them to better correct the original problem and to be far more likely to remember the correction in future, improving long term learning of the idea even though, following the original study of it, they had misunderstood.

Designing activities to bring about hypercorrection requires them to be such that feedback is given and takes account of the level of confidence the pupil had in their assertion. This could again lead to significant workload for the teacher and not give the gains in learning needed to justify such investment of time and energy.

Robert Bjork proposes a simple, yet powerful, alternative: better multiple choice questions.

Better Multiple Choice Questions

Traditional multiple choice questions are a quick and easy way for a teacher to glean a sense of the level of understanding in a class of pupils. These models are also useful, when used at scale in technology products, for discerning trends in strengths and weaknesses in the population. But to bring about the hypercorrection effect, we must know something about the level of confidence associated with responses.

Erin Sparck, Elizabeth Bjork and Robert Bjork designed an approach to confidence weighted multiple choice questions that achieves this (see https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5256426/ for further details)

Rather than only asking the pupil for their response, they must also indicate their confidence in their response. A scoring system is then used to heavily penalise confident wrong answers.

Here are some examples (note the pupils do not see the associated scoring)

As you can see in the two examples above, the pupil must choose between three possible answers to the question, but they can choose to place their response on the answer itself (confidently asserting), between answers (equally or skewed towards one they feel more confident about), or to simply state they ‘don’t know’. Giving the correct response is the best score. Asserting confidently a wrong response is significantly punitive. This helps to bring about the emotional response we are looking for in order for the hypercorrection effect to occur.
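To make the idea of a confidence weighted score concrete, here is a minimal Python sketch. It is an illustration only: the point values (a reward of 10 and a penalty of 20), the function name and the way a response is represented are all assumptions made for this example, not the scoring used by Sparck, Bjork and Bjork. A response is modelled as the share of confidence the pupil places on each option, or as ‘don’t know’.

```python
# Hypothetical scoring sketch. REWARD and PENALTY are illustrative values only;
# they are NOT taken from the Sparck, Bjork and Bjork study.
REWARD = 10.0    # points for full confidence in the correct answer
PENALTY = 20.0   # points lost for full confidence in a wrong answer


def score_response(weights, correct_option, reward=REWARD, penalty=PENALTY):
    """Score one confidence weighted multiple choice response.

    weights: dict mapping option label -> share of confidence (must sum to 1),
             or None when the pupil selects "don't know".
    correct_option: label of the correct answer.
    """
    if weights is None:
        # "Don't know" neither gains nor loses points in this sketch.
        return 0.0
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("confidence shares must sum to 1")
    on_correct = weights.get(correct_option, 0.0)
    # Confidence on the correct option earns points; confidence placed
    # anywhere else is penalised more heavily than it could have earned.
    return on_correct * reward - (1.0 - on_correct) * penalty


if __name__ == "__main__":
    print(score_response({"A": 1.0}, "A"))            # confident and correct
    print(score_response({"A": 0.5, "B": 0.5}, "A"))  # split between two options
    print(score_response({"B": 1.0}, "A"))            # confident and wrong
    print(score_response(None, "A"))                  # don't know
```

Because the penalty is larger than the reward, a confident wrong answer costs more than a confident right answer gains, which is the asymmetry the approach relies on.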

The impact of this approach is significant.

The confidence weighted multiple choice approach gives gains in long term learning over a standard multiple choice quiz. Sparck, Bjork and Bjork also explored whether standard multiple choice quizzes could be improved by asking pupils to state their confidence in their response.

There appear to be no additional gains in learning from asking pupils to state the confidence of their response on a standard multiple choice quiz. The impact would appear to be an outcome of the confidence weighting and scoring system.

Creating confidence weighted multiple choice questions is a straightforward and quick task for the teacher. So, in this particular case, it does seem a good use of time to create these simple feedback mechanisms.

Massed vs Spaced Practice

There have been a great many studies into the impact of massed vs spaced practice. Here, I will use Rohrer and Taylor (2007) as the main example, since their study specifically focused on mathematics.

Briefly, massed practice refers to carrying out all of the practice on an idea, skill or concept in one period, whereas with spaced practice the pupil practises over a longer period with gaps between practice.

Rohrer and Taylor also look at the impact of ‘light massing’, where pupils still carry out all of their practice in one period, but undertake much less practice.

In one experiment, three groups were asked to carry out different types of practice, as below

The ‘spacers’ worked on four problems, but spread out over two weeks, the ‘massers’ on the same four problems in one week and the ‘light massers’ worked on just two problems in one week. Each problem was given the same amount of practice time.

The gap between completing practice and taking the test was the same for all groups, one week. The results on the test are shown below

The spacers significantly outperform the massers. This result has been replicated many times across many disciplines.

Implications for Overlearning

The results only consider participants who answer at least one practice problem correctly.

Note that, despite the ‘massers’ undertaking twice as much practice as the ‘light massers’, there was no significant difference in their test scores.

Because the ‘light massers’ answered at least one practice problem correctly, this finding suggests no gain resulting from overlearning. This has an important implication for teachers who set pupils practice worksheets with dozens (or hundreds!) of minimally different questions or variation theory worksheets that focus on quantity of content over the need to discern underlying relationship. It appears that, as long as pupils get at least one question correct, there is no need for a vast number of practice questions. This is an unsettling finding for many educators who have long been wedded to overlearning as an important element in the learning episode. There is not enough evidence in Rohrer and Taylor’s study to assert that overlearning is not effective, but it should at least raise the question when one is designing practice problems.

It is possible that overlearning might have significantly boosted test scores if there had been, say, a tenfold increase in the amount of practice rather than twofold, but given the constraints on time that teachers face, the gains from such overlearning might not be worth the amount of time needed to undertake the activity.

A null effect of mathematics overlearning was also observed previously (Rohrer & Taylor, 2006).

Blocked vs Interleaved Practice

Blocked practice refers to the practice of learning about and practising one distinct aspect of a domain at any given time. Robert Bjork often tells the story of learning to play tennis under the guidance of a professional tennis coach. The trainee will be instructed on how to serve a ball and then practise this one micro-skill for weeks. Once deemed to have gripped this one aspect, the coach then instructs on the next micro-skill, say, backhand and so on. Interleaved practice refers to the practice of skills or ideas in a phasing that is disrupted by the practising of other skills or ideas. These can be related or not. In the tennis anecdote, the novice player now has practice sessions that include all of the micro-skills. The initial experience of this is confusion and difficulty for the new player, since they are being asked to get to grips with lots of unfamiliar and unconnected movements all at the same time. But, over time, the interleaved practice leads to some interesting results.

Taking another example from Bjork, he looked at participants trying to learn the style of some unfamiliar artists. Some participants were asked to study an individual artist’s work all at once before moving on to the next artist (blocked practice), whilst others had to learn all of the artists’ styles in a randomly presented sequence. For example,

Intuitively, most people think that learning the style of one artist at a time would result in being able to firmly grip the similarities in that artist’s work and, therefore, be able to spot a painting by the same artist in future because it would contain those same similarities.

However, as discussed earlier when considering variation, it would appear that discerning differences in styles, which is what the interleaved approach achieves, was more beneficial in terms of long term learning.

To measure the learning, Bjork showed participants new paintings that they had not yet encountered and asked them to choose the correct artist.

The results of the experiment show a significant increase in performance by those who were asked to study the styles through interleaved (spaced) practice.

It is also interesting to note that the participants themselves expressed strongly that they would perform better using blocked (massed) practice over interleaved practice. This remains their belief even after they have been shown the actual results!

This strong bias for practising in a blocked way is likely a result of experience – after all, it is how almost all educators and trainers ask their pupils to carry out practice.

Given that interleaved practice leads to better retention, the implication for the teaching for mastery teacher is to be able to design practice sequences that highlight not just what is the same but also what is different. There is a strong link to variation theory here, which is about discerning underlying relationships and principles in and across ideas.

For example, the teacher who is trying to get their pupils to grip a sense of ‘triangleness’ should not only use examples of triangles, but should interleave these with examples of non-triangles.

Much of the research around interleaving is centred on physical skills, such as the tennis example, but we, of course, are interested in the evidence directly related to mathematics.

Let us, again, turn to Rohrer and Taylor. In their paper, “The shuffling of mathematics problems improves learning” (2007), they considered these hypotheses using mathematical content.

Participants were taught and then asked to practise finding the volume of four different solids.

They were later tested, with questions looking typically like

Groups of participants followed different practice procedures, with one group undertaking interleaved practice and the other blocked practice.
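The difference between the two practice procedures can be sketched in a few lines of Python. This is a hedged illustration, not the procedure used in the study: the function names and the problem labels (‘solid A’, ‘A1’ and so on) are invented for the example, and a simple round-robin ordering stands in for shuffling so that the output is predictable.

```python
from itertools import chain, zip_longest


def blocked_sequence(problems_by_type):
    """All problems of one type, then all problems of the next type."""
    return list(chain.from_iterable(problems_by_type.values()))


def interleaved_sequence(problems_by_type):
    """Take one problem of each type in turn, so neighbouring problems differ in type."""
    rounds = zip_longest(*problems_by_type.values())
    return [problem for rnd in rounds for problem in rnd if problem is not None]


if __name__ == "__main__":
    # Hypothetical practice set: two problems for each of four kinds of solid.
    practice = {
        "solid A": ["A1", "A2"],
        "solid B": ["B1", "B2"],
        "solid C": ["C1", "C2"],
        "solid D": ["D1", "D2"],
    }
    print(blocked_sequence(practice))
    print(interleaved_sequence(practice))
```

In the blocked ordering the pupil always knows which procedure is coming next. In the interleaved ordering they must first decide which kind of problem they are facing.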

The results reflected earlier studies of blocked vs interleaved practice, with a significant increase in performance from the interleaved practice group.

Just like the results of cramming shown earlier, immediate performance is better when the participants blocked their practice. But when tested later, the tables are turned, with the interleavers far outperforming the blockers.

Once again, the implication for teachers is to carefully consider the difference between immediate performance and long term learning. Clearly a one-hour lesson containing blocked practice will look more ‘effective’ to the inspector or observer, since the pupils will perform well in that immediate time, but this common practice would appear to have poor results when it comes to long term learning. The somewhat messy looking interleaved lesson – certain to upset the inspector! – is actually the desirable practice procedure to engage pupils with.

Rohrer and Taylor postulate that the superior test performance after interleaved practice is a result of requiring the students to know not only how to solve each kind of problem but also which procedure was appropriate for each kind of problem. This supports the point I made earlier that it is more important for a pupil to know when to use an approach rather than simply how to use the approach.

The Generation Effect

Malcolm Swan asked 779 key stage 4 pupils to recall how often particular scenarios occurred in the classroom. Pupils, quite rightly, report that the most common scenario is that they listen while the teacher explains. This is, of course, a very good activity.

What is interesting about the results is what pupils report as being less common.

Routinely, pupils report that they do not have many opportunities to create their own questions. In other words, pupils report that they are not being asked to conjecture.

In the 1980s, teaching for mastery was generally referred to as 'diagnostic teaching'. Here are some excerpts from the teacher standards at the time:

· Explore existing ideas through tests and interviews, before teaching.

· Expose existing concepts and methods

· Provoke ‘tension’ or ‘cognitive conflict’

· Resolve conflict through discussion and formulate new concepts and methods.

· Consolidate learning by using the new concepts and methods on further problems.

It was an expectation that teachers should provoke tension and cognitive conflict. That is to say, teachers would design problems and activities that led to pupils questioning something they had held as truth (much like the hypercorrection effect discussed earlier). An important part of this process is for pupils to conjecture, test, confirm, generalise and reason. In doing so, pupils follow their own lines of inquiry and ask their own questions.

The ‘generation effect’ tells us that if we give pupils minimal information and then ask them to generate a problem, they will retain the learning far longer than if we simply give them the problem to solve.

It is important that pupils believe they are generating their own problems, but of course, the teacher has designed the scenario such that the pupil will ask the questions we want them to ask. This is not discovery learning!

It is incumbent upon the teacher to ensure that the pupil will be able to succeed at generating appropriate questions by making sure the required knowledge and understanding is in place and by having a good view of what the pupil already knows and believes.

The implication for teachers is clear: the teacher should ask themselves how often they create opportunities for pupils to generate their own questions to solve and how to go about designing such opportunities. Some powerful examples include the use of ‘Always, sometimes or never?’ prompts, asking ‘what is the same and what is different’ and using the prompt ‘and another… and another… and another…’, to make pupils continue to generate new examples or counter-examples.

Performance is not a Good Proxy for Learning

The trap of high stakes systems is that teachers are judged on what can be observed. Unfortunately, what we can observe is performance, which is an unreliable indicator of learning. In the moment, during an inspection, for instance, we can only infer learning.

We have seen that conditions of instruction that make performance improve rapidly often fail to support long-term retention and transfer, whereas conditions of instruction that appear to create difficulties for the learner, slowing the rate of apparent learning, often optimise long-term retention and transfer. This issue presents a real challenge to those who wish to judge the effectiveness of teaching through observation – that is, it’s pretty much impossible to reliably do so! This type of inspection is both laughable and damaging, since it drives counterproductive teaching practices.

The reality is teachers do exist in a landscape of inspection and this is not going to go away, so it is incumbent upon the profession to at least ensure inspection is as meaningful and formative as possible. This means training inspectors to make long term inference rather than immediate performance observations. This is clearly a more intellectually demanding task to carry out, but there is surely no excuse for not trying to make inspection better reflect what we know about long term, sustained and meaningful learning.

Teachers and pupils can be fooled

The lure of performance means that teachers become susceptible to choosing poorer conditions of instruction over better conditions and pupils to preferring those poorer conditions.

If teachers and observers applaud rapidity and apparent ease of learning during lessons over conditions that more readily lead to long term retention, a system wide preference and bias for poorer conditions of learning becomes the accepted norm.

Also, pupils do not appear to develop a nose for identifying impactful ways of learning. Rather, they are misled by indices, such as how fluently they process information during a re-reading of material, into believing in poorer conditions of learning.

This appears to be the case across several aspects discussed above, as the graphs below demonstrate.

This unshakable misconception that we, as learners, carry is an important consideration for the teacher. Pupils are repeatedly biased towards modes of learning that actual results show to be less effective than the modes they determine to be unhelpful.

The Teacher Parable

Another finding – one I believe most teachers actually know in their hearts – that teachers should be aware of is that teachers themselves almost always overestimate the impact of their teaching.

A nice example of this can be found in Newton’s experiment looking at the perception an instructor had about the impact of their teaching against the actual impact.

Newton created two groups of participants; tappers and listeners. The tappers were handed a card on which was the name of a popular melody (e.g. Happy Birthday to You). The tapper then tapped out the melody on the table with their finger. The listeners then recorded the name of the melody they believed they had just heard.

The tappers were asked to predict how many of their melodies had been correctly identified by the listeners. Here are the results

As you can see, the tappers wildly overestimated their musical performance!

In the tapper's mind, the melody they are tapping out is part of the overall song they can ‘hear’ in their head. They hear the instruments and lyrics, the familiar tempo and all the richness of the music. So, to the tapper, it is obvious what melody is being performed.

The listener has none of this context, none of this background information. All they have is a novel set of tapping noises and rhythm.

This is often the case when mathematics is being taught too. The teacher has forgotten what it is like – intellectually and emotionally – to be in the position of novice. They embark on the teaching of, say, introducing trigonometric ratios, with all the richness of background information and connections to other mathematical ideas (including ideas conceptually beyond this stage), and have a sense of ease about the new idea. This sometimes leads to ideas being communicated to pupils as though they are also expert. The teacher, believing their explanation to be clear, sensible and obvious, often gains a false sense of security in the impact of their teaching, just like the tappers did.

This is why in a teaching for mastery approach, continual assessment through questioning, discussion, listening, observing and quizzing is so important. The teacher must always be checking they have not fallen into the trap of the Teacher Parable.

Moving from Propositional to Strategic Knowledge

For decades, I have been implementing the strategies described in this blog in my own classrooms and with schools I work with. This has meant taking John Carroll’s seminal work on cognitive science from the 1960s onwards, building the understanding with findings from many others over the years and trying to untangle those hypotheses that are not replicable beyond controlled laboratory conditions from those theories that have been shown to work in the classroom. I have long been fascinated by, and remain obsessed with, finding answers to questions such as

· How long passes before someone starts to forget something?

· What is the most effective period of time to allow to pass before using the testing effect to force a pupil to recall?

· When should old material arise again in the spiral?

· How much maturation must occur before a pupil can effectively use that prior learning and understanding in their own inquiry?

These questions have been intractable for many decades now. Experiments have been limited in scale and scope, meaning the data available to address these fundamental questions is not yet sufficient to give educators truly useful guidance.

Around 15 years ago, along with a group of colleagues, I started to propose a large scale data collection that might help to give new insight. We designed and built an online system, over many iterations in different countries, capable of capturing data on not just pupil performance, but also on teaching decisions, curriculum planning, learning episode phasing, pupil retention, forgetfulness and spiral intervals. Now with millions of data points collected, Complete Mathematics (our online platform) is starting to reveal interesting patterns.

Of course, these are only patterns at the moment. We are growing the community all the time and waiting for the data bank to build up into the hundreds of millions of data points rather than just tens of millions. At that stage, I will publish the trends and inferences that the data suggest.

To date, we are seeing interesting commonalities around high test results and long retention related to the nature and phasing of the spiral in use (schools can personalise the model from the default one).

I would like to end this part of the blog by sharing these very tentative results.

Firstly, we are seeing high retention rates and test results correlate with a schedule in which a novel idea is encountered in study mode and is then met again at three testing moments in the spiral. Beyond three, there appears to be no discernible difference; below three, retention and test performance are poorer over time.

At the moment, most of the models indicate four sequential encounters with the novel idea (one study, then three tests) over four learning episodes.

For the pupil, the experience begins with a learning episode studying a novel idea (which takes as long as it takes), as described earlier in this blog. The next learning episode is concerned with a new idea, which the pupils study, but the lesson also contains content from the previous idea. No teaching of the previous idea occurs, so pupils need to recall it and answer questions on it (the testing effect). This continues to build up so that, by the fourth learning episode, the content of the episode is 20% study of a novel idea and 80% testing of three previous ideas. (In fact, older ideas are often included too, in the weekly no-stakes quizzes that Complete Maths pupils undertake outside of class time, but this content does not appear in class time.)

With this scheduling of study-test-test-test for each novel idea, future test performance is greatly enhanced.
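The study-test-test-test pattern can be sketched in a few lines of Python. This is only an illustration of the scheduling idea described above, not the Complete Maths implementation: the function name `build_episodes`, the parameter `tests_per_idea` and the idea labels are all hypothetical, and the sketch assumes each learning episode studies exactly one novel idea and tests the ideas studied in the three episodes before it.

```python
def build_episodes(ideas, tests_per_idea=3):
    """Return one dict per learning episode.

    Each episode studies one novel idea and tests the ideas studied
    in the preceding `tests_per_idea` episodes (hypothetical model).
    """
    episodes = []
    for i, idea in enumerate(ideas):
        start = max(0, i - tests_per_idea)
        episodes.append({"study": idea, "test": ideas[start:i]})
    return episodes


# Hypothetical idea labels, purely for illustration.
ideas = ["A", "B", "C", "D", "E"]
episodes = build_episodes(ideas)

for number, episode in enumerate(episodes, start=1):
    print(number, "study:", episode["study"], "test:", episode["test"])
```

In this sketch, idea A is studied in the first episode and tested in the second, third and fourth, after which it drops out of class time until it next arises in the spiral.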

The second finding I would like to share here relates to the timing of study episodes. As discussed previously, every idea occurs again in the spiral so that pupils can consider the idea from the vantage point of a more mature schema, see further connections and reason further. These results are very nascent and should be read as simply an interesting early finding, not used to change the scheduling of your curriculum.

We are seeing the following spacing of study periods correlate with high rates of retention of test performance over time.

There do not, at this stage, appear to be any additional gains from studying the idea again after the 90-day study.

So, on early indications, each mathematical idea will have five study periods and 15 testing periods on the entire journey through mathematics.

There are approximately 320 mathematical ideas that pupils are required to grip in their entire time at school in the Complete Maths model. This means, for a pupil to best grip all of those ideas, we need to provide approximately 1600 learning episodes between year 1 and year 13.
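The arithmetic behind these figures can be laid out as a short Python sketch. The variable names are my own, and the calculation assumes that each learning episode contains exactly one study period and that year 1 to year 13 spans 13 school years; the per-year figure is a simple average and not a claim about how episodes are actually distributed.

```python
IDEAS = 320                 # approximate number of ideas in the model
STUDY_PERIODS_PER_IDEA = 5  # study periods for each idea over the whole journey
TESTS_PER_STUDY = 3         # study-test-test-test
SCHOOL_YEARS = 13           # year 1 to year 13 inclusive (assumption)

testing_periods_per_idea = STUDY_PERIODS_PER_IDEA * TESTS_PER_STUDY
study_episodes = IDEAS * STUDY_PERIODS_PER_IDEA
episodes_per_year = study_episodes / SCHOOL_YEARS

print(testing_periods_per_idea)  # 15
print(study_episodes)            # 1600
print(round(episodes_per_year))  # 123
```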

In the coming years, we will continue to monitor, refine and expand our model to take account of the trends that prove effective. We will move more deeply from correlation to causation and, since our data set is live and vast, hope to be able to confirm some of the assertions that Carleton Washburne made a century ago and that cognitive scientists have been able to replicate in real classroom conditions at the small scale.

***

If you have enjoyed reading the first three parts of this discussion of a mastery approach, you might also like to continue reading the rest of the story in my book Teaching for Mastery.

Saturday, 18 August 2018
