In a couple of my recent posts, I’ve tried to counter some of the arguments that well-intentioned, although misinformed, dog people use against certain training methods. Here I’ll discuss some of the arguments that more traditional, or force-based, trainers use to show why we shouldn’t rely exclusively on reward-based training with our dogs. To reiterate, I am a 100% force-free trainer and behaviour consultant.
The first thing to discuss is the science behind some aspects of learning theory that can cause inexperienced reward-based trainers to come unstuck. First, a few terms and their explanations.
Fixed rate of reinforcement (known in learning theory as a fixed ratio schedule) – this is when a dog receives a fixed number of rewards for a fixed number of behaviours. An example would be 1:1, where one behaviour earns one reward. A vending machine pays out on a fixed rate: money goes in, a can of Coke comes out. Every time. Other examples could be 2:1 or 3:1, where the animal is reinforced after every two or every three behaviours respectively.
Variable rate of reinforcement (a variable ratio schedule) – this is when the reinforcement comes after an unpredictable number of behaviours that varies around an average. So you could have a variable rate of 3, which might mean the animal is reinforced after the 1st behaviour, then after the next 5, then the next 2, then the next 4: on average, one reward for every three behaviours. A slot machine/fruit machine pays out on a variable rate of reinforcement. We keep playing for the chance of a payout.
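For readers who like to see the arithmetic, here is a rough Python sketch of the two schedules. It is only an illustration under my own assumptions: the function names are made up for this example, and the variable schedule draws each gap between rewards at random from 1 up to (2 × average − 1), which is just one simple way to get gaps that average out to the chosen number. With an average of 3, the gaps range from 1 to 5, like the 1, 5, 2, 4 example in the definition.

```python
import random


def fixed_ratio(n_behaviours, ratio):
    """Behaviour numbers (counting from 1) that earn a reward on a fixed rate.

    ratio=1 rewards every behaviour (1:1), ratio=3 rewards every third (3:1).
    """
    return [i for i in range(1, n_behaviours + 1) if i % ratio == 0]


def variable_ratio(n_rewards, average, rng=random):
    """Behaviour numbers that earn a reward on a variable rate.

    Assumption for illustration: each gap between rewards is drawn uniformly
    from 1 to (2 * average - 1), where `average` is a whole number, so the
    gaps average out to `average` while any single gap is unpredictable.
    """
    rewarded = []
    count = 0
    for _ in range(n_rewards):
        count += rng.randint(1, 2 * average - 1)  # next gap, 1..(2*average-1)
        rewarded.append(count)
    return rewarded


print(fixed_ratio(9, 3))                       # [3, 6, 9]
print(variable_ratio(4, 3, random.Random(1)))  # changes with the seed
```

On the fixed rate you always know which sit pays; on the variable rate neither the dog nor the slot machine player can tell which one will, which is exactly why they keep going.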
So, now that the science is out of the way, here is how we can apply (or misapply!) it.
When teaching a new behaviour, we want to reward our dog every time he does it. So when teaching a dog to sit, every time he sits, he is rewarded. We do this to teach him that sitting is worth his while. Once your dog reliably sits on request, it is time to move to a variable rate. If we don’t make that move quickly enough, the dog will stop working unless he is reinforced every time. To use the vending machine example: if the Coke machine swallows your money, you may put in another coin. If it eats that one too, nobody keeps feeding it in the hope that it will pay out this time. You assume the machine is broken and chalk it up to a bad experience. So, back to dog training. If we use a fixed rate of 1:1 for too long, then when we try to stop, the dog thinks we are “broken” and stops working.
Now, to add insult to injury, that isn’t even the worst of it. Say we are using food to train. If we don’t get the food out of our hands quickly enough, the dog won’t work unless he sees the food. At this point we are bribing the dog, not rewarding him. The sequence goes like this:
Bribing: food in hand, dog sees food, dog does the behaviour requested, dog gets food (dog only does the behaviour if he sees food)
Reward: food is hidden (e.g. in a pocket), dog does the behaviour requested, dog is rewarded with food for good work (dog willingly works for the chance of a reward)
To make the dog willingly work for the chance of a reward, we need to move him onto a variable rate of reinforcement as soon as we can. To do this, we begin with low averages, rewarding the dog, on average, every second or third behaviour. When your dog is showing progress at this level, we can increase the average to every fourth or fifth behaviour, and so on. How high you set the bar depends on what you want to achieve and what your dog is capable of. Some dogs are willing to keep working and working (border collies are a good example, although this isn’t cast in stone). Other dogs reach a point of diminishing returns, where they decide that the work they are offering isn’t worth the payout (my own mastiff breeds, for example). Each dog is different, as is the skill level of each owner/trainer.
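One way to picture “raising the bar” is as a simple rule you apply at the end of each session. The Python sketch below is purely illustrative: the levels (averages of 2 to 6), the 80% “step up” threshold and the 50% “step back” threshold are numbers I’ve made up for the example, not standard values, and the function name is hypothetical. You would adjust all of them to your own dog.

```python
def next_average(current, success_rate, levels=(2, 3, 4, 5, 6),
                 step_up=0.8, step_back=0.5):
    """Choose the average rate of reward for the next session.

    Hypothetical rule (thresholds are illustrative assumptions): if the dog
    responded to at least `step_up` of requests, ask for more work per reward;
    if fewer than `step_back`, the payout isn't worth it to this dog, so ease
    off one level; otherwise stay at the current level.
    """
    i = levels.index(current)
    if success_rate >= step_up and i + 1 < len(levels):
        return levels[i + 1]  # progress: reward less often on average
    if success_rate < step_back and i > 0:
        return levels[i - 1]  # diminishing returns: reward more often again
    return current            # keep practising at this level


print(next_average(2, 0.9))  # 3
print(next_average(4, 0.3))  # 3
```

The last entry in `levels` acts as the ceiling. For a keen border collie you might extend the list; for a dog who hits diminishing returns early, the rule keeps easing back to a level he finds worth his while.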
Failing to apply the science of learning theory properly can lead more traditional trainers to call us “treat dispensers”. Proper application takes a good understanding of the science and the skill to carry it out, which usually comes with experience.
To wrap up, some practical advice if you have been bribing or using a fixed rate of 1:1 for too long. If you have been bribing your dog, start by not giving him the treat every time. You might keep the treat in your hand but only give it to the dog 8 times out of 10 (a rate of 10:8, ten behaviours for eight reinforcements), then reduce this number. Then try putting the treat away, but increase your rate of reward.
If you have fallen into the trap of rewarding your dog every time, do likewise. Offer rewards 8 out of 10 times (10:8), then 7 out of 10 (10:7), then 6 out of 10 (10:6, or 5:3). When you get to about five out of ten (10:5, or 2:1), start gradually introducing the variable rate. This might take a few weeks, but if we reduce the reinforcement rate slowly enough, we should be able to rectify our mistakes. Push your dog enough to make progress, but not so hard that she stops working. If she is finding it too difficult, go back a step, or half a step, and try again.
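The fading steps in that last paragraph are easy to lay out in a few lines of Python. This is a sketch under my own assumptions, and both function names are hypothetical: `fading_plan` lists each step of ten behaviours together with its reduced ratio, and `pick_rewarded` chooses at random which of the ten requests in a block earn a treat, so the dog can’t predict which ones pay out.

```python
import random
from math import gcd


def fading_plan(start=8, stop=5, block=10):
    """Steps from 10:8 down to 10:5, each with its reduced ratio.

    Returns (behaviours, rewards, reduced_behaviours, reduced_rewards).
    """
    plan = []
    for rewards in range(start, stop - 1, -1):
        d = gcd(block, rewards)  # e.g. 10:6 reduces to 5:3
        plan.append((block, rewards, block // d, rewards // d))
    return plan


def pick_rewarded(block, rewards, rng=random):
    """Pick which behaviours (counting from 1) in one block get a treat."""
    return sorted(rng.sample(range(1, block + 1), rewards))


for block, rewards, b, r in fading_plan():
    print(f"{block}:{rewards} (= {b}:{r})")
# 10:8 (= 5:4)
# 10:7 (= 10:7)
# 10:6 (= 5:3)
# 10:5 (= 2:1)

print(pick_rewarded(10, 7))  # e.g. seven of the ten sits, varies each run
```

Picking the rewarded sits at random within each block, rather than always skipping the same ones, already gives the dog a taste of the variable rate before you switch over to it fully.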