Are you bribing or rewarding your dog?


In a couple of my recent posts, I’ve tried to counter some of the arguments that well-intentioned but misinformed dog people use against some types of training methods. Here I’ll discuss some of the arguments that more traditional, or force-based, trainers use to illustrate why we shouldn’t use reward-based training exclusively when training our dogs. To reiterate, I am a 100% force-free trainer and behaviour consultant.

The first thing to discuss is the science: some aspects of learning theory can cause inexperienced reward-based trainers to come unstuck. First, a few terms and their explanations.

Fixed rate of reinforcement – this is when a dog receives a fixed number of rewards for a fixed number of behaviours. An example would be 1:1, where one behaviour gets one reward. A vending machine pays out on a fixed rate: money goes in, can of Coke comes out. Every time. Other examples could be 2:1 or 3:1, where the animal is reinforced after every two or every three behaviours respectively.

Variable rate of reinforcement – this is when the reinforcement occurs unpredictably, with the rate set as an average. So, you could have a variable rate of 3, which means the animal might be reinforced after 1 behaviour, then after 5 more, then 2 more, then 4 more: 12 behaviours for 4 rewards, or one reward per 3 behaviours on average. A slot machine/fruit machine pays out on a variable rate of reinforcement. We keep playing for the chance of a payout.
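For readers who like to see the numbers, here is a rough Python sketch of the two schedules. It is only an illustration: the function names are my own, and drawing the gaps evenly between 1 and twice the average minus 1 is an assumption, one simple way of many to hit a chosen average.

```python
import random


def fixed_ratio(n_behaviours, ratio):
    """Reward pattern for a fixed rate: True on every `ratio`-th behaviour."""
    return [i % ratio == 0 for i in range(1, n_behaviours + 1)]


def variable_ratio(gaps):
    """Reward pattern built from the number of behaviours between rewards.

    A gap of 1 means the very next behaviour is rewarded; a gap of 5 means
    four unrewarded behaviours and then a rewarded one.
    """
    pattern = []
    for gap in gaps:
        pattern.extend([False] * (gap - 1) + [True])
    return pattern


def random_gaps(average, count, rng):
    """Draw `count` gaps evenly from 1 to 2*average - 1 (assumed scheme).

    The long-run mean of these gaps is `average`.
    """
    return [rng.randint(1, 2 * average - 1) for _ in range(count)]


# The variable rate of 3 from the text: rewards after 1, 5, 2 and 4 behaviours.
gaps = [1, 5, 2, 4]
pattern = variable_ratio(gaps)
average = sum(gaps) / len(gaps)

print("Fixed 2:1   :", fixed_ratio(6, 2))
print("Variable    :", pattern)
print("Average gap :", average)
print("Random gaps :", random_gaps(3, 10, random.Random()))
```

The fixed pattern is completely predictable, while the variable one rewards the same share of behaviours over time without the dog being able to tell which behaviour will pay out.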

So now that the science is out of the way, here is how we can apply (or misapply!) it.

When teaching a new behaviour, we want to reward our dog every time he does the behaviour. So when teaching a dog to sit, every time he sits, he is rewarded. We do this to teach him that it is worth his while to do it. When your dog reliably sits on request, it is time to move to a variable rate. If we don’t make this move quickly enough, the dog will no longer work unless he is being reinforced every time. To use the vending machine example, if the Coke machine swallows your money, you may put in another coin. If it eats that too, no one puts in more money in the hope that it will pay out this time. You assume the machine is broken and chalk it up to a bad experience. So, back to dog training. If we use a fixed rate of 1:1 for too long, when you try to stop doing it, your dog thinks you are “broken” and stops working.

Now, to add insult to injury, this isn’t as bad as it gets. Say we are using food to train. If we don’t get the food out of our hands quickly enough, the dog then doesn’t work unless he sees the food. At this point we are bribing the dog, not rewarding him. The sequence goes like this:

Bribing: food in hand, dog sees food, dog does behaviour requested, dog gets food (dog only does behaviour if he sees food)

Reward: food is hidden (e.g. in pocket), dog does behaviour requested, dog is rewarded with food for good work (dog is willingly working for the chance of reward)

To make the dog willingly work for the chance of reward, we need to put him on a variable rate of reinforcement as soon as we can. To do this, we begin with low averages, which means we reward the dog, on average, every second or third behaviour. When your dog is showing progress at this level, we can then increase the average to every fourth or fifth behaviour and so on. How high you set the bar depends on you, what you want to achieve and what your dog is capable of. Some dogs are willing to keep working and working (border collies are a good example, although this isn’t cast in stone). Other dogs reach a point of diminishing returns where they decide that the level of work they are offering isn’t worth the payout (my own mastiff breeds, for example). Each dog is different, as is the skill level of each owner/trainer.

Not applying the science of learning theory properly can lead to more traditional trainers calling us “treat dispensers”. Proper application takes a good understanding of the science and the capability to apply it, which usually comes with experience.

To wrap up, a bit of practical advice if you have been bribing or using a fixed rate of 1:1 for too long. If you have been bribing your dog, start by not giving the dog the treat every time. So you might have your treat in your hand and only give the dog the treat 8 times out of 10 (this is a rate of 10:8, ten behaviours for eight reinforcements), and then gradually reduce this number. Then try putting the treat away, but increase your rate of reward when you do.

If you have fallen into the trap of rewarding your dog every time, do likewise. Offer rewards 8 out of 10 times (10:8), then 7 out of 10 (10:7), then 6 out of 10 (10:6 or 5:3). When you get to about five out of ten (10:5 or 2:1), start to gradually introduce the random rate. This might take a few weeks, but if we reduce the reinforcement rate slowly enough, we should be able to rectify our mistakes. Push your dog enough that you make progress, but not so hard that she stops working. If she is finding it too difficult, go back a step, or half a step, and try again.
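The steps above can be written out as a small Python sketch that lists each stage and reduces the ratio to its simplest form. The function names and the default start and end points (8 and 5 rewards out of 10) are my own choices, taken from the example figures; adjust them to suit your dog.

```python
from math import gcd


def simplify(behaviours, rewards):
    """Reduce a behaviours:rewards ratio to lowest terms, e.g. 10:6 -> 5:3."""
    divisor = gcd(behaviours, rewards)
    return behaviours // divisor, rewards // divisor


def thinning_plan(start=8, stop=5, out_of=10):
    """List each stage as (rewards per `out_of` behaviours, simplified ratio)."""
    return [(rewards, simplify(out_of, rewards))
            for rewards in range(start, stop - 1, -1)]


plan = thinning_plan()
for rewards, (behaviours, per) in plan:
    print(f"{rewards} out of 10  ->  {behaviours}:{per}")
```

If your dog struggles at any stage, move back up the list by one line before trying again.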


5 thoughts on “Are you bribing or rewarding your dog?”

  1. Interesting article. There are ways to get the animal on a variable RoR without having to ‘not’ reinforce every attempt, and it works through the shaping process. The difference I find with this is that dog trainers talk about finished behaviours, whereas horse trainers are always shaping behaviours. We never seem to class our behaviours as finished; we always see room for improvement.

    Another way you can avoid the stress that the vending machine (or gambling) effect can bring to some animals is to use smart reinforcers: use other behaviours with a strong history of reinforcement to reinforce the behaviour you are currently working on. This takes advantage of the fact that the click works in 2 directions (to reinforce the behaviour that just occurred and to cue the next behaviour).

    I’ve also found that it’s not so much about how fast I get the food to the animals; it’s about how quickly I get into food delivery mode. If I’ve always been fast about getting the food to the animal, then that is what they will expect. If I have always been prompt at getting into food delivery mode but not rushed to get the food to them, then I have bought myself more time and can avoid making mistakes. I find the horses have forced this concept on me, as often I see a horse do something that I want to reinforce when I am at the opposite end of the yard to that horse. So that means the horse has to be recognising my body language of “I’m in food delivery mode” when I start to shift my weight, walk and move my hand to the treat pouch. If they were working off the time it takes for me to get the food to them, then I would be in big trouble and it would significantly limit the ‘ad hoc’ training that this other approach allows me.

    That then brings up the subject of the power of anticipation and how to use that to our advantage in training. I think it was Kay Laurence who wrote about that.

  2. Your blog, website and YouTube videos are fantastic. Thanks for sharing and helping pet dog owners like myself as your work with reactive dogs has really helped me with my terrier cross rescue. We need people like you on TV to help educate people, not Cesar Millan.
