The secret world of data

Professor Susan McVie
Jan Savinc
Harriet Baird
Dr Areti Manataki

Data is all around us. How our data is used, by whom and for whose benefit is a major topic of discussion. So, what is really being done with it? Who can use it and why? And how can it help people?

This event will unpack how public sector data is being used securely for research and explore how linking different types of data can provide insights that benefit society.

From end-of-life care to Covid fines, we will discuss how researchers use data to understand the world better and improve public services, and how data is protected and people’s identities are kept safe.

Curious 2023


This event is part of Curious 2023.

Get under the surface with Scotland’s leading experts! The Royal Society of Edinburgh’s summer event series, Curious, is back from 04-17 September.

Delve deep during thought-provoking discussions, explore cutting-edge research and ignite your curiosity through a range of engaging talks, workshops, tours, and exhibitions. Join in this celebration of extraordinary people discussing big ideas!

To get involved or see more Curious events visit


Please note transcripts are automatically generated, so may feature errors.


Right, okay, I suggest we make a start. Hello, everyone, and welcome to this session, The Secret World of Data. My name is Dr Areti Manataki, and I’m delighted to welcome you to this event. First, a brief introduction to the Royal Society of Edinburgh. The RSE is Scotland’s National Academy, and as such, it connects and engages nationally and internationally to share knowledge and tackle the most pressing challenges of the modern world. You can see here “knowledge made useful”, so it’s all about that. Some housekeeping first, just to advise you about the RSE’s fire safety and evacuation procedures. In the case of a fire, please assemble outside the Dome, and you can see more details here on this slide. Also, just to let you know that we’re going to be live streaming this session on YouTube. A brief outline of the event now. The title of the session is The Secret World of Data. The main questions to be answered in this session are: how is our data being used, by whom, and for whose benefit? We will unpack how public sector data is being used securely for research, and explore how linking different types of data can provide insights that benefit society. This event is part of the Curious programme, which is running in September from the 4th to the 17th, with a range of events offering insight from some of the world’s leading experts on a range of topics addressing the theme of “under the surface”. Now I’d like to introduce our speakers. Today we have Professor Susan McVie, Professor of Quantitative Criminology and co-director of SCADR at the University of Edinburgh. SCADR is the Scottish Centre for Administrative Data Research. Along with her we have Jan Savinc, research fellow in the same centre, at Edinburgh Napier University. And our last speaker is Harriet Baird, the engagement lead of SCADR at the University of Edinburgh. I hope you can join me in welcoming them.


Hello, everyone, thanks so much for joining us today. I know it’s really sunny outside, but I’m glad you’ve all got some fans, and hopefully it will stay pretty cool in here. Thanks to all those who are joining online as well. Perhaps you have an iced drink with you, or the like. We’re here today to talk all about data. It’s something that’s ever present in our lives and a hot topic of debate. But do we know what kinds of social research are being carried out using data? Have you ever wondered what researchers do? And are you curious about how they actually access the data? We are from SCADR, and we’re made up of universities from across Scotland, so not just in Edinburgh, but in Glasgow and St Andrews. We’re all social scientists and, dare I say, data geeks. We’re funded by the Economic and Social Research Council to do this kind of research, and we’re part of a UK-wide initiative called ADR UK, which is four nations wide. The aim of that partnership is to do data research for the public good. So today we hope to share more about what we do with this particular type of data that we call administrative data. We hope that you’ll learn a lot from this session. There are some interactive elements, dare I say minor acrobatics, and also a quiz to start off with. I don’t know whether you’ve seen the QR code, either on your seat or on the screen. If you’re online, it could be in the chat. People might have logged in already, but you can only get so far without me having to move things on. So we want you to do this quiz, and Susan is going to talk through it with the results, because we just want to get a sense of whether you know what on earth we’re talking about, and what we’re starting with. Hopefully you should be able to log in, and you should be able to answer the first question. So let me know how that’s going. Yep, great. I can see lots of people are starting already.
So I’ll pass over to Susan, who’s going to run through the quiz. Thanks.


Thank you very much. You can either stand at one side of the stage or the other, but not in the middle, because you get blinded by the Super Trouper at the back. So apologies for not standing in the centre of the stage. And yes, as Harriet said, we’re going to start by talking a little bit about what administrative data is. We wanted to get a sense from yourselves, both those of you in the room, welcome, and those of you that are beaming in from somewhere else, about what you think administrative data are, what they’re used for, and how we keep them safe. So the first question was, what is administrative data? And you’re very clever people: the vast majority of you have already got this right. It’s all of the things that we’ve listed there. Administrative data is data that’s held in administrative systems. It’s typically collected by either private or public sector organisations, and it’s used by those organisations in order to conduct their business, whatever that business might be. So hospitals, schools, the police and banks all collect some kind of administrative data about the people that they have contact with, and they use that data to conduct their business. Now, obviously, sometimes that’s population data. For example, the National Health Service will pretty much have information about the whole population, so that’s really big data, and sometimes administrative data is called big data. Other times, it might be information that’s gathered just about a subset of the population. Schools, for example, only have information about pupils, maybe a bit about parents as well, and prisons will only collect information about the prison population. So we sometimes work with very, very large population-based datasets, and other times they’re smaller ones. We’re going to move on to the next question now, and this is: why is administrative data linked together for research?
So we’ve given you four options there. If you want to just take a few seconds to answer this question, you should be able to see on your screens what the results are as they pop up. I’m watching them on the screen. Unfortunately, we couldn’t get them up on the screen so that we could watch it live, but you should be able to see things coming through on your screen. So I can see that everyone so far agrees; as we go through, we’ve got a couple of outliers. This is great. It’s a bit like a poll; I feel a bit like John Curtice. So most people are saying that linked data is used for understanding more about how to improve public services, and you’d be absolutely right. Now, administrative data is not collected for research purposes. As I said, it’s collected for administrative purposes. But it is good data that can be used for research, although there are very, very tough restrictions on how we can use it and what we can use it for. So yes, we use it to understand more about how to improve public services, but we can’t use it for these other things here. We can’t just use it to indiscriminately find out about people and look at their private lives, however much we might want to do that. And we can’t get data just to help people with their problems. It is possible that organisations may share data between themselves to do that kind of thing. But data that’s linked together for research purposes cannot be used to identify anyone, and therefore it can’t be used to directly benefit those individuals in terms of problems they might have in their life. In fact, all of the data that we get has been de-identified. That means that the names and addresses of people have been taken off. Now, we might get some bits of personal information. So we might get date of birth.
For example, we might get somebody’s postcode, but we would only get that kind of information if we absolutely needed it to answer research questions. Otherwise, even that data would be truncated in some way. So instead of getting postcode, we would just get local authority or data zone or some other geographical indicator, and instead of getting date of birth, we would only get the year that someone was born, for example. And also, at the end there, where it says “to ensure different organisations know what information they hold about people”: that doesn’t happen either. Data that’s shared for research does not mean that the organisations that share data get to see anything that any other organisation holds. They don’t. Okay, so we’ll move on to the next question. What’s the benefit to the public of doing research using administrative data? So if you just quickly have a go at question two, we’ve given you four options there again.


Okay, so a couple of people are thinking that it’s just for helping to improve services and better understand social problems, but most people are choosing all of the above, and all of the above is correct. Again, these are just three of the uses of these sorts of data. It is used to help save lives: there are people that are using administrative data, for example, to identify the more effective treatments for cancer. It does help to improve public services: we do a lot of work with policymakers and practitioners to identify questions that are of value to them in shaping the way they deliver and develop services. And we also use it to understand social problems: a lot of research is done around things like inequalities, for example. So it feeds into government policy development, as well as the development of practice within a lot of organisations. So we’ll move on again, Harriet. The next question is, what do you think is the most important way to make sure people’s personal data is protected? We’ve given you three options: legal safeguards, physical safeguards and ethical safeguards. There’s no right or wrong answer to this question. This is just what you think is most important. We wanted to get a sense of where people in the room were around what kind of protocols they might expect to be in place for these kinds of data. So we’re getting quite a few results coming through for all three. At the moment, legal safeguards is in the lead, closely followed by ethical. So I now feel like a kind of racing jockey. But actually, all of you are picking one or other of these three, and actually, you need them all. There is no one that is more important than the others. We need all three of these types of safeguards in order to make sure that everybody’s data stays safe.
It’s no good just having legal things if you don’t also have ethical protocols, and physical ways in which we keep data safe as well. So we need all three. And during the course of the presentation today, we’re going to talk about the ways in which we keep data safe through all three of these routes. I think this is the last question, or do we have two more? Two more? Okay, so this is just for us to get a sense of the way in which you think your data is being used. How confident are you that your data is being used safely for research? Okay, so that’s very interesting, with answers coming in. It’s great the way it jumps up and down. So most people are going for the middle options, the kind of “I’m a bit confident”, “I’m a bit not confident”, with not so many people going for the two extremes. If we had had a fifth option in the middle, we would probably have had a lovely normal distribution curve about now. But we’re getting slightly more people going for fairly confident than not very confident. There is, understandably, a little bit of caution, and maybe a little bit of concern, about the ways in which your data might be being used by people like me. So another of the purposes of today’s event is to try and reassure you that we do lots of things to make sure that your data is protected while we’re doing research with it, and that the level of risk for any project that we undertake is very, very, very low. Okay, final question, I think, Harriet. How much do you agree that research using administrative data is of benefit to the public? So again, if you just take a few seconds to answer this one. Oh, that’s very interesting. So the majority of people at least agree, either strongly or a bit, and we haven’t had anyone so far that disagrees with that. So that’s good.
Or maybe that’s because of the nature of this particular audience. Although, just when I say nobody disagrees, somebody decides to disagree. So that’s the beauty of live performance. Definitely not going to be applying to be in the Fringe. So most people are agreeing that there is a public benefit. And again, another one of the purposes of today’s event is to talk to you about some of the research that we do, and the fact that we’re not allowed to do any research unless we’ve already proved to at least three different sets of people that it’s for public benefit. Okay, so I think we’re going to move on now, and we’re actually going to take you on a bit of a journey. The journey is Harriet’s journey to becoming a safe researcher and being able to do administrative data research. And, apologies for the terrible metaphor, but she’s going to have to jump through a number of hoops in order to do this, although I don’t think any actual jumping is involved. So apologies for that. So she’s going to have to jump through five hoops, and Jan is going to start us off by telling us about the five hoops, or the Five Safes.


Right. So yeah, thank you, Susan. And good evening, everyone. I’ll briefly introduce the Five Safes, which is how we group the different safeguards that we use when we work with administrative data into different concepts. So starting off with the very first one, safe people. Everybody, the public included, wants their data to be handled safely. And for that, we need researchers who we can trust, who we know are competent and can use the data safely. So Harriet is embarking on this journey. She has a research project, and she wants to use administrative data for it. The very first thing she needs to do to become a trusted researcher is to complete particular sets of training to become a safe researcher. This training will basically cover the things that we’re talking about today, but in more detail. It will talk about principles of data minimisation and about how to use anonymous data safely. And it will also cover things like legislation; you’ve probably heard of things like GDPR, or the Data Protection Act. So Harriet will need to learn about these things in detail. One thing that is also included as part of the training is the penalties if there is a data breach. These would impact on Harriet personally. So if she were involved in a data breach, she might be held personally liable. Depending on the nature of the data breach, of course, she might lose her job or not be able to use research data anymore, or there might be criminal proceedings as well. But that’s only for the really severe cases, which I don’t think have happened in the history of research yet. The other thing that might happen is that organisations might lose access to data, and there is reputational damage. And with GDPR, there are also quite hefty fines. So it’s in everybody’s interest to really stick to these safeguards and to use the data very safely. So once Harriet completes the training, she has to complete a test.
And if she passes the test, then she gets her first hoop, so she becomes a safe researcher. Thank you, yeah.


Okay, the second hoop that Harriet will have to jump through is to develop a safe project. Now, a safe project means that the research that she develops has to go through a number of different stages at which it’s approved. It can take quite a long time to develop a research proposal for administrative data, because we consult with quite a large number of people when we’re developing the research questions and thinking about what we want to ask. We very often work with policymakers or with practitioners, to talk to them about what questions would be of most benefit to them and to the wider public. Once we’ve got our draft research proposal and our questions, the first thing that Harriet will need to do is to get it through an ethical approval panel. Usually, the ethical approvals are done by universities, so the lead researcher usually submits it to their university. The ethics panel are looking to make sure that Harriet has understood GDPR and has conformed to all the GDPR requirements about the data she wants, that she’s understood the Data Protection Act and conformed to that too, and that she’s thought carefully about what data she’s going to need to analyse and has only asked for the data that she needs. So that’s the data minimisation principle mentioned earlier. The ethical review will also be looking to make sure that she understands how to keep the data safe, and how to make sure that anything she produces from the research isn’t going to identify anyone, or isn’t going to be used in a way that might stigmatise communities. That’s something that we get a lot more questions about these days. You know, if this is something that’s challenging, if it’s about drug use, for example, are we going to make sure we only release results that don’t stigmatise specific communities or specific groups of people?
So ethics is first. She then will need to speak to our very lovely public panel. SCADR has a group of ordinary members of the public, but people that have an interest in data. The public panel will review all of the research projects that we devise, and they’ll look at each one through the lens of: is this what a member of the public would expect to be done with their data, and would members of the public have any objections to the data being used for this particular purpose? The public panel are really great. They’re very challenging, and they ask us a lot of really difficult questions. That will allow Harriet, on her data journey, to improve her research proposal. The final stage she has to go through is to get approval from what’s called the Public Benefit and Privacy Panel, or PBPP, as we call them. There are two public benefit and privacy panels in Scotland. There’s one that looks after health data and is led by a group called the Caldicott Guardians, so they’re the group of data controllers and people that look after health data. And there’s another one for data like census data, data that’s held by the Scottish Government. Depending on the nature of her project, she might have to get approval from both of those different groups. They’re also pretty tough in terms of how they review the proposals, and they will make sure that she’s gone through all the ethical protocols, that she’s considered all the legal safeguards, and that she is a safe researcher before she gets that. And once she’s done that, she can jump through her third hoop. Oh, second hoop. You’ve still only got two hoops; three to go, people. Yeah.


Right. So coming to the third hoop, which is safe data. We already mentioned the principles of de-identifying data and data minimisation, so I’ll talk you through more of what that means, because these are basically the most essential things that we do when we try to make data safe. When we work with data, we want to de-identify it, which means removing any personal identifiers, things like names, addresses, dates of birth, things that could possibly be used to identify individuals, from the datasets that we’re using. And I’ll show you an example of how it works. So starting from just a kind of example of a dataset that Harriet might want to work with, you can already tell that there’s quite a lot of personally identifiable information in this dataset. We’ve got names, we’ve got dates of birth, we’ve got addresses, and lots of additional details. So the very first step that we will do in this example: you see there’s a name. We will remove the name, and we will replace that with a randomly generated identifier. This identifies the individual within the project, but would be different for each individual research project, so that you can’t tell the same person across different projects. The next piece of information here is the date of birth. Harriet might request that from the data provider, and the data provider might come back and say, actually, we think that’s quite a lot of data to request; would you be okay with just a year of birth? Because for most projects, we’re really only interested in somebody’s age at the time of the study, as opposed to their full date of birth. So Harriet might say, okay, yeah, that’s fine, that’s enough for my project. So we would replace that with birth year. The next item here is an address. You’d normally never need an address for a study. You might be interested in things like area-based social deprivation, or whether people live in an urban or rural area, for example.
So in this case, the data provider might say, would the city somebody lives in be enough? And Harriet might say, yeah, that’s okay for this project. The next item here is an income, which is given as a precise figure. The data provider in that case might say, we don’t want to share precise figures for this project, but we could categorise people and put them into income brackets, for example; is this acceptable? Harriet would say yes. So we only get income brackets instead of a precise figure. And then finally, I’ve also included marital status. Generally, this isn’t a problematic item, but it’s very context dependent. You might have a study, for example, that looks at a younger population. If you’re looking at teenagers, for example, very few of those will be married, so in that case, that could be potentially identifiable as well. So the data provider might not want you to use that piece of information either, except, of course, if you’re studying marriage and teenagers, for example. All of this is done before Harriet actually sees any of the data. It would be done by a data provider, or by a trusted third party who processes the data on their behalf, so that once Harriet gets the data, it’s already anonymised, and she’s already negotiated the kind of minimal dataset based on what she needs for her study. Like Susan said, you can’t just say, I want all of the data and I’ll figure out what I need as I go along. You need to specify exactly what you need. And it’s a kind of back and forth through the application process of specifying what it is that you need, and then how much the data provider is willing to share, so that it’s still for the public benefit. So once the data has been made safe, Harriet can have her third hoop. She’s using safe data, and we’ll move on to the fourth: safe settings.
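The de-identification and minimisation steps described in this part of the talk can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure a data provider actually runs: the field names, the income bands, the example person and the `deidentify` function are all hypothetical, and the sketch assumes one record per person.

```python
import secrets
from datetime import date

# Hypothetical income bands; real bands would be agreed with the data provider.
INCOME_BANDS = [(0, 20_000), (20_000, 40_000), (40_000, 60_000)]


def income_bracket(income):
    """Map a precise income to a coarse band label."""
    for low, high in INCOME_BANDS:
        if low <= income < high:
            return f"{low}-{high - 1}"
    return f"{INCOME_BANDS[-1][1]}+"


def deidentify(records, keep_marital_status=False):
    """Return minimised copies of the records (assumes one record per person).

    Name and address are dropped, date of birth is reduced to a year,
    income is reduced to a band, and marital status is only kept when
    the project has justified needing it.
    """
    safe_rows = []
    for rec in records:
        row = {
            # Fresh random ID generated for this project only, so the same
            # person cannot be matched across different projects.
            "person_id": secrets.token_hex(8),
            "birth_year": rec["date_of_birth"].year,
            "city": rec["city"],
            "income_bracket": income_bracket(rec["income"]),
        }
        if keep_marital_status:
            row["marital_status"] = rec["marital_status"]
        safe_rows.append(row)
    return safe_rows


# Entirely made-up example record.
example = [{
    "name": "A. Person",
    "date_of_birth": date(1980, 5, 17),
    "address": "1 Example Street",
    "city": "Edinburgh",
    "income": 25_000,
    "marital_status": "single",
}]
print(deidentify(example))
```

In this sketch the function would be run by the data provider or a trusted third party, so the researcher only ever receives the rows it returns.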


So this is the safe settings hoop. We’ve dealt with a lot of the legal requirements and the ethical requirements, and the people that are working with the data before she gets it have dealt with a lot of the issues around making sure the data is de-identified. But once Harriet gets her data, she needs to make sure she uses it in a safe setting. Now, she doesn’t just get sent an Excel spreadsheet through some sort of unencrypted email service. The data that Harriet will get will be held in what’s called Scotland’s National Safe Haven. So this is an example; this could be Harriet’s project. She’s asked for a variety of different pieces of information from some datasets that are held by different organisations. We’ve got five here and, as you can see, the arrows all go into the safe haven from those organisations, so nobody gets to see someone else’s data, as I said earlier. All the data that’s provided into the safe haven is then linked together. And as I said, it’s de-identified, so that Harriet, or any of the other researchers that are working on the team, don’t ever get to know who these data belong to. The team that do this work (I have to read this out, because I can’t remember) is the Electronic Data Research and Innovation Service at Public Health Scotland, or EDRIS. The EDRIS team work with the data, and every project will have a coordinator. That coordinator will make sure that the data that’s provided to the researchers is completely safe and has had all the appropriate de-identification.
As you can see there, there’s the project space. Harriet and anyone that’s named on her approved research proposal can access the data through that project space, but it’s got a blue box on the outside of it, because Harriet, in order to get into that space, will need to go through at least four different forms of password protection or mobile phone authentication. Only the people that are named on the proposal can get access to that information. So it can actually take quite a long time, and if you don’t keep working on it, you get timed out within about 10 or 15 minutes. So sometimes you’re continually having to remember to put in very, very long passwords. Now, the project space is the place where the research data is held, and where Harriet can access it. But she also has to think about the physical security of where she is. We do have a number of SafePods around the UK, based in some accredited universities. These are kind of top secure environments where you can’t take anything in with you except a piece of paper and a pencil, and it’s all designed in a way that allows you to do completely safe data access. There aren’t very many of those across the country, and not enough for all the researchers that need them. So researchers can access the data from their offices within the university, but there are a lot of strict protocols around what they can do. If Harriet wants to access the data from the office at her work, she will have to make sure that no one else can see her screen. That might mean that she’s in a room on her own, or we do have some kind of screen guards that we can put on screens so that people can’t see them. She needs to make sure nobody’s standing behind her looking at the screen. At any point, if she steps away from the computer, she has to lock the screen.
If she steps out of the office, she has to lock the office. She can’t download or upload or copy or paste or anything within that environment. She can’t take screenshots of anything. Well, she can take a few notes, but she’s not allowed to copy anything from the screen itself, as that would be considered disclosive. All of that has to be done for her by the EDRIS coordinator. When she’s in the project space, the EDRIS coordinator can monitor what she’s doing at all times as well, so there’s always a kind of audit trail of the work that’s been done. So once Harriet has done all of that, she’s working in a safe setting, and she’s jumped through her fourth hoop.
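As a rough illustration of the linking step mentioned in this part of the talk, the sketch below joins rows from several organisations on a shared key and then replaces that key with a project-specific random identifier before anything reaches the researcher. It is not a description of how the safe haven team actually performs linkage: the `link_datasets` function, the `person_key` field and the example datasets are all hypothetical.

```python
import secrets


def link_datasets(datasets, id_field="person_key"):
    """Link per-organisation rows on a shared key, then drop that key.

    `datasets` maps an organisation name to a list of row dictionaries.
    The returned rows carry only a random project-specific `person_id`,
    never the original linking key.
    """
    pseudonyms = {}  # original key -> random ID (never given to the researcher)
    linked = {}      # random ID -> merged row
    for org, rows in datasets.items():
        for row in rows:
            key = row[id_field]
            if key not in pseudonyms:
                pseudonyms[key] = secrets.token_hex(8)
            pid = pseudonyms[key]
            merged = linked.setdefault(pid, {"person_id": pid})
            for field, value in row.items():
                if field != id_field:
                    # Prefix each field with its source organisation.
                    merged[f"{org}.{field}"] = value
    return list(linked.values())


# Entirely made-up example data from two hypothetical organisations.
example = {
    "health": [{"person_key": "k1", "admissions": 2}],
    "education": [{"person_key": "k1", "attendance": 0.9}],
}
print(link_datasets(example))
```

The design point the sketch tries to capture is the one made in the talk: each organisation only sends its own data in, and the researcher only sees the linked, de-identified result.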


Our fifth and final hoop is safe outputs. What we mean by outputs is basically anything that’s derived from the data that Harriet’s seen in the safe setting. So she might produce graphs, tables, maps, for example, the sort of thing that you will then use in reports or in scientific papers, or when presenting at conferences or giving public talks and so on. Before any of the summaries of this research, any of these outputs, can be taken out, they have to be checked to make sure that they’re not disclosive, through a procedure called statistical disclosure control. So once Harriet’s produced a graph, for example, she submits it to her research coordinator to be checked. It goes through the procedure, and I’ll show you an example of what that might look like. Essentially, through this process, what we’re trying to do is prevent individuals from being re-identified from the data. Even though we’re working with data that’s been made anonymous, in one of the previous slides, it’s still sometimes possible to identify individuals. This is especially the case in small geographic areas, for example, and I’ll show you how that might work. So we’ve created a completely made-up example of a piece of research that Harriet might be working on. In this case, Harriet is interested in people who were bitten by horses in 2022, and she’s looking at three different geographic areas. This example has been contrived so that there are two large areas, Glasgow and Aberdeen, and then a very small one, Kirkwall in Orkney. Harriet’s found that in 2022, this number of people in the different areas were bitten by horses. There are quite a few people with horse bites in Glasgow and Aberdeen, but only two in Kirkwall in Orkney.
And for example, if you were from there, you might actually recognise this and go, hang on, I think I know one of these two people. Or there might have been newspaper reports where that person has been named, which means that through using different pieces of information like that, you might be able to identify who the study is about. And this gets really problematic, especially once you add more information about these people. So we’ll add another piece of information to this, which is that we’re looking at some kind of potentially embarrassing medical condition that these people don’t necessarily want to be shared with the public. So now what you can see is that 12 of those people in Glasgow and 17 in Aberdeen also have this embarrassing condition, and one of those two people in Kirkwall has this condition. Now, if you were able to figure out who one of these two people in Kirkwall is, you now also know something about their medical history. And as researchers, we absolutely want to prevent that. You shouldn’t be able to identify individuals or their medical history from our research. So what we’ve come up with is a kind of arbitrary rule: whenever you publish anything like that, your research shouldn’t describe fewer than some number of people. In some cases, that would be five people; in some cases, it would be 10. So when Harriet produces this table, she submits it to her research coordinator, and it gets checked for disclosiveness. The research coordinator might say, actually, this is dealing with a very small number of people, so there is a risk of identification; can we suppress these figures instead? So this is what it might look like. Now, instead of reporting the actual figures, we just say there are fewer than 10 people in this category.
Now, if you’re reading this, even though you might know somebody who’s included in this, because you don’t know how many people there are, you can’t really tell whether this other category applies to them as well. So it’s a kind of technique that we use to prevent small groups of people from being identified. Now, one thing I should point out here is that we normally deal with population-scale data, so we look at thousands of people at the same time. So this is a relatively rare occurrence, where you’d have very few people afflicted by a particular condition or being described in this way. But this is still a procedure that you have to go through in order to ensure that your outputs are safe. So once Harriet has made safe outputs, she then receives her fifth and final hoop. She’s created safe outputs. She can publish this, she can publish papers, she can talk about it in public, and she’s completed the entire procedure. Which brings us to the question section.
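The small-number rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual statistical disclosure control procedure used in the safe haven: the function name, the table layout and the total bite counts for Glasgow and Aberdeen are invented, and the threshold of 10 is one of the two values mentioned in the talk.

```python
# Illustrative sketch only: made-up counts in the spirit of the horse-bite
# example. The totals for Glasgow and Aberdeen are invented; the talk only
# says "quite a few".
THRESHOLD = 10  # assumed minimum cell size (the talk mentions 5 or 10)


def suppress_small_counts(table, threshold=THRESHOLD):
    """Return a copy of the table where counts below the threshold are masked."""
    suppressed = {}
    for area, counts in table.items():
        suppressed[area] = {
            column: (value if value >= threshold else f"<{threshold}")
            for column, value in counts.items()
        }
    return suppressed


horse_bites = {
    "Glasgow": {"bitten": 120, "with_condition": 12},
    "Aberdeen": {"bitten": 95, "with_condition": 17},
    "Kirkwall": {"bitten": 2, "with_condition": 1},
}

safe_table = suppress_small_counts(horse_bites)
for area, counts in safe_table.items():
    print(area, counts)
```

In this sketch both Kirkwall cells are replaced by a "fewer than 10" label, while the larger areas keep their figures. A real check would probably also need to consider whether a masked cell could be worked out from row or column totals, which this sketch does not attempt.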


Thank you all very much for the very interesting presentation. I’d like to invite people for questions, just a five-minute Q&A session to explore these topics further. So I’ll start with a question about processes. Suppose I’m a researcher who has a really cool idea for a project that needs public sector data. How long does it take for me, from the day I decide to go for it, when I have, I think, a pretty good idea of what I want to do, to actually getting access to the data?


Well, different projects do take different lengths of time. The simpler the project, and the less data you’re asking for, the quicker it will be. The more complicated the project, and the more data you want, the longer it will take. It’s difficult to get a project done in less than six months. So you’d be looking at a minimum, I think, of six months from coming up with your research idea to being able to submit. But that’s a very conservative estimate. In reality, at the moment, because it’s relatively early days in terms of the procedures we’re using, and also because there’s been an expansion in people wanting to use these data, it can take a year, sometimes two years, to get from your idea to your data. So yeah, if you want a quick result, or you’re doing a funded piece of research that’s time limited, then it’s maybe not the research area for you. Okay, thank you.


Any more questions from our audience?


Thank you. Can you hear me okay? Yep. I was struck by what you said about safe reporting, and about how findings may invite stigma on certain populations. And I’m stuck as to how you can predict that before you actually report it. It seems kind of like a chicken-and-egg situation. There may be unexpected attitudes that emerge after reporting, and also suppression of insights that aren’t in the public domain because of an attitude of conservatism. I just wonder how you make that decision?


Yeah, that’s a really good question. So we deal with that in a number of ways. First of all, it’s very rare that we haven’t thought about those things at the point that we’re developing the project. And as academics, we’re usually building on existing research. So we might know, for example, that we’re likely to find that people in deprived areas are more likely to experience a certain type of thing, and so we would be thinking about that at the beginning. And we consult quite often with third sector organisations or groups of people with lived experience. So we would talk to them, because we need to balance up the public benefit of doing the research. Sometimes there are hard findings, findings that are difficult to deal with, but they need to come out, because we’re dealing with things like inequalities. But we have to think sensitively about how we release those findings. And we quite often work with, as I say, third sector organisations or people that represent groups with lived experience in order to do it in a sensitive way. Of course, what the media does with it when the media gets hold of it is another thing completely. I’ve been on the wrong end of media reporting many times over my career. I think you just sometimes have to take that on the chin and make sure that anything you put out as a researcher is done as sensitively and ethically as possible.


Thank you. We have time for one more question. And there will be more time for questions at the end. Yes.


I think we’ve got one here.


We’ll take two questions.


A related question to the one that was just asked. Sometimes the nature of the research is that you are going to find this out. I mean, the classic is homosexuals and AIDS in the 1980s. This does have the effect of stigmatising. On the other hand, it’s also the basis on which you channel funding for directed education programmes. So how do you weigh these up?


Oh, that’s a difficult one, hmm. I mean, as you say, because all of the research we do must have public benefit, we have to have thought through very carefully that balancing act of public benefit versus potential harm to the public. And it might be that on some occasions we don’t release all of the research findings publicly. So we might release some of the research findings publicly, but only use certain data to inform public policy or practitioners, for example. If it was considered that the finding was so potentially harmful that it would have unintended consequences, then I think we would have to make quite difficult decisions about whether that was released. The other thing we have to weigh up, though, is that this is publicly funded research that most of us are doing. So if it’s publicly funded research, should we always publish it? Before things are published, we always discuss it with other groups, as I said, people with lived experience. We’d probably take something that’s really tricky like that back to our public panel, and talk to them about how we release the results as well. Just because you can do this kind of research doesn’t always mean that you should. And that’s one of the first things we have to think about when we’re developing any research: what are the potential harms in doing this particular piece of research, and making sure that any public benefit outweighs that harm. So I hope that kind of answered your question.


Just to add to that, maybe it’s also just the framing of results, so as not to have too much negative language. And as Susan said, working with, say, children in care, care-experienced children, you’re getting a sense that they don’t want to just have headlines all the time that say what your chances are. So it’s kind of working out how you can positively frame some of those things and show that this research is being done to help improve those outcomes. It might be highlighting that the outcomes are poor, but the aim in doing so is to help government and policy to make better decisions, rather than just saying, this is what’s going to happen in your life. You know, kind of contextualise it. And so it’s also about framing findings and language around certain groups.


Just one final question, before we move to the presentation of


data, looking at the inputs, the one from schools, there’s an element of subjective data there, as well as sort of non-subjective data. So how would you account for that?


Not only in schools? Well, although in schools there’s a lot of subjective data. Well, like any research that uses data, you need to understand where that data has come from, how it’s been collected, the processes by which it’s been collected, and the purpose for which it’s been collected. I mean, when you say subjective, some of the data is just very objective data, very kind of hard numbers. We don’t get an awful lot of qualitative information in administrative datasets. And so in some studies you really need something a bit more qualitative in order to give you more context to what the research is telling you. We always have to be mindful that administrative data is not collected for research purposes, so it’s always going to have limitations in terms of what it will tell us. And I think when we’re publishing, as well as releasing the results of the research, we’ll always say that there are limitations to this study, and it couldn’t tell us this, and that we need to be cautious sometimes about what the results actually mean. And quite a number of us also do qualitative research on the side, as it were, just so that we’ve got more of a balance. But the administrative data work is what it is. We’ve just got to be very careful about how we present it.


Thank you. Now let’s move on to the rest of the presentation. Yep.


Okay, this is the final section, and I’ll go really swiftly through this. So now that we’ve talked about the process, and about the way in which we go about conducting the research and getting to the point of having data, we wanted to tell you a little bit about some of the research we’ve actually done. As Harriet said, we’re a group of researchers from across Scotland, and we all have different disciplinary backgrounds and areas of research. These are six of the main pieces of research we do, and you can find out much more on our website. We do research around things like employment patterns and inequality in access to employment. We do work around health and social care, which Jan is going to talk about in a moment. We do a lot of work around health and social care services, how they’re delivered, what the inequalities in access to them are, and what kind of services we’re going to need in the future, because we can look at population change. We do a lot of work around children’s lives and their outcomes, particularly in the policy sphere. So there’s been a big piece of policy work around The Promise, which is trying to improve outcomes for children who grew up with care experience. So we do a bit of research looking at looked-after children. And we also do work around things like inequalities in educational outcomes. I lead on the research around safer communities, so most of my research covers a range of themes relating to crime and justice and wellbeing. I do a lot of research with the police and with prisons, looking at things like underlying vulnerabilities and how those impact on calls for police service. For example, you’re probably familiar with the figure that around 80% of all calls that come in to the police are not about crime. They’re about underlying vulnerabilities.
In addition to health and social care, we also have a research strand on lifelong health and wellbeing, so that might look at things like how we can lead more active and physically healthy lifestyles. We’ve got research on the benefits of cycling to work, and the benefits of joining organisations like the Scouts and the Guides for your long-term wellbeing. And in addition to the research, we also do quite a lot of work around creating new datasets. So we have a team that works on historic data. They have digitised and produced datasets from old censuses and from births, deaths and marriage records that go back 100 years. So we’re able to do more historic research as well as contemporary work. And we create synthetic datasets. So with the work we’ve talked about today, on the data that we use in the safe haven, we can’t take that data out. We can’t give it to anyone. We can’t use it for anything other than the research. But we can create synthetic datasets which look like the real thing and behave like the real thing, but they’re fake data. So we can use those for things like training researchers or developing new methods. So that’s generally what we do, a kind of snapshot of that. As I said, there’s much more information on the website. We’re now going to talk to you about a couple of examples of research that we’ve been involved with over the last few years. As it happens, they’re both COVID-19 examples, and Jan is going to kick us off by talking about deaths at home.


Thanks, Susan. So my research is in the health and social care strand, and I’ll try to breeze through this quickly. I’m studying why there were more deaths at home during the pandemic in Scotland. The reason to study this is that, because Scotland’s population is ageing, this has resource implications. Basically, as the population ages, more resources will be required for end-of-life care, both in terms of providing formal care in hospitals or things like palliative care, and in terms of supporting unpaid carers who care for people at home. The policy implications of this are about how we can allocate these resources more efficiently. So if we know that more people die at home, then we can provide care at home, for example, and divert some of the resources from places like hospitals or care homes. So my research questions here are whether people who died at home during the pandemic were different from those who died at home before the pandemic. This is because previous research has shown that people are more likely to die at home if they are married, if they have adult children, or if they have higher incomes, so they can afford to pay for additional care, for example. And I’m also interested to see whether people used health services differently during the pandemic than before the pandemic. It might have been the case that people were attending hospital less, and were therefore more likely to die at home, for example. This is the headline finding, and also what I’m trying to explain. I’ve called this the new normal. Basically, as you can see from the graph, before the pandemic there was a gradual increase in the number of home deaths, year on year. And then during the pandemic that really jumped up by about a third, which means that the workload for services who support people dying at home increased by about a third in a single year. So that’s a big impact on services.
What I’ve also done here is fill in COVID-related deaths in red. So you can see that most of these deaths aren’t COVID-related, only about 2% were. A lot of these deaths were due to heart disease or cancer, but basically, deaths due to all causes increased. This research is ongoing, so I only have some interim findings that I can present at the minute. I’ll talk briefly through the kinds of data that I’ve used. So I used deaths data that are collected by National Records of Scotland, and I’ve linked these to NHS datasets on hospitalisations, cancer diagnoses, prescriptions, and then accident and emergency attendances, ambulance calls, NHS 24 calls and uses of out-of-hours GP services. You might be thinking, there’s no GP data here. And that’s because it’s not really very well available at a nationwide level. Only certain GP practices have shared data for use in research, and in order to capture the entire nation, you’d have to individually contact quite a lot of practices for them to share their data as well. So it’s not very easily accessible. Some of the things that I found are that people were less likely to go to hospital and to A&E during the pandemic, and they had shorter hospital stays and earlier discharges. That explains why somebody who might previously have been admitted to hospital with the same condition and died in hospital was now discharged earlier, and was therefore more likely to die at home. And also, as a consequence of using some services less, some other services were used more, like these out-of-hours GPs. Basically, when you look at individual changes in how people use services, they’re very small. But the benefit of using population data is that you can look at the entirety of the population, and there it looks like these were really big changes to services that added up to quite a lot of additional pressure on the services.
One thing I’d like to highlight here is the importance of working with organisations that are involved in providing care. So the advisory group for my project includes members from the Scottish Government and from various third sector organisations that work with carers, and they were really instrumental in informing this research and its direction. And also, one of my colleagues was able to interview some of the people who worked with various families during the pandemic, to get some of the more qualitative aspects of this research. Because with this administrative data, like Susan mentioned, sometimes you really get only the quantitative, so only the numbers, but not really things like what the quality of the care received during the pandemic was. The aim of this research is, of course, to inform policy, for example the National Care Service that is currently being developed, and how care is provided. And also, if you’re interested, during the course of this research I’ve published short updates on it on our website, so you can find these online. I’m happy to take questions on this research, but we’ll do that after the next presentation. So I’ll pass on to Susan.
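The linkage step described in this talk, joining death records to health service records for the same person, can be sketched roughly as follows. This is a hedged illustration, not the project's actual pipeline: all records are invented, `person_key` is a hypothetical stand-in for whatever anonymised linkage key the researchers receive, and the function name is made up for the example.

```python
# Illustrative sketch only: invented records, and 'person_key' is a
# hypothetical anonymised identifier, not a real field name from the study.
deaths = [
    {"person_key": "a1", "year": 2019, "place": "hospital"},
    {"person_key": "b2", "year": 2020, "place": "home"},
    {"person_key": "c3", "year": 2020, "place": "home"},
]
hospital_stays = [
    {"person_key": "a1", "nights": 12},
    {"person_key": "b2", "nights": 3},
    {"person_key": "b2", "nights": 2},
]


def link_stays_to_deaths(death_records, stay_records):
    """Attach each person's total hospital nights to their death record."""
    nights_by_person = {}
    for stay in stay_records:
        key = stay["person_key"]
        nights_by_person[key] = nights_by_person.get(key, 0) + stay["nights"]
    # People with no hospital record are kept, with zero nights.
    return [
        {**death, "total_nights": nights_by_person.get(death["person_key"], 0)}
        for death in death_records
    ]


linked = link_stays_to_deaths(deaths, hospital_stays)
for record in linked:
    print(record)
```

Keeping people who have no hospital record matters in a sketch like this, because "did not use the service" is itself part of what the research is trying to measure.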


Slides for me. Right, in four minutes I’m going to romp through my bit of research. I want to take you back to March 2020, when we were all placed under lockdown and we were told to stay at home. So we were experiencing the toughest restrictions on our freedoms and our civil liberties that had probably been experienced since the Second World War. And all four UK governments at that time agreed that we needed police powers to be introduced in order to make sure that people complied with the new public health regulations that were brought in. And so the police were given powers to enforce the regulations. For the most part, that involved giving people fines, but in extreme cases people could be charged and end up in court. So the police were actually having to tread a pretty fine line, because on the one hand you’ve got people saying, fine more people, there are all these people out there doing things they shouldn’t be doing, the police should be out fining everybody that’s out and about. On the other hand, you’ve got people saying, it’s too heavy handed, the police shouldn’t be involved in this, this is about individual responsibility. So the police were doing a pretty tough job, going out, having to be on the front line without priority for vaccination, et cetera. And for the most part, they used what was called the Four Es. So the first E was engagement: they would engage with people and find out a bit about why they might be out when maybe they weren’t expected to be. The second was that they would explain the regulations if the person wasn’t clear about what they could and couldn’t do. The third was to encourage: if the person was a little bit reluctant to follow the rules, they would encourage them strongly to do that. And if necessary, they would move to enforcement, so use of a fine.
And at the beginning of the pandemic a fine was 60 pounds, right across the UK, and it was reduced to 30 if you paid it within a certain period of time. Now a 30 pound fine doesn’t sound like a lot of money to most people, but for some people 30 pounds is quite a lot. And actually, what we found in terms of patterns of fining over the course of the pandemic is that people living in more deprived areas were more likely to be fined, a lot more likely in some cases, particularly at the beginning. So we’ve got police fines, and some of the information that was coming through about why people didn’t conform to the regulations was that there was underlying health vulnerability. So people with mental health conditions struggled to stay at home, for example, and people with alcohol addictions or a drug dependency might have struggled as well. We also heard narratives around people who had experienced violence in the home, who struggled to stay at home for long periods of time as well. So there’s a range of potential reasons why people might not have complied with the regulations. I was working with the police during the whole of the pandemic. I was part of an independent advisory board that was scrutinising police use of the powers, and in particular I was using their data. So because I had all their data, and because there were really important questions about why people might have been fined, we decided to explore the links between some of the underlying vulnerabilities and compliance with the regulations. And we did that by getting the police to share their data about fines, and then getting Public Health Scotland to link that to data about underlying health conditions. So we were looking at people who had presented requiring health services for mental health, drugs, alcohol or violence in the year preceding the pandemic, and then around the time of the pandemic as well.
One thing I would say about the fines is that a 30 pound fine, if you pay it quickly, doesn’t sound like a lot, but if you committed another offence it doubled. And then if you committed a third offence, it doubled again. And actually, over the course of the pandemic, particularly in England, some of the fines were really quite big. So the 60 pound fine at the beginning of the pandemic went up to a 200 pound fine at some point during the pandemic in England, and actually some people were being fined 10,000 pounds. So this is not a trivial matter. So we asked two specific questions. First, did underlying health vulnerabilities increase the likelihood that someone would be issued with a fine during the pandemic? And the second was, were those who were fined for failing to comply actually an increased public health risk? Were they spreading it more? Were they more likely to have the disease? Were they more likely to die? I’m going to go straight to the killer stats. So we didn’t just compare the people that got fined with everybody. We compared the people that got fined with people who looked quite similar to them in terms of other characteristics that we knew were significant in terms of someone getting a fine. So younger people were much more likely to be fined, so we matched people on age. Men were more likely to be fined, so we matched them on sex. People living in more deprived areas were more likely to be fined, so we matched them on their Scottish Index of Multiple Deprivation, and we matched them on local authority as well. And even when we’ve matched people, and compared the people that got a COVID fine to those that didn’t, you can see that there’s a big difference in their underlying health vulnerabilities.
So the people that got fined were almost five times more likely to have presented with an underlying mental health condition during that time, around seven to eight times more likely to have presented with an alcohol or drug addiction, and more than 10 times more likely to have presented to a health service because they’d been a victim of violence. So these are not trivial numbers, and as you can see, it’s really important to try and better understand that. I haven’t got figures for the other question, but I can tell you that the people that were fined were no more likely to have tested positive and no more likely to die of COVID. Right, on to the last slide. So, public benefits. Why did we do this research? Why is it important? Well, the first thing is, it’s really important to understand the unintended consequences of policing powers. When we give the police more powers to do things to the public, we need to understand what consequences that might have. And in this case, we know that the people that were fined are much more likely to be from deprived backgrounds. It’s really important to look at government policy, particularly in the context of a public health crisis, and if we don’t look at this now, when another pandemic comes along, we’ll make the same mistakes again. I mentioned stigma earlier, and there was a lot of stigma around people that weren’t complying with the regulations. And we wanted to do this research to show that actually, there were some quite good potential underlying reasons. Now, we don’t have the exact reasons, because we’re not looking at individual cases, but this is making associations between health vulnerabilities and behaviours. And we wanted to generate robust data that would inform better, fairer and more effective policies in the future. So that’s a little bit of a snapshot of some of the research that we’ve done. We’ve now got some time at the end for questions. Hopefully I’ve not used too much time. Okay. Thanks.
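The matched comparison described in this talk can be illustrated with a small Python sketch. It rests on several assumptions that are not in the talk: it uses simple one-to-one exact matching on four characteristics (the talk does not say which matching method was used), the field names and helper functions are hypothetical, and every record is invented, so the resulting ratio has nothing to do with the study's real figures.

```python
from collections import defaultdict

# Characteristics named in the talk; the field names are assumptions.
MATCH_KEYS = ("age_band", "sex", "simd_quintile", "local_authority")


def person(age_band, sex, simd_quintile, local_authority, mental_health):
    """Build one invented record for the illustration."""
    return {
        "age_band": age_band,
        "sex": sex,
        "simd_quintile": simd_quintile,
        "local_authority": local_authority,
        "mental_health": mental_health,
    }


def match_controls(fined, not_fined):
    """Pair each fined person with one unused non-fined person who has
    identical values on every match key. Unmatched people are dropped."""
    pool = defaultdict(list)
    for candidate in not_fined:
        pool[tuple(candidate[k] for k in MATCH_KEYS)].append(candidate)
    pairs = []
    for case in fined:
        candidates = pool[tuple(case[k] for k in MATCH_KEYS)]
        if candidates:
            pairs.append((case, candidates.pop()))
    return pairs


def rate(people, flag):
    """Share of people for whom the given flag is true."""
    return sum(1 for p in people if p[flag]) / len(people)


fined = [
    person("16-24", "M", 1, "Glasgow", True),
    person("16-24", "M", 1, "Glasgow", True),
    person("25-34", "F", 2, "Fife", False),
    person("25-34", "M", 1, "Dundee", True),  # no comparable control below
]
not_fined = [
    person("16-24", "M", 1, "Glasgow", False),
    person("16-24", "M", 1, "Glasgow", True),
    person("25-34", "F", 2, "Fife", False),
    person("65+", "F", 5, "Fife", False),
]

pairs = match_controls(fined, not_fined)
cases = [case for case, _ in pairs]
controls = [control for _, control in pairs]
ratio = rate(cases, "mental_health") / rate(controls, "mental_health")
print(len(pairs), "matched pairs, ratio of rates:", ratio)
```

The point of matching first is that the two groups then share the same age, sex, deprivation and area profile, so a remaining difference in health presentations is less likely to be explained by those characteristics alone.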


So there’s a bit of time for some questions. I’ll start. The questions can be on what has been discussed, but also on broader topics. So let’s start with one about big data. You’ve mentioned big data in your presentations, so how big are the populations in the research that you’re involved in?


Do you want to answer this? Yeah, maybe I’ll answer this. In my particular case, there are between 50,000 and 60,000 people who die each year in Scotland. So because I looked at the period between 2015 and 2020, that’s approximately 350,000 people. So that gives you an idea of the kind of scale of the data.


Thank you. Let’s move on to some questions from the audience. Yes, this one over there.


Thank you for your research. It’s really interesting. I used to work on the COVID regulations in England, so I don’t know enough about Scotland. What you have highlighted here is really interesting in terms of research. So what we did before making any decisions was obviously looking at things like perceived measures that would make people feel safe, but that might have no impact on actual COVID numbers, and then also at the measures that actually work. So for instance, ventilation works. One-way systems didn’t work. There’s no research that actually proves that they work, but they made shop owners and everybody feel safe. I should also say that obviously what was proposed as a measure and what was actually implemented by the Prime Minister were two different things. I was in charge of talking to the National College of Policing to inform the police of all the powers that changed, which was mind blowing. And most of the different councils followed the kind of approach of saying, for those powers which we know don’t really work, we don’t really want to fine, so we look at the bigger context. And I was wondering if that played a role in your research in Scotland as well.


Yes, so we’ve done interviews with police officers, and we’ve done interviews with members of the public that were fined. The data presented today was Scottish data, because we’ve been able to link that, but we’ve also analysed all of the fines that were issued in England and Wales as well. And I think the key thing around thinking about the role that the police were doing is that at the beginning, during the first lockdown, it was really clear what everyone had to do. So the police didn’t have an awful lot of discretion, I guess, in terms of what they were doing in their policing role. Although I think at that point it hadn’t really been thought through very carefully what would happen when all of these public services disappeared. All the services that work with people with mental health conditions, with drug addictions and alcohol addictions, they all disappeared, and the police were the only ones left. And so I think we scooped a lot of people up into criminal justice, if we want to put it like that. I mean, a fine doesn’t actually count as a criminal conviction by any means. But the implications in terms of being fined and having to pay that fine, particularly if you’ve committed a number of offences, are quite significant for someone, particularly if they’re on a low income. So there was the issue about what could have been done to support and enable people to comply, rather than punishing people who felt that they couldn’t, or just wouldn’t, comply. And I think later on the regulations became much more complex, and they were changing practically on a daily basis, and there were rules over here and guidance over there. And even the politicians couldn’t remember what was rules and what was guidance, and then they stopped obeying them themselves, which doesn’t really help matters. And so you end up in a situation where actually there’s a loss of legitimacy of these regulations.
And I question what value policing powers had as the pandemic went on and we went into another lockdown, and some people were following the rules and some people weren’t. And actually a lot of people were making informed judgments about what they were doing, rather than stretching the rules to suit their own ends. And of course, the figures that I showed you are average figures across the whole of the pandemic. But actually, if you look at that during the first lockdown, those figures are much higher. And they’re much lower during the second lockdown, because a much wider swathe of the population were being fined, particularly young people, particularly because they wanted to party with their friends and they hadn’t been able to go out. But actually it increased across other age groups, and a lot more across the demographic spectrum as well. So we had police officers telling us about being called out to houses where they would normally only go if the person had been a victim of crime. And we had stories about people that were on the phone to their lawyer, asking for advice about accepting this fine on the doorstep from a police officer. So it’s a very complex picture. But I think when you’ve got very clear rules, then perhaps policing powers are useful, but you have to use them in a way that’s not going to discriminate against people and create additional justice inequalities. Sorry, that was a long-winded answer to a very good question.


So you mentioned that it takes quite a bit of time to do this research. Obviously the current thing at the moment is generative AI and the use of large language models to speed these things up. Have you got any view on that? And specifically, on the unconscious bias element of that, through your safe steps, who checks for unconscious bias in your datasets?


Oh, that’s a good question. Well, I mean, we don’t so much use AI for the kind of work that we do, although I’m aware that I think the Scottish Government are starting to look in more detail at how AI, machine learning and algorithmic work might be helpful in developing more datasets for use in the future. And obviously, with anything where you can’t see its brain working, you have to ask questions of that, and make sure that all of the things that we’ve talked about in terms of ethics, legal restrictions and the safeguards that we put in place to do this research would all apply to AI. But with the limitations around what it’s telling you, and the assumptions you make about some of the results, you have to be very careful. There are some really good examples, for example in China, of data scientists getting hold of facial recognition data and constructing models that tell you if someone’s going to be a criminal based on the configuration of their facial features. That is, in my book, very much against the ethical protocols that we would work with. But if you create data and put it out there, people will use it for all sorts of purposes. We’ve got to make sure that we do have protocols in place for safe, legal and ethical research, and make sure that we call out research that’s not done according to those kinds of protocols. Great question.


So we have time for one final question. I believe there was a hand up a while ago here.


Thank you. Very interesting. I have a question relating to maybe the health or COVID death data, coming back to that, about the kind of assumptions, if I may say, that you made at the end, of people being less likely to access certain services versus being able to access the services at the time. So I was just wondering about those kinds of judgments from more individual perspectives of the population versus kind of societal or socio-economic, socio-political challenges. And how you deal with that, in terms of: this is the data, and this is kind of some of the events that happened in terms of staff shortages in the NHS, people not being able to access certain services, more pressure on mental health, on people, etc. So I think there’s a lot more that obviously, I assume, doesn’t appear in the quantitative data.


Yeah. Thank you very much for that question. I think you’ve essentially paraphrased the discussion section of the paper that I’m writing, because that’s exactly the case. All of these things apply. Often with administrative data, like we said, this is data that isn’t collected for research purposes. So you just get what you’re given. You look at what’s there, and then you have to interpret it. So we don’t know for any individual person whether they decided not to go to hospital or whether they couldn’t get to hospital, or, you know, what the reasons are. So the best we can do is just describe exactly the context that you mentioned, and then supplement that ideally with another study where we interview people to see what their experiences were, because the experience is something that’s not recorded routinely.


Yeah. Is there potential to then draw in other datasets, say, for example, NHS staffing, what other employment is like, to what the interpretation of different


Yeah, that’s another very, very good suggestion. So there are data collected on the workforce, for example. So we could look at how many people were employed over time during COVID, whether people were laid off at the start or left their jobs, and therefore also look at the availability of staff. And yeah, that will be one of the ways that we explain what’s happened here. Thanks. Very, very good question.


So I’d like to hand over to Harriet for a minute before we conclude the session.


I don’t think we’ve really got time. But essentially, you can log back in in a second, if you want to, we just want to see if people feel any differently than they did at the beginning. So do they feel more confident about the safeguards in place, and so forth, but I’ll let the closing remarks happen. I’ll get it up on Mentimeter. But absolutely no problem if you’ve got to rush off.


So thank you all very much for attending. A big thank you to our speakers, Professor Susan McVie, Jan Savinc and Harriet Baird. Before you go, just to remind you that the Curious events have just started and there are more really interesting events to go to. If you like, you can find out more online. And also just to say that you will receive an evaluation email asking for your feedback, which would be much appreciated. Thank you all for coming.
