Summary

Podcast Summary: "Why Independent Researchers Need Better Access to Platform Data"

The Tech Policy Press Podcast
Host: Justin Hendrix
Guests:

Brandi Geurkink (Executive Director, Coalition for Independent Technology Research)
Peter Chapman (Associate Director, Knight Georgetown Institute, Georgetown University)
LK Seiling (Coordinator, DSA40 Data Access Collaboratory, Weizenbaum Institute, Berlin)
Air Date: November 9, 2025

Episode Overview

This episode, recorded at MozFest in Barcelona, brings together leading voices on technology research, data access, and regulation, with a focus on platform transparency in democracies. The conversation centers on the Knight Georgetown Institute’s new report, "Better Access: Data for the Common Good," the challenges independent researchers face in accessing data, and the evolving European regulatory landscape, particularly the Digital Services Act (DSA). The discussion highlights why independent access to platform data matters, what legal and technical barriers exist, and the broader societal consequences if these obstacles remain.

Key Discussion Points & Insights

1. Why Platform Data Access Matters

Democracy, Accountability & Power Dynamics:
- Platforms shape public discourse; without independent research, only industry-approved narratives are studied [(03:45–07:00)].
- "Platforms are going to do everything in their power to resist giving researchers access to data. They have no incentive to make this happen." — Brandi Geurkink (28:48)
- Analogy: a community living beside a polluted river can collect water samples, have them tested, and take the polluter to court; platform users have no equivalent recourse, highlighting the need for regulatory change [(15:07)].
Democratization of Knowledge:
- "It's about democratizing power and access to information and the ability to ask questions that result in better experiences for the everyday person who's using the Internet." — Brandi Geurkink (06:35)
Research as Counterweight:
- Researchers should act as a check on platform power, as independent scientists did with the tobacco and fossil fuel industries [(04:30–07:00)].

2. The "Better Access: Data for the Common Good" Report

Purpose & Scope:
- Developed over about a year by a group of 20 experts to create “a roadmap for expanding access to high influence public platform data” [(02:37, 08:44)].
- Focuses on "high influence public platform data"—e.g., widely shared content, public figures, political actors, and promoted posts [(08:44–12:00)].
Challenges Identified:
- Platforms are restricting access to previously available tools (Meta closing CrowdTangle, API restrictions on X/Twitter and Reddit).
- Balancing privacy, ethics, and meaningful research access is complex.
- Global equity is necessary: frameworks must work as well in Congo as California [(12:45)].

3. Current Barriers & Legal Landscape

The DSA (Digital Services Act):
- First law to grant researchers a right to access data from private platforms, not just governments [(07:20)].
- Two Data Access Types:
  - Article 40(12): “Publicly accessible” data—definition ambiguous, letting platforms set limits. Researchers usually get aggregate stats but face opaque boundaries [(19:20)].
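  - Article 40(4): “Non-public” data, a broader category that can include internal platform data. Applications became possible after a delegated act took effect on 29 October, but requests are vetted by national Digital Services Coordinators and implementation is conservative [(20:32)].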
Delays & Caution in Implementation:
- Regulators are moving cautiously to avoid a damaging early misstep; the first decisions are expected around the end of the first quarter of next year [(21:38)].

4. Key Obstacles and Risks

Corporate Capture:
- Platforms resist data sharing, and can influence regulatory processes or challenge requests in court.
Fragmented Research Infrastructure:
- Access is inconsistent; historic models like Twitter’s API or Social Science One privileged a small set of Western (often US-based) researchers [(23:05)].
Cultural Shift Needed:
- Researchers must move from treating data access as a privilege to seeing it as a right, and embrace their roles as political actors and collaborators [(28:48, 31:59)].

5. Looking Forward: Opportunities & Threats

No "One Size Fits All":
- The DSA is not globally replicable, but it offers lessons for other jurisdictions, such as the UK, and for the proposed US Platform Accountability and Transparency Act (PATA) [(25:12)].
Three-Tiered Access Model:
- Platforms should:
  1. Proactively provide data (via interfaces or third parties)
  2. Respond to custom data requests
  3. Permit independent collection (scraping, crawling) with fewer restrictions [(25:12)].
Threats: Continued platform resistance, slow regulatory uptake, legal challenges, and the risk researchers remain dependent on the goodwill of corporations [(28:48, 34:38)].

Notable Quotes & Memorable Moments

On the Importance of Data Access:
- "If we don't have data access, the only questions that can be asked are those that can be asked by the technology industry, and they're going to ask the questions that matter to them."
  — Brandi Geurkink (05:53)
On the Stakes:
- "Platforms shape what we know, how we connect, what we hear, what we amplify, and across a range of themes. They're fundamental infrastructure for modern civic life."
  — Peter Chapman (03:58)
On Regulatory Novelty:
- "It's like the first time we actually have a right to get access not just to governments and public services, but to actually have access to privately run companies.”
  — LK Seiling (07:28)
On Cultural Change for Researchers:
- "We actually have a right to access this information that is required to play this critical role that researchers play vis a vis the public. And for us to shift from this situation of data as a privilege to data as a fundamental necessity for us to do our jobs in service of the public interest is going to be a culture shift."
  — Brandi Geurkink (29:56)
On the Future:
- “If everything goes well, I think this is really like an amazing way to kind of open up and kind of understand these what we often call black boxes and to kind of do better regulation on it. If it does not work...the genie [data access rights] is out of the bottle.”
  — LK Seiling (35:41)
On the Big Picture:
- "The only questions that we can ask right now are largely being asked by companies...they're thinking very small because they're thinking about money. When you bring more people into the fold that are not just obsessed with money, you can build great things.”
  — Brandi Geurkink (39:33)

Important Timestamps

02:37 — Scene setting at MozFest and introduction of guests
03:45 — Why data access matters (roundtable)
08:44 — Launch of KGI’s report; defining “high influence public platform data”
12:45 — Ethical, global, and privacy challenges in establishing a data framework
15:07 — Brandi’s river/factory analogy for the data access crisis
19:20 — LK explains the DSA’s dual-path data access approach
23:05 — How we got to the current regulatory moment; European and global context
25:12 — Peter on different models of data access and UK/US regulatory horizons
28:48 — Brandi on threats and necessary cultural shift
31:59 — LK: why researchers must act politically, not just as outside observers
34:38 — Researchers need to define the baseline for meaningful data access
35:41 — Roundtable: visions for 5–20 years, risks, and hopes for future tech transparency

Resources & Further Reading

Knight Georgetown Institute’s Better Access Report: kgi.georgetown.edu
Coalition for Independent Technology Research: independenttechresearch.org
DSA4 Data Access Collaboratory: dsa40collaboratory.eu

Tone & Takeaways

The conversation is urgent yet optimistic, blending technical challenge with a civic mission. All guests stress that equitable, robust platform data access is a linchpin for tech accountability and the health of democracy. Achieving this will require not just new laws and regulatory muscle, but also an internal culture shift among researchers, policymakers, and platforms themselves.

Memorable Closing

"Now that [research data access] is on the table, you won't get the genie back into the bottle."
— LK Seiling (37:17)

For more in-depth discussion on the intersection of tech and democracy, visit techpolicy.press.


Transcript

[00:01]
A
Your film is now ready to be shown.
[00:12]
B
Good morning. I'm Justin Hendrix, editor of Tech Policy Press, a nonprofit media venture intended to provoke new ideas, debate and discussion at the intersection of technology and democracy. Greetings from a cafe in Barcelona, the site of this year's Mozilla Festival, where I've had the opportunity to connect with so many Tech Policy Press contributors and readers over the past three days and to meet so many new people working on the front lines of defending democracy and human rights and thinking hard about the role that technology plays in the world. One session I had the chance to attend focused on how to get better access to data for independent researchers to study technology platforms and products and their effects on society. The session coincided with the launch of the Knight Georgetown Institute's report Better Access: Data for the Common Good, the product of a year-long effort to create a roadmap for expanding access to high influence public platform data. The report was written with input from individuals from across the research community, civil society and journalism. I was pleased to play a small role in it. In a gazebo near the Mozilla Festival main stage, I hosted a podcast discussion with three people working on questions related to data access and advocating for independent technology research. The first thing I always do is ask folks to just state their name, their title, and their affiliation for the record. So Brandi, perhaps I'll start with you.
[01:38]
C
Hi, I'm Brandi Geurkink. I'm the Executive Director of the Coalition for Independent Technology Research.
[01:43]
D
My name is Peter Chapman. I'm the Associate Director at the Knight Georgetown Institute at Georgetown University in Washington, D.C.
[01:50]
A
Hey, I'm LK Seiling and I coordinate the DSA40 Data Access Collaboratory at the Weizenbaum Institute in Berlin.
[01:59]
B
Okay, so I have to tell our listeners we're not in my normal podcast mode. I'm normally either recording in my office in Brooklyn or sometimes in my basement at home. We're in quite a better situation. We're in a gazebo in Poble Espanyol in Barcelona, just outside of MozFest. Peter, do you want to describe for the listener what's around us at the moment?
[02:25]
D
Sure. We are basically in a castle. There's the large MozFest main stage just behind us, where we've seen speakers from Ruha Benjamin onward over the last two days of MozFest.
[02:37]
B
It's a fun spot, and it's been a good set of conversations over the last couple of days; I'm looking forward to this one. We're going to talk a little bit about data access. We're going to use the occasion of the publication of a new report from the Knight Georgetown Institute, the Better Access Framework, as part of the basis for the conversation. But then we'll also bring in some recent work that Brandi and the coalition have done, including a report from that community as well, and hear a little bit more about what LK's up to in particular. But I just want to ask all three of you a basic question before we get going. This is a somewhat niche topic, data access. We've, gosh, spilled lots of ink on it at Tech Policy Press, because I think of it as a fundamental and really important one. But I want to give each of you an opportunity, maybe incorporating a little bit about what your organizations do and why you work on this. If you think of this microphone as literally being connected to the ear of the listener, and you were going to whisper into the ear of the listener why data access matters, what would you say? And perhaps, Peter, I'll start with you.
[03:45]
D
So KGI is an independent institute, and we're focused on connecting independent research with technology policy and design. And to do that, we need independent research. And access to data is simply fundamental for journalism, for civil society, for academics to understand the nature of conversation on online platforms. And online platforms shape what we know, how we connect, what we hear, what we amplify, and across a range of themes. They're fundamental infrastructure for modern civic life. And the ability to understand these platforms, to understand conversations taking place on these platforms in real time, requires access to the public data that these platforms host.
[04:29]
B
Brandi, what about you?
[04:31]
C
So the coalition is really a movement of independent researchers who are working to build the power and influence of independent voices as a counterweight to the technology industry in conversations about the impacts of that industry on our lives, in our communities, and in society overall. And if you look historically at virtually any other industry, take the tobacco industry, take the fossil fuel industry, scientists and researchers who have been empowered to ask fundamental questions about the impacts of those industries on our bodies, on our health, on the health of our natural environment, have been the cornerstone of consumer protection law, of regulation of those industries that have made our communities safer. And that's what we need with technology. And so data access, in that way, is about fundamentally reshaping the power dynamic that exists so that independent questions that really matter to everyday people can be answered. Because if we don't have data access, the only questions that can be asked are those that can be asked by the technology industry, and they're going to ask the questions that matter to them. They're not going to ask some of the most fundamental, pressing, hard questions that you and I and people listening to this really care about. And the results of those questions and the way in which they're presented are also going to naturally be biased towards the industry that's asking them. And so that's why independent research is so critical and why it's fundamental to helping people have a better experience on the Internet, ultimately. Because when we are able to actually ask these questions, then we can not only do what Peter was talking about in terms of really understanding the online environment, but we can also ask questions that help us imagine how things could be better outside of the constraints of what might be best for the profit margin of a company. So that's why it matters to me.
It's about democratizing power and access to information and the ability to ask questions that result in better experiences for the everyday person who's using the Internet.
[07:04]
B
And LK, what about you? Maybe also in the context of making the Digital Services Act work. I mean, that's part of what you're doing at DSA40, the Data Access Collaboratory. Why does this matter for making that regulation work?
[07:20]
A
I guess, in the first place, I think it is a prime example to study how regulation translates into practice, to see, okay, we do have this law now, which is crazy if you think about it. It's like the first time we actually have a right to get access not just to governments and public data from public services, but to actually have access to privately run companies. And for one, I think this is just an interesting research question, to see how this translates and where different hurdles might be. But then again, as Peter and Brandi already said, it's quintessential just to do basic research on it. And so we need to know: how does it work, how does it not work? What else do we need?
[08:07]
A
And yeah, I might be spoiling things a bit when I say that things do not work as we would want them to work.
[08:15]
B
So, Peter, I want to come to you and ask you a little bit about this report and this framework that you just put out. In full disclosure, I played a minute role in helping to review the document, as did Mark Scott, who's a contributing editor at Tech Policy Press, and many other experts that you were able to pull together to work on this for what seemed like the better part of a year. It might have been longer, you'll tell me. I can't quite recall when things started, but you've just published this. What's this for? And what are the top lines?
[08:45]
D
Great, yeah, thank you. It's been about a year, and I think, you know, taking a step back, it's worth asking why we need a framework like this, Better Access: Data for the Common Good. Research, as we've just heard, is being frustrated by these companies. We're seeing multiple avenues through which companies are cutting off access to public platform data at the same time that regulation here in Europe is, as LK just described, offering new opportunities for data access. So companies are pulling back from some of the tools that had previously been available. Meta had acquired a platform called CrowdTangle, which provided real-time data analysis from Meta platforms. They ended that product in August of 2024. X had introduced significant new fees for its API. Reddit has changed the way in which folks, including the research community, can access its API. And there are a couple of different motivations for this change. One, as Brandi described, is that these tools enable platforms to be scrutinized. If you look at the universe of research on platforms, X, or Twitter, and Facebook historically had the most liberal mechanisms for access, and they've been the most studied. And so we know most about those platforms. So there is maybe a perverse incentive for platforms: by eliminating some of this access, they can restrict the scrutiny that they are exposed to. At the same time, there's been an absolute rush for this exact type of data, public platform data, as generative AI models are actually built on this publicly available data on the Internet. And so we've seen a rise of third-party tools, and we've seen a rise of platforms trying to commodify this data that you and I, all of us, contribute in our online footprints. And then thirdly, there was an ongoing challenge in the research community to ensure that research is done ethically and in privacy-respecting ways.
The scandal around Cambridge Analytica in 2016, 2017, 2018 really exposed how this data could be used in pernicious ways. And the research community has been continuing to evolve ethical standards. So all of that is context for this group of 20 experts coming together, academics, folks from civil society, journalists, to identify and articulate a framework for public platform data as a sort of minimum expectation for the data that we need to understand online platforms. And so the group coined this term high influence public platform data. This is data that, by virtue of its reach or its engagement or the status of the speaker or the account, matters most for civic life. That includes things like highly disseminated content. It includes government employees, political accounts, notable public figures like journalists or influencers, as well as business accounts and promoted content. So this data impacts what we see, what we hear, what we know, and plays an outsized role in the information environment. The group analyzed the research about the distribution of this content and looked at the power law dynamics on social media, finding that a very small amount of content makes up most of the views, reach, and engagement on our social platforms. It tried to establish what that minimum expectation is and then grapple with the trade-offs. How do you ensure meaningful access by researchers around the world? How do you ensure both proactive disclosure of this data from platforms and independent collection by researchers?
[12:27]
B
It wasn't necessarily an easy process. Not everyone agrees on every topic. With that, what were some of the key challenges? Challenges you felt like you had to come to consensus around, things that people didn't necessarily see eye to eye on or that felt like they needed more conversation than others as you tried to arrive at this common framework?
[12:46]
D
I think a fundamental challenge is the uneven understanding of how these platforms interact with information environments around the world. And so the group really felt fundamentally that we needed a framework that was as durable in a region of Congo as in a region of California. Historically, that's not how data access has worked. You've had global access, you've looked at dominant narratives, mostly in Western societies, and you've not looked at, you know, what's happening in West Bavaria as opposed to Germany. And so by articulating these information environments, whether global, regional, linguistic, or geographic, we try to enable research that fits the needs of journalists, societies, and researchers in different contexts. So that was definitely a challenge. And then there's this ongoing debate around trade-offs between privacy, researcher ethics, and public data; these are very difficult issues. The framework that we have articulated does not resolve the risk of private information being treated as public by platforms or researchers. But what we've tried to do is narrow that risk by leaving out the vast majority of content that's publicly available online and focusing on the actors that, by virtue of their status or the dissemination of their content, have the greatest impact on what we see and know online.
[14:16]
B
So one of the things that we're trying to negotiate here is really the practices of researchers, the ability that they have to access information and to be protected. One of the things, Brandi, that you talk about in your report is this gulf between those who build the systems and those who live under them. In between are researchers who are trying to get access to the information, often unclear about what is legal, or what is in accordance with the terms of service they might have to sign up to in order to have access to platforms. You say it's a crisis. What are the dimensions of that crisis? And how does some of this thinking around data access fit into that or help resolve that crisis?
[15:07]
C
Yeah, I want to start answering that with an analogy. If you're part of a community that is living close to a river that's being polluted by a company that has a factory situated on that river, there are a few things that you might do, and there are amazing community organizations that do work like this: take a bucket over to the river, collect the water, send it to a local lab that might be able to test that water, and then take that lab result to a court and challenge the factory that's been polluting the river that you live right next to, that your community relies on for clean water, clean soil, the food that you eat, all these things. And because we have environmental regulations in many communities, certainly the community that I live in, and I think many of the communities that those of us sitting around this table and listening to this podcast live in, there's something that you can do about it. And that is fundamentally not the case online. We have no equivalent right now that actually empowers people who use social media platforms, people who interact in some way with social media platforms. We are treated as data subjects, to the benefit of the companies that are profiting from these technologies. And that is fundamentally wrong. And so the crisis that we speak about, I think that's ultimately the gulf: this dynamic that we're in, this fundamental inability, doesn't really make sense just because one is a physical river and one is a digital environment that we're engaging with. There are choices, hard-fought choices, that have been made to enable people to go to the river with the bucket and collect the water. There are regulations that enable people to hold a company accountable if that company is poisoning their community.
And all of those things have had to be fought for. And we're at the beginning of that road. But the gulf is no different than it has been in so many other fights that communities have won over time. And I try to come back to analogies like that because you can get lost in the dimensions of the crisis. We're talking about weird lawsuits about terms of service violations for scraping. And I'm thinking about all of the ways that activists trying to take the bucket to the river have been targeted by the industries they've gone up against, and I'm seeing the parallels in our space, because I think it's important to remember the overarching power dynamic that we're actually trying to go up against in this work. When you have that in your mind, you start to see the cease and desist letters, the lawsuits, the ways in which social media companies have blocked the accounts of researchers trying to do this work. You see it in that fundamental analogy, right? And I think what we need to realize is not to get too caught up in the details. I mean, you know, we're researchers, so of course we're caught up in the details. But what is the overarching power dynamic that we are trying to push back against and equalize, and where do we fall within that?
[18:47]
B
LK, I want to ask you a question as well about what you're up to at the moment in terms of testing the boundaries of the rights that are afforded under the DSA. And maybe, for the listener's sake, just give us a quick rundown of where things stand. I was talking to you at the session earlier and asking just that basic question. This law's been in place for a bit. Has any data been liberated yet from a platform that researchers can study? If not, when can we expect that might happen?
[19:21]
A
So I would be lying if I said that no data has been shared. I don't know if any has been liberated thus far. So the DSA basically sets out two kinds of data access. One is set out in Article 40(12), which basically says that researchers who fulfill a set of requirements, like being independent from financial interests or disclosing their financial interests, and being able to safeguard the data properly, can access what is called publicly accessible data. And this is not further defined, which is interesting. This already lets the platforms define what they mean. And this is where Pete's report comes in, with regard to potentially expanding this definition, which has been rather narrow. So basically all you get right now is the aggregated interactions on specific kinds of content. You get the number of likes, the number of views; you might get the comments if you're lucky, but that might already not be included. And so there are different amounts of data that different platforms share as publicly accessible data. And you could argue, again, that it is a much broader category, right? So this is where I would see the first area for liberation: to try to push back on these initial boundaries that the platforms have set. The other data access route is set out in Article 40(4) of the DSA. It doesn't say non-public data, but it kind of means non-public data, because it just says generally you've got rights, like you've got the right to access data. And this has recently been specified by a delegated act, which actually came into force on the 29th of October, very recently. Since then, researchers can actually apply to their national supervisory authorities, the so-called Digital Services Coordinators. The Digital Services Coordinators then check the research question and the data that the researchers want to ask for.
And they can theoretically ask for any kind of data, right? So we're talking internal documents, we're talking individual exposure histories, like what did individuals see? So we're talking about going much, much further, basically all the kinds of data that platforms might have. And we need to wait some time to see how this will pan out, because the regulators seem to be rather, let's say, careful, because they know that if they fumble this start, it might be even more detrimental to the entire project than rejecting a few requests at the start. So I'm expecting the first decisions to come in at the end of the first quarter of next year. And this is also going to be interesting because there are some questions which researchers have already asked. AlgorithmWatch, for example, has already put in requests under Article 40(4) DSA, but there are also requests which researchers are planning that take this non-public route but actually ask for things that should be public, right? And so we also see the regulators navigate this space, where they again engage in this boundary work with the platforms to liberate the data, and I think this is where researchers and regulators need to at least communicate and coordinate in order to do this liberation. Because as it stands right now, there is not a lot of data that is even accessible, right? I'm not saying that it should be freely accessible to everyone. I'm just saying it should be accessible if you can provide the proper mechanisms to safeguard it. But this is not even the case. So we have a lot of work to do.
[22:55]
B
I wanted to ask you as well, what should we know about how we got here? I mean, what are the kind of precedents for this current framework? How did this kind of come to be, particularly in Europe?
[23:06]
A
So, I mean, Peter already talked a bit about it with regard to the Twitter API. And to be honest, I don't think there's a lot of precedent for this, at least not in the formalized sense that the DSA puts in place. What you could reference were these early programs like the Twitter API, but then also Social Science One, right? But these did not really serve the wider research community. They led to a situation where mostly US researchers got privileged access to data, which would further their individual careers. And again, no shade, this is how the academic system works. But in the end, it did not contribute to a systematic change in how we come to understand these platforms. And I think that the DSA, in theory, marks a departure from that, because in theory everyone who researches what are called systemic risks inside the European Union is eligible for data access. Which means that anyone, whether they're in Britain, the US, Australia, or even India, can ask for data access under the DSA so long as their research question is related to the EU. And so we'll see. We've seen from the data that we've collected with the Collaboratory that the platforms do reject these requests, which we think reflects an incorrect interpretation of the law. And I think we'll see to what extent the DSA can actually live up to that potential of democratizing data access.
[24:41]
B
I mean, this is the big question: can this work? Does this DSA model work? Is the mental model that Brandi has created for us, that we'll go to the river, get the bucket, bring it to the scientists, and change will come, going to hold? Peter, are there other obstacles to that that you see in the near term? I know the framework is meant to help clear some of those obstacles, but what are the other obstacles to making that, you know, perhaps somewhat simplistic model that I've just described work?
[25:12]
D
Yeah, I think it's important to underscore that no one data access mechanism is going to answer the range of questions that researchers have about platforms. It's also important to mention that in the context of the DSA, many platforms have built new processes. Meta, for example, has deprecated CrowdTangle but has built the Meta Content Library, which does provide some researchers access to data. What we've tried to do in the framework is articulate three primary access mechanisms. First, platforms proactively provide data through a proactive data interface; this could be the platforms individually, or them supporting a third party to provide this type of access. Second, we envision a world where platforms respond to custom data requests from researchers. Researchers working in a particular environment on questions outside the scope of the proactive data interface could then access publicly available or high-influence public platform data through those requests, either as a dataset or through an archive. And again, there's precedent for this: Twitter has hosted archives in the past, and SOMAR at the University of Michigan has data archives for understanding different platform dynamics. Third, there's the independent collection piece, where researchers build their own tools to scrape or crawl data from these sites. Here it's important to underscore that, with the rise of generative AI, third-party tools providing just that have proliferated rapidly. There is a booming commercial industry supplying this data to brands, to generative AI developers, to anyone who can pay for access. So it's not that it's technically unfeasible. It's just a question of whether researchers should have independent, free access to this data to look at questions in the public interest. And when you look across the regulatory models emerging, we expect that in the UK, data access is a focus.
There's a task force being developed. From what I hear of the discussions there, they're learning lessons about the DSA infrastructure and the costs of building some of this infrastructure. Because a different approach would be to say we want to clarify legally that terms of service do not prevent independent researchers from analyzing platform dynamics. And this is actually exactly what was agreed to with AliExpress under the Digital Services Act enforcement framework: they've said they'll change their terms of service and enable public data research. We went the other day and looked at their terms of service, and there is an explicit carve-out for DSA access. That would give many NGOs and researchers around the world confidence that looking at these public conversations isn't going to get them caught up in legal wrangling or potential legal challenges. In the US context, I think a lot of the focus is on PATA, the Platform Accountability and Transparency Act, at the federal level. There have also been state-level proposals, and increasingly a lot of those are oriented around AI and generative AI, including the algorithmic recommender systems platforms use to surface and distribute content. So there's a lot of overlap between some of the transparency efforts being discussed in the US context. There's no one-size-fits-all approach; the DSA, in my view, is probably not going to be replicated around the world. But there are multiple avenues that offer short-term, incremental opportunities to open up access while we also look at longer-term, more holistic solutions.
[28:48]
C
Can I come in on the question of whether this is going to work? Because I have a few thoughts on that. One is about the threat of corporate capture within this entire process. As I said earlier, platforms are going to do everything in their power to resist giving researchers access to data. They have no incentive to make this happen. We've seen that pressure on them to do this voluntarily has failed, and that recognition is why DSA Article 40 exists: we have to mandate this kind of data access if we're going to get it at all. And all of those systems that rely on a regulator, a state body, to intermediate have the potential to be captured by the corporations. Even if they're not captured, there's the issue of legal challenges. We can expect to see lawsuits filed challenging the requests on intellectual property grounds, on privacy grounds, on legal privilege grounds. We will see all of those things happen. So, is this going to work? We've got to be prepared for that, and then it might. And from the perspective of the research community, of ourselves as researchers, there also needs to be a fundamental culture shift, because we've been in a situation of "mother, may I?" with the platforms. As Elka alluded to, access to data has been seen as a really nice privilege if you get it: it lets you write the research paper you really want to write, one that advances your career and provides definite benefit in the answers it gives the public.
And what the DSA offers with Article 40 is a recognition that this research is fundamental to the protection of European democracy and European societies, and that researchers play a critical role in it. We actually have a right to access the information required to play the critical role that researchers play vis-à-vis the public. Shifting from a situation of data as a privilege to data as a fundamental necessity for doing our jobs in service of the public interest is a culture shift that needs to happen within the research community for it to be ready to step into that role. Coalition members are doing amazing work to propel a lot of that. But I just wanted to add those two things as additional dimensions to the question of whether this thing will work.
[32:00]
A
If I can pick up on that, I completely agree. And I think more cultural shifts need to happen, also because researchers tend not to work collectively. I mean, while there is a culture of collaboration, it doesn't really hold, because the structural incentives don't give you and your research group a professorship. They give individuals a professorship. Researchers are quite happy if they can do their research by themselves, but they are up against a lot of structural asymmetries, including with regard to the regulators. In this case, the supervisory and enforcement authority for these data access procedures is the European Commission, which is itself a very politicized body. That has led to worries that it might sacrifice or compromise on this, perhaps enforcing only some of the data access procedures.
[32:50]
B
Right, so only partially enforcing the DSA as a whole.
[32:55]
A
Exactly. And so I think researchers need to come to understand themselves as political actors. That contrasts with a long-standing tradition in post-Enlightenment research: that you're standing outside of the context, looking in, with a bird's-eye view on things. But no, you're in the midst of it; you are political actors in it. The evidence you produce might lead to change, but it might also support the political ambitions of the European Commission. What I'm also trying to do with our work at the collaboratory is sensitize researchers to this role and get them to think: how do I position myself within this, and who are the people standing to my left and right that I can join up with to put more pressure not just on the platforms, but arguably also on the regulators? Because in the end, they are the ones who will have to fine or otherwise put pressure on the platforms based on the evidence we produce and the problems we highlight.
[33:54]
D
And this brings me back to the reason why we supported the process to develop the Better Access Framework: to give researchers, people who deal with this data every day, and journalists in newsrooms reporting with publicly available platform data the space to articulate in policy language what data we need and want from platforms to understand our information environment. The DSA doesn't solve this. Article 40(12) just says "publicly available in the interface." What is that? What does that mean? What are the privacy and ethical implications of that data? So we brought a group of researchers together to say: this is actually what we need as a bare minimum. If you're not providing this, you're not providing enough, not nearly enough.
[34:38]
B
I want to cast our minds forward a little bit, in closing, and try to imagine what the future might look like when we've got a few years under our belts on this. What do you think it looks like, Elka? What do you imagine? Are there lots of PhDs being minted on data that's been provided by platforms through these mechanisms, or collected independently? What will it mean for the future of the way we relate to technology? I keep thinking that, with artificial intelligence at this beginning point, we're almost at a kind of inflection point. It feels like we're seeing a pulling away, almost, between what the industry knows and is capable of in terms of producing new models and technology that affect society, and what independent researchers are able to scrutinize from the outside. What does the future look like in five years, in 10 years' time, if we get this right?
[35:41]
A
Yeah, I mean, if we get it right, PhDs will be minted for sure. But to me this is not clear, especially if you look at the way the US administration is behaving, and the way the European Commission is also engaging in, how do you say, tradecraft. Exactly, tradecraft with these companies. So from what we see, it will be a very time-consuming and resource-consuming uphill battle to tease every bit of data out of the platforms, and I'm not sure that will be sustainable in the long term. We might also see a culture shift when the platforms no longer feel backed by the White House, which I think is driving some of the non-collaboration we're seeing right now; then they might actually have more incentive to collaborate. So if everything goes well, I think this is a really amazing way to open up and understand what we often call black boxes, and to do better regulation of them. If it does not work, I think the DSA still allows us to start to understand what platforms should provide, and the alternatives that may pop up instead may then have these ideas built in from the start, really accounting for researcher data access even without being a very large online platform, just because it is such a quintessential democratic function. So either way, now that it's on the table, you won't get the genie back into the bottle.
[37:25]
B
Peter, Brandi: what do you imagine for five, 10 years?
[37:30]
D
Yeah, I think in the last several years, as this access has been restricted, we've seen a broad coalition emerge around tech company accountability that we have not seen before. Speaking from the US context, where I'm from, you have parent groups on the front lines of these debates. We recently had an election in Virginia where the Democrats did very well, and reportedly data centers and technology infrastructure were an animating factor in driving people to the polls. So we're seeing a broader coalition care about these issues, really front and center. We've talked about how this is a niche issue, but it's a niche issue that provides infrastructure for a broad range of issues. To the degree this community can respond to those interests, and also provide resources and opportunities to expand what we know about these black boxes, I think we're going to see increasing pressure for more disclosure, more information, more scrutiny.
[38:34]
C
I want to talk about my vision for 20 years into the future. I want to think more expansively than five years. Some of what Pete is talking about, accountability, is a beginning place, and maybe it's one we'll start to see sooner than 20 years. Right now, you might see something happening in your own experience on social media and think, maybe it's... We've seen such harrowing reporting about how children are being impacted, for example, by the sudden, huge availability of chatbots, right? So maybe you see something happening in your child's life, in their own experience, and thanks to independent research on this topic, you are able to understand that you're not alone in that experience, that there is a documented pattern of harm, and that something could be done about it, something could be different. That's accountability. In my view, when people know those kinds of things and trust that kind of research, they will make different decisions for their own children, their own communities, their own workplaces. But we will also demand better from our elected lawmakers to help create the safeguards that give us a healthier society. So that's the accountability piece. But I also think there's a bigger vision, having to do with freedom and with better technology, that is ultimately the vision. Because it's so wild that the only questions that can be asked right now are largely being asked by companies. They think they're thinking big. They're not thinking big. They're thinking very small, because they're thinking about money. And then you start thinking: how can the experience of this technology be better for my community, my family?
When you bring more people into the fold who are not just obsessed with money, you can build great things. You can build things that people actually want to be part of. We're here at MozFest talking about the early, weird Internet, hearkening back to that spirit. And I think there's a link here with data access and democratizing information, because it enables us to ask the questions that we would never ask if we were just thinking about making more money. I believe we will begin to learn and understand things that can actually help us build better technology that serves people. So when I think about the 20-year vision, I think ultimately about one in which we are using technology, but it's technology we want to use, technology we enjoy using, and that makes our lives and the lives of the people around us better.
[41:41]
B
Platforms that are perhaps built to be observed, or built to engage individuals in the science studying them. Perhaps we can imagine all of that. Let me ask the three of you to tell my listeners where they can find your reports and your work. A quick shout-out to your websites and your social handles.
[42:02]
D
Peter. Go to kgi.georgetown.edu and you'll be able to find the Better Access report.
[42:08]
C
Brandi. Join the coalition at independenttechresearch.org.
[42:14]
A
And check out our work at dsa40collaboratory.eu.
[42:19]
B
I look forward to speaking to you all again sometime. Perhaps we'll find another castle in another wonderful European city. Always up for it. Thank you very much, and thanks to Mozilla for allowing us to use these wonderful Shure microphones.
[42:33]
C
Thanks so much.
[42:34]
D
Thank you, Justin.
[42:34]
A
Thank you.
[42:48]
B
That's it for this episode. I hope you'll send your feedback. You can write to me at justin@techpolicy.press. Thanks to my co-founder, Brian Jones. Thank you for listening.
[43:06]
A
Tech Policy Press.