
Tim Peterson
Hello.
Kamika McCoy
Hello and welcome to another episode of the Digiday Podcast. I'm your co-host, Kamika McCoy, senior marketing reporter here at Digiday.
Tim Peterson
And I'm Tim Peterson, executive editor of Video and Audio.
Kamika McCoy
Hello, Tim, and welcome back to the States. You have been in Europe for the Digiday Publishing Summit and you are finally back in America, where we are at the tail end of the election cycle. So welcome back to election day.
Tim Peterson
Yeah, no, it was nice. I was in Europe for, I think, a little over a week. It was nice to be on the other side of the Atlantic. We spent some time in France before going down to Barcelona for the Publishing Summit, and then, of course, on the train to the Publishing Summit, we had some fellow Americans a row behind us talking about the election. I was just like, oh, yeah, welcome back to the States. A jump scare, a little bit.
Kamika McCoy
So, yeah, I live in Georgia, which is considered a swing state. So we have been pummeled with ads, whether it be my poor suffering mailbox, which has not seen anything less than five ads and flyers a day, or streaming services, where you're getting pummeled with them. I have not even turned on the radio out of fear. It is relentless out here.
Tim Peterson
How many political text messages are you getting a day?
Kamika McCoy
Enough to make me cry.
Tim Peterson
Yeah, enough to make me want to change my number.
Kamika McCoy
I can't keep seeing these text messages, but it is election day, so hopefully that's winding down. But I think it speaks to how much money is being pumped into this space to get on-the-fence voters off the fence. But like I said, you just got back from Europe, where we had the Digiday Publishing Summit, which is also where today's interview comes from. So who are you talking to?
Tim Peterson
Yeah, so I spoke with Ingrid Verscherin from Dow Jones, and we talked a lot about AI because she oversees AI and data over there. And so we talked about how they're incorporating generative AI tools into the newsroom, the different models that they're using, and how they're evaluating models. We go pretty deep. At one point, we get into a discussion of retrieval-augmented generation, which is a way to pipe data to large language models. So I really enjoyed the conversation. It seemed like folks in the crowd enjoyed the conversation. Hopefully listeners will enjoy the conversation.
Kamika McCoy
Nice, nice, nice. Okay, we will get into that. But before that, we're going to talk about the juicy scoops this week, including that recap, an update on AI search and how news outlets are choosing to not endorse anyone and see how that's affecting their business. And then finally Comcast is potentially spinning out their cable business. So we'll get through those things before we get to our interview. But given you're off the heels of the publishing summit, kind of what were some of the hot topics?
Tim Peterson
Yeah, site traffic challenges for publishers was the big one. So for anyone who hasn't been to one of our summits, one thing we do, and it's my favorite part of the summit, is we have these behind-closed-doors town hall sessions with attendees. In this case it's with publishers; in other cases it's with agency executives. And we ask them to write down the big challenge that they are currently facing. They post them on what we call the challenge board so that everyone can see what the big challenges facing publishers are at the moment and also which challenges are the most common. And I didn't do the actual math on it, but it looked like site traffic was the most common, most prevalent challenge across the publishers. So we spent a lot of time talking about site traffic challenges: how publishers are trying to get people to come to their sites, and how they're trying to generate more revenue from those who do come. So basically increasing revenue per page view, in a way. And then there was also a fair amount of conversation about search-related traffic declines. We had a session on stage with Martin Liddell from Reach PLC (there's a write-up on Digiday.com for anyone who wasn't able to be in Barcelona with us) talking about how Reach saw, I believe it was in September, Google search referral traffic down 25% year over year. And he said that's been pretty steady over the past year. And I think Sara Guaglione, our senior media reporter, has written a fair amount over the past two years about how search traffic to publishers has been on the decline. So that was also a fair amount of the conversation.
Kamika McCoy
Yeah, between Google being in court, sponsored ads and AI search, search has been a very much talked-about space lately. I'm curious if, in any of these conversations, ChatGPT or Meta came up, right? Because they're making some movement and making some big promises.
Tim Peterson
Yeah, it's weird. Those didn't really come up. So OpenAI this past week announced that it's incorporating search into ChatGPT, and it's officially started to roll that out for, I think, ChatGPT Plus subscribers and Team subscribers. And then The Information reported that Meta is developing an AI-powered search engine. So a lot of competition, which I imagine can be welcome news for publishers, because it's like, okay, if we're not getting the traffic from Google Search, maybe these other search engines can drive traffic to us. Obviously, a lot still remains to be seen when it comes to Meta because that's just in development; Meta hasn't officially announced anything. But with ChatGPT having search, we do know a lot more about that, like that it's including links to sources. So there is a mechanism within ChatGPT search to drive traffic to publisher sites.
Kamika McCoy
It'll be interesting to see what deals roll out here and which publishers want to strike deals in this space. But having worked in this industry for so long, you almost get whiplash from what happened with the pivot to video and some of the other promises that were made by tech companies a couple of years ago, when publishers were hurting for traffic and we got reliant on Google and we got reliant on Meta and its social media platforms. So the question then becomes, is this act two of that?
Tim Peterson
Yeah, and that's always kind of an evergreen issue, because that was also a large part of the conversation during DPS, talking about those site traffic challenges. There's this tension of publishers wanting to get people to come to their owned-and-operated properties, but also recognizing that they have to figure out how to reach audiences on these third-party platforms like Facebook, Google Search and now ChatGPT with search. And yeah, so I was playing around with ChatGPT search this morning just to see how it works, how well it works, what these links look like. OpenAI has talked about how it has these deals with news organizations and that's being incorporated into ChatGPT search. In my experience, the only news publishers whose links I received are publishers that have deals with OpenAI. So I asked ChatGPT, hey, what's going on with this Washington Post endorsement-gate? Catch me up. And it did a pretty good job of giving me the sort of Wikipedia digest of what's going on, with links, but the links were to the AP, The Atlantic, The New Yorker, The Wall Street Journal, all of whom have content licensing deals with OpenAI. But I know Digiday has also written about this. So I asked ChatGPT, cool, thanks for this, what has Digiday said about this? It couldn't find anything. I eventually sent ChatGPT a link to Sara Guaglione's coverage of how The Philadelphia Inquirer and The Guardian are gaining subscribers because of what's been going on with The Washington Post, LA Times and USA Today. And then ChatGPT was able to go through and summarize that article, but I had to prompt for it. Which leads me to wonder: how long is that going to be the case, in which ChatGPT search is limited to those publishers who have been willing to do deals with OpenAI or are in a position to do deals with OpenAI?
And what's that going to mean for the full long tail of publishers that don't have those deals in place?
Kamika McCoy
Yeah, especially when you're talking about the many startups that have launched within these last couple of years. And then also local news, which has really been taking a hit for God knows how long at this point. Moving into our next juicy scoop here, I will say that publishers have not necessarily helped themselves when it comes to subscribers by being on the fence about endorsements, something that this industry has done for God knows how long, endorsing candidates both at the presidential level and at the local level. But The Washington Post and a handful of others are seeing some backlash and loss of subscriptions because of not endorsing. So what are you seeing there?
Tim Peterson
Yeah, I mean, I think the last count that I saw, reported by NPR, was that The Washington Post had lost more than 250,000 subscribers. And those, I believe, were subscribers who had said that they canceled for editorial reasons. So that's the closest connection you can make between those subscription cancellations and The Washington Post's owner, Jeff Bezos, deciding to block the presidential endorsement that the paper's editorial board had planned to publish. The same thing happened with the LA Times and its owner, Patrick Soon-Shiong. And then Gannett, which owns USA Today, made the decision that USA Today is also not going to have its editorial board publish a presidential endorsement. And so that's, like you mentioned, led to a lot of subscribers canceling because they don't like to see that from these news outlets. And then it's also led to some staffers quitting. And in both cases that's created an opportunity for other news outlets to gain subscribers, even to gain staffers, because I think it was The Atlantic that ended up hiring two of the Washington Post editorial staffers who had left as a result of Bezos deciding not to publish an endorsement. So it's just a huge issue and not great, because the news business is always in a tough spot. But it feels like it's in an especially precarious position right now, especially looking ahead to what the outlook is going to be for the freedom of the press after this election.
Kamika McCoy
Yeah. They say democracy dies in darkness. So very curious about kind of where that catchphrase lands, you know, after. After this. But it begs.
Tim Peterson
Because I believe that's the Post catchphrase, if I'm not mistaken.
Kamika McCoy
Yeah, it is. I do think that this begs the question of the idea of the benevolent billionaire that owns these publications, right? Between trying to get subscribers on board and the loss of revenue from advertisers, if you don't get acquired by a private equity firm, you get acquired by a billionaire and hope that they're benevolent. But should billionaires own publishers, if there then becomes a clash between the owner and the editorial staff? Because then how do you maintain that integrity?
Tim Peterson
Yeah. Because, I mean, the big optimism roughly a decade ago, when Bezos acquired The Washington Post, was, oh, great, now you have this person who has all the money in the world. They can own this news organization and ideally be completely hands-off, and it's kind of like a philanthropic effort for them. The same has been said about Laurene Powell Jobs owning The Atlantic, Patrick Soon-Shiong owning the LA Times, Marc Benioff acquiring Time magazine. Cool. You have these people who don't really need to worry about whether news organizations are going to make enough money, and who won't feel pressure to push a publication like the Post or the LA Times into clickbait just to drive revenue. They can really just commit to the news side of things. But now it seems the fear for these billionaires is their news organizations facing the ire of Donald Trump especially, and Republicans in general. And so they're seemingly not wanting to publish these endorsements for fear of the backlash they could get, not only for their news organizations. When it comes to Jeff Bezos, he has his space company, which is doing deals with the government and is also facing regulation, all of that stuff, and he doesn't want the repercussions that could come with the Post potentially endorsing Kamala Harris. So now the question has been, should Bezos continue to own the Post? There is one billionaire who seems to have made the decision that he doesn't really want to be in the news business anymore: CNBC reported that Marc Benioff is looking to sell Time magazine.
Kamika McCoy
Well, the news business is in a precarious time right now, but I feel like that's an evergreen tweet, you know what I mean? And it's not just publishers on the newspaper and magazine side, because Comcast is also looking to spin off its cable business, which includes news, namely CNBC and MSNBC.
Tim Peterson
The big unknown on that front is, and we should say it's Comcast's cable TV business, not its cable broadband internet business, but that would include the cable TV networks CNBC and MSNBC. It's unclear if that would mean a sale of NBCUniversal's news organization in general or just kind of the footprints, I guess, of CNBC and MSNBC and their brands. So it's all kinds of messy, this potential sale, because you would think Comcast is also probably going to want to hang on to Peacock. If they're hanging on to NBC, the broadcast TV network, you would want Peacock because that's the future of this business. But a lot of Peacock programming is Bravo, it's E!, as well as CNBC and MSNBC. And so how do you divest these cable TV networks but then maintain enough programming for Peacock to be successful?
Kamika McCoy
Yeah, you've got that funky juxtaposition of legacy media companies having come into the streaming space and now trying to divvy things up and make good on that bet. And I think we're starting to see that shake out here.
Tim Peterson
Yeah, yeah. I mean, we also saw it, and I think we had this conversation over the summer, when both Paramount and Warner Bros. Discovery took write-downs on their cable businesses. So Comcast is really just the latest to take a look at its cable business and be like, this isn't a great business, maybe we want to get out of it. That said, Disney had dangled a similar possibility last year, when Disney said maybe we want to reevaluate our cable TV portfolio and see if we sell off some channels and all of that. But nothing has ended up coming of that. And I think it was Matt Belloni from Puck who reported that a lot of that was because ABC is so dependent in some ways on the programming of those networks, and there's also the ABC News side of things to consider. So Disney was in a similar position to Comcast a little over a year ago and ended up not doing anything. So for all we know, Comcast doesn't do anything here, but it seems more and more likely we're going to see at least one of the major cable TV conglomerates sell off its portfolio and/or strike some sort of streaming venture. Like maybe we see some merger of Peacock and Max, or Paramount and Max, or Paramount, Peacock and Max all together.
Kamika McCoy
Then we just have cable TV all over again.
Tim Peterson
Exactly.
Kamika McCoy
Back at square one.
Tim Peterson
Nothing actually changes in this business.
Kamika McCoy
Love that for us. Love that for us. Well, that's all the juicy scoops that we have for you today, folks, but we've got an exciting conversation happening with Dow Jones. Remind us one more time of who our guest is this week.
Tim Peterson
Yeah, it's Ingrid Verschurin. And we talk pretty in depth about how they've been incorporating generative AI technologies, especially into their news content, and how they've been evaluating large language models and switching large language models. So for anyone who's interested in a technical look at generative AI in newsrooms, this should hopefully be right up your alley.
Kamika McCoy
All right, well, with no further ado, let's get to it.
Tim Peterson
Please take a seat.
Ingrid Verscherin
Thank you.
Tim Peterson
So, AI. Like I mentioned, we kind of haven't been talking as much about AI during the summit.
Ingrid Verscherin
I know.
Tim Peterson
As much as I had expected coming in. I figured, coming off the Perplexity news, everything going on with OpenAI, the reports of Meta developing an AI search engine this week, it just seemed like AI was primed for the conversation. But I think maybe we were just waiting for this session.
Ingrid Verscherin
I hope so. Happy to be here and happy to talk about AI.
Tim Peterson
Yeah. And so on that front, a lot of newsrooms and news organizations have been figuring out, okay, how do we incorporate generative AI into our operations? And I feel like everyone's at some different level, from "Okay, we're still experimenting, we haven't quite implemented it yet" to "No, we've very much implemented it." Where's Dow Jones on that spectrum?
Ingrid Verscherin
I think maybe a good thing to do is give you an example of one of the things that we did a couple of weeks ago, and it's something called Joanna Bot. Joanna Stern is one of our tech columnists, and every year she writes a review of the new iPhone. This year she thought, you know what, it's better to give readers a more interactive experience and allow them to ask very specific questions. So what we did was develop an interactive chatbot. And in order to make sure that we were using the right content, we used all of her iPhone review columns from the last 10 years as input. We used technical documentation as input as well. And then based on that, readers could start asking questions. What it allowed us to do was, one, see what the reaction was from readers, whether they liked it or didn't like it. It also allowed us to see technically how it would work. So we used Google Gemini. And what it showed us is that human oversight was still really important. It also allowed us to get a sense of cost, which I think is interesting as well. So it was a great experience.
Tim Peterson
And I mean, that's a fairly big decision to even just do something like that, especially since, when Joanna Stern publishes a new review, especially one related to an Apple product, a lot of eyes are on it. What was the decision-making process behind doing it and making sure this was something that the publication was willing to commit to?
Ingrid Verscherin
Yeah, so I think there's a couple of things there. One of them was that it was Joanna's personal decision as well. She was the one who came up with the idea, and she really wanted to, one, give readers a different experience and, two, get her hands on the tech to see how it actually worked. But at a broader level, if we forget about this specific experiment, the way that Dow Jones looks at AI, or the way that we govern these decisions, is that we have an AI steering committee. Within that AI steering committee, we have representation from across the business. So it includes the newsroom, but it also includes representation from the commercial side and, not surprisingly, from legal and from technology as well. And the function of this cross-functional steering committee is really to ensure that whatever we do with gen AI fits with our core principles when it comes to gen AI. So we look at every single request that we get and then decide: Can we actually accept the risk? Can we accept it, but we need to mitigate? If we need to mitigate, what are the guardrails? And then we make sure that ultimately the people from whichever department the request comes from are the ones who then execute on it. We don't get involved in the execution.
Tim Peterson
Okay, that steering committee, when was that formed?
Ingrid Verscherin
I think it was formed roughly 18 months ago. Dow Jones had been using AI for many, many years, so it wasn't that AI was suddenly something new. But I think we felt that with gen AI, we really needed to make sure that we protect our content as a publisher, because it's so much easier to take content out of context. So we just really wanted to make sure that we had a much better grasp on that.
Tim Peterson
Okay, and roughly how many people are part of this committee?
Ingrid Verscherin
I would say 10, roughly.
Tim Peterson
And has that grown in the 18 months?
Ingrid Verscherin
No, we've actually been really strict about that, because we want to make decisions. I think it is tempting to expand it to, like, 20 or 30 people. But what we did really well is select the right people for the committee. So what we found is that we actually don't need to add people, because we have the right representation from across Dow Jones.
Tim Peterson
Okay, and how did you determine that? Were there specific criteria used to evaluate who should be part of this, or who from specific teams should be the ones on the committee?
Ingrid Verscherin
Yeah, I think we knew the functions, first of all, so we wanted to make sure that each function had representation. And then it was just internal discussions to see who was the right person to represent that function or that department.
Tim Peterson
How often does the committee meet?
Ingrid Verscherin
Every two weeks.
Tim Peterson
Okay. And are these hours-long meetings, or is there a very set agenda each time?
Ingrid Verscherin
It's very set agendas. So the way that the committee works is we look at internal use cases, but we also look at external use cases. So if we get requests from partners who want to use our content, then we want to make sure that, again, they follow the guardrails and the things that we feel are important.
Tim Peterson
Okay. And how does what gets discussed in the committee get shared across the wider organization?
Ingrid Verscherin
I think it depends. What we are trying to do is make sure that everyone within the organization really understands what our core principles are when it comes to gen AI. The first one, and I think this is no surprise to anyone in the room, is that we are a publisher, right? Of all the principles, this is our core principle. We want to make sure that we can continue doing really good journalism. And as a result of that, we want to make sure that our content is being used in a transparent way and that we are being fairly compensated for the use of our content. So that is the first core principle. The second principle is that we act as an arbiter for other publishers as well. We do that, one, by talking about how we act as a publisher, so this is a good example. But we also do it through our B2B business, whereby we have licensing agreements with many publishers. And again, if we get compensated fairly for our content, we want all of our publishers to get compensated fairly as well. And then thirdly, we want to innovate, because ultimately, we do believe that AI is going to help our business. We see it as an augmentation of journalism. So we want to make sure that we continue to innovate using AI or gen AI.
Tim Peterson
And on that front, does the steering committee set rules regarding how the newsroom uses AI or what limitations there would be, or are there conversations in which rules are created, maybe in collaboration with the newsroom?
Ingrid Verscherin
Yeah, and the newsroom is part of the committee as well. That was a very conscious decision, because we want to make sure that they have a say in how they want to use AI. I think a good example of where we saw that collaboration is automated translation, which is one of the other places where we are using gen AI. We have a newswires business, which is a financial news business. And one of the things that we had been playing with, and that was part of the newsroom, was traditional machine translation. That was always then reviewed by editors, making sure that those translations were correct. The rise of gen AI gave us an opportunity to see how large language models could help improve the translations. And one of the languages that we focused on first was Korean. What it allowed us to do was, one, reach a broader market, because we didn't have any Korean-language news service. It also allowed our readers or our clients to have access to more timely information in their own language. Before we started that process, one of the decisions we needed to make was whether we were going to allow others to translate our content or whether we wanted to keep control over the translations. That was discussed in the AI steering committee. And we reached the decision that ultimately the translation is still a representation of our content. Readers put a lot of trust in the content that we provide, and we want to make sure that we continue to deliver this trusted content that is highly accurate. We want to make sure that we follow the ethical standards. Hence, we made the decision that we want to control the translations, because the translation is as important as the original content is.
Tim Peterson
Okay. And how did you go about controlling the translation? Is it that you have the AI process the translations, and then there's a check before they actually go out on the newswire service?
Ingrid Verscherin
Yeah. So in this particular case, we had very clear quality process principles, and there were a couple of different areas that we focused on. One was using glossaries. Even though the models are doing a job that is so much better, specifically compared to traditional machine translation, we still ran into issues with specific terms, specific words. So we created a glossary that focuses on that, specifically when it comes to company names, for example. And as Korean uses a different script than English, that was really important. So part of how we controlled the quality was by creating those glossaries. The second piece was to do a lot of testing. We actually had two separate teams doing the testing. We have a very large research team within Dow Jones who are all linguists, so they were the first round of testers, because they wanted to make sure that from a linguistic point of view, everything that was translated made sense. And then the second layer of testing was the newsroom, because sometimes what we found was that while, strictly speaking, the translation was correct, from a newsroom perspective we could improve upon it. So we have two layers of quality control, two layers of testing. And then thirdly, what we also found was that while we got to a point where in Korean we can actually auto-translate, we still have oversight from the newsroom to make sure that what goes out is correct. That control is also there to ensure that the prompts that we are using are still working correctly. Because what we noticed during testing is that if you make even the slightest change to your prompt, suddenly your whole translation is off again. And you're like, why?
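The glossary check described here could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Dow Jones's actual pipeline: the `GlossaryEntry` type, the `find_glossary_violations` helper, and the sample glossary entries are all hypothetical names and data introduced for the example. The idea is simply to flag a draft translation whenever a glossary term appears in the source but its required rendering is missing from the output.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GlossaryEntry:
    source_term: str  # term as it appears in the English source text
    target_term: str  # rendering the translation is required to use


def find_glossary_violations(
    source: str, translation: str, glossary: List[GlossaryEntry]
) -> List[GlossaryEntry]:
    """Return glossary entries whose source term occurs in the source text
    but whose required target term is absent from the translation."""
    source_lower = source.lower()
    return [
        entry
        for entry in glossary
        if entry.source_term.lower() in source_lower
        and entry.target_term not in translation
    ]


# Hypothetical glossary entries, for illustration only.
GLOSSARY = [
    GlossaryEntry("Dow Jones", "다우존스"),
    GlossaryEntry("Federal Reserve", "연방준비제도"),
]

source_text = "The Dow Jones index rose after the Federal Reserve decision."
draft_translation = "다우존스 지수가 상승했다."  # omits the Federal Reserve term

violations = find_glossary_violations(source_text, draft_translation, GLOSSARY)
for entry in violations:
    print(f"Missing required term for '{entry.source_term}': {entry.target_term}")
```

A check like this would only catch missing terminology; the linguistic and newsroom review layers mentioned in the conversation would still be needed for everything else.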
Tim Peterson
Oh geez. And then, with the newsroom sometimes having to make tweaks and adjustments to what is output by the AI, does that then get cycled back to the AI so that it's able to learn from those changes?
Ingrid Verscherin
Yes and no. It depends a little bit on what it is that needs to be corrected. Again, if it is something that we can fix with glossaries, that's a very controlled environment, so we can add new terms to the glossary. If it is something that is a one-time error, we look at it from a trend perspective. And if we see that the error happens more than once, we would make sure that we adjust the prompts.
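The triage logic in this answer (glossary fix, watch for a trend, or adjust the prompt) can be written down as a small decision function. The sketch below is an assumption-laden illustration: the `Action` enum, the `triage` function, the error-kind labels, and the recurrence threshold of two are all hypothetical and not taken from the conversation.

```python
from collections import Counter
from enum import Enum


class Action(Enum):
    ADD_TO_GLOSSARY = "add_to_glossary"  # terminology issue, fix in the glossary
    MONITOR = "monitor"                  # one-off error, keep watching the trend
    REVISE_PROMPT = "revise_prompt"      # recurring error, adjust the prompt


def triage(
    error_kind: str,
    is_terminology: bool,
    history: Counter,
    recurrence_threshold: int = 2,  # assumed cutoff for "more than once"
) -> Action:
    """Record one reported error and decide how to respond to it."""
    history[error_kind] += 1
    if is_terminology:
        return Action.ADD_TO_GLOSSARY
    if history[error_kind] >= recurrence_threshold:
        return Action.REVISE_PROMPT
    return Action.MONITOR


history: Counter = Counter()
first = triage("wrong_tense", is_terminology=False, history=history)
second = triage("wrong_tense", is_terminology=False, history=history)
term = triage("company_name", is_terminology=True, history=history)
print(first, second, term)
```

Keeping the error counts in a shared `Counter` is one simple way to make the "trend perspective" explicit; a real system would presumably persist them somewhere.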
Tim Peterson
Okay, got it. And you mentioned earlier that Joanna Bot was using the Google Gemini model. Is that also the model that was used for the newswire translation?
Ingrid Verscherin
We started with that, but we've actually moved on to different models. And what is amazing, and I don't know how many of you have experimented with translations using LLMs, is the speed at which they continue to improve. Especially if you're a linguist. Twenty-five years ago, actually more than that, when I studied Spanish and Portuguese at university, I would have never, ever expected to see a machine doing such a good job.
Tim Peterson
Okay, got it. And so what was it that you saw that led to "We need to change up"? Because it sounds like you didn't just change from the Google Gemini model to another model, but to multiple other models.
Ingrid Verscherin
Yeah. To be very clear, Korean is the only language that is live, but we are experimenting with other languages as well. And what we are seeing is that, depending on the model, some models will battle with different languages. So it's not one size fits all. We test all the models and then we pick the one that we think works best with the language.
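Picking the best-scoring model per language, as described here, amounts to a small lookup over evaluation results. The following is a minimal sketch; the model names, language codes, and scores are made-up placeholders, and `pick_model_per_language` is a hypothetical helper, not something named in the conversation.

```python
from typing import Dict


def pick_model_per_language(scores: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Map each language to the model with the highest quality score.

    `scores` has the shape {language: {model_name: quality_score}}.
    Languages with no evaluated models are skipped.
    """
    return {
        language: max(by_model, key=by_model.get)
        for language, by_model in scores.items()
        if by_model
    }


# Placeholder evaluation results, for illustration only.
EVAL_SCORES = {
    "ko": {"model_a": 0.81, "model_b": 0.88},
    "de": {"model_a": 0.93, "model_b": 0.90},
}

best = pick_model_per_language(EVAL_SCORES)
print(best)
```

How the quality scores are produced (linguist review, newsroom review, automated metrics) is outside the scope of this sketch.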
Tim Peterson
Okay. And that process of going from one model to a different model, it doesn't seem like that's an easy hot swap.
Ingrid Verscherin
It is not, because you almost have to start all over again, from scratch. You can use the experience, of course, because you've learned a lot. But it's not a lift and shift. So you basically have to start from scratch and then do the testing again, making sure that it still works before you actually make the switch.
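The "do the testing again before you make the switch" step can be thought of as a regression gate: run the candidate model over a fixed test set and only switch if its average score clears a bar. The sketch below assumes a toy exact-match scorer and stub "models" implemented as dictionary lookups; `passes_regression_gate`, `exact_match`, and the threshold value are hypothetical and chosen only to keep the example runnable.

```python
from statistics import mean
from typing import Callable, Sequence, Tuple


def passes_regression_gate(
    translate: Callable[[str], str],
    score: Callable[[str, str], float],
    test_set: Sequence[Tuple[str, str]],
    min_mean_score: float,
) -> bool:
    """Return True if the candidate's mean score on the test set meets the bar."""
    if not test_set:
        raise ValueError("test_set must not be empty")
    scores = [score(translate(source), reference) for source, reference in test_set]
    return mean(scores) >= min_mean_score


def exact_match(candidate: str, reference: str) -> float:
    """Toy scorer: 1.0 on an exact match, 0.0 otherwise."""
    return 1.0 if candidate.strip() == reference.strip() else 0.0


# Tiny placeholder test set of (source, reference translation) pairs.
TEST_SET = [("hello", "hola"), ("thank you", "gracias")]


def candidate_model(text: str) -> str:
    # Stub standing in for a call to a new translation model.
    return {"hello": "hola", "thank you": "gracias"}.get(text, "")


def weaker_model(text: str) -> str:
    # Stub that gets only one of the two test sentences right.
    return {"hello": "hola"}.get(text, "")


print(passes_regression_gate(candidate_model, exact_match, TEST_SET, 0.9))
print(passes_regression_gate(weaker_model, exact_match, TEST_SET, 0.9))
```

The same gate could be rerun after any prompt change, since the conversation notes that even small prompt edits can throw translations off.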
Tim Peterson
Okay. So what does that timeline look like? From "Okay, we're using the Gemini model" to "Oh, maybe we need to switch to a different model. What are the other models out there?" to "Okay, we found the other models we want to switch to," and then there's a transition period, I would think. Can you walk me through how long each of those steps took?
Ingrid Verscherin
Yeah. And I think this goes back to what I was saying earlier about how you learn from the existing experience. We actually started off with traditional machine translation. From when we started looking at that until we went live with Korean took roughly a year, because you have to get it right. We wanted to make sure that when we looked at the quality scores, we were getting to a level that the newsroom felt confident signing off on. So that took around a year. Then switching from one model to the next took maybe, I'm going to say, six months, but I'm putting an asterisk there because I'm not exactly sure how long it took. But it took a relatively much shorter time.
Ingrid Verscherin
Okay, and is that something where eventually, the more familiar you are with this technology and the better the technology gets, that switching time would shorten further? Like the next time it could be three months instead. Or do you think six months is roughly how long, at least for the foreseeable future, it would take to switch models?
Joanna Sturm
Yeah, that's a good question. I don't know, to be honest. What I can share is that we didn't necessarily pick the easiest language to begin with. If we had focused on German or French, that would have actually been a better choice. So not only does the model influence the timeline, but the language itself actually has an impact on that as well.
Ingrid Verscherin
Okay, got it. So with that, you mentioned in switching models, you basically started from a blank page. Having gone through that process, is there anything you're able to take from it so that the next time there may be a model switch, you're not having to start quite with so blank a page?
Joanna Sturm
Yeah, that's a good question. I don't know, actually.
Ingrid Verscherin
So I guess we'll have to talk next time.
Joanna Sturm
I'll come back after six months and we can talk about it. No, I think this goes back to what we were saying earlier, that ultimately the business unit that runs the project, they know all the ins and outs and they know all the technical details. This one in particular, because the team that I manage at Dow Jones did a lot of the initial testing, which is why I know a little bit more about this one in particular.
Tim Peterson
Got it.
Ingrid Verscherin
And to what extent are there opportunities for cross-departmental learnings with the different models? Because this is a very specific use case of translation. So some models may be better for translation, but the next time there's a Joanna Bot, I don't know how helpful it would be to know these models are good at translation, because the next Joanna Bot may not need those models. Or it could be something where it's helpful for the different departments that have more familiarity with given models to be sharing that so that models are evaluated. I'm going to stop saying models because the word's starting to lose all meaning.
Joanna Sturm
I will try to avoid saying that as well. What I think is interesting, and this is another role of the AI steering committee, right? Because ultimately what it means is that the committee has access to all of the different use cases. So when we get a request for a different use case, we can actually say, oh, have you spoken to this department? Because they are using this. Have you spoken to that department? Because they are using that. The other thing that we are really, really trying to do and are very conscious of: we want to make sure that whatever we develop can be applied in multiple instances. So when you think about a chatbot experience, I think the experience of the Joanna Bot feeds into that. But ultimately we want to make sure that no matter what the business unit did, we can apply the foundation of that chatbot in different instances. So that's kind of how we look at some of the applications.
Ingrid Verscherin
Okay. And then staying on top of it, because these models are being updated. Google's, I think, reportedly going to have a big update in December. Meta with Llama, they're always updating. OpenAI, down the list. Everyone's always updating their models. That term again. Is there any kind of secret sauce to staying on top of how the models are being updated? Or is it just reading the coverage? Reading, I'm sure, Dow Jones, Wall Street Journal's own coverage of these models, reading documentation? Or is it a lot of meetings or evaluating beyond just what everyone else may have expected it to be?
Joanna Sturm
All of the above, I would say.
Ingrid Verscherin
Are they coming in? Like, are any of the AI companies coming in and pitching, hey, here's our latest model. Do you want to use it?
Joanna Sturm
Yes, but I think one of the things that is important is, again, we don't look at AI as a threat. We actually do see it as an opportunity. So ensuring that everybody understands what an LLM is, how a RAG model works, how everything works, is important. And then we have specific teams that focus more on the technological aspects of each model.
Ingrid Verscherin
Just because you mentioned the term RAG, retrieval-augmented generation, for anyone who doesn't know: I'm working on an explainer video about RAG. So since you bring it up now, it's become a pet obsession of mine.
Tim Peterson
Lately because I'm trying to learn about it.
Ingrid Verscherin
So for anyone in the audience who doesn't know, and correct me if I'm completely butchering this, retrieval-augmented generation is basically a process for a publisher like Dow Jones to take a proprietary database and make it available to an LLM, while also providing the LLM with the right context to be evaluating that data. So if there is a chatbot like Joanna Bot, a prompt goes to the LLM, and then the LLM knows how to find that information within the database to then spit out the correct answer. RAG seems to be fundamental to everything, or to many of the things, going on in terms of chatbots or just making information from publishers available through these LLMs. Is there anything that's important, especially for me as I'm working on this explainer video, to know about the RAG process or how Dow Jones has incorporated it?
Joanna Sturm
I think I'm going to use two learnings from the Joanna Bot that hopefully help answer that question. The first one was that it was a very controlled content set that we used, because we used all of our columns and we used technical documentation. So it wasn't that we went out to the Internet and used everything available. That was what was going into the RAG. That is the content that we used for the answers. Even so, we did see hallucinations. And what I will add is that it was definitely people who were trying to break the bot. Right. So it wasn't with the first question, but it was something that did happen. So even with a very controlled content set, that might happen. The second thing is really thinking about what content is needed. So one of the things that Joanna found out as she was working on this project: it's not cheap, right? So the more content you have that you use, the more expensive it.
Ingrid Verscherin
How much does it cost?
Joanna Sturm
That I don't know.
Tim Peterson
Okay.
Joanna Sturm
But actually I think she has written about it, but I just don't remember the answer. But it was not cheap and this was a very small content set. So the larger your content set, the higher your costs are going to be. But what it allows you to do is control and make sure that it uses your content when answering questions.
Tim Peterson
Okay.
Ingrid Verscherin
And I imagine that can then play into some of these broader deals going on between publishers and AI companies. Because we had our publishing summit in the US last month, and when Mark Howard from Time was on stage, he was talking about the structure of these deals: that it's not just licensing payments, it's not just rev share, that these AI companies also want publishers to be using the LLMs, using the technology, because that's helpful training for them. And so as you all are evaluating these models and incorporating this, and as the broader News Corporation is doing these AI deals, to what extent does the use of the model figure into the deals, or the deals figure into the evaluation of the model, for the various uses like we've been talking about?
Joanna Sturm
I think, without going into too many details about the OpenAI deal, because I think that's the one that you're referring to, certainly not for flexib. What I think was really important to us, and what was really, really important to News Corp, is the value of the content. Right? We want to be absolutely sure that we get fairly compensated for the content. We also want to be sure that it's very transparent, both to us how our content is being used and similarly to the users, that they know where the content is coming from. So the focus of that deal was really all about the content and the value it brings to OpenAI.
Ingrid Verscherin
Okay, we're way over time I think at this point, but that's what happens when you talk about AI. So Ingrid, thanks so much.
Joanna Sturm
Thank you very much for having me.
Tim Peterson
Thanks for listening to this episode of the Digiday Podcast. If you enjoyed it, please leave us a rating and a review on Apple Podcasts, Spotify or wherever you're listening. Get more from Digiday with our daily newsletter sent out each weekday morning. Visit digiday.com/newsletters to sign up.
The Digiday Podcast: Inside Dow Jones’s AI Governance Strategy with Ingrid Verschuren
Release Date: November 5, 2024
Introduction
In this episode of The Digiday Podcast, hosts Kamika McCoy and Tim Peterson explore the evolving landscape of digital publishing, focusing on the significant challenges and innovative strategies that media organizations are adopting. The highlight of the episode is an in-depth conversation with Ingrid Verschuren from Dow Jones, who provides valuable insights into the company's AI governance strategy and its implementation within the newsroom.
Pre-Interview Discussions: Navigating the Digital and Political Landscape
Election Day Hustle and Advertising Overload
The episode begins with a candid discussion about the relentless surge of political advertisements during the election cycle, especially in swing states like Georgia. Kamika McCoy shares her experience, saying, “my mailbox has not seen anything less than 5 ads and flyers a day” (01:33). This inundation of political messaging underscores the massive financial investments pouring into influencing undecided voters.
Publishing Summit Insights: Site Traffic Challenges
Tim Peterson reflects on his recent participation in the Digiday Publishing Summit, highlighting a critical pain point for publishers: declining site traffic. “Site traffic was the most common, most prevalent challenge across the publishers” (03:12), Peterson notes. Publishers are struggling to attract visitors to their websites, largely due to diminishing referrals from search engines like Google. He cites Martin Liddell from Reach PLC, who reported a 25% year-over-year drop in Google search referrals (05:02).
The Emergence of AI-Powered Search Engines
The conversation shifts to the advent of AI-driven search technologies. With OpenAI integrating search capabilities into ChatGPT and Meta developing its own AI-powered search engine, there's potential for diversifying traffic sources away from traditional platforms. Peterson observes, “ChatGPT having search... includes links to sources” (05:52), suggesting that these AI tools could redirect significant traffic back to publishers’ sites.
Corporate Strategy Shifts: Comcast’s Potential Cable Spin-Off
Another hot topic is Comcast's consideration of spinning off its cable business, which includes major news outlets like CNBC and MSNBC. This move reflects a broader trend of legacy media companies reassessing their positions in the streaming era. Peterson explains, “It's unclear if that would mean a sale of NBCUniversal's news organization in general or just the footprints of CNBC and MSNBC” (15:05), highlighting the complexities involved in such strategic decisions.
Interview with Ingrid Verschuren: Dow Jones’s Strategic Approach to AI Governance
Establishing a Robust AI Governance Framework
Ingrid Verschuren provides an overview of Dow Jones’s proactive stance on AI integration. Recognizing the transformative potential of generative AI, Dow Jones established an AI steering committee approximately 18 months ago. “We have representation from across the business... newsroom, commercial, legal, technology” (23:06), Verschuren explains. The committee’s primary role is to ensure that all AI initiatives align with Dow Jones’s core principles: protecting content integrity, ensuring fair compensation, and fostering innovation.
Implementing Generative AI in the Newsroom
Joanna Bot: Enhancing Reader Engagement
One of the standout projects discussed is the development of "Joanna Bot," an interactive chatbot designed to offer readers a more engaging way to interact with tech reviews. Leveraging Google Gemini, the bot allows readers to pose specific questions about iPhones based on Joanna Sturm’s extensive columns and technical documentation. “It allowed us to see technically how it would work” (19:36), Verschuren notes, emphasizing the importance of human oversight to manage inaccuracies or “hallucinations” in AI-generated responses.
Automated Translation: Expanding Global Reach
Another significant application of AI at Dow Jones is automated translation, particularly into Korean. This initiative aims to broaden Dow Jones’s global footprint, allowing non-English speaking audiences timely access to news content. The process involves stringent quality controls, including the use of specialized glossaries and dual-layer testing by linguists and newsroom editors. “We want to make sure that our content is being used in a transparent way and that we are being fairly compensated for the use of our content” (25:18), Verschuren reiterates the company’s commitment to maintaining content integrity.
Challenges in AI Implementation
Navigating Model Switching and Technical Hurdles
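A key challenge highlighted is the complexity of switching AI models. Unlike a simple upgrade, transitioning from one model to another requires restarting the testing process, although lessons from the first round carry over. “It's not a lift and shift. So you have to start from scratch, basically, and then do the testing again” (33:52), Verschuren explains. By her own hedged estimate, the switch took about six months, compared with roughly a year to take Korean translation live in the first place. She adds that the timeline depends on the language as well as the model, and that Korean was not the easiest language to start with.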
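The per-language evaluation Verschuren describes in the interview (test every candidate model, then keep the one that works best for that language, provided the quality scores reach a level the newsroom will sign off on) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the model names, the scores, the 0-to-1 scale and the sign-off threshold are placeholders, not figures or tooling reported in the episode.

```python
from typing import Dict, Optional

# Hypothetical quality scores (0 to 1) per language and candidate model.
# Names and numbers are illustrative placeholders only.
SCORES: Dict[str, Dict[str, float]] = {
    "ko": {"model_a": 0.81, "model_b": 0.88},
    "de": {"model_a": 0.93, "model_b": 0.90},
}


def pick_model(scores_for_language: Dict[str, float],
               sign_off_threshold: float) -> Optional[str]:
    """Return the best-scoring model, or None if none clears the threshold."""
    if not scores_for_language:
        return None
    best = max(scores_for_language, key=scores_for_language.get)
    if scores_for_language[best] < sign_off_threshold:
        return None  # no candidate is good enough to sign off on yet
    return best


# Evaluate each language independently: the winner can differ by language.
choices = {lang: pick_model(scores, 0.85) for lang, scores in SCORES.items()}
print(choices)
```

Because each language is scored on its own, switching models for one language does not force a switch for the others, though, as the interview makes clear, the testing behind each score still has to be redone for every new model.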
Cost Implications of AI Projects
Implementing AI initiatives like Joanna Bot is resource-intensive. The costs escalate with the expansion of content sets, making scalability a financial challenge. “The more content you have that you use, the more expensive your costs are going to be” (42:15), Verschuren notes, highlighting the need for cost-effective strategies in future AI deployments.
Ensuring Ethical Use and Content Integrity
Dow Jones emphasizes ethical AI usage, ensuring that AI applications do not compromise journalistic standards. The AI steering committee plays a pivotal role in overseeing this aspect, ensuring transparency and fair compensation. “We want to make sure that our content is being used in a transparent way and that we are being fairly compensated” (25:18), Verschuren asserts.
Retrieval Augmented Generation (RAG): Enhancing AI Accuracy
The discussion delves into Retrieval Augmented Generation (RAG), a method that gives an AI model access to a publisher's proprietary content so that answers are drawn from that content. Peterson offers a working definition: “Retrieval augmented generation is basically a process for a publisher like Dow Jones to take a proprietary database, make it available to an LLM” (40:17). Verschuren responds with two lessons from Joanna Bot: even a tightly controlled content set of columns and technical documentation still produced hallucinations, mostly when users tried to break the bot, and costs rise with the amount of content included.
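For readers new to RAG, the retrieval step can be illustrated with a toy Python sketch. It is not how Dow Jones built Joanna Bot: the documents below are invented placeholders, the word-overlap cosine score stands in for the embedding-based search a production system would more likely use, and the final call to an LLM is left out entirely.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Hypothetical controlled content set; the text is invented for illustration.
DOCS: Dict[str, str] = {
    "battery-column": "The new phone battery lasts about a day of heavy use.",
    "camera-column": "The camera adds a dedicated button for quick photos.",
    "spec-sheet": "Technical documentation lists storage options and display size.",
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(question: str, docs: Dict[str, str], k: int = 1) -> List[str]:
    """Return the ids of the k documents most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(docs,
                    key=lambda d: cosine(q, Counter(tokenize(docs[d]))),
                    reverse=True)
    return ranked[:k]


def build_prompt(question: str, docs: Dict[str, str], k: int = 1) -> str:
    """Place the retrieved text in the prompt so the model answers from it."""
    context = "\n".join(docs[d] for d in retrieve(question, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


print(build_prompt("How long does the battery last?", DOCS))
```

Restricting the prompt to retrieved passages is what lets a publisher steer answers toward its own content, but as Verschuren notes, it does not rule out hallucinations, and indexing more content costs more.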
Future Directions and Cross-Departmental Collaboration
Verschuren highlights the importance of cross-departmental collaboration facilitated by the AI steering committee. By sharing insights and best practices across various projects, Dow Jones aims to streamline AI integration and foster a culture of continuous innovation. “We are trying to do is we are trying to make sure that everyone within the organization really understands what our core principles are when it comes to Gen AI” (25:18), she states.
Conclusion
The episode concludes by underscoring the delicate balance between leveraging AI for innovation and maintaining stringent content standards. Dow Jones’s structured approach, led by the AI steering committee, serves as a benchmark for other publishers navigating the complexities of AI integration. The conversation with Ingrid Verschuren provides a comprehensive blueprint for ethical and effective AI governance in the media industry.
For more insights and detailed coverage, visit Digiday.