

https://tagging.tech/wp-content/uploads/2018/01/jason-chicola.mp3
Listen and subscribe to Tagging.tech on Apple Podcasts, AudioBoom, CastBox, Google Play, RadioPublic or TuneIn.
Transcript:
Henrik de Gyor: This is Tagging.tech. I’m Henrik de Gyor. Today I’m speaking with Jason Chicola. Jason, how are you?
Jason Chicola: Doing great, Henrik. Thanks a lot for taking the time.
Henrik de Gyor: Jason, who are you and what do you do?
Jason Chicola: I’m the founder of Rev.com. Rev is building the world’s largest platform for work-from-home jobs. Our mission is to create millions of work-from-home jobs. Today, we have people working on five types of work. Jobs they could do in their pajamas. And the main ones are audio transcription and closed captioning. Several of my co-founders and I were early employees at Upwork, which is the largest marketplace for work-at-home jobs. Rev takes a different approach than Upwork. With Rev, we guarantee quality, which means that the task of managing a remote freelancer and hiring the right one is something that our platform excels at. And so what that means is our customers have a very easy-to-consume service. You can think of it… you can think of us as Uber for work-at-home jobs. So if you wanted to come to us to get, say, this call transcribed, as a customer all you have to do is upload an audio file to a website and then a couple of hours later, you’ll get back the transcript. Now behind the scenes, there is an army of freelancers that are doing that work, and we have built our technology to make their lives easier and make them productive. If I zoom out from all of this, I look at the world and see a lot of people who are sitting in cubicles who probably shouldn’t have to, or sitting in traffic who shouldn’t have to, and I look at all the kinds of jobs people do today at a computer. How many of those jobs need to be done in a cube farm? How many of them could be done from home?
We think many of them can be done from home. And our mission is to give more people the opportunity and the freedom to work from home, which allows them not only to choose the location but also gives people more control over their lives, because they can decide whether they want to work in the morning versus early afternoon. It means you’re not tied to a single boss or employer. It means that you can work on one skill on Monday and a different skill on Tuesday and go surfing or hiking on Wednesday, if you feel like it. So that’s how we think about our business: it is really centered on giving people this freedom that comes when they can be their own boss and work from home. And as a segue to some of your next questions that you and I discussed in the past, as we got deeper and deeper into creating jobs for transcriptionists, we have invested in technology to make their jobs easier, to make them more productive. And that has led us to develop some competency and familiarity with what you’re calling here AI transcription, which means using a computer to transcribe audio. That is what I’d call a relatively new area for us, an important area, especially in light of people being familiar with Amazon Alexa and Apple’s Siri. So that’s a new small business, but the core is giving people work they can do with a computer. Most of that work today is listening and typing.
Henrik de Gyor: Jason, what are the biggest challenges and successes you’ve seen with AI transcription?
Jason Chicola: It’s really early to judge that. I can give you a specific example in a moment. But it’s a little bit like asking someone today what are the biggest challenges and successes of self-driving cars. The answer, I think, is that the business successes so far have been small, but the possible successes in the future could be massive. I really believe that we’re truly… we’re not even in the first inning.
Maybe we’re in the warm-up for the first inning of this game, and I think it is going to be a pretty exciting decade ahead of us as computers have gotten better, as more audio is captured in digital formats, and as companies like Rev are innovating in a bunch of areas. Our success today in this area has been… we had success, but it’s been at the fringes of our business, so I’ll give you a specific example. When the Rev transcriptionists type out an audio file, like somebody might for this phone call, some customers request timestamps, and part of the humans’ job is to go in and note, for example, at the end of every minute: this event occurred at three minutes, this event occurred at four minutes, and so forth. That was an additional task they performed manually while they did their job. We automated that using what you could call AI transcription. So now not only are timestamps inserted automatically, but every single word is marked by the AI with when it was said. So literally for every single word, we know this word occurred at 4:38 and that word occurred at 5:02. So that’s something that we’ve done that automated something previously done manually, and it actually made it a much better experience for the customer because the timestamps are more accurate. That’s something we already have today. The challenge… the challenge list is longer. The biggest challenge to be aware of when it comes to automated transcription is that it’s garbage in, garbage out. Other people say you can’t make chicken salad out of chicken [****]. If you go to Starbucks and you sit outside by a noisy street and you record an interview with someone who you’re talking to for a book and you submit it to some automated engine, you’re not going to get back anything that is very good. I mean, it’s obvious why that is, but the quality of speech recognition depends, I would say, on three or four key factors other than the quality of the [speech recognition] engine itself. One is background noise.
The less the better. Another is accents. The less the better. Another is how clearly the person is speaking. Are they enunciating? Are they slurring their words together? Are they speaking really quickly? Those tend to be the major factors. There is probably another one related to background noise, which comes down to the quality of your microphone and how far you are from the microphone. You are a podcaster, so you probably know far more about how to record clear audio than most people do. Most people throw an iPhone onto a table next to somebody else who’s eating a bag of Doritos. [laugh] So you have great audio of someone eating a bag of Doritos, which causes problems downstream, and some of those people, because they don’t think about it, will say “Hey, I’m really annoyed. You didn’t get this word right.” And that’s because somebody was eating a bag of Doritos during the time that word was said. So part of our job… as we try to get better at helping people transcribe quickly and cheaply, part of our job is to help customers understand that you need to record good audio if you want to get a good outcome.
Henrik de Gyor: Jason, as of January 2018, how much of the transcription work is completed by people versus machines?
Jason Chicola: Are you talking about the work that Rev does?
Henrik de Gyor: Sure.
Jason Chicola: Depends on how you slice it, but I’ll say 99% people, 1% machine.
Henrik de Gyor: Fair.
Jason Chicola: We actually have… I’ll be a little more clear on that. We recently released a new service called Temi. Temi.com. That is an automated transcription service where people are not doing the work. Machines are. And then our core service, rev.com, is done basically entirely by people. We believe that that’s required to deliver the right level of accuracy. I don’t want to jump ahead to your next questions, but we clearly see these two blending and merging a little more over time. But today, if you want to get good accuracy, you need people to do it.
If I give you kind of the external context: if an earnings call is being transcribed for Wall Street analysts and a machine does it and makes a mistake on, you know, a key number, or on whether the CFO said that something happened or didn’t happen, that’s a big problem. Or if a mov...
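The word-level timestamping Jason describes can be illustrated with a short sketch. This is not Rev’s implementation; it only assumes that a speech engine returns each word with a start time in seconds, and the function names and sample data below are hypothetical. Given those per-word times, the interval markers that transcriptionists used to type by hand fall out automatically.

```python
from typing import List, Tuple


def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as M:SS, e.g. 278 -> '4:38'."""
    total = int(seconds)
    return f"{total // 60}:{total % 60:02d}"


def insert_interval_markers(words: List[Tuple[str, float]], interval: int = 60) -> str:
    """Insert a [M:SS] marker before the first word spoken in each new interval.

    `words` is a list of (word, start_time_in_seconds) pairs in spoken order.
    The pair format is an assumption made for this sketch.
    """
    pieces = []
    next_mark = interval
    for word, start in words:
        if start >= next_mark:
            # Jump to the most recent interval boundary at or before this word.
            mark = int(start // interval) * interval
            pieces.append(f"[{format_timestamp(mark)}]")
            next_mark = mark + interval
        pieces.append(word)
    return " ".join(pieces)


# Hypothetical word-level output from a speech engine.
sample = [("Thanks", 58.2), ("for", 58.9), ("joining", 59.4), ("us", 60.3), ("today", 61.0)]
print(insert_interval_markers(sample))
```

Because every word carries its own time, the same data could also drive finer-grained features, such as jumping to the audio position of any word a customer clicks on.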

Tagging.tech presents some of the backstory behind the new book titled Keywording Now by Henrik de Gyor. If your business is thinking about keywording services and/or image recognition, this Amazon Kindle book is the first and only book of its kind on this specific topic. It is the only book on Amazon about keywording. https://tagging.tech/wp-content/uploads/2017/08/keywording_now.mp3 Keywording Now: Practical Advice on using Image Recognition and Keywording Services Now available keywordingnow.com

Tagging.tech presents an audio interview with Emily Kolvitz on image recognition
https://tagging.tech/wp-content/uploads/2017/07/emily-kolvitz.mp3
Transcript:
Henrik de Gyor: This is Tagging.tech. I’m Henrik de Gyor. Today, I’m speaking with Emily Kolvitz. Emily, how are you?
Emily Kolvitz: I’m doing great. How are you, Henrik?
Henrik: Good. Emily, who are you and what do you do?
Emily: I’m a DAM consultant, marketer, and digital asset manager for Bynder. We’re an award-winning digital asset management software that allows brands to create, find, and use content such as documents, graphics, and videos. Before joining Bynder, I worked as a digital asset manager for JCPenney. I have an MLIS, my master’s in library and information studies, from the University of Oklahoma. I’ve worked with hundreds of different clients on their DAM implementations, providing best practice and consultation. Because I work with clients, I’m often able to see the very real-world implications of what AI tagging can actually be like with live collections of content. The successes and challenges are very real, very tangible, and that’s not always something that you see when you’re watching a webinar or a product demo.
Henrik: Emily, what are the biggest challenges and successes you’ve seen with image recognition?
Emily: For challenges, of course, there are some challenges and opportunities for improvement when it comes to AI tagging. I think many of them have to do with the application and configuration of the AI, not necessarily the technology itself.
Today, one specific limitation in our own implementation of AI is that we only have US English tags at this time. We wanted to make a claim on the AI space very quickly, so English to start with was part of our MVP for AI features. Obviously, there’s more to come in the future. I think some other limitations include things like only certain file types being scanned, such as JPEG and IMG, so there’s an opportunity to extend this out to things like video, documents, etc. Many other companies are already doing this, companies like Ancestry.com for example, or even DocumentCloud, which scans your documents through Thomson Reuters Open Calais to extract entities, topic codes, events, relations, and social tags. In addition, there’s a full list of AWS limitations on the Rekognition site as well, which is what we use. But in terms of more general things, I think the challenges that need to be considered are things like mistakenly tagging something in a way that’s hurtful or harmful in some manner. Those are things that don’t usually become apparent until after the fact. I think that AI tagging is very much in its infancy in terms of its application and that we’ll see it greatly grow and mature in the coming years, where we may start to see challenges like information and privacy concerns pertaining to facial recognition. Being able to opt out of these things will basically be a big need for clients. As far as successes go, AI tagging detects objects and scenes and can identify thousands of objects such as vehicles, pets, and furniture, and it provides a confidence score for each tag, which simply tells you how confident the AI is that that tag is relevant and accurate. It’ll detect scenes within an image, so things like a sunset or a beach. This has really big implications for search filtering and curating very large image libraries.
From my perspective alone, the time-saving factor for DAM managers, digital asset librarians, content managers, and admins of the system is probably one of the biggest successes for AI tagging. They spend an enormous amount of time and resources on metadata application alone. It’s tedious, thankless work, but absolutely necessary so that people can find the assets they need. In terms of other things, I think it’s also helping to put minimum viable metadata on a very large digital asset collection that may otherwise remain untagged. For DAM, it means that uploaded images get auto-tagged, helping with categorization, identification, and searchability of assets that could otherwise be buried in the depths of your collection without metadata.
Henrik: Emily, as of July 2017, how do you see image recognition changing?
Emily: Becoming a de facto feature of digital asset management systems and less of a fun, nice-to-have feature, like more of a novelty feature. It’s becoming something you have to have.
Henrik: What advice would you like to share with people looking into image recognition?
Emily: This is a good one. If you can, provide a sample of your assets to different vendors and ask for results. It’s very easy to see a webinar or a product video showing 100% accuracy and it’s really neat, but it’s also really important to try out a wide variety of image assets to see where the real limitations are for each image type and the associated algorithms.
Henrik: Where can we find more information?
Emily: There are lots of places on the internet you can find more information about AI tagging. You can find information from us specifically on our blog, blog.bynder.com. Amazon’s Rekognition website has a great FAQ that you can check out. We also did a presentation at the photo metadata conference in Germany, the IPTC Metadata Conference, on image recognition and AI. There’s a PDF and a video available of this presentation on IPTC.org.
Henrik: Great. Well, thanks Emily.
Emily: Thank you, Henrik.
Henrik: For more on this, visit Tagging.tech. Thanks again. For a book about this, visit keywordingnow.com
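Two of Emily’s practical points, the per-tag confidence score and testing vendors on a sample of your own assets, can be sketched together. The snippet below is a minimal illustration, not Bynder’s or Amazon’s code: the label dictionaries, the 80-point threshold, and the human keyword set are all hypothetical sample data. It keeps only tags above a confidence threshold and then scores them against keywords a person applied to the same image.

```python
from typing import Dict, List, Set, Tuple


def filter_tags(labels: List[Dict], min_confidence: float = 80.0) -> Set[str]:
    """Keep the names of labels whose confidence (0-100) meets the threshold."""
    return {
        label["name"].lower()
        for label in labels
        if label["confidence"] >= min_confidence
    }


def precision_recall(predicted: Set[str], expected: Set[str]) -> Tuple[float, float]:
    """Compare auto-tags against human keywords for one image.

    Precision: share of auto-tags a human also applied.
    Recall: share of human keywords the auto-tagger found.
    """
    if not predicted or not expected:
        return 0.0, 0.0
    hits = len(predicted & expected)
    return hits / len(predicted), hits / len(expected)


# Hypothetical vendor output for one sample image.
vendor_labels = [
    {"name": "Beach", "confidence": 98.1},
    {"name": "Sunset", "confidence": 91.4},
    {"name": "Dog", "confidence": 55.0},
    {"name": "Furniture", "confidence": 12.3},
]
# Hypothetical keywords applied by a digital asset librarian.
human_keywords = {"beach", "sunset", "ocean"}

kept = filter_tags(vendor_labels)
precision, recall = precision_recall(kept, human_keywords)
print(sorted(kept), round(precision, 2), round(recall, 2))
```

Running the same comparison over a varied sample of assets, once per vendor, gives a rough side-by-side view of where each service falls short, which a polished product demo will not show.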

Tagging.tech presents an audio interview with Martin Wilson about image recognition.
https://tagging.tech/wp-content/uploads/2017/01/martin_wilson.mp3
Transcript:
Henrik de Gyor: This is Tagging.tech. I’m Henrik de Gyor. Today I’m speaking with Martin Wilson. Martin, how are you?
Martin Wilson: I’m very well, thank you. How are you?
Henrik: Good. Martin, who are you and what do you do?
Martin: I am a director at Asset Bank. Being a director, I’ve done an awful lot of different things over the years. I have done some development on our product, Asset Bank. I’ve done sales and I’ve done consultancy while rolling out the product. Just to explain a little bit about what Asset Bank is as a product, it is a digital asset management solution. Digital asset management is often shortened to DAM. A DAM solution helps clients and their users to organize the digital assets that almost every organization owns and makes use of nowadays. By digital asset, we mean primarily files. Things like images, videos, documents and all of those. A digital asset has an awful lot of value to an organization, and it’s very important that they can find them easily, that they don’t waste money recreating digital assets that they already have, and that the assets themselves are used properly in a way that’s consistent with the brand of the organization.
Henrik: Martin, what are the biggest challenges and successes you’ve seen with image and video recognition?
Martin: Let me first start by saying how I think that image recognition has the potential to have a really big impact on my industry, digital asset management. Digital asset management is all about being able to find images and then use them properly. That’s the purpose of the DAM system.
There’s an old adage which people use, and it says that a DAM system is only as good as the metadata that is associated with the assets. The reason for that is, if you have a million images in any system, it’s almost impossible to find the image you want without some sort of a search and/or a browse function. Those search and browse functions at the moment rely on what we call metadata that is associated with the assets. That metadata is things like the title or caption of an image, a description, perhaps some keywords that have been put in, maybe some information about how the image can be used. The result of this is that people, humans, spend an awful lot of time entering the metadata that is associated with digital assets. Usually, within an organization, the processes, the workflows that are associated with using a DAM application involve uploading one or more or many digital assets, typically images or videos, and then manually entering the data by, for example, looking at the image, seeing what it’s about, what the subject is, who’s in it maybe if it’s of people, and then just actually typing in that data. As you can imagine, that takes a lot of time. It’s also considered quite boring by most people. For that reason, it’s often skipped or not done really well. If it’s not done really well, the data associated with the assets is incomplete and therefore it’s very hard for them to turn up in the right searches. The idea that this process could be automated, with a computer working out what’s in the image and tagging the digital assets appropriately, is enormous. It’s almost like the Holy Grail of the upload process for DAM systems. There was an awful lot of excitement when, for example, Google Cloud Vision came out with their service. It’s what’s called an API, which enables other applications to make use of the image recognition functionality.
There are a lot of other services as well that have come out in the last couple of years. Clarifai is another one. When they came out, lots of DAM vendors got very excited and rushed to add the functionality into their own applications. We did the same. About a year ago we started a project with the objective of developing a component that could be used with Asset Bank in order to add auto-tagging capabilities to Asset Bank. Let me just describe some of the challenges that we found in doing that and, when we rolled it out to some of our clients, the challenges they found. One of the challenges, which I suppose is always like an umbrella challenge over all of it, is people’s expectations. Humans are very good at looking at images and working out what’s in them. They’ve also got a lot of domain knowledge. Usually, they understand, for example, their products. They can look at a product shot and say, “Yeah, that’s product F-567”, or whatever the code is. It’s actually very hard for computers to do that well. That problem hasn’t been solved that well yet. What we found is, when compared with how humans tag images, the results coming from the auto-tagging software or APIs were, to be frank, not of good enough quality for most cases. That’s the second specific challenge then, really. The quality of the raw results coming back from the software. The visual recognition software was not quite good enough for use in most organizations, especially in a commercial sense. That’s not to say that it’s not useful. I’ll come on to that in a bit. On to the successes: what we found was that for certain clients who had more generic or general images, the results were much better. We’ve got some clients who are tourist boards. They’ve got images of landscapes and scenery. Most of the image recognition software is quite good at finding the subjects and suggesting keywords for those types of images.
One of the reasons for that is that most of them have been trained on image data sets that are images found on the internet, for example. Of course they’re going to be generic. At the other end of the spectrum, where we found it didn’t work that well was for clients that have got quite bespoke business domains or subject domains, images of their own product range. It is very hard for these fairly generic image recognition software APIs to be able to come up with the right keywords for those sorts of images. That’s possibly where there are still gaps. That might be something we’ll talk about in a minute about the future, which is the inability for a lot of this tagging software to learn from bespoke data sets.
Henrik: Martin, as of December 2016, how do you see image and video recognition changing?
Martin: I think it’s fair to say that it’s in its infancy at the moment. It’s only since it’s become available through the online cloud services or web services that people have found it very easy to start using this technology in their own applications. It’s only been in the last couple of years that it really has kind of taken off as something that can be openly or easily used. Now I think the vendors of this sort of software are learning very quickly from real use cases. I think it’s quite an exciting time for where the commercial or non-commercial application of this software can go. I think if we first focus a little bit more on the current problems, that gives some insight into where the software might go, what direction it might go in. I was just...

Tagging.tech presents an audio interview with Jonas Dahl about image recognition
https://tagging.tech/wp-content/uploads/2016/12/jonas_dahl.mp3
Transcript:
Henrik de Gyor: This is Tagging.tech. I’m Henrik de Gyor. Today I’m speaking with Jonas Dahl. Jonas, how are you?
Jonas Dahl: Good. How are you?
Henrik: Good. Jonas, who are you and what do you do?
Jonas: Yeah, so I’m a product manager with Adobe Experience Manager. And I primarily look after our machine learning and big data features across all AEM products, so basically working with deep learning, graph-based methods, NLP, etc.
Henrik: Jonas, what are the biggest challenges and successes you’ve seen with image recognition?
Jonas: Yes. Well, deep learning is basically what happened, what defines before and after. So, basically in 2012, there was a confluence of the data piece that is primarily enabled by the Internet, large amounts of well-labeled images that could drive these huge deep learning networks. There’s the deep learning technology and, obviously, the availability of raw computing power. So, that’s basically what happened. And with that we saw accuracy increase tremendously, and now it’s basically rivaling human performance, right? So we see both accuracy and also kind of the breadth of labeling and classification you can do has just increased and improved tremendously in the last few years. In terms of challenges, I really see this as a path you’re going along, where the first step is kind of generic tagging of images, right? So what’s in an image? Are there people in it? What are the emotions? Stuff like that that’s pretty generic.
And that’s kind of the era we’re in right now, where we see a lot of success and where we can really automate these tedious tagging tasks at scale pretty convincingly. I think the challenge right now is to move to kind of the next step, which is to personalize these tags. So, basically provide tags that are relevant not just to anyone but to your particular company. So, if you’re a car manufacturer, you want to be able to classify different car models. If you’re a retailer, you may want to be able to do fine-grained classification of different products. So that’s the big challenge I see now, and that’s definitely where we are headed and what we’re focusing on in all apps.
Henrik: And, as of November 2016, how do you see image recognition changing?
Jonas: Well, really where I see it changing is, as I said, it’s going to be more specific to the individual customer’s assets. It’s going to be able to learn from your guidance. So, basically, how it works now is that you have a large repository of already-tagged images, then you train networks to do classification. What’s going to happen is that we’re going to add a piece that makes this much more personalized, much more relevant to you, and where the system learns from your existing metadata and your guidance, basically, as you curate the proposed tags. Another thing I see is video. It’s going to be more important. And video has that temporal component, which makes segmentation important, and that’s how it differs from images. So there’s that, and also the much larger scale that we’re looking at in terms of processing and storage when we’re talking about video. Basically, video is just a series of images, so when we develop technologies to handle images, those can be transferred to the video pieces, as well.
Henrik: Jonas, what advice would you like to share with people looking at image recognition?
Jonas: Well, I would say start using it.
Start doing small POCs [proofs of concept] to get a sense of how well it works for your use case, and kind of define small challenges, small successes you want to achieve, and just get into it. This is something that is evolving really fast these days, so by getting in and seeing how it performs now, you’ll be able to provide valuable feedback to companies like Adobe. So you can basically impact the direction that this is going in. It’s something we value a lot. It’s really valuable to us when we run beta programs, for instance, that people come to us and say, “You know, this is where this worked really well. These are the concrete examples where it didn’t work that well,” or, “These are specific use cases that we really wish that this technology could solve for us.” So now is a really good time to get in there and see how well it works. And also, I’d say, just stay on top of it. Stay in touch because, as I said, this evolves so fast that you may try it today and then a year from now things can look completely different, and things can have improved tremendously. So that’s my advice. Now is a good time. I think the technologies have matured enough that you can get real solid value out of them. So this is a good time to see what these technologies can do for you.
Henrik: Jonas, where can we find more information?
Jonas: Yeah, so we at Adobe just launched what we call Adobe Sensei, which is the collection of all the AI and machine learning efforts we have at Adobe. Just Google that and go to that website, which will be updated with all the exciting things that we are doing in that space. And I would recommend that you keep an eye on that because that’s something that’s going to really evolve in the next few years.
Henrik: Great. Well, thanks, Jonas.
Jonas: Yeah, you’re welcome.
Henrik: For more on this, visit Tagging.tech. Thanks again. For a book about this, visit keywordingnow.com

Tagging.tech presents an audio interview with Ramzi Rizk
https://tagging.tech/wp-content/uploads/2016/10/ramzi-rizk.mp3
Transcript:
Henrik: This is Tagging.tech. I’m Henrik de Gyor. Today, I’m speaking with Ramzi Rizk. Ramzi, how are you?
Ramzi: Hey Henrik, how are you? I’m good, thanks.
Henrik: Great. Ramzi, who are you and what do you do?
Ramzi: I’m one of the founders and I’m the CTO at a company called EyeEm.com. Based out of Berlin, we’re a photography company that’s been around for 5 and a half years now, and we’re a community and marketplace for authentic imagery. Basically, photos taken by average people who have a passion for photography, but aren’t necessarily professionals. Over the past few years, we’ve invested a lot and built quite a few technologies around understanding the content, context, and aesthetic qualities of images.
Henrik: Great. What are the biggest challenges and successes with image recognition?
Ramzi: I think over the past few years there’s been an amazing explosion in the number of tools that are available, particularly out of deep learning, to actually automate a big part of the photographer’s workflow, if you want. That includes, of course, recognizing what is in a photo, as well as what the quality of the photo is, and making photos just that much easier to find, to search and to share. I think the greatest successes have been naturally the fact that we’re at a point now where we can, with better than human accuracy, I would say, describe the content of a photo. A lot of the challenges would have to be around data. Deep learning is a very data-heavy field in that you need a lot of content that is properly labeled, properly tagged, in order to train these machines to recognize what’s in the images.
Over the past few years, things have gotten more and more accurate to the point where, in a lot of cases, machines are actually more accurate than humans at recognizing the various details in a photo. That being said, we as humans do have this innate ability to understand context and to draw out the more subtle, abstract notions of what an image is trying to convey, and that is definitely significantly more challenging to model in a machine.
Henrik: As of October 2016, how do you see image recognition changing?
Ramzi: I think we’re getting to a point where the pure art of recognizing what is in a photo has become a commodity, I would say. In the next 6 months to a year, you should be able to just license a variety of APIs. Google has an API out, so do we, and so do a few other companies that are specialized in understanding the content of a photo. I think image recognition in a classical sense, how we understand it. When you think 10 years ago we were talking about how amazing it is that we can now recognize cats in videos. I think that challenge is one that is solved, and since it’s now a solved problem, we will be seeing, and we are seeing, a lot of applications built on top of this, doing things that were previously not possible. That includes also having the ability to run these so-called models, these algorithms, on your device, on your phone, and not having to upload content to the cloud, even in real time. Which means we’re at a point now where, while you’re taking a photo, you can actually be getting real-time feedback on the quality of the image, on whether the photo that you’re taking is actually aesthetically appealing, and the minute you shoot it, your phone has already stored all of the content of that photo, making it searchable right away.
Henrik: Ramzi, what advice would you like to share with people looking into image recognition?
Ramzi: People looking into building image recognition solutions, I would recommend not to anymore, because as I said, the problem is solved. You don’t reinvent email, you build services on top of it, and I think today you’re at a point where you can build a lot of really exciting, interesting services on top of existing image recognition frameworks and existing APIs that offer this out of the box. For people looking at using it, I think this is the perfect time to actually start building these applications because the technology is mature enough, it’s more than affordable, and it’s at a point where anyone can really build software with the assumption that they understand what is in the photo.
Henrik: Where can we find out more information?
Ramzi: I would definitely have to pitch eyeem.com/tech, if you’re interested in looking at applied image recognition. We offer an API where you can actually keyword your entire content, your entire image library, for photography professionals or for amateurs. You can also have it caption images or have them described in a full sentence. Even more interesting is machines that have now learned to understand your personal taste. They can actually surface content that you know you will like, or surface content that you know your customers will like or that your significant other would like, and then just simplify that entire process of really taking the monotonous, boring work out of photography, out of the photographer’s workflow. As a photographer, you can just focus on the art of creation and on capturing that perfect moment. I think there’s a bunch of other services like Google Cloud Vision and so on that you can also look at to learn more about what you can do with imagery today.
Henrik: Thanks Ramzi.
Ramzi: Thank you, Henrik. Pleasure speaking to you.
Henrik: For more of this, visit https://tagging.tech/...

Tagging.tech presents an audio interview with Mark Sears on crowdsourcing https://tagging.tech/wp-content/uploads/2016/06/mark_sears.mp3 Keywording Now: Practical Advice on using Image Recognition and Keywording Services Now available keywordingnow.com Transcript: Henrik de Gyor: This is Tagging.tech. I’m Henrik de Gyor. Today, I’m speaking with Mark Sears. Mark, how are you? Mark Sears: I’m doing great, Henrik. Thank you. Henrik: Mark, who are you and what do you do? Mark: My name is Mark Sears. I’m Founder and CEO of CloudFactory. We spend a lot of time leveraging an on-demand workforce to structure data. We take a lot of unstructured data for clients and we process that in the cloud using a combination of human and machine intelligence. We do that mostly for tech companies. We work a lot with technology companies that are looking for an API-driven workforce to handle tons of different use cases. Very relevant to Tagging.tech would be things like tagging images for the purpose of machine learning, or tagging images as part of core business processes for things like intelligence. We do transcription and translation. We do a lot of document processing, again, things like processing receipts and invoices. We do web research, going out to do human-powered screen scraping for lead generation and CRM enrichment. A lot of different, very tedious, routine, repetitive work. We do it in a bit of a different model, what we refer to as cloud labor: the ability for organizations to send their work to the cloud and have it come back done accurately, quickly, and cost-effectively, in hours if not minutes. So that’s kind of the world that we claim. Henrik: Mark, what are the biggest challenges and successes you’ve seen with crowdsourcing? 
Mark: When we think of crowdsourcing, we often like to look at it compared to maybe a more traditional outsourcing model. We actually consider ourselves to be somewhere in between. So, my view of the world is that traditionally you had a large number of people working in a delivery center … Offshoring, outsourcing. You need to get work done. This is one option that obviously a lot of companies have used in the last 20 years: to send that work to a team, maybe thousands of people, sitting in urban India, the Philippines or China. That’s one way to get a lot of this type of paperwork done. Another way, that’s more popular recently, is to send it to a crowd and to do crowdsourcing. Our view of the world is that crowdsourcing means sending out work to anonymous crowds, to someone who maybe just signs up online, and there’s not a really high level of engagement, accountability, or ability to get quality out of an anonymous, faceless crowd. We see that on one side of the spectrum. We see the other side of the spectrum being traditional outsourcing. The view of the world that we have is right in between. It’s the idea of having an on-demand workforce that is leveraging automation and is highly efficient because of technology but, at the same time, is not an anonymous crowd. It is a crowd that we actually know and train, a professionally managed and curated crowd. I think that’s a roundabout way of talking about how we view the world and what I’ve seen and learned through a lot of different projects … The biggest challenge is often quality. Really harnessing the power of an anonymous crowd is something that’s quite hard to do. So we love kind of playing in the hybrid and finding that radical middle where you get the best of all worlds in terms of quality, scalability, elasticity, cost-effectiveness, speed of turnaround, etc. to accomplish your large data work projects. Henrik: Mark, as of April 2016, how do you see crowdsourcing changing? 
Mark: Moving forward, there’s no question that the rise of robots and the flattening of the world are two major trends that are affecting not just crowdsourcing, but really the future of work and how enterprises get their work done. Take the first of those trends: the world is becoming more and more flat, mostly because of the internet as well as the cost of devices to access the internet. We’ve had 1.1 billion people come online in the last five years, and there’s another billion expected in the next five years. So you have this massive, global workforce that is now able to contribute to the tagging and, again, the routine, repetitive work that every organization has deep inside that needs to get done. There is this new, untapped potential in being able to do online work and to leverage the talent that is equally distributed around the world, while acknowledging that opportunity is not. And so, we can really flatten the world with the internet, with crowdsourcing and other online work approaches. The other side of it, again, is automation and the rise of robots. Any project or solution that is not thinking first, “How do we automate this?” … is going to be left behind. We absolutely have to leverage technology. Automation takes on different forms. Actually automating the work itself, using AI, ML, etc., to automate pieces of our tagging, labeling, video, audio, transcription, and processing workloads is definitely essential. But a lot of the technology just is not there. So look first to see what pieces you can actually automate. And then also, of course, there’s the delivery and the receipt of the work: being able to have the API to send the work in and have it sent back once the work is completed, that automation. Having the automation of the workflow as well helps to streamline and speed things up and make things more cost-effective. 
There’s automating the actual work and there’s automating the processes of getting the work done and delivering and receiving that work. Really, I see that as a huge trend: everyone is asking how we make this more streamlined, more efficient, faster, more cost-effective, with fewer manual touches in these projects, to really make things more effective. That does include, as well, trying to automate as much of the work as we can. That’s one thing that we have really seen: the desire and requirement to find the right mix of human and machine intelligence for every project, for every solution. It really is different for every solution. We try to automate as much as we can with the approach but, obviously, there are a lot of nuances in doing split A/B testing to understand what really is the best total cost of ownership of the solution, depending on how much automation you include. Those are two trends that definitely play into the future of getting this type of work done. Henrik: Mark, what else would you like to share with people looking into crowdsourcing? Mark: I think the key thing is understanding self serve versus full serve. There’s no question there’s power in leveraging a global workforce and accessing it online and being able to send your repetitive data projects to a crowd. The question is that there is experience in doing that. A lot of people do like to have a self serve approach and access it themselves. Other people prefer to have experts that are there to help along the way in terms of making sure that you’re getting the quality out of the crowd that you’re expecting. I think that as we look at the landscape, one way, I think somebody should be thinking about their project ...

Tagging.tech presents an audio interview with Nikolai Buwalda about image recognition https://tagging.tech/wp-content/uploads/2016/03/nikolai_buwalda.mp3 Transcript: Henrik de Gyor: This is Tagging.tech. I’m Henrik de Gyor. Today, I’m speaking with Nikolai Buwalda. Nikolai, who are you, and what do you do? Nikolai Buwalda: I support organizations with product strategy, and I’ve been doing that for the last 15 years. My primary focus is products that have social networking components, and whenever you have social networking and user‑generated content, there is a lot of content moderation that’s a part of that workflow. Recently, I’ve been working with a French company that has launched its large social network in Europe, and as a part of that, we’ve spun up a startup that I’m the Founder of, called moderatecontent.com, which uses artificial intelligence to handle some of the edge cases when moderating content. Henrik: Nikolai, what are the biggest challenges and successes you’ve seen with image recognition? Nikolai: 2015 was really an amazing year with image recognition. A lot of forces really came to maturity, and so you’ve seen a lot of organizations deploy products and feature sets in the cloud that use or depend heavily on image recognition. It probably started about 20 years ago with experiments using neural networks. In 2012, a team from the University of Toronto came forward with a really radical development in how neural networks are used for image recognition. Based on that, there were quite a few open source projects, a lot of video card makers also developed hardware that supported it, and in 2014 you saw another big leap by Google in image recognition. 
Those products really matured in 2015, and that’s really allowed for a lot of enterprises to have a very cost effective ability now to integrate image recognition into the work that they do. So 2015 really has seen, in the $1000 range, the ability to buy a video card, use an open source platform, and very quickly have image recognition technology available to your workflow. In terms of challenges, I continue to see two of the very same challenges existing in the industry. One is the risk to a company’s brand, and that still continues. Even though image recognition is widely accepted as a technology that can surpass humans in a lot of cases for detecting patterns and understanding content, when you go back to your legal and to your privacy departments, they still want to have an element of humans reviewing content in the process. It really helps them with their audit, and their ability to represent the organization when an incident does occur. Despite companies like Google going with an image recognition first passing the Turing test, you still end up with these parts of the organization who want human review. I think it’s still another five years before these groups are going to be swayed to have an artificial intelligence machine‑learning first approach. The second major issue is context. Machine learning or image recognition is really great at matching patterns in content and understanding these are all the different elements that make up some content, but they are not great at understanding the context ‑‑ the metadata that goes along with a piece of content ‑‑ and making assumptions about how all the elements work together. To illustrate this, it’s probably a very good use case that’s commonly talked about, which is having a person pouring a glass of wine. Now, in all kinds of different contexts, this content could be recognized as something that you don’t want associated with your brand versus not being an issue at all. 
Think about somebody pouring a glass of wine at, say, a cafe in France versus somebody pouring a glass of wine in Saudi Arabia. Between the two, there’s a very different context, but it is very difficult for a machine to draw a conclusion about the appropriateness of that. Another very common edge case that people like to use as an example is the bicycle example, where machines are great at detecting bicycles. They can do amazing things, far surpassing the ability of people to detect this type of object, but if that bicycle is a few seconds away from being in some sort of accident, machines have great difficulty detecting this. That’s where human review ‑‑ human escalation ‑‑ comes into play for these types of issues, and it still represents a large portion of the workflow and the cost in moderating content. So, mitigating risk within your organization by having some sort of person review content, and really understanding the context, are two things that I think, in the next five years, will be solved by artificial intelligence, and that will really put these challenges for image recognition behind us. Henrik: As of March 2016, how much of image recognition is completed by people versus machines? Nikolai: This is a natural stat to ask about, but I think, with all the advancements in 2015, I really like to talk about a different stat. Right now, anybody developing a platform that has user‑generated content has gone with a computer vision, machine learning approach first. They’ll have 100 percent of their content initially reviewed with this technology and then, depending on the use case and the risk profile, a certain percentage gets flagged and moved on to a human workflow. I really like to think about it in terms of, “What is the number of people globally working in the industry?” We know today that about 100,000 to 200,000 people worldwide are working at terminals moderating content. That’s a pretty large cost and a pretty staggering human cost. 
We know these jobs are quite stressful. We know they have high turnover and have long‑term effects on the people doing these jobs. The stat I like to think about is, “How do we reduce the number of people who have to do this and move that task over to computers?” We also know that it’s ab...

Tagging.tech presents an audio interview with Clemency Wright about keywording services https://tagging.tech/wp-content/uploads/2016/03/clemency_wright.mp3 Transcript: Henrik de Gyor: This is Tagging.tech. I’m Henrik de Gyor. Today I’m speaking with Clemency Wright. Clemency, how are you? Clemency Wright: Hi. I’m good, thanks Henrik. How are you? Henrik: Good. Clemency, who are you and what do you do? Clemency: I’m Clemency Wright. I’m the Owner and Director of Clemency Wright Consulting, which is a UK‑based business and we specialize in providing bespoke keywording services and metadata consultancy, primarily for the creative media industries. We work with stock photo libraries. We also work with specialist image collections. We work with book publishers and a small number of online retailers. We do some collaborative work with software developers and technical consultants on various projects. The purpose of our work, mainly, is to help our clients organize their digital assets. These could be visual or text‑based. The idea here is to help the assets be found more quickly and more easily by their end users. Initially, my role in this field was working within the stock photo library, in search data and search vocabulary for a major global stock photo library based in London. From here, I’ve worked with specialist collections, where the nature of keywording is very different, and also in the museum and heritage sector; again, working with data in a very different format on a digitization process. The experience across those different fields is quite different when you look at it from a keywording perspective. Just to clarify now, I’m a consultant for various businesses. This is really key, as the proliferation of visual media continues to grow. 
We’re very closely looking at the way we handle digital content, how we make sense of that digital content, how we make the information relevant, and more available to more people. It has huge potential for our customers and for their end users, in terms of improving the search experience and the access to these assets. I think that pretty much summarizes where we are at the minute, in terms of who we work with, and what we provide for those people. Henrik: What are the biggest challenges and successes you’ve seen with keywording services? Clemency: One of the biggest challenges really is the perception that keywording is pretty much the same as tagging. Obviously with the rise of SEO, we’ve got some confusion here about what keywording is. We started keywording many years ago. Obviously within librarianship and archival work, people were keywording as a way to retrieve information, which is still what we do, but I think the challenge here is breaking down these perceptions that it’s always a very basic way of tagging content. We’re trying to differentiate between keywording which is, on its basic level, adding words that define an image or the content of an image, and high performance keywording which is very much a user‑focused exercise. It’s a very 360‑degree look at the life cycle of the image and how that image will be ultimately consumed and licensed for use in the broader digital environment. One of the challenges is highlighting the value of a high quality, high performance keywording project to the customers, and also their end users and the various stakeholders therein. I think working with specialist collections can be quite challenging. We have to create bespoke keywording hierarchies and controlled vocabularies for these clients, which obviously makes the access to the content much more. The performance of that is much greater, but it can be challenging. It can be quite time‑consuming. 
There’s a level of education that we need to have with our clients, to illustrate to them and demonstrate to them the return on investment that can be had from a good keywording methodology. By the methodology, I just wanted to define that, which links to the challenges that we have to do with technology and the extent to which we use controlled vocabulary systems and software, and the hierarchies that we build for our clients. They help to define the depth to which we can classify content, and also, the breadth of that content. The content may be video footage, or it may be photography. It may be illustration. Obviously, a challenge there is creating a vocabulary or a taxonomy that will cater for an ever‑increasing collection, one that is growing and evolving as businesses themselves incorporate new content into their collections. Technology is a challenge, but it’s also a great facilitator in the work that we do. It allows us to embed a level of accuracy and consistency to the work that we do for our clients. When you’ve got measures in place, and you’re creating controlled vocabularies and hierarchies, you’ve got systems there that make sure the right vocabulary is being applied, and it’s being applied consistently and accurately. There’s a level of support that the technology can offer, as well as it having its own challenges. Perhaps on a more general level, keywording has been tarnished somewhat by some multi‑service agencies which are offering keywording as a bit of a sideline. Perhaps their core business may be software or systems development or post‑production, but then, by offering keywording as an offshoot, some clients are going down that road and then discovering later on that actually, the keywording side of that was a bit of an afterthought. I think the methodologies and strategies in place have failed some of the clients that we work with, at any rate. 
There’s a challenge there for us to make sure that we can differentiate between specialist keywording provider and an agency that offers keywording as ...

Tagging.tech presents an audio interview with Joe Dew about image recognition https://tagging.tech/wp-content/uploads/2016/03/joe_dew.mp3 Transcript: Henrik de Gyor: This is Tagging.tech. I’m Henrik de Gyor. Today, I’m speaking with Joe Dew. Joe, how are you? Joe Dew: I’m well. How are you? Henrik: Good. Joe, who are you and what do you do? Joe: I am the Head of Product for a company called JustVisual. JustVisual is a deep learning company focused on computer vision and image recognition. We’ve been doing this for almost eight years. What my role is in the company is…think of me as the interface between engineering and computer vision scientists and end customers. We have a very deep technology bench and technology stack that does very sophisticated things, but translating a lot of that technology and capabilities to end‑consumers can be a challenge. Likewise, we have customers who are interested in the space, but aren’t really clear how to use it. My role is to translate their needs into requirements for engineering. Henrik: Joe, what are the biggest challenges and successes you’ve seen with image and video recognition? Joe: I think the biggest challenge, for a little perspective, is that the human brain has evolved for millions and millions of years to be able to handle and process visual information very easily. A lot of the things that we as humans can recognize and do ‑‑ that even a two‑ or three‑year‑old child can do ‑‑ are actually quite difficult for computers to do and take a lot of work. The implication of this is that the expectations from users on precision and accuracy when it comes to visual recognition are very, very high. I like to say there’s no such thing as a visual homonym. 
Meaning that, if you did a text search, for example, and you typed in the word jaguar and it comes back with a car, and it comes back with a cat, you can understand why the search result came back that way. If I had asked the question with a visual ‑‑ if I queried a search engine with an image ‑‑ and it came back with a car when I meant a cat, it would be a complete fail. When we’ve done testing with users, on visual similarity for example, the expectations of the similarity are very, very high. They expect something like almost an exact match when they’re asking. It’s largely because we, as humans, expect that. Again, if you think about how we interact with the world digitally, it’s actually a very unnatural thing. When you search for things, you have to translate that, oftentimes, into a word or a phrase. You type it into a box and it returns words and phrases, at which point you then need to translate again into the real world. In the real world, you just look at something and you say, “Hey, I want something like that.” It is a picture in your mind, and you expect to receive something like that. What we’re trying to do is solve that problem, which is a very tricky thing for computers to do at this point. But, having said that, in the field there have been tremendous improvements in this capability. Companies from Google to Facebook to Microsoft, for example, are doing some very interesting work in that field. Henrik: Joe, as of March 2016, how do you see image and video recognition changing? Joe: I think there are three big factors that are impacting this field. The first is the rise in the processing power of hardware: just the chip technology, Moore’s law, that type of thing. The second is a vast improvement in the sophistication of algorithms or, specifically, deep learning algorithms that are getting smarter and smarter in training. The third is the increase in data. 
There is just so much visual data now ‑‑ which has not been true in years past ‑‑ that can be used for training and for increases in precision and recall. Those are the things that are happening in the technology field. The translation of all of this is that the accuracy of image recognition and, for that matter, video recognition will see exponential improvements in the next few months even, let alone years. You’ve started to see that already. You start seeing that in the client‑side applications and robotics, websites, and the ability to extract pieces out of an image and see visually similar results. Henrik: Joe, what advice would you like to share with people looking at image and video recognition? Joe: I think understanding the use case is probably the most important thing to think about. Oftentimes, you hear about the technology and what it can do, but you need to really think thoroughly about what, exactly, you want the technology to do. As an example, a lot of the existing technology today does what we call image recognition, or the idea of taking an image or a video clip and essentially tagging it with English language words. Think of it as translating an image into text. That’s very useful for a lot of cases, but oftentimes, from a use case ‑‑ from a user ‑‑ it’s not that useful. If you take a picture of a chair, for example, and it returns back chair, the user says, “I know it’s a chair. Why do I need this technology to tell me it’s a chair?” But, “What I’m really looking for is a chair that looks like this. Where can I find it?” That is a harder question to answer, and that is not an exercise where you’re simply translating it to words. We found that there are companies that use Mechanical Turk techniques, etc. to essentially tag images, but users have not really adopted that because, again, it’s not that useful. That’s one thing, is think abo...