In this episode of the Vitrina Podcast, we welcome Abhirukt Sapru, SVP Commercial at Papercup. Abhirukt drives Papercup's go-to-market strategy and key commercial initiatives. With a background spanning Tessian, Citi IBD, and Snowplow, he is seasoned in navigating the complexities of B2B SaaS and investment banking.
Our main focus has been homing in on speech, especially its expressiveness, because that's what keeps viewers hooked. When people think about AI, they tend to fixate on speed and cost and overlook other vital aspects. Quality is paramount, and for us it boils down to watchability: keeping the audience captivated, sticking around for the full duration, and minimizing drop-off rates. Our research revolves around making content as captivating as possible.
Our strategy is to excel in a select few areas within AI. We prioritize languages with strong dubbing preferences and real potential for market disruption. We started by translating English into seven languages, and we expand only when we are confident in the quality. Our initial focus is on Latin American, US Hispanic, and European markets, and we also offer Arabic.
Podcast Chapters
Time | Topic |
---|---|
00:00 | Introduction and Background |
04:04 | Transformation in the Localization Industry |
09:03 | Demand and Market Focus |
14:52 | Services and Use Cases |
25:18 | Voice Cloning and Deep Fakes |
For buyers tuning in, there are good reasons to engage with us now, starting with strategy: localization isn't just an execution problem but a strategic opportunity, especially for content libraries. Whether you're starting your localization journey or already have a use case in mind, talk to us. Even if you're unsure whether AI can solve your problem, it's worth a conversation. The technology is often more advanced than you might expect.
About Papercup:
Papercup is a machine learning start-up transforming the media industry. 99.9% of video content is shackled to a single language. Our ambition is to make the world's video content watchable in any language. We translate videos by generating voices that sound like the original speaker, capturing not only the characteristics of the voice but also the way the speaker talks.
- Specializations: Papercup offers a range of localization services, including AI-enabled machine translation, synthetic voice, dubbing, and subtitling, among others.
- Quality Assurance Services: With services like audio QC, subtitle QC, and sound mixing, Papercup ensures high-quality output for localized content.
- Creative Localization: Papercup helps adapt content creatively for different markets, ensuring cultural nuances are captured effectively.
- Experience with Diverse Content: From documentaries to family dramas and beyond, Papercup has experience localizing a wide range of genres, making it a versatile choice for buyers.
- AI Integration: By leveraging AI technologies, Papercup streamlines processes like machine translation, synthetic voice generation, and dubbing, enhancing efficiency and accuracy in content localization.
Unlocking Global Audiences with AI Localization: A Conversation with Abhirukt Sapru, SVP at Papercup
Introduction:
This is a condensed written version of the podcast featuring Abhirukt Sapru, SVP at Papercup, hosted by Atul from Vitrina. Presented as a Q&A for quick reading, it explores how Papercup is transforming the localization industry with AI voice dubbing, focusing on watchability, cost-effective scaling, language strategy, and the shifting expectations of broadcasters, streamers, and content creators worldwide.
- Vitrina: Abhirukt, could you give us an overview of Papercup’s journey and the problem you set out to solve?
Abhirukt Sapru:
“Papercup was founded in 2017 with a simple mission — a lot of the world’s content is trapped in a single language. It was fundamentally an accessibility problem. We started as a research project and have only been commercializing for a little over a year. In that time, we’ve had success across media, entertainment, sports, publishing, gaming — and the industry is moving fast.”
- Vitrina: What was the core focus of your research — text, speech, or something else?
Abhirukt Sapru:
“Our primary focus has always been speech — more specifically, expressivity. People think AI is all about cost and speed. But to us, quality means watchability: Are people watching? Are drop-offs going down? Are average view durations holding? That’s the bar we set.”
- Vitrina: Can you walk us through how localization has evolved in the past couple of years?
Abhirukt Sapru:
"Traditionally, localization has been very labor-intensive and dependent on human work. It's more than just translation; it's a market entry problem. You need to match your localization strategy to audience preferences. For instance, Latin America is a dubbing-first region, while other markets are more subtitle-focused. Most content owners either didn't localize at all or did it wrong, because of high costs or a lack of insight."
- Vitrina: Do you primarily localize English content?
Abhirukt Sapru:
"Yes. Our current model is focused on translating English content into seven languages. We pick languages where dubbing is the norm, talent is scarce or expensive, and there's a strong market for disruption. We want to be the best at a few things before expanding. Quality always comes first."
- Vitrina: Which languages and regions are your current priorities?
Abhirukt Sapru:
“We serve Latin American Spanish, Brazilian Portuguese, French, Italian, German, Castilian Spanish, and Arabic — targeting Latin America, the US Hispanic market, and Europe. We get a lot of interest in Indian languages like Hindi, but India has over 17 major languages. There’s no one-size-fits-all approach there, so we’ve deprioritized it for now.”
- Vitrina: What kind of content are you seeing the most demand for?
Abhirukt Sapru:
“Genre matters a lot. Right now, we see demand from three main areas:
- Voiceover-style content like documentaries and education — fast and efficient.
- Unscripted content — talent shows, reality series, etc.
- Scripted content — more expressive and emotive, where AI is just starting to gain ground.
We’re not ready for theatrical releases yet, but we’re close for less complex scripted formats.”
- Vitrina: So what should buyers come to you for right now?
Abhirukt Sapru:
“Strategy and execution. A lot of clients don’t know where to start with localizing their archives. We help them create a smart go-to-market strategy. If it feels like AI might solve the problem, it’s worth having a conversation. We’re ideal for use cases like news, sports highlights, AVOD, FAST channels, and even packaging screeners for international buyers.”
- Vitrina: What’s your average turnaround time vs traditional dubbing?
Abhirukt Sapru:
“We’re roughly twice as fast. If traditional dubbing takes 10 weeks, we aim for 5. That speed, combined with high-quality voice output, is what makes us compelling.”
- Vitrina: How does pricing work?
Abhirukt Sapru:
“Simple — it’s a price per minute per language. No setup fees. Add-ons are available for things like 5.1 sound mixes or localized on-screen graphics. Clients love the transparency.”
- Vitrina: Do clients need to manage a dashboard or self-serve?
Abhirukt Sapru:
“No — we’re a service, not a software platform. Clients send us files however they want, and we deliver finished videos back. No hidden costs or interfaces to manage.”
- Vitrina: Which customer types are reaching out most — and is that shifting?
Abhirukt Sapru:
“We’ve traditionally worked with distributors and media companies. But we’re now seeing growing interest from broadcasters and content owners who want to control localization from the start. The cost savings and speed make it attractive. Broadcasters especially are realizing AI dubbing offers better ROI.”
- Vitrina: Any voice cloning or face morphing in your product roadmap?
Abhirukt Sapru:
“We have the tech for voice cloning, but rarely use it. It’s a rights minefield and feels gimmicky — people know Tom Cruise doesn’t speak Bahasa. We also steer clear of face morphing. Content owners care deeply about integrity. Our focus is accessibility, not deepfakes.”
- Vitrina: Can you give examples of real-world use cases you’ve worked on?
Abhirukt Sapru:
“We’ve helped clients localize archives for AVOD, YouTube, and FAST channels. Others come to us with 10–15 hours they want to license or promote — sometimes even for screeners. We work with sports brands on global highlight content, and with news publishers on intraday clips for international markets. It’s all about enhancing engagement.”
- Vitrina: How do you benchmark costs versus traditional dubbing?
Abhirukt Sapru:
“Think of it as an order-of-magnitude difference. It’s not just percentage savings — it’s multiple times cheaper. That’s what enables our clients to go bigger, faster, and broader than before.”