Pivot to Reality: Why Elon Musk Will Learn to Love Advertising

Elon Musk does not have a love-hate attitude toward advertising: he hates it. At least, that’s what he said – on Twitter, of course – back in 2019: “I hate advertising.”

This may be a curious attitude for a new media mogul whose most recent acquisition – that same Twitter – is almost entirely ad-supported.

But it turns out that Musk is not alone. The history of major ad platforms is littered with righteous founders who did not like the business they now own. In fact, it’s difficult to find a founder who expressed any fondness for ads, let alone ad tech.

But when revenues are required, attitudes change; call it a pivot to reality.

Here – in alphabetical order – are the origin attitudes of the great modern ad businesses. What’s the lesson? We change as we grow? Maybe. Or maybe, like Musk, we all need to learn a little respect.

Amazon

“Advertising is the price you pay for having an unremarkable product or service.”

Jeff Bezos, founder (2009)

Apple

“If a business [i.e., advertising] is built on misleading users, on data exploitation, on choices that are no choices at all, then it does not deserve our praise. It deserves reform.”

Tim Cook, CEO (2021)

Facebook (now Meta)

“I say it’s time to start making money from theFacebook but Mark [Zuckerberg] doesn’t want advertising. Who’s right?”

Eduardo Saverin, co-founder (2004), quoted in The Social Network

Google (now Alphabet)

“… [W]e expect that advertising funded search engines will be inherently biased towards the advertisers and away from the needs of consumers.”

Sergey Brin and Larry Page, co-founders (1998)

Instagram

“If we were to just build a product for advertisers, we would have no consumers.”

Kevin Systrom, co-founder (2012)

LinkedIn

“Silicon Valley is not particularly good at marketing.”

Reid Hoffman, co-founder (2021)

Netflix

“We want to be the safe respite where you can explore, get stimulated, have fun, enjoy, relax — and have none of the controversy around exploiting users with advertising.”

Reed Hastings, co-founder (2020)

Oculus (now Meta)

“It’s not clear right now that advertising is the right model for virtual reality anyway.”

Palmer Luckey, co-founder (2015)

Snapchat

“I got an ad this morning for something I was thinking about buying yesterday, and it’s really annoying.”

Evan Spiegel, co-founder (2015)

WhatsApp (now Meta)

“Advertising has us chasing cars and clothes, working jobs we hate so we can buy shit we don’t need.”

Jan Koum, co-founder (2014) [channeling Fight Club]

Yahoo

“We are probably the last people in the world that want to do this [i.e., advertising], but it will be tastefully done, and that will keep it free to the users, like TV.”

Jerry Yang, co-founder (1995), quoted in Inside Yahoo by Karen Angel

Note: a version of this piece originally appeared in The Drum on Nov. 22, 2022

How To Build a Future-Proof Ad Business

Advertisers only wish we knew as much as people seem to think we do. “Surveillance” is a mysterious term and surely overstates the case. But it is undeniable that the rules are changing, perception is a mounting problem, and it’s time to think ahead. How?

One company leads the field with its provident tactics. Already a significant — if not yet dominant — media player, it is assembling the components of a powerful offering and has much to teach us all.

(1) Start with Your Brand

This won’t be easy. The ad business has long had trust issues, which did not start with GDPR and CPRA. Celebrations last year that advertising was now merely the second least-trusted profession (after politics) were just sad.

Publishers know that trust in media is down. On the demand side, faith in companies and institutions is in breathtaking global decline.

But there is an exception: a $400 billion-earning company that has managed to persuade most of us that it’s not part of a menacing “data-industrial complex.” Apple is the world’s most valuable brand, according to Interbrand, up 26% since 2020, well ahead of its competitors.

How? It raised awareness of problems most did not know existed — such as mobile app ID, IP address and email “tracking” by marketers — and then provided a solution. In a single release (iOS 14.5 for ATT), Apple went from enabling to solving a problem. Its brand as neighborhood watch for the web extended to TV spots showing creepy characters stalking us on our phones until a superheroic ATT intervened.

Lesson: Counteract trust erosion by investing in a brand of law and order.

(2) Make the Opt-In Positive

Apple has championed the opt-in model of consent, arguably influencing pending national privacy legislation. Opt-in presents the consumer with a mandatory choice to accept or reject (opt-in or -out) whatever is requested.

The challenge here is well-studied, although not by advertisers. It even has a name: “The Paradox of Privacy.” As it turns out, there are a lot of cognitive biases that affect a person’s decision in that magic opt-in moment.

We are lazy (“can’t think about this right now”); overestimate how much data is collected about us (“tracking me around the web”); underestimate the benefits (“what is ‘relevance,’ anyway?”). In short, quite logically, we lack both time and data to make a good decision.

Very few people watch ads by choice, and nobody endorses tracking. The most successful modern platforms — paid search, social in-stream, retail media — are not optional. Opting-in to ads is a default on sign-up.

A meta-study of dozens of academic studies of the privacy paradox concluded: “Privacy attitude was best predicted by internal variables likes trust ….”

And that’s why the language on the opt-in box itself — the experience around the decision window — is critical. Apple knows this.

As we all know by now, the required headline for the required ATT prompt read: “[‘Brand’] would like permission to track you across apps and websites owned by other companies.”

We note that tracking is not a benefit, and no rational actor would agree to it, even with time to think.

When it came to describing the rewards of targeting in its own environment, Apple provided more benefit-centric headline text and copy: “Personalized Ads … help you discover apps, products and services that are relevant to you.”

Lesson: Frame the benefits of behavioral data collection in positive terms.

(3) Focus Down the Funnel

The cookie and mobile IDs may be engaged in one of the longest death scenes in history. But pointed targeting and powerful measurement are more possible than ever — in controlled (“opted-in”) environments, sometimes called gardens.

Digital ad money was always further down-funnel than linear, and it’s getting more so. People forget that paid search is still half of digital ad dollars, and it’s a strong signal of intent. Retail media is more than a trend: connecting ads with purchases is as outcome-based as ads can get.

Building up a campaign, smart marketers start with (1) moments closer to the point of sale, and (2) outcomes they can track. They build out from there, into intent and targeted awareness (like CTV) and then less controlled environments like late-night cable for reach.

Apple’s media business starts with Apple Search Ads for app installs in the App Store, an estimated $5 billion business. Exempt from ATT, Apple can tie these ads to app installs and in-app activity. (Compare this with the brand-focused, abandoned iAds project.)

What comes next? It added hero placements on the App Store’s opening screen. It also offers ads in its News, Podcasts and Maps apps. Maps in particular offers direct-response opportunities tied to location.

That long-rumored DSP and network could allow Apple to bring some of this helpful intent information to ad targeting in apps it doesn’t own.

Lesson: Future-proof ad businesses start with direct response ads.

(4) Focus on “Non-Ad” and Peripheral Placements

There are ads that don’t seem much like ads. They’re more likely to be acceptable even to more paranoid web surfers and even to regulators. Paid search is an example, I think: despite Google ingesting some of the most private material on earth, consumers don’t seem to think it’s a problem.

Contrast search with a retargeted banner, which appears out of nowhere indicating it was watching me elsewhere. The retargeter knows much less about me than my search engine but seems to know more. Why? It’s overestimated. And it was widely noted that Apple Search Ads flourished after ATT limited retargeting last year.

What does this mean for your future-proofing ad player? What I’m calling “non-ad” ad formats are those that are ads but don’t feel like them. Look at product placement. These ads are growing almost as fast as CTV, are a $23 billion business, and are not even mentioned in the 11 chapters of the GDPR text — which mentions just about everything else.

As AI gets better, weaving products into shows on Apple TV and network apps becomes appealing. Other trending formats don’t interrupt but marginalize ads, literally placing them around content. TikTok Pulse and Meta’s Reels released new “multi-advertiser” formats like this.

The real opportunity may be in shoppable commerce. These are non-ads in that they put a buy-now button on other content, like a show (or product placement). As mobile ad expert Eric Seufert has noted, commerce, retail and CPG are Meta’s largest verticals. Innovative units could shift some of that spending around.

Lesson: Future ads look more like buy-now buttons.

Note: a version of this piece first appeared in The Drum on November 9, 2022.

Your global ‘Center of Excellence’ could actually be sabotaging you

This article first appeared in The Drum on Oct. 11, 2022

Without appropriate cultural relevance, data analysis and distributed resources and knowledge, centers of excellence can undermine global success, writes Salesforce’s Martin Kihn as part of The Drum’s Globalization Deep Dive.


Many of us remember where we were during the great chopstick scandal of 2018, when a storied luxury fashion maison launched three highly-produced videos on YouTube in China to support an upcoming show.

In them, models tried to eat traditional Italian food like cannoli and pizza using chopsticks, with predictable results: a hot mess for both the utensils and the brand, which faced a chorus of complaints from Chinese consumers on Weibo and other platforms, claiming cultural tone-deafness.

The brand quickly pulled the videos, canceled the show and has since lost share in the Chinese market. Part of the problem was a lack of localization in global brand messaging – or what one study of social marketing fails politely called “inadequate research.” The brand had a world-class marketing team, but it was back in Milan, the company’s de facto ‘center of excellence’ (CoE).

During a time of cultural sensitivity, global marketers must balance the imperative to build out such centers with a growing need for in-market nuance. There are new forces facing global marketers that call into question the conventional rush to build centers of excellence across almost every functional discipline.

It turns out, excellence may increasingly be found at the edge – not the center.

Four horsemen of the CoE-pocalypse

There is much logic and some evidence that CoEs can add value. Firms such as Gartner have long championed CoEs for marketing functions such as analytics and data science. They generally define a CoE as a discrete cadre of coordinated professionals with specific, uncommon expertise, who cross-reference ideas, disseminate practices and templates and function as a skilled resource for dispersed global operations.

So compelling is the impulse to the CoE model that it is difficult to find any doubters. Consultancies such as McKinsey routinely recommend establishing a CoE for advanced functions. For example, a recent McKinsey report on automation says: “A center of excellence is vital both as a source of expertise and to define priorities.” Meanwhile, the US Army has at least 15 CoEs for functions from missile defense to human resources.

Global marketers have taken the advice and adopted CoEs. A Gartner survey indicated that two-thirds of enterprise marketers already had an analytics CoE five years ago – yet this year, 26% of CMOs identified analytics as an ongoing capability gap. The same research revealed global marketers had a lot of swagger about their ‘operational excellence’ (only 15% cited as a gap), due in part to CoEs.

Yet as the chopstick incident implies, not all wisdom can be centralized. And there are a number of rising forces that point toward the need for global marketers to question the march toward CoEs.

The four horsemen of the CoE-pocalypse are:

Cultural relevance: Local consumers require local nuance – and will take to social media if it’s missing.

Data bias: AI and machine learning models can inherit bias from data collection and processing methods, both of which can have a cultural dimension that is only now being recognized.

Knowledge resources: Many formerly ‘specialized’ disciplines – including reporting and campaign automation – are more common, with widespread learning resources.

Distributed workforce: More dispersed and hybrid employment models undermine some of the CoEs’ neo-Xerox PARC ‘skunkworks’ premise.

To CoE or not to CoE?

How is a marketer to assess whether or not a CoE makes sense for a particular situation? As an ex-consultant, I’d be surprised if I didn’t propose a solution in the form of a 2×2 framework.

CoEs are supposed to support: (1) availability of specialized skills; and (2) processes that can be run globally rather than locally. So by implication the axes of our 2×2 are:

Skill level: specialized v unspecialized

How hard are the skills to find in most global markets? If they’re rare and uncommon, or less rare in some regions than others, a CoE may be in order. As marketing teams get more sophisticated over time, fewer skills may fall into this area. (According to Gartner, the hardest marketing skills to find in the current environment are data and analytics, customer experience management and marketing technology; the easiest is social marketing).

Cultural proximity: embedded v not embedded in local culture

Is the function something that requires an awareness of how actual human beings talk, think and work in a specific context or not?

Another way to evaluate this requirement is in symbolic terms: How much does the function use global symbols, such as numbers, versus more culturally loaded symbols like words and images? If it’s mostly about numbers, a CoE could work – and if not, it may be time to reevaluate.

COE chart

So the CoE is most useful for marketing challenges related to data modeling and predictions, such as next-best-action and -experience – and for data operations and automation projects that can be standardized across regions. It is less useful for developing creative artifacts for specific regions and building media plans.

Let’s apply the framework to our original example. We see the capabilities required are video production and creative development, skills that are both locally available and deeply embedded in the culture – so, not amenable to the CoE treatment. On the other hand, were that same brand to implement a marketing automation system, it would be entirely reasonable to spin up a CoE for that.

As we go forth into our new world of global marketing, we should take care to discriminate non-human processes from very human communications and recognize that both standardization and globalization have their limits.

Martin Kihn is senior vice-president of strategy at Salesforce Marketing Cloud. For more on what marketers and their partners need to do to succeed on a global level, check out The Drum’s Globalization Deep Dive.

How much is that influencer worth? The answer may surprise you

This article first appeared in The Drum US on August 31, 2022

We’re all under the influence.

Influencer marketing is the second-fastest-growing paid channel this year, behind only connected television (CTV), and resilient even in the face of recession. As companies plateau their use of social media, 75% of US marketers plan to invest in influencers this year – up from 66% in 2020, according to eMarketer.

And it’s not about products-for-posts anymore – it’s big business. Global marketers spent about $14bn on influencers last year, including media. B-listers such as Joanna Gaines and Addison Rae enjoy multi-figure deals, while real-life stars including footballer Cristiano Ronaldo get an estimated $500,000 per post. And there are thousands of creators in niches from travel to beauty to – of course – cats who are paid an average of $100 per 10,000 followers per meow.

In a world where 50 million people call themselves ‘creators,’ there are a lot of options for brands to partner their way into feeds, tweets and videos. Influencers can provide creative content, access to elusive audiences, higher engagement and compelling social proof.

But there’s a problem. Brands using influencers, surveyed by the Association of National Advertisers (ANA) in 2020, admitted their top challenge was measurement. The situation is no better now. How do you know if you’re getting a worthwhile return-on-influencer (ROIn)?

Channels are not created equal

Measuring the impact of an influencer program is notoriously sketchy. It’s an emerging channel without industry standards. Although the Media Rating Council (MRC) has established guidelines for paid social measurement, most of the value of influencers comes from organic engagement – all those likes, shares and comments from followers and friends of friends that turn a snippet of video into cultural cachet.

Challenges with measuring ROIn include:

  1. Data collection: Brands without API access to influencer accounts rely on methods such as emailed screenshots for metrics
  2. Reach: It is difficult (read: impossible) to deduplicate audiences across platforms
  3. Engagement: Different platforms present different options (where TikTok garners likes, Pinterest culls clicks) and define ‘engagement’ in different ways
  4. Consistency: Agency partners often use proprietary roll-up metrics that can be opaque

Earlier this summer, the ANA released the first ‘Influencer Marketing Measurement Guidelines,’ taking a step toward standardizing organic measurement. Developed by the Influencer Marketing Advisory Board – formed in 2020 with reps from brands such as Puma and Target – it was based on meetings with 25 agencies and the eight major platforms (Facebook, Instagram, LinkedIn, Pinterest, Snapchat, TikTok, Twitter and YouTube).

Brands that have been there will tell you that working with influencers is special – more like hiring an improv troupe than deploying a bot. Companies like control, but creativity is part of influencers’ charm. So it makes sense to start by asking them how they measure success. A beauty star such as Huda Kattan might value video engagement, while a photo influencer such as Murad Osmann might care more about shares.

Most brands measure ROIn based on ‘engagement,’ a blunt sum of actions divided by exposures, aggregated across all the platforms in the campaign. But this method assumes every creator aims for the same responses, and it ignores the platforms’ real inconsistencies.

Many roads to the rainbow

Using basic discipline, the hard-working marketer with an influencer program wants some combination of three KPIs:

  1. Awareness: This is driven by reach and frequency, generally available for each platform in isolation, but not across platforms; video views are usually counted here
  2. Engagements: These are measured interactions with the influencers’ content, including likes and shares – often expressed as an ‘engagement rate’ (ER) or engagements per reach
  3. Conversions: Often the ultimate goal, this is likely undercounted and based on direct clicks through to the brand’s commerce site or other destination

Now, the ANA performs a public service in teasing out the vagaries of the platforms’ self-reported metrics. Anyone who’s spent time parsing reports from social networks will appreciate this effort. Key differences among the platforms’ influencer reporting include:

  1. Facebook and Instagram: For Meta-owned platforms, ER is total engagements divided by impressions, not including video views
  2. TikTok: ER is total engagements divided by video views, excluding replays
  3. YouTube: ER is the same as TikTok; however, TikTok counts any video that’s started as a view, while YouTube only counts a view after 30 seconds or 100% for its short-form ‘Shorts’
  4. Twitter: Twitter is similar to Meta, but quote-tweet counts aren’t available via the API
  5. LinkedIn: ER does not include video views, which are counted after two seconds with 50% viewable
  6. Snapchat: Interestingly, Snap doesn’t yet provide organic influencer reporting
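To make the denominator problem concrete, here is a minimal Python sketch of the definitions listed above. It is an illustration under assumptions, not anything the ANA or the platforms publish: the names `PlatformReport`, `ER_DENOMINATOR` and `engagement_rate` are hypothetical, Twitter is simply treated like Meta because the list calls it "similar," and LinkedIn and Snapchat are left out because the list does not give a usable denominator for them.

```python
from dataclasses import dataclass

# Denominator each platform uses for engagement rate (ER), per the list above.
# Assumption: Twitter is mapped to impressions because it is "similar to Meta".
ER_DENOMINATOR = {
    "facebook": "impressions",
    "instagram": "impressions",
    "twitter": "impressions",
    "tiktok": "video_views",
    "youtube": "video_views",
}


@dataclass
class PlatformReport:
    """Hypothetical container for one platform's self-reported numbers."""
    engagements: int  # likes, shares, comments and so on (video views excluded)
    impressions: int
    video_views: int  # counted under each platform's own view rule


def engagement_rate(platform: str, report: PlatformReport) -> float:
    """Return ER using the denominator that platform reports against."""
    key = platform.lower()
    if key not in ER_DENOMINATOR:
        raise ValueError(f"no ER definition sketched for {platform!r}")
    denominator = getattr(report, ER_DENOMINATOR[key])
    if denominator == 0:
        return 0.0
    return report.engagements / denominator


# The same raw counts produce very different "engagement rates".
sample = PlatformReport(engagements=500, impressions=20_000, video_views=5_000)
for name in ("instagram", "tiktok"):
    print(name, f"{engagement_rate(name, sample):.1%}")
```

With these made-up counts, the identical post scores 2.5% on an impressions basis and 10.0% on a video-views basis, which is why a blended cross-platform ER can mislead.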

Understanding the components of the platforms’ reports unlocks comparisons. Obviously, an autoplay video view on Twitter isn’t as meaningful as a video view on YouTube, and a retweet on Twitter is not exactly equivalent to a pin on Pinterest.

For awareness and conversion measurement, reach by platform and direct attribution are useful. They aren’t perfect, since the former misses duplicates and the latter indirect attribution (ie people saw the content and converted later, or offline). But they’re reasonable baselines.

The problem comes with the most important influencer metric: engagement rate. How can it be improved?

Worth the weight

The answer is by weighting the different components of engagement. Intuitively, we know that a like isn’t the same as a share or a comment. It’s easy to like a post – you just tap the heart, right? But sharing to your network is a kind of endorsement, and a comment – with the right sentiment – indicates more visceral involvement.

A principle I used when measuring the impact of social media for brands was one I took from the self-help guru John Bradshaw: “We give time to those things that we love.” Simple enough. Extending it to social platforms, I’d argue that actions that take more time and effort should count more toward ROIn.

For example, the marketer can create a consistent weighting factor for different actions based on the time they take to complete. Say it takes a second to commit to and tap a like. Even a short, positive comment takes at least five seconds. And a share with a comment might take longer. Typical viewer patterns should be considered, and they will vary considerably based on the influencer and type of campaign.
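As a rough sketch of how such a weighting could work, the Python below multiplies each action count by the seconds of effort it represents and divides by reach. The one-second like and five-second comment come from the paragraph above; the share weights, the function name `weighted_engagement_rate` and the sample posts are illustrative assumptions that a marketer would replace with figures suited to the influencer and campaign.

```python
from typing import Mapping

# Seconds of effort per action. The like and comment values follow the text;
# the two share values are assumptions for illustration only.
EFFORT_SECONDS = {
    "like": 1.0,
    "comment": 5.0,
    "share": 3.0,               # assumed
    "share_with_comment": 8.0,  # assumed
}


def weighted_engagement_rate(
    actions: Mapping[str, int],
    reach: int,
    weights: Mapping[str, float] = EFFORT_SECONDS,
) -> float:
    """Effort-weighted engagements per person reached."""
    if reach <= 0:
        raise ValueError("reach must be positive")
    unknown = set(actions) - set(weights)
    if unknown:
        raise KeyError(f"no weight defined for: {sorted(unknown)}")
    weighted_total = sum(weights[name] * count for name, count in actions.items())
    return weighted_total / reach


# Two hypothetical posts, each reaching 10,000 people.
post_a = {"like": 900, "comment": 20}
post_b = {"like": 400, "comment": 100, "share_with_comment": 25}

for label, post in (("A", post_a), ("B", post_b)):
    raw = sum(post.values()) / 10_000
    weighted = weighted_engagement_rate(post, 10_000)
    print(f"post {label}: raw ER {raw:.4f}, weighted {weighted:.4f}")
```

In this invented example, post A wins on raw engagement rate (920 actions against 525), but post B comes out ahead once comments and shares are weighted by effort. The result is a relative score for comparing posts rather than a percentage.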

The ultimate ROIn plan might include breakouts for awareness and conversion, and an approach to ER that considers weighting actions by their level of effort. (The ANA guidelines don’t address weighting.) Of course, a detailed formula requires access to the platforms’ API and permission from the influencer. Art, science and some social engineering are required.

But that’s what puts the ‘sure’ in measurement.

Martin Kihn is senior vice-president of strategy, marketing cloud at Salesforce.

Back to the Future: An Oral History of Microsoft & Ads

The following is an article I wrote that appeared in the mighty AdExchanger on July 19, 2022.

Martin Kihn, AdExchanger Contributor

This article is based on interviews with participants. It was inspired by Microsoft’s supposedly surprising selection as Netflix’s ad tech partner. But driven by the acquisition of AT&T’s Xandr, that’s just the latest chapter in a breathtaking adventure of pivots, write-downs, partnerships and potential.

In the beginning were these words …

Bill Gates:

The future of advertising is the internet.

The occasion was the IAB Engage conference in London in 2005. At the time, Microsoft had MSN, an ad network and content deals with Fox, NBC and others. But it was focused on one particular upstart in Mountain View. Having lost a bid to acquire Overture, Microsoft launched its own search engine, originally code-named Project Moonshot.

Jed Nahum (director, product management, Microsoft adCenter): Google made about two times what we made on each keyword. We had this functionality which enabled you to bid for age and gender on top of keywords for search. It was our differentiator – but it wasn’t enough.

Eric Picard (director, ad tech strategy, Microsoft): Microsoft was focused on search, but Bill Gates recognized it was bigger – that ads could be another MS Office or Windows-sized business. We looked at investments in Xbox and PC gaming, video ads on Microsoft TV and Media Player and MSN Video. We looked at ads in Office. Around this time, Brian Burdick wrote a paper … that basically invented RTB.

Brian Burdick (principal group program manager, adCenter): In 2005, a couple people on my team and I wrote a proposal for an Online Listings Exchange. … We were piloting a contextual ad program that competed with [Google’s] AdSense. Microsoft had deals for content controlled by a premium display system. I realized on a drive home from work one day that if the revenue per impression between the contextual and premium systems was materialized in real time, any external third party could also participate.

Nahum: The insight of Brian’s paper was basically that what ad networks need is the User ID – like a cookie, IP header info, and a URL that corresponds to the context and location of the ad. If we could pass those three things to ad networks, they could evaluate on an impression-by-impression basis.

Burdick: Gates was super-bullish in the meeting. He had a bunch of comments. He said, “This is bold and ambitious and something we should do.” … Eventually, a lot of other teams wanted to piggyback on the idea, and our ask was for hundreds of engineers. It didn’t get approved.

Microsoft also took a look at Right Media, a pioneering exchange that allowed ad networks to bid on one another’s inventory. That meeting didn’t go so well.

Brian O’Kelley (CTO, Right Media): We went in to Microsoft to talk with a Technical Fellow. He put us through the wringer. I remember he asked us, “How many man-years did it take you to build the platform?” I said, “You’re missing the point. It’s liquidity you’re buying as much as technology.” Back then, Microsoft had swagger. I came away from Redmond feeling they were arrogant.

Right Media was later acquired by Yahoo, and Microsoft set its sights on another target, then owned by private equity firm Hellman & Friedman.

Nahum: Hellman & Friedman pitched DoubleClick to us. On my team, we were f*ing terrified. We understood the value of DoubleClick and what it would mean if it went to Google. After a low bid from Microsoft, Google and DoubleClick went into a quiet period. … We were very depressed. … Steve [Ballmer] quietly bid $3 billion, but Google threw in another $100 million to shut down the dalliance. We were left feeling burned. We were in a situation where we had to get a competitor to DoubleClick.

Picard: We left that meeting where we lost DoubleClick, and a week later Steve [Ballmer] had me and a few others in the room. He says, “This is like that scene in Animal House where Belushi rallies the troops.” And he says, “Okay, we lost DoubleClick – what else we got?”

Microsoft ended up buying aQuantive – including the agency Razorfish, the DrivePM ad network and the Atlas ad server – for $6.3 billion. On the same day, it acquired the AdECN exchange for, reportedly, somewhere between $50 million and $75 million. Bill Urschel and a rising star named Jeff Green ran AdECN.

Bill Urschel (co-founder, AdECN): They bought us and it happened pretty quickly. It was at an Ad:Tech [event] … Eric Picard and Jed Nahum came by our booth and asked all kinds of interesting questions.

Picard: We walked up and started chatting. We talked about what they’d built – it was interesting. Jed, [Microsoft GM] Joe Doran, Bill, Jeff and I had a fancy dinner and got along well. We were kindred spirits.

Jeff Green (COO, AdECN): Everyone wanted to see Microsoft do well. The AdECN strategy was to get Yahoo and AOL to join up and create a pool of liquidity that rivaled Google. But aQuantive was saying, you know, our ad server [Atlas] is better. Let’s combine it with Microsoft tech and build the world’s biggest ad network.

Burdick: DrivePM was the internal ad network aQuantive ran for Microsoft, and it had more than 40% margin. [They] put the head of strategy of aQuantive in charge of strategy for Microsoft. … They eventually came around to the exchange model, but not in the beginning. There was resistance to the exchange, putting margins at risk.

Boris Mouzykantskii (founder, IPONWEB): I think AdECN had a chance to test real-time bidding in the market. It never happened. It’s possible, if they’d done it, Microsoft would be AdX.

Urschel: After the acquisition, on the Microsoft side there were some brilliant people who saw a vision of a bigger exchange, but they were essentially drowned out. The cash at the time was flowing from the aQuantive business, so I don’t think the exchange business ever got a serious look and didn’t get the resources.

Burdick: I went down to be CTO of AdECN. … We built the first real-time bid exchange. But between the aQuantive people and our VP, they would not let us go outside [Microsoft] for inventory. The reasons are murky to me. They just didn’t greenlight it.

Meanwhile, Brian O’Kelley had started AppNexus, originally a cloud hosting platform that became an SSP and exchange.

Brian O’Kelley (co-founder, AppNexus): My pitch to Microsoft was that they can’t fight Google in search and display. Let us be the market maker, make us the dominant exchange platform. But that would only work if you put the whole heft of Microsoft behind it – all MSN inventory – [and] make everyone buy through us. I spent a lot of time in Bellevue and got a mind meld [for] how we could beat Google. It was an incredibly strategic conversation about the future of the internet, not just about product.

Picard: I introduced Brian to [Microsoft Ads exec] Rik van der Kooi. I said, “If we’re not going to be allowed to build this internally, it’s not a bad thing to invest in another company that’s a credible competitor to Google’s ad exchange.”

O’Kelley: We made a deal where [Microsoft] gave us inventory and they got one-third of the company. … Exclusive inventory from one of the top five publishers. Over time, we delivered. Some things were not successful, like a Windows Phone integration, but Microsoft was the first fully programmatic major seller.

Nahum: After the AppNexus deal was done, we branded our instance [as] MAX, [the Microsoft Advertising Exchange]. We couldn’t get the aQuantive guys to put inventory in AdECN, but we were able to put it into the waterfall after direct sales for AppNexus. Immediately it started making money. My team launched 35 markets internationally. We sold to aggregators of demand, to DSPs, agencies and trading desks.

Picard: It was really bittersweet. The day I decided to leave, I was in a meeting with Ballmer. He said, “I want us to shut MSN down, divest all the non-search ad business [and] the exchange and double down on paid search.” Ultimately, the team convinced him MSN was too critical – but the strategy shifted from editorial to being a portal with content from other publishers. It took about six years to fully divest the display business, until 2015.

Microsoft Press Release, July 2012: While the aQuantive acquisition continues to provide tools for Microsoft’s online advertising efforts, the acquisition did not accelerate growth to the degree anticipated, contributing to the write-down [of $6.2 billion].

AdExchanger, June 29, 2015: AOL to Absorb Microsoft’s Display Ad Business Along with 1,200 Employees, Bing to Power all AOL Search

O’Kelley: AOL’s pitch to Microsoft was [to] let us rep everything. There was tension there. [AOL CEO] Tim Armstrong and I would see each other on the street corner “growling.” Microsoft wanted AOL to choose Bing as its search engine. That was a $1 billion deal. We couldn’t beat that. How could we possibly win? We had to massively overdeliver.

David Jacobs (SVP, sales & monetization partnerships, AOL): I’d give credit to Tim Armstrong, who really leaned in, and Bob Lord who pushed the deal through. It seemed like a good thing. … It was almost like a scale play. AOL and Huffington Post were relevant properties that had legs, but this created an opportunity to take brand sales to another level. Header bidding was not mature yet.

O’Kelley: I convinced Microsoft to give AOL [some] major markets, including Japan, the US and UK, and [to] give us the rest. The deal was good for everyone. We made Microsoft hundreds of millions. That was a $30 million revenue account for us. We ran with 17 [or] 18 “demand evangelists” providing a lightweight sales model. My nightmare was AOL would drive us out of the deal.

Jacobs: It happened at a time with a lot of moving parts. AOL acquired Millennial Media around then. I was in Dulles with some Microsoft people when the deal was about to be signed – and that same day the Verizon acquisition [of AOL] was announced. … There was a lot of change management happening. It allowed Microsoft to not have to support a display ad sales team.

Comic: Wonder Twins Ad Powers Activate!

O’Kelley: There were so many of those moments. That was constant. Google was selling against us. AOL was selling against us. I used every bit of leverage to keep from losing our biggest client.

Jacobs: While not core to the deal, we would have liked to get Microsoft inventory into our SSP [from AppNexus]. Eventually, we migrated the Microsoft display inventory over to AOL’s ad server.

AppNexus was acquired by AT&T in 2018 and became the foundation for Xandr – which, in a twist of programmatic irony, was acquired by Microsoft this summer.

Brian Lesser (CEO, Xandr): Clearly there was some value there that we created, because Microsoft could have bought a lot of things, and they bought Xandr. … I think Xandr is going to be great with Microsoft.

John Cosley (senior director, brand marketing, Microsoft Advertising): We have bold ambitions, including the innovations we’ll drive with Xandr now that the deal closed – [also] continued momentum with our PromoteIQ offering, Microsoft Audience Network solution, our new measurement partnership with Roku – and ongoing innovations and market expansion for our advertisers across our search and audience network.

O’Kelley: I have mixed feelings because I wanted Microsoft to be the buyer the first time around. It felt like the right home for the company.

So, it’s a little bittersweet that they end up there now. I would have wanted to work at Microsoft. … LinkedIn is a huge asset. Activision is big. Windows is free now. There’s search, gaming – amazing ad assets. It doesn’t seem crazy that they could be successful in the ad tech business.

To be continued …

What’s Going On with Digital Marketing & Ads

This lavishly illustrated article is based on a talk I gave not long ago at the ANA Masters of Data event in Orlando and at Salesforce Connections in Chicago. It could interest cats wanting an overview of the state of digital marketing and ads, with an emphasis on worry beads. As usual, if you’re already a genius, I have nothing to tell you.

Now if you’ve noticed more speed in the digital space in recent years, you’re right. Of course the pandemic raised the velocity of digitization, but the real change had happened before 2020. In a phrase: digital won.

share of US ad spend

Advertising is a proxy for attention, so the movement of ad spend into the digital realm is an expression of our virtual migration. The –verse is already meta: we’re living in the ether. Those of us who remember the early 2000s when digital was maybe 10-15% of ad spend at the most innovative shops, and most of that was search – well, we knew this would happen, but we’re surprised that it did.

There’s big money here, which is the best explanation for all the battles over IDs and OSs and privacy rights I can give you. On a related note, it’s also become an increasingly concentrated business. Pareto rules as 80% of the rewards go to just three companies: Alphabet (aka Google), Meta and Amazon. There’s a theory in market strategy called “The Rule of 3,” which is self-explanatory, and there’s some analogy to the Big 3 networks of the 1970s.

To be clear: it’s not all about the third-party cookie. Cookies were the CNS of programmatic ads and tactics like retargeting, but they’ve been ebbing out for years. Mobile apps don’t use them, nor does search. Apple’s Safari browser defaulted away from them a half-decade ago; only Google’s Chrome remains loyal, and as you know the sand’s running down there as well.

We’re already living in the ‘cookieless world.’ In VERY round numbers, here’s a rough breakdown of digital ad spend in the U.S.

breakdown of US ad spend

If you figure half of ad spend still goes to linear channels, then your perfectly proportioned big-media mover is likely allocating something less than 10% of her budget on cookied media.

Speaking of cookies – as I so often do, making me lethal at parties – we do have the pandemic partly to blame for what I like to call the Longest Death Scene in History. Google’s Chrome blog announced their departure in January 2020, and has subsequently extended the final flicker to some vague moment late in 2023 or beyond ….

So What’s Going On?

Let’s remember a concept called expected value. As we learned in business school, expected value is a product of (what something is worth) x (how likely you are to get it). So if the jackpot is $1 million and my odds are one in a million, that lottery ticket is worth $1 to me.

That’s how digital marketing works. Take an ad. When figuring how much to pay for it, the smart media planner will more or less think: (what is a positive outcome worth) x (how likely is it to happen)? In other words, they try to estimate what part of the audience will respond to the ad and what a “response” means in dollars.
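
For the arithmetically inclined, here is a minimal sketch of that calculation in Python. The lottery numbers come from the example above; the ad numbers (a $40 response, a 0.2% response rate) are invented for illustration, and the function name is my own.

```python
def expected_value(outcome_value, probability):
    """Expected value = (what something is worth) x (how likely you are to get it)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return outcome_value * probability


# The lottery ticket from the text: $1 million jackpot, one-in-a-million odds.
lottery = expected_value(1_000_000, 1 / 1_000_000)
print(f"Lottery ticket: ${lottery:.2f}")  # Lottery ticket: $1.00

# Hypothetical ad: a response is worth $40 and 0.2% of viewers respond.
ad = expected_value(outcome_value=40.0, probability=0.002)
print(f"One ad impression: ${ad:.3f}")  # One ad impression: $0.080
```

Under these assumed numbers, a planner would not want to pay more than about eight cents for the impression.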

This math is easier to describe than to do. How would you treat a car ad that might raise awareness but inspire few sales – at least, this year? You might know the value; what’s the P? But the principle abides, and it helps explain a phenomenon we can call the Late Night Cable Ad Experience.

Imagine you’re on your sofa and it’s 2 a.m. and you’re randomly scrolling through cable. Those ads are barely targeted at you at all. And what you’ll see – so I hear – are a lot of ads for medications and class-action lawsuits and for food and cleaning products. In other words, either very expensive or very common items.

Why? It’s the expected value. When messages can’t be targeted very well, they will default to those with a very high P (hit rate) or those with a very high value; that is, the mass-iest of the mass market stuff and things like lawsuits, where you could have one in a million respond and still pay for the campaign. As marketers lose the ability to target on the open web through data deprecation, every ad experience in the wild will converge on cable.
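
One way to see this is to turn the formula around and ask what hit rate an ad needs just to pay for itself. The sketch below assumes an impression costs half a cent (a $5 CPM) and uses made-up values per response; none of these figures come from real campaigns.

```python
def break_even_rate(cost_per_impression, value_per_response):
    """Smallest hit rate P at which P x value covers the cost of one impression."""
    if value_per_response <= 0:
        raise ValueError("value_per_response must be positive")
    return cost_per_impression / value_per_response


COST_PER_IMPRESSION = 0.005  # hypothetical: a $5 CPM

# Hypothetical values per response, chosen only for illustration.
offers = {
    "class-action lead": 5_000.0,
    "cleaning product": 2.0,
}

for name, value in offers.items():
    rate = break_even_rate(COST_PER_IMPRESSION, value)
    print(f"{name}: pays off if 1 in {round(1 / rate):,} viewers responds")
```

With these invented numbers, the lawsuit ad breaks even at one response in a million, while the cleaning product needs one in 400, which only a true mass-market item can expect from an untargeted audience.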

It’s my theory that most consumers only think they don’t like targeted ads. In truth, we’ve become so used to applied data for aiming and attributing messages that we’ve forgotten what it’s like to be anonymous. It’s not pretty. Nobody remembers the mid-1990s and the beginning of the internet, but it was full of irrelevant emails and ads.

Apple has an on-and-off relationship with media, but it’s decided that “privacy” is its brand and an explicit opt-in is required for any kind of cross-domain view. Its App Tracking Transparency (ATT) framework is only a year old but has had a major impact on mobile networks. The Financial Times made some noise with its second-half 2021 estimates of lost revenue, due to ATT:

  • Meta – $8B lost revenue
  • Snap – $600M lost
  • Twitter – $400M lost

One surprise winner in FT’s analysis was … Apple, whose paid search ads business in its App Store was a significant gainer.

How did this happen? Mobile apps don’t use cookies because they don’t run in browsers; they’re equipped with a mobile ad ID (MAID) which is persistent and unique at the level of the operating system. Before ATT, this MAID-based system was better than cookies because it required less cumbersome ‘synching.’ Mobile ad networks like the Facebook Audience Network (FAN) used it to target and measure in-app ads very well.

In fact, many digital-first businesses focused most of their media spend on the singular channel of Facebook/Instagram ads. Back in 2017, the New York Times magazine ran a story headlined: “How Facebook’s Oracular Algorithm Determines the Fates of Start-Ups.” It was about just how powerful Facebook ad targeting was.

A lot of these businesses used Shopify as their commerce platform. Not surprisingly – but rather dramatically – the roll-out of ATT also had an impact on Shopify. Well before the recent market meltdown, Shopify’s market cap was cut in half from its peak.

(h/t Eric Seufert)

At the RampUp event earlier this year, I saw a presentation from a large publisher estimating the impact on their ad prices due to ID loss. They used Safari’s cookie deprecation as a proxy for estimating what would happen when cookies disappeared in Chrome. The punch line was: down 50%.

As a general rule, we can say the cookie (or MAID) doubled the marketers’ ability to find a likely customer.

So Who Is Fixing This?

Some are trying, and some don’t think there’s anything to fix. A few years ago esteemed Apple privacy engineer John Wilander described the situation nicely in this thread on Twitter. (Note that his phrase “There may be problems worth solving that were previously solved with third-party cookies” reduces two decades of ad tech innovation to a triviality.)

Privacy Sandbox. You’re likely aware that the World Wide Web is tuned up by a large and largely anonymous volunteer army, technical types who hold long calls over years and process fixes large and small through committees into production. From its foundations in HTML and TCP/IP, protocols are what makes it a Web, after all.

Browsers are overseen by the World Wide Web Consortium (W3C) and its various Community Groups and Business Groups. This is where Google’s Chrome engineers and others bring their proposals for the post-cookie world, and they’re discussed more or less openly by Apple, Mozilla, Meta, and others. You’ve heard of the ‘Privacy Sandbox’ and FLOCs and FLEDGEs and so on; this is where they nest.

We started with anonymized cohorts generated by the browsers (FLOCs) … moved to a less detailed version of the same with more randomness (TOPICS) … and are now excited about publisher-defined cohorts. Basically, we’re left with publishers (and in one proposal, browser users, aka, us) labeling ourselves for targeting.

So far, we’ve learned what won’t work – but not much about what will. Tension is fundamental – and perhaps irreconcilable – between people who hold:

  • Theory A: Browsers and apps can collect some form of information safely, without explicit opt-in
  • Theory B: Opt-in should be required for everything

Although so much is in flux now, it’s entirely believable that Chrome and Safari, Android and iOS, Mozilla and Edge will all have different rules in the end. Debates are in terms that have not been defined. What is “privacy”? What is “consent”? What is love – baby don’t hurt me …?

On that penultimate point, not enough legislative chutzpah has been pointed at the language of the opt-in box itself. It seems to me more important than anything else. Theory B assumes we average humans are actually equipped to know (1) exactly what data is collected about us; and (2) exactly what ‘personalization’ looks like. I’m not so sure. (Look up the ‘privacy paradox’.)

Note that Apple’s language (on the right) for its own ad experience is somewhat more attractive than the frowsty boilerplate it issued for ATT (middle). One woman’s “tracking” is another one’s show of respect.

What Happens Now?

Digital marketing is a field that’s weathered teething and teen years, gone to college in a difficult political climate, and is old enough now to think about moving out of the basement. The first commercial browser, Netscape, was launched in 1995, making our Web 27 years old. Yes – definitely time to grow up.

Maturing is messy. There’s experimentation; fits and starts. We try one direction, go to Europe, and come home with a different look and outlook. Fundamentals apply; we’ve got a bright future. And we’ve got to play by the rules; we’ve got to adapt.

Marketers are adapting, fast. In my travels virtual and real these past few years, I’ve seen nothing but admiration for the challenge and a willingness to work. Stripping out the arbitrary vectors, what do we marketers need? I like the IAB’s framework, reducing requirements down to two categories: Addressability (finding people), and Accountability (measurement).

Both will always be possible; it’s their precision that’s in transit.

Some conclusions about the future seem reasonable to me. We can take these as likely hypotheses for planning:

  • User-level IDs will require opt-in to share (this includes IP address and maybe email)
  • Marketers have to get better at demonstrating data use
  • Publishers and people have some control over their labels (if they want to)
  • Measurement becomes a complex mesh of next-gen MMM and testing (see Analytic Partners)
  • First-party data builds competitive advantage

Yes, let’s talk about 1PD. First-party data isn’t new, nor is the idea of a ‘single view of the customer.’ What’s new are improvements in technology’s ability to support the V’s of big data (velocity, variety …) at reasonable rates. It makes sense to organize, harmonize and deduplicate your customer and prospect information, gathered with consent.

This is where the mighty Customer Data Platform comes to help you, and I’ve co-written a book with the multifaceted Chris O’Hara about this very topic (see B&N or Amazon: “Customer Data Platforms”).

It makes sense to try to collect more 1PD/0PD using increasingly inventive techniques.

And that’s what marketers are doing. Just ask them. I thought it was puissant that the IAB/Ipsos State of Data report this year named two ‘solutions’ to deprecation challenges: (1) gather more first-party data, and (2) analytics.

These answers seem right to me, but they’re also something of a holy rosary. We don’t have data: we’ll get more. We don’t know what to do with it: We’ll ask the machines. Of course, we all know it isn’t that easy, and I recall from my Gartner days that spending on marketing data science is generally rewarded. But the tone is defensive.

So what’s going on? We’ll stop here. The most mysterious impact on digital marketing’s future will come from forces we’ve barely mentioned: legislatures and mergers & acquisitions. A government could decide tomorrow that ad targeting is illegal and marketing is mind control.

Let’s hope for sanity. It always has a chance.

What’s Really Going on in the Privacy Sandbox?

The following column originally appeared in the mighty AdExchanger on Feb. 15, 2022

“I said ‘Hey, what’s going on?’” – 4 Non Blondes

Back in 1994, when a 23-year-old Netscape engineer single-handedly enabled third-party cookies by default, digital advertising was a $50 million business. Now it’s at $450 billion, and a lot more people are involved.

It seems they don’t agree – not just on technical issues, which can be solved, but on existential ones. Like: “Is ad targeting and measurement good for our society or not?” Or: “Is requiring a person to opt in to ‘tracking’ fair?” 

The Privacy Sandbox was launched by Google’s Chrome team in 2019 as a test bed for ideas. They chose to take their ideas to the World Wide Web Consortium (W3C). This step was not required; Apple and others regularly make changes to products on their own.

As the W3C’s hard-working counsel and strategy lead, Wendy Seltzer, admits: “We can’t force anyone to do anything. We look for places where we can help find consensus.”

And in the past month, there’s been a flurry of Sandbox-related announcements: a potential replacement for the FLoC proposal, in-market tests for measurement and attribution ideas, a new working group.

Amid all this excitement, we’d be forgiven for thinking we’re on the brink of adopting universal standards for ad targeting and measurement. Not quite. We’ve become so used to a splintered internet that the whole idea of a self-regulated World Wide Web with the same rules of engagement for everyone seems as quaint as “Do Not Track.”

Building castles in the sand

As a cooperative venture, the Web relies on the goodwill of participants to survive. The W3C and its nerdier cousin, the Internet Engineering Task Force (IETF), are certainly doing their jobs.

Despite what we think, advertising is only a small part of the W3C’s daily grind. (It almost never comes up at IETF meetings.) Sandbox ideas end up in the Improving Web Advertising Business Group (IWA-BG), the Privacy Community Group (PCG) or the Web Incubator Community Group (WICG). Only the first one is focused on ads. The IWA-BG has 386 registered participants, 62 more than the Music Notation Community Group but 14 fewer than the more-popular Interledger Payments Community Group.

The main work of the W3C members consists of responding to issues on GitHub and holding conference calls, which are fun to audit. They’re definitely overworked. Two weeks before last Halloween, a new group called the Private Advertising Technology Community Group (PAT-CG) launched with a lot of momentum. At the group’s first gathering, one participant made the obvious point: “Many of us are struggling to take active part in all the groups active in this space.”

Like most committees, these ones can inspire angst. Frustration could be felt in the Twitter screed of one of the PAT-CG’s champions: “The folks in this group are *hungry to make progress*.”

What is clear is that the pro-advertising contingent is fighting uphill. During a presentation to the IETF last year, a Google engineer describing the FLoC proposal felt the need to justify the project by citing academic studies about the economic impact of cookie loss on publishers. In the same meeting, an Apple engineer talking about Private Relay, which masks IP addresses (and can break things like time zone and fraud detection), felt no need to justify promoting “privacy.”

The trouble is – and this is the crux of the issue – there’s still no consensus here on a very important, foundational question: What is privacy?

There’s a team called the Technical Architecture Group (TAG) within the W3C drafting a set of “privacy principles.” These are still a work in progress with many stakeholders, and the W3C’s Seltzer said in a meeting last fall that “it’s a tough challenge to bring all those perspectives together.”

But the ultimate success of this draft or a related privacy threat model that would herd the privacy cats isn’t clear.

So, what happens now?

Given its limited objectives, the Sandbox is succeeding. The Chrome team has received a lot of feedback and is reacting. According to the latest updates, four proposals have completed or are currently in trials (Trust Tokens, FLoC, Core Attribution and First-Party Sets). At least two more will enter trials this year.

Results are mixed, but that is just how engineering works: blunt feedback and iterations. FLoC itself has flown through an initial test, a redirection and recent relaunch, and it has hatched a whole aviary of suggested improvements. Missing in all this is a promise of cross-browser, Web-wide solutions.

The impact of FLoC is instructive in another way – one that’s reminiscent of the “Do Not Track” experience. In the latter case, a member of the W3C working group, Ashkan Soltani, grew frustrated and ended up helping to draft the CCPA and CPRA regulations. (Soltani is now in charge of the California Privacy Protection Agency.)

Similarly, a vocal member of the W3C Privacy Sandbox, James Rosewell, drafted a complaint that, in part, led to Google’s agreement to cooperate with the UK’s Competition and Markets Authority. This agreement was accepted by the CMA just before Valentine’s Day, while a coalition of European publishers filed another complaint.

Seems like, in the end, the future of the cookie may just be worked out between the parties with the power here: Alphabet and the regulators.

Follow Martin Kihn (@martykihn) and AdExchanger (@adexchanger) on Twitter.

So, you think you want untargeted ads? Think again

This article first ran in The Drum on Jan. 10, 2022

Salesforce strategist Martin Kihn gives us a real-time glimpse into a cookieless future.

What does it look like to live in a universe – or a metaverse – where ads are noticeably less relevant? Is our collective user experience really any better than what we’ve got now?

To answer this question, I conducted an experiment. I visited some of my usual websites using two different browser setups: ‘Targeted’, with Google Chrome browser with cookies, location and IP address enabled; and ‘Untargeted’, with Safari browser on macOS Monterey with location tracking turned off, browsing history cleared, and all cookies (except first-party) disabled. I also enabled a new beta feature called Private Relay, obscuring my IP address, which can be used as a back-up ID when cookies aren’t present.

Then I took a cleansing breath, fired up the Safari browser and started surfing.

Our untargeted ad ‘FutureWorld’

Welcome to a web where nobody knows your name.

Dropping by Forbes, I’m immediately greeted by a sumptuous ad for a piece of beachfront real estate with spectacular views that do not remind me of my nearby Jones Beach, Long Island. ‘Own the Lifestyle,’ it tells me… unfortunately, that lifestyle is 1,300 miles away in South Beach.

Checking out a story about my man Matthew McConaughey, I see an ad for Toluna, which provides “agile consumer behavior tracking” for small and medium-sized business (SMB) owners (which I’m not) and a multi-paneled ad for Santa Teresa Rum. Now I don’t drink, but the article was about McConaughey’s whisky venture (not Santa Teresa), so I’m seeing some contextual targeting in action.

Stopping by BroadwayWorld.com for the latest on the Great White Way, I see an alarming ad with an older man knee-bracing a swollen limb under the headline “Bone on Bone?” Ouch. Swiss Air entices me to visit Venice and Florence… cities not actually on my Covid agenda, yet. YvesSaintLaurent lures me toward Black Opium, a perfume for women.

KnowYourMeme.com (a repo of meme info) flatters me with an ad for Oracle NetSuite and a call-to-action to download a white paper aimed at the chief financial officer (CFO), which I am not. I’m getting a suspicion these sites have somehow ID’d me as a business guy (true) and are trying out various roles (SMB? no… how about CFO?), but this fear is allayed by the next two ads, which I don’t understand: one for something called ‘MX KEYS MINI’ and another for a Basilisk v3 with “Full Spectrum Customizability,” which looks like a mouse powered by a tiny nuclear reactor. (It’s for gamers, which I’m not.)

Dropping by Adweek, I’m invited to explore DisneyTech, a job site for Disney (not looking)… and Swiss Air again, this time trying to get me to go to Switzerland, which is probably lovely this time of year.

Toddling back to Forbes to recheck some fine points in the McConaughey story, I enjoy different ads for Ralph Lauren eyewear, modeled by a woman who looks like her kale wilted; and Cosabella Petite 28A to Ultra Curvey 36L inviting me to feel great in “your everyday bralette,” a word I’ve never seen before.

Finally, I’ll mention that Taboola ‘outstream’ ads, at the bottom of the page, made up in entertainment value what they lacked in relevance. On CNBC.com I saw one with the headline “${city:capitalized} Seniors Are Living Good In These Incredible …”

Which is one way to deal with a lack of location data, I suppose.

Back in the normal

My future world is alarming and tragic. I feel as though these poor publishers are basically rolling a set of pixellated dice, hoping to interest me in something… anything. Almost none of the ads have a prayer of converting me to anything.

Going back to the familiar world, I prime the pump by visiting my boys at HugoBoss.com to check out the new line of Boss x NBA Knicks-branded athleisure, and of course the Cadillac Escalade 4WD Sport Platinum to wear it in, knowing full well what comes next.

Nor am I disappointed, feeling as though I am falling into a warm bath of relevance and recognition that is comfortingly repetitive, like Top 40 radio. Reader, I suddenly saw a lot of ads for Hugo Boss x NBA athleisure (although not for my Escalade, probably because supplies are limited these days).

Visiting Forbes again, I see ads for the Teaching Company (I’m a customer), the Joyce Theater (ditto) as well as ads for direct competitors of my employer and for my employer itself. Capital One rotated some ads touting their “ML for Causal Analysis,” which is something I actually understand. And there were ads for mid-cap stock funds and SurveyMonkey research instruments, both of which I’m considering.

Over on CNBC, I am flattered to see the site has obviously mistaken my browser for that of a much richer man: there is an inspiring banner urging me to ‘Own Your Sky,’ trying to sell me a jet.

At Adweek and BroadwayWorld and so on I notice a very familiar and similar ad experience, proving that programmatic advertising really does target the browser and not the publication. It works as advertised. Most of the ads are retargeted, some are competitors of brands I use, and others are just categorically appropriate things that people in my age and income bracket might buy (cars, funds, supplements).

Above all, it is a world that I recognize.

So, what did we learn about these colliding worlds?

My experiment is anecdotal, but it did surprise me in four ways:

Publishers aren’t adept at handling users with no IDs. There were far fewer contextual ads than expected and more low-awareness (and presumably low-bidding) advertisers filling space.

Retargeting is definitely overused. It has a role as a reminder and incentive to act but quickly devolves into negative returns for the brand.

We consumers are kidding ourselves if we think advertisers “track your every move” and know everything about us. If that were true, targeting would be a lot better than simply retargeting.

And finally, the untargeted experience is truly awful. Nobody could possibly want it: not advertisers, publishers or consumers. If it wins, the open web won’t have a chance.

For all concerned, there has got to be a compromise on the continuum of privacy and relevance. Let’s make that a New Year’s resolution.

Martin Kihn is senior vice president of strategy at Salesforce.

Can a Computer Write a Hallmark Holiday Movie?

The following post originally appeared on the NYC Data Science Academy blog on Sept 28, 2021. This project was my capstone submitted for my data science coursework. It was not sponsored, endorsed or even noticed – so far as I know – by the mighty Hallmark network.

As the holidays approach, many of us eagerly await a new crop of Hallmark Holiday movies: positive, reassuring, brightly-lit confections that are as sweet and reliable as gingerbread. Part of their appeal is a certain implicit formula — a woman in a stressful big city job goes home for the holidays, falls for a local working man, and realizes she’s missing out on life.

Small towns, evil corporations, a wise older woman … there are recurring motifs that made me wonder if I could apply machine learning to the plots of these movies to understand (1) what the formulas are; and (2) if a computer could write a Hallmark movie (for extra credit).

NLG has been tried on Christmas movies before, and the results were quite funny. Perhaps I could do better.

My initial hypothesis was that there are a certain (unknown) number of plot types that make up the Hallmark Holiday movie universe, and that these types could be determined from a structured analysis of plot summaries. Because the stories seemed formulaic, they could potentially be auto-generated using natural-language generation (NLG) methods, which were new to me.

Assembling the Data Set

Step one was to pull a list of titles. The Hallmark Channel has been running original movies with a Christmas theme for a quarter century, although the rate of production skyrocketed in 2015 as they became very popular. Pandemic production issues slowed the pipeline slightly in 2020, but the pace remains rapid.

Although 1995-2010 saw fewer than five original titles a year, the years 2018-2021 saw almost 40 each year. It’s quite a pipeline.

Luckily, there is a list of original Hallmark production titles on Wikipedia, which I was able to scrape using Scrapy. Holiday movies aren’t distinguished from others, so there was some manual selection in cutting the list. Once I had my titles, I was able to use the API for The Movie Database project (TMDB), which maintains information about films and TV shows, to pull the ‘official’ plot summaries.
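
For the curious, here is a rough sketch of what the TMDB lookup step might look like, using only the Python standard library. The search endpoint and the 'results' and 'overview' field names reflect my understanding of TMDB's v3 API and should be checked against its documentation; the function names are hypothetical, and this is not the exact script I ran.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed TMDB v3 search endpoint; verify against the current API docs.
TMDB_SEARCH_URL = "https://api.themoviedb.org/3/search/movie"


def build_search_url(title, api_key):
    """Build a title-search URL for one scraped movie title."""
    return TMDB_SEARCH_URL + "?" + urlencode({"api_key": api_key, "query": title})


def first_overview(payload):
    """Return the plot summary of the first search hit, or None if there is none."""
    results = payload.get("results", [])
    return results[0].get("overview") if results else None


def fetch_overview(title, api_key):
    """Network call: needs a valid TMDB API key, so it is not run here."""
    with urlopen(build_search_url(title, api_key)) as response:
        return first_overview(json.load(response))


# Hand-written stand-in for an API response, not real TMDB output.
sample = {"results": [{"title": "Christmas Town", "overview": "Lauren Gabriel leaves everything behind in Boston ..."}]}
print(first_overview(sample))
```

Taking the first hit is a shortcut; titles with remakes or near-duplicates would need a check on year or network.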

There were 260 plot summaries in my corpus. The summaries ranged in length and style, and their lack of standardization and detail caused some challenges in the analysis. However, short of watching all the movies and building my own summaries, the TMDB summaries (which were provided by the network, I assume) were my data.

My intended audience was writers, producers and TV execs who want to understand how the Hallmark Holiday genre works and the elements of a successful production slate. These popular movies could also be used to inform other narrative projects with similar valence.

Of the 260 summaries, all but two were Christmas movies. Many summaries were disappointingly brief and generic, but many were better than that. There were about 15,000 words in total in the final data set.

Here’s a typical example of a summary for the film “Christmas Town,” starring the adorable Hallmark-ubiquitous Candace Cameron Bure:

Lauren Gabriel leaves everything behind in Boston to embark on a new chapter in her life and career. But an unforeseen detour to the charming town of Grandon Falls has her discover unexpected new chapters – of the heart and of family – helping her to embrace, once again, the magic of Christmas.

Over the years, the stories and themes of the Hallmark Holiday films changed, as the network nosed around and then settled on a set of typed tropes. For example, earlier films used Santa as a character more often and spirits as worthy guides for the heroine. By 2015 or so, Hallmark had found its soul: small towns, high-school boyfriends, family businesses threatened by big corps, and so on.

Feature Engineering

After lemmatizing and tokenizing, removing stopwords and other standard text preprocessing, I realized that the corpus would have to be standardized to gain insight into its themes and to provide training data for any NLG model. For example, the summaries had names for characters, but those names didn’t matter to me – I just cared that it was <MALE> or <FEMALE> (for the main characters), or <CHILD> or <SIBLING> or <PARENT> or <GRANDPARENT> with respect to the main character. Often there was also a <BOSS>.

(If you’re curious, the most common names for characters mentioned in the corpus were: Jack, Nick and Chris.)

Likewise, towns were often named, but my only interest was that it was a <SMALLTOWN>, or (in the case of those bustling metropolises our heroines liked to leave in the beginning of the story) <BIGCITY>. And the evil big corporation might be named, but I wanted to tokenize it as <BIGCORP>.

Note the <BRACKETS> which would indicate to the model that these were tokens rather than the words originally in the corpus. How to make the substitutions without a lot of manual lookups?

I ended up using spaCy to tag the text. Although it requires some computer cycles, spaCy is a great NLP library that tags each word with its part of speech and labels named entities such as places and people. The tags themselves are then accessible to a Python script as part of a dictionary-lookup substitution.

In the case of character names, I was able to tag them using spaCy and then run them through Genderize to get a likely gender. This doesn’t always work, as viewers of “It’s Pat” on Saturday Night Live know, but a quick scan let me correct mistakes.

I could also automate much of the <TOKENIZATION> using dictionary substitutions. For example, I could find instances of “L.A.” and “Los Angeles” and “New York City” and so on and substitute <BIGCITY>. However, a careful manual check was needed to do some cleanup.
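A minimal sketch of that kind of dictionary substitution, using only Python's standard library. The lookup table and the function names are illustrative assumptions, not the actual list or script used for this project:

```python
import re

# Hypothetical lookup table: surface form -> placeholder token.
# These entries are illustrative only.
SUBSTITUTIONS = {
    "Los Angeles": "<BIGCITY>",
    "L.A.": "<BIGCITY>",
    "New York City": "<BIGCITY>",
    "Boston": "<BIGCITY>",
    "Grandon Falls": "<SMALLTOWN>",
}

def build_pattern(table):
    # Try longer phrases first so a long name is never split by a shorter one.
    phrases = sorted(table, key=len, reverse=True)
    alternation = "|".join(re.escape(p) for p in phrases)
    # Lookarounds instead of \b, because "L.A." ends in a non-word character.
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)")

def substitute(text, table=SUBSTITUTIONS):
    # Replace every known surface form with its placeholder token.
    pattern = build_pattern(table)
    return pattern.sub(lambda m: table[m.group(0)], text)

print(substitute("She left L.A. for Grandon Falls."))
```

Anything the table misses (nicknames, misspellings, towns that appear only once) still needs the manual pass described above.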

In the end, I had a corpus of 260 plots with major character, location and relationship types <TOKENIZED>.

Frequency Analysis & Topic Modeling

Word frequencies were high for terms such as ‘family’, ‘child’, ‘help’, ‘love’, ‘parent’, ‘small town’. This agreed with my personal memories of the films: an abiding emphasis on families, home towns, and positive mojo.

Bigrams and trigrams (common two- and three-word combos) uncovered even more of the Hallmark spirit than word frequencies. Among bigrams, the most common were ‘high school’, ‘fall love’, and ‘return hometown’. Common trigrams were ‘high school sweetheart’, ‘old high school’ and ‘miracle really happens’.
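Counting n-grams takes only a few lines. The sketch below assumes each summary has already been lemmatized and stripped of stopwords (which is why a bigram like ‘fall love’ can appear); the two toy documents and the function names are made up for illustration:

```python
from collections import Counter

def ngrams(tokens, n):
    # Slide a window of n tokens across the list.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def top_ngrams(docs, n, k=3):
    # Count n-grams within each document so windows never span two summaries.
    counts = Counter()
    for tokens in docs:
        counts.update(ngrams(tokens, n))
    return counts.most_common(k)

# Toy, already-preprocessed summaries (illustrative only).
docs = [
    ["return", "hometown", "high", "school", "sweetheart"],
    ["old", "high", "school", "sweetheart", "fall", "love"],
]

print(top_ngrams(docs, 3, k=1))
```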

It is possible just to look at the common trigrams and get a very good feel for the alternate reality that is the mini-metaverse of Hallmark Holiday films.

The heart of my NLP analysis consisted of LDA topic modeling, using the Gensim library. Latent Dirichlet Allocation (LDA) is a statistical method that takes a group of documents (in our case, plot summaries) and models them as a group of topics, with each word in the document attached to a topic. It finds terms that appear together (frequency) and groups them into “topics” which can be present to a greater or lesser degree in each particular document.

LDA is often used for categorizing technical and legal documents, and I thought it could also be used to find the different holiday themes I detected in the plot summaries.

First, I did a grid search for parameters using ‘coherence score’ as the target variable to maximize. The purpose of this search was to find a likely number of distinct topics, or plot types. I guessed there were 5-10, and this hyperparameter tuning exercise indicated that 8 topics was the best fit.

Training the topic model on the plot summaries, I generated 7-8 distinct topics, with some overlap in words, as expected. These topics were analyzed using pyLDAvis, which allows for interactively probing the topics and changing some parameters to make them easier to interpret. (Figure 4 shows the pyLDAvis interactive view.)

Here some manual work was needed; call it ‘domain knowledge’ (i.e., watching the movies). I tagged the plots with the topics and focused on those that clearly fell into one topic or another. I then came up with a rough summary of these plots and gave that ‘theme’ a name. The manual tagging was needed because the theme name itself often didn’t actually appear in the summaries.
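One simple way to pick out the plots that “clearly fell into one topic” is to keep only those where a single topic carries most of the weight. This is a sketch, not the exact procedure I followed: the 0.6 cutoff and the function name are assumptions, and the input is assumed to be one list of topic weights per plot, as a topic model would produce.

```python
def dominant_topic(weights, min_share=0.6):
    """Return the index of the top topic, or None if no topic clearly dominates."""
    best = max(range(len(weights)), key=weights.__getitem__)
    return best if weights[best] >= min_share else None

# Illustrative topic weights for two plots (three topics each).
print(dominant_topic([0.1, 0.8, 0.1]))     # one clear topic
print(dominant_topic([0.4, 0.35, 0.25]))   # mixed plot, no clear topic
```

Plots that come back as None are the mixed cases, which are better set aside when naming the themes.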

The 8 Types of Hallmark Holiday Movies

The 8 themes I ended up identifying, along with my own names and sketches, were:

  1. SETBACK: Disappointed in work/love, a woman moves to a small town to heal/inherit
  2. BOSS: A cynical businessman hires a spunky woman for a holiday-related reason (like planning a party)
  3. MIXUP: A travel mixup/storm forces some incompatible people to work together
  4. ALT-LIFE: A wish upon Santa/spirit is granted and a woman is shown an alternative life — often, this involves time travel
  5. TAKEOVER: A big corporation threatens a family-run business in a small town
  6. RIVALS: Two seemingly incompatible rivals are forced to work together for some goal
  7. IMPOSTER: Dramatic irony: Someone lies about who they are — or gets amnesia and doesn’t know who they are
  8. FAMILY/CRISIS: A woman is forced to return home because of a family crisis

As usual with LDA, there was some overlap among the themes. In particular, #1 co-occurred with others often; it started the story moving. For example, the heroine might suffer a SETBACK at work which encourages her to go back home (#1), and she encounters a MIXUP on the way (#3) that lands her in a delightful small town (this is the plot of “Christmas Town”).

Interestingly, when I looked at the distribution of themes over the course of the Hallmark seasons, they were fairly evenly present. This made me think the producers at the network are well aware of these themes and seek to balance them to avoid repetition.

Text Generation Using Markov Chains, LSTM and GPT-2

As an experiment, I looked at three different methods of generating text, the idea being to use the plots as training data for a model that would generate an original plot in the style of the others. Text generation or NLG is an emerging field that has made amazing strides in recent years – as I discovered – and has developed uncanny capabilities. I was only able to scratch the surface in my work.

I began with traditional text generation methods, which were hardly magical.

Markov Chains were the most intuitive: they use the corpus and predict the next word (word-by-word) based on a distribution of the next words seen in the training data. Because it’s at the word-by-word level (not larger chunks of text) — at least, the way I implemented it — the results were coherent only in very small sequences. Overall, it didn’t work to put together sentences or stories that made sense.

Figure 6 shows a few examples of text generated in this way.
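For the curious, a word-level Markov chain generator can be sketched in a few lines of standard-library Python. This is a minimal first-order version (each word depends only on the one before it), not necessarily identical to my implementation; the function names and the toy training text are illustrative:

```python
import random
from collections import defaultdict

def build_chain(tokens):
    # Map each word to the list of words seen right after it.
    # Repeats are kept, so sampling follows the observed frequencies.
    chain = defaultdict(list)
    for current, following in zip(tokens, tokens[1:]):
        chain[current].append(following)
    return chain

def generate(chain, start, length=15, seed=None):
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:
            break  # dead end: this word never had a successor in training
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)

# Toy training text (illustrative only).
tokens = "a woman returns home and a woman finds love and a miracle".split()
chain = build_chain(tokens)
print(generate(chain, "a", length=10, seed=1))
```

Because each choice looks back only one word, the output drifts quickly, which matches what I saw: short runs read fine, whole sentences do not.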

Long Short-Term Memory (LSTM) is a form of recurrent neural network (RNN). LSTMs were created to solve the RNN’s long-term memory problem: plain RNNs tend to forget earlier parts of a sequence (e.g., of text) due to vanishing gradients. They also make predictions at the word level based on weights derived in the training stage.

Training was done over 6 epochs and 100-character sequences using ‘categorical cross-entropy’ as the loss function. It took about two hours on my average-powered setup, so it’s time-intensive. Longer training might have improved the disappointing results. (See Figure 7.)

Frankly, LSTM was a misfire. It required a great deal of training, and although I did train for a few hours, my results were coherent only for a short stretch (half a sentence) of text. More training might have helped, but I was more interested in moving on to ‘transformers’, the current state of the art for NLG.

GPT-2 is an open-source transformer model released by OpenAI. It’s pretrained on vast amounts of text data, giving it a very good basic model of English text. (GPT-3, which is much better at NLG, is not available open source and I could not get access.) Fine-tuning GPT-2 on the plots, I was able to ‘direct’ it toward my particular genre. The results were much more coherent than the other methods, while still falling short of useful new plots. (See Figure 8.)

To implement, I used the Transformers library from Hugging Face (on PyTorch), with a model pretrained on data from the web. I trained the model for 50 epochs in batches of 32 characters (about 6 words).

Clearly, transformers are the way forward with NLG. GPT-3 has generated a lot of excitement in the past year or so, and its ability to create original, human-readable text in a wide range of genres is astonishing. The state of the art could create a Hallmark movie plot already, and this tool will only get better as GPT-4 and other transformer models appear.

Conclusions

My hypothesis that Hallmark holiday movies tend to cluster around a set of common plots was validated. Specifically, I found:

  1. Hallmark Holiday movies have a consistent set of themes: small towns, families, career setbacks, old boyfriends, spirits and wishes
  2. Analyzing the text required standardization to avoid missing themes: man/woman/small town, etc.
  3. LDA topic modeling worked fairly well in identifying 7-8 key topics, with some overlap
  4. NLG yielded inconsistent results, with the pre-trained transformer model living up to its reputation as a leap forward

An additional analysis I’d like to do is to examine ‘plots’ as a time series: they are a sequence of events that happen in order. Adding the step-by-step flow would be an intriguing exercise.

Have a great holiday — and enjoy the movies!

Yet another new podcast? Yes!

After many years of shiftless planning and a listless lockdown, I finally put the pixel to the pointer and started a podcast! My friend Jill Royce and I co-host a weekly in-depth interview with one of the founding or influential figures in the first twenty years of advertising (and marketing) technology. That’s 1995-2015 or so … a time of tremendous innovation, excitement, ambition, posturing and fraud … a deranged double decade. So far, most of the people we’ve asked have agreed to join us — although we just started.

I’ve been touched by the support we’ve received from people who (like me) find the history of this much-maligned and underappreciated industry so fascinating. Check us out on Apple Podcasts and Spotify.

Our website is here.

And by the way — the show is called “PALEO AD TECH”

Let me know what you think! martykihn at gmail