Back to the Future: An Oral History of Microsoft & Ads

The following is an article I wrote that appeared in the mighty AdExchanger on July 19, 2022.

Martin Kihn, AdExchanger Contributor

This article is based on interviews with participants. It was inspired by Microsoft’s supposedly surprising selection as Netflix’s ad tech partner. Coming on the heels of the acquisition of AT&T’s Xandr, that deal is just the latest chapter in a breathtaking adventure of pivots, write-downs, partnerships and potential.

In the beginning were these words …

Bill Gates:

The future of advertising is the internet.

The occasion was the IAB Engage conference in London in 2005. At the time, Microsoft had MSN, an ad network and content deals with Fox, NBC and others. But it was focused on one particular upstart in Mountain View. Having lost a bid to acquire Overture, Microsoft launched its own search engine, originally code-named Project Moonshot.

Jed Nahum (director, product management, Microsoft adCenter): Google made about two times what we made on each keyword. We had this functionality which enabled you to bid for age and gender on top of keywords for search. It was our differentiator – but it wasn’t enough.

Eric Picard (director, ad tech strategy, Microsoft): Microsoft was focused on search, but Bill Gates recognized it was bigger – that ads could be another MS Office or Windows-sized business. We looked at investments in Xbox and PC gaming, video ads on Microsoft TV and Media Player and MSN Video. We looked at ads in Office. Around this time, Brian Burdick wrote a paper … that basically invented RTB.

Brian Burdick (principal group program manager, adCenter): In 2005, a couple people on my team and I wrote a proposal for an Online Listings Exchange. … We were piloting a contextual ad program that competed with [Google’s] AdSense. Microsoft had deals for content controlled by a premium display system. I realized on a drive home from work one day that if the revenue per impression between the contextual and premium systems was materialized in real time, any external third party could also participate.

Nahum: The insight of Brian’s paper was basically that what ad networks need is the User ID – like a cookie, IP header info, and a URL that corresponds to the context and location of the ad. If we could pass those three things to ad networks, they could evaluate on an impression-by-impression basis.

Burdick: Gates was super-bullish in the meeting. He had a bunch of comments. He said, “This is bold and ambitious and something we should do.” … Eventually, a lot of other teams wanted to piggyback on the idea, and our ask was for hundreds of engineers. It didn’t get approved.

Microsoft also took a look at Right Media, a pioneering exchange that allowed ad networks to bid on one another’s inventory. That meeting didn’t go so well.

Brian O’Kelley (CTO, Right Media): We went in to Microsoft to talk with a Technical Fellow. He put us through the wringer. I remember he asked us, “How many man-years did it take you to build the platform?” I said, “You’re missing the point. It’s liquidity you’re buying as much as technology.” Back then, Microsoft had swagger. I came away from Redmond feeling they were arrogant.

Right Media was later acquired by Yahoo, and Microsoft set its sights on another target, then owned by private equity firm Hellman & Friedman.

Nahum: Hellman & Friedman pitched DoubleClick to us. On my team, we were f*ing terrified. We understood the value of DoubleClick and what it would mean if it went to Google. After a low bid from Microsoft, Google and DoubleClick went into a quiet period. … We were very depressed. … Steve [Ballmer] quietly bid $3 billion, but Google threw in another $100 million to shut down the dalliance. We were left feeling burned. We were in a situation where we had to get a competitor to DoubleClick.

Picard: We left that meeting where we lost DoubleClick, and a week later Steve [Ballmer] had me and a few others in the room. He says, “This is like that scene in Animal House where Belushi rallies the troops.” And he says, “Okay, we lost DoubleClick – what else we got?”

Microsoft ended up buying aQuantive – including the agency Razorfish, the DrivePM ad network and the Atlas ad server – for $6.3 billion. On the same day, it acquired the AdECN exchange for, reportedly, somewhere between $50 million and $75 million. Bill Urschel and a rising star named Jeff Green ran AdECN.

Bill Urschel (co-founder, AdECN): They bought us and it happened pretty quickly. It was at an Ad:Tech [event] … Eric Picard and Jed Nahum came by our booth and asked all kinds of interesting questions.

Picard: We walked up and started chatting. We talked about what they’d built – it was interesting. Jed, [Microsoft GM] Joe Doran, Bill, Jeff and I had a fancy dinner and got along well. We were kindred spirits.

Jeff Green (COO, AdECN): Everyone wanted to see Microsoft do well. The AdECN strategy was to get Yahoo and AOL to join up and create a pool of liquidity that rivaled Google. But aQuantive was saying, you know, our ad server [Atlas] is better. Let’s combine it with Microsoft tech and build the world’s biggest ad network.

Comic: Microsoft Leaves The Party

Burdick: DrivePM was the internal ad network aQuantive ran for Microsoft, and it had more than 40% margin. [They] put the head of strategy of aQuantive in charge of strategy for Microsoft. … They eventually came around to the exchange model, but not in the beginning. There was resistance to the exchange, putting margins at risk.

Boris Mouzykantskii (founder, IPONWEB): I think AdECN had a chance to test real-time bidding in the market. It never happened. It’s possible, if they’d done it, Microsoft would be AdX.

Urschel: After the acquisition, on the Microsoft side there were some brilliant people who saw a vision of a bigger exchange, but they were essentially drowned out. The cash at the time was flowing from the aQuantive business, so I don’t think the exchange business ever got a serious look and didn’t get the resources.

Burdick: I went down to be CTO of AdECN. … We built the first real-time bid exchange. But between the aQuantive people and our VP, they would not let us go outside [Microsoft] for inventory. The reasons are murky to me. They just didn’t greenlight it.

Meanwhile, Brian O’Kelley had started AppNexus, originally a cloud hosting platform that became an SSP and exchange.

Brian O’Kelley (co-founder, AppNexus): My pitch to Microsoft was that they can’t fight Google in search and display. Let us be the market maker, make us the dominant exchange platform. But that would only work if you put the whole heft of Microsoft behind it – all MSN inventory – [and] make everyone buy through us. I spent a lot of time in Bellevue and got a mind meld [for] how we could beat Google. It was an incredibly strategic conversation about the future of the internet, not just about product.

Picard: I introduced Brian to [Microsoft Ads exec] Rik van der Kooi. I said, “If we’re not going to be allowed to build this internally, it’s not a bad thing to invest in another company that’s a credible competitor to Google’s ad exchange.”

O’Kelley: We made a deal where [Microsoft] gave us inventory and they got one-third of the company. … Exclusive inventory from one of the top five publishers. Over time, we delivered. Some things were not successful, like a Windows Phone integration, but Microsoft was the first fully programmatic major seller.

Nahum: After the AppNexus deal was done, we branded our instance of it as MAX [the Microsoft Advertising Exchange]. We couldn’t get the aQuantive guys to put inventory in AdECN, but we were able to put it into the waterfall after direct sales for AppNexus. Immediately it started making money. My team launched 35 markets internationally. We sold to aggregators of demand, to DSPs, agencies and trading desks.

Picard: It was really bittersweet. The day I decided to leave, I was in a meeting with Ballmer. He said, “I want us to shut MSN down, divest all the non-search ad business [and] the exchange and double down on paid search.” Ultimately, the team convinced him MSN was too critical – but the strategy shifted from editorial to being a portal with content from other publishers. It took about six years to fully divest the display business, until 2015.

Microsoft Press Release, July 2012: While the aQuantive acquisition continues to provide tools for Microsoft’s online advertising efforts, the acquisition did not accelerate growth to the degree anticipated, contributing to the write-down [of $6.2 billion].

AdExchanger, June 29, 2015: AOL to Absorb Microsoft’s Display Ad Business Along with 1,200 Employees, Bing to Power all AOL Search

O’Kelley: AOL’s pitch to Microsoft was [to] let us rep everything. There was tension there. [AOL CEO] Tim Armstrong and I would see each other on the street corner “growling.” Microsoft wanted AOL to choose Bing as its search engine. That was a $1 billion deal. We couldn’t beat that. How could we possibly win? We had to massively overdeliver.

David Jacobs (SVP, sales & monetization partnerships, AOL): I’d give credit to Tim Armstrong, who really leaned in, and Bob Lord who pushed the deal through. It seemed like a good thing. … It was almost like a scale play. AOL and Huffington Post were relevant properties that had legs, but this created an opportunity to take brand sales to another level. Header bidding was not mature yet.

O’Kelley: I convinced Microsoft to give AOL [some] major markets, including Japan, the US and UK, and [to] give us the rest. The deal was good for everyone. We made Microsoft hundreds of millions. That was a $30 million revenue account for us. We ran with 17 [or] 18 “demand evangelists” providing a lightweight sales model. My nightmare was AOL would drive us out of the deal.

Jacobs: It happened at a time with a lot of moving parts. AOL acquired Millennial Media around then. I was in Dulles with some Microsoft people when the deal was about to be signed – and that same day the Verizon acquisition [of AOL] was announced. … There was a lot of change management happening. It allowed Microsoft to not have to support a display ad sales team.

Comic: Wonder Twins Ad Powers Activate!

O’Kelley: There were so many of those moments. That was constant. Google was selling against us. AOL was selling against us. I used every bit of leverage to keep from losing our biggest client.

Jacobs: While not core to the deal, we would have liked to get Microsoft inventory into our SSP [from AppNexus]. Eventually, we migrated the Microsoft display inventory over to AOL’s ad server.

AppNexus was acquired by AT&T in 2018 and became the foundation for Xandr – which, in a twist of programmatic irony, was acquired by Microsoft this summer.

Brian Lesser (CEO, Xandr): Clearly there was some value there that we created, because Microsoft could have bought a lot of things, and they bought Xandr. … I think Xandr is going to be great with Microsoft.

John Cosley (senior director, brand marketing, Microsoft Advertising): We have bold ambitions, including the innovations we’ll drive with Xandr now that the deal closed – [also] continued momentum with our PromoteIQ offering, Microsoft Audience Network solution, our new measurement partnership with Roku – and ongoing innovations and market expansion for our advertisers across our search and audience network.

O’Kelley: I have mixed feelings because I wanted Microsoft to be the buyer the first time around. It felt like the right home for the company.

So, it’s a little bittersweet that they end up there now. I would have wanted to work at Microsoft. … LinkedIn is a huge asset. Activision is big. Windows is free now. There’s search, gaming – amazing ad assets. It doesn’t seem crazy that they could be successful in the ad tech business.

To be continued …

What’s Really Going on in the Privacy Sandbox?

The following column originally appeared in the mighty AdExchanger on Feb. 15, 2022.

“I said ‘Hey, what’s going on?’” – 4 Non Blondes

Back in 1994, when a 23-year-old Netscape engineer single-handedly enabled third-party cookies by default, digital advertising was a $50 million business. Now it’s at $450 billion, and a lot more people are involved.

It seems they don’t agree – not just on technical issues, which can be solved, but on existential ones. Like: “Is ad targeting and measurement good for our society or not?” Or: “Is requiring a person to opt in to ‘tracking’ fair?” 

The Privacy Sandbox was launched by Google’s Chrome team in 2019 as a test bed for ideas. They chose to take their ideas to the World Wide Web Consortium (W3C). This step was not required; Apple and others regularly make changes to products on their own.

As the W3C’s hard-working counsel and strategy lead, Wendy Seltzer, admits: “We can’t force anyone to do anything. We look for places where we can help find consensus.”

And in the past month, there’s been a flurry of Sandbox-related announcements: a potential replacement for the FLoC proposal, in-market tests for measurement and attribution ideas, a new working group.

Amid all this excitement, we’d be forgiven for thinking we’re on the brink of adopting universal standards for ad targeting and measurement. Not quite. We’ve become so used to a splintered internet that the whole idea of a self-regulated World Wide Web with the same rules of engagement for everyone seems as quaint as “Do Not Track.”

Building castles in the sand

As a cooperative venture, the Web relies on the goodwill of participants to survive. The W3C and its nerdier cousin, the Internet Engineering Task Force (IETF), are certainly doing their jobs.

Despite what we think, advertising is only a small part of the W3C’s daily grind. (It almost never comes up at IETF meetings.) Sandbox ideas end up in the Improving Web Advertising Business Group (IWA-BG), the Privacy Community Group (PCG) or the Web Incubator Community Group (WICG). Only the first one is focused on ads. The IWA-BG has 386 registered participants, 62 more than the Music Notation Community Group but 14 fewer than the more-popular Interledger Payments Community Group.

The main work of the W3C members consists of responding to issues on GitHub and holding conference calls, which are fun to audit. The members are definitely overworked. Two weeks before last Halloween, a new group called the Private Advertising Technology Community Group (PAT-CG) launched with a lot of momentum. At the group’s first gathering, one participant made the obvious point: “Many of us are struggling to take active part in all the groups active in this space.”

Like most committees, these ones can inspire angst. Frustration could be felt in the Twitter screed of one of the PAT-CG’s champions: “The folks in this group are *hungry to make progress*.”

What is clear is that the pro-advertising contingent is fighting uphill. During a presentation to the IETF last year, a Google engineer describing the FLoC proposal felt the need to justify the project by citing academic studies about the economic impact of cookie loss on publishers. In the same meeting, an Apple engineer talking about Private Relay, which masks IP addresses (and can break things like time zone and fraud detection), felt no need to justify promoting “privacy.”

The trouble is – and this is the crux of the issue – there’s still no consensus here on a very important, foundational question: What is privacy?

There’s a team called the Technical Architecture Group (TAG) within the W3C drafting a set of “privacy principles.” These are still a work in progress with many stakeholders, and the W3C’s Seltzer said in a meeting last fall that “it’s a tough challenge to bring all those perspectives together.”

But whether this draft, or a related privacy threat model that would herd the privacy cats, will ultimately succeed isn’t clear.

So, what happens now?

Given its limited objectives, the Sandbox is succeeding. The Chrome team has received a lot of feedback and is reacting. According to the latest updates, four proposals have completed or are currently in trials (Trust Tokens, FLoC, Core Attribution and First-Party Sets). At least two more will enter trials this year.

Results are mixed, but that is just how engineering works: blunt feedback and iterations. FLoC itself has flown through an initial test, a redirection and recent relaunch, and it has hatched a whole aviary of suggested improvements. Missing in all this is a promise of cross-browser, Web-wide solutions.

The impact of FLoC is instructive in another way – one that’s reminiscent of the “Do Not Track” experience. In the latter case, a member of the W3C working group, Ashkan Soltani, grew frustrated and ended up helping to draft the CCPA and CPRA regulations. (Soltani is now in charge of the California Privacy Protection Agency.)

Similarly, a vocal member of the W3C Privacy Sandbox, James Rosewell, drafted a complaint that, in part, led to Google’s agreement to cooperate with the UK’s Competition and Markets Authority. This agreement was accepted by the CMA just before Valentine’s Day, while a coalition of European publishers filed another complaint.

Seems like, in the end, the future of the cookie may just be worked out between the parties with the power here: Alphabet and the regulators.

Follow Martin Kihn (@martykihn) and AdExchanger (@adexchanger) on Twitter.

So, you think you want untargeted ads? Think again

This article first ran in The Drum on Jan. 10, 2022

Salesforce strategist Martin Kihn gives us a real-time glimpse into a cookieless future.

What does it look like to live in a universe – or a metaverse – where ads are noticeably less relevant? Is our collective user experience really any better than what we’ve got now?

To answer this question, I conducted an experiment. I visited some of my usual websites using two different browser setups: ‘Targeted’, the Google Chrome browser with cookies, location and IP address enabled; and ‘Untargeted’, the Safari browser on macOS Monterey with location tracking turned off, browsing history cleared, and all cookies (except first-party) disabled. I also enabled a new beta feature called Private Relay, obscuring my IP address, which can otherwise be used as a back-up ID when cookies aren’t present.

Then I took a cleansing breath, fired up the Safari browser and started surfing.

Our untargeted ad ‘FutureWorld’

Welcome to a web where nobody knows your name.

Dropping by Forbes, I’m immediately greeted by a sumptuous ad for a piece of beachfront real estate with spectacular views that do not remind me of my nearby Jones Beach, Long Island. ‘Own the Lifestyle,’ it tells me… unfortunately, that lifestyle is 1,300 miles away in South Beach.

Checking out a story about my man Matthew McConaughey, I see an ad for Toluna, which provides “agile consumer behavior tracking” for small and medium-sized business (SMB) owners (which I’m not) and a multi-paneled ad for Santa Teresa Rum. Now I don’t drink, but the article was about McConaughey’s whisky venture (not Santa Teresa), so I’m seeing some contextual targeting in action.

Stopping by for the latest on the Great White Way, I see an alarming ad with an older man knee-bracing a swollen limb under the headline “Bone on Bone?” Ouch. Swiss Air entices me to visit Venice and Florence… cities not actually on my Covid agenda, yet. Yves Saint Laurent lures me toward Black Opium, a perfume for women. (a repo of meme info) flatters me with an ad for Oracle NetSuite and a call-to-action to download a white paper aimed at the chief financial officer (CFO), which I am not. I’m getting the suspicion these sites have somehow ID’d me as a business guy (true) and are trying out various roles (SMB? no… how about CFO?), but this fear is allayed by the next two ads, which I don’t understand: one for something called ‘MX KEYS MINI’ and another for a Basilisk v3 with “Full Spectrum Customizability,” which looks like a mouse powered by a tiny nuclear reactor. (It’s for gamers, which I’m not.)

Dropping by Adweek, I’m invited to explore DisneyTech, a job site for Disney (not looking)… and Swiss Air again, this time trying to get me to go to Switzerland, which is probably lovely this time of year.

Toddling back to Forbes to recheck some fine points in the McConaughey story, I enjoy different ads for Ralph Lauren eyewear, modeled by a woman who looks like her kale wilted; and Cosabella Petite 28A to Ultra Curvey 36L inviting me to feel great in “your everyday bralette,” a word I’ve never seen before.

Finally, I’ll mention that Taboola ‘outstream’ ads, at the bottom of the page, made up in entertainment value what they lacked in relevance. On I saw one with the headline “${city:capitalized} Seniors Are Living Good In These Incredible …”

Which is one way to deal with a lack of location data, I suppose.

Back in the normal

My future world is alarming and tragic. I feel as though these poor publishers are basically rolling a set of pixelated dice, hoping to interest me in something… anything. Almost none of the ads have a prayer of converting me to anything.

Going back to the familiar world, I prime the pump by visiting my boys at to check out the new line of Boss x NBA Knicks-branded athleisure, and of course the Cadillac Escalade 4WD Sport Platinum to wear it in, knowing full well what comes next.

Nor am I disappointed, feeling as though I am falling into a warm bath of relevance and recognition that is comfortingly repetitive, like Top 40 radio. For reader, I suddenly saw a lot of ads for Hugo Boss x NBA athleisure (although not for my Escalade, probably because supplies are limited these days).

Visiting Forbes again, I see ads for the Teaching Company (I’m a customer), the Joyce Theater (ditto) as well as ads for direct competitors of my employer and for my employer itself. Capital One rotated some ads touting their “ML for Causal Analysis,” which is something I actually understand. And there were ads for mid-cap stock funds and SurveyMonkey research instruments, both of which I’m considering.

Over on CNBC, I am flattered to see the site has obviously mistaken my browser for that of a much richer man: there is an inspiring banner urging me to ‘Own Your Sky,’ trying to sell me a jet.

At Adweek and BroadwayWorld and so on I notice a very familiar and similar ad experience, proving that programmatic advertising really does target the browser and not the publication. It works as advertised. Most of the ads are retargeted, some are competitors of brands I use, and others are just categorically appropriate things that people in my age and income bracket might buy (cars, funds, supplements).

Above all, it is a world that I recognize.

So, what did we learn about these colliding worlds?

My experiment is anecdotal, but it did surprise me in four ways:

Publishers aren’t adept at handling users with no IDs. There were far fewer contextual ads than expected and more low-awareness (and presumably low-bidding) advertisers filling space.

Retargeting is definitely overused. It has a role as a reminder and incentive to act but quickly devolves into negative returns for the brand.

We consumers are kidding ourselves if we think advertisers “track your every move” and know everything about us. If that were true, targeting would be a lot better than simply retargeting.

And finally, the untargeted experience is truly awful. Nobody could possibly want it: not advertisers, publishers or consumers. If it wins, the open web won’t have a chance.

For all concerned, there has got to be a compromise on the continuum of privacy and relevance. Let’s make that a New Year’s resolution.

Martin Kihn is senior vice president of strategy at Salesforce.

Can a Computer Write a Hallmark Holiday Movie?

The following post originally appeared on the NYC Data Science Academy blog on Sept. 28, 2021. This project was my capstone submitted for my data science coursework. It was not sponsored, endorsed or even noticed – so far as I know – by the mighty Hallmark network.

As the holidays approach, many of us eagerly await a new crop of Hallmark Holiday movies: positive, reassuring, brightly-lit confections that are as sweet and reliable as gingerbread. Part of their appeal is a certain implicit formula — a woman in a stressful big city job goes home for the holidays, falls for a local working man, and realizes she’s missing out on life.

Small towns, evil corporations, a wise older woman … there are recurring motifs that made me wonder if I could apply machine learning to the plots of these movies to understand (1) what the formulas are; and (2) if a computer could write a Hallmark movie (for extra credit).

Natural-language generation (NLG) has been tried on Christmas movies before, and the results were quite funny. Perhaps I could do better.

My initial hypothesis was that there are a certain (unknown) number of plot types that make up the Hallmark Holiday movie universe, and that these types could be determined from a structured analysis of plot summaries. Because the stories seemed formulaic, they could potentially be auto-generated using natural-language generation (NLG) methods, which were new to me.

Assembling the Data Set

Step one was to pull a list of titles. The Hallmark Channel has been running original movies with a Christmas theme for a quarter century, although the rate of production skyrocketed in 2015 as they became very popular. Pandemic production issues slowed the pipeline slightly in 2020, but the pace remains rapid.

Although 1995-2010 saw fewer than five original titles a year, the years 2018-2021 saw almost 40 each year. It’s quite a pipeline.

Luckily, there is a list of original Hallmark production titles on Wikipedia, which I was able to scrape using Scrapy. Holiday movies aren’t distinguished from others, so there was some manual selection in cutting the list. Once I had my titles, I was able to use the API for The Movie Database project (TMDB), which maintains information about films and TV shows, to pull the ‘official’ plot summaries.
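The title-to-summary step can be sketched in a few lines. This is a minimal illustration, not my actual pipeline: it assumes TMDB’s public v3 `search/movie` endpoint, and the API key is a placeholder you would replace with your own.

```python
import json
import urllib.parse
import urllib.request

TMDB_SEARCH = ""
API_KEY = "YOUR_TMDB_API_KEY"  # placeholder -- register with TMDB for a real key


def build_search_url(title, api_key=API_KEY):
    """Construct a TMDB v3 movie-search URL for one scraped title."""
    params = urllib.parse.urlencode({"api_key": api_key, "query": title})
    return f"{TMDB_SEARCH}?{params}"


def fetch_overview(title, api_key=API_KEY):
    """Pull the first matching plot summary ('overview') for a title."""
    with urllib.request.urlopen(build_search_url(title, api_key)) as resp:
        results = json.load(resp).get("results", [])
    return results[0]["overview"] if results else None


# No network call here -- just inspect the request we'd send for one title.
url = build_search_url("Christmas Town")
```

Looping `fetch_overview` over the scraped Wikipedia titles is what produced the 260-summary corpus described below.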

There were 260 plot summaries in my corpus. The summaries ranged in length and style, and their lack of standardization and detail caused some challenges in the analysis. However, short of watching all the movies and building my own summaries, the TMDB summaries (which were provided by the network, I assume) were my data.

My intended audience was writers, producers and TV execs who want to understand how the Hallmark Holiday genre works and the elements of a successful production slate. These popular movies could also be used to inform other narrative projects with similar valence.

Of the 260 summaries, all but two were Christmas movies. Many summaries were disappointingly brief and generic, but many were better than that. There were about 15,000 words in total in the final data set.

Here’s a typical example of a summary for the film “Christmas Town,” starring the adorable Hallmark-ubiquitous Candace Cameron Bure:

Lauren Gabriel leaves everything behind in Boston to embark on a new chapter in her life and career. But an unforeseen detour to the charming town of Grandon Falls has her discover unexpected new chapters – of the heart and of family – helping her to embrace, once again, the magic of Christmas.

Over the years, the stories and themes of the Hallmark Holiday films changed, as the network nosed around and then settled on a set of typed tropes. For example, earlier films used Santa as a character more often and spirits as worthy guides for the heroine. By 2015 or so, Hallmark had found its soul: small towns, high-school boyfriends, family businesses threatened by big corps, and so on.

Feature Engineering

After lemmatizing and tokenizing, removing stopwords and other standard text preprocessing, I realized that the corpus would have to be standardized to gain insight into its themes and to provide training data for any NLG model. For example, the summaries had names for characters, but those names didn’t matter to me – I just cared that it was <MALE> or <FEMALE> (for the main characters), or <CHILD> or <SIBLING> or <PARENT> or <GRANDPARENT> with respect to the main character. Often there was also a <BOSS>.

(If you’re curious, the most common names for characters mentioned in the corpus were: Jack, Nick and Chris.)

Likewise, towns were often named, but my only interest was that it was a <SMALLTOWN>, or (in the case of those bustling metropolises our heroines liked to leave in the beginning of the story) <BIGCITY>. And the evil big corporation might be named, but I wanted to tokenize it as <BIGCORP>.

Note the <BRACKETS> which would indicate to the model that these were tokens rather than the words originally in the corpus. How to make the substitutions without a lot of manual lookups?

I ended up using Spacy to tag the parts of speech. Although it requires some computer cycles, Spacy is a great NLP library that will tag each word by its part of speech, including place names, personal names and proper nouns. The tags themselves are then accessible to a Python script as part of a dictionary-lookup substitution.

In the case of character names, I was able to tag them using Spacy and then run them through Genderize to get a likely gender. This doesn’t always work, as viewers of “It’s Pat” on Saturday Night Live know, but a quick scan let me correct mistakes.

I could also automate much of the <TOKENIZATION> using dictionary substitutions. For example, I could find instances of “L.A.” and “Los Angeles” and “New York City” and so on and substitute <BIGCITY>. However, a careful manual check was needed to do some cleanup.
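The dictionary-substitution step looks roughly like this. It’s a minimal sketch: the alias lists are illustrative stand-ins (the real dictionaries were larger), and “MegaCorp Holdings” is a made-up company name.

```python
import re

# Illustrative alias lists -- stand-ins for the project's real dictionaries.
SUBSTITUTIONS = {
    "<BIGCITY>": ["Los Angeles", "L.A.", "New York City", "Boston", "Chicago"],
    "<BIGCORP>": ["MegaCorp Holdings"],  # hypothetical company name
}


def tokenize_plot(text, subs=SUBSTITUTIONS):
    """Replace known names with <BRACKETED> tokens via dictionary lookup."""
    for token, aliases in subs.items():
        for alias in aliases:
            text = re.sub(re.escape(alias), token, text)
    return text


plot = "Lauren leaves Boston for a small town after MegaCorp Holdings downsizes."
tokenized = tokenize_plot(plot)
# tokenized: "Lauren leaves <BIGCITY> for a small town after <BIGCORP> downsizes."
```

The manual check mentioned above catches whatever the alias lists miss (nicknames, odd spellings, cities used as personal names).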

In the end, I had a corpus of 260 plots with major character, location and relationship types <TOKENIZED>.

Frequency Analysis & Topic Modeling

Word frequencies were high for terms such as ‘family’, ‘child’, ‘help’, ‘love’, ‘parent’, ‘small town’. This agreed with my personal memories of the films — i.e., an abiding emphasis on families, home towns, and positive mojo.

Bigrams and trigrams (common two- and three-word combinations) uncovered even more of the Hallmark spirit than word frequencies. Among bigrams, the most common were ‘high school’, ‘fall love’, and ‘return hometown’. Common trigrams were ‘high school sweetheart’, ‘old high school’ and ‘miracle really happens’.
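Counting n-grams needs nothing fancy. Here is a minimal sketch on a two-document toy corpus (already lemmatized and stopword-free, as stand-ins for the real tokenized summaries):

```python
from collections import Counter


def ngrams(tokens, n):
    """Return n-grams as space-joined strings from a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Toy corpus of already-cleaned plot tokens.
docs = [
    ["return", "hometown", "fall", "love", "high", "school", "sweetheart"],
    ["fall", "love", "old", "high", "school", "friend"],
]

bigrams = Counter(bg for doc in docs for bg in ngrams(doc, 2))
trigrams = Counter(tg for doc in docs for tg in ngrams(doc, 3))

top = bigrams.most_common(2)  # "fall love" and "high school" each occur twice
```

Run over the full 260-summary corpus, exactly this kind of count surfaces ‘high school sweetheart’ and friends.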

It is possible just to look at the common trigrams and get a very good feel for the alternate reality that is the mini-metaverse of Hallmark Holiday films.

The heart of my NLP analysis consisted of LDA topic modeling, using the Gensim library. Latent Dirichlet Allocation (LDA) is a statistical method that takes a group of documents (in our case, plot summaries) and models them as a group of topics, with each word in the document attached to a topic. It finds terms that appear together (frequency) and groups them into “topics” which can be present to a greater or lesser degree in each particular document.

Although LDA is most often used for categorizing technical and legal documents, I thought it could also find the different holiday themes I detected in the plot summaries.
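The mechanics look like this. Note the swaps: I used the Gensim library; this sketch uses scikit-learn’s `LatentDirichletAllocation` instead, with a four-line toy corpus standing in for the real 260 tokenized summaries and two topics standing in for eight.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy stand-ins for tokenized plot summaries.
plots = [
    "woman leaves bigcity smalltown family business christmas",
    "bigcorp threatens family business smalltown christmas",
    "rivals forced work together christmas party",
    "boss hires woman plan christmas party",
]

vec = CountVectorizer()
dtm = vec.fit_transform(plots)      # document-term (bag-of-words) matrix

lda = LatentDirichletAllocation(n_components=2, random_state=42)
doc_topics = lda.fit_transform(dtm) # each row: that document's topic mixture
```

Each row of `doc_topics` is a probability distribution over topics, which is what lets you tag each plot with its dominant theme later on.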

First, I did a grid search for parameters using ‘coherence score’ as the target variable to maximize. The purpose of this search was to find a likely number of distinct topics, or plot types. I guessed there were 5-10, and this hyperparameter tuning exercise indicated that 8 topics was the most likely best fit.

Training the topic model on the plot summaries, I generated 7-8 distinct topics, with some overlap in words, as expected. These topics were analyzed using pyLDAvis, which allows for interactively probing the topics and changing some parameters to make them easier to interpret. (Figure 4 shows the pyLDAvis interactive view.)

Here some manual work — call it ‘domain knowledge’ (e.g., watching the movies) — was needed. I tagged the plots with the topics and focused on those that clearly fell into one topic or another. I then came up with a rough summary of these plots and gave that ‘theme’ a name. The manual tagging was needed because the theme name itself often didn’t actually appear in the summaries.

The 8 Types of Hallmark Holiday Movies

The 8 themes I ended up identifying, along with my own names and sketches, were:

  1. SETBACK: Disappointed in work/love, a woman moves to a small town to heal/inherit
  2. BOSS: A cynical businessman hires a spunky woman for holiday-related reason (like planning a party)
  3. MIXUP: A travel mixup/storm forces some incompatible people to work together
  4. ALT-LIFE: A wish upon Santa/spirit is granted and a woman is shown an alternative life — often, this involves time travel
  5. TAKEOVER: A big corporation threatens a family-run business in a small town
  6. RIVALS: Two seemingly incompatible rivals are forced to work together for some goal
  7. IMPOSTER: Dramatic irony: Someone lies about who they are — or gets amnesia and doesn’t know who they are
  8. FAMILY/CRISIS: A woman is forced to return home because of a family crisis

As usual with LDA, there was some overlap among the themes. In particular, #1 co-occurred with others often; it started the story moving. For example, the heroine might suffer a SETBACK at work which encourages her to go back home (#1), and she encounters a MIXUP on the way (#3) that lands her in a delightful small town (this is the plot of “Christmas Town”).

Interestingly, when I looked at the distribution of themes over the course of the Hallmark seasons, they were fairly evenly present. This made me think the producers at the network are well aware of these themes and seek to balance them to avoid repetition.

Text Generation Using Markov Chains, LSTM and GPT-2

As an experiment, I looked at three different methods of generating text, the idea being to use the plots as training data for a model that would generate an original plot in the style of the others. Text generation, or NLG, is a field that has made amazing strides in recent years – as I discovered – and has developed uncanny capabilities. I was only able to scratch the surface in my work.

I began with traditional text generation methods, which were hardly magical.

Markov Chains were the most intuitive: they use the corpus and predict the next word (word-by-word) based on a distribution of the next words seen in the training data. Because it’s at the word-by-word level (not larger chunks of text) — at least, the way I implemented it — the results were coherent only in very small sequences. Overall, it didn’t work to put together sentences or stories that made sense.
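A minimal word-level Markov chain generator needs only the standard library. This sketch uses a toy corpus in place of the real plot summaries:

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words observed to follow it."""
    chain = defaultdict(list)
    words = text.split()
    for cur, nxt in zip(words, words[1:]):
        chain[cur].append(nxt)
    return chain

def generate(chain, start, length=10, seed=0):
    """Walk the chain, sampling each next word from the observed followers."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = chain.get(out[-1])
        if not followers:
            break
        out.append(rng.choice(followers))
    return " ".join(out)

# Hypothetical stand-in corpus
corpus = ("a woman returns to her small town for christmas "
          "a woman falls in love in a small town")
chain = build_chain(corpus)
print(generate(chain, "a", length=8))
```

Because each step looks only one word back, local pairs are plausible but the output drifts quickly, which matches the incoherence described above.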

Figure 6 shows a few examples of text generated in this way.

Long Short-Term Memory (LSTM) is a form of recurrent neural network (RNN). LSTMs were created to solve the RNN’s long-term memory problem: RNNs tend to forget earlier parts of a sequence (e.g., of text) due to vanishing gradients. Like Markov chains, they make predictions at the word level, based on weights learned during training.

Training ran for 6 epochs on 100-character sequences, using categorical cross-entropy as the loss function. It took about two hours on my average-powered setup, so it’s time-intensive. (See Figure 7.)

Frankly, LSTM was a misfire. It required a great deal of training, and although I did train for a few hours, my results were coherent only for short (half-sentence) stretches of text. More training might have helped, but I was more interested in moving on to ‘transformers’, the current state of the art for NLG.

GPT-2 — this is an open source transformer model from OpenAI. It’s pretrained on vast amounts of text data, giving it a very good basic model of English text. (GPT-3 — which is much better at NLG — is not open source, and I could not get access.) Fine-tuning GPT-2 on the plots, I was able to ‘direct’ it toward my particular genre. The results were much more coherent than the other methods, while still falling short of useful new plots. (See Figure 8.)

To implement, I used the Transformers library provided by Huggingface/PyTorch, pretrained on data from the web. I trained the model for 50 epochs in batches of 32 characters (about 6 words).

Clearly, transformers are the way forward with NLG. GPT-3 has generated a lot of excitement in the past year or so, and its ability to create human-readable text that is original in a wide number of genres is astonishing. The state of the art could create a Hallmark movie plot already, and this tool will only get better as GPT-4 and other transformer models appear.


My hypothesis that Hallmark holiday movies tend to cluster around a set of common plots was validated. Specifically, I found:

  1. Hallmark Holiday movies have a consistent set of themes: small towns, families, career setbacks, old boyfriends, spirits and wishes
  2. Analyzing the text required standardization to avoid missing themes: man/woman/small town, etc.
  3. LDA topic modeling worked fairly well in identifying 7-8 key topics, with some overlap
  4. NLG yielded inconsistent results, with the pretrained transformer model (GPT-2) living up to its reputation as a leap forward

An additional analysis I’d like to do is to examine plots as a time series: they are a sequence of events that happen in order, and adding that step-by-step flow would be an intriguing exercise.

Have a great holiday — and enjoy the movies!

Yet another new podcast? Yes!

After many years of shiftless planning and a listless lockdown, I finally put the pixel to the pointer and started a podcast! My friend Jill Royce and I co-host a weekly in-depth interview with one of the founding or influential figures in the first twenty years of advertising (and marketing) technology. That’s 1995-2015 or so … a time of tremendous innovation, excitement, ambition, posturing and fraud … a deranged double decade. So far, most of the people we’ve asked have agreed to join us — although we just started.

I’ve been touched by the support we’ve received from people who (like me) find the history of this much-maligned and underappreciated industry so fascinating. Check us out on Apple Podcasts and Spotify.

Our website is here.

And by the way — the show is called “PALEO AD TECH”

Let me know what you think! martykihn at gmail

Do robots belong on your copywriting team?

This article originally appeared in The Drum on 2/26/21

When I’m thirsty, I go with water. When I’m hungry, I drink beer.

It wasn’t me who made the decision. It was the people on Reddit!

Wow no cow, no beef. This was so good it even tasted like bacon.

Imagine for a moment you are in a creative brainstorm, and a junior copywriter swoops in bravely with the above. You might pause for a moment, inhale, and say, “it’s a start.”

Now what if I told you that copywriter was a machine who had been given a specific prompt (in italics) based on recent spots from Super Bowl LV? Well, it was a machine.

Those lines – and dozens less sensible – were generated on my MacBook Pro using a pretrained open-source natural language A.I. model called GPT-2, built by OpenAI, the lab co-founded by Elon Musk. It was “steered” by a list of words taken from Super Bowl ads using another open-source library called PPLM, built by Uber Engineering.

Loading and learning the models took about an hour. And given a few-word prompt, GPT-2 happily takes about five minutes to churn out 20 “ideas,” without breaking for lunch.

Text generation – or robo-writing – has made startling leaps in the past few years, moving from punchline to something that may deserve a seat in the creative lounge. Believe it or not, the best robo-writers are almost the equivalent of that most annoying/wonderful phenomenon: the eager beginner, completely inexhaustible but creatively uneven.

Most of GPT-2’s “ideas” were not quite ready to be presented; some were nonsensical. Oddly, it had no clue what to do with the prompt “Jason Alexander.” And one of its “Wow no cow” completions was “Wow no cowbell can be quite like the best in the universe.”

Which is probably true and not at all helpful.

In the near future, the smartest creative teams will be those that can use A.I. writers in productive ways, as a computer assist to a creative session and a source of ideas that might spark better ones.

One trillion parameters

At first, GPT-2’s creators were so afraid of its power falling into the wrong hands that they were reluctant to release it. They relented and now rely on academic partnerships to limit bad actors like cyber-propagandists. Although not open source, GPT-2’s successor — called GPT-3 — is available to try on application as an API. The full model was recently licensed to OpenAI’s major investor, Microsoft.

GPT-3’s largest setting has 175 billion parameters. Think of these as individual knobs that the model has to tune, based on human writing samples, in order to predict the next word in a sentence. Google just open-sourced an even larger text model called Switch Transformer that reportedly has more than 1 trillion parameters.

The human brain has on the order of 100 trillion synapses. Let’s leave that right there.

GPT-3 takes GPT-2 out of the sandbox and sends it to middle school. Give the model a prompt (e.g., “Once upon a time…” or “It wasn’t me…”), and it can continue at some length, generating text that is often plausible and sometimes uncanny. Early testers ranged from awed to skeptical – and often both.

Hype grew so heated last summer that OpenAI’s chief executive Sam Altman took to Twitter to reassure humanity that GPT-3 “still has serious weaknesses and sometimes makes very silly mistakes.”

The most angst issued not from writers and poets — who are depressed enough already — but ironically enough from computer programmers. It turns out that computer code is also a language that GPT-3 likes to write.

In fact, that’s what makes this new generation of robo-writers different: they are flexible models, verging into the space called artificial general intelligence (AGI). This is the kind of intelligence we have: not pre-trained in any particular discipline but capable of learning. GPT-3 seems to perform well on a range of language tasks, from translation to chatting to impressing electronic gearheads.

Ad copywriting isn’t such a leap. As a tool to build creative prompts from catch phrases ready for human filtration, so-called Transformer models make a lot of sense.

From the refrigerator to the gallery

Even as AI agents got noticeably better at diagnosing fractures and targeting drones, their creative efforts were conspicuously weak. Robot “art” looked like it belonged on a refrigerator in a garage, and robot “poetry” not even on a prank e-card. This is changing.

Robo-writers are already employed in shrinking newsrooms. So far, they’re mostly stringers on the high-school sports reporting, weather and stock market desks — churning out endless Mad Lib-style pieces in routine formats about games and finance that no sane journalist would want to write, even for money.

Computers thrive on tedium. It’s their métier. The for-profit OpenAI takes a brute force approach to its innovation. Two years ago, it gained notoriety for developing a machine that could beat the world’s best players of a video game called Dota 2. It did this by having software agents play the equivalent of 45,000 hours of games, learning by trial-and-error.

The GPT family of tools were also developed by pointing software agents at a massive corpus of data: in GPT-3’s case, millions of documents on the open web, including Wikipedia, and libraries of self-published books.

GPT-3 is exposed to this mass of human-written text and builds a vocabulary of 50,000 words. Its weights are tuned to predict the next word in a sequence given the words that came before – and it develops meta-learning that goes beyond simple memorization. It requires a prompt and can be guided by sample text that provides a “context,” per the Super Bowl examples above.

It’s a trivial matter to drop a prompt like “Car insurance is …” into the GPT-3 Playground, tweak a few toggles, and generate snippets of sensible prose. It’s not much harder to guide the model with a sampling of, say, action movie and comic book plots and generate stories at least as coherent as those of some recent superhero movies.

To answer the obvious question, it can be shown that GPT-3’s creations are not just plagiarism. But are they pastiche? The model learns to predict words based on its experience of what others have written, so its prose is predictable by design. But then again, isn’t most writing?

Limitations include a “small context window” – after about 800 words or so, it forgets what came before – so it’s better at short pieces. It has a short attention span, matching our own.

For this reason, people who have spent more time with the model grow less impressed. As one said: “As one reads more and more GPT-3 examples, especially long passages of text, some initial enthusiasm is bound to fade. GPT-3 over long stretches tends to lose the plot, as they say.”

But ad copywriting isn’t about sustaining an argument or writing a book. It’s about trial and error and inspiration. Increasingly, that inspiration may be coming from a robot.

Building a Plotly Dash on the Ames Housing Dataset

Home Sweet Homes

This post originally appeared on the blog of the NYC Data Science Academy, where I am a student. We were assigned the Ames Housing dataset to practice Machine Learning methods such as Lasso, Ridge and tree-based models. I added the Plotly Dash out of sheer exuberance.

The Ames Housing dataset, basis of an ongoing Kaggle competition and assigned to data science bootcamp students globally, is a modern classic. It presents 81 features of houses — mostly single family suburban dwellings — that were sold in Ames, Iowa in the period 2006-2010, which encompasses the housing crisis.

The goal is to build a machine learning model to predict the selling price for the home, and in the process learn something about what makes a home worth more or less to buyers.

An additional goal I gave myself was to design an app using the Plotly Dash library, which is like R/Shiny for Python. Having been a homeowner more than once in my life, I wanted a simple app that would give me a sense of how much value I could add to my home by making improvements like boosting the quality of the kitchen, basement or exterior, or by adding a bathroom.

Conceptually, I figured I could build a model on the data and then use the coefficients of the key features, which would probably include the interesting variables (e.g., kitchen quality, finished basement). These coefficients can predict what a unit change in a feature would do to the target variable, or the price of the house.

Data Preparation

One advantage of the Ames dataset is that it’s intuitive. The housing market isn’t particularly hard to understand, and we all have an idea of what the key features would be. Ask a stranger what impacts the price of a house, and they’ll probably say: overall size, number of rooms, quality of the kitchen or other remodeling, size of the lot. Neighborhood. Overall economy.

To set a baseline, I ran a simple OLS model on a single raw feature: overall living space. My R^2 was over 0.50 — meaning that more than half of the variance in house price could be explained by changes in size. So I didn’t want to overcomplicate the project.
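The baseline can be reproduced in a few lines. Here synthetic data stands in for the Ames file, with made-up coefficients chosen only so the example shows a comparable R^2:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Synthetic stand-in: price driven largely by living area, plus noise
area = rng.uniform(800, 3000, size=500)
price = 50 * area + 40_000 + rng.normal(0, 30_000, size=500)

X = area.reshape(-1, 1)
model = LinearRegression().fit(X, price)
r2 = model.score(X, price)
print(f"R^2 from living area alone: {r2:.2f}")
```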

The data had a lot of missing values. Many of these were not missing at random but rather likely indicated the feature was not present in the house. For example, Alley, PoolQC and Fence were mostly missing; I imputed a ‘None’ here. Other values needed special treatment.

LotFrontage was 17% missing. This feature turned out to be closely related to LotShape, with IR3 (irregular lots) having much higher LotArea values. So I imputed values based on LotShape.
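The LotShape-based imputation is a one-liner with pandas groupby/transform; a sketch with toy data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "LotShape":    ["Reg", "Reg", "IR3", "IR3", "Reg", "IR3"],
    "LotFrontage": [60.0, 70.0, 120.0, np.nan, np.nan, 110.0],
})

# Fill each missing LotFrontage with the median for that lot shape
df["LotFrontage"] = df.groupby("LotShape")["LotFrontage"].transform(
    lambda s: s.fillna(s.median())
)
print(df)
```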

Lot Frontage

There were a lot of categorical values in the data which I knew I’d have to dummify. Before going there, I looked into simplifying some of them. ‘Neighborhood’ seemed ripe for rationalization. It was clear (and intuitive) that neighborhoods varied in house price and other dimensions. There didn’t seem to be a pattern of sale price and sale volume by neighborhood (except for NorthAmes), but Neighborhood and Sale Price were obviously related.

Now, I know it isn’t best practice to create a composite variable based only on the target, so instead I created a new feature called “QualityxSalePrice,” split it into quartiles, and grouped neighborhoods into four types.

There were many features related to house size: GrLivArea (above-grade living area), TotalBsmtSF (basement size), 1stFlrSF, even TotRmsAbvGrd (number of rooms). These were — of course — correlated with each other, as well as with the target variable. Likewise, there were also numerous features related to the basement and the garage which seemed to overlap.

To simplify features without losing information, I created some new (combined) features such as total bathrooms (full + half baths), and dropped others that were obviously duplicative (GarageCars & GarageArea).
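A sketch of this consolidation with pandas (the exact bath columns I combined are an assumption here):

```python
import pandas as pd

df = pd.DataFrame({
    "FullBath": [2, 1], "HalfBath": [1, 0],
    "BsmtFullBath": [1, 0], "BsmtHalfBath": [0, 1],
    "GarageCars": [2, 1], "GarageArea": [480, 240],
})

# Combine the bath columns into one feature (half baths count as 0.5)
df["TotalBaths"] = (df["FullBath"] + df["BsmtFullBath"]
                    + 0.5 * (df["HalfBath"] + df["BsmtHalfBath"]))

# GarageCars and GarageArea are near-duplicates; keep one
df = df.drop(columns=["GarageArea"])
print(df["TotalBaths"].tolist())   # [3.5, 1.5]
```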

Looking across both continuous and categorical features, there were a number that could be dropped since they contained little or no information; they were almost all in one category (e.g., PoolArea, ScreenPorch, 3SsnPorch and LowQualFinSF).

Finally, I dummified the remaining categorical features and normalized the continuous ones, including the target. The sale price (target) itself was transformed via log, to adjust a skew caused by a relatively small number of very expensive houses.
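These final preparation steps, sketched on a toy frame (the Z-score normalization here is one reasonable choice, not necessarily the exact scaler I used):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "KitchenQual": ["Gd", "Ex", "TA"],
    "GrLivArea":   [1500.0, 2200.0, 1100.0],
    "SalePrice":   [200_000, 450_000, 130_000],
})

# Log-transform the skewed target
df["SalePrice"] = np.log1p(df["SalePrice"])

# Dummify categoricals and normalize continuous features
df = pd.get_dummies(df, columns=["KitchenQual"], drop_first=True)
df["GrLivArea"] = (df["GrLivArea"] - df["GrLivArea"].mean()) / df["GrLivArea"].std()
print(df.columns.tolist())
```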

Data Exploration & Modeling

Everybody knows (or thinks they know) that it’s easier to sell a house at certain times of year, so I looked at sale price by month and year. These plots showed a peak in the fall (surprisingly, to me) as well as the impact of the 2008 housing crisis.

Correlation for the continuous variables showed a number aligned with the target variable, and so likely important to the final model. These included the size-related features, as well as features related to the age of the house and/or year remodeled.

Because my target variable was continuous, I focused on linear regression and tree-based models. I ran OLS, Ridge, Lasso and ElasticNet regressions, using grid search and cross-validation to determine the best parameters and to minimize overfitting.
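The grid search can be sketched with scikit-learn's GridSearchCV; the synthetic data and alpha grids below are illustrative, not my actual ones:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, -2.0, 0.0, 1.5, 0.0]) + rng.normal(0, 0.5, 200)

# Cross-validated search over the regularization strength for each model
best = {}
for name, est, grid in [
    ("ridge", Ridge(), {"alpha": [0.01, 0.1, 1.0, 10.0]}),
    ("lasso", Lasso(), {"alpha": [0.001, 0.01, 0.1, 1.0]}),
]:
    search = GridSearchCV(est, grid, cv=5, scoring="r2").fit(X, y)
    best[name] = (search.best_params_["alpha"], search.best_score_)
print(best)
```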

The error functions for these models on the test data set were all similar. The R^2 for Ridge, Lasso and ElasticNet were all in the range 0.89-0.91. The significance tests for the OLS coefficients indicated that a number of them were highly significant, while others were not.

I also ran tree-based models Gradient Boosting (GBM) and XGB (also gradient boosting) and looked at Feature Importances. The tree-based models performed similarly to the linear regression models and pointed to similar features as being significant. Of course, this is what we’d expect.

In the end, the most important features across all the models were: OverallQual (quality of the house), GrLivArea (above-ground size), YearRemodAdd (when it was remodeled), NeighType_4 (high-end neighborhood), and the quality of the kitchen, finished basement and exterior. If nothing else, the model fits our intuition.

Feeding the features through the model one by one, additively, it became obvious that the most oomph came from 20-25 key features, with the rest more like noise.

Plotly Dash App

Settling on a multiple linear regression as the most transparent model, I ended up with an array of 23 features with coefficients and an intercept. To use them in my app, I had to de-normalize them as well as the target.

The app was aimed at a homeowner who wanted to know the value of certain common improvements. The coefficients in my linear model were the link here: each coefficient represented the impact of a UNIT CHANGE in that feature on the TARGET (price), assuming all other variables stayed the same. So I just had to come up with sensible UNITS for the user to toggle and the impact on the TARGET (price) was as simple as multiplying the UNIT by the COEFFICIENT.
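Because the target was log-transformed, a de-normalized coefficient acts multiplicatively on price: new_price = price * exp(coef * units). A sketch with hypothetical coefficients (the feature names and values below are made up for illustration):

```python
import numpy as np

# Hypothetical de-normalized coefficients on a log(price) target
coefs = {"KitchenQual": 0.04, "FullBath": 0.06, "WoodDeck_100sf": 0.01}

def adjusted_price(base_price, changes, coefs):
    """Apply unit changes via their coefficients on the log-price scale."""
    log_delta = sum(coefs[f] * units for f, units in changes.items())
    return base_price * np.exp(log_delta)

# Bump kitchen quality one notch and add a full bath
new = adjusted_price(300_000, {"KitchenQual": 1, "FullBath": 1}, coefs)
print(round(new))   # 331551
```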

Since I was building an app and not trying to win a data science prize, I focused on the significant features that would be most interesting to a remodeler. Some things you can’t change: Neighborhood and Year Built, for example.

But some things you can: in particular, features related to Quality would be in the app. These were Basement, Kitchen and Exterior Quality. Other features could be changed with significant investment: Baths (can add a bath), Wood Deck (can add a deck or expand one), and even house size (since you can — if you’re crazy and rich — tack on an addition somewhere).

I also included a couple Y/N features since they could affect the price: Garage and Central Air.

Plotly Dash

I knew Plotly as a way to create interactive plots in Python, but the Dash framework was new to me. It was introduced to me by one of our instructors, and an active online community and plentiful examples also came to my aid.

Dash is similar to R/Shiny in that it has two key components: a front-end layout definer and a back-end interactivity piece. These can be combined into the same file or kept separate (as in Shiny).


The general pseudo-code outline of a Dash App starts with importing the libraries and components. It’s written in Python, so you’ll need numpy and pandas as well as Plotly components for the graphs.

Dash converts components prefixed with “html.” into the corresponding HTML tags, which is straightforward. I included Div, H2 (header) and P (paragraph) tags. Different sections of a Div are called “children” (as they are in HTML). I used “children” here because I wanted to have two columns — one for the inputs and the second for the graph of adjusted house prices.

The rows and columns can be done in a couple different ways, but I used simple CSS style sheet layouts (indicated by “className”).


Interactivity is enabled by the layout and the callbacks. In the layout, as in Shiny, I could specify different types of inputs such as sliders and radio buttons. The selection isn’t as large or as attractive as Shiny’s, but they get the job done. I used sliders and buttons, as well as a numeric input for the average starting price.

The functional part of the app starts with “@app.callback,” followed by a list of Outputs (such as figures/charts to update) and Inputs (from the layout above). The way Dash works, these Inputs are continually monitored, as via a listener, so that any change to a slider or other input will immediately run through the function and change the Outputs.

Right after the @app.callback section come one or more functions. The parameters of the function represent the Inputs from @app.callback, but they can have any name; the names don’t matter because the parameters are read in the same order as the callback’s Input list.

This function includes any calculations you need to do on the Inputs to arrive at the Outputs, and if the Output is a graph or plot — as it almost always is — there’s also a plot-rendering piece.

In my case, I wanted the output to be a normal distribution of potential new values for the sale price, given the changes in the Inputs. For example, if someone moved the “Overall Quality” slider up a notch, I needed to generate a new average sale price, a difference (from the baseline) and a normal distribution to reflect the range.

I did this by turning each coefficient into a factor, adding up the factors and multiplying by current sale price. I then generated a random normal distribution using Numpy with the new target as the mean and a standard deviation based on the original model.
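The core of that callback logic, minus the Dash plumbing, can be sketched as a pure function (the factor values and standard deviation below are hypothetical):

```python
import numpy as np

def price_distribution(base_price, factor_changes, sigma=15_000, n=1000, seed=0):
    """Add up the slider-driven factors, apply them to the base price,
    and sample a normal distribution around the new mean."""
    new_mean = base_price * (1 + sum(factor_changes))
    rng = np.random.default_rng(seed)
    samples = rng.normal(new_mean, sigma, size=n)
    return new_mean, samples

# e.g., kitchen quality +5%, wood deck +2%
mean, samples = price_distribution(250_000, [0.05, 0.02])
print(round(mean))   # 267500
```

Inside the real callback, `samples` feeds the Plotly histogram that the Output updates on every slider move.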

The final dashboard looked like this:


Granted, there are a few caveats around the app. It’s based on the Ames housing dataset, so other areas of the country at other times would see different results. It requires estimating a ‘starting price’ that assumes the default values are true, and this estimate might be difficult to produce. But as a proof of concept, it has potential.

I think there’s definitely a data-driven way to estimate the value of some common improvements. This would help a lot of us who are thinking of selling into the housing boom and wondering how much we can reasonably spend on a kitchen remodel before it turns into red ink.

PepsiCo Launched Two Consumer Ecommerce Sites in 30 Days — Here’s What We Can Learn From It

The following article first appeared on the Salesforce Marketing blog in January.

Last May, at the height of the COVID pandemic’s first wave, PepsiCo raised some bubbles by launching not one but two websites where consumers could browse and purchase a selection of the company’s more than 100 widely munched snack and beverage brands. At a time when many outlets were closed and people ordered more food online, the company quickly made it easy for consumers to buy directly.

One ecommerce site took a lifestyle-themed approach, offering bundles of products in categories such as “Rise & Shine” (Tropicana juice, Quaker oatmeal, Life cereal) and “Workout & Recovery” (Gatorade, Muscle Milk, and Propel electrolyte-infused water). A second site offered a more straightforward lineup of Frito-Lay brands such as Lay’s, Tostitos, Cheetos, dips, nuts, and crackers.

These platforms complement PepsiCo’s retailer channels to ensure the company continues to deliver consumers their favorite products on the right platform, in the right place, at the right time. 

Whenever, wherever, however has been the mantra of ecommerce digital marketing for a while, but it’s become more important in the age of COVID-19. Most of Salesforce’s customers – particularly those in high-velocity consumer categories such as consumer packaged goods (CPG), retail, and restaurants – have had to become extraordinarily flexible over the past 10 months to adapt to ever-changing local guidelines and consumer behaviors.  

What was most striking about PepsiCo’s foray into direct-to-consumer (D2C) commerce and marketing was its speed. “We went from concept to launch in 30 days,” said Mike Scafidi, PepsiCo’s Global Head of Marketing Technology and AdTech. “Within 30 days, we stood up our ecommerce capabilities and started delivering direct to our consumers.”

PepsiCo’s products are consumed more than a billion times daily in 200 countries. It boasts 23 brands with more than $1 billion in annual sales. How does such a large and complex global company pull off such impressive footwork?

The answer, Scafidi said, was preparation. “Digital is inherently disruptive,” he explained. “We’ve been training for this for a long time. We’ve been preparing to adapt to disruption for 20 years.”

Planning for change is a skill

Scafidi and I spoke from our remote locations about “Building Resilience and Adapting Your Marketing Tech in Uncertain Times” during Ad Age’s CMO Next conference, co-hosted by Ad Age’s Heidi Waldusky. Scafidi stressed that tumultuous times took PepsiCo back to basics — inspiring the company to lean on skills it had been developing for years — especially in consumer research and media measurement.

Part of the reason PepsiCo was able to launch so quickly, he said, was “we were leaning on what we were doing already.”

He reminded me of an analyst quote I read recently on embedding resilience into sales and marketing plans: “[T]he more an organization practices resilience, the more resilient it becomes.”

Over the past year, many organizations have had plenty of time to practice being resilient. As stores shut down and millions huddled at home, there was a surge in digital activity across all channels. Media consumption soared: people around the world watched 60% more video, for example. And they shopped. Salesforce’s latest Shopping Index shows that comparable online sales were up 55% in Q3 of last year after climbing 71% in Q2.

We’ve heard from many of our customers that they needed to launch new capabilities faster than ever before. Otherwise, they’d lose business. Curbside pickup, buy online, pickup in store, expanded digital storefronts, appointment scheduling, contact tracing – the list goes on.

Our desire to help customers adapt to rapid digitization inspired us to launch Digital 360, a suite of ecommerce digital marketing products combining marketing, commerce, and personalized experiences under a single umbrella. With it, Salesforce Trailblazers like Spalding and Sonos were able to scale their online commerce dramatically, making up some of the shortfall in brick-and-mortar sales.


Unilever also faced dramatic market shifts in the recent past. Keith Weed, the company’s chief marketing and communications officer, pointed out back in 2018 that the pace of change “will never be this slow again” – not knowing just how fast that pace would get. And like PepsiCo, Unilever met the hyperfast present by relying even more on its customer research skills.

“We know that people search [online] for problems, not products,” Weed said. So the company created a site that offers detailed solutions to cleaning problems in 26 languages. Built before COVID-19, the site was ahead of its time and has attracted 28 million visitors to date.

Building a foundation that scales to meet customers wherever they are

When times are changing, it’s too late to build up basic skills. “If you have a foundation in place, that allows you to adapt,” Scafidi said.

For example, the PepsiCo team was able to rapidly restructure its internal media measurement analyses because it had already put in the work to develop an ROI Engine, which helped determine the real impact of its advertising, promotions, and email. The ROI Engine automates data inputs, processing, and algorithms to improve paid media optimization decisions. Combining the ROI Engine with a customer insight capability called Consumer DNA, “We were able to stabilize our understanding of the consumer and adapt to where they were,” Scafidi explained.

PepsiCo’s Consumer DNA project is an example of a custom-built tool that allows the company to gain a 360-degree view of the consumer to enable more effective and targeted media buying and marketing activation.

At Salesforce, we help our customers engage with their customers. To do this in 2020, we too relied on core skills, built up over years, to adapt to an environment that seemed to change by the second. The result was launches like Digital 360 and a product that helps companies safely reopen their workplaces. We also introduced a customer data platform (CDP) called Customer 360 Audiences, which serves as a single source of truth for building unified profiles for marketers and others.

The ancient Greek philosopher Heraclitus said, “the only constant in life is change.” As customers like PepsiCo show us, the best way to adapt is to build core skills that can help you pivot quickly in the future.

Customer Data Platforms: Soon to be a Major Motion Picture?

Cover (not actual size)

Happy to announce that the book I co-wrote with Chris O’Hara on Customer Data Platforms has just been published by Wiley and is now available at fine booksellers everywhere such as this one.

Helpfully, the title is “Customer Data Platforms: Use People Data to Transform the Future of Marketing Engagement”.

Note the key phrase: Customer Data Platforms (CDP). Only the hottest mar-tech category to appear in at least a decade, and we literally wrote the book on it. At least, the first substantial mainstream book on this key topic. We cover the category from data integration and identity management to exploration, activation and A.I.

It’s accessible and a quick read, full of charming illustrations and architecture drawings. If you’re interested in this fast-growing tech category that points the way toward the converged platform future, our book is a great place to start.

Happy launch-day, my friends! Here’s to a more integrated, privacy-friendly and analytical future! The future is bright. peace mk

I Have Seen the Future of Measurement … and It Is Messy

The following column originally appeared in the mighty AdExchanger on 11/10/20

Measurement is a footnote – outglamorized by targeting and the opera of browsers, pushed into the corner during debates about the future of media. But it’s arguably more important than aim and requires more discipline.

A few weeks ago, the World Federation of Advertisers (WFA) released a statement and a technical proposal for a cross-media measurement framework. It was the outcome of a yearlong global peer review and an even longer discussion among participants including Google, Facebook, Twitter, holding companies, industry groups such as the ANA and MRC and large advertisers including Unilever, P&G and PepsiCo.

Reactions ranged from enthusiastic to less so, but few people seem to have read more than the press release. After all, it’s not a product and could be yet another in a parade of grand ambitions in online-offline media measurement, dating back to Facebook’s Atlas.

But it describes a realistic scenario for the future of measurement. Sketchy in spots, the WFA’s proposal is ironically the clearest screed of its kind and is worth a closer look.

To be sure, this is a project focused on solving a particular problem: measuring the reach and frequency of campaigns that run on both linear TV and digital channels, including Facebook, YouTube and CTV. In other words, the kinds of campaigns on which participating advertisers such as P&G collectively spend a reported $900 billion a year.

And P&G’s own Marc Pritchard is on record calling the proposal “a positive step in the right direction.”

The need is clear. Advertisers today rely on a patchwork of self-reported results from large publishers, ad server log files, aggregate TV ratings data and their own homegrown models to try to triangulate how many different people saw their ads (reach), how often (frequency) and how well those ads fueled desired outcomes, such as sales lift.
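As a much-simplified illustration of what advertisers are triangulating toward: given a clean person-level impression log (which is exactly what no one has across publishers), reach and frequency reduce to counting distinct people and averaging exposures. The IDs below are invented:

```python
from collections import Counter

# Hypothetical person-level impression log (IDs are invented).
# In reality, no single party sees a log like this across publishers.
impressions = ["p1", "p2", "p1", "p3", "p1", "p2", "p4"]

counts = Counter(impressions)
reach = len(counts)                   # distinct people exposed
frequency = len(impressions) / reach  # average exposures per person reached

print(reach)      # 4
print(frequency)  # 1.75
```

The hard part, of course, is that each publisher holds only its own slice of this log, under its own IDs.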

The latter goal is acknowledged in the current proposal, which doesn’t try to solve it. But the WFA, building on previous work from the United Kingdom’s ISBA, Google, the MRC and others, lays out a multi-front assault on reach and frequency that covers a lot of ground.

How does it work?

The proposal combines a user panel with census data provided by participating publishers and broadcasters, as well as a neutral third-party data processor. The technical proposal spends some time talking about various “virtual IDs” and advanced modeling processes that are loosely defined – and the goal of which is to provide a way for platforms that don’t share a common ID to piece together a version of one.

Needless to say, a lot of the virtualizing and modeling and aggregating in the WFA’s workflow exists to secure user-level data. It’s a privacy-protection regime. It also engages with the much-discussed third-party cookieless future.

Panel of truth

The proposal leans heavily on a single-source panel of opted-in users. At one point, it calls this panel the “arbiter of truth,” and it’s clear most of the hard work is done here. Panelists agree to have an (unnamed) measurement provider track their media consumption online and offline. Panels are a workhorse of media measurement as provided by Nielsen and others, but they are expensive to recruit and maintain. It’s not clear who would build or fund this one.

In the past, other panels have struggled to collect certain kinds of cross-device data, particularly from mobile apps. Panels also get less reliable in regions or publishers where they have less coverage, a problem that could be addressed by joining multiple panels together.

In addition to the media consumption, demographic and attitudinal data it provides, the panel is used to “calibrate and adjust” much more detailed census data voluntarily provided by publishers (including broadcasters).

Publisher-provided data

No walls here – at least in theory. Given that Google and Facebook support the WFA’s proposal, it’s implied they’re open to some form of data sharing. It’s already been reported – although it is not in the proposal itself – that some participants will only share aggregated data, but that’s better than nothing. The WFA’s idea of “census data” includes publisher log files, TV operator data and information collected from set-top boxes.

This census data is married at the person level with the panel data using a series of undefined “double-blind joins of census log data with panelist sessions.” Joined together, the different data sets can correct one another: The panel fills gaps where there is no census data, and the more detailed census data can adjust the panel’s output.
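A toy version of the “calibrate and adjust” idea, with invented numbers: the publisher’s census data supplies an accurate deduplicated headcount, while the panel supplies the demographic proportions the census log lacks:

```python
# Illustrative numbers only: census gives deduplicated reach,
# the panel gives the demographic shares the census log lacks.
census_reach = 1_000_000
panel_demo_share = {"F18-34": 0.30, "F35+": 0.25, "M18-34": 0.28, "M35+": 0.17}

calibrated_reach = {cohort: round(census_reach * share)
                    for cohort, share in panel_demo_share.items()}
print(calibrated_reach["F18-34"])  # 300000
```

The real joins are statistical and double-blind; this only shows why both data sets are needed.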

Virtual IDs, anyone?

The census data will have to be freely provided, and so wide-ranging participation across many publishers is required for success. Another requirement is a way to tie impressions that occur on different publishers (which don’t share a common ID, remember) to individuals to calculate unduplicated reach and frequency.

In a general way, the proposal describes a process of assigning a “Virtual ID” (VID) to every impression. This VID may – or may not – denote a unique individual. How is it assigned? Based on a publisher-specific model that is refreshed periodically and provided by the neutral measurement provider. It appears to use cookies (and other data) in its first version, graduating to a cookieless solution based on publisher first-party data in the future.

The output here is a pseudonymized log file with a VID attached to each impression, overlaid with demographic data – at least TV-style age and gender cohorts – extrapolated from the panel.
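The proposal leaves the VID model loosely defined. A deliberately naive stand-in is a keyed hash that deterministically maps each publisher’s own user signal into a shared virtual-ID space, so the raw cookie never leaves the publisher (all names and numbers below are invented; the real models are statistical and periodically refreshed):

```python
import hashlib

VID_SPACE = 10_000_000  # size of the shared virtual-ID universe (invented)

def assign_vid(publisher_key: str, user_signal: str) -> int:
    """Deterministically map a publisher's own user signal (e.g., a cookie
    or first-party ID) to a VID. Only the VID appears in the
    pseudonymized log; the raw signal stays with the publisher."""
    digest = hashlib.sha256(f"{publisher_key}:{user_signal}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % VID_SPACE

# Two impressions from the same cookie map to the same VID.
log = [("pub_a", "cookie_123"), ("pub_a", "cookie_123"), ("pub_a", "cookie_456")]
vid_log = [assign_vid(pub, sig) for pub, sig in log]
```

A hash is purely illustrative here; the proposal’s models are meant to approximate people, not identifiers.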

Doing the math

Next, each publisher performs some kind of aggregation into “sketches.” These sketches are likely groups of VIDs that belong to the same demographic or interest segment, by campaign. It is worth noting here that the “sketches” can’t be reidentified to individuals and are somewhat similar to proposals in Google’s “privacy sandbox.”

Each publisher then sends its “sketches” to an unnamed independent service that will “combine and deduplicate VIDs” to provide an estimate of reach and frequency across the campaign. The WFA has a proposal for this Private Reach & Frequency Estimator posted on GitHub.
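The proposal doesn’t pin down the sketch format, but a Bloom-filter-style bit array gives the flavor: each publisher sets bits for its VIDs, the bitwise OR of two sketches represents the union of their audiences, and unduplicated reach can be estimated from the fraction of set bits – without any individual VIDs changing hands. A minimal sketch of that idea (parameters are arbitrary):

```python
import hashlib
import math

M, K = 4096, 3  # bits per sketch, hash functions per VID (arbitrary choices)

def positions(vid: str):
    """The K bit positions that one VID sets."""
    return [int(hashlib.sha256(f"{i}:{vid}".encode()).hexdigest(), 16) % M
            for i in range(K)]

def make_sketch(vids):
    bits = [0] * M
    for vid in vids:
        for p in positions(vid):
            bits[p] = 1
    return bits

def estimate_reach(bits):
    """Standard Bloom-filter cardinality estimate from the set-bit fraction."""
    x = sum(bits)
    return -(M / K) * math.log(1 - x / M)

# Publisher A reaches VIDs 0-799, publisher B reaches 400-1199 (400 overlap).
pub_a = make_sketch(f"vid{i}" for i in range(800))
pub_b = make_sketch(f"vid{i}" for i in range(400, 1200))

# Bitwise OR merges the audiences without exchanging any individual VIDs.
union = [a | b for a, b in zip(pub_a, pub_b)]
print(round(estimate_reach(union)))  # close to the true unduplicated reach of 1,200
```

The estimator actually posted on GitHub uses more elaborate sketch structures and adds noise for privacy, but the basic union-then-estimate shape is similar.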

A GitHub explainer mentioning data structures and count vector algorithms is ad tech’s new sign of sincerity.

Finally, outputs are provided via APIs and dashboards, which support both reporting and media planning. End to end, it’s an ambitious proposal that has many of the right players and pieces to work. Its next steps are validation and feasibility testing led by the ISBA in the United Kingdom and the ANA in the United States.

Whatever happens, we’ve learned something from the WFA’s proposal. Even in a best-case scenario, accurate global campaign measurement will definitely require heroic levels of cooperation.