Note the key phrase: Customer Data Platforms (CDP). Only the hottest mar-tech category to appear in at least a decade, and we literally wrote the book on it. At least, the first substantial mainstream book on this key topic. We cover the category from data integration and identity management to exploration, activation and A.I.
It’s accessible and a quick read, full of charming illustrations and architecture drawings. If you’re interested in this fast-growing tech category that points the way toward the converged platform future, our book is a great place to start.
Happy launch-day, my friends! Here’s to a more integrated, privacy-friendly and analytical 2021! The future is bright. peace mk
The following post originally appeared on the NYC Data Science Academy blog on Sept 28, 2021. This project was my capstone submitted for my data science coursework. It was not sponsored, endorsed or even noticed – so far as I know – by the mighty Hallmark network.
As the holidays approach, many of us eagerly await a new crop of Hallmark Holiday movies: positive, reassuring, brightly-lit confections that are as sweet and reliable as gingerbread. Part of their appeal is a certain implicit formula — a woman in a stressful big city job goes home for the holidays, falls for a local working man, and realizes she’s missing out on life.
Small towns, evil corporations, a wise older woman … there are recurring motifs that made me wonder if I could apply machine learning to the plots of these movies to understand (1) what the formulas are; and (2) if a computer could write a Hallmark movie (for extra credit).
NLG has been tried on Christmas movies before, and the results were quite funny. Perhaps I could do better.
My initial hypothesis was that there are a certain (unknown) number of plot types that make up the Hallmark Holiday movie universe, and that these types could be determined from a structured analysis of plot summaries. Because the stories seemed formulaic, they could potentially be auto-generated using natural-language generation (NLG) methods, which were new to me.
Assembling the Data Set
Step one was to pull a list of titles. The Hallmark Channel has been running original movies with a Christmas theme for a quarter century, although the rate of production skyrocketed in 2015 as they became very popular. Pandemic production issues slowed the pipeline slightly in 2020, but the pace remains rapid.
Although 1995-2010 saw fewer than five original titles a year, the years 2018-2021 saw almost 40 each year. It’s quite a pipeline.
Luckily, there is a list of original Hallmark production titles on Wikipedia, which I was able to scrape using Scrapy. Holiday movies aren’t distinguished from others, so there was some manual selection in cutting the list. Once I had my titles, I was able to use the API for The Movie Database project (TMDB), which maintains information about films and TV shows, to pull the ‘official’ plot summaries.
There were 260 plot summaries in my corpus. The summaries ranged in length and style, and their lack of standardization and detail caused some challenges in the analysis. However, short of watching all the movies and building my own summaries, the TMDB summaries (which were provided by the network, I assume) were my data.
My intended audience was writers, producers and TV execs who want to understand how the Hallmark Holiday genre works and the elements of a successful production slate. These popular movies could also be used to inform other narrative projects with similar valence.
Of the 260 summaries, all but two were Christmas movies. Many summaries were disappointingly brief and generic, but many were better than that. There were about 15,000 words in total in the final data set.
Lauren Gabriel leaves everything behind in Boston to embark on a new chapter in her life and career. But an unforeseen detour to the charming town of Grandon Falls has her discover unexpected new chapters – of the heart and of family – helping her to embrace, once again, the magic of Christmas.
Over the years, the stories and themes of the Hallmark Holiday films changed, as the network nosed around and then settled on a set of typed tropes. For example, earlier films used Santa as a character more often and spirits as worthy guides for the heroine. By 2015 or so, Hallmark had found its soul: small towns, high-school boyfriends, family businesses threatened by big corps, and so on.
After lemmatizing and tokenizing, removing stopwords and other standard text preprocessing, I realized that the corpus would have to be standardized to gain insight into its themes and to provide training data for any NLG model. For example, the summaries had names for characters, but those names didn’t matter to me – I just cared that it was <MALE> or <FEMALE> (for the main characters), or <CHILD> or <SIBLING> or <PARENT> or <GRANDPARENT> with respect to the main character. Often there was also a <BOSS>.
(If you’re curious, the most common names for characters mentioned in the corpus were: Jack, Nick and Chris.)
Likewise, towns were often named, but my only interest was that it was a <SMALLTOWN>, or (in the case of those bustling metropolises our heroines liked to leave in the beginning of the story) <BIGCITY>. And the evil big corporation might be named, but I wanted to tokenize it as <BIGCORP>.
Note the <BRACKETS> which would indicate to the model that these were tokens rather than the words originally in the corpus. How to make the substitutions without a lot of manual lookups?
I ended up using Spacy to tag the parts of speech. Although it requires some computer cycles, Spacy is a great NLP library that will tag each word by its part of speech, including place names, personal names and proper nouns. The tags themselves are then accessible to a Python script as part of a dictionary-lookup substitution.
In the case of character names, I was able to tag them using Spacy and then run them through Genderize to get a likely gender. This doesn’t always work, as viewers of “It’s Pat” on Saturday Night Live know, but a quick scan let me correct mistakes.
I could also automate much of the <TOKENIZATION> using dictionary substitutions. For example, I could find instances of “L.A.” and “Los Angeles” and “New York City” and so on and substitute <BIGCITY>. However, a careful manual check was needed to do some cleanup.
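A minimal sketch of that dictionary-substitution step, using only Python's standard library. The lookup table and the function names are illustrative assumptions, not the script I actually ran, and the real table was much longer.

```python
import re

# Hypothetical lookup table: surface forms mapped to placeholder tokens.
SUBSTITUTIONS = {
    "New York City": "<BIGCITY>",
    "Los Angeles": "<BIGCITY>",
    "L.A.": "<BIGCITY>",
    "Boston": "<BIGCITY>",
    "Grandon Falls": "<SMALLTOWN>",
}


def build_pattern(table):
    """Compile one regex that matches any key, trying longer keys first."""
    keys = sorted(table, key=len, reverse=True)
    alternation = "|".join(re.escape(key) for key in keys)
    # Lookarounds rather than \b, so keys ending in punctuation ("L.A.") match.
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)")


def tokenize_entities(text, table=SUBSTITUTIONS):
    """Replace every known place name in `text` with its placeholder token."""
    pattern = build_pattern(table)
    return pattern.sub(lambda match: table[match.group(0)], text)


summary = "Lauren leaves Boston for the charming town of Grandon Falls."
print(tokenize_entities(summary))
```

Sorting the keys longest-first keeps a short name from matching inside a longer one. Anything the table misses still needs the manual pass described above.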
In the end, I had a corpus of 260 plots with major character, location and relationship types <TOKENIZED>.
Frequency Analysis & Topic Modeling
Word frequencies were high for terms such as ‘family’, ‘child’, ‘help’, ‘love’, ‘parent’, ‘small town’. This agreed with my personal memories of the films: an abiding emphasis on families, home towns, and positive mojo.
Bigrams and trigrams (common two- and three-word combos) uncovered even more of the Hallmark spirit than word frequencies. Among bigrams, the most common were ‘high school’, ‘fall love’, and ‘return hometown’. Common trigrams were ‘high school sweetheart’, ‘old high school’ and ‘miracle really happens’.
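For readers who want to try this, here is a rough sketch of n-gram counting over already-cleaned token lists. The three tiny "documents" are invented for illustration and the helper names are my own; this is not the code behind the numbers above.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the consecutive n-word tuples found in one token list."""
    return list(zip(*(tokens[i:] for i in range(n))))


def top_ngrams(docs, n, k=3):
    """Count n-grams per document (never across documents), return the top k."""
    counts = Counter()
    for tokens in docs:
        counts.update(ngrams(tokens, n))
    return counts.most_common(k)


# Made-up, already lemmatized and stopword-free summaries.
docs = [
    ["return", "hometown", "high", "school", "sweetheart"],
    ["old", "high", "school", "sweetheart", "fall", "love"],
    ["fall", "love", "high", "school", "teacher"],
]

print(top_ngrams(docs, 2))
print(top_ngrams(docs, 3))
```

Counting inside each summary separately avoids creating phantom n-grams that span the end of one plot and the start of the next.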
It is possible just to look at the common trigrams and get a very good feel for the alternate reality that is the mini-metaverse of Hallmark Holiday films.
The heart of my NLP analysis consisted of LDA topic modeling, using the Gensim library. Latent Dirichlet Allocation (LDA) is a statistical method that takes a group of documents (in our case, plot summaries) and models them as a group of topics, with each word in the document attached to a topic. It finds terms that appear together (frequency) and groups them into “topics” which can be present to a greater or lesser degree in each particular document.
Often used for categorizing technical and legal documents, I thought it could be used to find the different holiday themes I detected in the plot summaries.
First, I did a grid search for parameters using “coherence score” as the target variable to maximize. The purpose of this search was to find a likely number of distinct topics, or plot types. I guessed there were 5-10, and this hyperparameter tuning exercise indicated that 8 topics appeared to be the most likely best fit.
Training the topic model on the plot summaries, I generated 7-8 distinct topics, with some overlap in words, as expected. These topics were analyzed using pyLDAvis, which allows for interactively probing the topics and changing some parameters to make them easier to interpret. (Figure 4 shows the pyLDAvis interactive view.)
Here some manual work, call it ‘domain knowledge’ (e.g., watching the movies), was needed. I tagged the plots with the topics and focused on those that clearly fell into one topic or another. I then came up with a rough summary of these plots and gave that ‘theme’ a name. The manual tagging was needed because the theme name itself often didn’t actually appear in the summaries.
The 8 Types of Hallmark Holiday Movies
The 8 themes I ended up identifying, along with my own names and sketches, were:
SETBACK: Disappointed in work/love, a woman moves to a small town to heal/inherit
BOSS: A cynical businessman hires a spunky woman for holiday-related reason (like planning a party)
MIXUP: A travel mixup/storm forces some incompatible people to work together
ALT-LIFE: A wish upon Santa/spirit is granted and a woman is shown an alternative life — often, this involves time travel
TAKEOVER: A big corporation threatens a family-run business in a small town
RIVALS: Two seemingly incompatible rivals are forced to work together for some goal
IMPOSTER: Dramatic irony: Someone lies about who they are — or gets amnesia and doesn’t know who they are
FAMILY/CRISIS: A woman is forced to return home because of a family crisis
As usual with LDA, there was some overlap among the themes. In particular, #1 co-occurred with others often; it started the story moving. For example, the heroine might suffer a SETBACK at work which encourages her to go back home (#1), and she encounters a MIXUP on the way (#3) that lands her in a delightful small town (this is the plot of “Christmas Town”).
Interestingly, when I looked at the distribution of themes over the course of the Hallmark seasons, they were fairly evenly present. This made me think the producers at the network are well aware of these themes and seek to balance them to avoid repetition.
Text Generation Using Markov Chains, LSTM and GPT-2
As an experiment, I looked at three different methods of generating text, the idea being to use the plots as training data for a model that would generate an original plot in the style of the others. Text generation or NLG is an emerging field that has made amazing strides in recent years – as I discovered – and has developed uncanny capabilities. I was only able to touch the surface in my work.
I began with traditional text generation methods, which were hardly magical.
Markov Chains were the most intuitive: they use the corpus and predict the next word (word-by-word) based on a distribution of the next words seen in the training data. Because it’s at the word-by-word level (not larger chunks of text) — at least, the way I implemented it — the results were coherent only in very small sequences. Overall, it didn’t work to put together sentences or stories that made sense.
Figure 6 shows a few examples of text generated in this way.
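To make the word-by-word idea concrete, here is a small first-order Markov chain sketch in plain Python. The two sample plots are invented, and the function names and the `seed` parameter are my own additions for illustration, not my original implementation.

```python
import random
from collections import defaultdict


def build_chain(docs):
    """Map each word to the list of words that followed it in the corpus."""
    chain = defaultdict(list)
    for text in docs:
        words = text.split()
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain


def generate(chain, start, max_words=20, seed=None):
    """Walk the chain from `start`, sampling each next word by frequency."""
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(max_words - 1):
        followers = chain.get(word)
        if not followers:  # dead end: this word never had a successor
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


# Made-up summaries in the tokenized style described earlier.
plots = [
    "<FEMALE> returns to <SMALLTOWN> and falls for <MALE>",
    "<FEMALE> leaves <BIGCITY> and returns to <SMALLTOWN> for Christmas",
]
chain = build_chain(plots)
print(generate(chain, "<FEMALE>", max_words=12))
```

Because followers are stored with repeats, `rng.choice` samples in proportion to how often each word followed the current one. The model only ever looks one word back, which is why longer passages drift into nonsense.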
Long Short-Term Memory (LSTM) networks are a form of recurrent neural network (RNN). They were created to solve the RNN’s long-term memory problem: RNNs tend to forget earlier parts of a sequence (e.g., of text) due to vanishing gradients. They also make predictions at the word level based on weights derived in the training stage.
Training was done over 6 epochs and 100-character sequences using ‘categorical cross-entropy’ as the loss function. It took about two hours on my average-powered setup, so it’s time-intensive. Longer training might have improved the disappointing results. (See Figure 7.)
Frankly, LSTM was a misfire. It required a great deal of training and although I did train for a few hours, my results were coherent only for a short stretch (half a sentence) of text. More training might have helped, but I was more interested in moving on to ‘transformers’, the current state of the art for NLG.
GPT-2 is an open source model from OpenAI’s family of transformer models. It’s pretrained on vast amounts of text data, giving it a very good basic model of English text. (GPT-3, which is much better at NLG, is not available open source and I could not get access.) Training GPT-2 using the plots, I was able to ‘direct’ it toward my particular genre. The results were much more coherent than the other methods, while still falling short of useful new plots. (See Figure 8.)
To implement, I used the Transformers library provided by Huggingface/PyTorch, pretrained on data from the web. I trained the model for 50 epochs in batches of 32 characters (about 6 words).
Clearly, transformers are the way forward with NLG. GPT-3 has generated a lot of excitement in the past year or so, and its ability to create human-readable text that is original in a wide number of genres is astonishing. The state of the art could create a Hallmark movie plot already, and this tool will only get better as GPT-4 and other transformer models appear.
My hypothesis that Hallmark holiday movies tend to cluster around a set of common plots was validated. Specifically, I found:
Hallmark Holiday movies have a consistent set of themes: small towns, families, career setbacks, old boyfriends, spirits and wishes
Analyzing the text required standardization to avoid missing themes: man/woman/small town, etc.
LDA topic modeling worked fairly well in identifying 7-8 key topics, with some overlap
NLG yielded inconsistent results, with the pre-trained transformer model living up to its reputation as a leap forward
Additional analyses I’d like to do would be to examine ‘plots’ as a time series. They are a sequence of events that happen in order. Adding the step-by-step flow would be an intriguing exercise.
After many years of shiftless planning and a listless lockdown, I finally put the pixel to the pointer and started a podcast! My friend Jill Royce and I co-host a weekly in-depth interview with one of the founding or influential figures in the first twenty years of advertising (and marketing) technology. That’s 1995-2015 or so … a time of tremendous innovation, excitement, ambition, posturing and fraud … a deranged double decade. So far, most of the people we’ve asked have agreed to join us — although we just started.
I’ve been touched by the support we’ve received from people who (like me) find the history of this much-maligned and underappreciated industry so fascinating. Check us out on Apple Podcasts and Spotify.
Loading and learning the models took about an hour. And given a few-word prompt, GPT-2 happily takes about five minutes to churn out 20 “ideas,” without breaking for lunch.
Text generation – or robo-writing – has made startling leaps in the past few years, moving from punchline to something that may deserve a seat in the creative lounge. Believe it or not, the best robo-writers are almost the equivalent of that most annoying/wonderful phenomenon: the eager beginner, completely inexhaustible but creatively uneven.
Most of GPT-2’s “ideas” were not quite ready to be presented; some were nonsensical. Oddly, it had no clue what to do with the prompt “Jason Alexander.” And one of its “Wow no cow” completions was “Wow no cowbell can be quite like the best in the universe.”
Which is probably true and not at all helpful.
In the near future, the smartest creative teams will be those that can use A.I. writers in productive ways, as a computer assist to a creative session and a source of ideas that might spark better ones.
GPT-3’s largest setting has 175 billion parameters. Think of these as individual knobs that the model has to tune, based on human writing samples, in order to predict the next word in a sentence. Google just open-sourced an even larger text model called Switch Transformer that reportedly has more than 1 trillion parameters.
The human brain has roughly 100 trillion synapses, by common estimates. Let’s leave that right there.
GPT-3 takes GPT-2 out of the sandbox and sends it to middle school. Give the model a prompt (e.g., “Once upon a time…” or “It wasn’t me…”), and it can continue at some length, generating text that is often plausible and sometimes uncanny. Early testers ranged from awed to skeptical – and often both.
In fact, that’s what makes this new generation of robo-writers different: they are flexible models, verging into the space called artificial general intelligence (AGI). This is the kind of intelligence we have: not pre-trained in any particular discipline but capable of learning. GPT-3 seems to perform well on a range of language tasks, from translation to chatting to impressing electronic gearheads.
Ad copywriting isn’t such a leap. As a tool to build creative prompts from catch phrases ready for human filtration, so-called Transformer models make a lot of sense.
Robo-writers are already employed in shrinking newsrooms. So far, they’re mostly stringers on the high-school sports reporting, weather and stock market desks — churning out endless Mad Lib-style pieces in routine formats about games and finance that no sane journalist would want to write, even for money.
Computers thrive at tedium. It’s their métier. The for-profit OpenAI takes a brute force approach to its innovation. Two years ago, it gained notoriety for developing a machine that could beat the world’s best players of a video game called Dota 2. It did this by having software agents play the equivalent of 45,000 hours of games, learning by trial-and-error.
The GPT family of tools was also developed by pointing software agents at a massive corpus of data: in GPT-3’s case, millions of documents on the open web, including Wikipedia, and libraries of self-published books.
GPT-3 is exposed to this mass of human-written text and builds a vocabulary of 50,000 words. Its model’s weights predict the next word in a sequence given the words that came before – and it develops meta-learning beyond simple memorization. It requires a prompt and can be guided by sample text that provides a “context,” per the Super Bowl examples above.
To answer the obvious question, it can be shown that GPT-3’s creations are not just plagiarism. But are they pastiche? The model learns to predict words based on its experience of what others have written, so its prose is predictable by design. But then again, isn’t most writers’ prose?
Limitations include a “small context window” – after about 800 words or so, it forgets what came before – so it’s better at short pieces. It has a short attention span, matching our own.
For this reason, people who have spent more time with the model grow less impressed. As one said: “As one reads more and more GPT-3 examples, especially long passages of text, some initial enthusiasm is bound to fade. GPT-3 over long stretches tends to lose the plot, as they say.”
But ad copywriting isn’t about sustaining an argument or writing a book. It’s about trial and error and inspiration. Increasingly, that inspiration may be coming from a robot.
This post originally appeared on the blog of the NYC Data Science Academy, where I am a student. We were assigned the Ames Housing dataset to practice Machine Learning methods such as Lasso, Ridge and tree-based models. I added the Plotly Dash out of sheer exuberance.
The Ames Housing dataset, basis of an ongoing Kaggle competition and assigned to data science bootcamp students globally, is a modern classic. It presents 81 features of houses — mostly single family suburban dwellings — that were sold in Ames, Iowa in the period 2006-2010, which encompasses the housing crisis.
The goal is to build a machine learning model to predict the selling price for the home, and in the process learn something about what makes a home worth more or less to buyers.
An additional goal I gave myself was to design an app using the Plotly Dash library, which is like R/Shiny for Python. Having been a homeowner more than once in my life, I wanted a simple app that would give me a sense of how much value I could add to my home by making improvements like boosting the quality of the kitchen, basement or exterior, or by adding a bathroom.
Conceptually, I figured I could build a model on the data and then use the coefficients of the key features, which would probably include the interesting variables (e.g., kitchen quality, finished basement). These coefficients can predict what a unit change in a feature would do to the target variable, or the price of the house.
One advantage of the Ames dataset is that it’s intuitive. The housing market isn’t particularly hard to understand, and we all have an idea of what the key features would be. Ask a stranger what impacts the price of a house, and they’ll probably say: overall size, number of rooms, quality of the kitchen or other remodeling, size of the lot. Neighborhood. Overall economy.
To set a baseline, I ran a simple OLS model on a single raw feature: overall living space. My R^2 was over 0.50 — meaning that more than half of the variance in house price could be explained by changes in size. So I didn’t want to overcomplicate the project.
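As a rough illustration of that baseline, single-feature OLS and its R^2 can be computed by hand in a few lines. The living areas and prices below are made up for the sketch (they are not Ames rows), and the function names are my own.

```python
def fit_simple_ols(x, y):
    """Least-squares slope and intercept for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Invented example: above-ground living area (sq ft) and sale price ($).
living_area = [850, 1100, 1400, 1700, 2100, 2600]
sale_price = [105000, 128000, 171000, 162000, 240000, 255000]

slope, intercept = fit_simple_ols(living_area, sale_price)
print(f"price = {intercept:.0f} + {slope:.1f} * sqft")
print(f"R^2 = {r_squared(living_area, sale_price, slope, intercept):.2f}")
```

An R^2 above 0.5 from one feature is a useful yardstick: any fancier model has to beat it by enough to justify the added complexity.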
The data had a lot of missing values. Many of these were not missing at random but rather likely indicated the feature was not present in the house. For example, Alley, PoolQC and Fence were mostly missing; I imputed a ‘None’ here. Other values needed special treatment.
LotFrontage was 17% missing. This feature turned out to be closely related to LotShape, with IR3 (irregular lots) having much higher LotArea values. So I imputed values based on LotShape.
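One way to do this kind of group-based imputation is sketched below, using the median LotFrontage within each LotShape. The choice of the median and the fallback to the overall median are assumptions for illustration, the rows are invented, and it assumes at least one observed value exists.

```python
from collections import defaultdict
from statistics import median


def impute_by_group(rows, group_key, value_key):
    """Fill missing (None) values with the median of the row's group."""
    observed = defaultdict(list)
    for row in rows:
        if row[value_key] is not None:
            observed[row[group_key]].append(row[value_key])
    group_medians = {group: median(vals) for group, vals in observed.items()}
    # Fallback for a group whose values are all missing (assumed behavior).
    overall = median(v for vals in observed.values() for v in vals)

    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        if new_row[value_key] is None:
            new_row[value_key] = group_medians.get(new_row[group_key], overall)
        filled.append(new_row)
    return filled


# Made-up rows; None marks a missing LotFrontage.
houses = [
    {"LotShape": "Reg", "LotFrontage": 60.0},
    {"LotShape": "Reg", "LotFrontage": 80.0},
    {"LotShape": "Reg", "LotFrontage": None},
    {"LotShape": "IR3", "LotFrontage": 120.0},
    {"LotShape": "IR3", "LotFrontage": None},
]
for house in impute_by_group(houses, "LotShape", "LotFrontage"):
    print(house)
```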
There were a lot of categorical values in the data which I knew I’d have to dummify. Before going there, I looked into simplifying some of them. ‘Neighborhood’ seemed ripe for rationalization. It was clear (and intuitive) that neighborhoods varied in house price and other dimensions. There didn’t seem to be a pattern of sale price and sale volume by neighborhood (except for NorthAmes), but Neighborhood and Sale Price were obviously related.
Now I know it isn’t best practice to create a composite variable based only on the target, so I created a new feature called “QualityxSalePrice” and split it into quartiles, grouping Neighborhoods into four types.
There were many features related to house size: GrLivArea (above-grade, i.e., above-ground, living area), TotalBsmtSF (basement size), 1FlrSF, even TotRmsAbvGrnd (number of rooms). These were, of course, correlated, as well as correlated with the target variable. Likewise, there were also numerous features related to the basement and the garage which seemed to overlap.
To simplify features without losing information, I created some new (combined) features such as total bathrooms (full + half baths), and dropped others that were obviously duplicative (GarageCars & GarageArea).
Looking across both continuous and categorical features, there were a number that could be dropped since they contained little or no information; they were almost all in one category (e.g., PoolArea, ScreenPorch, 3SsnPorch and LowQualFinSF).
Finally, I dummified the remaining categorical features and normalized the continuous ones, including the target. The sale price (target) itself was transformed via log, to adjust a skew caused by a relatively small number of very expensive houses.
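The target transformation, and the reverse step needed later to get back to dollars, can be sketched as follows. It assumes the price was logged first and then standardized (z-scored), which is my reading of the steps above. The prices are invented and the helper names are illustrative.

```python
import math
from statistics import mean, pstdev


def fit_scaler(values):
    """Return the mean and (population) standard deviation of the values."""
    return mean(values), pstdev(values)


def transform_price(price, mu, sigma):
    """Log the price to tame the right skew, then standardize it."""
    return (math.log(price) - mu) / sigma


def inverse_transform_price(z, mu, sigma):
    """Undo the standardization and the log to recover a dollar amount."""
    return math.exp(z * sigma + mu)


# Made-up sale prices with one expensive outlier.
prices = [120000, 145000, 160000, 185000, 210000, 640000]
mu, sigma = fit_scaler([math.log(p) for p in prices])

z = transform_price(185000, mu, sigma)
print(round(z, 3), round(inverse_transform_price(z, mu, sigma)))
```

The scaler is fit on the logged prices, so the same `mu` and `sigma` must be kept around to turn any model output back into a price.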
Data Exploration & Modeling
Everybody knows (or thinks they know) that it’s easier to sell a house at certain times of year, so I looked at sale price by month and year. These plots showed a peak in the fall (surprisingly, to me) as well as the impact of the 2008 housing crisis.
Correlation for the continuous variables showed a number aligned with the target variable, and so likely important to the final model. These included the size-related features, as well as features related to the age of the house and/or year remodeled.
Because my target variable was continuous, I focused on linear regression and tree-based models. I ran OLS, Ridge, Lasso and ElasticNet regressions, using grid search and cross-validation to determine the best parameters and to minimize overfitting.
The error metrics for these models on the test data set were all similar. The R^2 values for Ridge, Lasso and ElasticNet were all in the range 0.89-0.91. The significance tests for the OLS coefficients indicated that a number of them were very significant, while others were not.
I also ran the tree-based models Gradient Boosting (GBM) and XGB (also gradient boosting) and looked at Feature Importances. The tree-based models performed similarly to the linear regression models, and pointed to similar features as being significant. Of course, this is what we’d expect.
In the end, the most important features across all the models were: OverallQual (quality of the house), GrLivArea (above-ground size), YearRemodAdd (when it was remodeled), NeighType_4 (high-end neighborhood), and the quality of the kitchen, finished basement and exterior. If nothing else, the model fits our intuition.
Feeding the features through the model one by one, additively, it became obvious that the most oomph came from 20-25 key features, with the rest more like noise.
Plotly Dash App
Settling on a multiple linear regression as the most transparent model, I ended up with an array of 23 features with coefficients and an intercept. To use them in my app, I had to de-normalize them as well as the target.
The app was aimed at a homeowner who wanted to know the value of certain common improvements. The coefficients in my linear model were the link here: each coefficient represented the impact of a UNIT CHANGE in that feature on the TARGET (price), assuming all other variables stayed the same. So I just had to come up with sensible UNITS for the user to toggle and the impact on the TARGET (price) was as simple as multiplying the UNIT by the COEFFICIENT.
Since I was building an app and not trying to win a data science prize, I focused on the significant features that would be most interesting to a remodeler. Some things you can’t change: Neighborhood and Year Built, for example.
But some things you can: in particular, features related to Quality would be in the app. These were Basement, Kitchen and Exterior Quality. Other features could be changed with significant investment: Baths (can add a bath), Wood Deck (can add a deck or expand one), and even house size (since you can — if you’re crazy and rich — tack on an addition somewhere).
I also included a couple Y/N features since they could affect the price: Garage and Central Air.
I knew Plotly as a way to create interactive plots in Python, but the Dash framework was new to me. One of our instructors introduced me to it, and an active online community and plenty of examples also came to my aid.
Dash is similar to R/Shiny in that it has two key components: a front-end layout definer, and a back-end interactivity piece. These can be combined into the same file or separated (as they can in Shiny).
The general pseudo-code outline of a Dash App starts with importing the libraries and components. It’s written in Python, so you’ll need numpy and pandas as well as Plotly components for the graphs.
Dash renders its Python components as HTML tags when they are prefixed with “html.”, which is straightforward. I included Div tags, H2 (headers) and P (paragraph) tags. Different sections of the Div are called “children” (as they are in HTML). I used “children” here because I wanted to have two columns: one for the inputs and the second for the graph of adjusted house prices.
The rows and columns can be done in a couple different ways, but I used simple CSS style sheet layouts (indicated by “className”).
Interactivity is enabled by the layout and the callbacks. In the layout, as in Shiny, I could specify different types of inputs such as sliders and radio buttons. There aren’t as many or as attractive a selection as you find in Shiny, but they get the job done. I used sliders and buttons, as well as input for the avg starting price.
The functional part of the app starts with “@app.callback,” followed by a list of Outputs (such as figures/charts to update) and Inputs (from the layout above). The way Dash works, these Inputs are continually monitored, as via a listener, so that any change to a slider or other input will immediately run through the function and change the Outputs.
Right after the @app.callback section, there’s one or more functions. The parameters of the function represent the Inputs from @app.callback, but they can have any name. The name doesn’t seem to affect their order, since they’re read in the same order as the callback Input list.
This function includes any calculations you need to do on the Inputs to arrive at the Outputs, and if the Output is a graph or plot — as it almost always is — there’s also a plot-rendering piece.
In my case, I wanted the output to be a normal distribution of potential new values for the sale price, given the changes in the Inputs. For example, if someone moved the “Overall Quality” slider up a notch, I needed to generate a new average saleprice, a difference (from the baseline) and a normal distribution to reflect the range.
I did this by turning each coefficient into a factor, adding up the factors and multiplying by current sale price. I then generated a random normal distribution using Numpy with the new target as the mean and a standard deviation based on the original model.
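The calculation inside the callback can be sketched roughly like this. The per-unit factors are hypothetical placeholders rather than my fitted coefficients, and I am assuming each factor is a fractional change in price per unit. To keep the sketch dependency-free, Python's built-in `random` module stands in for Numpy.

```python
import random
from statistics import mean

# Hypothetical per-unit effects, each a fractional change in sale price.
FACTORS_PER_UNIT = {
    "kitchen_quality": 0.03,
    "basement_quality": 0.02,
    "full_bath": 0.04,
}


def adjusted_price(start_price, unit_changes, factors=FACTORS_PER_UNIT):
    """Add up the factors for each changed input and apply them to the price."""
    total = sum(factors[name] * units for name, units in unit_changes.items())
    return start_price * (1 + total)


def simulate_prices(new_price, std_dev, n=1000, seed=None):
    """Draw n prices from a normal distribution centered on the new price."""
    rng = random.Random(seed)
    return [rng.gauss(new_price, std_dev) for _ in range(n)]


# Example: bump kitchen quality one notch and add one full bath.
start = 200000
new_price = adjusted_price(start, {"kitchen_quality": 1, "full_bath": 1})
samples = simulate_prices(new_price, std_dev=20000, n=5000, seed=0)

print(f"new average price: {new_price:,.0f}")
print(f"difference from baseline: {new_price - start:,.0f}")
print(f"mean of simulated prices: {mean(samples):,.0f}")
```

In the app, a function like `adjusted_price` would run every time a slider moves, and the simulated samples would feed the histogram in the second column.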
The final dashboard looked like this:
Granted, there are caveats around the app. It’s based on the Ames housing dataset, so other areas of the country at different times would see different results. It requires estimating a ‘starting price’ that assumes the default values are true, and this estimate might be difficult to produce. But as a proof of concept, it has potential.
I think there’s definitely a data-driven way to estimate the value of some common improvements. This would help a lot of us who are thinking of selling into the housing boom and wondering how much we can reasonably spend on a kitchen remodel before it turns into red ink.
The following article first appeared on the Salesforce Marketing blog in January.
Last May, at the height of the COVID pandemic’s first wave, PepsiCo raised some bubbles by launching not one but two websites where consumers could browse and purchase a selection of the company’s more than 100 widely munched snack and beverage brands. At a time when many outlets were closed and people ordered more food online, the company quickly made it easy for consumers to buy directly.
One ecommerce site, PantryShop.com, took a lifestyle-themed approach, offering bundles of products in categories such as “Rise & Shine” (Tropicana juice, Quaker oatmeal, Life cereal) and “Workout & Recovery” (Gatorade, Muscle Milk, and Propel electrolyte-infused water). A second site, Snacks.com, offered a more straightforward lineup of Frito-Lay brands such as Lay’s, Tostitos, Cheetos, dips, nuts, and crackers.
These platforms complement PepsiCo’s retailer channels to ensure the company continues to deliver consumers their favorite products on the right platform, in the right place, at the right time.
Whenever, wherever, however has been the mantra of ecommerce digital marketing for a while, but it’s become more important in the age of COVID-19. Most of Salesforce’s customers – particularly those in high-velocity consumer categories such as consumer packaged goods (CPG), retail, and restaurants – have had to become extraordinarily flexible over the past 10 months to adapt to ever-changing local guidelines and consumer behaviors.
What was most striking about PepsiCo’s foray into direct-to-consumer (D2C) commerce and marketing was its speed. “We went from concept to launch in 30 days,” said Mike Scafidi, PepsiCo’s Global Head of Marketing Technology and AdTech. “Within 30 days, we stood up our ecommerce capabilities and started delivering direct to our consumers.”
PepsiCo’s products are consumed more than a billion times daily in 200 countries. It boasts 23 brands with more than $1 billion in annual sales. How does such a large and complex global company pull off such impressive footwork?
The answer, Scafidi said, was preparation. “Digital is inherently disruptive,” he explained. “We’ve been training for this for a long time. We’ve been preparing to adapt to disruption for 20 years.”
Planning for change is a skill
Scafidi and I spoke from our remote locations about “Building Resilience and Adapting Your Marketing Tech in Uncertain Times” during Ad Age’s CMO Next conference, co-hosted by Ad Age’s Heidi Waldusky. Scafidi stressed that tumultuous times took PepsiCo back to basics — inspiring the company to lean on skills it had been developing for years — especially in consumer research and media measurement.
Part of the reason PepsiCo was able to launch Snacks.com and PantryShop.com so quickly, he said, was “we were leaning on what we were doing already.”
He reminded me of an analyst quote I read recently on embedding resilience into sales and marketing plans: “[T]he more an organization practices resilience, the more resilient it becomes.”
Over the past year, many organizations have had plenty of time to practice being resilient. As stores shut down and millions huddled at home, there was a surge in digital activity across all channels. Media consumption soared: people around the world watched 60% more video, for example. And they shopped. Salesforce’s latest Shopping Index shows that comparable online sales were up 55% in Q3 of last year after climbing 71% in Q2.
We’ve heard from many of our customers that they needed to launch new capabilities faster than ever before. Otherwise, they’d lose business. Curbside pickup; buy online, pick up in store; expanded digital storefronts; appointment scheduling; contact tracing – the list goes on.
Our desire to help customers adapt to rapid digitization inspired us to launch Digital 360, a suite of ecommerce digital marketing products combining marketing, commerce, and personalized experiences under a single umbrella. With it, Salesforce Trailblazers like Spalding and Sonos were able to scale their online commerce dramatically, making up some of the shortfall in brick-and-mortar sales.
Unilever also faced dramatic market shifts in the recent past. Keith Weed, the company’s chief marketing and communications officer, pointed out back in 2018 that the pace of change “will never be this slow again” – not knowing just how fast that pace would get. And like PepsiCo, Unilever met the hyperfast present by relying even more on its customer research skills.
“We know that people search [online] for problems, not products,” Weed said. So the company created Cleanipedia.com, which offers detailed solutions to cleaning problems in 26 languages. Built before COVID-19, the site was ahead of its time and has attracted 28 million visitors to date.
Building a foundation that scales to meet customers wherever they are
When times are changing, it’s too late to build up basic skills. “If you have a foundation in place, that allows you to adapt,” Scafidi said.
For example, the PepsiCo team was able to rapidly restructure its internal media measurement analyses because it had already put in the work to develop an ROI Engine, which helped determine the real impact of its advertising, promotions, and email. The ROI Engine automates data inputs, processing, and algorithms to improve paid media optimization decisions. Combining the ROI Engine with a customer insight capability called Consumer DNA, “We were able to stabilize our understanding of the consumer and adapt to where they were,” Scafidi explained.
PepsiCo’s Consumer DNA project is an example of a custom-built tool that allows the company to gain a 360-degree view of the consumer to enable more effective and targeted media buying and marketing activation.
At Salesforce, we help our customers engage with their customers. To do this in 2020, we too relied on core skills, built up over years, to adapt to an environment that seemed to change by the second. The result was launches like Digital 360 and Work.com. The latter helps companies safely reopen their workplaces. We also introduced a customer data platform (CDP) called Customer 360 Audiences, which serves as a single source of truth to build a unified profile for marketers and others.
The Ancient Greek philosopher Heraclitus said, “the only constant in life is change.” As customers like PepsiCo show us, the best way to adapt is to build core skills that can help you pivot quickly in the future.
The following column originally appeared in the mighty AdExchanger on 11/10/20
Measurement is a footnote – outglamorized by targeting and the opera of browsers, pushed into the corner during debates about the future of media. But it’s arguably more important than aim and requires more discipline.
A few weeks ago, the World Federation of Advertisers (WFA) released a statement and a technical proposal for a cross-media measurement framework. It was the outcome of a yearlong global peer review and an even longer discussion among participants including Google, Facebook, Twitter, holding companies, industry groups such as the ANA and MRC and large advertisers including Unilever, P&G and PepsiCo.
Reactions ranged from enthusiastic to less so, but few people seem to have read more than the press release. After all, it’s not a product and could be yet another in a parade of grand ambitions in online-offline media measurement, dating back to Facebook’s Atlas.
But it describes a realistic scenario for the future of measurement. Sketchy in spots, the WFA’s proposal is ironically the clearest screed of its kind and is worth a closer look.
To be sure, this is a project focused on solving a particular problem: measuring the reach and frequency of campaigns that run on both linear TV and digital channels, including Facebook, YouTube and CTV. In other words, the kinds of campaigns that cost participating advertisers such as P&G a reported $900 billion a year.
And P&G’s own Marc Pritchard is on record calling the proposal “a positive step in the right direction.”
The need is clear. Advertisers today rely on a patchwork of self-reported results from large publishers, ad server log files, aggregate TV ratings data and their own homegrown models to try to triangulate how many different people saw their ads (reach), how often (frequency) and how well those ads fueled desired outcomes, such as sales lift.
The latter goal is acknowledged in the current proposal, which doesn’t try to solve it. But the WFA, building on previous work from the United Kingdom’s ISBA, Google, the MRC and others, lays out a multi-front assault on reach and frequency that covers a lot of ground.
How does it work?
The proposal combines a user panel with census data provided by participating publishers and broadcasters, as well as a neutral third-party data processor. The technical proposal spends some time talking about various “virtual IDs” and advanced modeling processes that are loosely defined – and the goal of which is to provide a way for platforms that don’t share a common ID to piece together a version of one.
Needless to say, a lot of the virtualizing and modeling and aggregating in the WFA’s workflow exists to secure user-level data. It’s a privacy-protection regime. It also engages with the much-discussed third-party cookieless future.
Panel of truth
The proposal leans heavily on a single-source panel of opted-in users. At one point, it calls this panel the “arbiter of truth,” and it’s clear most of the hard work is done here. Panelists agree to have an (unnamed) measurement provider track their media consumption online and offline. Panels are a workhorse of media measurement as provided by Nielsen and others, but they are expensive to recruit and maintain. It’s not clear who would build or fund this one.
In the past, other panels have struggled to collect certain kinds of cross-device data, particularly from mobile apps. Panels also get less reliable in regions or publishers where they have less coverage, a problem that could be addressed by joining multiple panels together.
In addition to the media consumption, demographic and attitudinal data it provides, the panel is used to “calibrate and adjust” much more detailed census data voluntarily provided by publishers (including broadcasters).
No walls here – at least in theory. Given that Google and Facebook support the WFA’s proposal, it’s implied they’re open to some form of data sharing. It’s already been reported – although is not in the proposal itself – that some participants will only share aggregated data, but it’s better than nothing. The WFA’s idea of “census data” includes publisher log files, TV operator data and information collected from set-top boxes.
This census data is married at the person-level with the panel data using a series of undefined “double-blind joins of census log data with panelist sessions.” Joined together, the different data sets can correct one another: The panel fills gaps where there is no census data, and the more detailed census data can adjust the panel’s output.
Virtual IDs, anyone?
The census data will have to be freely provided, and so wide-ranging participation across many publishers is required for success. Another requirement is a way to tie impressions that occur on different publishers (which don’t share a common ID, remember) to individuals to calculate unduplicated reach and frequency.
The output here is a pseudonymized log file with a virtual ID (VID) attached to each impression, overlaid with demographic data – at least TV-style age and gender cohorts – extrapolated from the panel.
Doing the math
In the next step, each individual publisher will perform some kind of aggregation into “sketches.” These sketches are likely groups of VIDs that belong to the same demographic or interest segment, by campaign. And it is worth noting here that the “sketches” can’t be reidentified to individuals and are somewhat similar to proposals in Google’s “privacy sandbox.”
At the penultimate step, each individual publisher sends their “sketches” to an unnamed independent service that will “combine and deduplicate VIDs” to provide an estimate of reach and frequency across the campaign. The WFA has a proposal for this Private Reach & Frequency Estimator posted on GitHub.
A GitHub explainer mentioning data structures and count vector algorithms is ad tech’s new sign of sincerity.
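To make the "combine and deduplicate" idea concrete, here is a minimal sketch in Python. It is not the WFA's estimator: the real proposal works on privacy-preserving sketches, while this toy version assumes the independent service can see plain VID lists. The publisher names, VIDs and the `reach_and_frequency` function are all hypothetical.

```python
from collections import Counter


def reach_and_frequency(publisher_logs):
    """Estimate campaign reach and average frequency.

    publisher_logs maps a publisher name to a list of VIDs,
    one entry per ad impression served by that publisher.
    """
    impressions_per_vid = Counter()
    for vids in publisher_logs.values():
        impressions_per_vid.update(vids)

    # Reach counts each VID once, no matter how many publishers saw it.
    reach = len(impressions_per_vid)
    total_impressions = sum(impressions_per_vid.values())
    avg_frequency = total_impressions / reach if reach else 0.0
    return reach, avg_frequency


# Hypothetical logs: v2 and v3 saw the campaign on both publishers.
logs = {
    "publisher_a": ["v1", "v2", "v2", "v3"],
    "publisher_b": ["v2", "v3", "v4"],
}
reach, avg_frequency = reach_and_frequency(logs)

# Adding up each publisher's own reach double-counts shared viewers.
naive_reach = sum(len(set(vids)) for vids in logs.values())
```

Here the naive sum of publisher-level reach is 6, while the deduplicated reach is 4 people seeing an average of 1.75 impressions each. That gap is the reason a common ID, even a virtual one, matters.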
Finally, outputs are provided via APIs and dashboards, which support both reporting and media planning. End to end, it’s an ambitious proposal that has many of the right players and pieces to work. Its next steps are validation and feasibility testing led by the ISBA in the United Kingdom and the ANA in the United States.
Whatever happens, we’ve learned something from the WFA’s proposal. Even in a best-case scenario, accurate global campaign measurement will definitely require heroic levels of cooperation.
Our TV screens are growing larger than our walls, surging eight inches on average in the last five years. And the most prevalent cultural phenomenon other than TikTok dance challenges is probably binge-watched streaming services, which spend more than $35 billion a year on new video content.
Blogging is dead and other favorite headlines
You’d be forgiven for believing that we’ve forgotten how to read. Judging by our popular culture, we’re becoming a post-literate, oral society, one whose always-dominant visual sense has overwhelmed our reasoning to the point where 72% of consumers now say they prefer all marketing to be delivered via video.
We don’t notice this trend because we’re part of it, but historians do. An iconic 1958 VW ad cited for its visual austerity contained just 165 words, including “Lemon.” Last year, VW ran a magazine ad – remember magazines? – that had all of 13 words, including “Volkswagen, it’s plugged in.” Newspapers have seen subscriptions decline 70% since 2000, a drop echoed by the 20 million hits returned for the search “is blogging dead?”
Of course, it’s understandable. Sight is our dominant sense and is more primitive than the others. Almost 90% of the information going into our brain is visual, and 40% of neural fibers flow to our retinas. Images are processed much faster than text, which is a learned input that requires years of practice.
Words still exist. This sentence is proof. In fact, it may seem odd to be making such a statement in words, to a literate audience. But mass culture increasingly treats words as a kind of visual fillip, a graphic element used for iconic rather than informational content. Increasingly, American consumers are like English-only speakers who visit Tokyo, struck by the occasional familiar word among the kanji script.
Marketers: visualize this
How can a marketer adapt to the rise of the post-literate consumer?
First, make sure your brand has a strong visual identity – stronger than you think it needs. A recognizable logo and color palette aren’t enough. Assume consumers will not look at your logo – after all, they’re certainly multitasking. Your brand identity must be so strong it can be communicated simply through a consistent, insistent drumbeat of the same colors, fonts, shapes and styles.
Byron Sharp and the Ehrenberg-Bass Institute influentially made a similar point in How Brands Grow. Sharp stressed the importance of “mental availability,” which is not awareness (as he often reminds his Twitter followers), but rather how familiar a brand’s (primarily visual) sensory associations are to consumers.
So be visually consistent, like your brand depends on it.
Second, simplify and streamline your cues. This is another trend that’s obvious to those who’ve looked for it. Logos, fonts, web pages, graphic design – all are retreating from clutter and complexity. It’s almost as though a decline in literacy has extended to the visual realm, or maybe we’re all just overwhelmed and have told our visuals that – in the words of Taylor Swift – they need to calm down.
Cut the text. Lengthen the tweet?
Plenty of research confirms that most people prefer simplified designs that are neither complex nor particularly original. Simplification includes increasing the amount of white space, beautifying images – and cutting text.
Third – and most importantly – know when to ignore this advice. We have been talking here about mass consumer audiences. If you’re selling advanced hydroelectric plants, different principles apply. And remember that for every trend there is a counter-trend.
Some years ago, I worked at an ad agency doing social analytics for a luxury car brand. Examining the Twitter conversation about the brand, I noticed an odd phenomenon: it was bimodal. I mean that about 80% of the comments were inane, silly, and crass – what you’d expect. But 20% were very different: intelligent, thoughtful, almost nerdy. I concluded there were two different Twitters out there.
If you’re appealing to the second Twitter, the realm of academics and the informed, don’t sound dumb. You may need to increase rather than decrease your word count.
As I suspect you already know, since you’ve made it this far into a 1,000-word essay on marketing, there’s still a lot of life left in words.
Martin Kihn is senior vice-president strategy at Salesforce.
There’s a tart scene in Mad Men when Don Draper eyes the iconic VW Beetle ad with the one-word caption: “Lemon.” After a thoughtful pull on his Lucky Strike (“It’s Toasted!”), he says: “I don’t know what I hate about it the most — the ad or the car.”
These days, too many people seem to agree: they don’t like ads. They’re interruptive and meddlesome. Besieged on one side by blockers and browsers, on another by unfriendly memes, advertising seems to be having a moment.
A few years ago, NYU gadfly Scott Galloway went so far as to declare “The End of the Advertising-Industrial Complex,” scripting a future of subscription-only media and brands propelled by word of mouth alone. In a dark analogy, he called modern digital marketing a world “full of anxiety, humidity and …” so on.
Ouch. He reminds me of the observation ascribed to John O’Toole, late chair of Foote, Cone & Belding, that most critics don’t like advertising just “because it isn’t something else.”
It wasn’t always thus. The first major history of the ad business concluded advertising was “a civilizing influence” and “an educative force.” That was in 1929. Times have changed, although as recently as the 1970s a voice as acrid as media theorist Marshall McLuhan’s said, “Advertising is the greatest art form of the 20th century.”
Maybe so, but an increasing number of people seem to be asking: Do we even need it anymore?
What Does Advertising Do, Anyway?
The best and most obvious case for advertising is that it creates (or encourages) demand. As the economist John Kenneth Galbraith famously said in The Affluent Society (1958), advanced societies can’t prosper unless they convince people they need more than they actually do. In addition to information, advertising is supposed to supply desire.
A lot of academic angst has been directed at this point, but let’s avoid that for now and assume ads are supposed to bring in more at the cash register than they cost. Certainly, a lot of – let’s also assume – intelligent people believe this to be true. Even during this very difficult year, some $215 billion will be spent on ads in the U.S.
But do they work? It’s conceivable advertisers are in the pincers-grip of wishful thinking and fuzzy math. Having spent some years in the ad measurement business, I can tell you that measuring return on ad spend (ROAS) is not as easy as it looks, and it doesn’t look easy.
Oddly, digital channels make measurement harder – not simpler. When a majority of ad spend flows to platforms that do their own measurement and don’t release raw data, in the name of privacy, complex equations are required. The good news is that analysts who have worked these equations, over time and on a large scale, generally agree that advertising works. That is: brands that advertise more can be shown to have higher sales, on average, and the impact of ads is incremental.
What about Tesla? Sooner or later, the anti-ad crowd always mentions this brand. After all, four of the five biggest car makers – all major ad spenders – are losing share. Tesla is growing, despite spending almost nothing on ads.
My response to this is simple: come up with a revolutionary product that disrupts a 100-year-old industry, add a charismatic chairperson, and you too will enjoy so much free publicity that it would be redundant to advertise.
What Else Have You Done for Me, Lately?
Demand is its day job, but advertising has more mojo. There is the industry itself with its quarter-million hard-working employees. Many of these people have impressive (subsidized) side-hustles in the arts and sciences. And there is the indirect impact of additional demand, as companies sell more and then hire, buy equipment, take out leases, and so on.
To add all these direct and indirect effects up is a complicated exercise that has been attempted over the years. Each time, conclusions were impressive. A 1999 study overseen by a Nobel Laureate found advertising accounted for about 2.5% of U.S. economic output.
A decade and a half later, IHS Global Insight concluded that every dollar spent on advertising generates $22 in economic output (sales), and every $1 million spent supports 81 American jobs. An update done in 2015 found that advertising impacted up to 19% of U.S. GDP.
There are a lot of caveats, of course. IHS is a respected research firm, but the reports were sponsored by ad industry groups. The reports’ definition of advertising was very broad, e.g., including direct mail. But even if the numbers are discounted, they still point to a major impact on the nation’s economic life.
And we haven’t even mentioned that ads support content. It appears that advertisers spend about $35 per month to reach each U.S. adult online, which should make us feel pretty good. That money pays for news, entertainment and utilities that in an ad-free world either would not exist or would cost us something. Newspapers have lost 70% of their ad revenue since 2006, with a similar decline in reporters and content. Oft-mentioned counter-examples such as the New York Times’ rising subscription revenue are really just poignant outliers.
On the other hand, somebody is making money on ads. Two of the five most valuable companies on Earth are almost entirely ad-supported. Content on Google (including YouTube) and Facebook (including Instagram) is free, and the prolific ad experience they offer clearly isn’t turning people away. So the “advertising-industrial complex” may not have ended but just changed its owners.
Let’s admit that ads aren’t going away. The business needs to adapt its data and approach to a new reality. And many of the industry’s problems – from oversaturation to intrusive targeting – are actually self-inflicted.
Don Draper gives us some solid advice: “You want respect? Go out and get it for yourself.”
If you want to see the dramatic impact of COVID-19 on a world-class marketing department, look at Unilever. The consumer products powerhouse met a challenging global environment in Q2 and is making rapid changes in tactics, reviewing all marketing spend “to ensure it’s effective and appropriate,” according to CFO Graeme Pitkethly.
Meeting chaos with agility, Pitkethly points out that the company is “dynamically reallocating” budgets as consumer behavior shifts, moving resources out of outdoor ads (no traffic), and TV production (not safe) into areas with higher immediate return-on-investment (ROI) – such as skin care, home, and hygiene.
Unilever’s quick shifts and shimmies are mirrored by our other customers, many of whom are charged with increasing marketing ROI with fewer resources. The World Bank predicts a baseline 5.2% contraction in global GDP in 2020. Gartner’s CMO spending survey released in July showed 44% of CMOs expect midyear budget cuts.
What’s the best way to improve marketing ROI in today’s challenging landscape? By increasing one of these three things:
Effectiveness – get more revenue from the same investment
Efficiency – get the same (or more) revenue from a lower investment
Optimization – a combination of these through better resource allocation
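A little arithmetic shows how the three levers move the same number. The sketch below assumes the common definition of ROI as net return divided by investment, which this post does not spell out, and every dollar figure is made up for illustration.

```python
def marketing_roi(revenue, investment):
    """Return ROI as a fraction: (revenue - investment) / investment."""
    if investment <= 0:
        raise ValueError("investment must be positive")
    return (revenue - investment) / investment


# Hypothetical baseline: $100,000 of spend brings in $150,000.
baseline = marketing_roi(150_000, 100_000)

# Effectiveness: more revenue from the same investment.
effectiveness = marketing_roi(180_000, 100_000)

# Efficiency: the same revenue from a lower investment.
efficiency = marketing_roi(150_000, 80_000)

# Optimization: a bit of both through better allocation.
optimization = marketing_roi(165_000, 90_000)
```

With these invented numbers, the baseline ROI of 50% rises to 80% through effectiveness and to 87.5% through efficiency. Which lever is cheapest to pull depends on your own data.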
And how do you know where to start? Data.
Keeping this framework in mind, and highlighting examples from our customers, here are three ways you can use data to increase your marketing ROI.
1. Dial up your digital (Effectiveness)
At a time when most of us spend more time than we’d like staring at screens, digital channels are the best way to reach us. Many indicators, from time spent on mobile to time wasted – um, spent – playing networked video games, prove that a lot of our lives are now online. And from a commerce perspective, McKinsey said consumers “vaulted five years in the adoption of digital in just eight weeks.”
Digitizing just as fast, marketers are ramping up their technology investments to manage customer data and use it effectively. And if you’re not, you should be. In addition to providing greater control over channels, data-driven investments include analytics to improve customer segmentation, message personalization, and targeting methods such as lookalike modeling. Despite an overall decline in enterprise tech spend, Forrester forecasts a rise in marketing technology investment. “In some cases,” says VP Principal Analyst Shar VanBoskirk, “technology may offer greater efficiency than relying on manual effort.”
Case in point: Orvis, an outdoor clothing retailer, saw pandemic-related store closures lead to tighter budgets and a mandate to improve engagement. The company used Einstein Content Selection to automatically choose when to send standard messages and when to deliver content from its values-focused “Giving Back” campaign. It also took advantage of Einstein Send Time Optimization, which uses artificial intelligence to predict delivery times that align with when each recipient is most likely to open an email. Together, the two approaches led to a 22% higher email click-through rate.
2. Send fewer messages (Efficiency and Effectiveness)
Yes, that’s right: less can be more in digital marketing. All of us know the experience of being hounded by a brand to the point that we stop engaging, progressing from tuning out to turning off (hello, “unsubscribe”). During a time of message saturation, keeping your communications clear, on point, and not too frequent can cut costs (efficiency) and raise responses (effectiveness).
Last year, more than 40% of U.S. consumers said they were “overwhelmed” or “annoyed” by the volume of marketing content they experienced daily, according to Gartner. That reaction has only sharpened this year. In fact, Gartner calls this message-stress syndrome “COVID Fatigue.”
Treating the syndrome takes a good source of customer data and links among call centers, ecommerce, marketing, and other systems. With the right pipes in place, you can execute ROI-boosting tactics like:
Suppress social ads to people who have an open case
Merge customer records
Send fewer messages to people who are overwhelmed
Cable network Showtime did just that, starting before the pandemic. It used Salesforce Audience Studio to suppress spending on existing subscribers, then shifted budget toward those who had recently canceled, for the win back. This more efficient approach to their marketing helped them reach 25 million people.
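For readers who like to see the pipes, the suppression tactics listed in this section can be sketched in a few lines of Python. This is a toy illustration, not how Audience Studio or any Salesforce product works. The `Customer` fields, the email-based merge key and the weekly cap of three messages are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    email: str
    has_open_case: bool
    messages_last_7_days: int


def merge_records(records):
    """Collapse records that share an email address (case-insensitive)."""
    merged = {}
    for record in records:
        key = record.email.strip().lower()
        if key in merged:
            seen = merged[key]
            merged[key] = Customer(
                email=key,
                has_open_case=seen.has_open_case or record.has_open_case,
                messages_last_7_days=(
                    seen.messages_last_7_days + record.messages_last_7_days
                ),
            )
        else:
            merged[key] = Customer(
                key, record.has_open_case, record.messages_last_7_days
            )
    return list(merged.values())


def social_ad_audience(customers):
    """Suppress social ads for anyone with an open service case."""
    return [c.email for c in customers if not c.has_open_case]


def email_audience(customers, weekly_cap=3):
    """Skip people who have already hit the weekly message cap."""
    return [c.email for c in customers if c.messages_last_7_days < weekly_cap]


# Hypothetical records; the first two are the same person.
records = [
    Customer("ana@example.com", False, 1),
    Customer("Ana@Example.com", True, 2),
    Customer("ben@example.com", False, 0),
    Customer("cy@example.com", False, 5),
]
customers = merge_records(records)
social_targets = social_ad_audience(customers)
email_targets = email_audience(customers)
```

Merging first matters: only after Ana's two records are combined does it become clear that she has an open case and has already received three messages this week, so she drops out of both audiences.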
3. Speed up planning cycles (Optimization)
During normal years – that is, not now – it’s common for marketers to do quarterly or even annual media budgeting. But in today’s environment, that pace won’t work.
Measurement, reallocation, testing, and optimization – these should be ongoing disciplines, not intermittent ones. Continuous monitoring allows you to move spend to higher-performing channels, cut short losses, and make the most of the resources you have. It also allows you to respond to market shifts, such as changing your tactics in areas affected by natural disasters or virus outbreaks.
Part of reimagining marketing in the “next normal,” according to McKinsey, is always-on customer data analytics: “Analytics will need to play a core role not only in tracking consumer preferences and behaviors at increasingly granular levels, but also in enabling rapid response to opportunities or threats.”
Part of this acceleration requires better data management, moving from manual reports to an automated real-time system. At Salesforce, we managed to combine marketing data from 83 sources and 182 streams using Datorama. This reduced wait time on data from two to three weeks to near real time. Most important, marketing ROI grew 28%.
Even as businesses face tighter budgets and a lower tolerance for risk, there’s still a world of opportunity for marketers to increase their ROI. You just need the right data to help you chart your course.