
Insight Becomes Action: Compelling Big Data Stories

Lawrence Lerner • Sep 08, 2016

 

The saying goes, “Every company is an IT company.” It’s just as common to see Big Data in every media story, from last year’s high-profile case of the FBI plowing through data to point the finger at North Korea, to the music industry (How Big Data can change the music industry). Another hot topic is IoT, the “Internet of Things.” Definitions vary, but there is one basic consistency: “#Things” talk to each other, and when they do, a lot of data is generated. According to a BBC article (Big Data: Are you ready for blast-off?), 2.5 billion gigabytes (GB) of new data were generated every day in 2012. By any standard, that’s a large amount of data.

 

While data is good, facts are better, but stories are relatable. The 2016/2017 challenge for “Big Data” purveyors is to write relatable stories that inspire end users to act. For our purposes, data that is “big” is beyond the ability of humans to read, review, and retain anything meaningful. To create relevancy, we build computer systems and frameworks that search the data flow for predefined patterns and associations. More data with a higher number of relevant patterns gives us increased confidence in the answers we reach (e.g., 60,480 unique searches for late-night pharmacies in a geography with a population of approximately 155,000 suggests it’s probably flu season). Our ability to store and process vast amounts of data easily and inexpensively has improved dramatically in the past ten years.
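As a minimal sketch of what searching for a predefined pattern can look like in code, the snippet below flags a geography when its per-capita search volume crosses a threshold. Region A mirrors the example above; the threshold and the second region are assumptions for illustration only.

```python
# Minimal sketch: flag geographies whose per-capita search volume suggests flu season.
# The threshold and the "Region B" figures are illustrative assumptions.

SEARCHES_PER_CAPITA_THRESHOLD = 0.25  # hypothetical trigger level

regions = [
    {"name": "Region A", "population": 155_000, "late_night_pharmacy_searches": 60_480},
    {"name": "Region B", "population": 210_000, "late_night_pharmacy_searches": 9_300},
]

for region in regions:
    rate = region["late_night_pharmacy_searches"] / region["population"]
    verdict = "likely flu season" if rate >= SEARCHES_PER_CAPITA_THRESHOLD else "no clear signal"
    print(f"{region['name']}: {rate:.2f} searches per resident -- {verdict}")
```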

 

Using our four-step Constructive Disruption methodology, we’re able to turn Big Data into a relatable story that’s relevant to every business.

 

(Uncover) What’s the problem we’re solving?

 

Today, we can collect vast amounts of data quickly (defined as near real time and ongoing) from endpoint devices such as smartphones, sensors, point-of-sale terminals, and video recorders, all day, every day. The data collected may be structured (tables of information) or unstructured (pictures, blog entries), and it can easily approach terabytes in little time. Comprehending data at this scale is a lot like listening to 60 different musical selections at once, each starting at a different point in the song: it’s noise, with hardly anything that’s understandable or memorable. Our goal, sketched in code after the list, is to:

 

  • Scan through the data fast enough to review it in a timely manner
  • Organize related bits of information (e.g., sort by zip code)
  • Develop relationships between related data elements (e.g., seismic activity during Seattle Seahawks games)
  • Create information and a meaningful bit of prose that can be consumed and made actionable by an appropriate audience
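A minimal sketch of those four steps, assuming pandas and a handful of invented sensor-style records (the column names, values, and the Seahawks framing are illustrative only):

```python
import pandas as pd

# Invented example records standing in for a much larger stream of endpoint data.
events = pd.DataFrame([
    {"zip_code": "98101", "game_day": True,  "seismic_reading": 4.1},
    {"zip_code": "98101", "game_day": False, "seismic_reading": 1.2},
    {"zip_code": "98052", "game_day": True,  "seismic_reading": 1.3},
    {"zip_code": "98052", "game_day": False, "seismic_reading": 1.1},
])

# 1-2. Scan and organize: group related bits of information by zip code.
by_zip = events.groupby("zip_code")["seismic_reading"].mean()

# 3. Develop relationships: compare readings on game days vs. other days.
by_game_day = events.groupby("game_day")["seismic_reading"].mean()

# 4. Create a meaningful bit of prose an audience can act on.
lift = by_game_day[True] - by_game_day[False]
print(f"Average seismic readings rise by {lift:.1f} on game days.")
print(by_zip)
```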

 

We’re doing all of this to transform the data elements we’ve collected. It’s key to test our facts and shape them into prose that’s easily read and understood by our audience. As with all good stories, there are common elements:

 

  • Know/write for our audience
  • Create three-dimensional characters
  • Give it a plot – good, evil, or morally gray, what’s it all about?
  • Develop the crucible – writer-speak for the backdrop and the test(s) the character(s) face in a story. Whether we are writing fiction or non-fiction, assumptions must be tested

 

Lastly, writers, and more often editors, will tell you “there is no great writing, just great editing.” Editing is similar to the process data scientists use to create meaning out of volumes of information. Great writing often inspires action.

 

As you create your story, use three guidelines (from Rule of Three: The business of keeping it simple):

 

  1. People want to be entertained – Even if it’s a report detailing the history of grain production on a single acre, the writing must be enjoyable
  2. Keep it really, really simple – The more complex the data and its subsequent analysis, the easier it should be for your readers to consume the message (“Cold and flu season in the Northeast is in full effect”)
  3. Brand matters – You want to be known for relating information in a certain style. That style brings readers back for many reasons. Consider congressional session transcripts rewritten (and factually maintained) by J.R.R. Tolkien.

 

(Examine) What solution do you provide?

 

As previously mentioned, data at this scale is noise, not something humans can simply or conveniently consume. Technology is the enabling solution and game changer for Big Data. In the past, processing data at the terabyte scale on a regular basis would have been cost prohibitive. What would happen if you had access to a simple tool? Simple, defined as:

 

  • Relatively low cost or Open Source (no charge to use)
  • Easily hosted (smartphone app -> cloud-based subscription)
  • Well-stocked “top of the funnel” – data that is easily acquired and loaded into the database
  • Fast – low turnaround time to provide a “closed loop” in your decision-making process

 

You begin to achieve the democratization of Big Data. Traditional relational and so-called NoSQL (Not Only SQL) databases have become more powerful, more distributed, and able to manage data of different types. They are often based on Open Source principles and technologies. The popularity of NoSQL databases is on the rise due to low price points and the ability to leverage commodity hardware (e.g., cloud). Examples of NoSQL databases include FoundationDB, MongoDB, and HBase. NoSQL databases store data in different structures than relational databases (e.g., graph, document, name-value pair), which often accelerates certain functions such as lookups, a key step in making data useful.
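As a minimal sketch of the document/name-value idea, assuming a local MongoDB instance and the pymongo driver (the database, collection, and fields are invented for illustration; this is not a description of any specific vendor setup):

```python
from pymongo import MongoClient

# Assumes a MongoDB server running locally; connection details are illustrative.
client = MongoClient("mongodb://localhost:27017")
photos = client["demo_db"]["photo_metadata"]

# Documents can hold loosely structured data without a predefined schema.
photos.insert_one({
    "device": "smartphone",
    "focal_length_mm": 24,
    "exposure_s": 1 / 588,
    "outdoors": True,
})

# An index on a frequently queried field keeps lookups fast as the collection grows.
photos.create_index("focal_length_mm")
print(photos.find_one({"focal_length_mm": 24}))
```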

 

Once your data repository is well stocked, data scientists create a story that’s relevant. Let’s assume you have data (10,000,000 rows of digital images and the associated photo information, taken over 90 days). From there you may be able to search it and extract some facts (“86% of digital pictures are taken outdoors; most photos are less than 5 MB”). Now it’s a matter of weaving the facts into a story:

 

  1. Character – A photographer who carries a photo-taking device
  2. Plot – Photos taken by everyday people over a 90-day period
  3. Crucible – Pictures taken through the lens of a digital device. Smartphones, no matter how good the image capture, still lack sophisticated lenses and comfortable handgrips

 

Data such as:

 

  • Over the 90-day period, devices were updated within five days of a new software release
  • Exposure time, on average, is 1/588 of a second
  • 86% of photos have a focal length of 24mm

 

can be written into prose…

 

“The average Smart Phone Photographer (SPP) likes the latest hardware and updates their software frequently. Most of their pictures are taken in and around an area covering five square miles, closest to home. You’re most likely to find SPPs taking photos before and after business hours and on weekends. When they do take pictures during work hours, SPPs take fewer but generally higher-resolution photos. When they take and retain a picture, SPPs keep an average of three for every scene. 86% of all photos are taken outdoors in relatively close proximity to the subject.”
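A minimal sketch of that last step, turning aggregated facts into templated prose (the field names, values, and helper function are assumptions for illustration; the real EXIF parsing and aggregation would happen upstream):

```python
# Facts extracted upstream from the photo metadata (values here are illustrative).
facts = {
    "pct_outdoors": 86,
    "avg_exposure_s": "1/588",
    "common_focal_length_mm": 24,
    "avg_keepers_per_scene": 3,
}

def facts_to_prose(f: dict) -> str:
    """Weave a handful of aggregate facts into a short, readable narrative."""
    return (
        f"{f['pct_outdoors']}% of photos are taken outdoors, "
        f"typically at a {f['common_focal_length_mm']}mm focal length "
        f"with an average exposure of {f['avg_exposure_s']} of a second. "
        f"Photographers keep about {f['avg_keepers_per_scene']} shots per scene."
    )

print(facts_to_prose(facts))
```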

 

To properly “write” this story, you need to be a bit of a split-class photography geek/data scientist. As tools become simpler, the ability to tell stories (and automate their creation) is the true value.

 



(Prepare) What problem are you solving for others?

 

The collection, mining, and development of Big Data repositories has had cachet for a few years. Yet many find it difficult to assign a dollar value or real relevancy to vast amounts of “likes,” blogs, and assorted petabytes of structured and unstructured data.

 

Until the data is turned into something that can be monetized in a repeatable fashion, it’s another intangible asset. Stories and/or practical lessons learned help business leaders define a practical return on investment.

 

I Can Read You Like A Book

 

According to the EPA, 65% of electricity in the United States is consumed by commercial real estate, usually by buildings. FlowEnergy is a Woodinville, WA-based company that helps commercial real estate properties optimize their energy spend. FlowEnergy’s Surge platform combines hardware, software, and Big Data analytics to optimize energy consumption in commercial settings. Once installed, their systems continuously collect:

 

  • 35 data points per SmartValve every 60 seconds
  • 108 data points every 15 minutes from electric meters
  • Data from 40 electric meters installed at customer sites, for a total of 640GB+ over an 18-month period

 

Their customers are often large hospitals or universities with large campuses that have diverse energy profiles. FlowEnergy’s data scientists take the collected data sets and merge them with open-source weather information. Using EDA (Exploratory Data Analysis), they profile the temperature and energy characteristics of buildings. Based on the analysis, they are able to “read” the buildings and compare like profiles. In one case study, this allowed them to drive energy savings for buildings. They were able to (see the sketch after this list):

 

  • Determine that excess consumption occurred at night
  • Find buildings that were consuming energy outside of a comparative profile
  • Identify downstream effects that caused erratic behavior in other devices, such as water controls
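A minimal sketch of that kind of analysis, assuming hourly meter readings merged with weather observations (the DataFrames, column names, and nighttime rule are invented for illustration and are not FlowEnergy’s actual pipeline):

```python
import pandas as pd

# Invented hourly readings for one building, merged with weather by timestamp.
meter = pd.DataFrame({
    "hour": [1, 2, 13, 14, 23],
    "kwh": [420, 415, 300, 310, 430],
})
weather = pd.DataFrame({
    "hour": [1, 2, 13, 14, 23],
    "outdoor_temp_f": [48, 47, 61, 62, 50],
})
readings = meter.merge(weather, on="hour")

# Flag nighttime hours whose consumption exceeds the daytime average.
night = readings[(readings["hour"] >= 22) | (readings["hour"] <= 5)]
day = readings[(readings["hour"] > 5) & (readings["hour"] < 22)]
excess = night[night["kwh"] > day["kwh"].mean()]

print("Hours with excess nighttime consumption:")
print(excess[["hour", "kwh", "outdoor_temp_f"]])
```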

 

Their underlying technology stack is hosted on AWS (Amazon Web Services), uses Microsoft SQL Server, and relies on Tableau for data visualization.

 

The Right Stuff

 

Today, eCommerce and mobile buying are an everyday way to shop. In 2013, US consumers spent more than $322,000,000,000 online, according to Statista. A key component of the shopping experience is merchandising, the marketing and promotion of products. Merchandising, particularly online, means presenting the “right stuff” in a compelling way to an audience that likely cannot experience your product immediately. It includes the planning involved in marketing the right products/services at the right place/time/quantities/assortments and, most importantly, at the right price. With an ever-increasing number of products and services online, brands and retailers are continuously seeking an edge. Enter Indix, a Seattle, WA product intelligence company. Indix collects product and product-related information to allow merchants to optimize the way they price, promote, and plan product discovery.

 

Indix collects millions of pieces of information, using Apache HBase and other technologies to store, track, and analyze:

 

  • 700,000,000 products (estimates put Amazon’s inventory at approximately 200,000,000)
  • 600,000 sellers
  • 40,000 brands

 

The 2 TB (2,000,000,000,000-byte) data store is refreshed every two weeks and continuously grows as new merchants and products are added to the repository.

 

With the vast number of products and, more importantly, variations in product descriptions, it becomes an unmanageable task for a single retailer to optimize search or SEO on something so basic. Prosperent, an affiliate commerce service, uses Indix machine learning services to help organize, categorize, and streamline up to 50,000,000 products a day from the company’s 4,800+ merchant partners. Variations and inconsistencies in product descriptions are normalized across a base of millions of unique products.
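A minimal sketch of that normalization idea, grouping product descriptions that differ only by an attribute such as color (the canonicalization rule and sample listings are assumptions for illustration, not Indix’s or Prosperent’s actual method):

```python
import re
from collections import defaultdict

# Invented product feed entries whose descriptions differ only by color.
COLORS = {"black", "blue", "red", "green"}
listings = [
    "Gel Ink Pen, Blue, 0.7mm",
    "Gel Ink Pen, Black, 0.7mm",
    "Gel Ink Pen, Red, 0.7mm",
    "Spiral Notebook, 100 Pages",
]

def canonical_key(description: str) -> str:
    """Strip color words and punctuation so variants collapse to one key."""
    words = re.findall(r"[a-z0-9.]+", description.lower())
    return " ".join(w for w in words if w not in COLORS)

groups = defaultdict(list)
for item in listings:
    groups[canonical_key(item)].append(item)

for key, variants in groups.items():
    print(f"{key!r}: {len(variants)} variant(s)")
```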

 

KYC (Know Your Customer)

 

Today, most online search takes the form of a user “Googling it.” From content to links to mobile, more data points are collected and analyzed to provide fresher and more relevant search results, ranked the way Google sees fit. While there is no industry standard for search, Google has become the de facto standard (with an 80% US market share). Marketers need to manage the complexity of rankings to improve visibility and have their sites appear more often at the top of search results.

 

In 2012, Google made 665 changes to the way you receive search results. That’s about one change every 13 hours, a lot of work for rank-hungry sites. Moz, a Seattle-based software startup focused on helping marketers understand and improve SEO (Search Engine Optimization), aims to drive transparency in Google’s search rankings. Moz’s own engine performs a parallel search to Google’s, taking a large sample (20%-60%) of available sites. Moz tracks and relates the data in an attempt to tell the story, so its customers can optimize their search rankings. In its latest run, Moz’s engine crawled:

 

  • 285,000,000,000 URLs
  • 1,250,000,000,000 links
  • 362,000,000 TLDs (top-level domains, e.g., IBM.com)
  • 25,000,000,000 subdomains

 

What does Moz do with all those results? It analyzes them to tell the story of change. Moz’s products enable users to manage change and optimize their position in Google’s rankings. Google announces changes in a broad way but doesn’t specifically describe how they affect your individual page’s search ranking. Moz has refined its products to help brands and marketers ferret out changes and adapt. When Google released the “Panda” update (named for the engineer who designed it), sites with lots of duplicated content were pushed to a lower page ranking. Does that seem fair? Perhaps, but if we use the Indix product catalog as an example, a marketer might have dozens of products with the same description, where the only variation is color (e.g., ink for pens).
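A minimal sketch of how duplicated content might be surfaced across pages using a simple similarity score (the pages, threshold, and method are illustrative assumptions; neither Google’s Panda signals nor Moz’s scoring are described in this post):

```python
from difflib import SequenceMatcher
from itertools import combinations

# Invented page snippets: two near-duplicates and one distinct page.
pages = {
    "/pens/blue": "Smooth gel ink pen with 0.7mm tip, available in blue.",
    "/pens/red": "Smooth gel ink pen with 0.7mm tip, available in red.",
    "/notebooks": "Spiral notebook with 100 college-ruled pages.",
}

DUPLICATE_THRESHOLD = 0.9  # hypothetical cutoff for "near-duplicate"

for (url_a, text_a), (url_b, text_b) in combinations(pages.items(), 2):
    similarity = SequenceMatcher(None, text_a, text_b).ratio()
    if similarity >= DUPLICATE_THRESHOLD:
        print(f"Near-duplicate content: {url_a} vs {url_b} ({similarity:.0%})")
```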

 

(Satisfy) What’s the opportunity for Big Data tools?

 

Big Data tools and companies already demonstrate they can go beyond merely storing or reporting on data. They tell stories relevant to managing buildings, product labeling, and more. The new normal is the ability to create relevancy by moving from data to action.

 

Apple’s iTunes subscriber list has more than 800,000,000 members, and Apple Pay is now accepted at more than 700,000 locations. With each account holding a credit card, linking to a physical location, and revealing product demand, what are the stories and manufactured opportunities the tech company will create? What are the privacy concerns? Look for Big Data and privacy to become part of the 2016 presidential platforms.

 

The natural evolution of Big Data tools is to “vanish” into business infrastructure and become everyday tools, in the same way voicemail and email have.

 

ABOUT THE AUTHOR


Lawrence


I translate the CEO, Owner, or Board vision and goals into market-making products that generate $100M in new revenue by expanding into geographies, industries, and verticals while adding customers.


As their trusted advisor, leaders engage me to crush their goals and grow, fix, or transition their businesses with a cumulative impact of $1B.


👉🏼 Subscribe to Retail industry news, unpacking trends, and timely issues for leaders.

 

Ready to grow, address change, or transition your business? 👉🏼  Let's brainstorm
