Unstructured data?

We are preparing for the “snowpocalypse” here in New York City, and there are lots of buzzwords flying around about the impending snowstorm.

The snow buzzwords sound a lot more intense than data buzzwords, although there is always "BIG DATA". The term that seems to get the least excitement is unstructured data, mainly because it is time-consuming to tackle. It is also probably something like 80% of your data, no matter which industry you are in.

Despite the challenge, the benefits can be significant. I’ve learned through trial and error (lots of error) that while artificial intelligence and machine learning serve their purpose, there are many times when manual collection is better suited to unstructured content. I have a great case study on it; I can send it to you if you click HERE

Here is a 10-step roadmap for success when analyzing unstructured data:

1. Gather the data

Unstructured data usually means multiple unrelated sources. Each source was also probably built with no intention of someone else collecting its information later. Make sure you can trust your sources and that you have access to the most up-to-date information.
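As a rough illustration of this step, here is a minimal sketch that pulls records from three unrelated sources into one list and tags each with where and when it came from, so stale or untrusted sources can be traced later. The inline `CSV_EXPORT`, `JSON_FEED`, and `FREE_TEXT` strings and the source names are hypothetical stand-ins for real exports, APIs, or documents.

```python
import csv
import io
import json
from datetime import datetime, timezone

# Hypothetical raw sources; in practice these would be files, exports, or API responses.
CSV_EXPORT = "name,email\nAda Lovelace,ada@example.com\n"
JSON_FEED = '[{"full_name": "Alan Turing", "contact": "alan@example.com"}]'
FREE_TEXT = "Call notes: spoke with Grace Hopper about the Q3 renewal."

def gather():
    """Pull records from unrelated sources into one list, tagging provenance."""
    pulled_at = datetime.now(timezone.utc).isoformat()
    records = []
    for row in csv.DictReader(io.StringIO(CSV_EXPORT)):
        records.append({"source": "crm_csv", "pulled_at": pulled_at, "raw": row})
    for item in json.loads(JSON_FEED):
        records.append({"source": "web_json", "pulled_at": pulled_at, "raw": item})
    # Free text has no fields at all, so it is kept whole.
    records.append({"source": "call_notes", "pulled_at": pulled_at, "raw": FREE_TEXT})
    return records

records = gather()
for r in records:
    print(r["source"], r["raw"])
```

Note that nothing is reshaped yet: each source keeps its own field names, which is the point of tagging provenance before any cleanup.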

2. Find a method

Fully understand what the outcome of your data should be and work backwards. Are you creating a contact list? Tracking a financial services trend? Trying to derive insights and triggers? You need a method in place to analyze the data and at least a broad idea of what the end result should be. Create an Agile plan for reaching a result, then scale the data process before it gets too convoluted.
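One way to "work backwards" in practice is to write down the target output first and check raw records against it. The sketch below assumes the goal is a contact list; the `Contact` fields and the sample records are hypothetical.

```python
from dataclasses import dataclass, fields

@dataclass
class Contact:
    """Hypothetical target output: one row of the finished contact list."""
    name: str
    email: str
    company: str

REQUIRED = [f.name for f in fields(Contact)]

def missing_fields(record: dict) -> list:
    """Report which target fields a raw record cannot yet supply."""
    return [name for name in REQUIRED if not record.get(name)]

raw = [
    {"name": "Ada Lovelace", "email": "ada@example.com", "company": "Analytical Engines"},
    {"name": "Alan Turing", "email": "alan@example.com"},
]
for rec in raw:
    gaps = missing_fields(rec)
    if gaps:
        print(rec["name"], "is missing", gaps)
    else:
        print(Contact(**rec))
```

Defining the end result up front turns "what should this look like?" into a concrete gap report that tells you which sources still need to be collected.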

3. Get the right stack

The data you collect is going to come from many sources, but the results have to end up in your own technology stack. Also make sure that everyone on your data team knows their role in the process.

4. Put the data in a lake

Ahhh, the infamous data lake… Organizations that want to keep information will typically scrub it and then store it in a data warehouse. This is a clean way to manage data, but in the age of Big Data it removes the chance to find surprising results. The newer technique is to let the data swim in a “data lake” in its native form. If a department wants to perform some analysis, they simply dip into the lake and pull the data. But the original content remains in the lake, so future investigations can find correlations and new results.
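To make the idea concrete, here is a minimal local sketch of a lake: raw payloads are written byte-for-byte as received, partitioned by source and date, and a department "dips in" by reading copies out. The folder layout, the `land_raw`/`dip` helpers, and the sample payloads are all hypothetical; a real lake would typically sit on object storage rather than a temp folder.

```python
import json
import tempfile
from datetime import date
from pathlib import Path

# Hypothetical lake root; a production lake would usually live in object storage.
LAKE_ROOT = Path(tempfile.mkdtemp()) / "lake"

def land_raw(source: str, name: str, payload: bytes, day: date) -> Path:
    """Write the payload exactly as received, partitioned by source and date."""
    target = LAKE_ROOT / "raw" / source / day.isoformat() / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target

def dip(source: str) -> list:
    """Read every raw object for one source; the originals are left untouched."""
    root = LAKE_ROOT / "raw" / source
    return [p.read_bytes() for p in sorted(root.rglob("*")) if p.is_file()]

land_raw("call_notes", "note1.txt", b"Caller is in  Brooklyn!!", date(2024, 1, 15))
land_raw("web_json", "feed.json", json.dumps([{"k": 1}]).encode(), date(2024, 1, 15))
print(dip("call_notes"))
```

Because `dip` only reads, any cleaning happens on the copy a department pulls out, and the noisy original stays available for future analysis.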

5. Prep for storage

To make the data useful (while keeping the original in the lake), it is wise to clean it up. For example, text files can contain a lot of noise, stray symbols, or extra whitespace that should be removed. Duplicates and missing values should also be detected so the analysis is more efficient.
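A minimal sketch of this cleanup using only the standard library: strip noise symbols, collapse whitespace, drop exact duplicates after cleaning, and flag rows with missing values. Which characters count as "noise" is an assumption here (the regex keeps letters, digits, whitespace, and `@.,'-`), and the sample rows are made up.

```python
import re

def clean_text(text):
    """Strip noise symbols and collapse whitespace; return None for empty values."""
    if text is None:
        return None
    text = re.sub(r"[^\w\s@.,'-]", " ", text)  # replace noise symbols with spaces
    text = re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace
    return text or None

def prep(rows):
    """Clean each value, drop exact duplicates, and collect rows with missing values."""
    seen, cleaned, incomplete = set(), [], []
    for row in rows:
        tidy = {k: clean_text(v) for k, v in row.items()}
        key = tuple(sorted(tidy.items()))
        if key in seen:
            continue  # duplicate once cleaned
        seen.add(key)
        cleaned.append(tidy)
        if any(v is None for v in tidy.values()):
            incomplete.append(tidy)
    return cleaned, incomplete

rows = [
    {"name": "  Ada   Lovelace ** ", "city": "London"},
    {"name": "Ada Lovelace", "city": "London!!"},
    {"name": "Alan Turing", "city": "   "},
]
cleaned, incomplete = prep(rows)
print(cleaned)
print(incomplete)
```

Deduplicating after cleaning matters: the first two rows only become recognizable as the same record once the symbols and extra spaces are gone.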

6. Find the useful information amongst the clutter

Semantic analysis and natural language processing techniques can be used to extract key phrases as well as the relationships around them. For example, mentions of a location can be detected and categorized in transcribed speech in order to establish a caller’s location.
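Real semantic analysis would typically rely on a trained named-entity recognition model. As a much cruder stand-in that still shows the shape of the task, the sketch below matches a hypothetical gazetteer of places against a transcript, categorizes each hit, and checks whether a cue phrase such as "near" or "I'm in" appears just before it, suggesting the caller is describing where they are. The place list, categories, cue phrases, and 20-character window are all assumptions.

```python
import re

# Hypothetical gazetteer of known places and their categories.
LOCATIONS = {
    "brooklyn": "borough",
    "queens": "borough",
    "jfk airport": "landmark",
    "new jersey": "state",
}
# Hypothetical cue phrases suggesting the speaker is describing their own position.
CUES = re.compile(r"\b(i'm in|i am in|i'm at|i am at|near|stuck on|outside)\b", re.IGNORECASE)

def find_locations(transcript: str):
    """Return (place, category, has_cue) for each gazetteer hit in the transcript."""
    text = transcript.lower()
    hits = []
    for place, category in LOCATIONS.items():
        for m in re.finditer(r"\b" + re.escape(place) + r"\b", text):
            window = text[max(0, m.start() - 20):m.start()]  # text just before the hit
            hits.append((place, category, bool(CUES.search(window))))
    return hits

transcript = "Hi, I'm stuck on the BQE near Brooklyn, my sister lives in Queens."
print(find_locations(transcript))
```

The cue check is what separates "the caller is near Brooklyn" from "the caller's sister lives in Queens"; a real NLP pipeline would capture that relationship far more reliably.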

7. Build relationships

This step takes time, but it’s where the actionable insights lie. By establishing relationships between the various sources, you can build a more structured database, one with more layers and complexity (in a good way) than a traditional single-source database.
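A minimal sketch of building relationships across sources: normalize a shared key (here an email address, which is an assumption; real data may need fuzzier matching) and group every source's records under it into a single profile. The `crm`, `support`, and `calls` data are hypothetical.

```python
from collections import defaultdict

def normalize_email(value):
    """Lowercase and trim so 'Ada@Example.com ' and 'ada@example.com' match."""
    return value.strip().lower() if value else None

crm = [{"email": "Ada@Example.com", "name": "Ada Lovelace", "plan": "enterprise"}]
support = [
    {"email": "ada@example.com ", "ticket": "Login issue"},
    {"email": "alan@example.com", "ticket": "Billing question"},
]
calls = [{"email": "ADA@example.com", "note": "Asked about renewal"}]

def link(**sources):
    """Group records from every source under a shared normalized key."""
    profiles = defaultdict(lambda: defaultdict(list))
    for source_name, rows in sources.items():
        for row in rows:
            key = normalize_email(row.get("email"))
            if key:
                profiles[key][source_name].append(row)
    return profiles

profiles = link(crm=crm, support=support, calls=calls)
for email, by_source in profiles.items():
    print(email, {s: len(r) for s, r in by_source.items()})
```

The layered result is the "more structured database" the step describes: one profile that carries CRM, support, and call context together instead of three disconnected tables.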

8. Employ statistical modeling

Segmenting and classifying the data comes next. Using tools and machine learning algorithms to do the heavy lifting of finding correlations is becoming pretty standard.
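As one example of segmenting, here is plain k-means clustering written out in standard-library Python (3.8+ for `math.dist`). In practice you would likely reach for a library implementation; this version uses a naive "first k points" initialization, and the two customer features are hypothetical.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then move centroids."""
    centroids = list(points[:k])  # naive initialization: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # New centroid = mean of its members; keep the old one if a cluster is empty.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if updated == centroids:  # converged: no centroid moved
            break
        centroids = updated
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
    return centroids, labels

# Hypothetical features per customer: (support tickets per month, logins per week)
customers = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, labels = kmeans(customers, k=2)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

The algorithm only finds groups; deciding what each segment means (and whether the features were the right ones) is still analyst work.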

9. End results matter

The end result of all this work has to be condensed into a simple presentation. Ideally, the information can be viewed on a tablet or phone and helps the recipient make smart, real-time decisions. They won’t see the prior eight steps of work, but the payoff should be in the accuracy and depth of the data recommendations.
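A small sketch of condensing model output into something that fits on a phone screen: count records per segment and report only the top few with percentages. The segment labels and their human-readable names are hypothetical.

```python
from collections import Counter

def summarize(labels, names, top=2):
    """Condense per-record segment labels into a short, phone-sized summary."""
    counts = Counter(labels)
    total = len(labels)
    lines = [f"{total} records analyzed"]
    for label, n in counts.most_common(top):
        lines.append(f"{names.get(label, label)}: {n} ({n / total:.0%})")
    return "\n".join(lines)

labels = [0, 0, 1, 0, 2, 1]
names = {0: "Renewal risk", 1: "Upsell candidates", 2: "Needs follow-up"}
print(summarize(labels, names))
```

Limiting the output to the top segments is a deliberate design choice: the recipient sees the decision-relevant headline, not the eight steps behind it.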

10. The secret sauce

The secret sauce of managing data is the most important ingredient on this list. To learn more about the secret sauce, click HERE

If you thought my post was helpful, please share it on LinkedIn!

Neal Conlon

Revenue Generator | Time Optimizer | Speaker || Business & Life strategist. Endurance Athlete. Marine Veteran. Greatness creator.

9y

Hey folks, I've been responding to your comments. I'm not sure why LinkedIn isn't updating them quickly.

Rob Saland

Helping brands preserve revenue and protect integrity with anti-counterfeiting technology

9y

Great post, Neal. I would be interested in your perspective on where NoSQL fits into this methodology.

Maroun Aoun

Delivery Partner / Project and Program Manager

9y

Mathieu Monestier, this is for you.

John Ferraioli

Data. Analytics. Insights. Value.

9y

Good points, Neal Conlon. What are your thoughts on "weighting" the value of the unstructured data?
