Unstructured data?
We are preparing for the “snowpocalypse” here in New York City, and there are lots of buzzwords being used to prepare for the impending snowstorm.
The snow buzzwords sound a lot more intense than data buzzwords, although there is always "BIG DATA". The term that seems to generate the least excitement is unstructured data, mainly because it is time-consuming to tackle. It’s also probably around 80% of your data, no matter which industry you are in.
Despite the challenge, the benefits can be significant. I’ve learned from trial and from error (lots of error) that while artificial intelligence and machine learning serve their purpose, there are many, many times when manual collection is better suited to unstructured content. I have a great case study on it. I can send it to you if you click HERE.
Here is a 10-step roadmap for success when analyzing unstructured data:
1. Gather the data
Unstructured data means there are probably multiple unrelated sources. Those sources were also probably built with no intention that someone else would later re-collect the information. Make sure you trust your sources and have access to the most up-to-date information.
2. Find a method
Fully understand what the outcome of your data should be and work backwards. Are you creating a contact list? Tracking a financial services trend? Trying to derive insights and triggers? You need a method in place to analyze the data and at least a broad idea of what the end result should be. Create an Agile plan for finding a result, and then scale the data process before it gets too convoluted.
3. Get the right stack
The data you collect is going to come from many sources, but the results have to end up in your own technology stack. Also make sure that everyone on your data team knows their role in the process.
4. Put the data in a lake
Ahhh, the infamous data lake… Organizations that want to keep information will typically scrub it and then store it in a data warehouse. This is a clean way to manage data, but in the age of Big Data it removes the chance to find surprising results. The newer technique is to let the data swim in a “data lake” in its native form. If a department wants to perform some analysis, it simply dips into the lake and pulls the data. The original content remains in the lake, so future investigations can still find correlations and new results.
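To make the "native form" idea concrete, here is a minimal sketch in Python. It assumes the lake is just a local folder and that each item arrives as raw bytes. The function name store_raw, the folder layout, and the use of a SHA-256 hash as the file name are all my own assumptions for illustration, not a description of any particular data lake product.

```python
import hashlib
from pathlib import Path


def store_raw(lake_dir, source, payload):
    """Write the payload to the lake exactly as received and return its path.

    The file name is the SHA-256 hash of the content (an assumption made
    for this sketch), so storing the same bytes twice does not create a
    second copy and nothing in the lake is ever rewritten.
    """
    digest = hashlib.sha256(payload).hexdigest()
    target = Path(lake_dir) / source / digest
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        target.write_bytes(payload)  # stored untouched, no scrubbing
    return target
```

Anyone who wants to analyze the data reads a copy from the returned path and cleans that copy. The original bytes stay in the lake for the next investigation.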
5. Prep for storage
To make the data useful (while keeping the original in the lake), it is wise to clean it up. For example, text files can contain a lot of noise, symbols, or whitespace that should be removed. Duplicates and missing values should also be detected so analysis will be more efficient.
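A rough sketch of that cleanup in Python, using only the standard library. The rules here are assumptions for the sake of the example: which symbols count as noise, and treating two records as duplicates when they match after cleaning and lowercasing. Your own data will need its own rules.

```python
import re


def clean_text(raw):
    """Replace stray symbols with spaces and collapse runs of whitespace."""
    # Keep word characters, whitespace, and basic punctuation (assumed rule).
    text = re.sub(r"[^\w\s.,;:!?'-]", " ", raw)
    return re.sub(r"\s+", " ", text).strip()


def prepare(records):
    """Return (cleaned unique records, missing count, duplicate count)."""
    seen = set()
    cleaned = []
    missing = 0
    duplicates = 0
    for raw in records:
        text = clean_text(raw) if raw else ""
        if not text:
            missing += 1  # None, empty, or nothing left after cleaning
            continue
        key = text.lower()  # case-insensitive duplicate check
        if key in seen:
            duplicates += 1
            continue
        seen.add(key)
        cleaned.append(text)
    return cleaned, missing, duplicates
```

For instance, prepare(["  Hello   world!! ", "hello world!!", None]) keeps one "Hello world!!" record and reports one missing value and one duplicate.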
6. Find the useful information amongst the clutter
Semantic analysis and natural language processing techniques can be used to pull out various phrases as well as the relationships around each phrase. For example, mentions of a “location” can be found in speech and categorized in order to establish a caller’s location.
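Real natural language processing goes well beyond this, but the caller-location example can be sketched with a simple keyword lookup in Python. This is a stand-in for the idea, not an NLP technique: the KNOWN_LOCATIONS list is a made-up gazetteer and the transcript is assumed to be plain text already converted from speech.

```python
import re

# Hypothetical gazetteer: lowercase search phrase -> display label.
KNOWN_LOCATIONS = {
    "new york city": "New York City",
    "brooklyn": "Brooklyn",
    "boston": "Boston",
}


def find_locations(transcript):
    """Return known locations in the transcript, in order of first mention."""
    text = transcript.lower()
    hits = []
    for phrase, label in KNOWN_LOCATIONS.items():
        # Whole-word match so "boston" does not match inside another word.
        match = re.search(r"\b" + re.escape(phrase) + r"\b", text)
        if match:
            hits.append((match.start(), label))
    return [label for _, label in sorted(hits)]
```

A proper entity recognition model would also catch places that are not on the list, which is exactly where the heavier tooling earns its keep.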
7. Build relationships
This step takes time, but it’s where the actionable insights lie. By establishing relationships between the various sources, you can build a more structured database that has more layers and complexity (in a good way) than a traditional single-source database.
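One simple way to picture relationship building is linking records from different sources that share a key. The Python sketch below assumes every source is a list of dictionaries and that an email address is the shared key. The function name link_sources and the sample field names are hypothetical.

```python
from collections import defaultdict


def link_sources(sources, key="email"):
    """Merge records from several sources that share the same key value."""
    linked = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            value = record.get(key)
            if not value:
                continue  # a record without the key cannot be linked
            normalized = value.strip().lower()
            for field, field_value in record.items():
                if field != key:
                    # Prefix with the source name so fields never collide.
                    linked[normalized][f"{source_name}.{field}"] = field_value
    return dict(linked)
```

Given a CRM export and a support log that both mention ann@example.com, the result is a single entry holding fields from both, which is the extra "layer" a single-source database would not have.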
8. Employ statistical modeling
Segmenting and classifying the data comes next. Using tools and machine learning algorithms to do the heavy lifting of finding correlations is becoming pretty standard.
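As a small illustration of the correlation part, here is the Pearson correlation coefficient written out in plain Python. It is one common measure among many, chosen here only as an example. The two input lists are assumed to be paired numeric measurements of the same length.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

A value near 1 or -1 points to a strong linear relationship and a value near 0 to little or none. Either way it only flags a candidate worth a closer look, not a cause.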
9. End results matter
The end result of all this work has to be condensed down to a simplified presentation. Ideally, the information can be viewed on a tablet or phone and helps the recipient make smart real-time decisions. They won’t see the prior eight steps of work, but the payoff should be in the accuracy and depth of the data recommendations.
10. The secret sauce
The secret sauce of managing data is the key ingredient on this list. To learn more about the secret sauce, click HERE.
If you thought my post was helpful, please share it on LinkedIn!