As Lightning in a Bot expands its services to include a database of customer information, I have come across the first of many obstacles that every budding data scientist/technologist must face: dirty data.
Data analysis and data visualizations are the end product of an often long and arduous process of organizing data so that it can be processed. Clean, organized data lets our analyses proceed with precision and, hopefully, accuracy. Dirty, unorganized, or misorganized data is a ubiquitous problem in the tech industry and in the big data boom. I am amazed at how quickly the problem reared its head in our process. In fact, it's one of the major reasons we want to hold on to customer data ourselves (so we can clean it for our purposes).
However, ever the optimists, we also see an opportunity here: dirty data is a major obstacle for small business owners who want to start analyzing their own data. Our team hopes to provide data management services alongside the UI/UX-facing side of our product. The more services we take off our customers' hands, the more dependent they will be on our product.
We are fortunate that the data we're consuming as part of our first domain, Shopify, arrives fairly clean. There are a few pimples to pop here and there but otherwise it has been a smooth transition.
It's likely to get a whole lot messier down the road.