Aamar Hussain Explained Why Data Is Essential to Building an Efficient AI/ML System
Aamar Hussain, the Director of Azure Data at Microsoft, attended the Worldwide AI Webinar to explain why data should be the main focus of your next AI/ML project. Here are a few highlights from his talk!
Why should you focus on data to build a more efficient AI/ML system?
According to Aamar, AI/ML systems are made up of code and data, and the emphasis has traditionally been on code. For the last few decades, ML systems have been built by downloading a generic dataset and experimenting with it, with a view to fine-tuning the model. Now the focus is starting to shift towards data as the way to build ML models more systematically, with better performance and accuracy.
Additionally, he stated that data is the lifeblood of any AI/ML system. Supervised forms of ML, especially multi-layer deep learning neural networks, only work when they are fed large volumes of correct data. He also cited a research paper reporting that 80% of the time spent on AI projects goes to preparing data, which shows how labor-intensive this work is. In a nutshell, as Aamar put it: garbage in, garbage out.
However, more data is not always the answer. Large models with 17 to 19 billion parameters require a lot of resources. In June 2019, for instance, researchers from the University of Massachusetts Amherst found that training a large natural language processing model can emit more than 626,000 pounds of carbon dioxide. That's almost five times the amount of carbon dioxide emitted by the average car during its lifetime. The energy costs can also run into millions of dollars.
To deal with this problem, Aamar suggested focusing on the data we already have and using it more efficiently.
He then shared a data-driven conversational AI use case that he and his team worked on. You can watch it on our website and YouTube channel for more details.
Best practices and recommendations
Having observed how thought leaders in various industries work, Aamar has a few recommendations for any team or enterprise that is about to build an AI/ML model.
He recommended starting with framing the problem and defining the scope of the project. Consider answering the following questions:
What is your business problem?
What area are you looking to address?
Is it best solved with AI, or could reporting or visualization do the job?
Then, you need to gather data. What type of data do you need? What shape do you want your data to be in? Do you have the data pipeline set up?
Once you have the data in place, you can train the model. This is an iterative cycle of training, analyzing errors, and constantly improving.
Finally, you can deploy the model to production and keep monitoring and maintaining the system as you go on your AI/ML journey.
Aamar also recommended a few more tips for making the most of your data:
Enhancing and augmenting your data
Standardizing and formatting data
Dealing with duplicates and outliers
De-biasing and anonymizing
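To make some of these tips concrete, here is a minimal Python sketch of what standardizing, de-duplicating, outlier filtering, and anonymizing might look like on a small set of records. It is not from Aamar's talk: the field names (`email`, `country`, `amount`), the sample rows, the salt, and the choice of Tukey's IQR rule for outliers are all assumptions made for illustration. Note too that a salted hash is pseudonymization rather than full anonymization, and that augmentation and de-biasing are not covered here.

```python
import hashlib
import statistics


def standardize(record):
    """Trim whitespace, normalize letter case, and cast the amount to float."""
    return {
        "email": record["email"].strip().lower(),
        "country": record["country"].strip().upper(),
        "amount": float(record["amount"]),
    }


def drop_duplicates(records):
    """Keep only the first occurrence of each identical record."""
    seen = set()
    unique = []
    for r in records:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def drop_outliers(records, field="amount", k=1.5):
    """Drop records outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    values = [r[field] for r in records]
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r for r in records if low <= r[field] <= high]


def pseudonymize(record, salt="demo-salt"):
    """Replace the email with a truncated, salted SHA-256 digest."""
    digest = hashlib.sha256((salt + record["email"]).encode("utf-8")).hexdigest()
    out = dict(record)
    out["email"] = digest[:16]
    return out


def clean(raw_records):
    """Run the steps in order: standardize, de-duplicate, filter, pseudonymize."""
    records = [standardize(r) for r in raw_records]
    records = drop_duplicates(records)
    records = drop_outliers(records)
    return [pseudonymize(r) for r in records]


# Hypothetical sample data, invented for this example.
raw = [
    {"email": " Ann@Example.com ", "country": "us", "amount": "10"},
    {"email": "ann@example.com", "country": "US", "amount": "10.0"},  # duplicate once standardized
    {"email": "bob@example.com", "country": "gb", "amount": "12"},
    {"email": "cy@example.com", "country": "de", "amount": "11"},
    {"email": "dee@example.com", "country": "fr", "amount": "13"},
    {"email": "eve@example.com", "country": "us", "amount": "9"},
    {"email": "fay@example.com", "country": "us", "amount": "5000"},  # obvious outlier
]

cleaned = clean(raw)
for row in cleaned:
    print(row)
```

Standardizing first matters in this sketch: the two "Ann" rows only become identical, and therefore removable as duplicates, after whitespace and casing have been normalized.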
In conclusion, having an effective data strategy is crucial for any business to thrive and successfully capitalize on the potential of AI/ML.