A shift from rule-based systems, where a given input always produces the same output, towards ML systems. ML systems don't require hand-written rules: they are probabilistic and learn patterns from data, such as public datasets.
AI in general is any system built to assist human effort. It covers everything from basic calculators and rule-based systems to self-driving cars and robotic automation.
Machine Learning - Enables computers to learn patterns from data through various algorithms, such as decision trees, support vector machines and clustering. Uses include spam detection, fraud detection and predictive maintenance. Ideal for simple predictive analysis, such as “will this person buy this?”
Deep Learning - A subset of machine learning that uses multi-layer neural networks to learn representations of data across multiple levels of abstraction. Suited to large-scale systems that require precise predictions.
Inferencing
The process in which a trained AI system makes predictions or classifications based on new input.
Real-time inferencing - chatbots, emails, autonomous driving systems. Amazon SageMaker provides endpoints for deploying models that perform real-time inferencing.
Batch inferencing - sentiment analysis and other workloads where real-time results aren't required. Amazon SageMaker batch transform jobs apply a model to datasets stored in S3.
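The difference between the two modes can be sketched in plain Python. The `predict_sentiment` function is a hypothetical stand-in for a trained model and the word lists are made up; none of this is a SageMaker API. Roughly, the first pattern corresponds to calling a deployed endpoint and the second to running a batch transform job over files in S3.

```python
# Toy stand-in for a trained model (hypothetical; a real model is learned from data).
POSITIVE_WORDS = {"great", "good", "love"}
NEGATIVE_WORDS = {"bad", "poor", "hate"}


def predict_sentiment(text: str) -> str:
    """Classify one input as positive, negative, or neutral."""
    words = text.lower().split()
    score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def real_time_inference(request: str) -> str:
    """Real-time: one request in, one prediction out, answered immediately."""
    return predict_sentiment(request)


def batch_inference(dataset: list[str]) -> list[str]:
    """Batch: score a whole stored dataset in one job; per-item latency is not critical."""
    return [predict_sentiment(row) for row in dataset]


if __name__ == "__main__":
    print(real_time_inference("I love this product"))
    print(batch_inference(["great service", "bad packaging", "arrived on time"]))
```

The model is the same in both cases; only the way inputs arrive and results are returned changes.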
Data Types in AI
Numerical
Numerical data, usually integers or floating-point numbers, is used mostly in machine learning models where regression analysis is performed to make a prediction. Amazon SageMaker provides integrations with S3 and Redshift for ingesting and processing this data.
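As a minimal sketch of regression on numerical data, the snippet below fits a straight line with ordinary least squares and uses it to predict a new value. The floor-area and price figures are made up for illustration, and `fit_line` is a hypothetical helper, not part of any AWS service.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up numerical data: floor area (square metres) and price (thousands).
areas = [50.0, 60.0, 70.0, 80.0]
prices = [150.0, 180.0, 210.0, 240.0]

slope, intercept = fit_line(areas, prices)
predicted_price = slope * 90.0 + intercept  # prediction for an unseen 90 m^2 home

if __name__ == "__main__":
    print(slope, intercept, predicted_price)
```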
Categorical
Techniques such as one-hot encoding and label encoding transform categorical data into numerical form, since most AI models cannot handle raw categories themselves.
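Both encodings can be sketched in a few lines of plain Python. The helper names and the colour values are illustrative assumptions; in practice a library would normally do this.

```python
def label_encode(values):
    """Map each category to an integer, using the sorted category names as the order."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    return [index[v] for v in values], categories


def one_hot_encode(values):
    """Map each category to a 0/1 vector with a single 1 at that category's position."""
    labels, categories = label_encode(values)
    return [[1 if i == label else 0 for i in range(len(categories))] for label in labels]


colours = ["red", "green", "blue", "green"]
labels, categories = label_encode(colours)
vectors = one_hot_encode(colours)

if __name__ == "__main__":
    print(categories)  # ['blue', 'green', 'red']
    print(labels)      # [2, 1, 0, 1]
    print(vectors)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Label encoding is compact but implies an ordering between categories that may not exist. One-hot encoding avoids that at the cost of one column per category.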
Unstructured (Text, Images)
Unstructured data lacks a predefined data model, so it often requires a lot of complex pattern matching to convert into a form a model can use.
High-quality pre-processed data produces better model performance.
Text such as books and conversations needs to be pre-processed with techniques like tokenisation and stop-word removal, which prepare text data for model training, so that it can be fed to NLP models for tasks such as sentiment analysis.
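A minimal sketch of tokenisation and stop-word removal, using only the standard library. The stop-word list here is a small made-up sample; real lists are much longer.

```python
import re

# A small illustrative stop-word list (assumption, not a standard list).
STOP_WORDS = {"the", "a", "an", "is", "and", "of", "to", "it", "was"}


def tokenise(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens):
    """Drop very common words that carry little meaning on their own."""
    return [t for t in tokens if t not in STOP_WORDS]


tokens = tokenise("The book was a joy to read, and the ending is great!")
filtered = remove_stop_words(tokens)

if __name__ == "__main__":
    print(filtered)  # ['book', 'joy', 'read', 'ending', 'great']
```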
For image data, pre-processing techniques such as normalisation and augmentation prepare the data for computer vision models. Amazon Rekognition is the AWS service for working with image data.
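Image normalisation and augmentation can be illustrated on a tiny greyscale image stored as nested lists. This is a sketch under the assumption of 8-bit pixel values; the helper names are hypothetical and real pipelines would use an image library.

```python
def normalise(image):
    """Scale 8-bit pixel values (0-255) into the range 0.0-1.0."""
    return [[pixel / 255 for pixel in row] for row in image]


def flip_horizontal(image):
    """Augmentation: mirror the image left to right to create an extra training sample."""
    return [list(reversed(row)) for row in image]


# A made-up 2x3 greyscale image.
image = [
    [0, 51, 255],
    [102, 204, 0],
]

normalised = normalise(image)
flipped = flip_horizontal(image)

if __name__ == "__main__":
    print(normalised)
    print(flipped)  # [[255, 51, 0], [0, 204, 102]]
```

Normalisation keeps all inputs on the same scale, while augmentation enlarges the training set with plausible variations of existing images.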
Audio data differs from text and image data, which are comparatively uniform: audio carries varying degrees of ambiguity. Amazon Transcribe is the AWS service that converts speech in audio data to text.
Data Preprocessing
Clean, structured data leads to accurate and performant models, so raw data must go through steps such as cleaning, encoding and scaling.
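Cleaning and scaling can be sketched as follows. Filling gaps with the mean and min-max scaling are just two common choices, assumed here for illustration; the `ages` values are made up.

```python
def fill_missing(values):
    """Cleaning: replace missing values (None) with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def min_max_scale(values):
    """Scaling: map values linearly onto the range 0.0-1.0."""
    low, high = min(values), max(values)
    if high == low:
        # All values identical: avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


ages = [20, None, 40, 60]
cleaned = fill_missing(ages)
scaled = min_max_scale(cleaned)

if __name__ == "__main__":
    print(cleaned)  # [20, 40.0, 40, 60]
    print(scaled)   # [0.0, 0.5, 0.5, 1.0]
```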
Labelled vs Unlabelled
Supervised learning uses labelled data for training and is great for tasks like classification.
Unsupervised learning uses unlabelled data for training.
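The contrast can be sketched with two toy algorithms: a nearest-neighbour classifier that needs labels, and a two-cluster k-means that finds groups without any. Both are simplified stand-ins chosen for illustration, working on made-up one-dimensional data.

```python
def nearest_neighbour_predict(train, value):
    """Supervised: train is a list of (feature, label) pairs; return the closest feature's label."""
    _, label = min(train, key=lambda pair: abs(pair[0] - value))
    return label


def two_means(values, iterations=10):
    """Unsupervised: split unlabelled numbers into two clusters (k-means with k=2)."""
    centres = [min(values), max(values)]
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for v in values:
            nearest = 0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1
            clusters[nearest].append(v)
        # Move each centre to the mean of its cluster; keep the old centre if the cluster is empty.
        centres = [sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)]
    return clusters


labelled = [(1.0, "small"), (2.0, "small"), (9.0, "large"), (10.0, "large")]
unlabelled = [1.0, 2.0, 9.0, 10.0]

prediction = nearest_neighbour_predict(labelled, 8.0)
groups = two_means(unlabelled)

if __name__ == "__main__":
    print(prediction)  # large
    print(groups)      # [[1.0, 2.0], [9.0, 10.0]]
```

The supervised model returns a named class because it was given names to learn from. The unsupervised one only returns groups, and it is left to a person to decide what each group means.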
Time series data
Data points recorded over time.
Tools like Amazon Forecast apply forecasting algorithms to time series data to make predictions.
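To make the idea concrete, here is a minimal sketch of one of the simplest forecasting methods, a moving average. It is an illustration only: it is not the Amazon Forecast API, the service's own algorithms are more sophisticated, and the sales figures are made up.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("series is shorter than the window")
    recent = series[-window:]
    return sum(recent) / window


# Made-up daily sales, oldest first.
daily_sales = [100, 120, 110, 130, 150]
next_day = moving_average_forecast(daily_sales)

if __name__ == "__main__":
    print(next_day)  # 130.0
```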