The main purpose of machine learning is to produce a meaningful result and take action based on incoming data. The more varied the information we have, the easier it is to achieve a successful result. To do machine learning, we therefore need three basic components: data, features, and algorithms.
Data
Do you want to detect spam emails? Then you will need samples of spam messages.
Want to predict a stock price? You will need data with price histories.
Would you like to recommend the best product based on users’ preferences? Then you will need to collect their activity and shares on social media.
As you can see, each problem needs a different data set, and the more varied the data, the more successful our result tends to be.
There are two ways to obtain this data. The first is to collect it manually, through surveys or forms that we create. This method gives us a much cleaner data set with fewer errors, but it is a rather lengthy and costly process.
The second is to collect it automatically, for example from Google, Twitter, or other social platforms. This is a less costly and fairly easy way, although the resulting data is usually not as clean.
Data is so valuable that many companies can afford to share their algorithms but never their data. Strict privacy procedures usually govern access to any data, even within the company itself.
Features – Variables
Features are also referred to as parameters or variables. When building a machine learning model, they are the variables that the model looks at and learns from.
Examples of features in data sets include a customer’s age and gender, a vehicle’s mileage, the price of the last vehicle purchased, the frequency of a word in a text, and the price of a stock.
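As a minimal sketch of one feature from the list above, the frequency of a word in a text can be computed with the Python standard library. The function name `word_frequency`, the sample message, and the feature names are assumptions made for illustration, and real projects often use a dedicated text-processing library instead.

```python
import re
from collections import Counter


def word_frequency(text: str, word: str) -> float:
    """Return the share of tokens in `text` that equal `word` (case-insensitive)."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        return 0.0
    return Counter(tokens)[word.lower()] / len(tokens)


# Hypothetical message; each dictionary entry is one feature of that message.
message = "Free prize! Claim your free prize now"
features = {
    "freq_free": word_frequency(message, "free"),
    "freq_prize": word_frequency(message, "prize"),
    "num_words": len(message.split()),
}
print(features)
```

Each message becomes a small set of numbers like this, and those numbers are what a model would actually learn from.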
If our data is kept in database tables, our job is quite easy: the column names are our features.
But if we have a data set consisting of 200 GB of dog pictures with no defined properties, it is very difficult to store them and to decide what the features should be. We cannot treat every pixel as a feature, so the machine learning method makes this decision itself.
In such cases human choices tend to be subjective, so algorithms are more successful at making this decision.
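For the tabular case, the idea that column names are the features can be sketched as follows. The table contents, the column names, and the choice of `bought_again` as the value to predict are all made up for illustration.

```python
import csv
import io

# Hypothetical customer table; every column name and value here is invented.
raw = """age,gender,vehicle_mileage,last_vehicle_price,bought_again
34,F,42000,18500,1
51,M,120000,9200,0
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Assumption: the last column is what we want to predict (the label);
# every other column name is a feature.
label_name = "bought_again"
feature_names = [name for name in rows[0] if name != label_name]

X = [[row[name] for name in feature_names] for row in rows]  # feature values, still strings
y = [int(row[label_name]) for row in rows]                   # labels

print(feature_names)
print(X)
print(y)
```

With images there is no such header row to read the features from, which is why that case is so much harder.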
Algorithms
Each problem calls for a different machine learning approach, so different methods are used. The algorithm we choose affects the sensitivity, results, size, and performance of the model. Another important point is that your data must be correct. If your data is dirty and useless, even a perfect algorithm will not work.
This is a case of garbage in, garbage out. So don’t pay attention only to the accuracy of the model; make sure that the data is large and clean.
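A minimal sketch of what cleaning can mean in practice, assuming that "dirty" here means missing values and duplicated rows. The `clean` helper and the sample records are hypothetical, and real data usually needs more checks than these two.

```python
def clean(records, required):
    """Drop records with a missing required field, then drop records
    whose required fields duplicate an earlier record."""
    seen = set()
    cleaned = []
    for record in records:
        # Treat None and the empty string as missing.
        if any(record.get(field) in (None, "") for field in required):
            continue
        key = tuple(record[field] for field in required)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned


# Hypothetical labeled messages with typical problems.
records = [
    {"text": "Win a free prize", "label": "spam"},
    {"text": "Win a free prize", "label": "spam"},  # duplicate
    {"text": "", "label": "ham"},                   # missing text
    {"text": "Lunch at noon?", "label": None},      # missing label
    {"text": "Lunch at noon?", "label": "ham"},
]

cleaned = clean(records, required=("text", "label"))
print(len(records), "->", len(cleaned))
```

Only the rows that survive this kind of filtering should reach the algorithm.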