Data mining is a much-expanded area, we will first learn a little about its theoretical stuff and then review an algorithm almost elaborately.
There are nine algorithms, we will not have time to go through them all.
Truth is, this is a special book with 700-800 pages, and each algorithm must be reviewed with a case study.
Instead of learning each algorithm incompletely, we are going to learn the A to Z of one algorithm.
The algorithm we have chosen is called the association rule. We will talk about the algorithm later.
Let’s learn some basic stuff about data mining first and see what it is.
First of all, let’s learn about the definition of data mining.
What is Data Mining?
The process of analyzing data to extract and discover some hidden patterns from it is called data mining.
Features of a Data Mining Projects
What characteristics should a project have so that we can say we have performed data mining?
First of all, the pattern is something that is not obvious.
You run the mining algorithm across the data, it wanders around and extracts some stuff, then, it tells you about the combinations in your data. You may not even believe that such combinations have been there!
Data mining is sometimes called knowledge discovery in the database too, but not necessarily always.
But what does that mean?
It means that this is a matter of discovering knowledge from data.
When it comes to data mining, we are generally talking about the future.
With OLAP, we would be talking about the past and present at most, but with mining, we could talk about the future too.
So, in addition to showing us a bunch of useful hidden patterns and extracting them from the data, mining helps us predict the future too.
This is where we learn that by extracting some patterns from our data, data mining could give us some reports about the current condition and extract patterns that humans practically cannot, and it can also predict the future.
Why did I just say humans cannot extract such patterns?
Because there is too much data and finding relationships within all these data is practically impossible for humans.
What Is Not Data Mining?
alright, so mining is not what you see here,
No, OLAP is not the same as mining.
Many mining projects are indeed created in the form of a multidimensional OLAP, but an OLAP database does not mean mining.
I’ve seen many projects where the company has developed an OLAP database, has one fact, four dimensions, and two dashboards, claiming it has performed data mining.
this is not data mining.
Event Notification Is Not Data Mining Either.
Suppose you have a system that gives you alarms and warnings. that’s not data mining either, although you could use and get help from mining and use it in such projects.
Remember that, alright?
So, you’ll soon learn what mining is and how you should implement it in the SQL server.
Data Mining Tasks, Goals, and Applications
With a glance at the responsibilities of data mining, we could understand that it has two main tasks both of which are carried out through extracting patterns from the heart of our data:
One Is to Inform Us on The Existing Condition
And The Other Is Talking About The Future.
Data Mining Tasks
These are not algorithms.
They are a series of tasks data mining could do.
For example, it can do classification, clustering, association, regression, forecasting, or sequence analysis for you or carry out a deviation analysis.
Each of these tasks could be performed with specific algorithms.
Data Mining Applications
Let’s move on to see what applications data mining has.
Data mining has many applications; I have mentioned some of them here.
For example, you study the packets of a network in computer sciences to discover if there will be an attack in the future based on the packets’ behavior.
It could have many applications in medical sciences too.
Much professional medical software used by the experts of this field includes mining.
For example, they say, “We have mixed this medicine with that one; we want to know what the result is.”
It has got so many applications in business too.
For example, you could generate a recommendation system. This system recommends what to do or suggests you buy specific items.
It could recommend that the seller put specific items in clients’ eyeshots to buy more than they meant to.
You could get more money flowing into your organization without changing your production line.
You could send a targeted advertisement too.
You could discover insurance fraud; most insurance companies are doing it nowadays.
Mining could answer all these questions as a result.
Data Mining Algorithms in SSAS
There are 9 algorithms:
Microsoft Association Rules
Microsoft Decision Trees
Microsoft Linear Regression
Microsoft Logistic Regression
Microsoft Naive Bayes
Microsoft Neural Network
Microsoft Sequence Clustering
Microsoft Time Series
What Is an Algorithm?
You may have a key question now.
You may be asking yourself: “Well, what is an algorithm?”
you might’ve heard the names of various algorithms such as the neural network, SVM, or the decision tree.
well, what are these?
The short answer is that these algorithms are in our command to implement the two main tasks we just talked about.
For instance, there are a couple of algorithms that can only do the first data mining task which is to inform us about the existing condition.
Some other algorithms perform the second data mining task which is analyzing data to inform us about the future.
Is There an Algorithm That Can Do Both?
If it were me, I would have another question too: “Is there an algorithm that can do both?”
the answer is yes.
Several algorithms could tell you about the current situation and the future.
Well done to you!
You must be telling yourself: “When it comes to my projects, I should first determine the type of the project.
Am I gonna talk about the present or the future?”
Then, you’ll probably see that you have basically filtered the algorithms and several algorithms are automatically ignored. For example, when I say I wanna use the association rule, it means that I have already determined the type of my project.
I know that I wanna perform a market basket analysis and have chosen this algorithm accordingly.
Now, a smarter question: “After I select the type of project, is there only one algorithm that could do the work?”
The answer is no.
if the answer was yes, it would mean that there are only two algorithms each used for either of the two data mining goals, one to know the status quo and the other to predict, but that is not the case.
We have many algorithms to predict such as the neural network, the decision tree, etc.
Well, What Is The Solution Here?
If you wanna act professionally, it might be better to do the work using three algorithms, for instance, check the accuracy of the algorithms and a few other parameters, and then decide that one specific algorithm better suits your specific project (the neural algorithm, for example) and eliminate the other algorithms.
It just got more interesting!
If you have paid attention to my words, you must have noticed that I said: “Your specific project”. This means you examined the three algorithms using the data of a specific project and finally realized that the neural network algorithm, for example, has higher accuracy and better suits this project.
Now, suppose another project is handed over to you which also aims to predict, could you say that the neural network algorithm is better for this project too without experimenting with other algorithms?
Should you just start doing the work using the neural network algorithm?
The answer is no.
each project is different from the others in nature and you must go through the process for this new project again, which means you should do the evaluation again.
You might eventually conclude that the decision tree, for instance, suits this new project best.
Data Mining Goals
So, if we wanna talk a little more scientifically and sum up this discussion, there are two goals in mining projects.
we will practice achieving these two goals together in the article we’re about to get into, and you’ll practice these two goals with me.
What Is The Descriptive Goal For?
It is to create a set of patterns for description and interpretation.
We have so much data, so much data has been gathered. The algorithm extracts a series of patterns from these data, analyzes them, describes them, and tells you that these conditions exist in your data.
It’s all about discovering the relationship between data, which means it informs you about the current condition.
You might be telling yourself “I could understand the current condition myself by looking at the data”, but that isn’t really the case.
When we talk about data mining, it means that practically, there is plenty of data. These data have numerous features.
We are only humans and can work with a limited amount of data given the statistical tools we have.
When the number of data gets too large, it will be too hard or even practically impossible to analyze the data using the available statistical tools, so we have no choice but to use data mining.
What Is The Predictive Goal For?
There is also another model called the predictive goal which is one of the main goals of data mining.
it’s talking about the future. it predicts.
Well, it predicts some series of unknowns for you.
You say: “I’ll give you some input and you tell me what happens in the future”.
It predicts some unknowns.
It could predict the revenue of a company in the next month, for example.
you give it the revenue of the last ten years divided by months and it predicts how much the company is gonna make next month.
So you see, you have invested in something that keeps on giving.
If you master this science and learn to use one of the professional tools in this field (such as SQL Server) practically, which you’ll definitely learn through this article you’ll see that you have mastered the required expertise of today’s businesses.
Any business manager you talk to would enjoy it if you could do this stuff for them.
I say this based on my experience and things I have seen in many projects working with many managers. I have seen this enthusiasm in all managers, and the enthusiasm would multiply when they saw the practical outcome of the analyses.
The thing that happens next is that almost all these managers would be willing to work with you again after the project ends and you’ll do the supporting tasks of the project for the long term.
In this section, I’ve tried to answer some of your key questions and concerns.
You may get these questions and answers after years of experience, so try to answer these questions with great care in all projects before you start doing the job, and try to treat the project like a professional.
What Sciences Does Data Mining Consist Of?
There is a point here guys.
Mining is a combination of different sciences:
AI (Artificial Intelligence),
And Machine Learning.
All these sciences together form the science of data mining.
Do I Need to Master All This Knowledge?
You may ask: “Do I need to master all this knowledge?”
the answer is no.
you should just know enough about mining to get you going.
It means you don’t need to study like a postgrad student. Imagine having to sit and calculate the association rule on paper and extract all those formulas…
you don’t need to do all this, you should just know the basics.
The SQL server has implemented almost all the important algorithms.
Feed It Data,
Get The Results,
And That Is All.
If you learn just that much about each algorithm and know how to treat it, it gets your work done in many cases.
But if you wanna become a professional and be a data scientist, you could invest in machine learning.
you could have a good start with Python and machine learning.
But by the end of this article, you’ll realize that you don’t need many tools and you can implement many projects using these simple ones.
Build a Data Mining Industrial Project in SSAS
Alright, now let’s see how we can build a data mining project.
What Is The Name of The Project?
Our example here is a Market Basket Analysis.
it is one of the most common examples in the business.
What Is The Name of The Algorithm?
Our algorithm is the Microsoft Association Rule.
So the type of my project and the problem are clear, I know what I’m gonna do and I’ve selected the algorithm accordingly.
This algorithm could do my predictions using market basket analysis.
What Is a Market Basket Analysis?
it can tell me what items customers buy together, what items are sold together.
You can get these problem statements by using the market basket analysis.
You can identify the patterns in this customer’s purchases.
This lady that picks up the shopping basket and is gonna buy milk isn’t gonna buy carrots.
there’s a shopping pattern.
The one who buys this, for example, will buy coffee too. or something else.
for every item that goes into the shopping basket, something happens.
you can determine the arrangement of these items using an algorithm.
Arrange this shelf in a way that makes this lady get more items in her basket and spend more money.
In this way, someone who originally meant to buy three items may realize he’s picked five instead, five items that he needed.
The seller makes a profit and the customer will be happy to have found all he needed in one place.
What Is The Microsoft Association Rules Algorithm?
the most suitable algorithm is the association rule.
This algorithm aims to identify rules that happen together, like items that are bought together.
It identifies the patterns between them and extracts the rules governing the structure of the data, giving the client a useful output based on which he can make rational decisions.
Just remember that this algorithm cannot use continuous variables. It has a particular nature that does not work with continuous variables.
One thing that this algorithm can do for you is that if you have milk in your shopping cart and other ingredients for baking a cake. it can suggest you buy cream too.
These suggestions will be in your algorithm in the shape of patterns.
As a pattern, it says that people who have bought these items have bought this too with this percent of possibility, so there is this much possibility that this item will be purchased too.
These statements suggested by the algorithm are called “Item Set”.
Steps to Perform a Data Mining Project in SSAS
To create a data mining project in SSAS, you must perform the following steps in order:
Step 01: Data Preparation (OLTP or DW)
Step 02: Create Data Source
Step 03: Create Data Source View
Step 04: Create Mining Structure
Step 05: Deploy
Step 01:Data preparation (OLTP or DW)
The example we have here is called the data mining association rule.
Now, the thing I want to create has a database I have given you.
There is a backup file you should import into the SQL Server.
To import (Restore) this database, you must perform the following steps in order:
Step 01: Right-Click on the DatabaseOption (in Object Explorer)
Step 02: Select the “Restore Database” option from the options given to you.
Step 03: The next thing you should do is select “Device” from this window you see here.
Step 04: Click on the button with three dots on it.
Step 05: Click on the “Add” button in the window that opens.
Step 06: After you click on “Add”, all you have to do to import it is click on your backup file in the window that opens.
As I mentioned before, our database is called the data mining association rule.
Step 07:We just have to click on it and click Ok.
Do it with me so we can go forward together. All you have to do is importing this database into the SQL Server just like I did.
These are its tables:
This order table is our master table.
The order number and the customer key, it tells me what specific customer this purchase is for
and where his geographical location is,
What income he has,
and whether it is high or low.
We figured out
what database we are going to use,
what tables it has,
what fields there are in each table,
and what every field is good for.
Obviously, these two stages will not be as easy in big projects.
Sometimes you will have to discuss the data and the relationships between them for weeks to get a good understanding of the business and its data.
In the next stage, we will go to the Visual Studio environment and continue the rest of the work there.
If you want to know more about SSMS, it is recommended that you read the following article:
Now, we want to create the input of a mining structure for our algorithm.
The first step in this environment is determining the data source.
So the first thing you do is to determine where your data comes from. You have to tell the algorithm where to get its data from.
To do so, we should define the “Datamining_AssociationRule” database we prepared before as the data source.
This is the same database we just talked about in the SQL Server environment.
Here, in the Visual Studio environment, we should also determine what data to build a model on.
so we determine the data source at first.
Step 01:To create a data source, we Right-Click on “data sources” and select “new data source”.
Step 02:Click on the Next button.
Step 03:In the data source wizard window that opens, we select the second option and click on the New button.
Step 04:The connection manager opens.
This is where you should determine the name of your server so that it connects to it. I select my server here.
at the bottom, I can see the list of its databases. I select the datamining_AssociationRule database. Here, we say that our data source is this specific database.
The next thing to do is to click on “test connection” to make sure there is a good connection between the server and data source.
If everything’s fine and there is a stable connection, I click on “ok”.
Step 05:Click on the Next button.
Step 06:The next step is to select the “use this service account” option and click on the Next.
Step 07: we define a name for our project and click on “finish”.
So far, we determined the data source and selected a name for it, which is “data mining association rule”.
You could choose any name you like. We will need this name in the next step.
If you did everything right, you should see the following output:
Step 03:Create Data Source View in Visual Studio
The second step is to create a DVS.
A DVS is a graphical demonstration of the existing objects.
In fact, it demonstrates the data source you already defined in a graphical form.
There, you can see the tables of your database and the relationships between them graphically.
So at this stage, we create a DVS.
Step 01:To create a DVS, we right-click on “Data Source Views” and select the first option which is “New Data Source View“.
Step 02:Then, the data source view wizard opens where you just have to click “Next”.
Step 03:In the next window, you should introduce the data source you just created to the DVS.
If you remember, I chose the same name for the data source and the database, which is the “data mining association rule”.
At this step, we select this data source (Data Mining Association Rule) and click on “Next.”
Of course, if you haven’t already created and defined the data source, you could do it in this window too.
You have to click on “New Data Source.”
The data source wizard window opens to determine and create a data source.
Step 04:In the next window, we should select the views we need.
We have two tables in this database, and we need both. So we should add both to the projects.
To do this, all we should do is select the two tables (OrderDetailsand OrderInfo) and click on the “>>” button.
You see that both tables are transferred to the right (3).
Then, we click on “Next.”
Step 05:In the end, we determine a name for our DVS, I am going to use the name “Data Mining Association Rrule” for the DVS too and hit “Finish“.
Final Result:My DVS is now created and the DVS is nothing but a graphic demonstration of the existing objects. So, this is it.
This step was the second step in the Visual Studio environment.
And now, let’s go to the third step in this tool.
Up to here, we created both the data source and the DVS.
Step 04:Create Mining Structure in Visual Studio
The third step we should do in the Visual Studio environment is build our mining structure. This step is the main work and where we determine the algorithm and its inputs and outputs.
Just like the last step, we should move along with the process.
Step 01:We right-click on “Mining Structure” in the “Solution Explorer” window and select “New Mining Structure.”
Step 02:Alright, the data mining wizard window opens now, where we should click “Next“.
Step 03:In the next window, we should select the method for mining structure which can be from the relational database, data warehouse, or cube.
I will select the first option (Relational Database) and click “Next.”
Step 04:Now, we should select the algorithm we are going to use in the project, which is the Microsoft association rule.
I select the algorithm (Microsoft Association Rule) and click “Next.”
Step 05:Now, it asks us where this association rule algorithm should get its data from. From this DVS (Data Mining Association Rule) you have already created. Fanally, I click “Next.”
Now, you better understand why we did in the previous steps.
Step 06:Now that you have determined your DVS, you should determine which table is the case and the nested one. You should know why we need the case and nested case tables if you have paid attention initially.
You probably guessed that we should consider the OrderInfotable as the caseand the OrderDetailstable as the nested case.
I click “Next.”
We go forward and continue the work.
Step 07:in the next window, you see the tables and their fields and you should determine and arrange the columns that are going to be used in your analysis. At this stage, you should determine three things:
The table keys,
What fields are going to be considered as the algorithm’s input,
And finally, what field is your output based on which the prediction should be made
Here, we consider the two fields of “Model” and “Order Number” as our table keys.
What do I want as the algorithm’s input?
I ask it to consider “IncomeGroup” as the input in the patterns it creates. I hit “Input” here, and it selects it on the other side as well.
Then, I ask it to consider the purchased items as nested key in the nested table since they are the key to our analysis. If you don’t understand this part, make sure to rewind and reread it.
I determined the nested key here, and I asked it to predict the outputs I want: Predict the items a customer is going buy or give me the new patterns forming in my items as the outputs. You select these.
Now, I hit “Next”.
Step 08:It tells me which variable is discrete and which one is key. It figures itself out to some extent, but you could correct its mistakes or change content types.
I hit “Next” here.
Step 09:in this window, i have to determine the percentages for “train” and “test”.
Now, we accept these percentages.
Step 10:If you wanna have the “drill through” capability to get the details of data, you can determine it here.
You choose a name for it too. I will select the same name I’ve been using and hit “finish” so the model is built and the mining structure is built.
Final Result: Now, the model is built and the mining structure is built.