What do we aim to do with Data Science?

The “What is data science?” series consists of 4 main parts, and this article is the third in the series.

Photo by https://d-aim.com/consulting-data-science/

Note: I will often use the following abbreviations below:

  • Data Science — DS
  • Artificial Intelligence — AI
  • Machine Learning — ML
  • Big Data — BD
  • Deep Learning — DL

Let’s talk about the aim of DS:

  • Making sense of the past with data!

When describing what DS aims at, I said that we want to extract useful information about the past from data. Let’s unpack this a bit and look at how we can find useful historical information in data.

For example, suppose your company gives you data on past sales and asks you to study how exchange rate movements affect sales. Roughly, the path you will follow is:

Photo by https://www.msci.com/www/blog-posts/what-fed-monetary-policy-has/01244264353
  1. You will examine the sales data and try to fix the problems you identify in it.
  2. You will gather the data that you think will be useful for your analysis. The exchange rate affects sales volume through several channels:
     i) It affects your sales because it affects people’s purchasing power.
     ii) It affects demand for your products because imported inputs pass exchange rate changes through to your prices.
     iii) It affects the use of credit because it affects interest rates, which in turn affects demand for your products, and so on.
     To control for these channels, you should include variables such as Gross National Income, consumer loan interest rates and the exchange rate pass-through of your products, and prepare the relevant data. Note that taking all these effects into account requires not only statistical knowledge but also knowledge of your domain.
  3. After the steps above, you apply a classic “data analysis workflow” and arrive at findings on the effect of exchange rates on your sales volume.
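
To make the idea of “controlling for” other channels concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the data is synthetic, the variable names (fx_rate, income, loan_rate, sales) and the coefficients used to generate the data are invented, and ordinary least squares is just one possible choice of method. It compares a naive regression of sales on the exchange rate alone with one that also includes the control variables.

```python
import random

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(columns, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via normal equations."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Synthetic data, invented for illustration only.
rng = random.Random(0)
n = 200
fx_rate = [rng.uniform(5, 10) for _ in range(n)]
income = [rng.uniform(40, 60) for _ in range(n)]
# Loan interest moves with the exchange rate, so it confounds the naive estimate.
loan_rate = [10 + 1.5 * fx + rng.uniform(-2, 2) for fx in fx_rate]
# Assumed "true" direct effect of the exchange rate on sales: -2.0 per unit.
sales = [100 - 2.0 * fx + 0.5 * inc - 3.0 * lr + rng.gauss(0, 0.5)
         for fx, inc, lr in zip(fx_rate, income, loan_rate)]

naive = ols([fx_rate], sales)
controlled = ols([fx_rate, income, loan_rate], sales)
print(f"exchange rate effect, no controls:   {naive[1]:.2f}")
print(f"exchange rate effect, with controls: {controlled[1]:.2f}")
```

In this toy setup the naive estimate mixes the direct effect with the interest rate channel, while the regression with controls lands close to the direct effect that was built into the data. With real sales data the true effect is unknown, which is exactly why domain knowledge is needed to decide what to control for.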

This finding is invaluable to your company. Your work made sense of the past, but it also provides one of the important building blocks for formulating the exchange rate strategy your company will follow in the future.

The example above is one in which studying the past aims to illuminate the future. There is also work that is done just to understand the past, for example measuring the impact of the development of Atlantic trade in the 15th century on Mediterranean trade.

  • Living the Moment with Data

Let’s turn to the present. You know the saying “living the moment”. Enriching the moment we live in with information matters, because the most important thing we can control about the future is “now”. So how can data help us? Let’s start with a story from Rumi:

Photo by https://medium.com/@mervebdurna/what-is-cross-validation-how-does-it-work-1494774519e4

One day a man builds a house and makes a deal with its walls: before the walls come down, they will warn the owner so that he and his family are not harmed. Years pass, and one day the walls suddenly fall. The man cries out to the crumbling walls in tears and anger:

  • Didn’t we have a deal?
  • Weren’t you going to let me know?

A voice is heard from the walls:

  • We tried to tell you. Every time we opened our mouths to say something, you came with mud and covered them.

Let’s adapt this story to the present. You have a factory where you produce high value-added products with hundreds of machines tied together by countless integrations and automations. Some of these machines are so critical that if they stop working, your entire production process comes to a halt. If one day these machines fail and production stops, who will you be angry with?

The phenomenon known as the Internet of Things is now very common. All these machines actually generate data about themselves and their work every second. If you can distill information from this data and detect anomalies in your machines, you gain time to take precautions. Roughly, here is what you will do:

  1. You will set up an infrastructure to receive and store the data streaming live from your machines.
  2. You will learn from the mistakes you have made in the past. In other words, you will identify machine failures in your historical data and design statistical models that can capture failure situations.
  3. In addition to the second item, you will also design statistical models to detect situations that you might call anomalies.
  4. You will try to understand what your machines say without interrupting them.
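
As a rough illustration of the third step, here is a minimal sketch of one simple anomaly detector: a rolling z-score that flags a reading when it is far from the mean of the readings just before it. The function name detect_anomalies, the window of 20 readings, the threshold of 3 standard deviations and the sensor values are all assumptions made up for this example, not a recommendation for a real factory.

```python
from collections import deque
from statistics import mean, stdev

def detect_anomalies(readings, window=20, threshold=3.0):
    """Return indices of readings far from the mean of the preceding window."""
    recent = deque(maxlen=window)  # holds only the last `window` readings
    anomalies = []
    for i, value in enumerate(readings):
        if len(recent) == window:
            mu = mean(recent)
            sigma = stdev(recent)
            # Flag the reading if it is more than `threshold` deviations away.
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                anomalies.append(i)
        recent.append(value)
    return anomalies

# Made-up vibration readings: a steady repeating pattern with one spike.
readings = [10.0 + 0.1 * (i % 5) for i in range(60)]
readings[45] = 14.0

print(detect_anomalies(readings))  # prints [45]
```

Note that the spike itself enters the window after it is flagged, which temporarily widens the band of “normal” values. A production system would probably handle this more carefully, and would also use the labeled failures from the second step.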

In summary, data lets you experience the present moment better informed.

  • Pursuing the Future with Data

Suppose the word “future” means “unknown”. The purpose of this section is to cover the prediction of unknown or unseen things, independent of the concept of time. Since things that lie in the future in time are also not yet known or seen, they fit this definition too. Let’s begin with an example of the future in the temporal sense.

Let’s say you have some money to invest. There is a heap of alternative investment instruments: the stock market, currencies, gold, deposits, real estate, Bitcoin, Ethereum, Ripple, etc. Which one will you invest in? Over what time horizon do you expect a return? You have historical data for all these alternatives. If you can model the pricing of each alternative, you can also forecast future prices. Take the stock market as an example and build a simple workflow:

  1. You have pulled the historical data for all the stocks listed on BIST (Borsa Istanbul).
  2. You have also obtained the data that you think affects share prices. For example, you now have data on the balance sheets of some companies as well as macro variables such as interest rates, exchange rates, inflation and Gross Domestic Product.
  3. You have processed the data so that it can be used in your models.
  4. You have determined the alternative models among which you will choose the best.
  5. You have decided to use the sliding window method to measure the performance of your models. You move forward by constructing a pseudo-future inside your historical data. For example, with the data from the previous 20 periods, you predict the 21st period as if it had not yet happened. Then you slide your window one period and predict the 22nd period with the previous 20 periods, and you continue this way until the last period.
  6. By simply averaging your model’s predictive performance across all windows, you learn how well your model is doing.
  7. You have repeated the fifth and sixth steps for all your model alternatives and selected the best performing model.
  8. Using the model you have chosen, you can now predict the real future with the last 20 periods of data in your hand!
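
The sliding window evaluation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the price series is made up, the two candidate “models” (last_value_model and window_mean_model) are deliberately trivial stand-ins for real models, and mean absolute error is one possible performance measure among many.

```python
from statistics import mean

def last_value_model(window):
    """Naive forecast: the next period equals the most recent one."""
    return window[-1]

def window_mean_model(window):
    """Forecast the next period as the average of the window."""
    return mean(window)

def walk_forward_error(series, model, window=20):
    """Mean absolute error of one-step-ahead forecasts over a sliding window."""
    errors = []
    for end in range(window, len(series)):
        history = series[end - window:end]  # the previous `window` periods
        forecast = model(history)           # predict period `end` as if unseen
        errors.append(abs(series[end] - forecast))
    return mean(errors)

# Made-up price series with a steady upward trend (illustration only).
prices = [100.0 + 0.5 * t for t in range(60)]

candidates = {"last_value": last_value_model, "window_mean": window_mean_model}
scores = {name: walk_forward_error(prices, m) for name, m in candidates.items()}
best_name = min(scores, key=scores.get)

# Forecast the real next period from the last 20 observations.
next_forecast = candidates[best_name](prices[-20:])
print(best_name, scores[best_name], next_forecast)
```

On this toy trending series the naive model wins, with an average error of 0.5 against 5.25 for the window mean. The point is not the models themselves but the procedure: every forecast is scored only on a period the model has not seen.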

So, do you like it?
When you can do something similar to the above for all alternative investment vehicles, you have a data-based prediction of where you should invest your money.

Photo by https://t24.com.tr/haber/borsa-istanbul-da-yeni-rekor,857008

But the unknown is not just the future!

For example, suppose you want to design a navigation system for a driverless car. What will you do? Will you travel every road on earth and code rules for how your vehicle should behave on each one? Of course, this is impossible. What you will do instead is let your car learn how to drive on roads it has never seen, by showing it what needs to be done on a sufficiently large and varied set of roads.

Let’s give one more example.

You run an educational institution, a school. You have received 100 new enrollments for the 1st grade of secondary school. You have some information about each student, but you do not know how to group these 100 students, that is, which students to put in which class.

What do you do?

At the very least, your data has something to tell you, and you should listen to it first. If you will open 5 new classrooms for the 100 students, for example, you can start by dividing the students into 5 groups according to the data you have. Ideally, the groups the data suggests will make sense to you as well.

What if they don’t?

Then maybe you can try opening 6 classrooms, or 4. Maybe you will get more coherent groups that way, as long as you know how to decipher the language of the data.
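
One way to “listen to the data” here is clustering. The sketch below is a minimal example, assuming two invented features per student (a math score and a reading score) and synthetic data. It uses plain k-means for the grouping and the silhouette score as a measure of how coherent the groups are, both of which are my choices for illustration, and compares 4, 5 and 6 groups.

```python
import math
import random

def kmeans(points, k, iterations=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(x) / len(members)
                                         for x in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def silhouette(points, labels):
    """Mean silhouette score: near 1 means tight, well-separated groups."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes a score of 0
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        total += (b - a) / max(a, b)
    return total / len(points)

# Synthetic students, invented for illustration: (math_score, reading_score).
rng = random.Random(42)
blob_centers = [(20, 20), (20, 80), (80, 20), (80, 80), (50, 50)]
students = [(cx + rng.uniform(-5, 5), cy + rng.uniform(-5, 5))
            for cx, cy in blob_centers for _ in range(20)]

scores = {k: silhouette(students, kmeans(students, k)) for k in (4, 5, 6)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Because the synthetic data was built from 5 well-separated groups, 5 comes out as the most coherent choice. Real student data will rarely be this clean, and k-means does not guarantee equal class sizes, so the result is a starting point for your own judgment rather than a final class list.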

My next post is “Is Data Science = Artificial Intelligence or Machine Learning?” See you…

References

  1. https://d-aim.com/consulting-data-science/
  2. https://datajarlabs.com/blog/
  3. https://www.veribilimiokulu.com/nasil-veri-bilimci-olunur/
  4. https://www.msci.com/www/blog-posts/what-fed-monetary-policy-has/01244264353
  5. https://medium.com/@mervebdurna/what-is-cross-validation-how-does-it-work-1494774519e4
  6. https://t24.com.tr/haber/borsa-istanbul-da-yeni-rekor,857008

Written by

Experienced Ph.D. with a demonstrated history of working in the higher education industry. Skilled in Data Science, AI, Deep Learning, Big Data, & Mathematics.
