Almost every aspect of a business is open to data collection. That data is later turned into useful information that can improve the quality of decision-making. You can even increase the quantity of decisions by automating decision processes that have to be performed on a regular basis. Starting from data, information technology can be used to report periodically (the database querying perspective), to describe aspects of interest (the basic statistics perspective) or to find patterns (the data mining perspective). These perspectives are closely linked, and the more sophisticated techniques, such as data mining, frequently rely on more basic ones, such as database querying.
Extracting useful knowledge from data requires a structured approach that helps us find the right way to the goals we pursue. You can face a problem without any methodology and still succeed, but it is always useful to have a map when you go to the mountains. For business problems, a very useful framework is CRISP-DM (Cross Industry Standard Process for Data Mining). It breaks a data mining (DM) project down into six main phases: business understanding, data understanding, data preparation, modeling, evaluation and deployment. We will come back to CRISP-DM later (hopefully in a future post). For now, it is enough to keep in mind that a method is available to us.
The data perspective leads us to transform raw data into features that better represent the problem we are dealing with. Let’s see this with an example. Suppose you are the IT manager of a medium-sized company and you are asked to suggest alternative processes that reduce cash management costs using the company’s available IT resources. Yes, my friend, it is all about money! It is also very likely that you barely know how cash managers deal with receipts and payments day after day. The only thing you know for sure is that they are always angry.
What is the first thing you can do to approach the cash management problem from a data point of view? Think for a while… yes, you are right! The first thing you can do is write down a complete definition of the process. How? The more data you use, the better. Cash is the lifeblood of a company, and it has different origins and different destinations. Incoming cash comes from customers, banks, shareholders or even public institutions in the form of grants or subsidies. Outgoing cash goes to vendors, employees and, again, banks, as well as to public institutions and shareholders in the form of taxes and dividends respectively. Incoming and outgoing, receipts and payments, collections and disbursements, inflows and outflows… At least one thing seems clear here: cash management is a problem with two directions, “to” the company and “from” the company.
OK, one step closer to the finish line. Let’s move on to the next one. An exploratory analysis of the available data will show you that, apart from identifying cash-flow-related entities such as customers and vendors, a number of interesting features can give you deeper knowledge of the problem. Currencies, exchange rates, payment modes, payment terms, transaction dates and the countries of origin of customers and vendors are good examples of what we may be interested in.
Ultimately, cash flow management is an exercise in speeding up collections so that payments can be made on time. In this sense, trying to predict the future will likely help you make the right decision in the present. In practice, this means predicting both our customers’ payment behavior and our own. Surprisingly, sometimes it is easier to do the former than the latter.
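To make the idea of turning raw data into features a bit more tangible, here is a minimal Python sketch. Everything in it is made up for illustration: the `Transaction` record, its field names, the `RATES_TO_EUR` table (with an invented exchange rate) and the sample data are assumptions, not something taken from a real system. It derives two simple features from raw transactions, a signed amount in a single base currency (positive for cash coming in, negative for cash going out) and the number of days between the due date and the settlement date, and then averages the delay per counterparty as a very naive guess at future payment behavior.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import mean

# Hypothetical exchange rates to the base currency (EUR); the values are invented.
RATES_TO_EUR = {"EUR": 1.0, "USD": 0.5}


@dataclass
class Transaction:
    """A hypothetical raw cash transaction record."""
    counterparty: str
    direction: str  # "in" for collections, "out" for payments
    amount: float
    currency: str
    due: date
    settled: date


def signed_base_amount(t):
    """Amount in the base currency: positive for inflows, negative for outflows."""
    sign = 1 if t.direction == "in" else -1
    return sign * t.amount * RATES_TO_EUR[t.currency]


def days_late(t):
    """Days between due date and settlement (negative means settled early)."""
    return (t.settled - t.due).days


def average_delay_by_counterparty(transactions, direction):
    """Average delay per counterparty for one direction of cash flow."""
    delays = defaultdict(list)
    for t in transactions:
        if t.direction == direction:
            delays[t.counterparty].append(days_late(t))
    return {name: mean(values) for name, values in delays.items()}


# Invented sample data, only to exercise the functions above.
transactions = [
    Transaction("Customer A", "in", 1000.0, "EUR", date(2023, 1, 31), date(2023, 2, 10)),
    Transaction("Customer A", "in", 500.0, "EUR", date(2023, 2, 28), date(2023, 3, 4)),
    Transaction("Customer B", "in", 200.0, "USD", date(2023, 1, 31), date(2023, 1, 31)),
    Transaction("Vendor X", "out", 300.0, "EUR", date(2023, 2, 15), date(2023, 2, 20)),
]

net_cash = sum(signed_base_amount(t) for t in transactions)
customer_delays = average_delay_by_counterparty(transactions, "in")
own_delays = average_delay_by_counterparty(transactions, "out")

print(net_cash)
print(customer_delays)
print(own_delays)
```

A historical average is of course not a real predictive model. The point is only that the two directions of cash flow and the transaction dates mentioned above can be turned into numbers that a model could later use.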
Here you can find two useful references related to this post:
- Chapman, P., Clinton, J., Kerber, R., Khabaza, T., Reinartz, T., Shearer, C., & Wirth, R. (2000). CRISP-DM 1.0: Step-by-step data mining guide. SPSS Inc.
- Provost, F., & Fawcett, T. (2013). Data Science for Business: What you need to know about data mining and data-analytic thinking. O'Reilly Media.