All posts by Paco Salas

It’s a long way to the top (if you wanna rock ‘n’ roll)

Yes, it’s a long way to the top, especially if you want to publish your first paper in a journal ranked in the first quartile of Economics. But, after a year of hard work, good news arrived at the end of 2016. The International Journal of Forecasting decided that my first paper, Empowering cash managers to achieve cost savings by improving predictive accuracy, deserved publication. It has been a long way to the top, but I wanted rock ‘n’ roll. At several points during writing, submission and revision, I really thought that I was on a highway to hell.

The underlying idea of the paper was simple: if predictive accuracy is a good thing, then cost savings in cash management must be correlated with better forecasts. Even though the idea was simple, turning it into a convincing message was not so easy. After providing the necessary background on the cash management problem, we introduced the main characteristics of the cash flow data sets used in the experiments. Then, we proposed five different forecasters for comparative purposes: autoregressive, linear regression, radial basis functions, random forests and a seasonal interaction model (the latter suggested by a reviewer). To evaluate them we relied on the time-series cross-validation procedure described by Hyndman, and the winners were… random forests (for data set 1) and linear regression (for data set 2).

The crucial question came next: do better forecasts produce better cash management policies? We expected so, but we had to show it empirically. To this end, we implemented a recent cash management model from the literature that uses cash flow forecasts as a key input. We tested a wide range of cash flow forecasts with different accuracies, and the result left us thunderstruck or, in other words, gave us an empirical confirmation of the savings hypothesis: better forecasts produce better cash management policies in terms of cost. As a final contribution, we proposed a general methodology to help cash managers estimate whether their efforts to improve predictive accuracy are rewarded by proportional cost savings.

The work was done and there was nothing left to do but wait for the good news (if any). The e-mail confirmation arrived at the end of 2016 and I felt as if you shook me all night long.

Thanks to AC/DC for providing the appropriate songs to illustrate my thoughts.

Python for finance (I). Remote data access

With this post I begin a series exploring the link between Python and finance, which I expect to be of interest to both professionals and students. Python is an interpreted, object-oriented, high-level programming language that is simple and easy to learn. Ok, but why Python? First, because it is open source; and second, because in many application domains, such as business or finance, spreadsheets are not enough. Decision-making is no longer a soft science based on opinions or visions about the future. It must be based on data. Descriptive analytics is not enough and a new kind of technology is required, that is, predictive analytics. In this sense, any available tool that makes things happen more easily (data mining, data analysis, predictions, optimizations, simulations, visualizations, …) has to be added to the executive’s toolbox. Before going on, a word of warning. Not every data scientist needs to know the secrets of management science, but every manager has to know the power of data science. Otherwise they will be condemned to be shadowed by a long tail of underperformance.

Let us start with the framework I propose for learning Python. No doubt the winner is IPython Notebook, or Jupyter Notebook, if you like. A notebook is an application that allows you to create and share documents that contain code, visualizations and explanatory text, which makes data analysis easier to understand and reproduce. Its utility has recently been highlighted by Helen Shen in Nature, where even an interactive demo is available. If you are just discovering notebooks, this demo is a good starting point.

For those already familiar with Python or any other programming language, I hope that the following lines of code are a good example of how Python makes data analysis easier. Assume you are responsible for monitoring foreign exchange rates in your company. One of your common daily tasks is visiting some finance web site to get the required information. You will later add the data obtained to your database and begin your analysis tasks. What if we look for an easier way to get the data? Can we somehow automate this task? The answer is yes, and the implication is great. We can dedicate our time to data analysis rather than to data fetching. A few lines of code do the hard work for us:
[Images not available: RDA2, CUR]

This is only an example of accessing finance data through Quandl’s API, but many other data providers are available (Google Finance, Yahoo Finance, St. Louis Fed, …) for your data analysis projects. Portfolio optimization, economic analysis and many other data-driven decision processes become easier when a few simple instructions are given to the right API.
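The original code listing is not reproduced in this text, so here is a minimal sketch of the same idea using only the Python standard library. It is not the code from the post: the two-column layout (Date, Rate), the sample values and the helper names fetch_csv and parse_rates are assumptions made for illustration, and the URL you pass to fetch_csv has to come from your own data provider's documentation.

```python
import csv
import io
from urllib.request import urlopen


def fetch_csv(url):
    """Download a CSV document from a provider URL and return it as text."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def parse_rates(csv_text):
    """Turn rows with 'Date' and 'Rate' columns into (date, rate) pairs.

    Dates are assumed to be ISO strings (YYYY-MM-DD), so sorting the
    pairs as text also sorts them chronologically, oldest first.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return sorted((row["Date"], float(row["Rate"])) for row in reader)


# Hypothetical sample in the assumed layout (these are not real quotes).
SAMPLE = "Date,Rate\n2016-01-05,1.0746\n2016-01-04,1.0898\n"
rates = parse_rates(SAMPLE)
for day, rate in rates:
    print(day, rate)
```

With a real endpoint, the same parser would be used as parse_rates(fetch_csv(url)), provided the provider returns columns with those names.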

The cost of not predicting

photo credit: <a href="http://www.flickr.com/photos/36196762@N04/4930275692">Army Photography Contest - 2007 - FMWRC - Arts and Crafts - Eye of the Holder</a> via <a href="http://photopin.com">photopin</a> <a href="https://creativecommons.org/licenses/by/2.0/">(license)</a>

We live in a predictive world and we are certainly predictive beings, whether we accept it or ignore it. There is no choice for us. Even the most elementary learning process involves a predictive task. Sometime in the past, we learnt that fire burns and now we keep away from it because we unconsciously predict that if we get too close we are going to be seriously hurt.

Bearing in mind that we are predictive beings can be of great help, especially while you are still mentally young enough to keep learning regardless of your age. Our ability to predict is a solid starting point for any decision to be made. The better our predictions, the better our decision-making. Because of that, we usually take predictive accuracy as the measure of performance of our predictions. Predictive accuracy is only a quantitative representation of how good our predictions are in comparison to the observed values. However, predictive accuracy is not enough in real-world problems. Cost also needs to be taken into account, and the cost analysis needs to be performed in a broad sense.

Let’s see an example. Assume you are the cash manager of an important company. Assume you currently hold a large amount of money in your bank account. I know this last assumption is an unlikely one nowadays, but accept it for illustration purposes. Simple cash management models consider this holding position to have a cost: the returns forgone by not placing the money in alternative investments. In order to turn this idle money into productive money at low risk, you may decide to invest the excess in, for example, treasury bills. That’s perfect! At the maturity date you will get back not only the same amount of money but also a small amount on top of it. However, this transaction also has a cost. No problem. As long as this cost is smaller than the amount earned from the investment, the final result will remain positive. That’s true. But let’s go on and see what happens in our example. The following day you learn that an unexpected large payment has to be made within the next 10 days, and the amount of money left in your bank account after the investment is not enough to cover it. If you are lucky you may sell the treasury bill without losing too much and have enough money to meet the payment. Again, a transaction cost will be charged to your profit and loss account. This extra cost is an example of the cost of not predicting.
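To put rough numbers on this story, here is a small sketch. Every figure in it (the amount invested, the annual yield, the fixed cost per transaction, the holding periods) is hypothetical and chosen only for illustration; it also assumes simple interest and ignores any price loss when the bill is sold early.

```python
def net_result(amount, annual_yield, days_held, transaction_cost, n_transactions):
    """Simple interest earned while invested, minus all transaction costs."""
    interest = amount * annual_yield * days_held / 365
    return interest - transaction_cost * n_transactions


# Hypothetical figures, for illustration only.
AMOUNT = 1_000_000   # money invested in treasury bills
YIELD = 0.01         # annual yield (1%)
COST = 150.0         # fixed cost per transaction

# Plan: one purchase, held for 90 days until maturity.
planned = net_result(AMOUNT, YIELD, 90, COST, 1)

# What actually happens: purchase, then a forced sale after 10 days.
forced = net_result(AMOUNT, YIELD, 10, COST, 2)

print(f"planned result: {planned:.2f}")
print(f"forced result:  {forced:.2f}")
```

Under these assumed figures the planned investment ends up positive, while the forced early sale turns the result negative: ten days of interest do not cover two transaction costs.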

No one knows whether this sudden payment could have been predicted, but one fact is certain: not predicting has an associated cost. Any cost related to the lack of prediction can be reduced by trying to reduce uncertainty with the best techniques available. In our example, cash managers can use cash management models that reduce cost by taking cash flow forecasts as a key input. In general, the cost of not predicting can be viewed as the benefit not achieved, or the costs incurred, because no prediction system is available. However, there is another type of cost that must be considered: the cost of designing and implementing the prediction system itself. The cost of different predictive approaches to a particular problem may differ so much that it decides whether a project is feasible or not. In most cases both kinds of cost are unknown but can be estimated. This cost estimation task is a good starting point in any data mining project. A model that makes quick and cheap predictions will probably bring savings both in the task performed and in executive time. The next question is: what to predict?


People get married in summer and prefer to die in winter

The presence of seasonal patterns in many time series is not a big discovery. However, from time to time, interesting seasonal data come along to teach you something that was not so obvious, at least to you. This happened to me recently when I was working with data on weddings and deaths in Spain over the last few years in order to fit some basic ARIMA models. I had an explanation for why people decide to get married in summer, namely the weather, but the decision to die is not so clear, at least under normal circumstances.

Weddings and deaths in Spain from 2007 to 2013. Source: http://www.ine.es

Before seeing this plot, I would have bet a couple of beers that there was no relationship between the month and the number of deaths. Surprisingly, the reverse is true. It seems that the weather has an effect not only on weddings (nobody likes rain on their wedding day) but also on deaths, and the reasons are far from clear. One can think of a number of causes for this behavior, but that is another question. In this post, my purpose is to show the importance of seasonality in predicting.

In the context of general regression models, years, months, weeks and days are usually more important than we expect. Most of the time, plots rather than data tables give practitioners a better understanding of what is going on. In the search for features that explain the behavior of a target variable, we are prone to look for features that are costly to obtain in terms of time and money. However, when dealing with time series forecasting, there are features such as calendar variables that are cost free and may explain a remarkable part of the variance of the target variable. Weddings and deaths grouped by month are two examples, but there are many others where seasonality is also present. Electricity consumption by hour, the day-of-the-week effect and the January effect in stock market returns are examples of how seasonality and other calendar anomalies can lead users to important savings and benefits if they take these effects into account in their predictions.

In the corporate cash management problem, one of the most important tasks is daily cash forecasting, where the day of the month and the day of the week play a critical role in the ability to predict next-day cash flow. Setting one or two fixed payment days per month is a common business practice, so cash flow depends heavily on the day of the month. In this sense, the works on daily cash flow forecasting by Stone, Stone and Wood, and Miller and Stone in the seventies and eighties focus on the predictive ability of calendar dummy variables. My first steps in daily cash flow forecasting confirm that using these calendar dummy variables results in better forecast accuracy. Remember, seasonality is present in weddings and deaths but also in cash flow forecasting and many other fields of interest for researchers. You can find useful information about seasonality in daily cash flow forecasting in the following references.

  • Stone, B. K. (1976). The payments-pattern approach to the forecasting and control of accounts receivable. Financial Management, 65-82.
  • Stone, B. K., & Wood, R. A. (1977). Daily cash forecasting: a simple method for implementing the distribution approach. Financial Management, 40-50.
  • Miller, T. W., & Stone, B. K. (1985). Daily cash forecasting and seasonal resolution: alternative models and techniques for using the distribution approach. Journal of Financial and Quantitative Analysis, 20(03), 335-351.
  • Stone, B. K., & Miller, T. W. (1987). Daily cash forecasting with multiplicative models of cash flow patterns. Financial Management, 45-54.
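As an illustration of what calendar dummy variables look like in practice, here is a minimal sketch using only the Python standard library. The function name calendar_dummies and the layout of the feature vector (7 day-of-week indicators followed by 31 day-of-month indicators) are my own choices for this example, not something taken from the papers above.

```python
from datetime import date, timedelta


def calendar_dummies(day):
    """Return 7 day-of-week plus 31 day-of-month 0/1 indicators for a date."""
    day_of_week = [0] * 7
    day_of_month = [0] * 31
    day_of_week[day.weekday()] = 1   # Monday = 0 ... Sunday = 6
    day_of_month[day.day - 1] = 1    # the 1st of the month goes to position 0
    return day_of_week + day_of_month


# Build the feature rows for three consecutive days.
start = date(2016, 1, 1)
features = [calendar_dummies(start + timedelta(days=i)) for i in range(3)]
for row in features:
    print(row)
```

Each row has exactly two ones, one per group. When such dummies are fed to a linear regression with an intercept, one indicator per group is usually dropped to avoid perfect collinearity.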

Facing business problems from a data point of view

Almost every aspect of business is open to data collection. The data are later used to obtain useful information that can help to improve the quality of decision-making. You can even increase the quantity of decision-making by automating decision processes that have to be performed on a regular basis. From data, information technology can be used to report periodically (database querying perspective), to describe aspects of interest (basic statistics perspective) or to find patterns (data mining perspective). All of these aspects are closely linked to each other and, frequently, more sophisticated techniques such as data mining rely on more basic ones such as database querying.

Extracting useful knowledge from data requires a structured approach that helps us find the right way to the goals we pursue. One can face problems without any methodology and succeed, but it is always useful to have a map when you go to the mountains. When facing business problems, a very useful framework can be found in CRISP-DM (Cross Industry Standard Process for Data Mining). Business understanding, data understanding, data preparation, modeling, evaluation and deployment are the main phases into which a data mining project can be broken down. We will come back to CRISP-DM later (hopefully in a future post). For now, it is enough to keep in mind that a method is available.

The data perspective leads us to transform raw data into features that better represent the problem we are dealing with. Let’s see this with an example. Suppose you are an IT manager at a medium-size company and you are asked to suggest alternative processes to reduce cash management costs using the company’s available IT resources. Yes, my friend, it is all about money! It is also very likely that you barely know how cash managers deal with receipts and payments day after day. The only thing you know for sure is that they are always angry.

What is the first thing you can do to face the cash management problem from a data point of view? Think for a while… yes, you are right! The first thing you can do is write down a complete definition of the process. How? The more data you use, the better. Cash is the lifeblood of a company and has different origins and different destinations. Incoming cash comes from customers, banks, shareholders or even from public institutions in the form of grants or subsidies. Outgoing cash goes to vendors, employees and, again, banks, public institutions and shareholders, the last two in the form of taxes and dividends respectively. Incoming and outgoing, receipts and payments, collections and disbursements, inflows and outflows… At least one thing seems clear here: cash management is a problem with two directions, “from” the company and “to” the company. Ok, one step closer to the finish line.

Let’s move on to the next one. An exploratory analysis of the available data will show you that, apart from identifying cash flow related entities such as customers and vendors, a number of interesting features can be used to get a deeper knowledge of the problem. Currencies, exchange rates, payment modes, payment terms, transaction dates and the country of origin of customers and vendors are good examples of what we may be interested in. Ultimately, cash flow management is an exercise in speeding up collections so that payments can be made on time. In this sense, trying to predict the future will likely help you make the right decision in the present. This amounts to predicting both our customers’ payment behavior and our own. Surprisingly, sometimes it is easier to do the former than the latter.
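As a small illustration of moving from raw records to a feature, the sketch below aggregates individual transactions into a daily net cash flow, treating inflows as positive and outflows as negative. The Transaction fields, the sample records and the function name daily_net_flow are hypothetical and only meant to show the idea; a real company database will look different.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Transaction:
    day: date
    amount: float       # always positive
    direction: str      # "in" (to the company) or "out" (from the company)
    counterparty: str   # customer, vendor, bank, ...


def daily_net_flow(transactions):
    """Sum inflows minus outflows per day, returned in chronological order."""
    net = defaultdict(float)
    for t in transactions:
        sign = 1.0 if t.direction == "in" else -1.0
        net[t.day] += sign * t.amount
    return dict(sorted(net.items()))


# Hypothetical records, for illustration only.
records = [
    Transaction(date(2016, 1, 5), 1200.0, "in", "customer A"),
    Transaction(date(2016, 1, 5), 400.0, "out", "vendor B"),
    Transaction(date(2016, 1, 4), 250.0, "out", "bank"),
]
flows = daily_net_flow(records)
for day, amount in flows.items():
    print(day, amount)
```

The resulting daily series is the kind of target variable a cash flow forecaster would try to predict, and the other fields (counterparty, currency, payment terms) are candidates for explanatory features.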

Here you can find two useful references related to this post:

  • Chapman, P., Clinton, J., Kerber, R., Khabaza, T., Reinartz, T., Shearer, C., & Wirth, R. (2000). CRISP-DM 1.0 Step-by-step data mining guide.
  • Provost, F., & Fawcett, T. (2013). Data Science for Business: What you need to know about data mining and data-analytic thinking.

Back to the roots and leap forward

Business databases are likely to keep growing day after day as a by-product of decision-making. This information is there for those who need it, for those who want to use it to make better decisions. Not using it would be like leaving the tap on and wasting water. This fact is the starting point for this blog: extracting useful knowledge from data to improve the quality of business decision-making. The focus will be placed on predicting any target variable that can help with business problems. This allows you to go back to the roots, to achieve a deeper knowledge of the problem you are interested in by looking at the data gathered. Different strategies and techniques are available to use data as a solid foundation from which to leap forward to a better solution to the problem you face every day, rather than basing decisions on intuition. All the principles, processes and techniques to do so can be grouped under the name of data science or data mining. More precisely, data-driven decision-making refers to the practice of basing business decisions on data. This practice is the origin of all the work that will ultimately lead to the writing of my PhD thesis, and this blog is part of that work. I expect this blog to record the progress of the learning process needed to finally obtain a piece of interesting research in the area. Relevant information about the achievement of partial goals will be presented and any suggestion will be welcome. I hope readers enjoy it as much as I enjoy researching and writing about data-driven decision-making.