Use cases of window functions in Spark

Dipawesh Pawar
2 min read · Mar 30, 2021

Do you want to do some computation for each row of a dataframe that uses information spread across the whole dataframe? Window functions are exactly the tool for that.

So, what is a window? Let’s take the sample dataframe below. It contains employee information: company, department, employee id, joining date, leaving date, and salary.

sample dataframe

Now let’s create some scenarios and see whether windowing can help.

1. How company spending on employee salaries has increased over time

Here we want to see how a company’s expenditure on employee salaries varied from the time its first employee joined to the time its last one did. To compute this for a given employee, we need the rows of all employees who joined before them; those rows constitute the window. Summing the salaries over that window tells us how much that employee’s arrival added to the company’s total expenditure.

To create such a window for each employee, we partition by company, order by joining date, and consider the rows from the start of the partition up to the current row. The code is as follows:

How company spending on employee salaries has increased over time

As we can see in the output above, Microsoft’s spending on employee salaries has increased from 1400 to 10600, whereas Oracle’s has increased from 1000 to 7000.

Here we have used the rowsBetween() function, because the window is defined purely by row position (all rows up to the current one), independent of any column value. Later we will see a scenario where the window must be constructed on the basis of a column’s value; there we will use the rangeBetween() function.

2. How company spending per department has increased over time

This is very similar to the previous scenario; the only difference is that here we partition by two columns, company and department. The code snippet is as follows:

How company spending per department has increased over time
