Tuesday, May 22, 2012

Some Interesting Oracle Analytic Functions

I had to write some reports for a personal application of mine last night and needed to expand beyond my usual set of analytic functions that I normally use. I thought it would be a good idea to do a quick blog post on these analytic functions.

If you've never heard of or used Oracle analytic functions please read this article first. It's by far the best article I've read at explaining what analytic functions are and how to use them. The Oracle analytic function documentation can be found here.

RATIO_TO_REPORT

RATIO_TO_REPORT compares the current value against the sum of the other set of values. The following example shows the percentage of each employees salary when compared to the sum of their department's salary. You'll notice that you can calculate the same value using a different method (sal_dept_ratio2) which may be why I haven't seen RATIO_TO_REPORT used that often.
SELECT d.dname, e.ename, e.sal, 
  round(ratio_to_report(sal) OVER (PARTITION BY e.deptno),2) sal_dept_ratio, 
  round(sal / sum(e.sal) OVER (PARTITION BY e.deptno),2) sal_dept_ratio2
FROM emp e, dept d
WHERE e.deptno = d.deptno;

-- Result

DNAME          ENAME             SAL SAL_DEPT_RATIO SAL_DEPT_RATIO2
-------------- ---------- ---------- -------------- ---------------
ACCOUNTING     CLARK            2450            .28             .28
ACCOUNTING     MILLER           1300            .15             .15
ACCOUNTING     KING             5000            .57             .57
RESEARCH       FORD             3000            .28             .28
RESEARCH       SCOTT            3000            .28             .28
RESEARCH       JONES            2975            .27             .27
RESEARCH       SMITH             800            .07             .07
RESEARCH       ADAMS            1100             .1              .1
SALES          WARD             1250            .13             .13
SALES          MARTIN           1250            .13             .13
SALES          TURNER           1500            .16             .16
SALES          JAMES             950             .1              .1
SALES          ALLEN            1600            .17             .17
SALES          BLAKE            2850             .3              .3

NTH_VALUE

NTH_VALUE returns the nth row in the window clause. It's extremely important that you explicitly define the window or your values may not make sense. I'll write another post explaining this later. Update: NTH_VALUE Windowing Clause

The following example shows the 2nd highest salary in each department.
SELECT d.dname, e.ename, e.sal, 
   nth_value(e.sal, 2) OVER (
    PARTITION BY e.deptno ORDER BY e.sal DESC 
    -- Don't forget to define the window
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) sec_high_sal_dept
FROM emp e, dept d
WHERE e.deptno = d.deptno;

-- Result

DNAME          ENAME             SAL SEC_HIGH_SAL_DEPT
-------------- ---------- ---------- -----------------
ACCOUNTING     KING             5000              2450
ACCOUNTING     CLARK            2450              2450
ACCOUNTING     MILLER           1300              2450
RESEARCH       FORD             3000              3000
RESEARCH       SCOTT            3000              3000
RESEARCH       JONES            2975              3000
RESEARCH       ADAMS            1100              3000
RESEARCH       SMITH             800              3000
SALES          BLAKE            2850              1600
SALES          ALLEN            1600              1600
SALES          TURNER           1500              1600
SALES          WARD             1250              1600
SALES          MARTIN           1250              1600
SALES          JAMES             950              1600

LISTAGG

LISTAGG was a highly requested feature that was implemented in 11gR2. Overall LISTAGG allows you to group values in multiple rows and put them in a comma delimited column on one row.

The following example (which is not an analytic function) shows all the employees for each department:
SELECT d.dname,
  listagg(e.ename, ',') WITHIN GROUP (ORDER BY e.ename) dept_emp_list
FROM emp e, dept d
WHERE e.deptno = d.deptno
GROUP BY d.dname;

-- Result

DNAME          DEPT_EMP_LIST
-------------- ----------------------------------------

ACCOUNTING     CLARK,KING,MILLER
RESEARCH       ADAMS,FORD,JONES,SCOTT,SMITH
SALES          ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD
Using the LISTAGG as an analytic function you can see each employee's colleagues in their department, including themselves.
SELECT d.dname, e.ename,
  listagg(e.ename, ',') WITHIN GROUP (ORDER BY e.ename) 
    OVER (PARTITION BY e.deptno) dept_colleagues
FROM emp e, dept d
WHERE e.deptno = d.deptno;

-- Result

DNAME          ENAME      DEPT_COLLEAGUES
-------------- ---------- --------------------------------------

ACCOUNTING     CLARK      CLARK,KING,MILLER
ACCOUNTING     KING       CLARK,KING,MILLER
ACCOUNTING     MILLER     CLARK,KING,MILLER
RESEARCH       ADAMS      ADAMS,FORD,JONES,SCOTT,SMITH
RESEARCH       FORD       ADAMS,FORD,JONES,SCOTT,SMITH
RESEARCH       JONES      ADAMS,FORD,JONES,SCOTT,SMITH
RESEARCH       SCOTT      ADAMS,FORD,JONES,SCOTT,SMITH
RESEARCH       SMITH      ADAMS,FORD,JONES,SCOTT,SMITH
SALES          ALLEN      ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD
SALES          BLAKE      ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD
SALES          JAMES      ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD
SALES          MARTIN     ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD
SALES          TURNER     ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD
SALES          WARD       ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD

NTILE

NTILE allows you to divide your values into buckets and see which bucket the current row belongs to. For educational institutions this is a very good function to let students know that they're in the top x% of the class.

Update (12-Apr-2014): NTILE vs WIDTH_BOX describes the different ways to bucket data in Oracle. It also explains why SAL 1250 exists in both buckets 2 and 3.

The following example shows the 3 tiers (or buckets) of salaries across the entire company. It allows you to easily see who's in the top 33% of salaries.
SELECT d.dname, e.ename, e.sal,
  ntile (3) over (order by sal desc) three_tier_sal
FROM emp e, dept d
WHERE e.deptno = d.deptno;

-- Result

DNAME          ENAME             SAL THREE_TIER_SAL
-------------- ---------- ---------- --------------
ACCOUNTING     KING             5000              1
RESEARCH       SCOTT            3000              1
RESEARCH       FORD             3000              1
RESEARCH       JONES            2975              1
SALES          BLAKE            2850              1
ACCOUNTING     CLARK            2450              2
SALES          ALLEN            1600              2
SALES          TURNER           1500              2
ACCOUNTING     MILLER           1300              2
SALES          WARD             1250              2
SALES          MARTIN           1250              3
RESEARCH       ADAMS            1100              3
SALES          JAMES             950              3
RESEARCH       SMITH             800              3

8 comments:

  1. Really cool post Martin! Discovered new things today :) I've used in the past before discovering LISTAGG the undocumented wm_concat() which basically groups&concatenates comma separated rows but LISTAGG is the way to go here!

    ReplyDelete
  2. Brilliant and useful; thanks so much.

    ReplyDelete
  3. Great post. I tested and deployed listagg() in an interactive report within ten minutes of reading your article.
    It was exactly what I'd been looking for.

    Many thanks. See you at KScope12!

    ReplyDelete
  4. Realy interesting analytic functions for me, too. I using and very like ROW_NUMBER(). NTH_VALUE() function new in (oracle database 11g).

    Thanks for this interesting post.

    Regards
    Mahir M. Quluzade

    ReplyDelete
  5. Good Stuff but I do not understand how the same salary can be in two different buckets.

    ReplyDelete
    Replies
    1. Hi Rick,

      I'm not sure what your question is regarding the two different buckets. Can you please be more specific.

      Martin

      Delete
  6. Hi Martin
    It is probably my misunderstanding. In the ex. Pertaining to salaries in the 3 buckets I was curious how the salary 1250 is in both buckets 2 and 3. It is probably just a function of sorted 15 rows / 3

    ReplyDelete
    Replies
    1. Hi Rick,

      Excellent question. Long story short, the same salary can exist in two buckets because of how Oracle "buckets data". I've written a follow up article detailing this: http://www.talkapex.com/2014/04/ntile-vs-widthbucket.html I've also updated the NTILE notes referencing that article in case anyone else has the same question.

      Martin

      Delete