23

This answer to shows how to produce High/Low/Open/Close values from a ticker:
Retrieve aggregates for arbitrary time intervals

I am trying to implement a solution based on this (PG 9.2), but am having difficulty in getting the correct value for first_value().

So far, I have tried two queries:

SELECT  
    cstamp,
    price,
    date_trunc('hour',cstamp) AS h,
    floor(EXTRACT(minute FROM cstamp) / 5) AS m5,
    min(price) OVER w,
    max(price) OVER w,
    first_value(price) OVER w,
    last_value(price) OVER w
FROM trades
Where date_trunc('hour',cstamp) = timestamp '2013-03-29 09:00:00'
WINDOW w AS (
    PARTITION BY date_trunc('hour',cstamp), floor(extract(minute FROM cstamp) / 5)
    ORDER BY date_trunc('hour',cstamp) ASC, floor(extract(minute FROM cstamp) / 5) ASC
    )
ORDER BY cstamp;

Here's a piece of the result:

        cstamp         price      h                 m5  min      max      first    last
"2013-03-29 09:19:14";77.00000;"2013-03-29 09:00:00";3;77.00000;77.00000;77.00000;77.00000

"2013-03-29 09:26:18";77.00000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.80000;77.00000
"2013-03-29 09:29:41";77.80000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.80000;77.00000
"2013-03-29 09:29:51";77.00000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.80000;77.00000

"2013-03-29 09:30:04";77.00000;"2013-03-29 09:00:00";6;73.99004;77.80000;73.99004;73.99004

As you can see, 77.8 is not what I believe is the correct value for first_value(), which should be 77.0.

I though this might be due to the ambiguous ORDER BY in the WINDOW, so I changed this to

ORDER BY cstamp ASC 

but this appears to upset the PARTITION as well:

        cstamp         price      h                 m5  min      max      first    last
"2013-03-29 09:19:14";77.00000;"2013-03-29 09:00:00";3;77.00000;77.00000;77.00000;77.00000

"2013-03-29 09:26:18";77.00000;"2013-03-29 09:00:00";5;77.00000;77.00000;77.00000;77.00000
"2013-03-29 09:29:41";77.80000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.00000;77.80000
"2013-03-29 09:29:51";77.00000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.00000;77.00000

"2013-03-29 09:30:04";77.00000;"2013-03-29 09:00:00";6;77.00000;77.00000;77.00000;77.00000

since the values for max and last now vary within the partition.

What am I doing wrong? Could someone help me better to understand the relation between PARTITION and ORDER within a WINDOW?


Although I have an answer, here's a trimmed-down pg_dump which will allow anyone to recreate the table. The only thing that's different is the table name.

CREATE TABLE wtest (
    cstamp timestamp without time zone,
    price numeric(10,5)
);

COPY wtest (cstamp, price) FROM stdin;
2013-03-29 09:04:54 77.80000
2013-03-29 09:04:50 76.98000
2013-03-29 09:29:51 77.00000
2013-03-29 09:29:41 77.80000
2013-03-29 09:26:18 77.00000
2013-03-29 09:19:14 77.00000
2013-03-29 09:19:10 77.00000
2013-03-29 09:33:50 76.00000
2013-03-29 09:33:46 76.10000
2013-03-29 09:33:15 77.79000
2013-03-29 09:30:08 77.80000
2013-03-29 09:30:04 77.00000
\.
4
  • 1
    It would make things easier if you provided a test table, for instance as SQLfiddle (random example). Mar 30, 2013 at 21:09
  • Sure the first order by make all rows within the partition the same. So that can make any of the rows fist or last. Now the second is the correct one. It looks to me it should work. There is something missing in your report. As @Erwin told you produce test data where that can be reproduced. Mar 30, 2013 at 21:27
  • @ErwinBrandstetter I'm sorry about the missing test table. I've added one in at the end of the question. I'm afraid I found SQLFiddle a bit intimidating -- I don't get my feet quite this wet very often. Mar 30, 2013 at 22:18
  • BTW, I've only been messing with Postgresql for the past few days, having moved partially from being a MySql user. I'm really impressed. Mar 30, 2013 at 22:25

3 Answers 3

27

SQL Fiddle

All the functions you used act on the window frame, not on the partition. If omitted the frame end is the current row. To make the window frame to be the whole partition declare it in the frame clause (range...):

SELECT  
    cstamp,
    price,
    date_trunc('hour',cstamp) AS h,
    floor(EXTRACT(minute FROM cstamp) / 5) AS m5,
    min(price) OVER w,
    max(price) OVER w,
    first_value(price) OVER w,
    last_value(price) OVER w
FROM trades
Where date_trunc('hour',cstamp) = timestamp '2013-03-29 09:00:00'
WINDOW w AS (
    PARTITION BY date_trunc('hour',cstamp) , floor(extract(minute FROM cstamp) / 5)
    ORDER BY cstamp
    range between unbounded preceding and unbounded following
    )
ORDER BY cstamp;
1
  • Thank you very much. As you were answering, I came across the frame clause and it does, indeed, fix things. What a day! Mar 30, 2013 at 22:16
21

Here's a quick query to illustrate the behaviour:

select 
  v,
  first_value(v) over w1 f1,
  first_value(v) over w2 f2,
  first_value(v) over w3 f3,
  last_value (v) over w1 l1,
  last_value (v) over w2 l2,
  last_value (v) over w3 l3,
  max        (v) over w1 m1,
  max        (v) over w2 m2,
  max        (v) over w3 m3,
  max        (v) over () m4
from (values(1),(2),(3),(4)) t(v)
window
  w1 as (order by v),
  w2 as (order by v rows between unbounded preceding and current row),
  w3 as (order by v rows between unbounded preceding and unbounded following)

The output of the above query can be seen here (SQLFiddle here):

| V | F1 | F2 | F3 | L1 | L2 | L3 | M1 | M2 | M3 | M4 |
|---|----|----|----|----|----|----|----|----|----|----|
| 1 |  1 |  1 |  1 |  1 |  1 |  4 |  1 |  1 |  4 |  4 |
| 2 |  1 |  1 |  1 |  2 |  2 |  4 |  2 |  2 |  4 |  4 |
| 3 |  1 |  1 |  1 |  3 |  3 |  4 |  3 |  3 |  4 |  4 |
| 4 |  1 |  1 |  1 |  4 |  4 |  4 |  4 |  4 |  4 |  4 |

Few people think of the implicit frames that are applied to window functions that take an ORDER BY clause. In this case, windows are defaulting to the frame ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Think about it this way:

  • On the row with v = 1 the ordered window's frame spans v IN (1)
  • On the row with v = 2 the ordered window's frame spans v IN (1, 2)
  • On the row with v = 3 the ordered window's frame spans v IN (1, 2, 3)
  • On the row with v = 4 the ordered window's frame spans v IN (1, 2, 3, 4)

If you want to prevent that behaviour, you have two options:

  • Use an explicit ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING clause for ordered window functions
  • Use no ORDER BY clause in those window functions that allow for omitting them (as MAX(v) OVER())

More details are explained in this article about LEAD(), LAG(), FIRST_VALUE() and LAST_VALUE()

5
  • Very nice demo! Maybe name the added max(v) over () m4 to keep the symmetry? Nov 7, 2014 at 18:14
  • @ErwinBrandstetter: Eh... ;-) Well, fair enough
    – Lukas Eder
    Nov 7, 2014 at 18:18
  • 2
    We wouldn't want my .. uhm .. anybody's OCD acting up, now would we? :) Nov 7, 2014 at 18:19
  • @ErwinBrandstetter: No, I can feel with you. I am the same (mostly) :)
    – Lukas Eder
    Nov 7, 2014 at 18:22
  • ... And thanks for that edit ;-) I had updated the SQLFiddle, but forgot to update the link...
    – Lukas Eder
    Nov 7, 2014 at 18:28
4

The result of max() as window function is base on the frame definition.

The default frame definition (with ORDER BY) is from the start of the frame up to the last peer of the current row (including the current row and possibly more rows ranking equally according to ORDER BY). In the absence of ORDER BY (like in my answer you are referring to), or if ORDER BY treats every row in the partition as equal (like in your first example), all rows in the partition are peers, and max() produces the same result for every row in the partition, effectively considering all rows of the partition.

Per documentation:

The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to be all rows from the partition start up through the current row's last peer. Without ORDER BY, all rows of the partition are included in the window frame, since all rows become peers of the current row.

Bold emphasis mine.

The simple solution would be to omit the ORDER BY in the window definition - just like I demonstrated in the example you are referring to.

All the gory details about frame specifications in the chapter Window Function Calls in the manual.

7
  • Without an order by, how to have a first and last with a real meaning? Mar 30, 2013 at 21:56
  • @ClodoaldoNeto: Those are arbitrary picks then. Depending on what the OP wants to achieve, a frame definition like in your answer, or a separate window definition or just arbitrary picks (no ORDER BY like I suggest) would be the way to go. Mar 30, 2013 at 22:01
  • @ErwinBrandstetter Yes, I really wanted first and last in time as per the cstamp column. Thank you for your help. Mar 30, 2013 at 22:14
  • "The default frame definition is from the start of the frame up to the last peer of the current row" - I think that's not entirely correct. It would be better to say that ordered windows have a default frame of ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, i.e. the current row is included. See also the example in my answer
    – Lukas Eder
    Nov 7, 2014 at 17:22
  • 1
    @LukasEder: I certainly didn't want to imply the current row was excluded. The "last peer of the current row" always includes the current row. I added a bit to avoid misunderstandings. Nov 7, 2014 at 18:11

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Not the answer you're looking for? Browse other questions tagged or ask your own question.