Sharing a common DolphinDB use case and solution for data processing.

I have a table of stock observation data recording minute-by-minute indicator states, with two key indicators:

  • ov95: Breakthrough signal for the 95% pressure level (0 or 1)
  • ov70: Breakthrough signal for the 70% pressure level (0 or 1)

I need to calculate a position state column with these rules:

  1. Initial position is 0 (no position)
  2. When current position=0 and ov95=1, position changes to 1 (open position)
  3. When current position=1 and ov70=0, position changes to 0 (close position)
  4. All other cases maintain the previous position

Here's what the logic would look like in procedural pseudocode:

position = 0  # Initialize
for each row in table:
    if position == 0 and row.ov95 == 1:
        position = 1
    elif position == 1 and row.ov70 == 0:
        position = 0
    # Else position remains unchanged
    row.position = position  # Store the computed value

The key challenge here: Each row's position state depends on the previous row's position value, creating a state dependency chain.

My sample data:

| Timestamp            | StockCode | ov95 | ov70 | Expected pos (initial pos = 0) |
|----------------------|-----------|------|------|--------------------------------|
| 2022-06-01 09:45:03  | 000717    | 1    | 0    | 1                              |
| 2022-06-01 09:45:06  | 000717    | 1    | 0    | 0                              |
| 2022-06-01 09:45:09  | 000717    | 1    | 0    | 1                              |
| 2022-06-01 09:45:12  | 000717    | 0    | 0    | 0                              |
| 2022-06-01 09:45:15  | 000717    | 0    | 0    | 0                              |
| 2022-06-01 09:45:18  | 000717    | 0    | 0    | 0                              |
| 2022-06-01 09:45:21  | 000717    | 0    | 1    | 0                              |
| 2022-06-01 09:45:24  | 000717    | 0    | 1    | 0                              |
| 2022-06-01 09:45:27  | 000717    | 0    | 1    | 0                              |
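
For reference, here is a minimal runnable Python sketch of the rules applied to the sample rows. It is only a way to check the expected pos values, not the DolphinDB solution I'm looking for; the function name `compute_positions` and the list names are made up for illustration.

```python
def compute_positions(ov95, ov70):
    """Apply the open/close rules row by row, starting with no position (0)."""
    position = 0
    result = []
    for signal95, signal70 in zip(ov95, ov70):
        if position == 0 and signal95 == 1:
            position = 1  # rule 2: open position
        elif position == 1 and signal70 == 0:
            position = 0  # rule 3: close position
        # rule 4: otherwise keep the previous position
        result.append(position)
    return result


# ov95 and ov70 columns copied from the sample rows
sample_ov95 = [1, 1, 1, 0, 0, 0, 0, 0, 0]
sample_ov70 = [0, 0, 0, 0, 0, 0, 1, 1, 1]

print(compute_positions(sample_ov95, sample_ov70))
# prints [1, 0, 1, 0, 0, 0, 0, 0, 0]
```

Note that the position flips on the first three rows because ov70 stays 0 while ov95 stays 1, so each open is closed again on the next row.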

I've tried using simple CASE WHEN statements but found I can't reference the calculated result from the previous row.

In Python or Java this would be easy with loops, but I'm wondering if this can be done in pure DolphinDB SQL? If so, what approach should I use to implement this state-dependent calculation?

  • Just to be clear: if you have a row ov95 : 1, ov70 : 0 followed by ov95 : 0, ov70 : 0 you want that second row to still have position : 1 ? Commented Jun 5 at 13:53
  • Thank you for pointing that out. I've re-edited the sample data to make it clearer. Commented Jun 6 at 1:41

1 Answer

Suppose we have a table as follows:

time = 2023.01.01T09:00:00.000 + 1..9
code = take(`a, 9)
ov95 = 1 1 1 0 0 0 0 0 0
ov70 = 0 0 0 0 0 0 1 1 1
t = table(time, code, ov95, ov70)

We need to calculate an indicator called "position" using the following formula:

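pos(0) = 0 before the first row, and for each row i = 1..n:

pos(i) = 1          if pos(i-1) = 0 and ov95(i) = 1
pos(i) = 0          if pos(i-1) = 1 and ov70(i) = 0
pos(i) = pos(i-1)   otherwise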

This can be implemented in the two ways shown below.

Data construction script (a larger table with 1,000,000 rows, used for the performance test):

n = 1000000
time = 2023.01.01T09:00:00.000 + 1..n
code = take(`a, n)
ov95 = rand([0, 1], n)
ov70 = rand([0, 1], n)
t = table(time, code, ov95, ov70)

Solution 1: Using accumulate

res = select *, accumulate(def(pos, ov95, ov70):iif(pos==0&&ov95 == 1, 1, iif(pos==1&&ov70 == 0, 0, pos)), [ov95, ov70], 0) as pos from t
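
To make the idea behind Solution 1 easier to follow, here is a small Python analogue of the same running-state fold. This is not DolphinDB code: Python's `itertools.accumulate` stands in for the accumulate call above, and the names `step` and `scan_positions` are hypothetical, chosen only for this sketch.

```python
from itertools import accumulate


def step(pos, row):
    """Return the next position given the previous position and one (ov95, ov70) row."""
    ov95, ov70 = row
    if pos == 0 and ov95 == 1:
        return 1  # open position
    if pos == 1 and ov70 == 0:
        return 0  # close position
    return pos    # keep the previous position


def scan_positions(ov95, ov70, init=0):
    """Fold `step` over the rows, keeping every intermediate state."""
    states = list(accumulate(zip(ov95, ov70), step, initial=init))
    # Python's accumulate emits the seed first, so drop it to get one value per row.
    return states[1:]


ov95 = [1, 1, 1, 0, 0, 0, 0, 0, 0]
ov70 = [0, 0, 0, 0, 0, 0, 1, 1, 1]
print(scan_positions(ov95, ov70))
# prints [1, 0, 1, 0, 0, 0, 0, 0, 0]
```

The point of the pattern is that the previous result is threaded through as the first argument of the step function, which is exactly what a plain CASE WHEN expression cannot do.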

Solution 2: Using for loop + JIT

@jit
def calPos(ov95, ov70){
    pos = array(INT, size(ov95)+1)
    for(i in 0:size(ov95)){
        if(pos[i]==0&&ov95[i]==1)    pos[i+1]=1
        else if(pos[i]==1&&ov70[i]==0)    pos[i+1]=0
        else pos[i+1]=pos[i]
    }
    return pos[1..size(ov95)]
}

res = select *, calPos(ov95, ov70) as pos from t

Performance Test

Test data: an in-memory table with 1,000,000 rows and 4 columns (38 MB)

| Method     | Time elapsed |
|------------|--------------|
| accumulate | 2.66         |
| loop + JIT | 0.08         |