0

I have one table in Hive for which data has come from SAP system. This table has columns and data as given below:

+======================================================================+
|document_number | year | cost_centre | vendor_account_number | amount | 
+----------------------------------------------------------------------+
|       1        | 2016 |     XZ10    |                       |  123.5 |
+----------------------------------------------------------------------+
|       1        | 2016 |     XZ10    |      1234567890       |  25.96 |
+----------------------------------------------------------------------+
|       1        | 2016 |     XZ10    |                       |  586   |
+----------------------------------------------------------------------+

As shown above, value for vendor_account_number column is present in only 1 row and I want to bring it on all the rest of the rows.

Expected output is as follows:

+======================================================================+
|document_number | year | cost_centre | vendor_account_number | amount | 
+----------------------------------------------------------------------+
|       1        | 2016 |     XZ10    |      1234567890       |  123.5 |
+----------------------------------------------------------------------+
|       1        | 2016 |     XZ10    |      1234567890       |  25.96 |
+----------------------------------------------------------------------+
|       1        | 2016 |     XZ10    |      1234567890       |  586   |
+----------------------------------------------------------------------+

To achieve this, I wrote following CTE in Hive

with non_blank_account_no as(
  select document_number, vendor_account_number
  from my_table
  where vendor_account_number != ''
)

and then did self left outer join as follows:

select 
    a.document_number, a.year, 
    a.cost_centre, a.amount,
    b.vendor_account_number
from my_table a
left outer join non_blank_account_no b on a.document_number = b.document_number
where a.document_number = ' '

but I am getting duplicated output as shown below

+======================================================================+
|document_number | year | cost_centre | vendor_account_number | amount | 
+----------------------------------------------------------------------+
|       1        | 2016 |     XZ10    |      1234567890       |  123.5 |
+----------------------------------------------------------------------+
|       1        | 2016 |     XZ10    |      1234567890       |  25.96 |
+----------------------------------------------------------------------+
|       1        | 2016 |     XZ10    |      1234567890       |  586   |
+----------------------------------------------------------------------+
|       1        | 2016 |     XZ10    |      1234567890       |  123.5 |
+----------------------------------------------------------------------+
|       1        | 2016 |     XZ10    |      1234567890       |  25.96 |
+----------------------------------------------------------------------+
|       1        | 2016 |     XZ10    |      1234567890       |  586   |
+----------------------------------------------------------------------+

Can anyone please help me understanding what is wrong with my Hive query?

3
  • 1
    What's the output of your CTE? Is it only 1 row? Commented Jun 28, 2017 at 17:50
  • @Andrew, Yes CTE is supposed to select only 1 row in which vendor_account_number is not null Commented Jun 28, 2017 at 19:10
  • I know it's only supposed to have 1 row, the question is how many does it actually have? It certainly looks like you're getting a cross join. Commented Jun 28, 2017 at 20:20

1 Answer 1

1

In many use-cases a self-join can be replaced by a windows function

select  document_number
       ,year
       ,cost_center

       ,max (case when vendor_account_number <> '' then vendor_account_number end) over 
        (
            partition by    document_number
        )                                       as vendor_account_number

       ,amount

from    my_table    
Sign up to request clarification or add additional context in comments.

1 Comment

Thanks, I'll try this.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.