10 October 2019

OGB Appreciation Day: APEX_DATA_PARSER: Flexible, Powerful, Awesome (#ThanksOGB #OrclAPEX)

For this year's OGB Appreciation Day I wanted to highlight the very awesome APEX_DATA_PARSER package.

At my current project, one of the requirements is that CSV files are uploaded by the user and the application has to figure out what type of file it is and process it accordingly.
So the challenge with this is:

  • Which type of file is it?
  • What delimiter was used?

Uploading the file is straightforward enough with Application Express (APEX) and out of scope for this blog. The files are stored in an APEX collection before they are processed.

The original procedure to determine what type of file it was, and how to process it, was quite cumbersome.
The file was parsed and the first line was extracted. This string, effectively the header line of the CSV file, was compared with a predefined string, and based on that comparison the type of file was determined.
The problem was that sometimes the header line would look like this:

      DATE_,PAYMENT_METHOD,CURRENCY,AMOUNT,STATUS,ID,DESCRIPTION
         
and sometimes it would look like this:
      "DATE_","PAYMENT_METHOD","CURRENCY","AMOUNT","STATUS","ID","DESCRIPTION"
         
or maybe even sometime like this:
      "DATE_";"PAYMENT_METHOD";"CURRENCY";"AMOUNT";"STATUS";"ID";"DESCRIPTION"
         
The header line could be enclosed in quotes, or not. The delimiter could be a comma, or not.
I've simplified the example; the real header lines were a lot longer.
You can imagine that parsing the rest of the file was quite tricky, as the content could use commas as delimiter, or semicolons, or something else entirely.
Because of all these variations the procedure was quite lengthy and error prone.

Simplifying with APEX_DATA_PARSER

With APEX_DATA_PARSER things got a lot cleaner, not to mention a lot faster. The example below will only give you the gist of the procedure.

First, the different header lines were declared as variables of type APEX_T_VARCHAR2.

         l_file1_header apex_t_varchar2 := apex_t_varchar2 ('DATE_'
                                                           ,'PAYMENT_METHOD'
                                                           ,'CURRENCY'
                                                           ,'AMOUNT'
                                                           ,'STATUS'
                                                           ,'DESCRIPTION'
                                                           );
         l_file2_header apex_t_varchar2 := apex_t_varchar2 ('ID'
                                                            ,'REF'
                                                            ,'ORDER_'
                                                            ,'STATUS'
                                                            ,'LIB'
                                                            );
         
Second, we need to get the header line of the file that was uploaded through the APEX application.
         select column_name
           bulk collect
           into l_file_header
           from table (apex_data_parser.get_columns
                         (apex_data_parser.discover (p_content   => r_files.file_content
                                                    ,p_file_name => r_files.filename
                                                    )));
         
The APEX_DATA_PARSER package offers a function to discover information about the file. This function, aptly named DISCOVER, returns a file profile containing information about the file encoding, the file type, how many rows it parsed, which delimiter is used in case of CSV, and so on.
The APEX_DATA_PARSER.GET_COLUMNS function uses the file profile created by the DISCOVER function to obtain information about the content: which columns are in the file, what the datatype might be, which format mask was used, and in which order the columns appear.
With the simple SELECT statement above we know the header columns in the uploaded file.
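The same query can also bring back the datatype and format mask that the parser detected for each column. A minimal sketch, assuming the attributes of the collection returned by GET_COLUMNS are named COLUMN_POSITION, COLUMN_NAME, DATA_TYPE and FORMAT_MASK (check the documentation of your APEX version for the exact names):

         select column_position
               ,column_name
               ,data_type
               ,format_mask
           from table (apex_data_parser.get_columns
                         (apex_data_parser.discover (p_content   => r_files.file_content
                                                    ,p_file_name => r_files.filename
                                                    )));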

We're almost there...

Now that the header columns are known, we can determine how to proceed with a straightforward CASE statement.
Why a CASE statement, you might ask? It gives an easy opportunity to raise an exception when an unknown file is uploaded: a simple CASE without a matching WHEN branch (and without an ELSE) raises a CASE_NOT_FOUND exception.
         case l_file_header
         when l_file1_header
         then
            process_file1;
         when l_file2_header
         then
            process_file2;
         end case;
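If you would rather report an unknown file type with a friendlier message, a minimal sketch could look like this (the error number and message text are just placeholders):

         begin
            case l_file_header
            when l_file1_header
            then
               process_file1;
            when l_file2_header
            then
               process_file2;
            end case;
         exception
            when case_not_found
            then
               raise_application_error (-20001, 'Unknown file type uploaded');
         end;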
         
The uploaded files are moved from the APEX collection to a regular heap table called ALL_UPLOADED_FILES (omitted from this blogpost).

To process the content of the file we can now do an INSERT-SELECT and drop the content into the destination table:
         insert into destination_tbl
            (date_
            ,payment_method
            ,currency
            ,amount
            ,status
            ,description
            )
         select to_date (col001, 'yyyy-mm-dd hh24:mi:ss')  -- DATE_
               ,col002                                      -- PAYMENT_METHOD
               ,col003                                      -- CURRENCY
               ,col004                                      -- AMOUNT
               ,col005                                      -- STATUS
               ,col007                                      -- DESCRIPTION (col006, the ID column, is not needed)
           from all_uploaded_files fle
               ,table (apex_data_parser.parse (p_content   => fle.csv
                                              ,p_file_name => fle.csv_filename
                                              ,p_skip_rows => 1 -- skip the header line
                                              ));
         

On a side note: we're using DropZone, so the file(s) are in an APEX collection, which is read with the following query:
         select c001    as filename
               ,c002    as mime_type
               ,d001    as date_created
               ,n001    as file_id
               ,blob001 as file_content
           from apex_collections
          where collection_name = 'DROPZONE_UPLOAD'
         

A big thank you to the APEX team for implementing this awesome functionality! Thanks ODC!

12 March 2019

The elegant NUMTODSINTERVAL to transform numbers

Data from an external source is loaded into the database as a CSV file.
The file contains three columns to represent the date, the hours and the minutes, and it is loaded into a staging table with three matching columns:

  • one for the date, datatype DATE
  • one for the hours, datatype NUMBER
  • one for the minutes, datatype NUMBER

In the destination table there is only one column to hold this information, so the three staging columns need to be transformed into a "real" date, meaning a single value of the DATE datatype as we know it.

One way is to concatenate all the columns together and apply the TO_DATE function with the appropriate format mask. With this method you need to take care of leading zeros for the hours and minutes.
This method would result in something like the following:

to_date (to_char (e.csv_date, 'DD-MM-YYYY')
         || ' ' || lpad (e.hour, 2, '0')
         || ':' || lpad (e.minute, 2, '0')
        ,'DD-MM-YYYY HH24:MI') as clock_time

As you can imagine, it took a few exceptions before realizing that the hours and minutes should be left-padded with zeros.
There is another, easier, way to transform the three columns into a single DATE column:

e.csv_date + numtodsinterval (e.hour, 'HOUR') + numtodsinterval (e.minute, 'MINUTE') as clock_time

Using the NUMTODSINTERVAL function for the hours and the minutes makes the transformation trivial. No need to worry about leading zeros (or lack thereof).
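To make that concrete, here is a small, self-contained example against DUAL (the values are made up):

select date '2019-03-01'
       + numtodsinterval (15, 'HOUR')
       + numtodsinterval (5, 'MINUTE') as clock_time
  from dual;

This returns 1 March 2019 at 15:05, regardless of whether the hour or minute values happen to be single digits.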

NUMTODSINTERVAL is a great little function that makes code so much more elegant.

11 October 2018

ODC Appreciation Day - Comparing JSON, they thought of everything...

Oracle 12c introduced JSON in the database and one of the most recent additions is a function to compare JSON, JSON_EQUAL.

From the documentation:

Oracle recommends that you use BLOB storage.
When I first learned about this recommendation it surprised me. The contents of a BLOB don't even remotely look like JSON, so how does JSON_EQUAL deal with a comparison between a BLOB and a string?
I mean how does this:
7B2274657374696E67223A22696E666F726D6174696F6E227D
compare to this:
{"testing":"information"}
Let's try and find out.

create table t  
   (json_data blob)
/
insert into t values ( 
   utl_raw.cast_to_raw ( 
      '{"testing":"information"}' 
   ) 
);
        
select case  
       when json_equal (json_data, '{"testing":"information"}') 
       then 'same' else 'different' 
       end as result 
  from t
/

RESULT
---------
same
        
select case  
       when json_equal (json_data, '{"testing":"different"}') 
       then 'same' else 'different' 
       end as result 
  from t
/

RESULT
----------
different
        
How cool is that?

Later I learned from Chris Saxon about the reason behind the BLOB recommendation: it avoids character set conversions!
So, there are no excuses for not using BLOB as your default choice when storing JSON documents.

A big thank you to the team that implemented JSON in the database and for thinking of everything! Thanks ODC!


11 September 2018

JSON, BLOB column and Check Constraint

Last week I attended Neil Chandler's session on JSON during the POUG conference in beautiful Sopot, Poland.
The JSON Developer's Guide recommends using BLOB for storing JSON data, and this is what Neil also recommended.
I was under the (FALSE!) impression that it was not possible to put an IS JSON check constraint on a BLOB column, simply because I tried once and got an exception. After the session I asked Neil if this was the case, but (of course) he said that it was possible.
"Trust, but verify", so here is my test case.

My tests are done on a 12.1.0.2.0 release of the Oracle database, as can be seen in the output below.

SQL> select banner
  2    from v$version
  3  /

BANNER
--------------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
PL/SQL Release 12.1.0.2.0 - Production
CORE    12.1.0.2.0      Production
TNS for Solaris: Version 12.1.0.2.0 - Production
NLSRTL Version 12.1.0.2.0 - Production

First of all create a table with a BLOB column to hold the JSON data.

SQL> create table t
  2  (json_data blob)
  3  /

Table T created.

The easiest way of placing an IS JSON check constraint on the column is to use the following statement:

SQL> alter table t add constraint json_chk check (json_data is json)
  2  /

Error starting at line : 1 in command -
alter table t add constraint json_chk check (json_data is json)
Error report -
ORA-40499: no format defined for binary data type
Because it failed, I was under the assumption that it was not possible to place an IS JSON check constraint on the column.
It turns out that my check constraint definition was not complete (I guess I should have read the exception and the documentation more closely).

When you place the check constraint in the following manner on the column, the IS JSON constraint is added.

SQL> alter table t add constraint json_chk check (json_data is json format json)
  2  /

Table T altered.
In later versions of the database (I've also tried 12.2 and 18c) it is possible to use the former syntax; there is no need to add "FORMAT JSON" to the check constraint definition.

Let's see the BLOB column with the check constraint in action.
First, attempt to insert a valid JSON string into the table:

   SQL> insert into t values ('{"this":"that"}' )
      2  /
    
    Error starting at line : 1 in command -
    insert into t values ('{"this":"that"}' )
    Error report -
    ORA-01465: invalid hex number
Even though the string is valid JSON, it is not a binary representation. For this to work UTL_RAW is required:
      SQL> insert into t values (utl_raw.cast_to_raw('{"this":"that"}') )
      2  /
    
    1 row inserted.
Trying to create an entry in the table which is not JSON will result in a violation of the JSON check constraint.
      SQL> insert into t values (utl_raw.cast_to_raw('just a string') )
      2  /
    
    Error starting at line : 1 in command -
    insert into t values (utl_raw.cast_to_raw('just a string') )
    Error report -
    ORA-02290: check constraint (ALEX.JSON_CHK) violated
When the value that you're trying to insert is not a binary value, it will always fail with "ORA-01465: invalid hex number":
      SQL> insert into t values ('just a string' )
      2  /
    
    Error starting at line : 1 in command -
    insert into t values ('just a string' )
    Error report -
    ORA-01465: invalid hex number

One of the benefits of placing an IS JSON check constraint on the column is that you can use a "dot-notation" to query out the data:

      SQL> select tst.json_data.this
      2    from t tst
      3  /
    
    THIS
    --------------------------------------------------------------------------------
    that
Keep in mind that you must use a table alias for this to work. Omitting the table alias will result in an error.
      SQL> select json_data.this
      2    from t tst
      3  /
    
    Error starting at line : 1 in command -
    select json_data.this
      from t tst
    
    Error at Command Line : 1 Column : 8
    Error report -
    SQL Error: ORA-00904: "JSON_DATA"."THIS": invalid identifier

09 January 2018

JSON_TABLE and the Top 2000

Oracle database 12c provides native JSON parsing. It is relatively easy to go from data in a JSON format to a relational representation.
In this example I will use data from the Top 2000. Every year in The Netherlands a playlist is produced by popular vote and fully broadcast between Christmas and the last day of the year, with the number one being played just before the New Year. You can see the complete list of 2017 on the official NPO website.
The Top 2000 list that I'll use for this example is the one from 2017.

The data from the JSON file looks like this:
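What follows is an illustrative reconstruction with made-up values; the attribute names are the ones described below:

{
  "success": true,
  "message": "",
  "messages": [],
  "data": [
    [
      { "s": "Song title 1", "a": "Artist 1", "yr": 1975, "pos": 1, "prv": 1 },
      { "s": "Song title 2", "a": "Artist 2", "yr": 1971, "pos": 2, "prv": 3 }
    ]
  ]
}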

Only the first part of the JSON file is shown, with the first two songs.
At the highest level there are three name-value pairs ("success", "message" and "messages") and an array named "data".
The "data" array contains another array with JSON objects containing information about the songs.
Each of these JSON objects contains name-value pairs with very short, non-descriptive names, like "s" and "a". You might guess what these names mean.
Even though the "data" attribute is a nested array, at the top level it is only one level deep.
The complete JSON-file can be downloaded by following this link.

First I want to get the JSON file into the database, before extracting the values that I need.
Create the table and a check constraint to make sure that the content is JSON.

create table temp2000 
(complete_list clob);

alter table temp2000
add constraint list_is_json check (complete_list is json)
/
This table will hold the complete JSON file containing the Top2000. The check constraint on the column will verify that only correctly formatted JSON is allowed.
Now add the JSON file to the table; the easiest way to do this is by adding a row using SQL Developer and copy-pasting the complete JSON.

What I want to extract from the complete JSON file is the information about the artist, the song title, the release year, the current position in the list, as well as the position in the list from last year.
My guess would be that "a", "s", "yr", "pos" and "prv" are the attributes that I need.
With a CTAS (Create Table as SELECT) and the JSON_TABLE operator I can transform the JSON to relational representation.

create table top2000
as
select songtitle
      ,artist
      ,release_year
      ,curr_position
      ,prev_position
  from temp2000 t
      ,json_table (t.complete_list format json, '$.data[0][*]'
         columns (
           songtitle     varchar2(150) path '$.s'
          ,artist        varchar2(150) path '$.a'
          ,release_year  number path '$.yr'
          ,curr_position number path '$.pos'
          ,prev_position number path '$.prv'
         )
      )
/
Because the song information is contained in the "data" array, and only in the nested array, I need to address that array as follows:
'$.data[0][*]'
Zero (in square brackets) represents the first element of the "data" array, i.e. the nested array, and the wildcard (the asterisk) addresses all objects in that nested array.

To help with the discovery of the structure (and the paths to the values) of the JSON, Oracle Database 12c Release 2 introduced the JSON_DATAGUIDE function. Unfortunately I don't have Release 2 at my disposal right now, so I leave that for a later time.
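For reference, a minimal sketch of what that could look like on 12.2 or later (untested here, for the reason mentioned above):

select json_dataguide (complete_list) as dataguide
  from temp2000
/

The result is a JSON document describing the paths and the datatypes found in the stored document.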


10 October 2017

ODC Appreciation Day: EMP #ThanksODC

Here is my very short entry for the Oracle Developer Community Appreciation Day 2017.

Very often I use the good ol' EMP table to demonstrate even the latest features of SQL and PL/SQL.
Everybody seems to know the EMP table, and some even know some of the content off the top of their head (yes, I'm guilty of that too). Whenever I need to write a hierarchical query and am not really sure which column goes on which side in the CONNECT BY clause, I will use EMP to test and verify my assumptions. Something I did just this afternoon.
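For example, a quick sanity check along these lines (a minimal sketch; PRIOR on the EMPNO side walks from the manager down to the employees):

select level
      ,lpad (' ', 2 * (level - 1)) || ename as ename
  from emp
 start with mgr is null
connect by prior empno = mgr
/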

I found this old screenshot showing that the EMP table was already around in Oracle Database version 4.1.1. History doesn't tell what ever happened to Carter though...

Update

Apparently I'm not the only one who's a fan of EMP; there is even a store and a museum...

01 June 2017

Wanted New Speakers

Yesterday there was an email coming from the UK Oracle User Group (UKOUG) about their upcoming Tech conference at the end of the year.
There was a call to action to get more new speakers, but with a twist. This email wasn't directed at everybody, but specifically to people who have presented before at a UKOUG event.
Why would you send an email to look for new speakers and only address the people that "have done it before"?
The call to action was basically: "reach out to new speakers by offering to do a joint presentation".

Why and How?

Now where did that idea come from?
A while ago I read a blog post about the APEX Connect 2017 conference; it was posted anonymously and in German (thank you, Google Translate). Some parts of the blog were very positive, but there was also a critical note about the number of new speakers at the conference.
How the agenda for the APEX Connect 2017 conference was put together is up to the conference committee; there is only so much you can do about that - like providing feedback.
But that blog did get me thinking... I know that a lot of user groups really want to have new speakers, and I also know that there are a lot of people out there who have a story to tell.
When I talk to people (at conferences, on the work floor, anywhere) and ask them about doing a presentation, a lot of the same answers can be heard: "What I do is very simple", "I wouldn't know how to write an abstract", "I've got nothing to say", "So-and-so is an expert in that field, I've got nothing to add to it", "Nobody will show up", and so on.
After a brainstorming session with Chris Saxon (while lunching at the RigaDevDays conference) and Jan Karremans (while waiting at the airport), I came up with the idea of a joint presentation: get a new speaker hooked up with a more experienced speaker and let the magic happen.
What will you get out of it as a first-time speaker?

  • You don't have to talk for the whole slot; you only have to do half of the talk...
  • Did you get a question that you don't know the answer to? There is someone else on stage that might know the answer.
  • Get pointers for writing an abstract
  • Get pointers for putting together a slide deck
  • Get pointers for telling the story
  • ...
The more experienced speaker can help with the above, and perhaps more.

Act now

Doing a joint presentation is a lot of work; it takes serious effort to put it all together. When you've done a presentation before you know this, which is why I was pleasantly surprised to see so many speakers reaching out.
Alright, so if you've ever considered doing a (joint) presentation about SQL, PL/SQL and/or APEX - get in touch.