23 February 2010

XMLType, RefCursor and renaming ROW and ROWSET tags

The project that I'm currently involved in uses XML to send data to a reporting engine which parses the XML and creates a pretty PDF report.
In order to do this some "generic" packages were created to create the XML based on the content of a RefCursor. Works like a charm. Just call the function with a RefCursor and get an XML back.
The problem they initially had was that the XML returned uses the ROWSET and ROW tags, while they wanted different tags. No problem, some REPLACE function would do the job, and that worked fine. Until last week that was...


First let me show you how the initial function works and what the XML result was they returned.
SQL> create function rc_to_xml (p_rc in sys_refcursor)
  2     return xmltype
  3  is
  4  begin
  5     return xmltype (p_rc);
  6  end rc_to_xml;
  7  /

Function created.

The function has an input parameter which is a Ref Cursor and returns an XMLType. The XMLType constructor can take a Ref Cursor and transform that into an XMLType, pretty cool functionality. In order to test this functionality, we need two local variables - one for the Ref Cursor and one for the XMLType.
SQL> declare
  2     refC sys_refcursor;
  3     x    xmltype;
  4  begin
  5     open refC for
  6     select ename
  7          , job
  8          , sal
  9       from emp
 10      where rownum <= 2
 11     ;
 12     x := rc_to_xml (refC);
 13     dbms_output.put_line (x.extract ('/*').getClobVal());
 14  end;
 15  /

Using DBMS_OUTPUT to see the actual content of the XMLType, we will get this result:
<ROWSET>
  
    SMITH
    CLERK
    800
  
  
    ALLEN
    SALESMAN
    1600
  
</ROWSET>

A nice XMLType is the result. The only "problem" is the standard tags used by the XMLType. In this project we didn't want to have the standard ROWSET and ROW tags, we wanted to have more meaningful names.
Changing the function to change these tags on the fly would work
SQL> create or replace
  2  function rc_to_xml (p_rc     in sys_refcursor
  3                     ,p_rowset in varchar2
  4                     ,p_row    in varchar2
  5                     )
  6     return xmltype
  7  is
  8     retval xmltype;
  9     c      clob;
 10  begin
 11     retval := xmltype (p_rc);
 12     c := retval.getClobVal();
 13     c := replace (c, '', '<'||p_rowset||'>');
 14     c := replace (c, '', '</'||p_rowset||'>');
 15     c := replace (c, '', '<'||p_row||'>');
 16     c := replace (c, '', '</'||p_row||'>');
 17     retval := xmltype (c);
 18     return retval;
 19  end rc_to_xml;
 20  /

Function created.

Just to make sure it works as expected, a small testcase:
SQL> declare
  2     refC sys_refcursor;
  3     x    xmltype;
  4  begin
  5     open refC for
  6     select ename
  7          , job
  8          , sal
  9       from emp
 10      where rownum <= 2
 11     ;
 12     x := rc_to_xml (refC
 13                    ,'DEPT'
 14                    ,'EMP'
 15                    );
 16     dbms_output.put_line (x.extract ('/*').getClobVal());
 17  end;
 18  /
<DEPT>
  
    SMITH
    CLERK
    800
  
  
    ALLEN
    SALESMAN
    1600
  
</DEPT>

And that did the trick for quite some time (years actually).

This technique was used quite heavily for reporting purposes, it even works fine with nesting one XMLType in another:
SQL> create or replace
  2  function all_emps (p_deptno in dept.deptno%type)
  3     return xmltype
  4  is
  5     rc sys_refcursor;
  6  begin
  7     open rc for select ename
  8                      , job
  9                      , sal
 10                   from emp
 11                  where deptno = p_deptno;
 12     return rc_to_xml (rc
 13                      ,'EMPS'
 14                      ,'EMP'
 15                      );
 16  end all_emps;
 17  /

Function created.


In the below code sample you can see the above function being used in the Ref Cursor on line 8.
SQL> declare
  2     refC sys_refcursor;
  3     x    xmltype;
  4  begin
  5     open refC for
  6     select deptno
  7          , dname
  8          , all_emps (deptno) employees
  9       from dept
 10      where deptno = 10
 11     ;
 12     x := rc_to_xml (refC
 13                    ,'DEPTS'
 14                    ,'DEPT'
 15                    );
 16     dbms_output.put_line (x.extract ('/*').getClobVal());
 17  end;
 18  /
<DEPTS>
  
    10
    ACCOUNTING
    
      
        

    CLARK
          MANAGER
          2450
        
        
          KING
          PRESIDENT
          5000
        
        
          MILLER
          CLERK
          1300
        
      
    
  
</DEPTS>


PL/SQL procedure successfully completed.

Very flexible, used extensively throughout the reporting packages.

Last week problems started to occur with these packages however. XMLTypes were not being created correctly, strange characters started to appear causing the XML to be invalid. But not with all reports, not all the time. It appeared that if the reports became to large, this problem would occur.
After looking for the cause for quite some time, we had this feeling that it had to do with the datatype conversions going on between the XMLType and the Clob. This transformation is done in the REPLACE part of the function where the ROWSET and ROW tags are replaced.
First we started looking into creating a "REPLACE function for CLOB", which is according to the documentation not necessary... what a waste of time...
I couldn't help it, but I was getting the feeling that I wasn't the first one with this "problem". Surely there must be a way to rename the ROWSET and ROW tags when you create an XML based on a Ref Cursor?
Probably I was looking in the wrong direction, instead of treating the XML as a string, I should stay in the "XML Domain".
After more searching, finally I came across this procedure: setRowTag in the dbms_xmlgen package - just what the doctor ordered. All the convoluted code we used earlier was no longer necessary.
create or replace 
function rc_to_xml (p_rc          in sys_refcursor
                   ,p_rowset_name in varchar2
                   ,p_row_name    in varchar2
                   ) 
   return xmltype
is
   retval xmltype;
   ctx dbms_xmlgen.ctxHandle;
begin
   ctx := dbms_xmlgen.newContext (p_rc);
   dbms_xmlgen.setRowTag (ctx, p_row_name);
   dbms_xmlgen.setRowSetTag (ctx, p_rowset_name);
   retval := dbms_xmlgen.getXMLType (ctx);
   dbms_xmlgen.closeContext (ctx);
   return retval;
end rc_to_xml;
/

It's always good to realize that when you are trying to solve a certain problem someone must have come across your problem before and maybe even solved it. Hope my story will save you some time when you are trying to rename the ROWSET and ROW tags.

Oracle Documentation
REPLACE; it also works with CLOB
DBMS_XMLGEN - all subprograms

25 comments:

  1. I really think its time that some guys should do an internal "XML in the Oracle Database" training course...

    :-|

    http://www.amis.nl/xml_met_de_oracle_database.php

    Why don't you guys just ask...(the second question this week I could have answered in a second or two...)

    ReplyDelete
  2. Second comment regarding using extract. Extract will generate an XML fragment. If Oracle can't handle this smart, and that is the case in a lot of situations, then this will be processed the "standard XML parser" way in memory via DOM. This will result in a lot of consumption of PCU cycles and memory. In short, you will get in performance problems...

    In most cases this is easily avoided using different operators or functions. From 10.2.0.3.0 and onwards you should use XQuery based operators. Ergo, in 11.2.0.1 and onwards all Oracle propriety extract, extractvalue, existnode, etc functions are depricated / on maintainance support only (aka not optimizations will be done anymore on those methods).

    ReplyDelete
  3. Marco, thank you for your comments. Of course you're right regarding your comments on using the "extract" function. But that really wasn't the point of this post. Using the extract was just here to show the content of the XML, it is not used in the actual production code. Is there a "new and better" way to get XML pretty printed, besides this "extract" trick?

    ReplyDelete
  4. xmlserialize..., besides that the "extract" trick was a bug... :-)

    ReplyDelete
  5. @Marco Does that mean that
    select xmlquery( '/*' passing xmltype( cursor( select * from dual ) ) returning content ) from dual is a better/faster way to do an extract than
    select extract( xmltype( cursor( select * from dual ) ), '/*' ) from dual

    or
    select xmltype( cursor( select * from dual ) ).extract( '/*' ) from dual

    ReplyDelete
  6. It depends on your database version and your storage design(+index strategy and/or xml schema dependant). If you are working XMLType CLOB based (aka in this case using "dual") in 10.1 xmltype.extract will be slower than extract(xmltype) due to performance bugs and optimizations. In 10.2.0.3 xmlquery will be faster than an xmlquery in 10.2.0.1 due to in 10.2.0.1 the xquery parser is still java based and in 10.2.0.3 and onwards its oracle kernel buildin.

    Propriety XSQL as in extract, extractvalue, etc got less attention in 11.1 and onwards than xquery based XML standards, therefore chances are that, especially in 11.2 and onwards, xquery based methods and functions will outperform the in 11.2 even depricated XSQL methods and functions.

    Hints, tips and lessons learned can be found in http://www.oracle.com/technology/tech/xml/xmldb/Current/xmlqueryoptimize11gr2.pdf (which is 11.2 based in essence).

    Have a look at the whitepapers mentioned in http://www.liberidu.com/blog/?p=2140 to get a "quick" overview, although you need a bit more then basic knowledge to understand the how and why...

    ReplyDelete
  7. Hi,

    Sorry to drag this topic from oblivion....
    The link to xmlqueryoptimize11gr2.pdf is dead, is this one the right one ?
    http://www.oracle.com/technetwork/database/features/xmldb/xmlqueryoptimize11gr2-168036.pdf .

    Second question (more important for me) i am using owb mainly here(11gr2) and want to export relational tables to an xml txtfile.

    the owbblog talks about using XMLForest etc in conjunction with Table Functions.
    Are these still valid functions to use or should i use something different ?
    See here for the owbblog:
    http://blogs.oracle.com/warehousebuilder/2007/09/leveraging_xdb.html

    ReplyDelete
  8. @Michael, the "new and improved" way is to use the XMLTABLE function.
    http://download.oracle.com/docs/cd/E11882_01/server.112/e17118/functions253.htm#SQLRF06232

    ... at least as far as I know, I will check with Marco. :)

    ReplyDelete
  9. -/fun/-

    I feel a bit like the "XML Expert" that is asked by Tom (better known as Alex N.) every time on asktom.oracle.com a question is raised regarding an XML topic ;-)

    @Michael The XMLDB team revises/improves the content more often on http://www.oracle.com/technology/tech/xml/xmldb/index.html then it adds new stuff, so sorry for another broken link.

    Be aware, the following is advised by the XMLDB team to use if dealing with XML(DB) functionality. To generate XML, from 11g onwards, use functions like XMLFOREST, XMLAGG, XMLELEMENT, XMLATTRIBUTES, etc.

    To consume XML, from Oracle 11g and onwards, the XQuery (XPath V2) supporting XML functions should be used, that is XMLTABLE, XMLQUERY, XMLEXISTS, XMLCAST, etc.

    I see that the OWB team uses still the in Oracle XMLDB 11.2. deprecated EXTRACT, EXTRACTVALUE, etc XML/SQL functions and operators. The problem for us as customers is that if we get into performance problems or bugs, the XMLDB team/support will advice us to use the XQuery functions. See also the deprecated section in the Oracle 11.2 XMLDB Developers Guide for extra information

    ReplyDelete
  10. Thanks Alex,

    Well my question was more focussed on using xmlforest etc. to create xml and using Table Functions to output this as a (xml)text file.

    Stuff like xmlquery etc. deals with querying xml already present in the database .

    So i now know that xmlforest etc. is still valid. The only thing i need to know now is how to ouput a xmlforest thingie as text.
    The owb blog mentions table functions in conjunction with some package, which i can not access yet to inpsect the code.

    Any insights on this ?

    Another question: from my earlier quest tru xml-in-oracle-db land i learned that getting really,really big xml docs in a database is very hard since most xml functions in the database will operate with a dom structure in memory.
    And that explodes with xml-files of 8Gb ....
    Am i right in assuming that xmlforest etc. functions suffer from the same DOM problem ?

    ReplyDelete
  11. @Marco: Thank you, my local XML-guru :)
    @Michael: To create an text-thingie using XMLFOREST, you can do something like this:
    SQL> select xmlelement
    2 ("Emps"
    3 ,xmlagg
    4 (xmlelement
    5 ("Emp"
    6 ,xmlforest
    7 (e.empno
    8 ,e.ename
    9 ,e.job
    10 )
    11 )
    12 )
    13 ).getClobVal()
    14 from emp e
    15 /

    XMLELEMENT("EMPS",XMLAGG(XMLELEMENT("EMP",XMLFOREST(E.EMPNO,E.ENAME,E.JOB)))).GE
    --------------------------------------------------------------------------------
    7369SMITHCLERK<EM

    ReplyDelete
  12. You can use XQuery for returning XML as well by using the FLWR Return part for example (return $i)

    All XML parsers known have issues with handling 8 Gb XML in memory. The issue can be circumvented by doing some of the mapping and re-writes in the database via temp physical storage (OR or Binary XML)

    I Oracle can not re-write or optimize the XML handling it will fallback on the standard way of doing it, that is handling it in memory via a standard based XML parser. Using the old syntax will force this to be done in the old xml parser engine, using the XQuery based operators and functions will force this to be done via the XQuery based XML parser

    See slide 9 of
    http://www.slideshare.net/MGralike/miracle-open-world-2011-xml-index-strategies

    or

    http://www.vldb.org/pvldb/1/1454177.pdf

    for more info see also

    http://www.slideshare.net/MGralike/opp2010-brussels-programming-with-xml-in-plsql-part-1

    ReplyDelete
  13. > Am i right in assuming that xmlforest etc. functions suffer from the same DOM problem ?

    XMLFOREST is mainly used for/on relational objects like the columns of a table, so it will use the standard methods to encapsulate the relational data in XML format. In the explain plan output you will see access paths via index or others if those indexing, for example, is available. Part of the reconstruct towards XML could, but shouldn't, be parsed by the XML engine(s).

    @Alex. getCLOBVal is one of those deprecated operators. You should use XMLSERIALIZE (default CLOB output) as an alternative or for example XMLCAST. These are optimized and supported in 11g and onwards releases.

    ReplyDelete
  14. I posted a big answer here and blogger killed it...

    ReplyDelete
  15. ...
    perhaps i wasn't clear, i meant the streaming of xmlforest to disk.
    I have seen examples of using utl_file.
    But that seems unreliable because of the 32k limit.

    I have come across DBMS_XMLDOM and DBMS_XSLTPRocessor.
    (yes marco thats your post :-) )
    It seems that DBMS_XMLDOM creates an in memory document.
    Do you know how DBMS_XSLPROCESSOR handles this ?

    It seems there is no good way in Oracle to do large loading/unloading of xml data.
    It needs something that can use a window when parsing xml data.

    Btw i ended up using a xslt 3.0 parser that can do streaming when parsing xml.
    I used it to parse the 8Gb xml file into a csv file in 5 minutes.

    ReplyDelete
  16. Not sure, regarding DBMS_XSLPROCESSOR, if I am not mistaken then the Sr product manager of XMLDB said that it used the same source path, so in that case using DOM as well.

    That said, do you need to parse the XML then? You know its valid XML so...

    In that case you don't need a XML parsing method. Things that pop into mind are using UTL_FILE with the 'wb' option (write as blob/raw) which hasn't issues with the 32k limit (I know currently using code from out colleague Anton), but probably this is to slow for a 8 GB file (I am guessing here).

    Another method would be to use the DATAPUMP method while using external tables (never used it though / does one end up with a dumpfile, aka not useful).

    ReplyDelete
  17. I needed the following to create a CSV file format while dealing with a table with 910+ columns...

    It was initially based on http://forums.oracle.com/forums/thread.jspa?messageID=6497046&#6497123 , but I ran very quickly in the issue with the 32 KB problem

    Posted the code here:
    http://forums.oracle.com/forums/thread.jspa?messageID=9537630#9537630

    It didn't fit on this comment section (max allowed chars 4096)

    ReplyDelete
  18. Hi Marco,

    I glanced at the code and saw that it uses utl_raw.cast_to_raw.
    Doesn't that expect a Varchar2 and as such is limited to 4000chars ?
    or am i missing something here ?

    ReplyDelete
  19. it uses utl_file in 'wb' (binary) mode. everything is casted to raw.

    ReplyDelete
  20. Marco, your missing my point (or i am missing yours..).
    I understand it uses utl_raw. What i meant is that the method cast_to_raw expects a Varchar2.

    See here: http://download.oracle.com/docs/cd/E11882_01/appdev.112/e16760/u_raw.htm#i997085

    In fact, the following barfs because the loop is one char bigger than the max of 32767:

    declare
    v_finaltxt CLOB;
    v_filehandle UTL_FILE.FILE_TYPE;

    begin

    v_filehandle := UTL_FILE.FOPEN('A_ORACLE_LOC','test_raw','wb',32767);
    FOR i IN 1..32768
    LOOP

    v_finaltxt := v_finaltxt || 'a';
    END LOOP ;

    UTL_FILE.PUT_RAW( v_filehandle, utl_raw.cast_to_raw( v_finaltxt));
    UTL_FILE.FCLOSE(v_filehandle);

    end;

    ReplyDelete
  21. If I have a look at Anton's code, I am not that smart, I got help, then he chops the strings in pieces via

    utl_raw.cast_to_raw( SUBSTR( v_finaltxt, ( i - 1 ) * v_chunksize + 1, v_chunksize ) ), TRUE )

    Although not fully tested, I am guessing it will not surpass the 32767 limit.

    M.

    ReplyDelete
  22. "Although not fully tested"

    Actually that is not true, this is the improved code from Anton after I had tested it and saw that it flushed my table content as 1 big line and produced a single line CSV file of 550KB in size. Anton only added the "extra linefeed" into it.

    ReplyDelete
  23. Ahhh... i just sort-of-skipped that loop bit and got stuck at the cast statement.
    Mea Culpa ..

    Thx guys

    ReplyDelete
  24. Can I convert a xmltype to ref cursor?

    ReplyDelete
    Replies
    1. Well, you could do this.. but this is probably not what you mean...

      declare
      x xmltype;
      l_xml xmltype;
      rc sys_refcursor;
      begin
      x := xmltype ('that');
      open rc for select x from dual;
      fetch rc into l_xml;
      close rc;
      end;

      Delete