Extract Very Long String or CLOB from JSON [pre 12.2]

December 8, 2017

in Datatypes, JSON, PL/SQL

"Wheels within wheels", as Monty Bodkin would say. Extracting a very long string or CLOB from a JSON CLOB (very long => larger than max_string_size of 32767), in a pre-12.2 Oracle database, turned out to be more complex than it appeared at first sight.

This case study shows how to extract CLOB data from JSON CLOB, in an Oracle database with no/limited JSON parsing features.

CLOB within CLOB: What’s the problem?

Here’s the requirement and the challenge in implementing it. A big JSON CLOB input goes into function extract_payload, along with the key name for which to extract the payload. In this example, the key name is "payload" and it carries text data which can be very large. The payload value needs to be extracted and returned as a CLOB.

The PL/SQL function’s signature:

CREATE OR REPLACE FUNCTION extract_payload
                    (p_json_clob    IN CLOB
                   , p_json_keyname IN VARCHAR2) 
RETURN CLOB

Sample JSON CLOB input:

[
    {
        "id": "1",
        "payloadType": "csv",
        "payload": "Very big payload 100K bytes"
    }
]

The challenges:

  • The database is Oracle version 12.1, so the newer features for extracting a CLOB with json_object_t.get_clob (as done here) are not available.
  • The environment is controlled, so the developer is not permitted to install third-party utilities such as PL/JSON, where json.to_clob (as done here) could have been used.
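For contrast, on 12.2 and later the extraction is a few lines with the PL/SQL JSON object types. A sketch of what that would look like, with p_json_clob standing in for the JSON input (not usable on 12.1):

```sql
-- 12.2+ only: json_object_t.get_clob does all the work
DECLARE
  l_arr      json_array_t;
  l_payload  CLOB;
BEGIN
  l_arr     := json_array_t.parse(p_json_clob);
  -- First array element is the object holding the "payload" key
  l_payload := treat(l_arr.get(0) AS json_object_t).get_clob('payload');
END;
```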

You might ask: Don’t the other Oracle 12c features for querying JSON data (SQL/JSON, dot notation) work?

Good question. We tried those: with CLOB data, we got either errors or no value returned.

JSON_QUERY and JSON_VALUE with RETURNING CLOB do not compile successfully. Both lead to:

ORA-40444: JSON processing error
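For reference, this is roughly the shape of the query that fails; a hypothetical staging table t with a CLOB column doc is assumed:

```sql
-- Fails on 12.1: RETURNING CLOB is not supported there
SELECT JSON_VALUE(t.doc, '$[0].payload' RETURNING CLOB) AS payload
FROM   t;
```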

With JSON_TABLE, the column datatype mapping does not support CLOB. Compilation goes through, but the code conks off at runtime with:

ORA-01460: unimplemented or unreasonable conversion requested
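The JSON_TABLE attempt looked roughly like this (again assuming a hypothetical staging table t with a CLOB column doc):

```sql
-- Compiles, but raises ORA-01460 at runtime on 12.1
SELECT jt.payload
FROM   t
     , JSON_TABLE(t.doc, '$[0]'
         COLUMNS (payload CLOB PATH '$.payload')) jt;
```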

As a last-ditch effort, we saved the "payload" data to a temporary database table in a CLOB column, and then tried selecting via dot notation into a CLOB variable.

This gave neither a compilation nor a runtime error, but nothing got selected for very large CLOBs.
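Roughly what that staging attempt looked like (table and column names are illustrative):

```sql
-- The IS JSON check constraint enables dot-notation queries
CREATE TABLE json_stage
  (doc CLOB CONSTRAINT doc_is_json CHECK (doc IS JSON));

INSERT INTO json_stage VALUES (:big_json_clob);

-- Compiles and runs, but returns nothing once the payload
-- exceeds the maximum string size
SELECT j.doc.payload FROM json_stage j;
```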

What’s the solution then? Here’s what worked for us. [Know of alternatives? Do share them.]

DBMS_LOB Solution: Instr to Find JSON Key Value End Points, WriteAppend to CLOB Output

Approach:

  1. Use dbms_lob.instr to find the start and end of the payload, based on the input JSON key name and its delimiters.
  2. Read the JSON input chunk by chunk, each chunk just under the maximum string length, into a string buffer.
  3. Use dbms_lob.writeappend to iteratively append the last-read string buffer to a CLOB.
  4. After the last chunk has been read and appended [detected by comparing against the end delimiter position of the payload], return the CLOB.

extract_payload function:

-- Function that takes as input (1) a JSON CLOB
-- (2) the JSON key name for which to extract the value
-- Returns the value for the input key name as a CLOB
-- Can extract a very long string/CLOB
CREATE OR REPLACE FUNCTION extract_payload
                    (p_json_clob    IN CLOB
                   , p_json_keyname IN VARCHAR2) 
RETURN CLOB
IS
  l_payload        CLOB;
  l_payload_buf    VARCHAR2(32767);
  l_payload_init   PLS_INTEGER;
  l_payload_end    PLS_INTEGER;
  l_json_keyname   VARCHAR2(100) := '"'||p_json_keyname||'":';
  n_chunksize      PLS_INTEGER   := 32766;
  n_count          PLS_INTEGER   := 0;
BEGIN
  -- Create a temporary LOB to hold the payload
  dbms_lob.createtemporary(l_payload, FALSE);

  -- Position of the opening quote of the value ('": "' follows the
  -- key name here); this must lie within the first chunk being read,
  -- i.e. l_payload_init < n_chunksize
  l_payload_init := dbms_lob.instr(p_json_clob, l_json_keyname)
                  + length(l_json_keyname) + 1;

  -- Position of the last character of the value: the next double
  -- quote ends it (assumes the value contains no embedded quotes)
  l_payload_end := dbms_lob.instr(p_json_clob, '"', l_payload_init + 1) - 1;

  -- Extract the value chunk by chunk and append to the l_payload CLOB
  WHILE TRUE
  LOOP
    -- Does at least one full chunk of the value remain unread?
    IF (l_payload_end - (l_payload_init + (n_chunksize * n_count)) >= n_chunksize) THEN
      -- Extract the next chunk of size n_chunksize
      l_payload_buf := 
        dbms_lob.substr(
          p_json_clob
        , n_chunksize
        , l_payload_init + (n_chunksize * n_count) + 1
                       );
      dbms_lob.writeappend(l_payload, n_chunksize, l_payload_buf);  
      n_count := n_count + 1;
     
    ELSE
      -- Extract the final, shorter chunk
      l_payload_buf := 
         dbms_lob.substr( 
            p_json_clob
          , l_payload_end - (l_payload_init + (n_chunksize * n_count))
          , l_payload_init + (n_chunksize * n_count) + 1
                        );      
      -- Guard against an empty final chunk (value length an exact
      -- multiple of n_chunksize): substr then returns NULL
      IF l_payload_buf IS NOT NULL THEN
        dbms_lob.writeappend(l_payload, length(l_payload_buf), l_payload_buf);
      END IF;
      RETURN l_payload;
    END IF;
  END LOOP;
END extract_payload;

When compiled:


Function created.

Note: JSON key names in the example given are enclosed in double quotes and followed immediately by a colon and a blank space. There may be variations in your use case, for which you may need to adjust the delimiter position derivation logic.
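For instance, if the key name is followed by a colon with no blank before the value's opening quote ("payload":"…"), the starting position moves one character left. A hedged adjustment to the function:

```sql
-- For "key":"value" (no space after the colon), drop the trailing + 1
l_payload_init := dbms_lob.instr(p_json_clob, l_json_keyname)
                + length(l_json_keyname);
```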

Testing the Solution: Extract CLOB from CLOB

Case 1: Small payload < 100 characters

SQL> -- Anonymous block to call extract payload
SQL> -- Testing with a small payload < 100 characters
SQL> DECLARE
  2    l_json_clob  CLOB := '[
  3         {
  4             "id": "1",
  5             "payloadType": "csv",
  6             "payload": "Small payload < 100 characters"
  7         }
  8     ]';
  9   l_output_clob  CLOB;
 10  BEGIN
 11    l_output_clob := extract_payload(l_json_clob, 'payload');
 12
 13    insert into bigfiles values (l_output_clob);
 14    commit;
 15  END;
 16  /

PL/SQL procedure successfully completed.

SQL> select substr(data,1,30) data_fragment
  2       , dbms_lob.getlength(data) data_length
  3  from bigfiles;

DATA_FRAGMENT                  DATA_LENGTH
------------------------------ -----------
Small payload < 100 characters          30

Case 2: Big payload > 32767 characters

SQL> -- Anonymous block to call extract payload
SQL> -- Testing with a big payload > 32767 characters
SQL> DECLARE
  2   l_json_clob    CLOB := empty_clob();
  3   l_output_clob  CLOB;
  4  BEGIN
  5    dbms_lob.createtemporary(l_json_clob, true);
  6    l_json_clob := '[{
  7           "id": "1",
  8           "payloadType": "csv",
  9           "payload": "';
 10
 11    for i in 1 .. 5 loop
 12      l_json_clob := l_json_clob || lpad('x', 32767, 'x');
 13    end loop;
 14    l_json_clob := l_json_clob || '"}]';
 15
 16    l_output_clob := extract_payload(l_json_clob, 'payload');
 17    dbms_lob.freetemporary(l_json_clob);
 18
 19    insert into bigfiles values (l_output_clob);
 20    commit;
 21  END;
 22  /

SQL> select substr(data,1,30) data_fragment
  2       , dbms_lob.getlength(data) data_length
  3  from bigfiles;

DATA_FRAGMENT                  DATA_LENGTH
------------------------------ -----------
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx      163835
Small payload < 100 characters          30

Summary

This article demonstrates an approach for extracting a very long string or CLOB from JSON, in a pre-12.2 Oracle database with no/limited support for JSON features.

For older database versions with a smaller maximum string length, reduce the value of n_chunksize in function extract_payload (say, set it to below 4000 instead of below 32767) and you are good to go.
