Thursday, January 10, 2013

CSV Data: Using a SQL query to dump data directly to a CSV file.

Why do so many people want to dump data into Excel?

Common tools like TOAD or SQL Developer have an option to dump their query results directly into Excel or a CSV file (or various other formats). That works well for relatively small data sets, but when it gets to hundreds of thousands or millions of rows, people regularly have problems making it work. The following is a solution that I've given out and relied on in these situations.

Basically, we kick off a script via SQL*Plus that runs the query and spools the output directly to a CSV file. The script below is an example of how to make this work. Once you have a query that returns the data you want, paste it in as the source query at the indicated place in the script.

Notice that this script will spool or write the results into a file named "Unloaded-Data.csv". Change this to any desired file name.

You have the option of writing a "Headings" row at the top of the file -- edit that part to include all the column names, and make sure you enclose each one in both single and double quotes as shown. If you don't want headings, that piece can be deleted.

In your source query it's a good idea to explicitly format dates with the "TO_CHAR( )" function. Use a format that Excel will readily recognize (the one I prefer is 'YYYY-MM-DD HH24:MI:SS', even though different ones are used in the example below). Also, for testing you may wish to include a row limiter in your where clause, such as "and rownum < 50" -- that way you can test the CSV creation process without waiting for millions of rows to be written to the file.
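
For example, a quick sanity-check query along those lines might look like this (it just borrows the patient table from the example script below; the "rownum < 50" predicate is only there to cap the output while testing):

   select  to_char( p.birth_dt, 'YYYY-MM-DD HH24:MI:SS' )  as birthday_str
     from  train.xyz_patient  p
    where  rownum < 50 ;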

As you can see at the end of the script, the final select produces a CSV record by concatenating double quotes around, and commas between, the column values of your query. Edit this to include all the columns, in the order you prefer, from the source query that you just pasted in.

prompt ********** UnLoading Data **********

WHENEVER SQLERROR EXIT SQL.SQLCODE

-------------------------------------------------------------------------------------
--  pagesize     0  =  Turns off the default of splitting the results into sections.
--  heading    off  =  Turns off column headings.
--  feedback   off  =  Turns off any feedback. ( ie: "20 rows selected." )
--  trimspool   on  =  Trims off extra spaces at the end of each line.
--  linesize 32767  =  Makes sure that a line (row of data) can be really long.
--  termout    off  =  Do not show the data in the SQL-Plus window (much faster).
--  verify     off  =  Do not show the substitution variable replacement values.
-------------------------------------------------------------------------------------
   set pagesize        0
   set heading       off
   set feedback      off
   set trimspool      on
   set linesize    32767
   set termout       off
   set verify        off



--------------------------------------------------
-- Unload the data to the following CSV file.
--------------------------------------------------
   spool  Unloaded-Data.csv

--------------------------------------------------
-- Write a column "Headers" row to the CSV file.
--------------------------------------------------
   
   ----------------------------------------------------------------------------
   -- List all the column headers that you wish to put in the CSV file. 
   -- They do NOT need to be the same as the column values from the query. 
   -- Make sure each one is enclosed by single and double quotes.
   ----------------------------------------------------------------------------
   select   '"FULL_NAME_"'    
   ||','||  '"PROCEDURE_DATE"'
   ||','||  '"PROCEDURE_ID"'  
   ||','||  '"ENCOUNTER_ID"'  
   ||','||  '"PROCEDURE_DSC"' 
   ||','||  '"ROOM_ID"'       
   ||','||  '"HEIGHT"'        
   ||','||  '"CHARGE_AMT"'    
   ||','||  '"BIRTHDAY"'      
   ||','||  '"BLOOD_TYPE"'    
   ||','||  '"PHONE"'         
   ||','||  '"INSURANCE_ID"'  
   -------
     as  optional_header_row
   from  dual ;

--------------------------------------------------
-- The source Query in a WITH clause and
-- one CSV formatted column of all the values.
--------------------------------------------------
   with  source_query  as
   ( ----------------------------------------------------------------------------
     -- Source Query goes here. 
     -- This selects and formats the data as you want in the CSV file.
     ----------------------------------------------------------------------------
     select  initcap(p.last_nm)||', '||initcap(p.first_nm)  as full_name_str
          ,  to_char(c.procedure_dts,'YYYY-MM-DD HH:MI am') as procedure_date_str
          ,  c.procedure_id                                 as procedure_id_no
          ,  c.encounter_id                                 as encounter_id_no
          ,  c.procedure_dsc                                as procedure_dsc_str      
          ,  e.room_id                                      as room_id_str
          ,  e.pt_height_inches_val                         as height_no
          ,  e.total_charge_amt                             as charge_amt_no
          ,  to_char(p.birth_dt,'MM/DD/YYYY')               as birthday_str
          ,  p.blood_type_cd                                as blood_type_str
          ,  p.phone_txt                                    as phone_str
          ,  p.insurance_id                                 as insurance_id_no 
       from  train.xyz_procedure  c
             left join
             train.xyz_encounter  e  on ( c.encounter_id = e.encounter_id )
             left join
             train.xyz_patient    p  on ( e.patient_id = p.patient_id )
      where  nvl( e.total_charge_amt, 0 ) < 1000
      order  by  p.last_nm, p.first_nm, c.procedure_dts
     ----------------------------------------------------------------------------
       )
   ----------------------------------------------------------------
   -- List all the columns that you want to include. It encloses 
   -- each value in double quotes and separates them with commas.
   ----------------------------------------------------------------
   select '"'||    full_name_str             ||'"'
   ||','||'"'||    procedure_date_str        ||'"'
   ||','||'"'||    procedure_id_no           ||'"'
   ||','||'"'||    encounter_id_no           ||'"'
   ||','||'"'||    procedure_dsc_str         ||'"'
   ||','||'"'||    room_id_str               ||'"'
   ||','||'"'||    height_no                 ||'"'
   ||','||'"'||    charge_amt_no             ||'"'
   ||','||'"'||    birthday_str              ||'"'
   ||','||'"'||    blood_type_str            ||'"'
   ||','||'"'||    phone_str                 ||'"'
   ||','||'"'||    insurance_id_no           ||'"'  
   ------------ 
     as  csv_data_row
   from  source_query ;



--------------------------------------------------
-- Close the CSV file and terminate the script.
--------------------------------------------------
spool off

set termout on
prompt *********** Finished CSV ***********

exit


Now all you need to do is run the script. One easy way to do this is to create a folder and put two scripts in it. For this example, place the previous script in the folder and name it "CSVunload.sql". Create another script called "Go.bat" and put the following contents into it. The one important line of this script is the one that starts with "sqlplus". Edit this line and put in your appropriate 'username' and 'dbsid' (database identifier). Make sure the script name after the second "@" matches your previous SQL script name. Now, just double-click on the "Go.bat" file and enter your password (it pauses to ask you for that).

@echo off
echo =============================
echo ==  Unloading Data to CSV  ==
echo =============================
echo.

sqlplus -s   "username @ dbsid"   @CSVunload.sql

SET myerror=%errorlevel%
IF /I '%myerror%'=='0' GOTO TagOK
echo.
echo *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
echo *=*  ERROR !  The Dump to CSV Failed !  *=*
echo *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
echo.
GOTO TagAllDone
:TagOK
echo.
echo Dump to CSV completed OK.
echo.
:TagAllDone

pause


Remember that millions of rows may take a while to download and write to the file. When it finishes, it should say that it completed OK. Excel will open a CSV file by default (it will look just like a normal Excel file) -- then, if you desire, you can 'Save As' an Excel file type, even though I like CSV files much better.

-----

Here is one more example that will stop and prompt the user for three parameters to use in the query. It asks for an owner search string, a date, and a number. The query then shows you how to use those three parameters to limit the result set. Note that when you prompt for a parameter such as "s_anlz_dt", you reference it in the query with two ampersands and a trailing period: "&&s_anlz_dt."

These substitution variables work by replacing the "&&s_anlz_dt." in the query with the exact text the variable contains, before the query is submitted to the Oracle engine.
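
For instance, if someone typed 2012-06-01 at the date prompt (a made-up value, just for illustration), this line from the script below:

   and  last_analyzed > to_date('&&s_anlz_dt.', 'YYYY-MM-DD')

would be submitted to Oracle as:

   and  last_analyzed > to_date('2012-06-01', 'YYYY-MM-DD')

The trailing period simply marks the end of the variable name so that SQL*Plus doesn't try to absorb the closing quote into it.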

This script is provided without all the comments. Also note that the main query doesn't use a "WITH" clause this time -- the source query is just an inline view in the FROM clause -- either way works fine.

prompt
accept  s_own_srch   prompt 'Enter which owners (search string): '

prompt
accept  s_anlz_dt    prompt 'Enter begin date for Last-Analyzed YYYY-MM-DD: '

prompt 
accept  s_row_ct     prompt 'Enter the MIN row count: '

prompt
prompt  ********** UnLoading Data **********
prompt

WHENEVER SQLERROR EXIT SQL.SQLCODE

   set pagesize        0
   set heading       off
   set feedback      off
   set trimspool      on
   set linesize    32767
   set termout       off
   set verify        off
   spool  Unloaded-Data.csv


   select   '"owner"'
   ||','||  '"table_name"'
   ||','||  '"tablespace_name"'
   ||','||  '"last_analyzed"'
   ||','||  '"num_rows"'
   -------
     as  optional_header_row
   from  dual ;


   select '"'||    owner             ||'"'
   ||','||'"'||    table_name        ||'"'
   ||','||'"'||    tablespace_name   ||'"'
   ||','||'"'||    last_analyzed     ||'"'
   ||','||'"'||    num_rows          ||'"'
   ------------ 
     as  csv_data_row
   from  ( -----------------------------------------
           -- Source Query Here
           -----------------------------------------
           select  owner
                ,  table_name
                ,  tablespace_name
                ,  to_char
                     ( last_analyzed
                     , 'YYYY-MM-DD HH24:MI:SS'
                     )  as last_analyzed
                ,  num_rows
             from  all_tables
            where  owner like upper('%'|| '&&s_own_srch.' ||'%')
              and  last_analyzed > to_date('&&s_anlz_dt.', 'YYYY-MM-DD')
              and  num_rows      > &&s_row_ct.
            order  by  owner
                    ,  table_name   
           -----------------------------------------
         )
   ;
   

spool off

set termout on

prompt *********** Finished CSV ***********

exit


Enjoy your day(ta)!
