Here's a small set of CSV data -- intentionally made just a little bit messy. Note that some of the fields are enclosed by quote marks and others are not; if a string has a comma within it (such as some of the names below) then it MUST be enclosed by quotes. Notice that the first row of data is actually the list of column names. And, take note of the date format.
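The quoting rule can be illustrated with Python's standard csv module (Python here is just for illustration; the actual load uses SQL-Loader). A name containing a comma parses as one field only when it is quoted:

```python
import csv
import io

# A value containing a comma must be quoted, or it splits into extra fields.
quoted   = '2249,"Nelson, Ned",1997-06-05'
unquoted = '2249,Nelson, Ned,1997-06-05'

row_q = next(csv.reader(io.StringIO(quoted)))
row_u = next(csv.reader(io.StringIO(unquoted)))

print(row_q)  # ['2249', 'Nelson, Ned', '1997-06-05']   -- 3 fields
print(row_u)  # ['2249', 'Nelson', ' Ned', '1997-06-05'] -- 4 fields
```

The unquoted version yields four fields instead of three, which is exactly the kind of record that would mis-load or error off.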
If you want to run this demo yourself, copy the data and other files below into one single folder on your machine as indicated. You can supply directory paths along with the file names, but putting everything in one folder eliminates the need for that and keeps things a little simpler. This demo was executed on a Windows machine, but it works much the same on a Unix machine (of course, you would use a shell script instead of a batch file).
Create this file as "SourceData.csv":
ID_VAL, JUNK, NUM_VAL, PERSON_NM, THE_DATE
8 ,yy, 37.251 , "Joe Jones" , "2009-08-18 13:27:33"
"" , zzz , 6.471 , Sam Smith , "2010-12-30 16:13:39"
, , , ,
10 ,1, 0 , ,
19 ,2, "0.25" , Bob Benson , "9999-99-99 99:99:99"
"47" , 3 , 13.23 , "," , 2007-05-02 22:58:02
106 , , 44.732 , "Liz Loyd" , ""
2249 ,"", "" , "Nelson, Ned" , 1997-06-05 04:42:26
"32154", ,0.001,"fizwig, Ray, M","1913-02-04 18:01:00"
"",abc,"42.1234",Sam b smiTh,"2010-12-30 16:13:39"
"12",~,2.4,",",2007-05-02 22:58:02
,,,,
98, ",",".9732","jAmes, jAn",""
We will need a table in which to load the data. Note that the column names do not need to match the names given in the CSV file. Execute this command in a SQL session -- you don't need to put this in a file on your machine.
Create table scott.loader_demo
( id_num     integer
, some_dts   date
, name_txt   varchar2(20)
, extra_col  varchar2(5)
, val_num    number
) ;
Next we will need a control file -- this is the heart of using "SQL-Loader". It defines where the data is coming from and where it is going; it contains all the details of how the load should proceed. In this example we see how to do a basic load of simple data-types. The following control file contains lots of comments -- all these, of course, can be stripped out.
Create this file as "ControlFile.ctl":
------------------------------------------------------------
-- SQL-Loader Control File
------------------------------------------------------------
OPTIONS
( skip=1 -- Note {op1}
, errors=10 -- Note {op2}
-- , load=100 -- Note {op3}
, direct=true -- Note {op4}
, rows=10000 -- Note {op5}
-- , bindsize=67108864 -- Note {op6}
-- , readsize=67108864 -- Note {op7}
)
------------------------------------------------------------
UNRECOVERABLE -- Note {L1}
LOAD DATA -- Note {L2}
INFILE 'SourceData.csv' -- Note {L3}
BADFILE 'LoadResults_BAD.log' -- Note {L4}
DISCARDFILE 'LoadResults_DISCARDED.log' -- Note {L5}
APPEND -- Note {L6}
INTO TABLE scott.loader_demo -- Note {L7}
-- WHEN id_num <> '' -- Note {L8}
-------------------------------------------------------
Fields Terminated by "," -- Note {x1}
Optionally Enclosed by '"' -- Note {x2}
Trailing Nullcols -- Note {x3}
------------------------------------------------------------
-- Note {c1}
( id_num
, junk_01 FILLER -- Note {c2}
, val_num
, name_txt "initcap(trim(:name_txt))" -- Note {c3}
, some_dts DATE "YYYY-MM-DD HH24:MI:SS" -- Note {c4}
)
------------------------------------------------------------
-- {op1}: Skip the first <n> header rows (column names).
-- {op2}: Allow <n> errors before aborting the load.
-- {op3}: Load only this many rows, then stop.
-- {op4}: Direct=true can be much faster; data-blocks
-- are directly built onto the table.
-- Direct=false does conventional inserts;
-- Indexes updated as each row is inserted.
-- Indexes and Constraints affect the
-- performance of either load type.
-- {op5}: Load <n> rows between each commit.
-- {op6}: For conventional loads, setting this to a
-- large value can increase performance.
-- {op7}: Works in conjunction with 'bindsize'.
---------
-- {L1}:  Can speed up the load. If the database
--        fails during the load, loaded data may be lost.
-- {L2}: Section to specify files and tables
-- {L3}: The name of the CSV source data file.
-- {L4}: Error Records will be written to this file.
-- {L5}: Valid rows not loaded, written to this file.
-- (All-NULLs or rows filtered out by a WHEN)
-- {L6}: Table load method, choose one of the following:
-- INSERT, APPEND, REPLACE, or TRUNCATE.
-- {L7}: The name of the destination Oracle table.
-- {L8}: Load only records that meet a condition.
---------
-- {x1}: The field terminator ( usually <,> or <|> ).
-- {x2}: The quote character ( usually <"> ).
-- {x3}: Missing fields at end of record = NULL.
---------
-- {c1}: This section is the column list. The number
-- and order of the columns should match your
-- CSV data, not necessarily the target table.
-- The column names in this list should be the
-- column names of your target table.
-- {c2}: Use key word FILLER to ignore a column of CSV
-- data. The name in this list does not matter.
-- {c3}: Example using functions on CSV values.
-- Use quotes around functions as shown.
-- {c4}: Use key word DATE to convert CSV date values.
-- Format string in ("") must match CSV data.
------------------------------------------------------------
A further explanation of the control file is given at the end of this article.
The last thing we need to do is to execute the load -- the command can be issued from a command prompt; but in this example we put the command in a batch file. In Unix you could put the command in a shell-script.
In single quotes, you'll need to supply the proper 'user-name / password @ database-identifier'. The last two elements point the load at your newly created control file and define the results log file. If you have a properly installed Oracle Client, double-clicking this batch file will execute the load.
If you don't want to put the password in the batch file, just leave it out (along with the preceding slash) and you will be prompted for it at run time (i.e. 'scott @ DBSID').
Create this file as "GoLoad.bat": (Edit the Log-In information)
@echo off
echo.
sqlldr 'scott / tiger @ DBSID' control='ControlFile.ctl' log='LoadResults.log'
pause
When the load finishes, you can observe that the specified data has been inserted into your table:
Don't forget to examine the three 'LoadResults' log files. They contain information about the load process, and you will be able to see which records didn't load and why.
Another interesting detail is that you can combine your data and the control file, as shown in the following example. Here the "INFILE" is specified as "*", and you need to include "BEGINDATA" just prior to the lines of CSV data. This would be executed in the same way as the previous example.
----------------------------------------------------------------------
-- SQL-Loader: A Control File that Includes the DATA.
----------------------------------------------------------------------
OPTIONS
( skip=1
, rows=10000
, errors=10
)
----------------------------------------------------------------------
LOAD DATA
INFILE *
BADFILE 'LoadResults_BAD.log'
DISCARDFILE 'LoadResults_DISCARDED.log'
APPEND
INTO TABLE scott.loader_demo
----------------------------------------------------------------------
Fields Terminated by ","
Optionally Enclosed by '"'
Trailing Nullcols
----------------------------------------------------------------------
( id_num
, junk_01  FILLER
, val_num
, name_txt "initcap(trim(:name_txt))"
, some_dts DATE "YYYY-Mon-DD HH24:MI"
)
----------------------------------------------------------------------
BEGINDATA
"ID_NUM","junk","A_VALUE","FULL_NAME","THE_DATE"
8,"a",37.251,"Joe jones","2009-Aug-18 13:27"
6,b,23.9,Ray Rulon,1984-Jan-23 02:37
13 , c , 0.2 , Elie Ellison , 1998-Feb-13 13:28
,d,6.471,sam SmItH,2010-Dec-30 16:13
47,"e",13.23,"","2007-May-02 22:58"
993,"bad-date",274.51,"Bad, Bob","01/01/2001"
421,"f",24.5," Sanders, slY ","2003-Sep-12 23:59"
2,"g",1432.8,"Jan Jepson","2005-Jul-05 16:02"
106,"h","44.732","Liz Loyd",""
,,,,
1,,,,
27,"i",1.567,"bOb bEnnie bEnson","2000-Mar-30 10:44"
7,,22.21,"Mel morris","1978-Apr-01 04:10"
2249,"j",,"Nelson, ned","1997-Jun-05 04:42"
0,"",0,"Empty",
Additional notes on the control file:
The first section of the control-file is the "OPTIONS" -- here you set a few details of the load process. The "skip" option allows you to skip as many rows as you wish at the beginning of the CSV file. Normally it is used to skip one row of column names if they exist. If there are no column names, just specify "skip=0".
The "rows" option allows you to specify how many rows will be loaded between each "Commit". For most circumstances, something like "rows=10000" would work just fine.
The "errors" option allows you to specify how many errors will be tolerated before the whole load process is aborted. If your data is very bad and you just want to load the rows that will load, you could set this to a high value.
The "load" parameter is usually included while testing a load; it tells the process to load a specific number of rows and then quit -- even if the CSV file has many more rows of data.
The next section "LOAD DATA" is where you define the file-names (in single quotes) and the destination table name (not in quotes).
The "INFILE" is your CSV data file that contains the records you wish to load.
The "BADFILE" will contain all the records from the load process that 'errored-off' for some reason.
The "DISCARDFILE" will contain all the records that didn't load because all the columns were NULL or were disqualified by the "WHEN" clause.
The keyword "APPEND" will insert the new records into the table leaving existing records in the table. If you specify "TRUNCATE" instead of "APPEND" then the table will be truncated prior to loading the new records.
You can include an optional "WHEN" clause with a condition that controls which CSV records will be loaded. Some possible examples include: id_num <> '' , id_num = '8' . Remember that this is not a "WHERE" clause and you can't use Oracle functions here -- it is actually quite limited.
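The effect of a condition like id_num <> '' can be sketched with a simple filter (a rough Python model only; SQL*Loader evaluates the condition internally on the raw field text):

```python
# Rough model of:  WHEN id_num <> ''
# Keep records whose first field is non-empty; the rest go to the discard file.
records = [
    ['8',  'Joe Jones'],
    ['',   'Sam Smith'],   # empty id_num -> discarded
    ['47', 'Nelson, Ned'],
]

loaded    = [r for r in records if r[0] != '']
discarded = [r for r in records if r[0] == '']

print(len(loaded), len(discarded))  # 2 1
```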
The next section defines the delimiter (usually a comma) and the quote marks that can be used to enclose a value. "Trailing Nullcols" tells the loader to interpret missing values at the end of records as NULLs.
The last and very important section of the control file is the column list. Make sure that the order and number of columns match your CSV data -- and that the column names match your destination table (except for "FILLER" columns). So, the position in your control-list tells SQL-Loader which column in the CSV file to get the value from, and its name in your control-list tells SQL-Loader which column in your destination table the value will be placed in.
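In other words, the column list is a positional map from CSV fields to table columns. A small sketch of that mapping, using the names from the demo control file (Python for illustration only):

```python
# One CSV record and the control-file column list, in order.
csv_row = ['8', 'yy', '37.251', 'Joe Jones', '2009-08-18 13:27:33']
control_list = ['id_num', 'junk_01 FILLER', 'val_num', 'name_txt', 'some_dts']

# Position selects the CSV field; the (non-FILLER) name selects the table column.
mapped = {name: value
          for name, value in zip(control_list, csv_row)
          if not name.endswith('FILLER')}

print(mapped)
```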
If you have a CSV column that you wish to ignore, give it any name (junk, dummy, etc.) and put the keyword "FILLER" immediately after. This tells SQL-Loader to ignore that CSV column of data.
Notice that you do not need to insert data into every column in your destination table. Table columns not listed in the control-list will be left NULL.
Date values should be followed with the "DATE" keyword and the appropriate date format string. CSV records that have dates not matching this string will be rejected.
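The rejection behavior is analogous to strict date parsing. In Python terms, with strptime format codes standing in for Oracle's "YYYY-MM-DD HH24:MI:SS" format string:

```python
from datetime import datetime

fmt = '%Y-%m-%d %H:%M:%S'   # stands in for "YYYY-MM-DD HH24:MI:SS"

# A record matching the format parses cleanly...
ok = datetime.strptime('2009-08-18 13:27:33', fmt)

# ...while the bogus date from the sample data would be rejected.
try:
    datetime.strptime('9999-99-99 99:99:99', fmt)
    rejected = False
except ValueError:
    rejected = True

print(ok.year, rejected)  # 2009 True
```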
It is also possible to manipulate the data during the load process. Notice that you can use Oracle functions on the values by enclosing the expression in double quotes as shown. Make sure you repeat the column name preceded by a colon inside the function expression.
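For a rough feel of what initcap(trim(:name_txt)) does to the sample names, here is a Python approximation (note that str.title() is only an approximation of Oracle's initcap; they differ on apostrophes and other edge cases):

```python
def initcap_trim(s):
    # Approximation of Oracle's initcap(trim(:name_txt))
    return s.strip().title()

print(initcap_trim(' Sanders, slY '))     # Sanders, Sly
print(initcap_trim('bOb bEnnie bEnson'))  # Bob Bennie Benson
```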
A few more possibilities in manipulating the data:
Review & Going further...

In your column list, remember that the number and position of the entries should match your CSV data file.

( CSV_Col_1
, CSV_Col_2
, CSV_Col_3
, CSV_Col_4
, CSV_Col_5
)

Mark the CSV columns that you won't use with the "FILLER" keyword. The column names of unused filler data do not matter.

( CSV_Col_1 FILLER
, CSV_Col_2 FILLER
, CSV_Col_3
, CSV_Col_4
, CSV_Col_5
)

The columns that you will use need to be labeled with the name of the table column where they will be loaded.

( CSV_Col_1 FILLER
, CSV_Col_2 FILLER
, val_num
, name_txt
, some_dts
)

Manipulating the CSV Values!

You can then add modifiers to alter the data. DATE values will need the format string included after the "DATE" keyword. To use incoming CSV values in an expression with Oracle functions, refer to them by the name of their destination column as a bind variable (e.g. ":name_txt"). With that as the column name, you then use Oracle functions and enclose the whole expression in double quotes, as shown on the name_txt line.

( CSV_Col_1 FILLER
, CSV_Col_2 FILLER
, val_num
, name_txt "initcap(trim( :name_txt ))"
, some_dts DATE "YYYY-MM-DD HH24:MI:SS"
)

Date Manipulation!

There are a couple of extra things you can do with a date column. Instead of using the DATE keyword, you can use the "to_date" function. You can even add more functions to manipulate the date, as shown on the third row below (this would truncate the date and set it to the last day of the previous month). The fourth row shows how to assign SYSDATE to a date column -- incoming CSV values are ignored.

, some_dts DATE "YYYY-MM-DD HH24:MI:SS"
, some_dts "to_date( :some_dts ,'YYYY-MM-DD HH24:MI:SS')"
, some_dts "trunc(to_date( :some_dts ,'YYYY-MM-DD HH24:MI:SS'),'MM')-1"
, some_dts "sysdate"

Using Constants!

You can also put a constant number in a numeric field or a constant string value in a Varchar2 field -- ignoring the CSV values.
Enclose numeric constants in double quotes; enclose string constants in single quotes, then double quotes.

( CSV_Col_1 FILLER
, CSV_Col_2 FILLER
, val_num "1234"
, name_txt "'Bill Bagnnn'"
, some_dts DATE "YYYY-MM-DD HH24:MI:SS"
)

Adding a Sequence!

If you have an extra column in your table where you would like to put a sequence number, put that column name in your list followed by the keyword "SEQUENCE". It appears that you can put it anywhere in the list: first, in between, or last. It also does not have to be a numeric column. The two values in parentheses are the starting sequence number and the increment value. If you are appending to a table instead of truncate-and-loading, you can use "SEQUENCE(MAX,1)" -- that will find the maximum sequence value already in your table and then increment from there.

( id_num
, CSV_Col_2 FILLER
, val_num
, name_txt "initcap(trim( :name_txt ))"
, some_dts DATE "YYYY-MM-DD HH24:MI:SS"
, extra_col SEQUENCE(1,1)
)

That's it!
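The SEQUENCE(MAX,1) behavior can be sketched as: find the current maximum already in the column, then count upward for each new row (a conceptual model only; SQL*Loader does this internally):

```python
from itertools import count

existing = [1, 2, 3, 7]          # extra_col values already in the table
seq = count(max(existing) + 1)   # SEQUENCE(MAX,1): continue past the max

new_rows = ['row-a', 'row-b', 'row-c']
assigned = [(next(seq), r) for r in new_rows]

print(assigned)  # [(8, 'row-a'), (9, 'row-b'), (10, 'row-c')]
```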
Thank you. This page is a thing of beauty.
Hi Folks,
I have a data file with fields separated by commas (,) and enclosed by quotation marks. But inside the quoted strings there are again commas.
Example
"A Street","T/A, slow","123 4567 8901"
How do I load such data? Please help.
control file looks likes this.
LOAD DATA
APPEND
INTO TABLE CM_A
FIELDS TERMINATED BY ","
OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
STREET CHAR "RTRIM(:STREET)",
NAME CHAR "RTRIM(:NAME)",
PHONE CHAR "RTRIM(:PHONE)"
)
Please help
Notice in the example in my post... there is a comma in a couple of the name values. The way I've shown it deals with these commas appropriately.
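[Editor's note: the commenter's sample line does indeed parse as three fields once the quotes are honored, which the control file above handles with Optionally Enclosed by '"'. Illustrated with Python's csv module:]

```python
import csv
import io

line = '"A Street","T/A, slow","123 4567 8901"'
fields = next(csv.reader(io.StringIO(line)))

print(fields)  # ['A Street', 'T/A, slow', '123 4567 8901']
```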
This was a great help, thanks a lot!
This is a great example with a complete description. Thanks a lot, Steve. Very happy to learn from this blog. Please post more topics like this with complete examples. It's awesome. Thanks a lot.