Q6.1: Alternative to row-at-a-time processing


Someone asked how they could speed up their processing. They were batch updating/inserting gobs of information. Their algorithm was something like the following:
... In another case I do:
If exists (select record) then
        update record
else
        insert record
I'm not sure which way is faster or if it makes a difference. I am doing this for as many as 4000 records at a time (calling a stored procedure 4000 times!). I am interested in knowing any way to improve this. The parameter translation alone on the procedure calls takes 40 seconds for 4000 records. I am using exec in DB-Lib.

Would RPC or CT-Lib be better/faster?

A netter responded, stating that it was faster to ditch their algorithm and apply a set-based strategy:
The way to improve your approach is to convert the row-at-a-time processing (which is a more traditional, procedural type of thinking) into batch-at-a-time processing (which is a more relational type of thinking). Now I'm not trying to insult you or say that you suck or anything like that; we just need to dial you in to think in relational terms.

The idea is to do batches (or bundles) of rows rather than processing a single one at a time.

So let's take your example (since you didn't give exact values [probably out of kindness to save my eyeballs], I'll use your generic example to illustrate what I'm talking about):

Before:

        if exists (select record) then
           update record
        else
           insert record
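
To make the old way concrete, it boils down to a stored procedure along these lines, called once per record. This is only a sketch -- the table old_stuff and the columns id and val are invented for illustration:

        create procedure upd_old_stuff
            @id  int,
            @val varchar(30)
        as
            if exists (select 1 from old_stuff where id = @id)
                /* the row is already there -- update it in place */
                update old_stuff
                   set val = @val
                 where id = @id
            else
                /* brand new row -- insert it */
                insert old_stuff (id, val)
                values (@id, @val)

Every one of the 4000 calls pays the procedure call and parameter translation overhead -- that's where the poster's 40 seconds go.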

New way:

  1. Load all your rows into a table named new_stuff in a separate work database (call it work_db) using bcp -- no 3GL needed (there's a sketch of this step below):
    1. truncate new_stuff and drop all of its indexes
    2. sort your data file with UNIX sort, ordering it by the clustered index columns
    3. load it using bcp
    4. create the clustered index using with sorted_data, plus any ancillary nonclustered indexes.
  2. Assume that your target table is called old_stuff.
  3. Do the update in a single batch:
       begin tran
    
         /* delete any rows in old_stuff which normally
         ** would have been updated... we'll insert 'em instead!
         ** Essentially, treat the update as a delete/insert.
         */
    
         delete old_stuff
           from old_stuff,
                new_stuff
          where old_stuff.key = new_stuff.key
    
        /* insert entire new table:  this adds any rows
        ** that would have been updated before and
        ** inserts the new rows
        */
         insert old_stuff
            select * from new_stuff
    
       commit tran
       
You can do all of this without writing any 3GL code, using just bcp and a shell script.
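
For step 1, the load might look something like the following sketch. The file name, delimiter, column, and index names are invented for illustration, and the sort and bcp commands appear as comments because they run from the shell rather than inside the server:

        use work_db
        go

        /* empty the work table and drop its indexes so bcp can use
        ** its fast (minimally logged) mode; work_db also needs the
        ** 'select into/bulkcopy' option turned on for fast bcp
        */
        truncate table new_stuff
        go
        drop index new_stuff.new_stuff_clust
        go

        /* from the shell: sort the flat file on the clustered index
        ** column(s), then load it:
        **
        **    sort -t'|' -k1,1 new_stuff.raw > new_stuff.dat
        **    bcp work_db..new_stuff in new_stuff.dat -c -t '|' -Uuser -Sserver
        */

        /* rebuild the indexes; with sorted_data skips the internal
        ** sort because the rows arrived in clustered key order
        */
        create clustered index new_stuff_clust
            on new_stuff (id)
            with sorted_data
        go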

A word of caution:

Since these deletes/inserts are batch oriented, you may blow your log if you attempt to do too many at a time. To avoid this, use the set rowcount directive to carve the work into bite-sized chunks.
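
For example, the delete can be chunked like this (the 1000-row batch size is arbitrary -- pick whatever your log can comfortably hold; also note that in real code the join column couldn't literally be named key, since that's a reserved word):

        declare @rows int
        select @rows = 1

        set rowcount 1000        /* work on 1000 rows per pass */

        while @rows > 0
        begin
            begin tran

            delete old_stuff
              from old_stuff,
                   new_stuff
             where old_stuff.key = new_stuff.key

            /* capture the count before anything else resets it */
            select @rows = @@rowcount

            commit tran

            /* if the log still fills up, dump it between chunks
            ** (your_db being whichever database old_stuff lives in):
            **    dump transaction your_db with truncate_only
            */
        end

        set rowcount 0           /* don't forget to reset! */

Committing each chunk separately keeps any single transaction from holding the log open the whole time.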