Welcome! Log In Create A New Profile

Advanced

Add records from .csv

Posted by gpredl 
Add records from .csv
October 03, 2020 09:48AM
Hi,

in part of a program I'm adding thousands of records from a .csv file. It's slow. Using HWrite(File, Record#) in total time makes nearly no difference to HAdd(File). It's an SSD drive and there's plenty of cache for the drive. It takes about 1,5 hours to add 10.000 records on my computer which has a Xeaon processor. Our customers will have no Xeons, they will have i5s at best.

Is there any way to add records faster?

Kind regards,
Guenter Predl
office@windev.at
Re: Add records from .csv
October 03, 2020 01:22PM
Hi Guenter,

first idea was to hadd with primary key only and build the keys later, BUT:

there must be something weird with your import process: (other dependencys ?)

i just testet a csv-import

- 190'000 records
- HF-classic File , Record length = 62, with Primary (ID) and 4 other keys (2 single, 2 composites)
- Hadd
- SSD Drive, I7 Processor

--> took me 2 minutes

regards, Heinz
Argus
Re: Add records from .csv
October 03, 2020 01:23PM
I am using Fabrice's WXReplication... I'm saying that because it DOES add records by bunches during some processes and I have made some speed testing... So, comparison:
you 10000 records in 1.5 hours = 3600+1800 s = 5400 s so, ABOUT 0.5 second per records
WXReplication = about 2 or 3 HUNDREDTHs of a second per record

So I would say that your code behavior is NOT normal and you should have it looked at...

What comes to mind: the slow time is not coming from the hadd, but from the remaining of your loop (either extraction from csv or displaying information)...

If you don't find on your own, you should ask fabrice. He's very good at optimization.
Re: Add records from .csv
October 03, 2020 01:34PM
Hi Guenter,

just had a look at my code: i have a pogress bar and because this could slow down the process i do NOT update it with every record (gauge)
but only after 200 records read:

IF CNT > 1

SizeRead +=64

// gauge(sizeread, FileSize)
// Gauge update
//
itx ++
IF itx > 200 THEN
itx= 0
Gauge(SizeRead/10, FileSize/10)
Multitask()
END

IF Left(OneLine,2) = "99" THEN // last record
Gauge(100,100)
BREAK
ELSE
// -------------- do the import

maybe this helps

regards, Heinz
Re: Add records from .csv
October 03, 2020 03:33PM
Hello Heinz, hello Argus,

thanks a lot for your suggestions!! I'll look into the code to optimize the loop!

Yes, there are a lot of ExtractString(...) operations within the loop! Maybe, I can read the csv-records in a different way .... I'll tell here about the results!

Kind regards,
Guenter Predl
office@windev.at
Re: Add records from .csv
October 04, 2020 10:29AM
Hi,

1) I put the ASCII-file into a string by fLoadText(..) This sped up the process a little but not much.

2) I checked two items for being in the system by HReadSeekFirst(...) I did not need the data itself and converted to HSeekFirst(...) and this did the big difference! Now the import-process is fast enough!

Thank you for your help!

Kind regards,
Guenter Predl
office@windev.at
Re: Add records from .csv
October 04, 2020 03:32PM
Hi Guenter,

Yes, multitask en redraw a progress is quite slow compared to database actions. Doing that only every xxx records will help for sure.

if it is possible try to read the records you need for the HReadFirst ALL at once, at the beginning of the process.
Store them into an structured array and use ArraySeek instead.
This save you dozens of round trips to the database which is relatively slow.

It depends of course on how many records wil be involved, but I do this where ever possible to speed things up.

Arie
Argus
Re: Add records from .csv
October 05, 2020 02:47PM
> Store them into an structured array and use
> ArraySeek instead.

And to go faster than with the array, just do a query to read them all, and do hreadseek on the query (all in memory, very fast), this way you remove the transfer times between records and array.
Author:

Your Email:


Subject:


Spam prevention:
Please, enter the code that you see below in the input field. This is for blocking bots that try to post this form automatically. If the code is hard to read, then just try to guess it right. If you enter the wrong code, a new image is created and you get another chance to enter it right.
Message: