Since I always forget (database management isn’t my day-job): a short guide on how to quickly import large datasets in TXT or CSV into PostgreSQL. For smaller sets I still use ruby and FasterCSV to import the set, but nothing beats native DB imports in terms of speed. And speed doesn’t matter when importing a few megabytes of data, but it certainly matters when it gets more than that.
In this example I’ll use my current use-case, importing a large Drive-Time Matrix table, with drive times and distances between two postal codes. The head of the TXT file is formatted as such:
"Frompc4","Topc4","Time","Distance"
"1011","1011",0,0
"1011","1012",6,1737
"1011","1013",9,3378
"1011","1014",13,6056
"1011","1015",10,3198
"1011","1016",10,3112
"1011","1017",6,1706
"1011","1018",5,1791
"1011","1019",7,3146
Now let’s assume we want to import this in a table ‘DTM’ with the following columns: from_pc (integer), to_pc (integer), time (integer), distance (integer). It is recommended to use the psql command, since PgAdmin may disconnect while the copy command is running (run psql -h localhost -U username -d database
).
A few ‘problems’ here, which we ‘ll tackle in the example:
The command we’ll be using is copy
, which typically is used like copy DESTINATION from SOURCE
.
Most sites tell you that you’ll be able to import a csv file using the following command:
\copy dtm from '/path/to/csv/DTM.csv' DELIMITERS ',' CSV;
But that doesn’t solve all issues: column names aren’t equal, the header row is imported as well, and Postgres may not make any assumptions about how to convert from string to integer.
Skipping the header is simple:
\copy dtm from '/path/to/csv/DTM.csv' DELIMITERS ',' CSV HEADER;
To map the columns, mention them in the csv-file’s order:
\copy dtm(frompc,topc,time,distance) from '/path/to/csv/DTM.csv' DELIMITERS ',' CSV HEADER;
Well, and in case you were worrying about type-conversion, for me it worked automagically :)
Enjoyed this? Follow me on Mastodon or add the RSS, euh ATOM feed to your feed reader.
Dit artikel van murblog van Maarten Brouwers (murb) is in licentie gegeven volgens een Creative Commons Naamsvermelding 3.0 Nederland licentie .