In this post, I use what I learned in yesterday’s post to load the Web Robots’ Kickstarter data to PostgreSQL.
The python script for this task is found here.
Using Pandas for writing to a SQL database
One way to load data is to specify individual INSERT statements, one for each record. Alternatively, you can use pandas.DataFrame.to_sql
to write records stored in a DataFrame to a SQL database.
import psycopg2
from sqlalchemy import create_engine
engine = create_engine(
'postgresql://localhost:5432/{dbname}'.format(dbname=dbname))
pd.read_csv(data_file).to_sql(table_name, engine)
This second method is simpler, but I encounter the following error when I try to append data to the same SQL table.
for data_file in data_files:
pd.read_csv(data_file).to_sql(tblname, engine, if_exists="append")
DataError: (psycopg2.DataError) invalid input syntax for type double precision: "[]"
LINE 1: ...hnology/wearables?ref=category_modal&sort=magic', '[]', fals...
^
Debugging
Taking a look at the data types of the data frames (representing different files) which pass and fail, we find the data types are different.
df_pass = pd.read_csv(data_files[1])
df_fail = pd.read_csv(data_files[2])
for i,(p,f) in enumerate(zip(df_pass.dtypes, df_fail.dtypes)):
if p!=f:
print(df_pass.columns[i], p, f)
goal int64 float64
friends float64 object
is_starred float64 object
is_backing float64 object
permissions float64 object
Because there are column mismatches in data types, an error is returned.
Starting a new
To start afresh, we drop (ie. delete) the table if it exists before reloading the data.
cur.execute("DROP TABLE IF EXISTS {table};".format(table=tblname))
conn.commit()
Defining table structure
To solve the column mismatch problem, we need to first define the table column types before inserting the data into the table.
The following SQL statement is used to define the Web Robots’ Kickstarter table:
CREATE TABLE info (
index INTEGER,
id INTEGER,
photo VARCHAR,
name VARCHAR,
blurb VARCHAR,
goal FLOAT,
pledged FLOAT,
state VARCHAR,
slug VARCHAR,
disable_communication BOOLEAN,
country VARCHAR,
currency VARCHAR,
currency_symbol VARCHAR,
currency_trailing_code BOOLEAN,
deadline INTEGER,
state_changed_at INTEGER,
created_at INTEGER,
launched_at INTEGER,
staff_pick BOOLEAN,
backers_count INTEGER,
static_usd_rate FLOAT,
usd_pledged FLOAT,
creator VARCHAR,
location VARCHAR,
category VARCHAR,
profile VARCHAR,
spotlight BOOLEAN,
urls VARCHAR,
source_url VARCHAR,
friends VARCHAR,
is_starred VARCHAR,
is_backing VARCHAR,
permissions VARCHAR
);
Loading data turned out to be much more trickier than I thought. -- csiu.