Day43: Pandas to SQL

Posted by csiu on April 8, 2017 | with: 100daysofcode,

In this post, I use what I learned in yesterday’s post to load the Web Robots’ Kickstarter data to PostgreSQL.

The python script for this task is found here.

Using Pandas for writing to a SQL database

One way to load data is to specify individual INSERT statements, one for each record. Alternatively, you can use pandas.DataFrame.to_sql to write records stored in a DataFrame to a SQL database.

import psycopg2
from sqlalchemy import create_engine

engine = create_engine(
        'postgresql://localhost:5432/{dbname}'.format(dbname=dbname))

pd.read_csv(data_file).to_sql(table_name, engine)

This second method is simpler, but I encounter the following error when I try to append data to the same SQL table.

for data_file in data_files:
    pd.read_csv(data_file).to_sql(tblname, engine, if_exists="append")
DataError: (psycopg2.DataError) invalid input syntax for type double precision: "[]"
LINE 1: ...hnology/wearables?ref=category_modal&sort=magic', '[]', fals...
                                                             ^

Debugging

Taking a look at the data types of the data frames (representing different files) which pass and fail, we find the data types are different.

df_pass = pd.read_csv(data_files[1])
df_fail = pd.read_csv(data_files[2])

for i,(p,f) in enumerate(zip(df_pass.dtypes, df_fail.dtypes)):
    if p!=f:
        print(df_pass.columns[i], p, f)
goal int64 float64
friends float64 object
is_starred float64 object
is_backing float64 object
permissions float64 object

Because there are column mismatches in data types, an error is returned.

Starting a new

To start afresh, we drop (ie. delete) the table if it exists before reloading the data.

cur.execute("DROP TABLE IF EXISTS {table};".format(table=tblname))
conn.commit()

Defining table structure

To solve the column mismatch problem, we need to first define the table column types before inserting the data into the table.

The following SQL statement is used to define the Web Robots’ Kickstarter table:

CREATE TABLE info (
    index INTEGER,
    id INTEGER,
    photo VARCHAR,
    name VARCHAR,
    blurb VARCHAR,
    goal FLOAT,
    pledged FLOAT,
    state VARCHAR,
    slug VARCHAR,
    disable_communication BOOLEAN,
    country VARCHAR,
    currency VARCHAR,
    currency_symbol VARCHAR,
    currency_trailing_code BOOLEAN,
    deadline INTEGER,
    state_changed_at INTEGER,
    created_at INTEGER,
    launched_at INTEGER,
    staff_pick BOOLEAN,
    backers_count INTEGER,
    static_usd_rate FLOAT,
    usd_pledged FLOAT,
    creator VARCHAR,
    location VARCHAR,
    category VARCHAR,
    profile VARCHAR,
    spotlight BOOLEAN,
    urls VARCHAR,
    source_url VARCHAR,
    friends VARCHAR,
    is_starred VARCHAR,
    is_backing VARCHAR,
    permissions VARCHAR
);



Loading data turned out to be much more trickier than I thought. -- csiu.