Loading compressed data into MySQL in batches from Python

We used the script below to load 12 million rows into a MySQL database on a laptop in 48 minutes. I'm sure it can be improved, but we were initially OK with this result.

import gzip, mysql.connector

# Stream the gzipped export line by line and insert in batches of 1,000 rows.
fd = gzip.open("c:\\users\\showard\\downloads\\orders.txt.gz")
conn = mysql.connector.connect(host='', user='root', passwd='******', db='test')
curInsert = conn.cursor()
stmt = "insert into dcspp_order values(%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)"
rows = []
k = 0
for l in fd:
  t = l.decode().split(',')
  # Skip the header line and any malformed records.
  if len(t) == 22 and t[0].find("orders.txt") == -1:
    k = k + 1
    for i in range(22):
      # Treat empty fields (and the trailing newline) as SQL NULL.
      if len(t[i]) == 0 or t[i] == '\n':
        t[i] = None
    rows.append(t)
    # Flush every 1,000 rows instead of a round trip per row.
    if k % 1000 == 0:
      curInsert.executemany(stmt, rows)
      conn.commit()
      rows = []
# Flush the final partial batch.
if rows:
  curInsert.executemany(stmt, rows)
  conn.commit()
curInsert.close()
conn.close()
fd.close()
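The batching pattern in the loader can be isolated into a small helper that groups any iterable of rows into fixed-size lists; this is a sketch (the function name `batches` is mine, not from the original), which makes the chunking logic testable without a database:

```python
from itertools import islice

def batches(rows, size=1000):
    """Yield successive lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk
```

Each yielded chunk would then be handed to `curInsert.executemany(stmt, chunk)` followed by `conn.commit()`, which is exactly what the `k % 1000` test above accomplishes inline.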

However, given that we could load the uncompressed file in seven minutes with MySQL's own LOAD DATA, the choice became obvious.

mysql> load data local infile 'express_orders.txt' into table express_order fields terminated by ",";
Query OK, 12428870 rows affected, 966 warnings (7 min 15.24 sec)
Records: 12428870  Deleted: 0  Skipped: 0  Warnings: 966
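Since LOAD DATA LOCAL INFILE reads a plain file, the gzipped export has to be decompressed first. A minimal sketch using only the Python standard library (the paths are illustrative, and streaming via shutil avoids pulling the whole file into memory):

```python
import gzip, shutil

def decompress(src_gz, dest):
    """Stream-decompress src_gz to dest without loading it all into memory."""
    with gzip.open(src_gz, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
```

A plain `gunzip` on the command line does the same job; the point is only that the decompression step is cheap next to the insert time either way.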
