{"id":3440,"date":"2016-07-29T09:02:24","date_gmt":"2016-07-29T14:02:24","guid":{"rendered":"http:\/\/appcrawler.com\/wordpress\/?p=3440"},"modified":"2016-07-29T09:02:24","modified_gmt":"2016-07-29T14:02:24","slug":"loading-compressed-data-into-mysql-in-batches-from-python","status":"publish","type":"post","link":"http:\/\/appcrawler.com\/wordpress\/2016\/07\/29\/loading-compressed-data-into-mysql-in-batches-from-python\/","title":{"rendered":"Loading compressed data into MySQL in batches from python"},"content":{"rendered":"<p>We used what is below to load 12 million rows into a laptop MySQL database in 48 minutes.  I&#8217;m sure it can be improved, but we were initially OK with this result.<\/p>\n<pre lang=\"python\">\r\nimport gzip, os, sys, string, mysql.connector\r\n\r\nfd = gzip.open(os.path.join(\"c:\\\\users\\\\showard\\\\downloads\\\\orders.txt.gz\"))\r\nconn = mysql.connector.connect(host='127.0.0.1', user='root', passwd='******', db='test')\r\ncurInsert = conn.cursor()\r\nstmt = \"insert into dcspp_order values(%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)\"\r\nrows = []\r\nk = 0\r\nfor l in fd:\r\n  l = l.decode()\r\n  t = l.split(',')\r\n  if len(t) == 22 and t[0].find(\"orders.txt\") == -1:\r\n    k = k + 1\r\n    for i in range(22):\r\n      if len(t[i]) == 0 or t[i] == '\\n':\r\n        t[i] = None      \r\n    try:\r\n      rows.append((t[0],t[1],t[2],t[3],t[4],t[5],t[6],t[7],t[8],t[9],t[10],t[11],t[12],t[13],t[14],t[15],t[16],t[17],t[18],t[19],t[20],t[21]))\r\n      if k % 1000 == 0:\r\n        curInsert.executemany(stmt,rows)\r\n        conn.commit()\r\n        rows = []\r\n        k = 0\r\n    except:\r\n      print(rows[0])\r\n      print(str(sys.exc_info()[1]))\r\n<\/pre>\n<p>However, given the fact we could load the uncompressed file in seven minutes with MySQL tools, the choice became obvious.<\/p>\n<pre lang=\"text\">\r\nmysql> load data local infile 'express_orders.txt' into table express_order fields terminated by \",\";\r\nQuery OK, 12428870 rows affected, 966 warnings (7 min 15.24 sec)\r\nRecords: 12428870  Deleted: 0  Skipped: 0  Warnings: 966\r\n\r\nmysql>\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>We used what is below to load 12 million rows into a laptop MySQL database in 48 minutes. I&#8217;m sure it can be improved, but we were initially OK with this result. import gzip, os, sys, string, mysql.connector fd =&hellip;<\/p>\n<p class=\"more-link-p\"><a class=\"more-link\" href=\"http:\/\/appcrawler.com\/wordpress\/2016\/07\/29\/loading-compressed-data-into-mysql-in-batches-from-python\/\">Read more &rarr;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_mi_skip_tracking":false,"footnotes":""},"categories":[19,24,23,26],"tags":[],"_links":{"self":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts\/3440"}],"collection":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/comments?post=3440"}],"version-history":[{"count":7,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts\/3440\/revisions"}],"predecessor-version":[{"id":3450,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts\/3440\/revisions\/3450"}],"wp:attachment":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/media?parent=3440"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/categories?post=3440"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/tags?post=3440"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}