{"id":3710,"date":"2014-06-03T12:12:04","date_gmt":"2014-06-03T17:12:04","guid":{"rendered":"http:\/\/appcrawler.com\/wordpress\/?p=3710"},"modified":"2014-06-03T12:12:04","modified_gmt":"2014-06-03T17:12:04","slug":"comparing-web-content-using-python","status":"publish","type":"post","link":"http:\/\/appcrawler.com\/wordpress\/2014\/06\/03\/comparing-web-content-using-python\/","title":{"rendered":"Comparing web content using Python"},"content":{"rendered":"<p>We have an active\/passive data center setup.  We replicate the database data using GoldenGate, and maintain the application software through regular build processes.  ATG publishing is included in the database data, and the targeter files are maintained via rsync.  We have not been able to force the passive data center to read the newly copied targeter files into memory, so we periodically restart the passive data center application servers so they are ready to take traffic.<\/p>\n<p>We wanted an alert whenever the home page served by the passive data center differed from the one served by the active data center for any reason, such as a new promotion being displayed on the site.  
We used the BeautifulSoup library to extract the hero element from each home page and compare the two.<\/p>\n<pre lang=\"python\">\r\n#!\/usr\/bin\/env python\r\n# Python 2 script: compares the home page hero element served by each data center.\r\n\r\nimport urllib2\r\nfrom bs4 import BeautifulSoup\r\nfrom mymail import *  # helper module not shown here; provides pythonMail\r\n\r\n# Fetch the home page from each data center\r\na = urllib2.urlopen(\"http:\/\/active\").read()\r\nc = urllib2.urlopen(\"http:\/\/passive\").read()\r\n\r\n# Pull out only the hero element from each page\r\nahp = BeautifulSoup(a).find(id=\"home-page-hero\")\r\nchp = BeautifulSoup(c).find(id=\"home-page-hero\")\r\n\r\n# Mail the result either way so a missing message also signals a problem\r\nif ahp != chp:\r\n  print \"content differs, must restart\"\r\n  pythonMail(\"foo@foobar.com\",['foo@foobar.com'],\"Content differs in passive DC\",\"Please restart ATG on ecm01-15 in the passive data center\",\"html\",\"2\")\r\nelse:\r\n  print \"content matches\"\r\n  pythonMail(\"foo@foobar.com\",['foo@foobar.com'],\"Content matches in passive DC, no action required\",\"\",\"html\",\"3\")\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>We have an active\/passive data center setup. We replicate the database data using GoldenGate, and maintain the application software through regular build processes. ATG publishing is included in the database data, and the targeter files are maintained via rsync. 
We&hellip;<\/p>\n<p class=\"more-link-p\"><a class=\"more-link\" href=\"http:\/\/appcrawler.com\/wordpress\/2014\/06\/03\/comparing-web-content-using-python\/\">Read more &rarr;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_mi_skip_tracking":false,"footnotes":""},"categories":[37,38,26],"tags":[],"_links":{"self":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts\/3710"}],"collection":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/comments?post=3710"}],"version-history":[{"count":3,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts\/3710\/revisions"}],"predecessor-version":[{"id":3761,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts\/3710\/revisions\/3761"}],"wp:attachment":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/media?parent=3710"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/categories?post=3710"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/tags?post=3710"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}