As we move towards an enterprise data analytics platform, I take every opportunity to write simple jobs in Hadoop, Hive, and Pig. Below is one I ran in Pig that groups the top 50 URL requests…
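The Pig job itself is not reproduced here. As a rough illustration of the group, count, order, and limit pattern such a job expresses, here is a plain-Java sketch. It assumes the URLs have already been pulled out of the log lines into a list; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of GROUP BY url / COUNT / ORDER BY count DESC / LIMIT n.
// Not the original Pig script; TopUrls and topN are hypothetical names.
final class TopUrls {

    // Counts each URL and returns the n most frequent, most frequent first.
    // Ties are broken alphabetically so the output is deterministic.
    static List<Map.Entry<String, Long>> topN(List<String> urls, int n) {
        Map<String, Long> counts = new HashMap<>();
        for (String url : urls) {
            counts.merge(url, 1L, Long::sum); // GROUP BY url + COUNT
        }
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Long>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Long>comparingByKey())); // ORDER BY
        return entries.subList(0, Math.min(n, entries.size())); // LIMIT n
    }

    public static void main(String[] args) {
        List<String> requests = List.of(
                "/index.html", "/cart", "/index.html", "/search", "/cart", "/index.html");
        for (Map.Entry<String, Long> e : topN(requests, 50)) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

In Pig the grouping runs distributed across the cluster; the in-memory map here only makes the logic easy to follow.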
Deleting profiles and all associated orders
Nothing fancy: just a simple component that deletes every profile with no activity in the last 180 days. We needed it in our development and QA environments to reclaim disk space.
import javax.transaction.TransactionManager; import atg.commerce.order.*; import atg.commerce.pricing.*; import atg.dtm.*; import…
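The ATG component is truncated above, so here is only a hedged sketch of the selection rule it describes: a profile is a candidate when its last activity falls more than 180 days before "now". `Profile`, `lastActivity`, and `selectForDeletion` are hypothetical stand-ins, not ATG repository APIs.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "no activity in the last 180 days"; it does not
// touch the ATG repository or order APIs.
final class InactiveProfiles {

    static final Duration MAX_IDLE = Duration.ofDays(180);

    // Hypothetical stand-in for a profile repository item.
    static final class Profile {
        final String id;
        final Instant lastActivity;

        Profile(String id, Instant lastActivity) {
            this.id = id;
            this.lastActivity = lastActivity;
        }
    }

    // Returns profiles whose last activity is strictly before now minus 180 days.
    static List<Profile> selectForDeletion(List<Profile> profiles, Instant now) {
        Instant cutoff = now.minus(MAX_IDLE);
        List<Profile> stale = new ArrayList<>();
        for (Profile p : profiles) {
            if (p.lastActivity.isBefore(cutoff)) {
                stale.add(p);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        List<Profile> profiles = List.of(
                new Profile("active", now.minus(Duration.ofDays(10))),
                new Profile("stale", now.minus(Duration.ofDays(200))));
        for (Profile p : selectForDeletion(profiles, now)) {
            // The real component removes the profile and its associated
            // orders at this point; that ATG-specific code is not shown here.
            System.out.println("would delete " + p.id);
        }
    }
}
```

Passing `now` in as a parameter, rather than calling `Instant.now()` inside the filter, keeps the cutoff fixed for a whole run and makes the rule easy to test.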
Query facebook posts using RestFB
import com.restfb.*; import com.restfb.types.*; import com.restfb.json.*; import com.restfb.util.*; import com.restfb.FacebookClient.*; import java.net.*; import java.io.*; import java.util.*; import org.json.simple.*; import org.json.simple.parser.*; public class fbLogin { public static void main(String args[]) { AccessToken accessToken = new DefaultFacebookClient().obtainAppAccessToken("153104261535601","5f7acbbe49a0fcd29afe0280d4dadc6c"); System.out.println(accessToken.getAccessToken()); //User user = facebookClient.fetchObject("me",…
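For context on what an app access token request involves, here is a stdlib-only sketch of the Graph API client-credentials request URL (`/oauth/access_token` with `client_id`, `client_secret`, and `grant_type=client_credentials`). It is not RestFB code and makes no claim about RestFB internals. The `FB_APP_ID` and `FB_APP_SECRET` environment variable names are hypothetical; reading them from the environment keeps the secret out of source code.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch, assuming the Graph API client-credentials flow for app access tokens.
// FB_APP_ID / FB_APP_SECRET are hypothetical environment variable names.
final class AppTokenUrl {

    // Builds the token request URL, URL-encoding both credentials.
    static String buildTokenUrl(String appId, String appSecret) {
        return "https://graph.facebook.com/oauth/access_token"
                + "?client_id=" + enc(appId)
                + "&client_secret=" + enc(appSecret)
                + "&grant_type=client_credentials";
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String appId = System.getenv("FB_APP_ID");
        String appSecret = System.getenv("FB_APP_SECRET");
        if (appId == null || appSecret == null) {
            System.err.println("Set FB_APP_ID and FB_APP_SECRET first.");
            return;
        }
        System.out.println(buildTokenUrl(appId, appSecret));
    }
}
```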
HDFS – Where are my file blocks?
At times I want to know how a file is actually stored in HDFS. The code below lists the blocks that make up a given file and the nodes each block is stored on. import java.io.*; import java.util.*; import…
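The real per-block offsets, lengths, and hosts come from the Hadoop `FileSystem` API (`getFileBlockLocations`), which needs a cluster to run against. As a stand-alone sketch of the arithmetic behind that listing: a file is cut into fixed-size blocks, and only the last block may be shorter. `BlockLayout` and `blocks` are hypothetical names, and the 128 MB block size in `main` is just an example value.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of how a file of fileSize bytes maps onto fixed-size blocks.
// This reproduces only the arithmetic, not any Hadoop API.
final class BlockLayout {

    // Returns {offset, length} pairs, one per block.
    static List<long[]> blocks(long fileSize, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<long[]> result = new ArrayList<>();
        for (long offset = 0; offset < fileSize; offset += blockSize) {
            // Every block is blockSize bytes except possibly the last one.
            result.add(new long[] {offset, Math.min(blockSize, fileSize - offset)});
        }
        return result;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // A 300 MB file with 128 MB blocks: 128 MB, 128 MB, then 44 MB.
        for (long[] b : blocks(300 * mb, 128 * mb)) {
            System.out.printf("offset=%d length=%d%n", b[0], b[1]);
        }
    }
}
```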
Simple example of tracking memory using getrusage
There is no real purpose to this post, as I rarely, if ever, write C code for anything useful anymore. What I do use C for is better understanding Linux system calls and various other operations in the…
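The C post is truncated, so as a loose Java-side counterpart: on Linux, `getrusage()` reports peak resident set size in `ru_maxrss` (in kilobytes), and `/proc/self/status` exposes a comparable high-water mark as its `VmHWM` line, next to the current `VmRSS`. This sketch is Linux-only and parses those lines; `ProcMemory` and `readKb` are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.OptionalLong;

// Linux-only sketch: reads peak (VmHWM) and current (VmRSS) resident set size
// of the running JVM from /proc/self/status.
final class ProcMemory {

    // Finds a "Key:   12345 kB" line and returns the number of kilobytes.
    static OptionalLong readKb(String statusText, String key) {
        for (String line : statusText.split("\n")) {
            if (line.startsWith(key + ":")) {
                String[] parts = line.substring(key.length() + 1).trim().split("\\s+");
                return OptionalLong.of(Long.parseLong(parts[0]));
            }
        }
        return OptionalLong.empty();
    }

    public static void main(String[] args) throws IOException {
        Path status = Paths.get("/proc/self/status");
        if (!Files.exists(status)) {
            System.err.println("/proc/self/status not found; this sketch is Linux-only.");
            return;
        }
        String text = new String(Files.readAllBytes(status), StandardCharsets.UTF_8);
        System.out.println("VmHWM (peak RSS) kB: " + readKb(text, "VmHWM").orElse(-1));
        System.out.println("VmRSS (current RSS) kB: " + readKb(text, "VmRSS").orElse(-1));
    }
}
```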
Formatting garbage collection output with timestamps
If you are running a version of Java that does not honor -XX:+PrintGCDateStamps, you can use what is below instead. For those who are unaware, by default the garbage collection log timestamps are the number of seconds since the JVM process started. In the heat…
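The underlying conversion is simple: wall-clock time = JVM start time + uptime. Here is a hedged Java sketch of it, not the approach from this post. It assumes each GC line starts with the uptime in seconds (for example `12.345: [GC ...`) and that the JVM start time, in epoch milliseconds, is obtained separately and passed in. `GcTimestamps` and `annotate` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.time.Instant;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: prefixes GC log lines that start with "<seconds>.<fraction>: " with
// an ISO-8601 UTC timestamp computed as JVM start time + uptime.
final class GcTimestamps {

    private static final Pattern UPTIME = Pattern.compile("^(\\d+\\.\\d+): ");

    // Lines without a leading uptime are returned unchanged.
    static String annotate(String line, long jvmStartEpochMillis) {
        Matcher m = UPTIME.matcher(line);
        if (!m.find()) {
            return line;
        }
        long uptimeMillis = Math.round(Double.parseDouble(m.group(1)) * 1000.0);
        Instant when = Instant.ofEpochMilli(jvmStartEpochMillis + uptimeMillis);
        return when + " " + line;
    }

    // Usage: java GcTimestamps <jvm-start-epoch-millis> < gc.log
    public static void main(String[] args) throws IOException {
        long start = Long.parseLong(args[0]);
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(annotate(line, start));
        }
    }
}
```

Lines that do not begin with an uptime, such as a JVM banner, pass through untouched, so the tool can be run over a whole log file.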