Below is a very simple example of how Hive can be queried from a web front end. The idea is to present the user with a CSV MIME type response that would be opened…
Month: May 2013
Querying Hadoop from Tomcat
Below is a very simple example of how to print, from Tomcat, the contents of a file stored in HDFS. I am not entirely sure where this would be useful, for the following reasons: * Unless you have a list…
Pig script to group URL requests in JBOSS
As we move towards an enterprise data analytics platform, I take every opportunity I can to come up with simple jobs in Hadoop, Hive, and Pig. Below is one I ran in Pig that groups the top 50 URL requests…
Deleting profiles and all associated orders
Nothing fancy, just a simple component to delete all profiles with no activity in the last 180 days. We needed this in our development and QA environments to reclaim disk space. import javax.transaction.TransactionManager; import atg.commerce.order.*; import atg.commerce.pricing.*; import atg.dtm.*; import…
Query facebook posts using RestFB
import com.restfb.*; import com.restfb.types.*; import com.restfb.json.*; import com.restfb.util.*; import com.restfb.FacebookClient.*; import java.net.*; import java.io.*; import java.util.*; import org.json.simple.*; import org.json.simple.parser.*; public class fbLogin { public static void main(String args[]) { AccessToken accessToken = new DefaultFacebookClient().obtainAppAccessToken("153104261535601","5f7acbbe49a0fcd29afe0280d4dadc6c"); System.out.println(accessToken.getAccessToken()); //User user = facebookClient.fetchObject("me",…
HDFS – Where are my file blocks?
At times I would like to know how a file is stored in HDFS. The code below shows which blocks exist for a given file, as well as the nodes on which they are stored. import java.io.*; import java.util.*; import…
Simple example of tracking memory using getrusage
There is no real purpose to this post, as I rarely, if ever, write C code for any useful purpose anymore. However, I do use it to better understand Linux system calls and various other operations in the…
Formatting garbage collection output with timestamps
If you are running a version of Java where -XX:+PrintGCDateStamps is not honored, you can run what is below. For those who are unaware, by default garbage collection output prints the seconds since the JVM process started. In the heat…
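The conversion this excerpt describes, from seconds since JVM start to a wall-clock timestamp, can be sketched roughly as follows. This is not the script from the post, which is cut off here; it is a minimal illustration that assumes each GC log line starts with an uptime prefix such as `12.345: ` and that the JVM start time is known. The class name `GcStampSketch`, the method `stamp`, the sample log line, and the start time are all hypothetical.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GcStampSketch {
    // Assumed line shape: "<seconds>.<millis>: <rest of GC record>"
    private static final Pattern UPTIME = Pattern.compile("^(\\d+)\\.(\\d{3}): (.*)$");

    // Timestamps are rendered in UTC to keep the sketch independent of the local zone.
    private static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS").withZone(ZoneOffset.UTC);

    // Replaces the uptime prefix with jvmStart + uptime; other lines pass through unchanged.
    static String stamp(String line, Instant jvmStart) {
        Matcher m = UPTIME.matcher(line);
        if (!m.matches()) {
            return line;
        }
        long uptimeMillis = Long.parseLong(m.group(1)) * 1000 + Long.parseLong(m.group(2));
        return FMT.format(jvmStart.plusMillis(uptimeMillis)) + ": " + m.group(3);
    }

    public static void main(String[] args) {
        // Hypothetical JVM start time and GC record, for illustration only.
        Instant jvmStart = Instant.parse("2013-05-01T00:00:00Z");
        System.out.println(stamp("12.345: [GC 8192K->1024K(31744K), 0.0051230 secs]", jvmStart));
    }
}
```

In practice the start time would have to come from somewhere outside the log itself, for example the process start time reported by the operating system, so the accuracy of the output depends on how well that value is known.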
Does hadoop/HDFS distribute writes to all data nodes on ingest?
I like simple, command-line test cases. Lather, rinse, repeat (do any shampoo bottles actually have that anymore 🙂 ?) I wanted to prove that ingests to Hadoop do not actually send everything through the name node, which…
One way to tell if a thread pool is hung
A hung ThreadPoolExecutor in Java can manifest in various ways. At least one, which I wanted to post for my own notes, is below. A thread dump may show something like the following: "pool-9-thread-5" prio=6 tid=0x01b84400 nid=0x16e0 waiting…