Joining data in Pig

If you come from an RDBMS background and are used to joining tables, you will be happy to know that Pig supports joins as well.

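Below is a very simple example. The second field of each line in /people matches the first field of a line in /address, so the join pairs p1 by $1 with p2 by $0 (positional references are zero-based). Because PigStorage(',') splits on every comma, each address line loads as five fields.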

grunt> fs -cat /people
1,1,steve howard
2,1,becky howard
3,2,john smith
4,2,susan smith
5,3,jeff wilson
6,3,regina wilson
grunt> fs -cat /address
1,7768 farm hill dr.,blacklick, oh , 43004
2,123 anywhere ave.,columbus, oh, 43215
3,456 stingle st.,louisville, ky, 12345
grunt> p1 = load '/people' using PigStorage(',');
grunt> p2 = load '/address' using PigStorage(',');
grunt> jnd = join p1 by $1, p2 by $0;
grunt> dump jnd;
(2,1,becky howard,1,7768 farm hill dr.,blacklick, oh , 43004)
(1,1,steve howard,1,7768 farm hill dr.,blacklick, oh , 43004)
(4,2,susan smith,2,123 anywhere ave.,columbus, oh, 43215)
(3,2,john smith,2,123 anywhere ave.,columbus, oh, 43215)
(6,3,regina wilson,3,456 stingle st.,louisville, ky, 12345)
(5,3,jeff wilson,3,456 stingle st.,louisville, ky, 12345)
grunt>
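
Positional references such as $1 and $0 work, but they get hard to read as the number of fields grows. The sketch below shows the same join with a schema declared at load time. The relation names (people, address, result) and all field names and types are assumptions inferred from the sample data, not part of the original files.

```pig
-- Field names and types below are assumptions inferred from the sample data.
people = load '/people' using PigStorage(',')
         as (person_id:int, address_id:int, name:chararray);

-- PigStorage(',') splits on every comma, so each address line yields five fields.
address = load '/address' using PigStorage(',')
          as (address_id:int, street:chararray, city:chararray,
              state:chararray, zip:chararray);

-- Same equi-join as in the session above, keyed by field name instead of position.
jnd = join people by address_id, address by address_id;

-- After a join, fields are qualified as relation::field, which resolves
-- names such as address_id that exist on both sides.
result = foreach jnd generate people::name, address::street, address::city;

dump result;
```

The foreach step is optional. It is included here only to show how to pick individual columns out of the joined relation, much like a select list in SQL.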
