Tuesday, December 24, 2013

Hello Hive!

New year....new job!

So the next chapter is in ad-tech and big data. Before a few weeks ago I had only heard of hadoop and hive from a couple articles I had read. Wasn't really sure what MapReduce even meant other than

"MapReduce is a programming model for processing large data sets with a paralleldistributed algorithm on a cluster" - wikipedia

.....

umm right.....

Fast forward to now.  After some more reading and understanding the above statement is much clearer.  Hadoop basically cuts a "job" up and a master node tells a bunch of worker nodes what part of the data to work on and how to work on them.  This is all abstracted out from the programmer however as once Hadoop is configured the programmer just runs his/her job and hadoop handles the distributing.

So what's hive?   Hive is making hadoop SQLesque.  It's an abstraction of hadoop to let programmer brains reuse SQL skills instead of having to write hadoop map reduce classes all day long.

Anyway on to an example!

The source is available @ https://github.com/asharif/hello_hive

You need to install hadoop and hive and have both available in the PATH.

There are only two files here.  hello_hive.q and hello_hive.py.

hello_hive.q

CREATE TABLE words (word STRING ) ROW FORMAT DELIMITED LINES TERMINATED BY '\n' STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/Users/asharif/development/hello_hive/input' OVERWRITE INTO TABLE words;

add FILE hello_hive.py;

INSERT OVERWRITE TABLE words
SELECT
TRANSFORM (word) USING 'python hello_hive.py'
AS (word)
FROM words;

SELECT * FROM words;

The above hive script creates an imaginary table using the file in the given path (don't forget to change that path!).  It uses the '\n' as the row delimiter so each new line is a new row in the table.

Then the fun part.  A simple python script is added and the python script is used to overwrite the contents of the table.

hello_hive.py


import sys

for line in sys.stdin:
        print "hive row: " + line

As you can see all the python script does is prepend the text "hive row: " to each row.

Now if you run this

hive -f hello_hive.q


you will get the following output:


hive row: i

hive row: am

hive row: the

hive row: best

hive row: i

hive row: am

Time taken: 0.098 seconds, Fetched: 12 row(s)

Sunday, November 10, 2013

FizzBuzz

So I started interviewing after about 5 years for a new position. I was surprised how many places required simple algorithms and sophomore/junior year computer science trivia. This skews the interview in favor of new grads as they are not too far removed from these classes and they are use to artificially solving questions without a compiler and on paper. Maybe this is on purpose to filter out the older crowd?

Word to the wise the interview landscape has changed a bit. Nobody cares what you have done or your experience. Anyway us "old" timers (30 somethings lol) must adapt or become real estate agents. What is the best way? Lots of practice and studying. I busted out my "Intro to Algorithms" by Cormen and went to work. Here is my fizzbuzz repo to help fellow coders out.

https://github.com/asharif/fizzbuzz

Now next step. I need to buy Cracking the coding interview http://www.amazon.com/Cracking-Coding-Interview-Fourth-Edition/dp/145157827X and bust out my notebook and try to get the little practice problems right on paper.

Monday, September 30, 2013

Sorting algorithm refreshers........Quick Sort!

Another sorting algorithm post. This is probably the most famous one. Operating at O(n log n) it's considered very good and not crappy at all! I might make some more posts with other sorts like merge sort and radix sort, but I think I've reviewed sorting enough for now!

steps are the following:
  1. Choose a pivot (I just pick the right most in array)
  2. Parition the array such that all values less than pivot are to the left and all values greater than pivot are to the right of the array
  3. Place the pivot in it's correct spot after partitioning
  4. Recursively do the previous steps for each array partition


#include


int list[] = { 3, 4, 1, 10, 23, 40, 2, 9};
int arr_length = sizeof(list)/sizeof(list[0]);

int partition(int left, int right, int pivot);
void rec_quick_sort(int left, int right);


int main(void) {
    
    
    rec_quick_sort(0, 7);
    
    
    
    
    for( int i = 0; i < arr_length; ++i) {
        printf("%d ", list[i]);
    }
    
    printf("\n");
    
}


void rec_quick_sort(int left, int right) {
    
    if( right - left <= 0 ) {
        return;
    }
    
    int pivot = list[right];
    
    int partition_i = partition(left, right, pivot);
    rec_quick_sort(left, partition_i-1);
    rec_quick_sort(partition_i, right);
}

int partition(int left, int right, int pivot) {
    
    
    int left_p = left -1;
    int right_p = right;
    
    while(1) {
        
        
        //find bigger than pivot
        while(++left_p < right_p && list[left_p] < pivot);
        
        //find smaller than pivot
        while(--right_p > left_p && list[right_p] > pivot);
        
        
        if(left_p >= right_p) {
            break;
        
        }else {
            
            int temp = list[left_p];
            list[left_p] = list[right_p];
            list[right_p] = temp;
        }
        
        
    }
    
    int temp2 = list[left_p];
    list[left_p] = list[right];
    list[right] = temp2;
    
    return left_p;
    
    
    
}

Sorting algorithm refreshers........Insertion Sort!

Here is the 3rd and final of the "crappy" sorting algorithms. It too performs at the O(n^2) growth rate. This algorithm is better than the previous 2 however since it is designed for inserting. This means that the algorithm assumes part of the array is sorted so therefore less swaps.

  1. Start with a outer loop that starts ONE index above the very first element and counts till the end of the array
  2. Store the contents of the array at the outer loops index
  3. Inside have another loop that is responsible for shifting all the values less than/greater than the value in array pointed to by the outer loops index
  4. Once everything has been shifted place the value from step 2 into the newly created space from step 3
#include

int main(void) {
    
    int list[] = { 3, 4, 1, 10, 23, 40, 2, 9};
    
    int arr_length = sizeof(list)/sizeof(list[0]);
    
    for(int i = 1; i < arr_length; ++i){        
        
        int curr = list[i];
        int curr_i = i;
        
        while(curr_i > 0 && list[curr_i-1] > curr ) {
            
            list[curr_i] = list[curr_i -1];
            curr_i--;
        } 
        
        list[curr_i] = curr;
               
    }
    
    for( int i = 0; i < arr_length; ++i) {
        printf("%d ", list[i]);
    }
    
    printf("\n");
    
}

Sorting algorithm refreshers......Selection Sort!

So post two of sorting algorithm refreshers. This time it's Selection Sort with the same crappy O(n^2) growth rate but with possibly less swaps.

  1. Start an outer loop from the start of the array counting till the end of the array
  2. Start an inner loop starting from the outer loops counter and going till the end of the array
  3. Using the inner loop keep track of the smallest/largest value
  4. After inner loop is complete swap the smallest value you found with the contents that is pointed to from the index of the outer loop
#include

int main(void) {
    
    int list[] = { 3, 4, 1, 10, 23, 40, 2, 9};
    
    int arr_length = sizeof(list)/sizeof(list[0]);
    
    for(int i = 0; i < arr_length; ++i){        
        
        
        int min = i;
        for( int j = i+1; j < arr_length; ++j) {
            
            if( list[j] < list[min]) {
                min = j;
            }
        }
        
        if ( min != i) {
            
            int temp = list[i];
            list[i] = list[min];
            list[min] = temp;
        }
       
        
    }
    
    for( int i = 0; i < arr_length; ++i) {
        printf("%d ", list[i]);
    }
    
    printf("\n");
    
}

Sorting algorithm refreshers...Bubble Sort!

So it's been awhile since I did my sorting algorithms. It's funny how sophmore/junior year computer science can trip me up if I haven't thought about it in awhile. Anyway the next few posts will be for future reference. They are all in C and I'll start with the root of all evil Bubble Sort and it's very very bad O(n^2) growth rate!

Steps for bubble sort:
  1. Start at the end of the array with the outer loop and count backwards till the start of the array.
  2. Have an inner loop start at the start of the array and count forward till it reaches the index of the outerloop from step 1.
  3. Within the inner loop if you meet any two elements (j & j+1) that are not sorted....sort them!

#include

int main(void) {
    
    int list[] = { 3, 4, 1, 10, 23, 40, 2, 9};
    
    int arr_length = sizeof(list)/sizeof(list[0]);
    
    for(int i = arr_length-1; i >= 0; --i){        
        
        for(int j = 0; j < i; ++j) {
            
            if(list[j] > list[j+1]) {
                
                int temp = list[j];
                list[j] = list[j+1];
                list[j+1] = temp;
            }
        }
        
    }
    
    for( int i = 0; i < arr_length; ++i) {
        printf("%d ", list[i]);
    }
    
    printf("\n");
    
}


Wednesday, September 25, 2013

Groovy why do you suck so badly?

I have had it. I've tried it. I hate it. I hate the hype. I hate the people who use it. I hate all of it. I'm going going back back to Java Java.... Check out this exception:
GroovyCastException: Cannot cast object '54' with class 'java.lang.String' to class 'java.math.BigDecimal'
I NEED a BigDecimal. Do I seriously have to cast the class to an int first? What good is groovy if it can't figure such a thing out. It wastes way more time dealing with absolute BS things like this than to actually write Java. Groovy isn't even a real language. Just a bunch of template hacks on Java. I'm over it. I'm going back to Java.

Wednesday, September 18, 2013

Circuit Design, Embedded Programming with C and JavaFX

So awhile back I decided that I would not be shown up by EEs with their cool understanding of manipulating the physical world while all I was able to do was manipulate the virtual world. I decided to learn circuit design and embedded programming. I remember seeing a cool youtube video of lights dancing around in tune with music frequency so I thought let me try.

Anyway, once I got the basic breadboard and electronic components down it was rather ez. Electronics is not that much different than programming really. There are components that help you control the flow of current (not unlike software components that help you control the flow of data). The micro controller I used was an Atmega168. It took care of the analog to digital conversion. I wrote a simple program in C and flashed the microcontroller with it. All it did was continuously read from a data in port (serial) and opened one of it's I/O ports to let current through to LEDs. Then on the computer I wrote a JavaFX app that detected the frequencies in a music file and in real time wrote to a port when the frequencies were at certain levels.

Anyway here is the example. I can put the source code up if there is any interest.

Monday, September 16, 2013

Java String.toUpperCase() ...what the?

Just the other day I ran into a strange strange bug. I had a string of characters that I had to build. And as a delimiter the host system I was communicating with used char 254. Anyway I build out my string and sent it to the host. On the host I was receiving char 222 as the delimiter! After scratching my head and looking into it deeper it seemed odd that

hex : FE, binary: 11111110

was turning into

hex: DE, binary: 11011110

I tried the Locale.getDefault() and Locale.ENGLISH to no avail.

Could it be that the implementation of String.toUpperCase has a mask for ALL chars except specific hard coded ones? I have no clue, but I wrote the following to get around the problem:

public static String toUpperCase(String input) {
        
        char[] chars = input.toCharArray();
        
        
        for(int i = 0; i < chars.length; ++i ) {
            
            if( chars[i] > 96 && chars[i] < 123 ) {
                
                chars[i] &= 223;
            }
            
        }
        
        return new String(chars);
        
    }

lower case ASCII values a-z are 97-122 so I just bitwise AND it them with 223 (11011111) to get the upper case equivalents. UPDATE: Seems like Java is working perfectly well and it was my understanding. Char 254 is a real character. It's uppercase is char 222! See the following: http://www.scism.lsbu.ac.uk/jfl/Appa/appa4.html

Sunday, April 7, 2013

ClassStalker......hot reload for any java maven app without Jrebel! Yes (save refresh like Play framework or Grails now in vanilla Java)

Credit goes to spring source and vmware. I just wrote some utility code to let any maven app have save refresh changes without full recompiles and incremental builds. And no need for Jrebel! This is much the benefit of Play! and Grails..and of course the other dynamic languages. Here is the link. Read the readme or just dump all files under bin in the root of your project and run the run.sh file https://github.com/asharif/classstalker

Sunday, March 31, 2013

bootstrap4j, ez way to start open source java web development

So when I first started Java web development I started with open source and I was completely lost.  So many frameworks so many libraries.  It took me what felt like forever to get settled.

Maven archetypes were essentially what helped me understand how to control everything and not commit 500 jars to my source code repo. Sill even with maven, getting a hello world up with all the standard frameworks was very time consuming.

I decided to make a maven archetype.  Include all the things of a standard application.  I was originally going to just publish it to the maven archetype repo but thought this might be even simpler for newbies.

https://github.com/asharif/bootstrap4j

just read the readme file, it's only two steps.  The first time it runs it will take awhile as it's downloading all the jars for the project template but after that you can get up and running with Java development in a few second.....that's right not a few days...lol

Wednesday, February 6, 2013

Image to Zebra GRF

So we use a Zebra printer at work to print thermal labels. Well I wanted to add an image to the label. Zebra makes you use a windows only all in one tool that is way too bloated for my taste. So a bit of research and I made my own to convert BMP images to GRF ascii. It's in Java and it now converts multiple different image formats. Note that the branch is still under testing since I don't have a zebra printer with me. Branch is nativeImageAPI https://github.com/asharif/img2grf