Saturday, January 16, 2016

The Revenant - 5.0 out of 5 stars - Intelligent movie alert! - Even if there were no plot, it would be worth your time to watch this movie for the wild nature scenery!

Amazing scenery. Amazing storytelling. I really felt like I was there in the cold mountains in 1850, truly amazing. If you are intelligent you won't see it as a waste of time even though this is a long movie; the shots of the wild mountains alone are a good enough reason to watch it, even if it had no plot.

Wednesday, December 09, 2015

Scala map and flatMap - the missing explanation

Map
  1. You provide a function that maps item -> item.
  2. map scans every item in the input container.
  3. map applies your function to each item.
  4. map creates the same kind of container as the one originally provided (a List for a List) and wraps the results in it, as sketched below.
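A minimal sketch you can paste into the Scala REPL:
val words = List("foo", "bar", "bazz")
val lengths = words.map(w => w.length)
// lengths == List(3, 3, 4): still a List, one output item per input item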
flatMap
  1. You provide a function that maps item -> Container[Item], so flatMap expects you to create a container for each item it scans.
  2. flatMap then creates the same kind of external container as the original.
  3. flatMap then strips each item out of the inner container your function returned and adds it to the external container it is building, the same kind as the one originally provided (see the sketch below).
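And the flatMap counterpart, again REPL-pasteable:
val words = List("foo", "bar")
val chars = words.flatMap(w => w.toList) // your function returns a Container[Item] (here a List[Char]) per item
// chars == List('f', 'o', 'o', 'b', 'a', 'r'): the inner Lists were stripped and their
// items added to one outer List, the same kind of container as the input.
// Compare: words.map(w => w.toList) == List(List('f', 'o', 'o'), List('b', 'a', 'r'))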

Sunday, November 15, 2015

Squash multiple push in git

And my way of squashing multiple pushed commits (perhaps you pushed many commits to your own branch and now you want to open a pull request without cluttering it with all the commits you have already pushed). The way I do it (I can't see a simpler option) is:
  1. Create a new branch just for the squash (branch off the original branch you wish to pull request into).
  2. Push the newly created branch.
  3. Merge the branch with the already-pushed commits into the new branch.
  4. Rebase the new branch interactively and squash.
  5. Push the new branch.
  6. Create a new pull request for the new branch, which now has a single commit.
Example:
git checkout branch_you_wish_to_pull_request_to
git checkout -b new_branch_will_have_single_squashed_commit
git push -u origin new_branch_will_have_single_squashed_commit
git merge older_branch_with_all_those_multiple_commits
git rebase -i branch_you_wish_to_pull_request_to (here you squash)
git push origin new_branch_will_have_single_squashed_commit
You can now pull request into branch_you_wish_to_pull_request_to

Sunday, October 04, 2015

Best lecturers! Feynman Prize Recipients

2014-2015 Kevin Gilmartin, English
2013-2014 Steven Frautschi, Theoretical Physics
2012-2013 John Johnson, Planetary Astronomy
2011-2012 Paul Asimow, Geology and Geochemistry
2010-2011 Morgan Kousser, History and Social Science
2009-2010 Dennis Dougherty, Chemistry
2008-2009 Shuki Bruck, Computation and Neural Systems and Electrical Engineering
2007-2008 Zhen-Gang Wang, Chemical Engineering
2006-2007 Michael Brown, Planetary Astronomy
2005-2006 Richard Murray, Control and Dynamical Systems
2004-2005 Christopher Brennen, Mechanical Engineering
2003-2004 George Rossman, Mineralogy
2002-2003 Niles Pierce, Applied and Computational Mathematics
2001-2002 Joseph Kirschvink, Geobiology
2000-2001 David Stevenson, Planetary Science
1999-2000 Donald Cohen, Applied Mathematics
1998-1999 Emlyn Hughes, Physics
1997-1998 Barbara Imperiali, Chemistry
1996-1997 R. David Middlebrook, Electrical Engineering
1995-1996 Yaser Abu-Mostafa, Electrical Engineering and Computer Science
1994-1995 Erik Antonsson, Mechanical Engineering
1993-1994 Tom Tombrello, Basic and Applied Physics

Lecture 04 - Error and Noise

Queue animation with d3

Got a need to animate your queue? Me too! Below is an example of how it works; if this page works for you, you will see an animated queue below this section. A few words about the components created for this animation:
  • polling.js - polls a server and gets the current queue state.
  • animation.js - knows how to animate the queue.
  • conf.js - generic configuration; tweak it for your needs.
  • parser.js - parses the response the server returns via polling.js.
  • queue.html - example that ties the parts together and shows how to use them.


Thursday, September 24, 2015

Referring to internal organization docker repository

In Maven you set your internal repository in settings.xml; in Docker you can just pass it to the pull command, for example:
docker pull $DOCKER_REPO_HOST:$DOCKER_REPO_PORT/$REPO_NAME

Tuesday, September 22, 2015

Masterpiece Alert! - Sergei Eisenstein: Que viva Mexico! (1931) Movie Review

Masterpiece! I don't give a damn that I don't understand the language; I watched it without subtitles and it's just beautiful. It's amazing to see actual life in Mexico in 1931, pure fun, pure people, watching how they live. Very different from today's movies, documentaries or nature films; it's just like you are there, you see the things, you go back to 1931 and to Mexico!!

Show column names on hive

Show column names in Hive with:
hive> set hive.cli.print.header=true;

To make Vertica work in IntelliJ use the DbVisualizer Vertica JDBC jar, otherwise you get many exceptions

See the response by kesten here; his method worked for me. The driver I used to make Vertica work in IntelliJ is the Vertica JDBC driver that ships with the DbVisualizer software. Example JDBC connection string: jdbc:vertica://HOST:5433/DBNAME

Saturday, September 19, 2015

Battleship Potemkin movie review (1925) [HD]

There is something special about these old movies; it's like they invented everything! At times I couldn't believe this movie is so old. It's as if these old movies invented everything and nothing new has been invented in the past 100 years of filmmaking. The movie is one big piece of propaganda, but I don't mind it. I heard that Eisenstein has better movies than this one and I would like to watch them. What I really liked about it were the shots; those are great. You get to see how people dressed 100 years ago, for real! You get to see what the atmosphere was like there, the sea, the ships; it's great to see all this stuff, especially with this kind of plot. I would not call it the plot of the year, but it's worthwhile.

Monday, January 26, 2015

hadoop java.lang.IllegalArgumentException: Unknown codec: com.hadoop.compression.lzo.LzoCodec

if you just got:
2015-01-26 07:51:55,293 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201501260748_0001_m_000001_3: java.lang.IllegalArgumentException: Unknown codec: com.hadoop.compression.lzo.LzoCodec
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1847)
at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1765)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1714)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1728)
at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:59)

go to: https://code.google.com/p/hadoop-gpl-packing/downloads/list

download the jar and the rpm, install the rpm, and then:

sudo cp /opt/hadoopgpl/native/Linux-amd64-64/* /usr/lib/hadoop/lib/native/
sudo cp /opt/hadoopgpl/lib/* /usr/lib/hadoop/lib/

Wednesday, December 24, 2014

Hadoop keypoints for beginners

Key Concepts

  • Every mapper communicates with all reducers (potentially sending data to all of them).
  • Shuffle - communication from mappers to reducers.
  • Block Size - files are split into blocks; a file of any size simply gets split into blocks (64MB / 128MB ...).
  • Partitioner - splits the map output among the reducers by hashing the key, so the same key always reaches the same reducer.
  • If you want your result to end up in a single output file you need a single reducer.

Hadoop default configuration

  • /etc/hadoop/conf

Jobs management

  1. Master sends the actual jar to be executed to data nodes.
  2. Hadoop starts copying data from mappers to reducers even before all the mappers have finished, so the reducers can already start fetching their input.
  3. hadoop jar is the command that runs your jar (your input data should already be on the cluster, uploaded with hadoop fs -put).
  4. Show the list of running jobs: mapred job -list

Hadoop Map Reduce

  1. FileInputFormat takes a file and, if it spans multiple blocks, splits it across multiple mappers.
  2. You should have a main class (which is neither the mapper nor the reducer); it receives the command line parameters and uses FileInputFormat to split the input file (which usually spans multiple blocks) across multiple mappers. FileInputFormat uses the input path; similarly there is a FileOutputFormat for the output.
  3. When the output type is a String, for example, you refer to it as Text: job.setOutputKeyClass(Text.class).
  4. You can give the input path wildcards (*) to match all files in a directory or all directories, or just use the Job API to add multiple paths.
  5. From your main you have access to the Job object, which manages the job.
  6. The key object handed to the mapper is the offset into the file (the value holds the actual record).
  7. In the mapper you use context.write to write the result (the context is a parameter to the mapper).
  8. If you write a string as mapper output you wrap it with new Text(someStr).
  9. The mapper is a pure function: it gets an input key and value and emits one key/value pair or several. Since it should stay as pure as possible it is not intended to hold state, meaning it will not combine its own results internally (see combiners if you need that).
  10. Reducers receive a key -> values; reduce is called once per key, and the keys arrive sorted.

Example Hadoop Map Reduce

You need to have 3 files.
  1. Job manager.
  2. Mapper.
  3. Reducer.
Let's see each of them in an example.
1. JobManager
/**
   * Setup the job.
   */
  public static void main(String[] args) throws Exception {

    // inputs
    if (args.length != 2) {
      System.out.printf("Usage: YourJob <input dir> <output dir>\n");
      System.exit(-1);
    }

    // Set job inputs outputs.
    Job job = new Job();
    job.setJarByClass(YourJob.class);
    job.setJobName("Your Job Length");
    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.setMapperClass(YourMapper.class);
    job.setReducerClass(YourReducer.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class); // must match what YourReducer emits

    // exit..
    boolean success = job.waitForCompletion(true);
    System.exit(success ? 0 : 1);
  }
2. YourMapper
public class YourMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {

      final String line = value.toString();
      for (String word: line.split("\\W+")) {
         context.write(new Text(word), new IntWritable(1)); // write mapper output.
      }
    }
}
3. YourReducer
public class YourReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

  /**
   * Note you get a key and then a list of all values which were emitted for this key.
   * The keys which are handed to reducers are sorted.
   */
  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {

      int count = 0;
      for (IntWritable value: values) {
          count += value.get(); // sum the counts (works whether or not a combiner already ran)
      }
      context.write(new Text(key), new IntWritable(count)); // we use context to write output to file.

  }
}

Tool Runner

ToolRunner formalizes the command line arguments of Hadoop jobs so you don't need to mess with them yourself; every job follows the same pattern:
hadoop jar somejar.jar YourMain -D mapred.reduce.tasks=2 someinputdir someoutputdir

State VS Stateless

Map and reduce are inherently stateless (pure functions), which is very nice, and you should stick to that as much as possible. However, if you do need state (but really, try to find other ways before resorting to it), you can use the public void setup(Context context) method, a standard setup hook. Likewise there is a cleanup method which is called after map/reduce finishes; it's quite common, if you kept some state, to write it out to disk in cleanup (see the sketch after the ToolRunner example below). Note: in order to use ToolRunner your job main class should extend Configured and implement Tool; then instead of putting the logic in main you override the run method of the Tool interface.
public class YourJob extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // build and submit the Job here (as in the main example above), then return 0 for success
        return 0;
    }
    .
    .
}
You still need the main method as you need to explicitly run the run method.
public static void main(String[] args) throws Exception {
  int exitCode = ToolRunner.run(new Configuration(), new YourJob(), args);
  System.exit(exitCode);
}
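Back to the setup/cleanup hooks mentioned in the State VS Stateless note above: here is a minimal sketch in Scala against the same Hadoop API (the class name, the kept state and the local side file are all hypothetical).
import java.io.{BufferedWriter, FileWriter}
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.Mapper

// Hypothetical mapper that accumulates a little state and flushes it in cleanup().
class StatefulMapper extends Mapper[LongWritable, Text, Text, IntWritable] {

  private val seen = scala.collection.mutable.Set[String]()

  // setup() runs once per task, before the first call to map().
  override def setup(context: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit =
    seen.clear()

  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit = {
    seen += value.toString
    context.write(value, new IntWritable(1))
  }

  // cleanup() runs once per task, after the last call to map(); a common place to write state out.
  override def cleanup(context: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit = {
    val out = new BufferedWriter(new FileWriter("seen-values.txt")) // hypothetical local side file
    try seen.foreach { v => out.write(v); out.newLine() } finally out.close()
  }
}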

Combiner - the local mapper mini-reducer

In wordcount the mapper emits a 1 for every word, which is kind of wasteful. To reduce the noise and the communication between mappers and reducers you can use a combiner to pre-combine the mapper results locally:
job.setCombinerClass(MiniReducer.class) (it can actually be your standard reducer)
mapper:   (pointer, house) --> (house, 1)
mapper:   (pointer, house) --> (house, 1)
combiner: (house, 1), (house, 1) --> (house, 2)   Note: the combiner runs locally on the mapper.
reducer:  (house, 2), (house, 3), ... --> (house, 5)

No map reduce, simple HDFS access

You can also access HDFS programmatically without any relation to map/reduce.
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path p = new Path("/somepath/file");
// then e.g. fs.exists(p), fs.open(p), fs.delete(p, false) ...

Distributed Cache

A mechanism to distribute files to all mappers so that when they start they already have the data locally (usually configuration or other side data): DistributedCache.addCacheFile.. and more; see its API for details. You can also skip this API and just pass -files conf1.conf,conf2.conf ... to ToolRunner and it will distribute them as part of the hadoop jar invocation. Note that when the mapper loads such a file it does not need any directory prefix; it simply opens the file by name (the task has it in its current working directory). A minimal sketch follows.
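A minimal sketch of both sides in Scala, assuming a Hadoop 2.x Job set up as in the examples above (the file name side-conf.conf and the object name are hypothetical):
import java.net.URI
import org.apache.hadoop.mapreduce.Job
import scala.io.Source

object SideDataSketch {
  // Driver side: register a side file with the job (Hadoop 2.x Job API; the older
  // DistributedCache.addCacheFile(uri, conf) call does the same thing).
  def registerSideFile(job: Job): Unit =
    job.addCacheFile(new URI("/config/side-conf.conf")) // path in HDFS

  // Task side (e.g. inside Mapper.setup): the cached file shows up in the task's
  // working directory, so it is opened by plain file name, no directory prefix.
  def loadSideFile(): List[String] =
    Source.fromFile("side-conf.conf").getLines().toList
}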

Counters

Mappers and reducers can read and write counters keyed by (group, name) --> int value. Counters should be used only for management and monitoring, not business logic (each of them can read any of the others, and updates are not atomic).
context.getCounter("counter-group","counter-name").increment(1);

Useful hadoop commands

  • hadoop fs -rm -r myoutputdir
  • hadoop fs -cat myoutputdir/part-r-00000 | head
  • hadoop jar myjob.jar YourJobMain -D someparam=true inputdir outputdir
  • hadoop jar myjob.jar MyJobMain -fs=file:/// -jt=local inputdir outputdir (run a local job)

Partitioners

  • TotalOrderPartitioner - lets you partition the data and still preserve a global order across the output files (e.g. a-h in file 1, h-t in partition 2) while keeping the files at similar sizes, instead of most of the data landing in one file (for example when you have many words starting with m and only one starting with b). A hand-rolled partitioner is sketched below.
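To see what a partitioner actually does, here is a minimal hand-rolled sketch in Scala against the Hadoop API (the class name is made up; it routes by key hash, which is also what the default HashPartitioner guarantees):
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.Partitioner

// Same key => same hash => same reducer: the basic partitioner contract.
class WordPartitioner extends Partitioner[Text, IntWritable] {
  override def getPartition(key: Text, value: IntWritable, numPartitions: Int): Int =
    (key.toString.hashCode & Integer.MAX_VALUE) % numPartitions
}

// wired in from the driver: job.setPartitionerClass(classOf[WordPartitioner])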

CAUTION

  • When your reduce method receives its Text values one after another, Hadoop may reuse the same object reference and only change its contents, so do not store the reference itself; by the time you read it, it may hold a different value. Copy the value out, as in the sketch below.
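A minimal sketch of the safe pattern in Scala (the class name and the buffer are just for illustration):
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.Reducer
import scala.collection.mutable.ArrayBuffer

// Hadoop reuses the value object between iterations, so copy its contents out
// instead of keeping the reference.
class CopyBeforeKeepReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      context: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
    val copies = ArrayBuffer[Int]()          // safe: holds copied primitives
    // val refs = ArrayBuffer[IntWritable]() // unsafe: entries would all point at the same reused object
    val it = values.iterator()
    while (it.hasNext) {
      copies += it.next().get()
    }
    context.write(key, new IntWritable(copies.sum))
  }
}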

Scripting languages

Hive
  1. Kind of an SQL for map reduce jobs.
  2. Can add user defined functions.
Pig
  1. Kind of a scripting language for map reduce jobs.

Other Components

  1. Data transfers - Flume, Sqoop
  2. Workflow management - Oozie
  3. Resource management - YARN
Impala
  1. Instead of starting a job, reading data and writing data per query, Impala daemons are long-lived servers on the Hadoop cluster, so it is much faster (it is not standard MapReduce; not MapReduce at all).
  2. Query language similar to HiveQL

Tuesday, December 23, 2014

Creating and accessing mysql with docker

# Start a mysql container first (official mysql image), then link the app container to it
docker run --name some-mysql -e MYSQL_ROOT_PASSWORD=mysecretpassword -d mysql
docker run --name some-app --link some-mysql:mysql -d application-that-uses-mysql
docker ps
# Now to get the mysql ip address
docker inspect [container id from ps for the mysql instance]
# Replace the ip below with the real mysql ip from inspect
mysql --host=172.17.0.5 --user=root --password=mysecretpassword