My missives

A technologist's view of things …

Author Archives

ksankar (http://doubleclix.wordpress.com/): Am a technologist by profession - interests include Cloud Computing, Big Data, Analytics Clouds, Precision Time Synchronization, Ubuntu Linux, Robotics, ...

TCS Siruseri Campus

May 18, 2013 by ksankar

Today am visiting the TCS Siruseri campus in Chennai. Very elegant & interesting structure built by a Uruguayan architect.

  • The legend has it that while discussing the architecture, Carlos drew butterflies & that stuck as the theme.
  • A majestic 5-floor-high open atrium corridor forms the spine of the butterfly, with 6 buildings forming the wings. The buildings themselves are butterflies, each with a small spine (the elevator bank) in the middle and two wings, north & south.
  • The side view below shows three buildings: EC1 (right), EC2 & EC3. I am in EC3.
  • The spine atrium on the far side has a pond, benches, shops & interesting restaurants – The Saravana Bhavan food is exotic. The spine even has a Subway !
  • The campus hosts > 20,000 associates

The second picture below shows the full 6 buildings and the observation tower (which is still under construction)

image description

1417_1


Scaling Big Data – Impermium

May 8, 2013 by ksankar

Came across an informative blog on scaling big data – “Built to Scale: How does Impermium process data?” Quick notes from the blog:

  1. Don’t fall in love with a technology so much that you cannot be separated – Be flexible in scaling as you grow

    • “Parting is such sweet sorrow”, but change is an essential component of an infrastructure at scale
    • The technology selection and consumption should be a continuous process, introducing new technologies as needed by the growth. I found Impermium’s path from grep to Solr to Elastic Search very illuminating; I have done the same before.  
  2. Technology needs are not static

    • A corollary of #1 above – Growth on all parts of the stack will not be uniform.
    • For example Impermium found scaling challenges in search and they moved to Solr & then to Elastic Search
  3. There are no perfect technologies

    • If you are doing interesting work, be ready to tango with open source code. This is essential – I also found this to be true.
    • Even if you don’t plan to change the code, many times deep understanding comes from reading the code
  4. Select technologies that you can dance with

    • The flip side is that one should select technologies that one is comfortable working with under the hood.
    • In my case, while I love Erlang, I am not that comfortable with that language. So given a chance, I will go with Java or Scala
  5. Benchmark is nothing but a story in a specific context

    • So true. Benchmarks are transitory & personal.
    • Understand them, but they need not be true for your transforms, your data model and your processing.
    • Benchmark early & benchmark often … with your scenarios, models, transformations, mapreduces & data
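
To make the last point concrete, here is a minimal sketch of a benchmark harness built on Python's standard `timeit` module. The workload (`records`) and the two candidate functions are hypothetical stand-ins, not anything from the Impermium blog; the point is only that you swap in your own data and transforms and rerun as they change.

```python
import timeit
from collections import Counter
from typing import Callable, Dict


def benchmark(candidates: Dict[str, Callable[[], object]],
              repeat: int = 5, number: int = 50) -> Dict[str, float]:
    """Return the best-of-`repeat` time in seconds per call for each candidate."""
    results = {}
    for name, fn in candidates.items():
        timings = timeit.repeat(fn, repeat=repeat, number=number)
        results[name] = min(timings) / number
    return results


# Hypothetical workload: replace with your own data model and processing.
records = [f"user{i % 50} event{i}" for i in range(2000)]


def count_with_dict() -> dict:
    # Count events per user with a plain dict.
    counts = {}
    for rec in records:
        user = rec.split()[0]
        counts[user] = counts.get(user, 0) + 1
    return counts


def count_with_counter() -> Counter:
    # Same aggregation, using collections.Counter.
    return Counter(rec.split()[0] for rec in records)


results = benchmark({"dict": count_with_dict, "Counter": count_with_counter})
for name, secs in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name}: {secs * 1e6:.1f} us per call")
```

The numbers it prints are a story about this machine, this data shape and this Python version, which is exactly why it is worth rerunning with your own.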

Thanks Young for the short but very interesting blog. Keep up the good work …

Cheers

<k/>


Kareem Abdul-Jabbar: 20 things I wish I had known when I was 30

May 1, 2013 by ksankar

Kareem-Abdul-Jabbar-Skyhook-Wallpaper

Kareem Abdul-Jabbar has an excellent blog at Esquire on 20 pieces of advice to his younger self at 30. Thanks to Jason Hiner’s tweet.

The blog & the comments are a must read.

A few of them hit home for me:

  • Be patient
  • Listen More than Talk
  • Being right is not always the right thing to be
  • Do one thing every day that helps someone else.
  • Do one thing every day that you look forward to doing. 
  • Don’t be so quick to judge.
  • Everything doesn’t have to be fixed.
  • Play the Piano
  • Become Financially Literate

Ref: Wallpaper from http://www.basketwallpapers.com/USA/Kareem-Abdul-Jabbar/


Data Science Engineers – The new breed of Data Scientists ?

April 16, 2013 by ksankar

While there are lots of interesting discussions about Data Scientists, or the lack thereof, the role of Data Science in Big Data is well understood. I think the need is actually for Data Science Engineers. I had a set of pictures explaining this concept and, interestingly, came across a blog by Hortonworks on the topic of Data Scientists.

Image

 

Data Science Engineers 01-01

What says thee ?


Google Doodle on Euler’s 306th Birthday

April 16, 2013 by ksankar

Image

 

Good work guys …


Is our Neocortex a Giant Semantic Bloom Filter ? Of Natural Intelligence, Machine Learning & Jeff Hawkins

April 14, 2013 by ksankar

L’Apéritif:

Image

In a set of four lectures spanning about 3 years, Jeff Hawkins explains how & why big data problems can only be solved by evolutionary-adaptive-continuously-learning models incorporating principles from the workings of the Neocortex.
It does make sense – especially for NLP, NLU & Knowledge Representation. I am a big fan of the Borgs and their coordinated intelligence.

These are my annotated picture-notes …

L’Entrée:

Let me begin at the beginning. The other day I came across 4 very interesting talks by Jeff Hawkins on Biologically Inspired Machine Intelligence.

Call it serendipity, because we have been looking for more effective ways for Knowledge Representation (KR) & Natural Language Understanding (NLU).

For example, movie names are very easy for humans to understand, but a MaxEnt NER finds them very hard. Knowledge Representation & Association is even harder !

We are experimenting with a few techniques like word-based tries (ie. spell-check sentences by words), higher order federated Bloom Filters and n-gram hashing. Planning to incorporate some of Jeff’s ideas …
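
As a rough illustration of two of these techniques, here is a minimal sketch of a word-based trie (nodes keyed on whole words rather than characters) and word n-gram hashing. The class and function names, the bucket count, the CRC32 hash and the movie titles are all assumptions made for the example; this is not the code we are actually experimenting with.

```python
import zlib
from typing import Dict, List, Optional


class WordTrie:
    """Trie whose edges are whole words, so a path spells out a phrase."""

    def __init__(self) -> None:
        self.children: Dict[str, "WordTrie"] = {}
        self.is_end = False  # True if a known phrase ends at this node

    def insert(self, phrase: str) -> None:
        node = self
        for word in phrase.lower().split():
            node = node.children.setdefault(word, WordTrie())
        node.is_end = True

    def longest_match(self, words: List[str], start: int = 0) -> Optional[str]:
        """Return the longest known phrase beginning at words[start], if any."""
        node = self
        best_end = None
        for i in range(start, len(words)):
            node = node.children.get(words[i].lower())
            if node is None:
                break
            if node.is_end:
                best_end = i + 1
        if best_end is None:
            return None
        return " ".join(words[start:best_end])


def word_ngram_hashes(text: str, n: int = 2, buckets: int = 1 << 16) -> List[int]:
    """Hash every word n-gram of `text` into one of `buckets` slots."""
    words = text.lower().split()
    grams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return [zlib.crc32(g.encode("utf-8")) % buckets for g in grams]


trie = WordTrie()
for title in ["The Matrix", "The Matrix Reloaded", "Up"]:
    trie.insert(title)

tokens = "I watched The Matrix Reloaded last night".split()
print(trie.longest_match(tokens, start=2))  # The Matrix Reloaded
print(word_ngram_hashes("The Matrix Reloaded"))
```

The trie gives exact, longest-phrase lookups over a known dictionary of names, while the hashed n-grams are the kind of feature that could feed a Bloom Filter.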

I digress … Topics for another day … back to Jeff & Machine Intelligence …

Very inspiring, extremely thought provoking talks – as usual the inimitable Jeff Hawkins at his best

  1. Google Tech Talk : Jeff Hawkins, “Building Brains to Understand the World’s Data“
  2. UC Berkeley Graduate Lectures
    • “Intelligence & Brain : recent Advances in Understanding How the Brain works” by Jeff Hawkins
    • “Intelligence and Machines: Creating Intelligent Machines by Modeling The Brain” By Jeff Hawkins
  3. “Advances in Modeling Neocortex and its impact on Machine Intelligence” by Jeff Hawkins,  Smith Group Lecture presented at the Beckman Institute for Advanced Science & Technology at the University of Illinois at Urbana-Champaign

Le Plat Principal:

The four talks have a lot of depth and are packed. Moreover Jeff talks very fast – I listened to the talks a few times – at least 3 hrs per one-hour talk. You should listen to them slowly & rewind as required. It takes a few hours to get one’s head around the various ideas.

Let me annotate a few of his slides – those I was able to internalize to some extent:

Focus & premise[3]:

Hawkins-100-02-01

The assertion that many problems can only be solved by incorporating principles from the workings of the Neocortex is interesting.

BTW, it does make sense – especially for NLU & Knowledge Representation.

As Jeff mentions later, the behavior need not be human-like, but the representation, interpretation & “understanding” would be.

Neocortex Architecture[3]:

“Neocortex is just a sheet of cells  2mm thick, the size of a dinner napkin” – Amazing what it can do!

Hawkins-100-03-01

The Six Principal Essentials of Biological Intelligence

The picture says it all.

Hawkins-100-04-01

Learning involves training and adaptive connections

Hawkins-100-05-01

The concept of streaming events & the learning mechanisms

Patterns from complex data streams

Hawkins-100-06-01

The paper “Hierarchical Temporal Memory” has the gory details about Hierarchical Temporal Learning.

Future

Hawkins-100-09-01

Interesting observation: Emotion, the fundamental aspect of being human, is not a requirement for intelligence – reminds us of Spock, of course.

Machine intelligence is not about replicating human behavior or even passing the Turing test. I agree on this – we need the machines to think & do things we cannot do, thus augmenting us. Make us stronger where we are weak !

Le Digestif

What interested me most was the semantic knowledge representation, NLP & NLU. The ability to understand and store concepts, the capacity to generalize as well as the mechanisms of strengthening and weakening connections based on external signals – just beautiful …

Agree that the Sparse Distributed Representation could be the language of all the intelligent machines.

The SDR looks a lot like a giant Bloom Filter

Hawkins-100-10-01

Hawkins-100-11-01

The planes can be considered as rows and a column as the temporal dimension of the semantic mapping (the memory of sequences), which equates to a giant n-dimensional Bloom Filter – a data structure we can grok (pun intended, as Jeff’s product is called Grok!).

The bloom filter analogy, while extremely simplistic, is conceptually congruent, in the sense that “similar values have similar representation”, of course depending on the hash algorithm.
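
To make the analogy a little more tangible, here is a minimal sketch of a classic Bloom Filter, plus a toy encoder that sets bits for the character trigrams of a phrase. Everything here is an assumption for illustration: the filter size, the SHA-256 based double hashing, the trigram features and the example phrases. It is not Jeff's SDR or Numenta's algorithm. Note that a general-purpose hash scatters each item at random; in this sketch the similarity comes entirely from two phrases sharing trigrams, and hence bit positions.

```python
import hashlib
from typing import Set


class BloomFilter:
    """Classic Bloom filter; the bit array is stored in one Python int."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size = size_bits
        self.k = num_hashes
        self.bits = 0

    def _positions(self, item: str) -> Set[int]:
        # Double hashing: derive k positions from two halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd stride
        return {(h1 + i * h2) % self.size for i in range(self.k)}

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: str) -> bool:
        # May report false positives, never false negatives.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def encode(text: str) -> BloomFilter:
    """Represent a phrase as the bits set by its character trigrams."""
    bf = BloomFilter()
    for gram in char_ngrams(text):
        bf.add(gram)
    return bf


def overlap(a: BloomFilter, b: BloomFilter) -> float:
    """Jaccard overlap of the set bits of two filters of the same size."""
    union = bin(a.bits | b.bits).count("1")
    shared = bin(a.bits & b.bits).count("1")
    return shared / union if union else 0.0


a = encode("stock price of IBM")
b = encode("stock prices of IBM")
c = encode("volatility of beta")
print(overlap(a, b) > overlap(a, c))  # the near-duplicate phrase overlaps more
```

Comparing set bits this way is loosely like comparing the active cells of two sparse representations, which is as far as the analogy should be pushed.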

After listening to the talks and thinking them over, I have a thousand questions in many directions. I will post the answers as we develop this for our needs. Please send in your insights as comments to this blog. Am sure it will help a few folks !

Hawkins-100-12-01

  1. How do we handle semantic categories ? 
  2. How do we build more sophisticated representations based on spatial patterns ?
  3. What is the hash function that maps a slice of semantic to this giant Bloom Filter ?
  4. How does it handle collision? Corruption ? Clustering for resiliency/self adjusting representation ?
    • Collision might be good and I think that is what Jeff calls semantic generalization
  5. How does the semantic slice mapping function differentiate between a search & computation to trigger appropriate actions?
    • For example the following two questions require different actions: 
      • “What is the stock price of IBM ?” vs.
      • “What is the volatility as reflected in the beta of IBM for this quarter ?” 
      • The first one is a search while the second has computation …
  6. Is the hash function same for all of us or is it different for each person ?
    • Most probably the function is a learned artifact.
  7. Another interesting vector is the Hierarchy & higher patterns of temporal coalescence/slowness – the high-order capability, tweaking the learning rates across the layers.
    • How can this be modeled with the analytical data structures we have?
    • And what are the mechanics for stable representation of pattern sequences – because with dynamicity and temporality comes the difficulty of snapshots and consistency between them.
    • The unique representation of the same sequence, at a later time in context of the earlier invocation is interesting …
  8. How do we “put a classifier on the top” ?
    • Play with permanence? Probability?
  9. What are the algorithms to prevent runaway prediction?
    • I agree that we could account for rapid state difference vs. slower state; we still will have to encapsulate it in some form of code

Finally, can we build “Amazingly Intelligent Machines™?” Yes We can !

And agree with Jeff that “It is essential, for the survival of the species, that we build them” …


The Big Data Convergence

April 8, 2013 by ksankar

As we scan the concepts, technologies, products and practices in the big data space, a lot of things get muddier.

Neither the progression nor the boundaries are clear. We are still in the descriptive stage in terms of the application of the analytics technologies.

I had a good conversation with Bob Friday yesterday – his question was “What prevents us from answering 80% of the questions via automatic inferences ?” And that is the “Adaptive” stage we need to be in …

I think a diagram is much better than me writing 100,000 words. So here it is :

Image

In many ways, a lot of the underlying technologies are converging.

For example, A(rtificial) I(ntelligence) = NLP + N(atural) L(anguage) U(nderstanding) + ML + K(nowledge) R(epresentation) + Reasoning
Are Amazing Intelligent Machines in the works ?


Big Data State Of The Union

March 21, 2013 by ksankar

An informative study by TCS on the current state of Big Data: “The Emerging Big Returns on Big Data”.

Image

Of course, you should download and read the whole report. Some interesting highlights:

  • There’s a polarity in spending on Big Data, with a minority of companies spending massive amounts and a larger number spending very little
  • The business functions expecting the greatest ROI on Big Data are not the ones you may think – while Sales & Marketing have initiatives, finance & logistics are betting on big data for efficiencies & insights
  • The biggest challenges to getting business value from Big Data are as much cultural as they are technological
  • Nearly half the data (49%) is unstructured or semi-structured, while 51% is structured. The heavy use of unstructured data is remarkable given that just a few years ago it was nearly zero in most companies – Enterprises have gone multi-structured !
  • Monitoring how customers use their products to detect product and design flaws is seen as a critical application for Big Data

Cheers & Happy Reading …


An ode to the Easter Eggs, Ecstasies & Agonies of a GoogleIO Ticket

March 14, 2013 by ksankar

Chronicles of my failed attempt at procuring a GoogleIO Ticket … The Google Wallet ate my GoogleIO 2013 Ticket !

It was the night before GoogleIO … Excitement was in the air … Tweets were in order …

Image

The order of the day was to find all Easter Eggs in the page …

Image

I clicked and clicked and clicked … and got thru all the Easter Eggs …

Image

Image

Image

Image

Image

Image

Image

Image

Image

Image

Image

And I slept …

It was early AM when I woke up … still 15 min before the GoogleIO stores open …

Image

The wait was agonizing, but all for a good cause, so I thought …

I was there when the GoogleIO Ticket store opened …

Image

I was not disappointed when my first try failed after 6 minutes …

Image

And my optimism paid off when it eventually found me a precious little ticket …

Image

I reviewed the purchase … and gave it to Google Wallet … little did I know that …

Image

But the screen stayed there and the time ticked down ….

By now the verdict was clear – The Google Wallet is going to eat my lucky GoogleIO Ticket ….

And It did …..

Image

And soon after the registration ended …. The cold hand of fate …

Image

Can I find a kind soul at Google to help me or should I wait for GoogleIO 2014 ? ….


Google Doodle Celebrating Douglas Adams

March 11, 2013 by ksankar

Image

Interesting interactions on various parts of the image

I always was a fan of “The Hitchhiker’s Guide to the Galaxy”

Thank You Sophia Foster-Dimino, Corrie Scalisi, Kevin Laughlin, Manuel Clément, and Leon Hong - You made my day …
