TCS Siruseri Campus


Today I am visiting the TCS Siruseri campus in Chennai. A very elegant & interesting structure, designed by a Uruguayan architect.

  • The legend has it that while discussing the architecture, Carlos drew butterflies & that stuck as the theme.
  • A majestic 5-floor-high open atrium corridor forms the spine of the butterfly, with 6 buildings forming the wings. The buildings themselves are butterflies too, each with a small spine (the elevator bank) in the middle and two wings, north & south.
  • The side view below shows three buildings: EC1 (right), EC2 & EC3. I am in EC3.
  • The spine atrium on the far side has a pond, benches, shops & interesting restaurants – the Saravana Bhavan food is exotic. The spine even has a Subway!
  • The campus hosts more than 20,000 associates.

The second picture below shows all 6 buildings and the observation tower (which is still under construction).

image description

1417_1

Kareem Abdul-Jabbar: 20 things I wish I had known when I was 30


Kareem Abdul-Jabbar has an excellent blog post at Esquire with 20 pieces of advice to his younger self at 30. Thanks to Jason Hiner’s tweet.

The blog & the comments are a must read.

A few of them hit home for me:

  • Be patient
  • Listen More than Talk
  • Being right is not always the right thing to be
  • Do one thing every day that helps someone else.
  • Do one thing every day that you look forward to doing. 
  • Don’t be so quick to judge.
  • Everything doesn’t have to be fixed.
  • Play the Piano
  • Become Financially Literate

Ref: Wallpaper from http://www.basketwallpapers.com/USA/Kareem-Abdul-Jabbar/

Is our Neocortex a Giant Semantic Bloom Filter ? Of Natural Intelligence, Machine Learning & Jeff Hawkins


L’Apéritif:

Image

In a set of four lectures spanning about 3 years, Jeff Hawkins explains how & why big data problems can only be solved by evolutionary, adaptive, continuously learning models that incorporate principles from the workings of the neocortex.
It does make sense – especially for NLP, NLU & Knowledge Representation. I am a big fan of the Borg and their coordinated intelligence.

These are my annotated picture-notes …

L’Entrée:

Let me begin at the beginning. The other day I came across 4 very interesting talks by Jeff Hawkins on Biologically Inspired Machine Intelligence.

Call it serendipity, because we have been looking for more effective ways to do Knowledge Representation (KR) & Natural Language Understanding (NLU).

For example, movie names are very easy for humans to understand, but a MaxEnt NER finds them very hard. Knowledge Representation & Association is even harder!

We are experimenting with a few techniques like word-based tries (i.e. spell-check sentences by words), higher order federated Bloom Filters and n-gram hashing. Planning to incorporate some of Jeff’s ideas …
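To make the word-based trie idea concrete, here is a minimal sketch of one way to read it: a trie whose edges are whole words rather than characters, used to spot known multi-word names (such as movie titles) in a sentence. The class name, the helper functions and the sample titles are all made up for illustration; this is not the implementation we are experimenting with.

```python
class WordTrie:
    """Trie keyed by whole words (tokens) rather than characters."""

    def __init__(self):
        self.children = {}   # word -> WordTrie
        self.is_end = False  # True if a stored phrase ends at this node

    def insert(self, phrase):
        node = self
        for word in phrase.lower().split():
            node = node.children.setdefault(word, WordTrie())
        node.is_end = True

    def longest_match(self, tokens, start=0):
        """Return the end index of the longest stored phrase that begins
        at tokens[start], or None if no stored phrase starts there."""
        node = self
        best = None
        for i in range(start, len(tokens)):
            node = node.children.get(tokens[i].lower())
            if node is None:
                break
            if node.is_end:
                best = i + 1
        return best


def find_phrases(trie, sentence):
    """Scan a sentence left to right, greedily taking the longest match."""
    tokens = sentence.split()
    found = []
    i = 0
    while i < len(tokens):
        end = trie.longest_match(tokens, i)
        if end is None:
            i += 1
        else:
            found.append(" ".join(tokens[i:end]))
            i = end
    return found


# Hypothetical titles, only to exercise the sketch.
titles = WordTrie()
for title in ["The Matrix", "The Matrix Reloaded", "Up"]:
    titles.insert(title)

print(find_phrases(titles, "I watched The Matrix Reloaded and Up yesterday"))
```

The greedy longest match is a deliberate simplification: it prefers “The Matrix Reloaded” over “The Matrix”, but it does nothing about ambiguity (the word “up” in an ordinary sentence would also match), which is exactly where the harder NLU questions start.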

I digress … Topics for another day … back to Jeff & Machine Intelligence …

Very inspiring, extremely thought provoking talks – as usual the inimitable Jeff Hawkins at his best

  1. Google Tech Talk: Jeff Hawkins, “Building Brains to Understand the World’s Data”
  2. UC Berkeley Graduate Lectures
  3. “Advances in Modeling Neocortex and its impact on Machine Intelligence” by Jeff Hawkins,  Smith Group Lecture presented at the Beckman Institute for Advanced Science & Technology at the University of Illinois at Urbana-Champaign

Le Plat Principal:

The four talks have a lot of depth and are packed. Moreover, Jeff talks very fast – I listened to the talks a few times, at least 3 hrs per one-hour talk. You should listen to them slowly & rewind as required. It takes a few hours to get one’s head around the various ideas.

Let me annotate a few of his slides – those I was able to internalize to some extent:

Focus & premise[3]:

Hawkins-100-02-01

The assertion that many problems can only be solved by incorporating principles from the workings of the neocortex is interesting.

BTW, it does make sense – especially for NLU & Knowledge Representation.

As Jeff mentions later, the behavior need not be human-like, but the representation, interpretation & “understanding” would be.

Neocortex Architecture[3]:

“Neocortex is just a sheet of cells  2mm thick, the size of a dinner napkin” – Amazing what it can do!

Hawkins-100-03-01

The Six Principal Essentials of Biological Intelligence

The picture says it all.

Hawkins-100-04-01

Learning involves training and adaptive connections

Hawkins-100-05-01

The concept of streaming events & the learning mechanisms

Patterns from complex data streams

Hawkins-100-06-01

The paper “Hierarchical Temporal memory” has the gory details about the Hierarchical Temporal Learning.

Future

Hawkins-100-09-01

Interesting observation: Emotion, the fundamental aspect of being human, is not a requirement for intelligence – reminds us of Spock, of course.

Machine intelligence is not about replicating human behavior or even passing the Turing test. I agree on this – we need the machines to think & do things we cannot do, thus augmenting us. Make us stronger where we are weak!

Le Digestif

What interested me most was the semantic knowledge representation, NLP & NLU. The ability to understand and store concepts, the capacity to generalize as well as the mechanisms of strengthening and weakening connections based on external signals – just beautiful …

Agree that the Sparse Distributed Representation could be the language of all the intelligent machines.

The SDR looks a lot like a giant Bloom Filter

Hawkins-100-10-01

Hawkins-100-11-01

The planes can be considered as rows and a column as the temporal dimension of the semantic mapping (the memory of sequences). This equates to a giant n-dimensional Bloom Filter – a data structure we can grok (pun intended, as Jeff’s product is called Grok!).

The bloom filter analogy, while extremely simplistic, is conceptually congruent, in the sense that “similar values have similar representation”, of course depending on the hash algorithm.
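To see what “similar values have similar representation” could mean in Bloom Filter terms, here is a small sketch. It assumes character trigrams as the features and SHA-256 as the hash; both are my choices for illustration, and none of this is Jeff’s SDR algorithm. Each trigram switches on a couple of bit positions, so two texts that share most of their trigrams share most of their bits.

```python
import hashlib


def ngrams(text, n=3):
    """Set of character n-grams in the lower-cased text."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def bloom_bits(text, size=256, hashes=2, n=3):
    """Bit positions a Bloom Filter of `size` bits would switch on
    after inserting every n-gram of `text` with `hashes` hash functions."""
    bits = set()
    for gram in ngrams(text, n):
        for seed in range(hashes):
            digest = hashlib.sha256(f"{seed}:{gram}".encode("utf-8")).digest()
            bits.add(int.from_bytes(digest[:8], "big") % size)
    return bits


def overlap(a, b):
    """Jaccard overlap between two sets of bit positions."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


memory = bloom_bits("semantic memory")
memories = bloom_bits("semantic memories")
stock = bloom_bits("stock price")

print("memory vs memories:", round(overlap(memory, memories), 2))
print("memory vs stock   :", round(overlap(memory, stock), 2))

# The classic Bloom Filter membership test is a subset check on the bits:
print(bloom_bits("semantic") <= memory)
```

The first overlap should come out far higher than the second, because the two “semantic memor…” strings share almost all of their trigrams while “stock price” shares none, so any common bits there are pure hash collisions. That is also where the analogy gets thin: here the similarity lives entirely in the choice of n-grams, whereas in the talks the representation is learned.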

After listening to the talks and thinking them over, I have a thousand questions in many directions. I will post the answers as we work through this for our needs. Please send in your insights as comments to this blog. Am sure it will help a few folks!

Hawkins-100-12-01

  1. How do we handle semantic categories ? 
  2. How do we build more sophisticated representations based on spatial patterns ?
  3. What is the hash function that maps a slice of semantic to this giant Bloom Filter ?
  4. How does it handle collision? Corruption ? Clustering for resiliency/self adjusting representation ?
    • Collision might be good, and I think that is what Jeff calls semantic generalization
  5. How does the semantic slice mapping function differentiate between a search & computation to trigger appropriate actions?
    • For example the following two questions require different actions: 
      • “What is the stock price of IBM?” vs.
      • “What is the volatility as reflected in the beta of IBM for this quarter?”
      • The first one is a search while the second has computation …
  6. Is the hash function same for all of us or is it different for each person ?
    • Most probably the function is a learned artifact.
  7. Another interesting vector is the Hierarchy & higher patterns of temporal coalescence/slowness – the high-order capability, tweaking the learning rates across the layers.
    • How can this be modeled with the analytical data structures we have?
    • And what are the mechanics for stable representation of pattern sequences – because with dynamicity and temporality comes the difficulty of snapshots and consistency between them.
    • The unique representation of the same sequence, at a later time in context of the earlier invocation is interesting …
  8. How do we “put a classifier on the top” ?
    • Play with permanence? Probability?
  9. What are the algorithms to prevent runaway prediction?
    • I agree that we could account for rapid state difference vs. slower state; we still will have to encapsulate it in some form of code

Finally, can we build “Amazingly Intelligent Machines?” Yes We can !

And agree with Jeff that “It is essential, for the survival of the species, that we build them” …

The Sign of the 9ers – 3 Lessons from the revival of the 49ers franchise


It is always interesting & informative to understand & learn from how great teams are formed – corporate or sports. Daniel Brown’s article “How Jed York orchestrated 49ers rebuilding” has three excellent insights:

  1. Recognize when a team is faltering & Take bold steps

    20130127_093206_rebuild49ers

    • On Dec 26, 2010, after a bad defeat, Jed York reached a breaking point. He vowed to revive the team, fired the coach and started a set of systemic steps, from hiring a new coach to the changes the article lists: “expanded the players’ lounge, built an expansive outdoor weightlifting facility and knocked down walls to give meeting rooms more square footage, upgraded the cafeteria … spending lavishly, demanding excellence and changing the culture of the entire organization”.
  2. Have a bold vision & Lead from the top

    • Players stay with the 49ers because they know they’re in a first-class organization. That all starts at the top. If the top is not leading the charge, then you’re going to get mediocrity. The quality of ownership means a ton in pro football – Harris Barton

  3. Show the passion, Hire the best & Keep both that way

    • What Jed shows — and what his uncle showed — is a passion for excellence

    • Jed hired Jim Harbaugh as the head coach. It was not easy – he outmaneuvered Stanford (which wanted to keep him), the University of Michigan (Harbaugh’s alma mater), the Denver Broncos and Miami Dolphins to reel in coaching’s biggest prize
    • The key for the 49ers was a six-hour meeting in which York and new general manager Trent Baalke laid out their plan for reinvigorating the franchise.

      Jed talked about his vision and that sealed the deal – Jim Harbaugh, on the day he was hired

    • The one thing that Jim Harbaugh has that Bill Walsh has is the ability to motivate … the guys to bring them to their peak potential -Brent Jones

Next Sunday, at Super Bowl XLVII, the 49ers face the Ravens – my best wishes, and I predict a great match with the 49ers winning the trophy …

Trivia: Am sure sharp eyes would have caught the title “The Sign of the 9ers” as a tribute to Sir Arthur Conan Doyle’s “The Sign of the Four”.

Hitchhiker’s guide to activating iPhone with AT&T


Got a new iPhone 5 for the (not so) little one. And the most interesting part was activation. The AT&T web site is very counterintuitive.

  • It will ask you for an Order Number from the packing slip, but the number on the packing slip is not the right one
  • It will ask you to open the battery cover – am not even sure the iPhone has a battery cover, so don’t ruin the iPhone.
  • The good news is that the information an activation needs is all readily available, but the AT&T web site confuses the issue.

Finally I figured it out, nothing earth shattering, but still helpful when you have your son asking “Have you activated the phone ?” every 30 seconds ;o)

You will need four pieces of information:

  • Order Number (17 digits) & Activation Number (11 digits) – These are not on the packing slip, but they are in the e-mails titled “AT&T Order Received” & “AT&T Order Shipment Notification” from AT&T Online Services. Buried in the e-mail under Activate Your service, you can see both these pieces of information.
  • IMEI Number & Smart Chip ICCID – AT&T will ask you to do all kinds of things to get to these numbers. Forget all the gymnastics. On the iPhone 5, go to Settings-General-About. Voilà! IMEI (15 digits) & ICCID (20 digits) are displayed just above Modem Firmware.

So easy and intuitive, once we know where to look!

BTW, the activation can take up to 3 hrs. And don’t forget to cycle the power. I waited for 3 hrs and nothing happened. But when I called the support number, it instructed me to turn the power off and on. Voilà! The iPhone came alive!

Cheers & enjoy the iPhone. As always, it is a phenomenal device …

<k/>

Glenlivet 15 French Oak Reserve


Occasionally I pick up a couple of bottles of single malt to try out – Glenrothes, Highland Park, Lagavulin and now Glenlivet 15. Sometimes I stay with one for a few months (stayed with Glenrothes when my old boss, Glenn, introduced us to the fine scotch; then Lagavulin since February when a good friend, Chris, introduced me to it, …)

I would have preferred The Guardians Single Cask … maybe later

  • Came across an interesting blog by Mike Grushin. I agree – cask strength single malts taste better with a few ice cubes (let them melt a little & don’t overdo)
  • And Mike writes about EC2 also !
  • A few good reviews
  • The taste is smooth and the creamy finish is noticeable
  • A tasting video by Ian Logan, Glenlivet’s International Brand Ambassador ! I never knew about brand ambassadors – my kind of job
  • While we think of tastes, there are folks who think of whisky design ! 2012 Whisky Design Winners is an interesting read
  • Next on my list: Glenfiddich, Bourbons, … – What says thee?

Notes on MongoDB @ AWS-Ubuntu-12.04 XFS, RAID10 & LVM


Notes & References from our experiences on the MongoDB Data Layer for BioInformatics. Like they say, don’t blindly execute any scripts & question everything. As I researched each aspect, I came across a set of good references. I have annotated and contextually ordered the reference list at the end of this blog, to help one make informed NoSQL data infrastructure design & optimization decisions.

This is Part 1. Part 2 – MongoOps would cover Backup, Replication & Sharding, and finally Part 3 would cover the Aggregation Framework. Let us see how I fare …

Our setup and rationale:

  • Ubuntu – Easy admin & maintenance
  • Mongo 2.1.x – We need the aggregation framework now. I will update along with newer versions of mongo in the dev branch. So you can count on me to test the latest aggregation framework code!
  • m2.xlarge- This (baby) beast has 17 GB memory & 2 cores with ~4 MHz CPU. We have multiple collections, fact tables, datasets, analytics and so forth.
    • I plan to load test and find the best configuration.
  • xfs- Performance, Extensibility.
    • “supports I/O suspend & write-cache flushing – critical for multi-disk consistent snapshots”
    • “XFS better performance from these moody disks!”
  • RAID10- Availability, “Enhance Operational Durability” & Resilience.
    • One drawback : Cannot use ebs snapshots
      • Replica with a non-RAID ebs (to backup from) should do the trick. Will explain the configuration in Part 2 – MongoOps:Backup
    • “AWS protects against drive failure, RAID10 protects against failures at the EBS technology layer”
  • 8 X 32 GB- Striping across multiple disk spindles
    • Diminishing performance returns after 8 disks
  • LVM- gives the extensibility (RAID10 cannot be extended)
    • And use xfs_growfs at the file system layer
  • Replica Sets & Sharding (Future, will blog)
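A quick back-of-the-envelope check of what the 8 X 32 GB choice yields. This sketch assumes the usual RAID10 arrangement where every block is stored twice, and it ignores the small overhead of the md and LVM metadata; the function name is made up for illustration.

```python
def raid10_usable_gb(volume_count, volume_size_gb, copies=2):
    """Usable capacity when every block is stored `copies` times."""
    if volume_count < copies:
        raise ValueError("need at least as many volumes as copies")
    return volume_count * volume_size_gb / copies


raw_gb = 8 * 32
usable_gb = raid10_usable_gb(8, 32)
print(f"raw: {raw_gb} GB, usable under RAID10: {usable_gb:.0f} GB")
```

So half of the provisioned EBS space goes to mirroring, which is the price of the availability listed above.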

I started from the excellent writeup on Amazon EC2 Quickstart by Sandeep Parikh. He has done a wonderful job. Found a few things I had to change to fit our setup.

  1. Ubuntu 12.04 requires a few installs
    • sudo apt-get install mdadm
    • sudo apt-get install lvm2 xfsprogs
  2. Three useful commands to inspect the volumes, devices & partitions
    1. df -h
    2. fdisk -l
    3. cat /proc/partitions
  3. The devices are named xvdf, xvdg, … not sdf, sdg
    • For example, sudo mdadm --verbose --create /dev/md0 --metadata 1.2 --level=10 --raid-devices=4 /dev/xvdf /dev/xvdg /dev/xvdh /dev/xvdi
  4. Added --metadata 1.2 for the mdadm command
  5. There were a few discussions (Ref: 2.6, the Best Practices discussion thread in the mongodb-user group) on whether we need --chunk 256 in the command. The conclusion is that while this might help other applications, it is not needed for mongodb
  6. sudo blockdev --setra 65536 /dev/md0 (not 128) <- change in command
  7. The conf file is /etc/mongodb.conf in Ubuntu 12.04 (not mongod.conf)
  8. The init.d file is /etc/init.d/mongodb (not mongod)
  9. The logpath & dbpath in /etc/mongodb.conf don’t take effect. They are overshadowed by the defaults in /etc/init.d/mongodb. This threw me off for some time; finally I made the changes in the /etc/init.d/mongodb file
    • I think there is room for improvement. If I get time, I will refactor this file & make it simpler. Don’t want to jump-in without enough deep thought – I am sure the good folks at 10Gen have good reasons for the complexity
  10. sudo mkfs.xfs -f /dev/vg0/data to format xfs
  11. And appropriate command for fstab
    • echo '/dev/vg0/data /data xfs defaults,auto,noatime,noexec 0 0' | sudo tee -a /etc/fstab
  12. The volume scheme in the Quickstart (90% /data, 5% /journal and 5% /log) might not work. I had first set up 4 X 30 GB ebs and this gave 3 GB for journal. Then mongod wouldn’t start, with the error
    • Fri May 4 03:36:57 [initandlisten] ERROR: Insufficient free space for journal files
    • Fri May 4 03:36:57 [initandlisten] Please make at least 3379MB available in /data/journal or use --smallfiles
    • Fri May 4 03:36:57 [initandlisten]
    • Fri May 4 03:36:57 [initandlisten] exception in initAndListen: 15926 Insufficient free space for journals, terminating
    • This also threw me a little bit.
    • I think a single volume with three directories is better than three separate volumes, especially as they are in the same logical volume. Maybe the three separate volumes are there to limit disk usage, but if the disk is full in any one of them, the system would be down anyway. So am not sure the three separate volumes buy us anything
  13. Need apt support for MongoDB dev releases
    • 10Gen should add support for development releases in apt. I had to install 2.0.x, download 2.1.x and then copy over the /usr/bin/ directory. It works for now, but I don’t think it is safe
  14. I will update this blog as I complete the data layer
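The journal shortfall in item 12 is easy to reproduce with a little arithmetic. The sketch below assumes RAID10 halves the raw capacity, takes 1 GB as 1024 MB and ignores filesystem overhead; the 3379 MB figure comes straight from the mongod error message, while the function and variable names are made up for illustration.

```python
REQUIRED_JOURNAL_MB = 3379  # from the mongod error message in item 12


def split_volume_mb(usable_gb, data_pct=90, journal_pct=5, log_pct=5):
    """Split a logical volume by percentage, returning sizes in MB."""
    total_mb = usable_gb * 1024
    return {
        "data": total_mb * data_pct / 100,
        "journal": total_mb * journal_pct / 100,
        "log": total_mb * log_pct / 100,
    }


def journal_fits(volume_count, volume_size_gb):
    """True if the 5% journal slice meets mongod's stated minimum."""
    usable_gb = volume_count * volume_size_gb / 2  # RAID10 keeps two copies
    return split_volume_mb(usable_gb)["journal"] >= REQUIRED_JOURNAL_MB


for count, size in [(4, 30), (8, 32)]:
    journal_mb = split_volume_mb(count * size / 2)["journal"]
    verdict = "ok" if journal_fits(count, size) else "too small"
    print(f"{count} X {size} GB -> journal {journal_mb:.0f} MB ({verdict})")
```

With 4 X 30 GB the 5% slice comes to about 3 GB, just under what mongod asks for, which matches the error above. With 8 X 32 GB the same 5% would clear the bar, but a single volume with three directories sidesteps the question entirely.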

Finally, the references (they helped me to understand the nuances & the details). I have annotated them and ordered them contextually to help one make informed infrastructure design & optimization decisions; start from the beginning & read through sequentially:

  1. 10Gen Resources
    1. EC2 Quickstart - The bible ;o) This is for Amazon Linux, so needs some changes for Ubuntu
      1. The best source and should be used as a base to build your MongoDB infrastructure.
      2. A good plan is read the links, move on to rest of the references and then come back to build your infrastructure.
    2. MongoDB on AWS – An excellent paper (in pdf format) by Miles Ward
    3. Overview & Topology
    4. MongoDB on EC2 overview
    5. MongoDB Best Practices – Good overview tips
  2. MongoDB/AWS
    1. EBS Overview – must read to understand EBS
    2. RAID10 your EBS – good blog on why RAID10 for ebs; as aws provides redundancy, why do we mirror ebs volumes ?
    3. ServerFault Q&A on RAID10 & aws
    4. Getting good I/O from ebs – A few good points. Make sure to read rest of this section before making any drastic steps ;o)
    5. AWS ebs benchmarks – gives one an insight into RAID0, RAID10 et al
    6. Best Practices for MongoDB/RAID – discussion thread at mongodb-user Google group
    7. MongoDB memory usage & working set – discussion at mongo-user google group. Might help you with sizing the instance
  3. MongoDB/Ubuntu
    1. MongoDB, RAID10 & Ubuntu - includes detailed commands & explanations
    2. MongoDB, LVM, XFS,RAID10 – Another list of commands
    3. Installing MongoDB – Quick overview. Would be good to read the rest before jumping into installation
  4. RAID/LVM blogs
    1. Quick intro to LVM
    2. Managing RAID & LVM with Linux – good intro
    3. Linux RAID smackdown
    4. Managing RAID 10
    5. Grow XFS on LVM
    6. LVM commands Cheat sheet
  5. RAID Gory Details
    1. http://tldp.org/HOWTO/Software-RAID-HOWTO.html
    2. https://raid.wiki.kernel.org/index.php/RAID_setup
    3. Complex RAID10 with mdadm
  6. LVM gory details
    1. http://www.howtoforge.com/linux_lvm
    2. http://tldp.org/HOWTO/LVM-HOWTO/
    3. http://linuxdevcenter.com/pub/a/linux/2006/04/27/managing-disk-space-with-lvm.html
  7. More Gory Details for enquiring minds who want to know
    1. http://www.issociate.de/board/post/377091/RAID10:_near,_far,_offset_–_which_one?.html
    2. https://www.linuxquestions.org/questions/linux-server-73/software-raid10-does-the-disk-order-in-mdadm-matter-671016/