Came across an informative blog on scaling big data – “Built to Scale: How does Impermium process data?” Quick notes from the blog:
Don’t fall in love with a technology so much that you cannot part with it – be flexible as you scale
- “Parting is such sweet sorrow”, but change is an essential component of any infrastructure at scale
- Technology selection and adoption should be a continuous process, introducing new technologies as growth demands them. I found Impermium’s path from grep to Solr to Elastic Search very illuminating; I have followed a similar path before.
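One way to stay flexible enough to move from grep to Solr to Elastic Search is to keep callers behind a small search interface, so the engine can be swapped without touching the rest of the code. The sketch below assumes hypothetical names (`SearchBackend`, `GrepBackend`, `InvertedIndexBackend`, `load`) and is not Impermium’s code or the API of Solr or Elastic Search; the toy inverted index only illustrates the Lucene-style idea those engines are built on.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class SearchBackend(ABC):
    """Hypothetical interface: callers depend on this, not on a concrete engine."""

    @abstractmethod
    def add(self, doc_id: int, text: str) -> None: ...

    @abstractmethod
    def search(self, term: str) -> list[int]: ...


class GrepBackend(SearchBackend):
    """Linear scan over every document, like grepping a file: fine while data is small."""

    def __init__(self) -> None:
        self.docs: dict[int, str] = {}

    def add(self, doc_id, text):
        self.docs[doc_id] = text

    def search(self, term):
        # Match whole whitespace-separated tokens so results agree with the index below.
        term = term.lower()
        return sorted(d for d, t in self.docs.items() if term in t.lower().split())


class InvertedIndexBackend(SearchBackend):
    """Term -> set of doc ids, so a lookup no longer scans every document."""

    def __init__(self) -> None:
        self.index: dict[str, set[int]] = defaultdict(set)

    def add(self, doc_id, text):
        for token in text.lower().split():
            self.index[token].add(doc_id)

    def search(self, term):
        return sorted(self.index.get(term.lower(), set()))


def load(backend: SearchBackend, docs: dict[int, str]) -> SearchBackend:
    """Feed the same documents into whichever backend is currently in use."""
    for doc_id, text in docs.items():
        backend.add(doc_id, text)
    return backend


docs = {1: "spam comment detected", 2: "normal comment", 3: "spam again"}
for backend in (GrepBackend(), InvertedIndexBackend()):
    print(type(backend).__name__, load(backend, docs).search("spam"))
```

Both backends print `[1, 3]` here; the calling code in the loop does not change when the engine does, which is the point of the separation.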
Technology needs are not static
- A corollary of the first point above: growth will not be uniform across all parts of the stack.
- For example, Impermium hit scaling challenges in search, so they moved to Solr and then to Elastic Search
There are no perfect technologies
- If you are doing interesting work, be ready to tango with open source code. This is essential, and I have found it to be true myself.
- Even if you don’t plan to change the code, deep understanding often comes from reading it
Select technologies that you can dance with
- The flip side is that you should select technologies whose internals you are comfortable working with.
- In my case, while I love Erlang, I am not that comfortable with the language. So given a choice, I will go with Java or Scala
A benchmark is nothing but a story told in a specific context
- So true. Benchmarks are transitory & personal.
- Understand them, but they need not hold for your transforms, your data model or your processing.
- Benchmark early & benchmark often … with your scenarios, models, transformations, MapReduce jobs & data
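“Benchmark with your own data” can start as small as timing two candidate implementations of the same transform on a sample of your own records. The sketch below is a minimal, single-machine wall-clock harness built on `time.perf_counter`; the word-count transforms and the synthetic `sample` are hypothetical stand-ins for your real transformations and data, and this is not a rigorous benchmarking methodology.

```python
import statistics
import time
from collections import Counter
from typing import Callable


def benchmark(fn: Callable[[list[str]], object], data: list[str], repeats: int = 5) -> dict[str, float]:
    """Run fn on data several times and report min and median wall-clock seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(data)
        timings.append(time.perf_counter() - start)
    return {"min": min(timings), "median": statistics.median(timings)}


# Two hypothetical implementations of the same transform: word counts over all lines.
def count_with_counter(lines):
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def count_with_dict(lines):
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


if __name__ == "__main__":
    # Replace this synthetic sample with records drawn from your own data.
    sample = ["user posted a comment with a link"] * 10_000
    # Check both implementations agree before comparing their speed.
    assert count_with_counter(sample) == count_with_dict(sample)
    for fn in (count_with_counter, count_with_dict):
        print(fn.__name__, benchmark(fn, sample))
```

Reporting the minimum and the median, rather than a single run, makes it easier to notice noise; rerunning the harness as the data model changes is the “early & often” part.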
Thanks, Young, for the short but very interesting blog. Keep up the good work …