My missives

September 23, 2008

Cloud Computing, Grids and Paczkis – Part Deux

Filed under: Cloud Computing, Web 2.0 — ksankar @ 10:31 am

<epilogue – the end as the beginning, or conclusions first>

  • Both papers are well written, and I thank the authors for the details as well as the e-mails. The concepts are worth the cloud community’s time to deliberate and debate.
  • IMHO, Grids and Clouds are conceptually distinct beasts, but if one squints long enough, or abstracts to the stratosphere, they might look the same. Let us call them Paczkis (pronounced “punchkeys”) on Paczki Day, Grids on Grid days and Clouds the rest of the days ;o) And as a cloud expert once said, “that which we call a cloud, by any other name would smell as sweet”.
  • Whether the grid domain is successful or not, I leave to knowledgeable folks like Ian to decide. Assuming grids have large accidental complexity (which acts as a barrier to entry), it is reasonable to conclude that maybe clouds can simplify them. Of course, all grids will now morph into cloud infrastructures anyway ;o) And that is good: the cloud community also gains the experts, their experience and their thinking.

Parting question: is the Hadoop infrastructure a cloud or a grid? By the same extension, would Google Apps be grids rather than clouds? ;o)
</epilogue>

<The gory details>
Now that we have covered the basics in the prologue, let us double-click a couple of times. In their paper “Using Clouds to Provide Grids Higher-Levels of Abstraction and Explicit Support for Usage Modes”, Jha, Merzky and Fox discuss the challenges with grids and address complexity, interoperability and deployment support. They have done a good job of explaining a lot of the concepts.

I do disagree with a few of their statements, viz. “in some ways Clouds are the evolution of Grids” and “Cloud systems (or just Clouds) are, in some respects, narrow Grids”, but I agree with one statement, “Cloud computing is a catch-all term for better contextualization, virtualization and most importantly simplicity of use”, which they cite from the “Future of the TeraGrid” papers. Their definition of virtualization is a little weak, IMHO. But they understand the canonical cloud: “A distinguishing feature of Cloud interfaces are that instead of exposing the largest possible amount of semantics for a specific domain, Clouds tend to expose a minimal amount of semantics to the end-user, while still being useful”. Excellente! While various cloud infrastructure offerings do exhibit the characteristics of “affinity” as described by the authors, I am not sure “affinity” is a cloud artifact; layers of “affinity” may be. All three of their observations are correct, but I can’t agree with their semantic ordering (Fig. 2).
But the fundamental premise that grids can be saved by a cloud wrapper is questionable. I think this could be a hog in a tuxedo (I will leave the dissertation on pigs and makeup to politicians!). Remember, there is domain complexity and there is accidental complexity. I think the grid community has done a good job of tackling the accidental complexity through frameworks like Globus (even I made an occasional contribution to the grid world). I think this is what Ian Foster is saying in his blog. There are certain classes of problems addressed by grids, and, per Ian’s blog, they are doing well. No need for a cloud-based tuxedo for grids, thank you ;o)

Maybe over-reliance on WS-* rather than a REST architecture is the cause, maybe the breadth and depth of the interfaces are too complex, maybe … I leave this discussion alone; it is for folks like Ian to debate and conclude.

Another interesting paper, “Grids Challenged by a Web 2.0 and Multicore Sandwich” by Fox and Pierce, explains cloud as a “Broad grid” (as opposed to the “narrow grid” of Jha et al.). They believe that grid “infrastructure is ripe for significant revisions”. I tend to agree with most of their observations, viz. “the problem of the next generation of computing will be an abundance (rather than a scarcity) of computing power for many problems” and “these clouds address ‘commodity usage’ rather than the high performance simulations and data transport that are characteristic of Grids”, as well as with their insightful discussions on the trajectory and locus of massively multi-core systems. (BTW, I didn’t see two of my favorite topics, Erlang and functional programming, mentioned anywhere; CSP is mentioned only in passing.) You see, grids came from the world of high-performance and parallel computing, which is more closely related to the multi-core paradigms than to virtualization and clouds.

On the notion of grids vs. clouds, my thoughts are in my previous blog post, Cloud Computing and Grids.

</The gory details>

<prologue – the beginning as the end>
The question “Is cloud a grid?” has been discussed in detail in the cloud computing forums. The current crop of discussions swirls around Ian’s blog post on the paper by Jha, Merzky and Fox. Geoffrey pointed out another paper, by Fox and Pierce, in the same zip code. I read both of them in some detail (I need to think more, as the papers have good depth), and here are some quick thoughts:
<relevance>
I agree that the discussion is somewhat orthogonal. Whether you call them grids or clouds or paczkis, providers will provide what makes sense for them, and the same is true of cloud consumers. But, as I said in one of my earlier blog posts, our view of a domain influences our solutions: form follows function. So this is not an academic exercise; it has pragmatic relevance. And for a domain to grow, we need systematic, disciplined definitions, interfaces and programming models.
</relevance>
</prologue>

—————————-

[Geoffrey07] Grids Challenged by a Web 2.0 and Multicore Sandwich

[IanBlog] A critique of “Using Clouds to Provide Grids…”

[Jha08] Using Clouds to Provide Grids…

[KrishnaBlog] Cloud Computing and Grids

[Paczkis] Pronounced “punchkeys”

September 19, 2008

Anarchic Scalability

Filed under: Uncategorized — ksankar @ 7:53 am

I was (again) reading through Roy Fielding’s Ph.D. dissertation. In his discussion of Internet scalability, he describes “anarchic scalability” very well, and the concept is even more relevant today than it was then!

Quoting Roy (I do not think I can do any better):

“Most software systems are created with the implicit assumption that the entire system is under the control of one entity …

… such an assumption cannot be safely made when the system runs openly on the Internet. Anarchic scalability refers to the need for architectural elements to continue operating when they are subjected to an unanticipated load, or when given malformed or maliciously constructed data, since they may be communicating with elements outside their organizational control. The architecture must be amenable to mechanisms that enhance visibility and scalability. …

… clients cannot be expected to maintain knowledge of all servers. Servers cannot be expected to retain knowledge of state across requests. …

… particularly newsworthy information can also lead to “flash crowds”: sudden spikes in access attempts as news of its availability spreads across the world.

… multiple organizational boundaries implies that multiple trust boundaries could be present in any communication”
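To make the quoted properties concrete, here is a minimal Python sketch. It is illustrative only and not taken from the dissertation: `StatelessSumService`, its JSON `values` field and the limits `max_in_flight` and `max_body_bytes` are hypothetical. Each request carries everything needed to answer it (no server-side session state), oversized or malformed input is refused instead of crashing the handler, and requests beyond the concurrency cap are shed with a 503-style status rather than queued without bound.

```python
import json
import threading


class StatelessSumService:
    """Hypothetical service illustrating anarchic scalability: no per-client
    state is kept between requests, and bad input or overload is refused
    gracefully instead of taking the server down."""

    def __init__(self, max_in_flight: int = 4, max_body_bytes: int = 4096):
        # Caps concurrent work; requests beyond the cap are shed, not queued.
        self._slots = threading.BoundedSemaphore(max_in_flight)
        self._max_body_bytes = max_body_bytes

    def handle(self, body: bytes) -> tuple[int, dict]:
        """Return an (HTTP-like status code, response body) pair."""
        if not self._slots.acquire(blocking=False):
            return 503, {"error": "overloaded, retry later"}
        try:
            return self._process(body)
        finally:
            self._slots.release()

    def _process(self, body: bytes) -> tuple[int, dict]:
        # Input may come from outside our organizational control: check size first.
        if len(body) > self._max_body_bytes:
            return 413, {"error": "request too large"}
        try:
            request = json.loads(body)
        except ValueError:  # JSONDecodeError and UnicodeDecodeError both subclass it
            return 400, {"error": "malformed JSON"}
        # Everything needed to answer travels in the request itself.
        values = request.get("values") if isinstance(request, dict) else None
        if not isinstance(values, list) or not all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        ):
            return 400, {"error": "'values' must be a list of numbers"}
        return 200, {"sum": sum(values)}


if __name__ == "__main__":
    service = StatelessSumService()
    print(service.handle(b'{"values": [1, 2, 3]}'))
    print(service.handle(b'{"values": [1, "two"]}'))
    print(service.handle(b"\xff not json"))
```

Because no request depends on an earlier one, any replica behind a load balancer could serve it, which is what lets this style scale out when a “flash crowd” arrives.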

September 14, 2008

Erlang Chronicles ….

Filed under: Cloud Computing, Erlang — ksankar @ 10:18 am

This is the footnote to my Erlang post, Cloud Encounters of the Erlang Kind. Here I will list useful pointers for setting up and maintaining an Erlang development system as well as an Erlang-based cloud infrastructure. These are just my impressions, views, thoughts and experiences; I do not claim any competency (yet ;o)

Development Environment

  1. Script to add the Erlang bundle to TextMate: http://netcetera.org/cgi-bin/tmbundles.cgi?bundle=Erlang
  2. Notes on Compiling Erlang for OS X

Server-side artifacts

Cloud Infrastructure

Note:

[Sep 14, 2008: Pardon the anemic start. This is a work in progress; I will develop it and add more details in the next few days, as I plug along ...]

Cloud Encounters of the Erlang Kind …

Filed under: Cloud Computing, Erlang, Technology and Software — ksankar @ 8:43 am

I have started building (or, more precisely, hacking my way into) a robust control plane for the cloud infrastructure, and I am convinced that Erlang is the right language:

  • A Cloud Control Plane is very similar to the control plane of a network. In fact, in many ways it is probable that the hypervisor/VM is the next network (more precisely, a layer slightly above the VM). What is a network control plane? It is distributed, it is self-configuring (to some degree, with hints from network admins), and it consists of lots of processes connected by a set of very intelligent protocols. If one looks at a Cloud Control Plane that way, the choice of Erlang is not that illogical.
  • A very relevant architectural concept is subsumption. I really like the way Wikipedia describes it: “…the lowest layers can work like fast-adapting mechanisms, while the higher layers work to achieve the overall goal…”. In this case, the lower layers work on protocol primitives (or “layers of augmented FSMs”, http://ai.eecs.umich.edu/cogarch2/specific/subsumption.html) for reliability, connectivity, code/data mobility et al., while the upper layers work toward the goal of a stable cloud infrastructure!
  • R. A. Brooks’s “Intelligence without representation” is a good read. What is the relevance here, one might ask? A cloud is a “very large, coordinated, distributed infrastructure”, as Red Hat’s Jim Whitehurst would say [1], and we need ‘Robust Chaos rather than brittle determinism’: robust chaos maintained by highly distributed, massively scalable concurrent processing. That is an Erlang substrate supported by appropriate intelligent protocols (like Paxos, P2P gossip, XMPP, …).
  • Concurrent processes, whether they run on a single machine across multiple processors (or even on a single virtualized processor) or across multiple machines, and transparency as to where they run are required by the nature of the beast: a cloud will span machines and processors.
  • And the paradigms of high availability (not across a few systems, but on a massive scale) again require us to go beyond the Turing machine and von Neumann architectures. (I am sure this argument needs more logic; I will add it as I conceptualize and internalize more.)
  • High scalability: how high, one might ask? In a recent press release, Citrix talks about “Web 2.0 style tagging and searching capability which allows … to assign metadata and virtual tags to workloads, … Once tagged, virtual machines are easily located using powerful searching and sorting capabilities based on application type, qos, department, cost center, location, origin, owner, performance requirements, or any other important attributes…”
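The subsumption bullet above can be illustrated with a heavily simplified Python sketch. It assumes a toy model in which the only signals are missed heartbeats and per-VM restart counts, and every name in it (`ClusterState`, `reflex_restart`, `stability_goal`, `decide`) is hypothetical. Layer 0 reacts fast and locally by restarting an unresponsive VM; layer 1 pursues the overall goal of a stable infrastructure and, when a VM keeps flapping, suppresses the restart and proposes a replacement instead. This is only the layering-and-suppression pattern, not Brooks’s augmented-FSM wiring.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Action = Optional[str]


@dataclass
class ClusterState:
    """Toy view of the cloud that every layer can read (hypothetical model)."""
    missed_heartbeats: set[str]
    restart_counts: dict[str, int] = field(default_factory=dict)


# A layer sees the state plus the action proposed by the layers beneath it.
# Returning something different suppresses (subsumes) the lower proposal.
Layer = Callable[[str, ClusterState, Action], Action]


def reflex_restart(node: str, state: ClusterState, lower: Action) -> Action:
    """Layer 0: fast, local reaction with no notion of the global goal."""
    return f"restart {node}" if node in state.missed_heartbeats else lower


def stability_goal(node: str, state: ClusterState, lower: Action,
                   max_restarts: int = 3) -> Action:
    """Layer 1: works toward a stable infrastructure; a VM that keeps
    flapping is replaced rather than restarted yet again."""
    if lower is not None and state.restart_counts.get(node, 0) >= max_restarts:
        return f"replace {node}"
    return lower  # otherwise let the lower layer's decision through


def decide(nodes: list[str], state: ClusterState, layers: list[Layer]) -> dict[str, str]:
    """Run every layer for every node, lowest layer first."""
    actions = {}
    for node in nodes:
        action: Action = None
        for layer in layers:
            action = layer(node, state, action)
        if action is not None:
            actions[node] = action
    return actions


if __name__ == "__main__":
    state = ClusterState(missed_heartbeats={"vm2", "vm3"}, restart_counts={"vm3": 5})
    nodes = ["vm1", "vm2", "vm3"]
    print(decide(nodes, state, [reflex_restart]))                  # layer 0 alone
    print(decide(nodes, state, [reflex_restart, stability_goal]))  # layers 0 + 1
```

Layer 0 still works on its own when layer 1 is removed, which is the property that lets a subsumption system be built up one layer at a time.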

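The point about concurrent processes and location transparency is what Erlang’s lightweight processes and its `Pid ! Msg` send give you. Below is a rough Python imitation, not Erlang itself, under stated assumptions: the `Node` class, its `spawn`, `send` and `new_mailbox` methods and the `counter` process are hypothetical names, threads stand in for Erlang processes, and a `queue.Queue` stands in for a mailbox. Callers only ever hold a pid, so a distributed version could route `send` to another machine without changing the calling code.

```python
import itertools
import queue
import threading


class Node:
    """Hypothetical stand-in for an Erlang node: processes are addressed only
    by pid, so a sender never needs to know where the receiver runs."""

    def __init__(self) -> None:
        self._mailboxes: dict[int, queue.Queue] = {}
        self._pids = itertools.count(1)
        self._lock = threading.Lock()

    def new_mailbox(self) -> tuple[int, queue.Queue]:
        """Allocate a pid and its mailbox (also usable by the calling thread)."""
        with self._lock:
            pid = next(self._pids)
            mailbox: queue.Queue = queue.Queue()
            self._mailboxes[pid] = mailbox
        return pid, mailbox

    def spawn(self, behaviour) -> int:
        """Start `behaviour(node, pid, mailbox)` as a concurrent process."""
        pid, mailbox = self.new_mailbox()
        threading.Thread(target=behaviour, args=(self, pid, mailbox), daemon=True).start()
        return pid

    def send(self, pid: int, message) -> None:
        """Asynchronous, fire-and-forget send, loosely like Erlang's `Pid ! Msg`.
        A distributed version could forward to a remote node here."""
        with self._lock:
            mailbox = self._mailboxes.get(pid)
        if mailbox is not None:  # messages to unknown pids are silently dropped
            mailbox.put(message)


def counter(node: Node, pid: int, mailbox: queue.Queue) -> None:
    """Example process: keeps a private running total, shared with no one."""
    total = 0
    while True:
        message = mailbox.get()
        if message[0] == "add":
            total += message[1]
        elif message[0] == "get":
            node.send(message[1], ("total", total))
        elif message[0] == "stop":
            return


if __name__ == "__main__":
    node = Node()
    counter_pid = node.spawn(counter)
    me, inbox = node.new_mailbox()
    for n in (1, 2, 3):
        node.send(counter_pid, ("add", n))
    node.send(counter_pid, ("get", me))
    print(inbox.get(timeout=1))
```

Since the counter’s state lives only inside its own loop and is reached only through messages, there is nothing to lock or share, which is the property that lets the same model stretch from one processor to many machines.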
References:

[1]  [http://resources.zdnet.co.uk/articles/features/0,1000002000,39454819,00.htm]

Notes:

  • I am keeping this post at the architectural level and collecting the development-level pointers in the companion post, Erlang Chronicles.

[Sep 14, 2008: Pardon the anemic start. This is a work in progress; I will develop it and add more details in the next few days, as I plug along ...]

Blog at WordPress.com.