It all started with a tweet
- It so happens that I have been working on a similar worksheet for pricing & configuring our analytics infrastructure;
- I modified the one I am working on (inspired by the original at ec2 pricing_and_capacity) & morphed it into the one Otis wanted
- The Excel worksheet is hosted on GitHub. Feel free to modify it to fit your needs. Let me know as well …
- I have four sets of prices, viz. on-demand, reserved-light, reserved-medium and reserved-heavy usage. The prices are calculated for one year (8640 hrs) off of cell M1 – one has to prorate the upfront fees to get the effective $/hr rate
- The worksheet has multiple uses – I use it to compute the price difference for different usage patterns-high memory for Spark, different sizes for HBase cluster et al. As it is a spreadsheet one could sort it on varying criteria; one could change the numbers (say 6 months) and see what model makes sense.
- BTW, it is interesting to see that Light-Reserved costs more in all cases except for the storage models.
- A long time ago, I had a graphical representation, which has become very dated. I might resurrect it with the new prices …
The Spreadsheet :
The left columns have the attributes of the various EC2 models.
The 8640 (hrs/year) is in M1. All the calculations are based on this cell. The reserved light is interesting … it costs more !
The reserved medium does save $. Moreover, one can stop the instances when not needed.
I have calculated the yearly price prorating the upfront fees et al. But for Heavy Reserved, it is somewhat meaningless as they will charge for the whole year even if the instances are stopped. But changing the value in M1 gives a feel for the different tiers …
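The prorating described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the worksheet's actual formulas: the function names and every price in `TIERS` are hypothetical, and the only number taken from the worksheet is the 8640 hours in cell M1. The `billed_for_full_term` flag models the Heavy Reserved behaviour, where the hourly fee is charged for the whole term even if the instance is stopped.

```python
HOURS_PER_YEAR = 8640  # the value the worksheet keeps in cell M1


def yearly_cost(upfront_fee, hourly_rate, hours_used,
                billed_for_full_term=False, term_hours=HOURS_PER_YEAR):
    """Total paid for the term: upfront fee plus the hourly charges."""
    # Heavy Reserved is billed for every hour of the term, used or not.
    billed_hours = term_hours if billed_for_full_term else hours_used
    return upfront_fee + hourly_rate * billed_hours


def effective_hourly_rate(upfront_fee, hourly_rate, hours_used,
                          billed_for_full_term=False,
                          term_hours=HOURS_PER_YEAR):
    """Prorate everything paid over the hours the instance actually ran."""
    if hours_used <= 0:
        raise ValueError("hours_used must be positive")
    total = yearly_cost(upfront_fee, hourly_rate, hours_used,
                        billed_for_full_term, term_hours)
    return total / hours_used


# HYPOTHETICAL prices, for illustration only (not taken from the worksheet).
TIERS = {
    "on-demand": {"upfront_fee": 0, "hourly_rate": 0.5},
    "reserved-medium": {"upfront_fee": 1000, "hourly_rate": 0.25},
    "reserved-heavy": {"upfront_fee": 1500, "hourly_rate": 0.125,
                       "billed_for_full_term": True},
}


def compare(hours_used):
    """Effective $/hr for each tier at a given number of running hours."""
    return {name: effective_hourly_rate(hours_used=hours_used, **tier)
            for name, tier in TIERS.items()}


if __name__ == "__main__":
    # Same idea as changing M1: a full year versus roughly six months.
    for hours in (HOURS_PER_YEAR, HOURS_PER_YEAR // 2):
        print(hours, compare(hours))
```

With these made-up prices the heavy tier is the cheapest per hour over a full year, but at half the hours it ends up costing more per hour than on-demand, which is the kind of crossover that changing M1 is meant to expose.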
I would be happy to hear other inferences we can make and add columns to the worksheet …
Finally we have our VPC and Mongo replica sets working. I still have to figure out the snapshots. Some notes – would appreciate comments, ideas, insights & wisdom. I have the full slides at slideshare.
I will post my notes from snapshot configuration …
Notes & References from our experiences on the MongoDB Data Layer for BioInformatics. Like they say, don’t blindly execute any scripts & question everything. As I researched each aspect, I came across a set of good references. I have annotated and contextually ordered the reference list at the end of this blog, to help one make informed NoSQL data infrastructure design & optimization decisions.
This is Part 1. Part 2 – MongoOps will cover Backup, Replication & Sharding, and finally Part 3 will cover the Aggregation Framework. Let us see how I fare …
Our setup and rationale:
- Ubuntu – Easy admin & maintenance
- Mongo 2.1.x – We need the aggregation framework now. I will update along with newer versions of mongo in the dev branch. So you can count on me to test the latest aggregation framework code !
- m2.xlarge– This (baby) beast has 17 GB memory & 2 virtual cores (6.5 EC2 Compute Units in total). We have multiple collections, fact tables, datasets, analytics and so forth.
- I plan to load test and find the best configuration.
- xfs– Performance, Extensibility.
- “supports I/O suspend & write-cache flushing – critical for multi-disk consistent snapshots”
- “XFS better performance from these moody disks!”
- RAID10– Availability, “Enhance Operational Durability” & Resilience.
- One drawback : Cannot use ebs snapshots
- Replica with a non-RAID ebs (to backup from) should do the trick. Will explain the configuration in Part 2 – MongoOps:Backup
- “AWS protects against drive failure, RAID10 protects against failures at the EBS technology layer”
- 8 X 32 GB– Striping across multiple disk spindles
- Diminishing performance returns after 8 disks
- LVM– gives the extensibility (RAID10 cannot be extended)
- And use xfs_growfs at the file system layer
- Replica Sets & Sharding (Future, will blog)
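As a quick sanity check on the 8 X 32 GB choice, here is a small sketch of the usable space a RAID10 array yields. It assumes the classic layout of mirrored pairs striped together (two copies of every block) and ignores metadata and file system overhead; the function name is mine, not something from mdadm or LVM.

```python
def raid10_usable_gb(disk_count, disk_size_gb, copies=2):
    """Usable capacity of a RAID10 array built from equally sized disks.

    Assumes the classic mirrored-pairs layout: every block is stored
    `copies` times, so only 1/copies of the raw space holds unique data.
    """
    if disk_count < 4 or disk_count % copies != 0:
        raise ValueError("expected at least 4 disks, in multiples of `copies`")
    return disk_count * disk_size_gb // copies


if __name__ == "__main__":
    print(raid10_usable_gb(8, 32))  # the 8 X 32 GB layout from the list above
```

So half of the raw EBS space is spent on the mirror, and any growth beyond that has to come from LVM and xfs_growfs rather than from the array itself.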
I started from the excellent writeup on Amazon EC2 Quickstart by Sandeep Parikh. He has done a wonderful job. Found a few things I had to change to fit our setup.
- Ubuntu 12.04 requires a few installs
- sudo apt-get install mdadm
- sudo apt-get install lvm2 xfsprogs
- Three useful commands to inspect the volumes, devices & partitions
- df -h
- fdisk -l
- cat /proc/partitions
- The devices are named xvdf, xvdg, … not sdf, sdg
- For example, sudo mdadm --verbose --create /dev/md0 --metadata 1.2 --level=10 --raid-devices=4 /dev/xvdf /dev/xvdg /dev/xvdh /dev/xvdi
- Added --metadata 1.2 for the mdadm command
- There were a few discussions (Ref: 1.F. Best Practices discussion in mongodbuser group) on whether we need --chunk 256 in the command. The conclusion is that while this might help other applications, it is not needed for mongodb
- sudo blockdev --setra 65536 /dev/md0 (not 128) <- change in command
- The conf file is /etc/mongodb.conf in Ubuntu 12.04 (not mongod.conf)
- The init.d file is /etc/init.d/mongodb (not mongod)
- The logpath & dbpath in /etc/mongodb.conf don’t take effect. They are overridden by the defaults in /etc/init.d/mongodb. This threw me off for some time; finally I made the changes in the /etc/init.d/mongodb file
- I think there is room for improvement. If I get time, I will refactor this file & make it simpler. Don’t want to jump-in without enough deep thought – I am sure the good folks at 10Gen have good reasons for the complexity
- sudo mkfs.xfs -f /dev/vg0/data to format xfs
- And appropriate command for fstab
- echo '/dev/vg0/data /data xfs defaults,auto,noatime,noexec 0 0' | sudo tee -a /etc/fstab
- The volume scheme in the Quickstart (90% /data, 5% /journal and 5% /log) might not work. I had first set up 4 X 30 GB ebs volumes and this gave 3 GB for the journal. Then mongod wouldn’t start, with the error
- Fri May 4 03:36:57 [initandlisten] ERROR: Insufficient free space for journal files
- Fri May 4 03:36:57 [initandlisten] Please make at least 3379MB available in /data/journal or use --smallfiles
- Fri May 4 03:36:57 [initandlisten]
- Fri May 4 03:36:57 [initandlisten] exception in initAndListen: 15926 Insufficient free space for journals, terminating
- This also threw me a little bit.
- I think a single volume with three directories is better than three separate volumes, especially as they are in the same logical volume. Maybe the three separate volumes are there to limit the disk usage, but if the disk is full in any one of them, the system would be down anyway. So I am not sure the three separate volumes buy us anything
- Need apt support for MongoDB dev releases
- 10Gen should add support for development releases in apt. I had to install 2.0.x, download 2.1.x and then copy over the /usr/bin/ directory. It works now, but I don’t think it is safe
- I will update this blog as I complete the data layer
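The journal-space failure with the 90/5/5 split is easy to check on paper before creating the volumes. The sketch below is only back-of-the-envelope arithmetic: the 3379 MB threshold is the figure from the mongod error message above, while the helper names, the 1 GB = 1024 MB conversion and the decision to ignore file system overhead are my assumptions.

```python
REQUIRED_JOURNAL_MB = 3379  # figure quoted in the mongod error message

# The Quickstart's percentage split across the three volumes.
QUICKSTART_SPLIT = {"data": 90, "journal": 5, "log": 5}


def split_volume_mb(usable_gb, percentages):
    """Size in MB of each volume for a given percentage split."""
    if sum(percentages.values()) != 100:
        raise ValueError("percentages must add up to 100")
    total_mb = usable_gb * 1024  # assumption: 1 GB = 1024 MB, no overhead
    return {name: total_mb * pct // 100 for name, pct in percentages.items()}


def journal_fits(usable_gb, percentages, required_mb=REQUIRED_JOURNAL_MB):
    """True if the journal volume meets mongod's minimum free space."""
    return split_volume_mb(usable_gb, percentages)["journal"] >= required_mb


if __name__ == "__main__":
    # 4 X 30 GB in RAID10 leaves 60 GB usable; 8 X 32 GB leaves 128 GB.
    for usable_gb in (60, 128):
        sizes = split_volume_mb(usable_gb, QUICKSTART_SPLIT)
        print(usable_gb, sizes, journal_fits(usable_gb, QUICKSTART_SPLIT))
```

With 60 GB usable, the 5% journal share lands at about 3 GB, just under what mongod asks for, which matches the error I hit. A single volume with three directories sidesteps the problem, since the journal can then use whatever space is free.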
Finally, the references (they helped me to understand the nuances & the details). I have annotated them and ordered them contextually to help one make informed infrastructure design & optimization decisions; start from the beginning & read through sequentially :
- 10Gen Resources
- EC2 Quickstart – The bible ;o) This is for Amazon Linux, so needs some changes for Ubuntu
- The best source and should be used as a base to build your MongoDB infrastructure.
- A good plan is to read the links, move on to the rest of the references and then come back to build your infrastructure.
- MongoDB on AWS – An excellent paper (in pdf format) by Miles Ward
- Overview & Topology
- MongoDB on EC2 overview
- MongoDB Best Practices – Good overview tips
- EBS Overview – must read to understand EBS
- RAID10 your EBS – good blog on why RAID10 for ebs; as aws provides redundancy, why do we mirror ebs volumes ?
- ServerFault Q&A on RAID10 & aws
- Getting good I/O from ebs – A few good points. Make sure to read rest of this section before making any drastic steps ;o)
- AWS ebs benchmarks – gives one an insight into RAID0, RAID10 et al
- Best Practices for MongoDB/RAID – discussion thread at mongodb-user Google group
- MongoDB memory usage & working set – discussion at mongo-user google group. Might help you with sizing the instance
- MongoDB, RAID10 & Ubuntu – includes detailed commands & explanations
- MongoDB, LVM, XFS, RAID10 – Another list of commands
- Installing MongoDB – Quick overview. Would be good to read the rest before jumping into installation
- RAID/LVM blogs
- Quick intro to LVM
- Managing RAID & LVM with Linux – good intro
- Linux RAID smackdown
- Managing RAID 10
- Grow XFS on LVM
- LVM commands Cheat sheet
- RAID Gory Details
- Complex RAID10 with mdadm
- LVM gory details
- More Gory Details for enquiring minds who want to know