Notes on MongoDB @ AWS-Ubuntu-12.04 XFS, RAID10 & LVM

Notes & references from our experience with the MongoDB data layer for bioinformatics. As they say, don’t blindly execute any scripts, and question everything. As I researched each aspect, I came across a set of good references. I have annotated and contextually ordered the reference list at the end of this blog, to help one make informed NoSQL data infrastructure design & optimization decisions.

This is Part 1. Part 2 – MongoOps will cover Backup, Replication & Sharding, and finally Part 3 will cover the Aggregation Framework. Let us see how I fare …

Our setup and rationale:

  • Ubuntu – Easy admin & maintenance
  • Mongo 2.1.x – We need the aggregation framework now. I will update along with newer versions of mongo in the dev branch. So you can count on me to test the latest aggregation framework code!
  • m2.xlarge – This (baby) beast has 17 GB memory & 2 cores with ~4 GHz CPU. We have multiple collections, fact tables, datasets, analytics and so forth.
    • I plan to load test and find the best configuration.
  • xfs – Performance, Extensibility.
    • “supports I/O suspend & write-cache flushing – critical for multi-disk consistent snapshots”
    • “XFS better performance from these moody disks!”
  • RAID10 – Availability, “Enhance Operational Durability” & Resilience.
    • One drawback : Cannot use ebs snapshots
      • Replica with a non-RAID ebs (to backup from) should do the trick. Will explain the configuration in Part 2 – MongoOps:Backup
    • “AWS protects against drive failure, RAID10 protects against failures at the EBS technology layer”
  • 8 X 32 GB – Striping across multiple disk spindles
    • Diminishing performance returns after 8 disks
  • LVM – gives the extensibility (RAID10 cannot be extended)
    • And use xfs_growfs at the file system layer
  • Replica Sets & Sharding (Future, will blog)
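
As a quick sanity check on the 8 X 32 GB choice, here is a minimal sketch of the capacity arithmetic. It assumes plain RAID10 with two copies of every block, so usable space is half the raw space; the function name raid10_usable_gb is made up for illustration.

```shell
#!/bin/sh
# RAID10 mirrors every block once, so usable space is half the raw space.
# Arguments: number of EBS volumes, size of each volume in GB.
# (Hypothetical helper name; ignores LVM and filesystem overhead.)
raid10_usable_gb() {
    echo $(( $1 * $2 / 2 ))
}

raid10_usable_gb 8 32   # the 8 x 32 GB layout from the list above; prints 128
```

So the eight volumes give roughly 128 GB for data, journal and logs together, before any LVM or filesystem overhead.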

I started from the excellent writeup on Amazon EC2 Quickstart by Sandeep Parikh. He has done a wonderful job. Found a few things I had to change to fit our setup.

  1. Ubuntu 12.04 requires a few installs
    • sudo apt-get install mdadm
    • sudo apt-get install lvm2 xfsprogs
  2. Three useful commands to inspect the volumes, devices & partitions
    1. df -h
    2. fdisk -l
    3. cat /proc/partitions
  3. The devices are named xvdf, xvdg, … not sdf, sdg
    • For example, sudo mdadm --verbose --create /dev/md0 --metadata=1.2 --level=10 --raid-devices=4 /dev/xvdf /dev/xvdg /dev/xvdh /dev/xvdi
  4. Added --metadata=1.2 to the mdadm command
  5. There were a few discussions (Ref: 2.6, Best Practices discussion in the mongodb-user group) on whether we need --chunk=256 in the command. The conclusion is that while this might help other applications, it is not needed for MongoDB
  6. sudo blockdev --setra 65536 /dev/md0 (not 128) <- change in command
  7. The conf file is /etc/mongodb.conf in Ubuntu 12.04 (not mongod.conf)
  8. The init.d file is /etc/init.d/mongodb (not mongod)
  9. The logpath & dbpath in /etc/mongodb.conf don’t take effect. They are overridden by the defaults in /etc/init.d/mongodb. This threw me off for some time; finally I made the changes in the /etc/init.d/mongodb file
    • I think there is room for improvement. If I get time, I will refactor this file & make it simpler. Don’t want to jump-in without enough deep thought – I am sure the good folks at 10Gen have good reasons for the complexity
  10. sudo mkfs.xfs -f /dev/vg0/data to format xfs
  11. And appropriate command for fstab
    • echo '/dev/vg0/data /data xfs defaults,auto,noatime,noexec 0 0' | sudo tee -a /etc/fstab
  12. The volume scheme in the Quickstart (90% /data, 5% /journal and 5% /log) might not work. I had first set up 4 X 30 GB ebs, and this gave 3 GB for the journal. Then mongod wouldn’t start, with the error
    • Fri May 4 03:36:57 [initandlisten] ERROR: Insufficient free space for journal files
    • Fri May 4 03:36:57 [initandlisten] Please make at least 3379MB available in /data/journal or use --smallfiles
    • Fri May 4 03:36:57 [initandlisten]
    • Fri May 4 03:36:57 [initandlisten] exception in initAndListen: 15926 Insufficient free space for journals, terminating
    • This also threw me a little bit.
    • I think a single volume with three directories is better than three separate volumes, especially as they are in the same volume group. Maybe the three separate volumes are meant to limit disk usage, but if the disk is full in any one of them, the system would be down anyway. So I am not sure the three separate volumes buy us anything
  13. Need apt support for MongoDB dev releases
    • 10Gen should add support for development releases in apt. I had to install 2.0.x, download 2.1.x and then copy it over the /usr/bin/ directory. It works now, but I don’t think it is safe
  14. I will update this blog as I complete the data layer
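
The storage steps in the list above can be sketched as a dry run that only prints the commands, so the plan can be reviewed before anything touches a disk. This is a sketch under assumptions, not the exact sequence I ran: the pvcreate, vgcreate and lvcreate lines are my guess at how /dev/vg0/data comes into being (one volume group, one logical volume taking all the space), and run and build_plan are made-up helper names.

```shell
#!/bin/sh
# Dry run: print each command instead of executing it.
# run and build_plan are hypothetical helper names for this sketch.
run() {
    echo "$@"
}

# Arguments: the EBS devices to put into the RAID10 array.
build_plan() {
    # RAID10 array across all devices passed in
    run sudo mdadm --verbose --create /dev/md0 --metadata=1.2 --level=10 --raid-devices=$# "$@"
    # Read-ahead value used in this post
    run sudo blockdev --setra 65536 /dev/md0
    # LVM on top of the array (assumed layout: one VG, one LV using all of it)
    run sudo pvcreate /dev/md0
    run sudo vgcreate vg0 /dev/md0
    run sudo lvcreate -l 100%VG -n data vg0
    # XFS on the logical volume, mounted at /data via fstab
    run sudo mkfs.xfs -f /dev/vg0/data
    run sudo mkdir -p /data
    run "echo '/dev/vg0/data /data xfs defaults,auto,noatime,noexec 0 0' | sudo tee -a /etc/fstab"
    run sudo mount /data
}

build_plan /dev/xvdf /dev/xvdg /dev/xvdh /dev/xvdi
```

Check the device names against fdisk -l on your own instance before running any of the printed commands for real; mdadm --create and mkfs.xfs -f are destructive.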
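
The journal-space failure in step 12 is easy to reproduce on paper. A minimal sketch of the arithmetic, assuming RAID10 halves the raw 4 X 30 GB to 60 GB, 1 GB = 1024 MB, and no LVM or filesystem overhead (real numbers will be a bit lower); share_mb is a made-up helper name.

```shell
#!/bin/sh
# Size in MB of a volume that takes a percentage share of the array.
# Arguments: usable array size in GB, percentage share.
# (Hypothetical helper; ignores LVM and filesystem overhead.)
share_mb() {
    echo $(( $1 * 1024 * $2 / 100 ))
}

# 4 x 30 GB in RAID10 leaves 60 GB usable (half of 120 GB raw).
journal_mb=$(share_mb 60 5)
required_mb=3379   # figure reported by mongod in the log above

if [ "$journal_mb" -lt "$required_mb" ]; then
    echo "journal volume too small: ${journal_mb}MB < ${required_mb}MB"
else
    echo "journal volume ok: ${journal_mb}MB"
fi
```

A 5% share of 60 GB is 3072 MB, which is under the 3379 MB that mongod asked for, so the 90/5/5 split cannot work on an array this small.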

Finally, the references (they helped me to understand the nuances & the details). I have annotated them and ordered them contextually to help one make informed infrastructure design & optimization decisions; start from the beginning & read through sequentially:

  1. 10Gen Resources
    1. EC2 Quickstart – The bible ;o) This is for Amazon Linux, so needs some changes for Ubuntu
      1. The best source and should be used as a base to build your MongoDB infrastructure.
      2. A good plan is to read the links, move on to the rest of the references and then come back to build your infrastructure.
    2. MongoDB on AWS – An excellent paper (in pdf format) by Miles Ward
    3. Overview & Topology
    4. MongoDB on EC2 overview
    5. MongoDB Best Practices – Good overview tips
  2. MongoDB/AWS
    1. EBS Overview – must read to understand EBS
    2. RAID10 your EBS – good blog on why RAID10 for ebs; as aws provides redundancy, why do we mirror ebs volumes ?
    3. ServerFault Q&A on RAID10 & aws
    4. Getting good I/O from ebs – A few good points. Make sure to read the rest of this section before taking any drastic steps ;o)
    5. AWS ebs benchmarks – gives one an insight into RAID0, RAID10 et al
    6. Best Practices for MongoDB/RAID – discussion thread at mongodb-user Google group
    7. MongoDB memory usage & working set – discussion at mongodb-user Google group. Might help you with sizing the instance
  3. MongoDB/Ubuntu
    1. MongoDB, RAID10 & Ubuntu – includes detailed commands & explanations
    2. MongoDB, LVM, XFS,RAID10 – Another list of commands
    3. Installing MongoDB – Quick overview. Would be good to read the rest before jumping into installation
  4. RAID/LVM blogs
    1. Quick intro to LVM
    2. Managing RAID & LVM with Linux – good intro
    3. Linux RAID smackdown
    4. Managing RAID 10
    5. Grow XFS on LVM
    6. LVM commands Cheat sheet
  5. RAID Gory Details
    3. Complex RAID10 with mdadm
  6. LVM gory details
  7. More Gory Details for enquiring minds who want to know

2 thoughts on “Notes on MongoDB @ AWS-Ubuntu-12.04 XFS, RAID10 & LVM”

  1. I’m trying to understand the replicas configuration here. The MongoDB data needs to be on a EBS volume attached to an EC2 instance. But MongoDB docs suggest running replicas on different instances. How will this work if a EBS volume can be attached to only a single EC2 instance?
