Friday, August 6, 2010 | Posted by cowmix at 2:34 PM

The performance of Postgresql: SSD vs consumer HDD -- Summer of 2010

Abstract

I benchmark the database-processing performance of two commercially-available SSDs vs. two traditional SATA HDDs.

Introduction

Here in the 5AM IT department, we are all about server virtualization. We deploy almost everything into either our own virtual server infrastructure or to any number of 3rd party cloud providers. We just LOVE virtual machines.

While virtualization has solved many problems, it’s also created a few issues. One obvious but huge challenge is that each virtual server we add creates yet another system (with new responsibilities like monitoring system status, log aggregation, etc) to manage.

As our virtual server count increases, we’ll need to build a system to reliably collect vital operational data. And hundreds of devices and servers generate thousands of data samples per minute that need to be captured, stored, and reported.

The quest for a low-power server with high-performance I/O

By definition, any server that can handle tens of thousands of pieces of operational data per minute must handle a huge I/O load. This will require a RAID 5, 6, or 10 subsystem. But adding a RAID to a server involves significant cost, complexity, size, and power requirements.

This is something we would love to avoid.

The dream: two redundant, low-power (and low cost) identical 1U servers at each site

1U server types have room for only one (or two) standard SATA hard drives. But SATA drives tend to crumble under any rigorous database load.

So until recently, low-power 1U servers were inadequate because they fail in disk I/O.

Along comes the SSD

The word on the street is that SSD offers RAID-class performance. If true, the introduction of low-cost SSD storage devices makes using low-power 1U servers as a high I/O appliance feasible.

But really, how much better is does the SSD perform over a traditional hard drive (HDD) for an application like a high read/write relational database? Google was unhelpful in locating generic studies on SSD performance in RDBMS environments. Searches for both MySQL and Postgresql benchmark data didn't yield much either.

So I decided to conduct my own small apples-to-apple comparison benchmarks of SSD vs traditional desktop hard drives.

Who's who in this benchmark?

Lets introduce the cast of characters:

Storage Devices to be evaluated:

  • 1.5 TB 7200 RPM Drive – Seagate ST31500341AS

  • 300 GB 10,000 RPM Drive – Western Digital WD30000HLFS

  • 128 GB SSD – Patriot PKA128GS25SSDR

  • 100 GB SSD – Patriot PI100GS25SSDR

Host DB:

  • Hardware

  • SG45H7 Shuttle desktop

  • 8 GB RAM

  • Q9650 3.0 GHz quad core

  • Host OS: Ubuntu 10.4 running the 2.6.32-24 kernel

  • Filesystem: EXT4

Client 1:

  • Hardware

  • ASUS GS1

  • 4 GB RAM

  • T7500 2.2 GHz dual core

  • Host OS: Xubuntu 10.4 running the 2.6.32-24 kernel

Client 2:

  • Hardware

  • Dell Vostro 220S

  • 4 GB RAM

  • E7400 2.8 GHz dual core

  • Host OS: Windows 7 Professional

Network: Switched gigabit ethernet

Database: Stock Ubuntu Postgresql 8.4.4


How about some pre-test tests?

Before I started the actual database benchmarks, I decided to establish the independent raw read/write performance of each drive. SSD manufacturers always tout these numbers as one of the main ways to judge SSD performance. So my goal was to (a) discover what each device was able to do, and (b) see if the raw read/write claims of the SSD manufacturer bears any resemblance to reality.

Luckily, Ubuntu added a raw performance benchmark tool to its Disk Utility program that allowed me to conduct these raw I/O tests with ease.

Armed with that, I tested each drive.

1.5 TB Seagate:

This hard drive over the past year and a half is the standard 1.5TB hard drive for desktops. The manufacturer claims a sustained data transfer rate of 135MB/sec and an average latency of 4.16 msec.

How does it really perform?


The average read and write rates start good, but end up below what the manufacture says is the maximum (102.9 MB/s read and 118.3 MB/s for write).


The average access time, 15.3 ms in these tests, is nowhere as fast as the 4.16 ms claimed spec.


300 GB Western Digital:


This hard drive was THE standard 10K RPM drive for hardcore gamers and performance workstations. I was not able to locate manufacture claims on a sustained data transfer rates, but they do claim an average seek time of 4.2 msec.

The average read and write rates appear to be more consistent than the 1.5 TB drive but rate about the same overall.The average access time, 7.2 ms, is a lot closer to the manufacture spec of 4.2 ms than what the 1.5 TB drive achieved.


128 GB SSD:


This drive was purchased a few months ago for its explicit compatibility with Mac OS X. The manufacturer claims the drive has the transfer rates of 260 MB/s read and 180 MB/s for write. The manufacturer makes no seek time claims.

The average read and write rates are super consistent of about 130 MB/s but are well below the vendor claims of 260 MB/s read and 180 MB/s for write . The average access time of .2 ms is an order of magnitude faster than what the mechanical drives can achieve.


100 GB SSD:

I purchased this drive from my local Fry’s because it had the highest read/write rate of any other drive they had in stock (yet was the cheapest price for that class). The manufacturer claims the drive has transfer rates of 285 MB/s read and 275 MB/s for write, and they do not make seek time claims.

The average read and write rates are pretty consistent and come close to the manufacture claims of 285 MB/s read and 275 MB/s for write. Awesome.

The average access time of .1 ms is an order of magnitude faster than the mechanical drives and even faster than the 128 BG SSD drive.


Postgresql benchmark background --- pgbench

The benchmark system used in this evaluation is standard issue with most Postgresql distributions, pgbench (http://developer.postgresql.org/pgdocs/postgres/pgbench.html). In these series of tests, pgbench is being utilized in a mostly default fashion. In that "default" form, pgbench roughly emulates the industry standard TPC-B transactional benchmark.


The rough set of operation performed per transaction on a pre-seeded table set are:
  1. BEGIN;

  2. UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;

  3. SELECT abalance FROM pgbench_accounts WHERE aid = :aid;

  4. UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;

  5. UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;

  6. INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);

  7. END;

Postgresql benchmark #1 – one million rows, one client machine

In the first test, I pre-seeded the test table with one million 'account' rows with the following command:

client-1:~$ pgbench -h 192.168.101.120 -U userone -i -s 100 userone

The approximate on-disk footprint of that pre-seeded data was around 2.9 GB of data.

The command to run the actual transactional test was:

client-1:~$ pgbench -h 192.168.101.120 -U userone -c 10 -T 500 userone

The average transactions per second for each storage device (averaged out over 10 runs):

  • 7200RPM - 106

  • 10K RPM - 158

  • SSD128 - 458

  • SSD100 - 1472

In this first test the fast SSD performed almost 14 times faster than the 7200 RPM hard drive. It also interesting that the ‘slow’ SSD drive is three times slower than the ‘fast’ SSD drive.

Postgresql benchmark #2 – ten million rows, one client machine

In the second test, I pre-seeded the test table with ten million 'account' rows with the following command:

client-1:~$ pgbench -h 192.168.101.120 -U userone -i -s 1000 userone

The approximate on disk footprint of that pre-seeded data was around 22.5 GB of data.

The command to run the actual transactional test was:

client-1:~$ pgbench -h 192.168.101.120 -U userone -c 10 -T 500 userone

The average transactions per second for each storage device (averaged out over 10 runs):

  • 7200RPM - 34

  • 10K RPM - 47

  • SSD128 - 245

  • SSD100 - 184


This was a little unexpected: the 'slow' SSD drive is 1.3 times faster than the 'fast' SSD drive.

I ran this test 10 times over the course of two hours, so these results are confirmed. Weird.

The expected (but pleasing) results, of course, are both SSDs are five to seven times faster than the 7200 RPM hard drive.

Postgresql benchmark #2 – ten million rows, two client machines

In the last test I pre-seeded the test table with ten million 'account' rows with the following command:

client-1:~$ pgbench -h 192.168.101.120 -U userone -i -s 1000 userone

..and..

client-2:~$ pgbench -h 192.168.101.120 -U usertwo -i -s 1000 usertwo

The approximate on disk footprint of that pre-seeded data was around 45 GB of data.

The purpose of pre-seeding two different databases under the same instance of Postgresql was this:

when I ran two different client machines, I wanted each client machine to draw from different account tables. This would make sure that the Postgresql server could not utilize cached data between pgclient sessions. I wanted to make sure data was being pulled directly from disk as much as possible.

The command to run the actual transactional test was:

client-1:~$ pgbench -h 192.168.101.120 -U userone -c 10 -T 500 userone

..and..

client-2:~$ pgbench -h 192.168.101.120 -U usertwo -c 10 -T 500 usertwo

The average transactions per second for each storage device (averaged out over 10 runs):

  • 7200RPM - 14

  • 10K RPM - 23

  • SSD128 - 74

  • SSD100 - 122

NOTE: These are the tps for only client machine 1. These numbers are not an aggregation of the reported tps from both client machines.

This is more like it. The 'fast' SSD only dipped down 33% from when only the single client machine was running the benchmark. The 'slow' SSD dipped 70% from when only the single client machine was running the benchmark.

Both SSDs are still five to seven times faster than the 7200 RPM hard drive.

Conclusion

While I am not completely shocked, the results are pretty impressive: Without any tuning, modern SSD storage devices are five to fourteen times faster than their rotational brothers using a default 8.4.X Postgresql configuration. Come to think of it, in my 20+ years experience of working with databases, I have never seen the overall performance of database increase by half or an order of magnitude with so little work involved.

Ok, maybe I am a little shocked.

Anyway, here are some followup conditions I may test:

  • More rows: I plan to bump up the account row count to 400 million. This will create roughly a 90GB footprint for the database. It will be interesting to see how things scale there.

  • More physical clients: I have a Mac Mini and MacBookPro on my gigabit LAN being lazy. I will interesting to see how things scale with more clients slamming the DB.

  • RAID: It would great to throw in some tps numbers for a RAID-based system just to get more context for the SSD results.
  • Different file systems: EXT4 is not the only game in town. My next tests may include XFS and Btrfs as alternative file systems.
UPDATE #1(2010-08-08):
I've started a thread on the Postgresql performance mailing list regarding this post. Check it out.

UPDATE #2(2010-08-08):
I also submitted this to Hacker News.

8 comments:

bionic said...

I think your numbers are off wrt HDD claims.

Latency is due to the rotation of the disk, and seek times is from moving the heads. The Seagate site states:

Latency 4.16ms
Random read seek time <8.5ms

Add in host overhead and you're closer to your measured numbers.

Likewise, the Velociraptor is specified as:

Average Latency 3.00 ms (nominal)
Read Seek Time 4.2 ms

Which if you add them together yields something more in line with what you measured.

cowmix said...

bionic: Thanks for the correction.

patata said...

Thank you for sharing your tests. It`s impressive

But I wonder about long-term stability: how long can resist an ssd drive an intensive database production environment?

Greetings from Spain

Mark said...

Just because a SSD drive has a faster throughput, which performs well on streaming large amounts of data, IOPS of the drive will eventually be a bottleneck.

I believe this is why a lot of people opt for the more expensive Intel SSD drives (i'm sure there are various brands with high IOPS ratings), but from my experience if the drive is strangled on IOPS, it won't perform well when there is a lot of concurrency, where even a slower drive will still be able to process the queue of requests much faster.

I imagine it's not that much different than the MHZ of a CPU, faster numbers != better, depending on the application.

scoulibaly said...

Ohhh ! 122 TPS with a 40 GB base. I'm looking to start a big base 500GB with Postgresql and SSD drives, and was hoping that performances would not drop that much (1400 -> 120 means divide by 10).

How can this behaviour be avoided ? This maxes out the size of any postgresql database right ?

Thank you !

human.0.id said...

In addition to this topic - there is a new Intel PCIe x8 SSD Series 910 came out recently - 400GB and 800GB for enterprise level, with 1000MBps/750MBps read/write throughput and 75,000 and 180,000 IOPS on 4K data chunks. It would be interesting to see how they perform with a DB. The price on them is about $1850 and $3900 respectively MSRP. Good article! Thanks for sharing.

otr214426 said...

VIRUS REMOVAL

Is Your Computer Sluggish or Plagued With a Virus? – If So you Need Online Tech Repairs
As a leader in online computer repair, Online Tech Repairs Inc has the experience to deliver professional system optimization and virus removal.Headquartered in Great Neck, New York our certified technicians have been providing online computer repair and virus removal for customers around the world since 2004.
Our three step system is easy to use; and provides you a safe, unobtrusive, and cost effective alternative to your computer service needs. By using state-of-the-art technology our computer experts can diagnose, and repair your computer system through the internet, no matter where you are.
Our technician will guide you through the installation of Online Tech Repair Inc secure software. This software allows your dedicated computer expert to see and operate your computer just as if he was in the room with you. That means you don't have to unplug everything and bring it to our shop, or have a stranger tramping through your home.
From our remote location the Online Tech Repairs.com expert can handle any computer issue you want addressed, like:
• - System Optimization
• - How it works Software Installations or Upgrades
• - How it works Virus Removal
• - How it works Home Network Set-ups
Just to name a few.
If you are unsure of what the problem may be, that is okay. We can run a complete diagnostic on your system and fix the problems we encounter. When we are done our software is removed; leaving you with a safe, secure and properly functioning system. The whole process usually takes less than an hour. You probably couldn't even get your computer to your local repair shop that fast!
Call us now for a FREE COMPUTER DIAGONISTIC using DISCOUNT CODE (otr214426@gmail.com) on +1-914-613-3786 or chat with us on www.onlinetechrepairs.com.






otr214426 said...

Problem: HP Printer not connecting to my laptop.

I had an issue while connecting my 2 year old HP printer to my brother's laptop that I had borrowed for starting my own business. I used a quick google search to fix the problem but that did not help me.

I then decided to get professional help to solve my problem. After having received many quotations from various companies, i decided to go ahead with Online Tech Repair (www.onlinetechrepairs.com).

Reasons I chose them over the others:
1) They were extremely friendly and patient with me during my initial discussions and responded promptly to my request.
2) Their prices were extremely reasonable.
3) They were ready and willing to walk me through the entire process step by step and were on call with me till i got it fixed.

How did they do it
1) They first asked me to state my problem clearly and asked me a few questions. This was done to detect any physical connectivity issues with the printer.
2) After having answered this, they confirmed that the printer and the laptop were functioning correctly.
3) They then, asked me if they could access my laptop remotely to troubleshoot the problem and fix it. I agreed.
4) One of the tech support executives accessed my laptop and started troubleshooting.
5) I sat back and watched as the tech support executive was navigating my laptop to spot the issue. The issue was fixed.
6) I was told that it was due to an older version of the driver that had been installed.

My Experience
I loved the entire friendly conversation that took place with them. They understood my needs clearly and acted upon the solution immediately. Being a technical noob, i sometimes find it difficult to communicate with tech support teams. It was a very different experience with the guys at Online Tech Repairs. You can check out their website www.onlinetechrepairs.com or call them on 1-914-613-3786.
Would definitely recommend this service to anyone who needs help fixing their computers.
Thanks a ton guys. Great Job....!!

Post a Comment