Benchmarking filesystem performance for Oracle using SLOB

UPDATE 20140227: I am leaving this post here for historical reference, but the version of SLOB I used while writing on it is now fully deprecated.  Please go directly to the distribution page for SLOB2 and use the information there to retrieve the latest version of SLOB and learn how to use it, as the usage and functionality have changed.  For additional tips, consider Yury’s “SLOB2 Kick start“.  Please do not attempt to use the scripts, instructions or techniques I have described here unless you are still using the old version of SLOB.  If you are still using the old version, you should not be.

 

Update: Please see Part 1.5 for updated scripts and part 2 for results.

This post will cover techniques I’ve used to run SLOB (see http://kevinclosson.wordpress.com/2012/02/06/introducing-slob-the-silly-little-oracle-benchmark/) to benchmark the performance of various filesystems and raw devices when used for Oracle datafiles and redo logs. I will not write about the actual results, since they’re only relevant to my specific hardware and software setup (spoiler alert: raw wins), but instead discuss the methodology and database parameters and configurations I used in hopes that they will help others to run their own tests, or that others will notice flaws I can remedy to produce more correct results. Likewise, this post is not about specific tuning parameters for individual filesystems, but instead about a way to run the tests to compare performance from one set of tuning parameters to another.

After you have downloaded and installed SLOB, you need to get a good idea about exactly what you want to test and which metrics you will look at to determine the results. In my case we have been having some OS issues resulting in unclean shutdowns that lead to long fsck times on our Oracle data volumes, so I am investigating alternative filesystems to try to find something with performance matching or beating ext3 that won’t be subject to fsck on boot. I also chose to run tests to compare the redo logging subsystem’s performance when redo is stored on various filesystems or raw devices.

So I am running three different tests:

  1. Pure read-only I/O performance
  2. Concurrent read-write I/O performance
  3. Write-only redo I/O performance

For each test I first needed to identify an appropriate metric. For read-only performance the obvious choice is physical reads per second. For concurrent read-write performance I measured the sum of physical reads per second and physical writes per second. For redo performance I measured redo generated per second.

After selecting your metrics you next need to determine how to configure the database to make sure you are testing what you wanted to test. To that end, I configured the database as described below. If you want to use SLOB to test other aspects of performance you need to monitor the wait events noted in your AWR reports to be sure that the database isn’t doing something you don’t really want to test. For example, if you are running a ton of SLOB tests overnight and the autotask window kicks in and the SQL Tuning Advisor or Segment Advisor start running, those will skew your results so you may wish to disable those tasks.

Setting Up Test Parameters

Each test requires a different set of initialization parameters and other configuration to isolate the desired variable (filesystem or raw device, in my case). I think the parameters I used are valid for the tests I ran, but I am very interested in any comments from others. For each of the various filesystems I wanted to test, the storage admin created a LUN, mounted it to the server, and created the filesystem (or configured the raw device). I put a separate tablespace on each LUN, each containing a single 10GB datafile.

Read-Only I/O Performance

This is the easiest item to force. I want the database to fulfill read requests from disk rather than from the buffer cache, so I simply took my existing baseline configuration (as used for our SAP systems) and set db_cache_size to 64M. With such a small buffer cache only minimal data will be cached, and the majority of reads will come from disk. You can confirm this in the SLOB-generated AWR report by verifying that the number of physical reads per second is relatively close to the number of logical reads per second. For example, if you show 20,000 logical reads per second and only 100 physical reads per second, you haven’t done it right as most gets are coming from cache. You may need to lie to Oracle about your CPU_COUNT to get your buffer cache small enough to force physical read I/O.

To run these tests in SLOB, I used 8, 16, and 32 concurrent user read sessions and zero writers.

Concurrent Read-Write I/O Performance

This one took a lot of effort for me to come up with a setup that measured what I wanted. Initial efforts showed significant waits on log buffer space or cache buffers chains and a lot of redo generation without corresponding physical writes so I had to tweak things until I found a setup that (mostly) produced AWR reports with “db file sequential read”, “db file parallel read”, “write complete waits” and “DB CPU” within the top waits.

I eventually settled on a db_cache_size of 128M to force read I/O to physical while allowing writes to occur without waiting on cache buffer chains. I set log_buffer=1073741824 to reduce log buffer waits on writes, though Oracle seems to have limited the log buffer to 353,560K as reported in the AWR reports. I created three 64M redo logs on raw devices and ran the database in NOARCHIVELOG mode to force frequent checkpoints and corresponding physical writes to datafiles and exclude any I/O spent on archiving redo logs.

To run these tests in SLOB, I used 8, 16, and 32 concurrent user sessions, with half of them as readers and half of them as writers.

Write-Only Redo I/O Performance

For this test I wanted to purely measure the amount of redo generated while excluding datafile checkpoints. I set db_cache_size to 32G to allow reads to come from cache and created three 4G redo log groups on each tested filesystem, intending the full test block to run without causing a log switch that would force a checkpoint.

To run these tests in SLOB, I used 8, 16, and 32 concurrent user write sessions, with zero readers.

A Harness For SLOB

Setting up each SLOB run involves running the provided setup.sh script to create the users, schemata and data for each test, then running the provided runit.sh script with parameters to indicate the desired number of readers and writers. For example, to use 32 different users with their data in tablespace TABLESPACENAME, and then run SLOB with 16 writers and 16 readers, you would run:

./setup.sh TABLESPACENAME 32 ; ./runit.sh 16 16

After the run SLOB will produce an AWR report which you should review to see the results of your test run. SLOB also produces a drop_users.sql script to clear out the users generated by setup.sh, and you should run that script and re-create your users anytime you change the user count. A benchmark run only once has no validity, and a benchmark with nothing to compare to is useless, so you’ll want to create some scripts to run SLOB repeatedly, saving the AWR reports in between so that you can review the overall results.

Here is the content of the script I used to automate individual SLOB runs, with discussion after:

#!/bin/sh
# harness to run SLOB

FS=$1
NUMUSERS=$2
NUMWRITERS=$3
NUMREADERS=$4
RUNNUM=$5
RUNDIR="run${RUNNUM}"

echo "Dropping users and bouncing DB"

sqlplus -s / as sysdba<<EOF
@drop_users
alter system switch logfile;
alter system checkpoint;
shutdown immediate;
exit;
EOF

rm /oracle/SLOB/arch/SLOB*dbf

sqlplus -s / as sysdba<<EOF2
startup;
exit;
EOF2

echo "Setting up users for $FS $NUMUSERS"

./setup.sh $FS $NUMUSERS

echo "Running SLOB: $FS $NUMUSERS $NUMWRITERS $NUMREADERS (run $RUNNUM)"

./runit.sh $NUMWRITERS $NUMREADERS

echo "Renaming AWR report"

mv awr.txt $RUNDIR/SLOB-AWR-$FS-$NUMUSERS-$NUMWRITERS-$NUMREADERS-$RUNDIR.txt

To provide a consistent environment for each test, I drop the previous test’s users, force a redo log switch and checkpoint, and then bounce the database. I also remove any archived logs generated by the previous run so I don’t run out of space (this is a test DB I don’t need, don’t do that on any DB you can’t afford to lose). This script takes five arguments: the tablespace name, the number of concurrent users, the number of writers, the number of readers, and the run number for each combination. I’ve named my tablespaces after the filesystem (“BTRFS”, “EXT3”, “EXT4”, “XFS”, “RAWTS”, and so on) so when it calls setup.sh the user data is created within the appropriate tablespace on the filesystem to be tested. The last line renames SLOB’s awr.txt to reflect the filesystem, user count, number of readers/writers and run number.

I save this script as mydoit.sh and create another script to call it repeatedly. This example is for the read-only testing:

#!/bin/sh
echo "Starting runs for 8 readers"
./mydoit.sh EXT3 8 0 8 1
./mydoit.sh EXT4 8 0 8 1
./mydoit.sh BTRFS 8 0 8 1
./mydoit.sh XFS 8 0 8 1
./mydoit.sh RAWTS 8 0 8 1

./mydoit.sh EXT3 8 0 8 2
./mydoit.sh EXT4 8 0 8 2
./mydoit.sh BTRFS 8 0 8 2
./mydoit.sh XFS 8 0 8 2
./mydoit.sh RAWTS 8 0 8 2

./mydoit.sh EXT3 8 0 8 3
./mydoit.sh EXT4 8 0 8 3
./mydoit.sh BTRFS 8 0 8 3
./mydoit.sh XFS 8 0 8 3
./mydoit.sh RAWTS 8 0 8 3

echo "Starting runs for 16 readers"
./mydoit.sh EXT3 16 0 16 1
./mydoit.sh EXT4 16 0 16 1
./mydoit.sh BTRFS 16 0 16 1
./mydoit.sh XFS 16 0 16 1
./mydoit.sh RAWTS 16 0 16 1

./mydoit.sh EXT3 16 0 16 2
./mydoit.sh EXT4 16 0 16 2
./mydoit.sh BTRFS 16 0 16 2
./mydoit.sh XFS 16 0 16 2
./mydoit.sh RAWTS 16 0 16 2

./mydoit.sh EXT3 16 0 16 3
./mydoit.sh EXT4 16 0 16 3
./mydoit.sh BTRFS 16 0 16 3
./mydoit.sh XFS 16 0 16 3
./mydoit.sh RAWTS 16 0 16 3

# and so on for 32 concurrent sessions...

After this script runs I have a pile of AWR reports to review and compare their read performance. For each metric I’m interested in I saved the results from each run into a spreadsheet and generated average results for each set. That gives me an average number across three runs for each filesystem’s physical read I/O per second for 8 concurrent users, 16 concurrent users, and 32 concurrent users. Similar scripts run for the redo-only testing and for the read/write testing, with the results logged in the same way. I then generate charts within the spreadsheet to visually compare the results. Some performance trends are very obvious visually and we are already following up with our OS vendor for some more information.

Advertisements

8 thoughts on “Benchmarking filesystem performance for Oracle using SLOB

    1. pardydba Post author

      Thanks Kevin! Before I share the results I need to read up a bit on licensing to convince myself that I can share the numbers without getting sued. I’m not trying to compare Oracle to other products, or to include any commercial filesystems so hopefully I can do so without anybody’s legal department getting involved. If you’re aware of any references out there making it clear that we can do so, I’d be very interested in reading them.

      Reply
  1. kevinclosson

    Hi Brian,

    Your caution on that specific matter is wise. I cannot play lawyer here but can assure you Oracle has bored, overpaid lawyers so they could drag you down with frivolity.

    Perhaps avoid citing specific numbers. It’s all about “educational purposes.” Perhaps you could choose one file system as the baseline. Let’s say ext3 is the baseline. You then get your SLOBops , PIO, etc from that recipe and that hits the graph as the integer value 1. Everything else is relative to that?

    Personally, any time I see someone set timing on in SQL*Plus and paste a measurement I start looking for the trench coats.

    Reply
    1. pardydba Post author

      Now that’s the kind of thing that I can’t see the lawyers finding a significant problem with (scaling the results against a baseline like ext3). Thanks for the idea.

      My overall results are still pending while I await testing ASM (need to install GI) and Oracle’s internal NFS, so I hope to get a follow up posted next late week with visuals. ASM should closely track raw performance and I have no idea what to expect from NFS.

      Brief version for those interested — BTRFS was horrible for redo logging with ext3 and raw producing 4x the redo bytes vs BTRFS, ext4 was about 2x better than BTRFS but only half of ext3 and raw.

      All were pretty even on read only testing (raw 12% better than the rest). Ext3/4, XFS and raw all produced about 2.5x more total physical I/O for the combined read/write tests compared to BTRFS, with 4 readers and 4 writers. That jumps up to 3x better for not-BTRFS as users scale up to 16 and 32 concurrent sessions.

      BTRFS was a clear laggard on anything involving writes, and all testing was with copy-on-write turned off. This was all done on SLES11 SP2 with a hacked in ext4 driver (since SLES is dropping ext4 in favor of BTRFS and only provides read-only ext4 stock).

      Maybe posting this much will be enough for BTRFS fans to provide some tuning suggestions…

      Reply
  2. kevinclosson

    >Maybe posting this much will be enough for BTRFS fans to provide some tuning suggestions…

    funny you say that…I have an email planned for Chris Mason. He’s a great guy and takes pride so we can learn from this stuff.

    Reply
    1. pardydba Post author

      I’ll post a list of the various event/trace/other parameters I’m using when I post the scaled numbers (it’s the SAP black box of required magic parameters), but if you’d like to get that sooner to include when emailing Chris just let me know.

      Reply
  3. Pingback: Benchmarking filesystem performance for Oracle using SLOB (Part 1.5): Improved test scripts | Pardy DBA

  4. Pingback: Benchmarking filesystem performance for Oracle using SLOB (Part 2) – Results! | Pardy DBA

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s