Monthly Archives: February 2013

SLOB.R v0.6: An R script for analyzing SLOB results from repeated runs

UPDATE 20140227: I am leaving this post here for historical reference, but the version of SLOB I used while writing it is now fully deprecated.  Please go directly to the distribution page for SLOB2 and use the information there to retrieve the latest version of SLOB and learn how to use it, as the usage and functionality have changed.  For additional tips, consider Yury’s “SLOB2 Kick start”.  Please do not attempt to use the scripts, instructions or techniques I have described here unless you are still using the old version of SLOB.  If you are still using the old version, you should not be.

This post covers usage of my SLOB.R script, used to analyze SLOB results from repeated test runs.  The contents of the SLOB.R script are at the bottom of this post, but first I will show the contents of a sample SLOB.R session, followed by how you can produce similar results.

A Sample SLOB.R Session

The first step is to start R.  On Windows you’ll double-click an icon; on Linux you’ll just type R and hit enter.  Once R has started, you will be in the interactive R interface, and it will display the standard R introductory text.

R version 2.15.2 (2012-10-26) -- "Trick or Treat"
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: i386-w64-mingw32/i386 (32-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

>

Once you are in R, you need to load the SLOB.R script and then load your SLOB data.  SLOB data is produced by running SLOB repeatedly, saving the awr.txt output in between each run, then running the SLOB/misc/awr_info.sh script to summarize the collected AWR reports. I have saved the SLOB.R script in my Cygwin home directory, and saved the awr_info.sh output in a file called “awrinfo” in the same directory.

> source('C:/cygwin/home/brian/SLOB.R') 
> SLOB <- loadSLOB('C:/cygwin/home/brian/awrinfo')

Now you have the contents of your awr_info.sh output in an R variable called SLOB.  You can call this variable anything you wish, but I used SLOB.
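
If you would like to see what loadSLOB() expects before you have real AWR data, you can build a toy input file from within R.  The column layout below is my own assumption, based only on the columns SLOB.R references (FILE, WRITERS, READERS, PREADS, PWRITES, REDO); real awr_info.sh output contains more columns, so treat this as a sketch.

> writeLines(c(
+   'FILE|WRITERS|READERS|PREADS|PWRITES|REDO',
+   'AWR-XFS-run1.0.8|0|8|27100.5|0|123456789',
+   'AWR-EXT3-run1.0.8|0|8|30200.1|0|131313131'), 'awrinfo.demo')
> demo <- loadSLOB('awrinfo.demo')
> demo$PREADS
[1] 27100.5 30200.1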

To use SLOB.R you need to tell it a little bit about your AWR data: specifically, which variables you are trying to test and how many sessions you used. In this example I am comparing performance of the XFS and EXT3 filesystems using the SLOB physical read, redo, and read/write models.  The variables to be tested (EXT3 and XFS) are embedded in the filenames I used when saving the awr.txt report between SLOB runs, and the session counts are the numbers I used when running SLOB’s runit.sh script to put load on the database.  We tell SLOB.R about this by setting a few R variables containing this data.  You can call them anything you wish; you just need to know their names.

> SLOB.sessions <- c(8, 16, 32) 
> SLOB.read.vars <- c('XFS', 'EXT3') 
> SLOB.redo.vars <- c('XFS', 'EXT3') 
> SLOB.readwrite.vars <- c('XFS', 'EXT3')

As you can see, variable assignment in R uses the <- operator.  I’m using the R built-in c() function (concatenate) to create vector variables that contain multiple values (you can think of them as arrays for now).  The SLOB.sessions variable contains three integer values: 8, 16, and 32; the other three variables each contain two string values: ‘XFS’ and ‘EXT3’.  For this demo I am only including two variables, but the script works fine with many more than that; I have been using about seven.  I am comparing filesystems, but you might be comparing storage vendor 1 vs storage vendor 2, fibre channel vs dNFS, or Oracle 11.2.0.1 vs Oracle 11.2.0.3.  As long as the variables are identifiable from your AWR filenames, they will work.

You can view the contents of any R variable just by typing its name.

> SLOB.sessions
[1]  8 16 32
> SLOB.read.vars
[1] "XFS"  "EXT3"

With these variables set up you can now use the main SLOB.R driver functions to summarize the contents of your SLOB AWR reports.  First I’ll call the SLOBreads() function to summarize physical read performance.  This function summarizes the PREADS column from awr_info.sh output, by variable and session count.  To produce a better average it discards the lowest value and highest value from each combination of variable and session count.  Other SLOB.R driver functions are SLOBredo() and SLOBreadwrite().

> SLOBreads(SLOB, SLOB.read.vars, SLOB.sessions)
            8       16       32  Overall
XFS  27223.89 46248.05 61667.21 44886.22
EXT3 30076.77 49302.59 59113.00 46094.39

So this indicates that for 8 reader sessions, the XFS filesystem gave me an average of 27,223.89 physical reads per second, while EXT3 gave me 30,076.77 physical reads per second.  The columns for 16 and 32 sessions have the same meaning as the 8 session column.  The ‘Overall’ column is an average of ALL the data points, regardless of session count.
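
To make the discard-high-and-low behavior concrete, here is the same style of trimmed average computed by hand on five made-up physical read rates; this mirrors the myavg() function in the script at the bottom of this post:

> x <- c(27000, 26000, 27500, 31000, 27400)
> mean(x[!x %in% c(min(x), max(x))])
[1] 27300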

You don’t have to use variables when calling the SLOB.R driver functions; you can specify the variables or session counts directly in the call.  This is just how R works.  Here is an example showing that you receive the same output whether you call it with a variable or not:

> SLOBredo(SLOB, SLOB.redo.vars, SLOB.sessions)
             8        16        32   Overall 
XFS  326480426 336665385 321823426 326425272 
EXT3 304188997 325307026 326618609 317991362 
> SLOBredo(SLOB, c('XFS', 'EXT3'), c(8, 16, 32)) 
             8        16        32   Overall 
XFS  326480426 336665385 321823426 326425272 
EXT3 304188997 325307026 326618609 317991362

The numbers above would indicate that when storing my redo logs on XFS, SLOB could push 326,480,426 bytes of redo per second with 8 sessions.  On EXT3 with 8 sessions I saw 304,188,997 bytes of redo per second.  The 16, 32 and Overall columns have meanings similar to what I showed before.

The SLOBreadwrite() function reports the sum of physical reads and physical writes, with the columns all meaning the same as they do for the other functions.

> SLOBreadwrite(SLOB, SLOB.readwrite.vars, SLOB.sessions)
            8       16       32  Overall 
XFS  18520.44 20535.41 20728.07 19823.37 
EXT3 19568.04 21730.94 22641.14 21203.78

How To Create Output For SLOB.R

SLOB.R is smart enough to figure out which of your runs are testing reads, which are testing redo, and which are testing readwrite performance.  But for this to work you have to follow the naming convention defined in the SLOB/misc/README file when renaming your awr.txt files for processing by awr_info.sh:  [whatever].[number of SLOB writers].[number of SLOB readers] — SLOB.R expects your variables to be uniquely identifiable strings in the ‘whatever‘ part.
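
For example (hypothetical filenames), an awr.txt from a run with zero writers and 16 readers against an XFS-hosted tablespace could be saved like this, with ‘XFS’ serving as the variable string:

mv awr.txt SLOB-AWR-XFS-run1.0.16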

I recommend using scripts to run SLOB repeatedly and save the awr.txt output in between.  I provided some scripts in a prior post, but you can use your own scripts as long as your filenames match the required format.

Once you have your AWR output collected, run SLOB/misc/awr_info.sh on all the files, and save its output.  This is the file you will load into R.
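
Assuming awr_info.sh accepts the collected AWR files as arguments (check SLOB/misc/README for the exact usage in your version), that step looks something like this, with the AWR files gathered in a SLOBAWR directory:

./misc/awr_info.sh SLOBAWR/* > ~/awrinfo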

SLOB.R Script (v0.6)

Save this as SLOB.R.  You may find it easier to use the pastebin version.

# SLOB.R version 0.6
#   BJP 2013 - Twitter @BrianPardy - https://pardydba.wordpress.com/
# See http://wp.me/p2Jp2m-4i for more information
#
#
# Assumes three possible SLOB test models: READS, REDO, WRITES
#   READS are readers-only
#   REDO and WRITES are writers only, differing in size of buffer_cache (small = REDO, large = WRITES)
#
#   Reports PREADS in SLOB.R READS model
#   Reports REDO in SLOB.R REDO model
#   Reports PWRITES in SLOB.R WRITES model
#   Use SLOB.R meta-model READWRITE to report PREADS+PWRITES in WRITES model
#
# Setup:
#   Run SLOB as usual, with at least three runs for each variable tested for each model
#   Save awr.txt between runs in filename matching [something]-variable.writers.readers
#     Example:  AWR-FCSAN-run1.8.0, AWR-FCSAN-run2.8.0 (...) AWR-FCSAN-run10.8.0
#               AWR-ISCSINAS-run1.8.0, AWR-ISCSINAS-run2.8.0 (...) AWR-ISCSINAS-run10.8.0
#       (In this case, the variables would be "FCSAN" and "ISCSINAS", comparing fibrechannel SAN to NAS)
#   Run awr_info.sh from SLOB distribution against all AWR files at the end and save the output
#   Load awr_info.sh output into R with:  SLOB <- loadSLOB("filename")
#
# Hints:
#   Best results achieved with more SLOB runs - myavg() drops high and low values per set, averages remaining
# 
# Detailed example usage:
#   Testing SLOB read, redo and readwrite models 10 times each with 8, 16, and 32 sessions on EXT3 vs EXT4
#   Used a tablespace on EXT3 for EXT3 testing and a tablespace on EXT4 for EXT4 testing
#   Used redo logs on EXT3 for EXT3 REDO testing and redo logs on EXT4 for EXT4 REDO testing
#   Ran SLOB/misc/awr_info.sh on all awr.txt reports generated from these 60 SLOB runs
#   Saved awr_info.sh output as filename "awrinfo"
#
#  (Start R)
#  > source("SLOB.R")
#  > SLOB <- loadSLOB("awrinfo")
#  > SLOB.sesscounts <- c(8, 16, 32)           # Specify the number of sessions used in tests
#  > SLOB.read.vars <- c('EXT3', 'EXT4')       # Specify the variables for READ testing: EXT3 vs EXT4
#  > SLOB.redo.vars <- SLOB.read.vars          # Same variables for REDO testing as for READ testing
#  > SLOB.readwrite.vars <- SLOB.read.vars     # Same variables for READWRITE testing as for READ testing
#  > SLOB.reads <- SLOBreads(SLOB, SLOB.read.vars, SLOB.sesscounts)
#  > SLOB.redo <- SLOBredo(SLOB, SLOB.redo.vars, SLOB.sesscounts)
#  > SLOB.readwrite <- SLOBreadwrite(SLOB, SLOB.readwrite.vars, SLOB.sesscounts)
### Previous three lines populate SLOB.reads, SLOB.redo and SLOB.readwrite variables with AWR results
### You can then interrogate those variables by typing their names
# > SLOB.reads
#             8       16       32  Overall
# EXT3 27223.89 46248.05 61667.21 44886.22
# EXT4 30076.77 49302.59 59113.00 46094.39
#
#
# Usage variants for more detailed investigation.  Consider this advanced usage.
#   Most people should just use SLOBreads(), SLOBredo() and SLOBreadwrite()
#
#
#   Get average REDO bytes for variable 'foo' across all sessions: 
#     avg(SLOB, 'REDO', 'foo')
#     redoavg(SLOB, 'foo')
#
#   Get average REDO bytes for variable 'foo' with 8 sessions:
#     avg(SLOB, 'REDO', 'foo', 8)
#     redoavg(SLOB, 'foo', 8)
#
#   Get average PREADS (physical read) for variable 'foo' across all sessions:
#     avg(SLOB, 'READS', 'foo')
#     readavg(SLOB, 'foo')
#
#   Get average PWRITES (physical writes) for variable 'foo' with 16 sessions:
#     avg(SLOB, 'WRITES', 'foo', 16)
#     writeavg(SLOB, 'foo', 16)
#
#   Get sum of PREADS and PWRITES for variable 'foo' with 32 sessions:
#     avg(SLOB, 'READWRITE', 'foo', 32)
#     readwriteavg(SLOB, 'foo', 32)
#
#   Get average REDO bytes for multiple variables ('foo' and 'bar') across all sessions:
#     sapply(c('foo', 'bar'), redoavg, dat=SLOB)
#       or for 16 sessions:
#     sapply(c('foo', 'bar'), redoavg, dat=SLOB, sessioncount=16)
#     alternate:   sapply(c('foo', 'bar'), avg, dat=SLOB, sessioncount=16, model='REDO')
#     (Note: This returns separate results for each variable, it does not combine and average them)
#
#   Get sum of PREADS and PWRITES for multiple variables ('XFS' and 'EXT3') across 16 sessions:
#     sapply(c('XFS', 'EXT3'), avg, dat=SLOB, sessioncount=16, model='READWRITE')
#
#   View the top waits in XFS READ model with 16 sessions:
#     waits(SLOB, str='XFS', model='READS', sessioncount=16)
#
#   View all data for a particular model with a specific session count:
#     SLOB[intersect(REDO(SLOB), sessions(16, SLOB)),]
#
#

# Return the row indices in dat that match a model (READS/REDO/WRITES/READWRITE) and a
# variable string, optionally restricted to a single session count
getruns <- function(dat, model, str, sessioncount) {
  # rows whose FILE column matches str, intersected with the rows the model function selects
  tmp <- intersect(grep(str, dat$FILE), eval(parse(text=paste(model, '(dat)', sep=''))))
  if(missing(sessioncount)) { return(tmp) }
  else { intersect(tmp, sessions(sessioncount, dat)) }
}

# Load awr_info.sh output (pipe-delimited, with a header row) into a data frame
loadSLOB <- function(filename) {
  read.table(file=filename, sep="|", header=TRUE)
}

# heuristically identify REDO model runs - if this sucks for you, comment it out
# and uncomment the alternate REDO function below.  It expects your filenames to include
# the string '-REDO-' when testing REDO performance.
REDO <- function(dat) {
  # writer runs whose redo volume is more than double their datafile write volume (8k blocks)
  setdiff(which(dat$REDO/(dat$PWRITES*8192) > 2), READS(dat))
}

#REDO <- function(dat) {
#  grep('-REDO-', dat$FILE)
#}

WRITES <- function(dat) {
  setdiff(which(dat$WRITERS > 0), REDO(dat))
}

READS <- function(dat) {
  which(dat$READERS > 0)
}

READWRITE <- function(dat) {
  WRITES(dat)
}

sessions <- function(n, dat) {
  # rows that ran with n writers or n readers
  union(which(dat$WRITERS == n), which(dat$READERS == n))
}

myavg <- function(z) {
  # drop every occurrence of the minimum and maximum values, then average the rest
  mean(z[!z %in% c(min(z), max(z))])
}

getavg <- function(mode, dat) {
  # trimmed average of the column named by mode (e.g. 'PREADS', 'PWRITES', 'REDO')
  myavg(eval(parse(text=paste('dat$', mode, sep=''))))
}

redoavg <- function(dat, ...) {
  getavg('REDO', dat[getruns(dat, 'REDO', ...),])
}

readavg <- function(dat, ...) {
  getavg('PREADS', dat[getruns(dat, 'READS', ...),])
}

writeavg <- function(dat, ...) {
  getavg('PWRITES', dat[getruns(dat, 'WRITES', ...),])
}

readwriteavg <- function(dat, ...) {
  getavg('PWRITES', dat[getruns(dat, 'WRITES', ...),]) + getavg('PREADS', dat[getruns(dat, 'WRITES', ...),])
}

avg <- function(dat, model, ...) {
  if(model=='REDO') {
    getavg('REDO', dat[getruns(dat, 'REDO', ...),])
  } else if(model=='READS') {
    getavg('PREADS', dat[getruns(dat, 'READS', ...),])
  } else if (model=='WRITES') {
    getavg('PWRITES', dat[getruns(dat, 'WRITES', ...),])
  } else if (model=='READWRITE') {
    getavg('PWRITES', dat[getruns(dat, 'WRITES', ...),]) + getavg('PREADS', dat[getruns(dat, 'WRITES', ...),])
  }
}

waits <- function(dat, ...) {
  as.character(dat[getruns(dat, ...), 'TOP.WAIT'])
}

testdata <- function(dat, model, str, ...) {
  if(model=='REDO') {
    sapply(str, avg, dat=dat, model='REDO', ...)
  } else if(model=='READS') {
    sapply(str, avg, dat=dat, model='READS', ...)
  } else if(model=='WRITES') {
    sapply(str, avg, dat=dat, model='WRITES', ...)
  } else if (model=='READWRITE') {
    sapply(str, avg, dat=dat, model='READWRITE', ...)
  }
}

readdata <- function(dat, str, ...) {
  sapply(str, avg, dat=dat, model='READS', ...)
}

redodata <- function(dat, str, ...) {
  sapply(str, avg, dat=dat, model='REDO', ...)
}

readwritedata <- function(dat, str, ...) {
  sapply(str, avg, dat=dat, model='READWRITE', ...)
}

SLOBreads <- function(dat, strs, sessioncounts) {
  z <- data.frame()
  for(i in sessioncounts) { 
    z <- rbind(z, readdata(dat, strs, i))
  }
  z <- rbind(z, readdata(dat, strs))
  z <- t(z)
  colnames(z) <- c(sessioncounts, 'Overall')
  rownames(z) <- strs
  return(z)
}

SLOBredo <- function(dat, strs, sessioncounts) {
  z <- data.frame()
  for(i in sessioncounts) { 
    z <- rbind(z, redodata(dat, strs, i))
  }
  z <- rbind(z, redodata(dat, strs))
  z <- t(z)
  colnames(z) <- c(sessioncounts, 'Overall')
  rownames(z) <- strs
  return(z)
}

SLOBreadwrite <- function(dat, strs, sessioncounts) {
  z <- data.frame()
  for(i in sessioncounts) { 
    z <- rbind(z, readwritedata(dat, strs, i))
  }
  z <- rbind(z, readwritedata(dat, strs))
  z <- t(z)
  colnames(z) <- c(sessioncounts, 'Overall')
  rownames(z) <- strs
  return(z)
}

That’s it for now.  I welcome comments and questions.

Collection of links, tips and tools for running SLOB

UPDATE 20140227: I am leaving this post here for historical reference, but the version of SLOB I used while writing it is now fully deprecated.  Please go directly to the distribution page for SLOB2 and use the information there to retrieve the latest version of SLOB and learn how to use it, as the usage and functionality have changed.  For additional tips, consider Yury’s “SLOB2 Kick start”.  Please do not attempt to use the scripts, instructions or techniques I have described here unless you are still using the old version of SLOB.  If you are still using the old version, you should not be.

This post contains a collection of various tips and tools I have found useful for running SLOB. Where possible I’ve provided attribution for where I first learned of each tip, but some items are simply common sense or have already been posted by multiple people, so I can’t be sure that I have linked every possible source. If you have a write up out there that I’ve missed that you would like linked, please let me know in the comments. I have only focused on running SLOB to test physical read performance, redo generation performance, and mixed workload read/write performance so these tips will only cover those use cases, not other potential uses like testing logical I/O performance. I would be happy to include LIO testing tips if anyone shares some in a comment.

Getting Started – Things You Need

  1. First you’ll need to have an installation of the Oracle database licensed to run AWR reports.
  2. Next go to the OakTable site and download the SLOB distribution. SLOB includes a simple database creation script in the misc/create_database_kit/ directory, along with a README describing how to use it, or you can use your existing database. I extract the kit into my oracle user’s home directory, but you can put it anywhere.
  3. You should also get the simple SLOB init.ora file for read IOPS testing and start your database with those parameters to get a quick start on testing physical read performance. If you’re using Oracle 11.2.0.3 you really should use the underscore parameters in this init.ora for accurate results.
  4. Read the SLOB readme if you haven’t already.

General SLOB Tips For Best Results

  1. Disable automatic generation of AWR snapshots. SLOB relies on the difference between an AWR snapshot taken at the start of the test run and another taken at the end, so an automated snapshot firing mid-run will throw off your results. (A sketch of the commands for this tip and the next appears after this list.)
  2. Disable backups, autotasks, resource manager, and so on. You want the database to be as idle as possible, other than SLOB. See Yury’s SLOB index page, which includes these suggestions.
  3. Save the awr.txt file that SLOB generates after each run so you can compare performance with previous runs. Use the awr_info.sh script included with SLOB to summarize your collected AWR reports for easy reading.
  4. Review your results. Check for unexpected differences from run to run; don’t simply run it once and expect meaningful results. You want to see consistent performance to have confidence that your tests accurately reflect your setup.
  5. A throwaway SLOB run after creating the users with setup.sh (or bouncing the DB to change init parameters) will help with repeatability.
  6. Start small, with 4 or 8 sessions, and then try again with a few more sessions to find the sweet spot with the best performance for your hardware. Once you hit a session count where performance starts to degrade, don’t bother running more sessions than that. On high-powered hardware you might be able to run 128 sessions, but a smaller server might work better with 32.
  7. If you aren’t trying to test variances in your storage, keep your storage consistent from test to test. To put this another way, if (like me) you are using SLOB to test the performance of various filesystems, don’t create all your filesystems on separate LUNs. You probably don’t know how each LUN might be striped across the storage backend. Use the same LUN for every test, running mkfs as necessary to change filesystems or fdisk to reformat for raw usage. (Kevin Closson suggested this in email, I can’t find a public posting to link to.)
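
Here is a sketch of the commands behind tips 1 and 2. Setting the AWR snapshot interval to zero disables automatic snapshots, DBMS_AUTO_TASK_ADMIN.DISABLE turns off all automated maintenance tasks (11g), and clearing resource_manager_plan disables resource manager. Verify the effects on a disposable instance first; maintenance windows can set a resource manager plan again behind your back.

sqlplus -s / as sysdba<<EOF
exec DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS(interval => 0)
exec DBMS_AUTO_TASK_ADMIN.DISABLE
alter system set resource_manager_plan='';
exit;
EOF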

Physical Read Testing Tips

  1. Disable the Oracle I/O optimization that turns your desired db file sequential reads into db file parallel reads. Kevin Closson has also recommended use of these parameters in the simple init.ora file for read testing. I consider this a requirement, not a recommendation. I already mentioned this above, but it’s worth repeating.

Redo Generation Testing Tips

  1. When testing M writers with N redo logs configured, preface your test with (N*2)+1 log switches followed by a throwaway SLOB run using M readers. See this tweet from Kevin Closson. This will produce more consistent results from run to run.
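
A sketch of that warm-up for N=3 configured logs ((3*2)+1 = 7 switches), in the same heredoc style as the harness scripts elsewhere on this blog, followed by the throwaway run:

sqlplus -s / as sysdba<<EOF
alter system switch logfile;
alter system switch logfile;
alter system switch logfile;
alter system switch logfile;
alter system switch logfile;
alter system switch logfile;
alter system switch logfile;
exit;
EOF
./runit.sh 0 8    # throwaway run with M=8 readers; discard the resulting awr.txt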

Read/Write Testing Tips

  1. If your buffer cache is small enough (and it should be), running SLOB with zero readers and some writers will produce enough random reads to represent a mixed workload. See this comment from Kevin Closson. Including dedicated read sessions is not necessary.

SLOB Tools

  1. The SLOB distribution contains a script called awr_info.sh in the misc/ directory that will summarize the most important parts of your AWR report for easy reading.
  2. My SLOB.R script to interrogate SLOB awr_info.sh output in R.
  3. Yury Velikanov’s SLOB On Steroids, which I haven’t yet tried but is worth a look.

Advanced Tips

  1. SLOB is simple, yet powerful, and people are always finding new uses for it. So read what others write about SLOB, like Yury’s SLOB index linked above, my other SLOB posts, and posts about SLOB from Karl Arao, flashdba, Martin Bach and of course Kevin Closson himself.
  2. Make mistakes and learn from them.

Further Reading

  1. Added 20130220: Check out the new SLOB mind map!  Thanks to Martin Berger for the great idea.
  2. Added 20130226: Lies, Damned Lies, and I/O Statistics by Matt Hayward.  While not directly related to SLOB, this is worthwhile reading for anyone involved in any sort of I/O benchmarking.
  3. Added 20130320: Using SLOB for PIO Testing by FlashDBA, including a configuration-file driven SLOB harness that you will probably like more than the simple ones I put out there.

Benchmarking filesystem performance for Oracle using SLOB (Part 2) – Results!

UPDATE 20140227: I am leaving this post here for historical reference, but the version of SLOB I used while writing it is now fully deprecated.  Please go directly to the distribution page for SLOB2 and use the information there to retrieve the latest version of SLOB and learn how to use it, as the usage and functionality have changed.  For additional tips, consider Yury’s “SLOB2 Kick start”.  Please do not attempt to use the scripts, instructions or techniques I have described here unless you are still using the old version of SLOB.  If you are still using the old version, you should not be.

Update 20130212: Please consider this a work in progress. I am re-running most of these tests now using the same LUN for every filesystem to exclude storage-side variance. I am also using some additional init parameters to disable Oracle optimizations to focus the read tests more on db file sequential reads and avoid db file parallel reads. My posts so far document the setup I’ve used and I’m already noting flaws in it. Use this and others’ SLOB posts as a guide to run tests on YOUR system with YOUR setup, for best results.

Update 20130228: I have re-run my tests using the same storage LUN for each test.  I have updated the charts, text, and init parameters to reflect my latest results.  I have also used a 4k BLOCKSIZE for all redo log testing.  Further, instead of running each test three times and taking the average, I have run each test ten times, dropped the highest and lowest values, and averaged the remaining eight, in hopes of gaining a more accurate result.  Where I have changed my setup compared to the original post, I have noted the old values inline as “previously …”.

In Part 1 and Part 1.5 I covered the setup and scripts I’ve used to benchmark performance of various filesystems (and raw devices) using SLOB for Oracle datafiles and redo logs. In this post I will provide more details, including the results and the database init parameters in use for my tests. This is an SAP shop, so I have several event and _fix_control parameters in place that may have performance impacts. SAP says “use them”, so I wanted my tests to reflect the performance I could hope to see through the SAP applications.

I do not intend these tests to demonstrate the maximum theoretical I/O that our hardware can push using each filesystem. I intend them to demonstrate what performance I can expect if I were to change the databases to use each listed filesystem while making minimal (preferably zero) other changes to our environment. I know, for example, that adding more network paths or faster ethernet or splitting the bonded paths will improve Oracle direct NFS performance, but I am testing with what we have available at the moment, not what we COULD have.

Now that 11g is certified on Oracle Linux 6, we are building out an OL6.3 box and I will be running identical tests on that server in a couple weeks. I’m expecting to see some significant improvement in BTRFS performance there.

With that said, some information on the test environment:

  • Server OS: SLES 11SP2 (kernel 3.0.51-0.7.9-default)
  • Oracle version: 11.2.0.3.2 (CPUApr2012 + SAP Bundle Patch May2012)
  • 2 CPU sockets, 8 cores per socket, with hyper-threading (Oracle sees CPU_COUNT = 32, but I have set CPU_COUNT to 2 to allow me to create a very small buffer cache; see the sketch after this list)
  • All filesystems and raw devices created within a single volume in a single NetApp storage aggregate
  • I use only single instance databases so no effort was made to test cluster filesystems or RAC performance
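
For reference, here is roughly how the CPU_COUNT note above was applied; this is a sketch, and with scope=spfile both parameters take effect only after the bounce:

sqlplus -s / as sysdba<<EOF
alter system set cpu_count=2 scope=spfile;
alter system set db_cache_size=32M scope=spfile;
shutdown immediate
startup
exit;
EOF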

Some notes about each test:

  1. All tests run in NOARCHIVELOG mode — obviously you should not do this in production
  2. Each filesystem was tested by creating SLOB users in a single tablespace containing a single 4GB (previously 16GB) datafile on the indicated filesystem
  3. All redo logs created with 4k blocksize
  4. All datafiles created with default parameters, I did not specify anything other than size and filename
  5. Read-only tests run with a 32M (previously 64M) db_cache_size to force PIO
  6. Redo generation tests run with a large log_buffer and two 32GB redo logs on the listed filesystem, with a 32G db_cache_size. I also set log_checkpoint_timeout to 99999999 to avoid time-based checkpoints.
  7. All tests run with filesystemio_options=setall. (Previously, XFS redo testing had to be performed with filesystemio_options=none, as otherwise I could not create redo logs on XFS without using an underscore parameter and specifying the sector size when creating the logs; see http://flashdba.com/4k-sector-size/ for more information on this issue.)
  8. Read-write tests run with three 64MB redo logs stored on raw devices and 128M db_cache_size
  9. Every test was run ten (previously three) times with 8 sessions, ten times with 16 sessions, and ten times with 32 sessions, and an overall average was taken after dropping high and low values. Prior to each batch of tests the database was bounced. Per Kevin Closson’s recommendation, I executed a throwaway SLOB run after each bounce, discarding the results.
  10. I dropped the Oracle direct NFS tests, as our ethernet setup is currently only 1Gb and does not perform as well as FC
  11. Automated tasks like the gather stats job, segment and tuning advisors were disabled
  12. The BTRFS volume was mounted with nodatacow to avoid copy-on-write

Results

Oracle’s standard license prohibits customer publication of benchmark results without express written consent. To avoid any confusion that these results represent a benchmark, I am NOT publishing any of the absolute numbers. Instead I have scaled all results such that the performance of a RAW device is equal to 1, and all other results are reported relative to RAW. I make no claims that these results represent the best possible performance available from my software/hardware configuration. What they should accurately reflect, though, is what someone with a similar setup could expect to see if they were to create redo logs or datafiles on the indicated filesystem without performing any tuning to optimize the database, OS or filesystem.

Comments

I am disappointed with the performance of BTRFS. Our OS vendor is deprecating ext4 in favor of BTRFS, so if we’re going to abandon ext3 due to fsck issues, BTRFS is the path of least resistance. Ext4 appears to provide performance similar to ext3 and should reduce fsck runtimes, so if we stay on cooked devices that looks like the way to go. Raw performance won overall (though storage write caching appears to have made ext3 look better than raw), but raw has management issues such as the 255-device limit and the inability to extend a datafile on a raw device. ASM should provide the same overall performance as raw without those limitations, but adds management overhead with the need to install Grid Infrastructure, and that just gives one more target to keep up to date on patches. XFS had poor performance for redo logs in my environment but good performance elsewhere, and it should entirely avoid fsck-on-boot problems.

Direct NFS seemed to deliver the most consistent performance from test to test, with a smaller standard deviation than any of the others. This might be relevant for anyone who requires consistent performance more than maximal performance.

Read Performance

[Chart: scaled read performance, relative to RAW = 1]

Redo Generation Performance

[Chart: scaled redo generation performance, relative to RAW = 1]

Read/Write Performance

[Chart: scaled read/write performance, relative to RAW = 1]

Init Parameters

Below are the init parameters used for each test. As noted above, all tests in the updated runs used filesystemio_options='SETALL'.

Read Performance

SLOB.__db_cache_size=34359738368
SLOB.__oracle_base='/oracle/SLOB'#ORACLE_BASE set from environment
SLOB.__shared_pool_size=704643072
*._db_block_prefetch_limit=0
*._db_block_prefetch_quota=0
*._db_file_noncontig_mblock_read_count=0
*._disk_sector_size_override=TRUE
*._fix_control='5099019:ON','5705630:ON','6055658:OFF','6399597:ON','6430500:ON','6440977:ON','6626018:ON','6972291:ON','8937971:ON','9196440:ON','9495669:ON','13077335:ON'#SAP_112031_201202 RECOMMENDED SETTINGS
*._mutex_wait_scheme=1#RECOMMENDED BY ORACLE/SAP FOR 11.2.0 - SAP note 1431798
*._mutex_wait_time=10#RECOMMENDED BY ORACLE/SAP FOR 11.2.0 - SAP note 1431798
*._optim_peek_user_binds=FALSE#RECOMMENDED BY ORACLE/SAP FOR 11.2.0 - SAP note 1431798
*._optimizer_adaptive_cursor_sharing=FALSE#RECOMMENDED BY ORACLE/SAP FOR 11.2.0 - SAP note 1431798
*._optimizer_extended_cursor_sharing_rel='NONE'#RECOMMENDED BY ORACLE/SAP FOR 11.2.0 - SAP note 1431798
*._optimizer_use_feedback=FALSE#RECOMMENDED BY ORACLE/SAP FOR 11.2.0 - SAP note 1431798
*.audit_file_dest='/oracle/SLOB/saptrace/audit'
*.compatible='11.2.0.2'
*.control_file_record_keep_time=30
*.control_files='/oracle/SLOB/cntrl/cntrlSLOB.dbf','/oracle/SLOB/cntrlm/cntrlSLOB.dbf'
*.cpu_count=1
*.db_block_size=8192
*.db_cache_size=33554432
*.db_files=500
*.db_name='SLOB'
*.db_recovery_file_dest_size=8589934592
*.db_recovery_file_dest='/oracle/SLOB/fra'
*.db_writer_processes=2
*.diagnostic_dest='/oracle/SLOB/saptrace'
*.event='10027','10028','10142','10183','10191','10995 level 2','38068 level 100','38085','38087','44951 level 1024'#SAP_112030_201112 RECOMMENDED SETTINGS
*.filesystemio_options='SETALL'
*.local_listener='LISTENER_SLOB'
*.log_archive_dest_1='LOCATION=/oracle/SLOB/arch/SLOBarch'
*.log_archive_format='%t_%s_%r.dbf'
*.log_buffer=14221312
*.log_checkpoints_to_alert=TRUE
*.max_dump_file_size='20000'
*.open_cursors=1600
*.optimizer_dynamic_sampling=6
*.parallel_execution_message_size=16384
*.parallel_max_servers=0
*.parallel_threads_per_cpu=1
*.pga_aggregate_target=10737418240
*.processes=800
*.query_rewrite_enabled='FALSE'
*.recyclebin='off'
*.remote_login_passwordfile='EXCLUSIVE'
*.remote_os_authent=TRUE#SAP note 1431798
*.replication_dependency_tracking=FALSE
*.resource_manager_plan=''
*.sessions=800
*.sga_max_size=10737418240
*.shared_pool_size=5242880000
*.star_transformation_enabled='TRUE'
*.undo_retention=432000
*.undo_tablespace='PSAPUNDO'

Redo Generation Performance

SLOB.__db_cache_size=34359738368
SLOB.__oracle_base='/oracle/SLOB'#ORACLE_BASE set from environment
SLOB.__shared_pool_size=704643072
*._db_block_prefetch_limit=0
*._db_block_prefetch_quota=0
*._db_file_noncontig_mblock_read_count=0
*._disk_sector_size_override=TRUE
*._fix_control='5099019:ON','5705630:ON','6055658:OFF','6399597:ON','6430500:ON','6440977:ON','6626018:ON','6972291:ON','8937971:ON','9196440:ON','9495669:ON','13077335:ON'#SAP_112031_201202 RECOMMENDED SETTINGS
*._mutex_wait_scheme=1#RECOMMENDED BY ORACLE/SAP FOR 11.2.0 - SAP note 1431798
*._mutex_wait_time=10#RECOMMENDED BY ORACLE/SAP FOR 11.2.0 - SAP note 1431798
*._optim_peek_user_binds=FALSE#RECOMMENDED BY ORACLE/SAP FOR 11.2.0 - SAP note 1431798
*._optimizer_adaptive_cursor_sharing=FALSE#RECOMMENDED BY ORACLE/SAP FOR 11.2.0 - SAP note 1431798
*._optimizer_extended_cursor_sharing_rel='NONE'#RECOMMENDED BY ORACLE/SAP FOR 11.2.0 - SAP note 1431798
*._optimizer_use_feedback=FALSE#RECOMMENDED BY ORACLE/SAP FOR 11.2.0 - SAP note 1431798
*.audit_file_dest='/oracle/SLOB/saptrace/audit'
*.compatible='11.2.0.2'
*.control_file_record_keep_time=30
*.control_files='/oracle/SLOB/cntrl/cntrlSLOB.dbf','/oracle/SLOB/cntrlm/cntrlSLOB.dbf'
*.cpu_count=1
*.db_block_size=8192
*.db_cache_size=34359738368
*.db_files=500
*.db_name='SLOB'
*.db_recovery_file_dest_size=8589934592
*.db_recovery_file_dest='/oracle/SLOB/fra'
*.db_writer_processes=2
*.diagnostic_dest='/oracle/SLOB/saptrace'
*.event='10027','10028','10142','10183','10191','10995 level 2','38068 level 100','38085','38087','44951 level 1024'#SAP_112030_201112 RECOMMENDED SETTINGS
*.filesystemio_options='SETALL'
*.local_listener='LISTENER_SLOB'
*.log_archive_dest_1='LOCATION=/oracle/SLOB/arch/SLOBarch'
*.log_archive_format='%t_%s_%r.dbf'
*.log_buffer=268427264
*.log_checkpoint_timeout=99999999
*.log_checkpoints_to_alert=TRUE
*.max_dump_file_size='20000'
*.open_cursors=1600
*.optimizer_dynamic_sampling=6
*.parallel_execution_message_size=16384
*.parallel_max_servers=0
*.parallel_threads_per_cpu=1
*.pga_aggregate_target=10737418240
*.processes=800
*.query_rewrite_enabled='FALSE'
*.recyclebin='off'
*.remote_login_passwordfile='EXCLUSIVE'
*.remote_os_authent=TRUE#SAP note 1431798
*.replication_dependency_tracking=FALSE
*.resource_manager_plan=''
*.sessions=800
*.sga_max_size=68719476736
*.shared_pool_size=5242880000
*.star_transformation_enabled='TRUE'
*.undo_retention=432000
*.undo_tablespace='PSAPUNDO'

Read/Write Performance

SLOB.__db_cache_size=34359738368
SLOB.__oracle_base='/oracle/SLOB'#ORACLE_BASE set from environment
SLOB.__shared_pool_size=704643072
*._db_block_prefetch_limit=0
*._db_block_prefetch_quota=0
*._db_file_noncontig_mblock_read_count=0
*._disk_sector_size_override=TRUE
*._fix_control='5099019:ON','5705630:ON','6055658:OFF','6399597:ON','6430500:ON','6440977:ON','6626018:ON','6972291:ON','8937971:ON','9196440:ON','9495669:ON','13077335:ON'#SAP_112031_201202 RECOMMENDED SETTINGS
*._mutex_wait_scheme=1#RECOMMENDED BY ORACLE/SAP FOR 11.2.0 - SAP note 1431798
*._mutex_wait_time=10#RECOMMENDED BY ORACLE/SAP FOR 11.2.0 - SAP note 1431798
*._optim_peek_user_binds=FALSE#RECOMMENDED BY ORACLE/SAP FOR 11.2.0 - SAP note 1431798
*._optimizer_adaptive_cursor_sharing=FALSE#RECOMMENDED BY ORACLE/SAP FOR 11.2.0 - SAP note 1431798
*._optimizer_extended_cursor_sharing_rel='NONE'#RECOMMENDED BY ORACLE/SAP FOR 11.2.0 - SAP note 1431798
*._optimizer_use_feedback=FALSE#RECOMMENDED BY ORACLE/SAP FOR 11.2.0 - SAP note 1431798
*.audit_file_dest='/oracle/SLOB/saptrace/audit'
*.compatible='11.2.0.2'
*.control_file_record_keep_time=30
*.control_files='/oracle/SLOB/cntrl/cntrlSLOB.dbf','/oracle/SLOB/cntrlm/cntrlSLOB.dbf'
*.cpu_count=1
*.db_block_size=8192
*.db_cache_size=134217728
*.db_files=500
*.db_name='SLOB'
*.db_recovery_file_dest_size=8589934592
*.db_recovery_file_dest='/oracle/SLOB/fra'
*.db_writer_processes=2
*.diagnostic_dest='/oracle/SLOB/saptrace'
*.event='10027','10028','10142','10183','10191','10995 level 2','38068 level 100','38085','38087','44951 level 1024'#SAP_112030_201112 RECOMMENDED SETTINGS
*.filesystemio_options='SETALL'
*.local_listener='LISTENER_SLOB'
*.log_archive_dest_1='LOCATION=/oracle/SLOB/arch/SLOBarch'
*.log_archive_format='%t_%s_%r.dbf'
*.log_buffer=268427264
*.log_checkpoint_timeout=99999999
*.log_checkpoints_to_alert=TRUE
*.max_dump_file_size='20000'
*.open_cursors=1600
*.optimizer_dynamic_sampling=6
*.parallel_execution_message_size=16384
*.parallel_max_servers=0
*.parallel_threads_per_cpu=1
*.pga_aggregate_target=10737418240
*.processes=800
*.query_rewrite_enabled='FALSE'
*.recyclebin='off'
*.remote_login_passwordfile='EXCLUSIVE'
*.remote_os_authent=TRUE#SAP note 1431798
*.replication_dependency_tracking=FALSE
*.resource_manager_plan=''
*.sessions=800
*.sga_max_size=34359738368
*.shared_pool_size=5242880000
*.star_transformation_enabled='TRUE'
*.undo_retention=432000
*.undo_tablespace='PSAPUNDO'

Supplementary Data

I have removed the content of this section on 20130228 as it related to my previous results and not the updated results now contained in this posting.

Benchmarking filesystem performance for Oracle using SLOB (Part 1.5): Improved test scripts

UPDATE 20140227: I am leaving this post here for historical reference, but the version of SLOB I used while writing it is now fully deprecated.  Please go directly to the distribution page for SLOB2 and use the information there to retrieve the latest version of SLOB and learn how to use it, as the usage and functionality have changed.  For additional tips, consider Yury’s “SLOB2 Kick start”.  Please do not attempt to use the scripts, instructions or techniques I have described here unless you are still using the old version of SLOB.  If you are still using the old version, you should not be.

 

Minor update 20130207: Adjusted read-write testing script for three redo logs, not two.

In Part 1 I described a starter methodology for using SLOB to benchmark filesystem performance for Oracle. After many SLOB runs and much valuable advice from Kevin Closson, I now have an improved method. My next post will contain the results of these tests, with all numbers scaled against RAW performance for the sake of comparison. Since my hardware and network don’t match yours, my specific numbers aren’t very relevant, but for a DBA or SA considering changing the filesystem they use for Oracle without a priori knowledge of how best to tune it, this will hopefully present a useful baseline.

Previously I was bouncing the database, dropping and recreating the SLOB users before every test run. Now I am initializing the test batches for each filesystem/volume by bouncing the database, creating the users and executing a throwaway SLOB run to prime the database.

Test Scripts

Below are the scripts I am using to test read-only, redo generation, and read-write performance. If you use any of them, edit the scripts to reflect your base directory (the oracle user’s home), SLOB install directory and desired location for SLOB AWR output. Before running any of the scripts, prepare your environment by creating a tablespace on each filesystem/device type that you wish to test. Name each tablespace something that will make it clear to you which filesystem you are testing. I used a single 16GB datafile for each tablespace. Each script runs SLOB’s setup.sh to create the users and then SLOB’s runit.sh to execute the tests. After an initial throwaway run to warm things up, they run SLOB the requested number of times (the RUNCOUNT argument) with 8, 16, and 32 sessions each; you can adjust the session counts by changing the for loops. At the end of each script the awr.txt file generated by SLOB is renamed and moved to the $AWRDIR directory, using the filename convention expected by the awr_info.sh script (found in the latest SLOB release). All tests are performed in NOARCHIVELOG mode; don’t do that on a database that you need. The scripts assume that you are executing them from within the SLOB directory.

Read-Only Performance

Before running the read-only test, I created two 32G redo logs on raw devices. The read-only test should not hit redo at all, so where you place them doesn’t really matter. For read-only testing you should use a very small db_cache_size; I used 64M. The small cache ensures that reads are fulfilled through physical I/O rather than from cache.

#!/bin/sh
#
# Usage: ./testread.sh FS RUNCOUNT
#
# Script assumes a tiny SGA to force physical reads
# Script will create SLOB schema users in tablespace named $FS, and run each test $RUNCOUNT times

TYPE=$1
NUMRUNS=$2
BASEDIR=/oracle/SLOB
SLOBDIR=$BASEDIR/SLOB
DROPUSERS=$SLOBDIR/drop_users.sql
AWRDIR=$SLOBDIR/SLOBAWR
COUNTER=1

if [ -z "$TYPE" -o -z "$NUMRUNS" ]; then
    echo "Usage: $0 FS RUNCOUNT"
    exit 1
fi

mkdir $AWRDIR > /dev/null 2>&1    # >& is a bashism; this form works in any POSIX sh

echo "Starting SLOB read-only performance testing for $TYPE ($NUMRUNS runs)"

echo "Dropping existing users and bouncing database"

sqlplus -s / as sysdba<<EOF
@$DROPUSERS;
shutdown immediate;
startup;
exit;
EOF

echo "Setting up SLOB user schemata"
$SLOBDIR/setup.sh $TYPE 32

echo "Performing N+1 logswitch"
sqlplus -s / as sysdba<<EOF2
alter system switch logfile;
alter system switch logfile;
alter system switch logfile;
exit;
EOF2

echo "Throwaway SLOB run to prime database"
$SLOBDIR/runit.sh 0 8

for i in 8 16 32; do (    # subshell: COUNTER resets to 1 for each session count
    while [ $COUNTER -le $NUMRUNS ]; do
        echo "Running SLOB for $i readers (run #$COUNTER)"

        $SLOBDIR/runit.sh 0 $i
        echo "Renaming AWR report"
        mv awr.txt $AWRDIR/SLOB-AWR-$TYPE-$COUNTER.0.$i
        COUNTER=$((COUNTER+1))
    done )
done
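
A hypothetical invocation, testing a tablespace named XFS ten times per session count (the redo and read/write scripts below take the same two arguments):

./testread.sh XFS 10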

Redo Performance

Before running the redo generation test, I created two 32G redo logs on the filesystem whose performance I wish to test. For redo testing you should use a large db_cache_size; I used 32G. I also increased log_buffer to reduce log buffer waits, but I have not been able to eliminate them entirely. SAP requires specific log_buffer settings, so I don’t want to deviate from them too much, as I want this test to have at least some similarity to our production performance. Prior to each test run I performed N+1 log switches (where N is the number of configured redo logs) to clear out any pending redo.

One possible improvement others may wish to consider when testing redo generation performance would be to configure your UNDO tablespace to use the same filesystem type as your redo logs. I did not do so, so my max performance is somewhat constrained by writes to UNDO. Each test should be similarly constrained so I am not concerned about that at the moment.

#!/bin/sh
#
# Usage: ./testredo.sh FS RUNCOUNT
#
# Script assumes you have pre-created exactly two redo logs on the filesystem/device to be tested
# Script will create SLOB schema users in tablespace named $FS, and run each test $RUNCOUNT times

TYPE=$1
NUMRUNS=$2
BASEDIR=/oracle/SLOB
SLOBDIR=$BASEDIR/SLOB
DROPUSERS=$SLOBDIR/drop_users.sql
AWRDIR=$SLOBDIR/SLOBAWR
COUNTER=1

if [ -z "$TYPE" -o -z "$NUMRUNS" ]; then
    echo "Usage: $0 FS RUNCOUNT"
    exit 1
fi

mkdir $AWRDIR > /dev/null 2>&1    # >& is a bashism; this form works in any POSIX sh

echo "Starting SLOB redo generation performance testing for $TYPE ($NUMRUNS runs)"
echo "Assuming two redo logs exist"

echo "Dropping existing users and bouncing database"

sqlplus -s / as sysdba<<EOF
@$DROPUSERS;
shutdown immediate;
startup;
exit;
EOF

echo "Setting up SLOB user schemata"
$SLOBDIR/setup.sh $TYPE 32

echo "Throwaway SLOB run to prime database"
$SLOBDIR/runit.sh 8 0

for i in 8 16 32; do (    # subshell: COUNTER resets to 1 for each session count
    while [ $COUNTER -le $NUMRUNS ]; do
        echo "Running SLOB for $i writers (run #$COUNTER)"
        echo "Performing N+1 logswitch"
        sqlplus -s / as sysdba<<EOF2
        alter system switch logfile;
        alter system switch logfile;
        alter system switch logfile;
        exit;
EOF2

        $SLOBDIR/runit.sh $i 0 
        echo "Renaming AWR report"
        mv awr.txt $AWRDIR/SLOB-AWR-$TYPE-$COUNTER.$i.0
        COUNTER=$((COUNTER+1))
    done )
done

Read-Write Performance

Before running the read-write performance test, I replaced my redo logs with three 64M files on raw devices. I also decreased db_cache_size to 128M to help make sure reads are fulfilled from physical disk instead of cache. I left log_buffer as-is from the redo testing.

#!/bin/sh
#
# Usage: ./testrw.sh FS RUNCOUNT
#
# Script assumes you have pre-created exactly three small (64M) redo logs 
# Script will create SLOB schema users in tablespace named $FS, and run each test $RUNCOUNT times

TYPE=$1
NUMRUNS=$2
BASEDIR=/oracle/SLOB
SLOBDIR=$BASEDIR/SLOB
DROPUSERS=$SLOBDIR/drop_users.sql
AWRDIR=$SLOBDIR/SLOBAWR
COUNTER=1

if [ -z "$TYPE" -o -z "$NUMRUNS" ]; then
    echo "Usage: $0 FS RUNCOUNT"
    exit 1
fi

mkdir $AWRDIR > /dev/null 2>&1    # >& is a bashism; this form works in any POSIX sh

echo "Starting SLOB read/write performance testing for $TYPE ($NUMRUNS runs)"
echo "Assuming three redo logs exist"

echo "Dropping existing users and bouncing database"

sqlplus -s / as sysdba<<EOF
@$DROPUSERS;
shutdown immediate;
startup;
exit;
EOF

echo "Setting up SLOB user schemata"
$SLOBDIR/setup.sh $TYPE 32

echo "Throwaway SLOB run to prime database"
$SLOBDIR/runit.sh 4 4

for i in 8 16 32; do (    # subshell: COUNTER resets to 1 for each session count
    while [ $COUNTER -le $NUMRUNS ]; do
        echo "Running SLOB for $i read/write sessions (run #$COUNTER)"
        echo "Performing N+1 logswitch"
        sqlplus -s / as sysdba<<EOF2
        alter system switch logfile;
        alter system switch logfile;
        alter system switch logfile;
        alter system switch logfile;
        exit;
EOF2

        READERS=$((i/2))
        WRITERS=$READERS
        $SLOBDIR/runit.sh $WRITERS $READERS
        echo "Renaming AWR report"
        mv awr.txt $AWRDIR/SLOB-AWR-$TYPE-$COUNTER.$WRITERS.$READERS
        COUNTER=$((COUNTER+1))
    done )
done

Benchmarking filesystem performance for Oracle using SLOB

UPDATE 20140227: I am leaving this post here for historical reference, but the version of SLOB I used while writing it is now fully deprecated.  Please go directly to the distribution page for SLOB2 and use the information there to retrieve the latest version of SLOB and learn how to use it, as the usage and functionality have changed.  For additional tips, consider Yury’s “SLOB2 Kick start”.  Please do not attempt to use the scripts, instructions or techniques I have described here unless you are still using the old version of SLOB.  If you are still using the old version, you should not be.

 

Update: Please see Part 1.5 for updated scripts and part 2 for results.

This post will cover techniques I’ve used to run SLOB (see http://kevinclosson.wordpress.com/2012/02/06/introducing-slob-the-silly-little-oracle-benchmark/) to benchmark the performance of various filesystems and raw devices when used for Oracle datafiles and redo logs. I will not write about the actual results, since they’re only relevant to my specific hardware and software setup (spoiler alert: raw wins), but will instead discuss the methodology, database parameters, and configurations I used, in hopes that they will help others to run their own tests, or that others will notice flaws I can remedy to produce more correct results. Likewise, this post is not about specific tuning parameters for individual filesystems, but about a way to run the tests so you can compare performance from one set of tuning parameters to another.

After you have downloaded and installed SLOB, you need to get a good idea about exactly what you want to test and which metrics you will look at to determine the results. In my case we have been having some OS issues resulting in unclean shutdowns that lead to long fsck times on our Oracle data volumes, so I am investigating alternative filesystems to try to find something with performance matching or beating ext3 that won’t be subject to fsck on boot. I also chose to run tests to compare the redo logging subsystem’s performance when redo is stored on various filesystems or raw devices.

So I am running three different tests:

  1. Pure read-only I/O performance
  2. Concurrent read-write I/O performance
  3. Write-only redo I/O performance

For each test I first needed to identify an appropriate metric. For read-only performance the obvious choice is physical reads per second. For concurrent read-write performance I measured the sum of physical reads per second and physical writes per second. For redo performance I measured redo generated per second.

After selecting your metrics you next need to determine how to configure the database to make sure you are testing what you wanted to test. To that end, I configured the database as described below. If you want to use SLOB to test other aspects of performance you need to monitor the wait events noted in your AWR reports to be sure that the database isn’t doing something you don’t really want to test. For example, if you are running a ton of SLOB tests overnight and the autotask window kicks in and the SQL Tuning Advisor or Segment Advisor start running, those will skew your results so you may wish to disable those tasks.

Setting Up Test Parameters

Each test requires a different set of initialization parameters and other configuration to isolate the desired variable (filesystem or raw device, in my case). I think the parameters I used are valid for the tests I ran, but I am very interested in any comments from others. For each of the various filesystems I wanted to test, the storage admin created a LUN, mounted it to the server, and created the filesystem (or configured the raw device). I put a separate tablespace on each LUN, each containing a single 10GB datafile.

Read-Only I/O Performance

This is the easiest item to force. I want the database to fulfill read requests from disk rather than from the buffer cache, so I simply took my existing baseline configuration (as used for our SAP systems) and set db_cache_size to 64M. With such a small buffer cache only minimal data will be cached, and the majority of reads will come from disk. You can confirm this in the SLOB-generated AWR report by verifying that the number of physical reads per second is relatively close to the number of logical reads per second. For example, if you show 20,000 logical reads per second and only 100 physical reads per second, you haven’t done it right, as most gets are coming from cache. You may need to lie to Oracle about your CPU_COUNT to get your buffer cache small enough to force physical read I/O.
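
As a quick sanity check outside of AWR, you can also compare the cumulative logical and physical read statistics in v$sysstat before and after a run; the statistic names below are standard ones, and physical reads should grow nearly as fast as session logical reads when the cache is this small:

select name, value from v$sysstat
where name in ('physical reads', 'session logical reads');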

To run these tests in SLOB, I used 8, 16, and 32 concurrent user read sessions and zero writers.

Concurrent Read-Write I/O Performance

This one took a lot of effort before I came up with a setup that measured what I wanted. Initial efforts showed significant waits on log buffer space or cache buffers chains, and a lot of redo generation without corresponding physical writes, so I had to tweak things until I found a setup that (mostly) produced AWR reports with “db file sequential read”, “db file parallel read”, “write complete waits” and “DB CPU” within the top waits.

I eventually settled on a db_cache_size of 128M to force read I/O to physical while allowing writes to occur without waiting on cache buffer chains. I set log_buffer=1073741824 to reduce log buffer waits on writes, though Oracle seems to have limited the log buffer to 353,560K as reported in the AWR reports. I created three 64M redo logs on raw devices and ran the database in NOARCHIVELOG mode to force frequent checkpoints and corresponding physical writes to datafiles and exclude any I/O spent on archiving redo logs.

To run these tests in SLOB, I used 8, 16, and 32 concurrent user sessions, with half of them as readers and half of them as writers.

Write-Only Redo I/O Performance

For this test I wanted to purely measure the amount of redo generated while excluding datafile checkpoints. I set db_cache_size to 32G to allow reads to come from cache and created three 4G redo log groups on each tested filesystem, intending the full test block to run without causing a log switch that would force a checkpoint.

To run these tests in SLOB, I used 8, 16, and 32 concurrent user write sessions, with zero readers.

A Harness For SLOB

Setting up each SLOB run involves running the provided setup.sh script to create the users, schemata and data for each test, then running the provided runit.sh script with parameters to indicate the desired number of readers and writers. For example, to use 32 different users with their data in tablespace TABLESPACENAME, and then run SLOB with 16 writers and 16 readers, you would run:

./setup.sh TABLESPACENAME 32 ; ./runit.sh 16 16

After the run SLOB will produce an AWR report which you should review to see the results of your test run. SLOB also produces a drop_users.sql script to clear out the users generated by setup.sh, and you should run that script and re-create your users anytime you change the user count. A benchmark run only once has no validity, and a benchmark with nothing to compare to is useless, so you’ll want to create some scripts to run SLOB repeatedly, saving the AWR reports in between so that you can review the overall results.
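
For example, to drop the users and rebuild with a different user count (from within the SLOB directory, mirroring the heredoc style used below):

sqlplus -s / as sysdba<<EOF
@drop_users
exit;
EOF
./setup.sh TABLESPACENAME 16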

Here is the content of the script I used to automate individual SLOB runs, with discussion after:

#!/bin/sh
# harness to run SLOB

FS=$1
NUMUSERS=$2
NUMWRITERS=$3
NUMREADERS=$4
RUNNUM=$5
RUNDIR="run${RUNNUM}"

echo "Dropping users and bouncing DB"

sqlplus -s / as sysdba<<EOF
@drop_users
alter system switch logfile;
alter system checkpoint;
shutdown immediate;
exit;
EOF

rm /oracle/SLOB/arch/SLOB*dbf

sqlplus -s / as sysdba<<EOF2
startup;
exit;
EOF2

echo "Setting up users for $FS $NUMUSERS"

./setup.sh $FS $NUMUSERS

echo "Running SLOB: $FS $NUMUSERS $NUMWRITERS $NUMREADERS (run $RUNNUM)"

./runit.sh $NUMWRITERS $NUMREADERS

echo "Renaming AWR report"

mv awr.txt $RUNDIR/SLOB-AWR-$FS-$NUMUSERS-$NUMWRITERS-$NUMREADERS-$RUNDIR.txt

To provide a consistent environment for each test, I drop the previous test’s users, force a redo log switch and checkpoint, and then bounce the database. I also remove any archived logs generated by the previous run so I don’t run out of space (this is a test DB I don’t need; don’t do that on any DB you can’t afford to lose). This script takes five arguments: the tablespace name, the number of concurrent users, the number of writers, the number of readers, and the run number for each combination. I’ve named my tablespaces after the filesystem (“BTRFS”, “EXT3”, “EXT4”, “XFS”, “RAWTS”, and so on), so when the script calls setup.sh the user data is created within the appropriate tablespace on the filesystem to be tested. The last line renames SLOB’s awr.txt to reflect the filesystem, user count, number of readers/writers and run number.

I save this script as mydoit.sh and create another script to call it repeatedly. This example is for the read-only testing:

#!/bin/sh
echo "Starting runs for 8 readers"
./mydoit.sh EXT3 8 0 8 1
./mydoit.sh EXT4 8 0 8 1
./mydoit.sh BTRFS 8 0 8 1
./mydoit.sh XFS 8 0 8 1
./mydoit.sh RAWTS 8 0 8 1

./mydoit.sh EXT3 8 0 8 2
./mydoit.sh EXT4 8 0 8 2
./mydoit.sh BTRFS 8 0 8 2
./mydoit.sh XFS 8 0 8 2
./mydoit.sh RAWTS 8 0 8 2

./mydoit.sh EXT3 8 0 8 3
./mydoit.sh EXT4 8 0 8 3
./mydoit.sh BTRFS 8 0 8 3
./mydoit.sh XFS 8 0 8 3
./mydoit.sh RAWTS 8 0 8 3

echo "Starting runs for 16 readers"
./mydoit.sh EXT3 16 0 16 1
./mydoit.sh EXT4 16 0 16 1
./mydoit.sh BTRFS 16 0 16 1
./mydoit.sh XFS 16 0 16 1
./mydoit.sh RAWTS 16 0 16 1

./mydoit.sh EXT3 16 0 16 2
./mydoit.sh EXT4 16 0 16 2
./mydoit.sh BTRFS 16 0 16 2
./mydoit.sh XFS 16 0 16 2
./mydoit.sh RAWTS 16 0 16 2

./mydoit.sh EXT3 16 0 16 3
./mydoit.sh EXT4 16 0 16 3
./mydoit.sh BTRFS 16 0 16 3
./mydoit.sh XFS 16 0 16 3
./mydoit.sh RAWTS 16 0 16 3

# and so on for 32 concurrent sessions...

After this script runs I have a pile of AWR reports from which to review and compare read performance. For each metric I’m interested in, I saved the results from each run into a spreadsheet and generated average results for each set. That gives me an average number across three runs for each filesystem’s physical read I/O per second for 8 concurrent users, 16 concurrent users, and 32 concurrent users. Similar scripts run for the redo-only testing and for the read/write testing, with the results logged in the same way. I then generate charts within the spreadsheet to visually compare the results. Some performance trends are very obvious visually, and we are already following up with our OS vendor for more information.