docs/source/tuto_disk/analysis.org

   1 #+TITLE: Modeling I/O: the realistic way
   2 #+AUTHOR: The SimGrid Team
   3 #+OPTIONS: toc:nil title:nil author:nil num:nil
   4
   5 * Modeling I/O: the realistic way
   6 :PROPERTIES:
   7 :CUSTOM_ID: howto_disk
   8 :END:
   9
  10 ** Introduction
  11
  12  This tutorial presents how to perform faithful IO experiments in
  13  SimGrid. It is based on the paper "Adding Storage Simulation
  14  Capacities to the SimGridToolkit: Concepts, Models, and API".
  15
  16  The paper presents a series of experiments to analyze the performance
  17  of IO operations (read/write) on different kinds of disks (SATA, SAS,
  18  SSD). In this tutorial, we present a detailed example of how to
  19  extract experimental data to simulate: i) performance degradation
  20  with concurrent operations (Fig. 8 in the paper) and ii) variability
  21  in IO operations (Fig. 5 to 7).
  22
  23  - Link for paper: https://hal.inria.fr/hal-01197128
  24  - Link for data: https://figshare.com/articles/dataset/Companion_of_the_SimGrid_storage_modeling_article/1175156
  25
  26  *Disclaimer*:
  27 - The purpose of this document is to illustrate how we can
  28  extract data from experiments and inject on SimGrid. However, the
  29  data shown on this page may *not* reflect the reality.
  30 - You must run similar experiments on your hardware to get realistic
  31   data for your context.
  32 - SimGrid has been in active development since the paper release in
  33   2015, thus the XML description used in the paper may have evolved
  34   while MSG was superseeded by S4U since then.
  35
  36 *** Running this tutorial
  37
  38  A Dockerfile is available in =docs/source/tuto_disk=. It allows you to
  39  re-run this tutorial. For that, build the image and run the container:
  40  - =docker build -t tuto_disk .=
  41  - =docker run -it tuto_disk=
  42
  43 ** Analyzing the experimental data
  44  We start by analyzing and extracting the real data available.
  45 *** Scripts
  46
  47  We use a special method to create non-uniform histograms to represent
  48  the noise in IO operations.
  49
  50  Unable to install the library properly, I copied the important methods
  51  here.
  52
  53  Copied from: https://rdrr.io/github/dlebauer/pecan-priors/src/R/plots.R
  54
  55  #+begin_src R :results output :session *R* :exports none
  56 #' Variable-width (dagonally cut) histogram
  57 #'
  58 #'
  59 #' When constructing a histogram, it is common to make all bars the same width.
  60 #' One could also choose to make them all have the same area.
  61 #' These two options have complementary strengths and weaknesses; the equal-width histogram oversmooths in regions of high density, and is poor at identifying sharp peaks; the equal-area histogram oversmooths in regions of low density, and so does not identify outliers.
  62 #' We describe a compromise approach which avoids both of these defects. We regard the histogram as an exploratory device, rather than as an estimate of a density.
  63 #' @title Diagonally Cut Histogram
  64 #' @param x is a numeric vector (the data)
  65 #' @param a is the scaling factor, default is 5 * IQR
  66 #' @param nbins is the number of bins, default is assigned by the Stuges method
  67 #' @param rx  is the range used for the left of the left-most bin to the right of the right-most bin
  68 #' @param eps used to set artificial bound on min width / max height of bins as described in Denby and Mallows (2009) on page 24.
  69 #' @param xlab is label for the x axis
  70 #' @param plot = TRUE produces the plot, FALSE returns the heights, breaks and counts
  71 #' @param lab.spikes = TRUE labels the \% of data in the spikes
  72 #' @return list with two elements, heights of length n and breaks of length n+1 indicating the heights and break points of the histogram bars.
  73 #' @author Lorraine Denby, Colin Mallows
  74 #' @references Lorraine Denby, Colin Mallows. Journal of Computational and Graphical Statistics. March 1, 2009, 18(1): 21-31. doi:10.1198/jcgs.2009.0002.
  75  dhist<-function(x, a=5*iqr(x),
  76                  nbins=nclass.Sturges(x), rx = range(x,na.rm = TRUE),
  77                  eps=.15, xlab = "x", plot = TRUE,lab.spikes = TRUE)
  78  {
  79
  80    if(is.character(nbins))
  81      nbins <- switch(casefold(nbins),
  82                      sturges = nclass.Sturges(x),
  83                      fd = nclass.FD(x),
  84                      scott = nclass.scott(x),
  85                      stop("Nclass method not recognized"))
  86    else if(is.function(nbins))
  87      nbins <- nbins(x)
  88
  89    x <- sort(x[!is.na(x)])
  90    if(a == 0)
  91      a <- diff(range(x))/100000000
  92    if(a != 0 & a != Inf) {
  93      n <- length(x)
  94      h <- (rx[2] + a - rx[1])/nbins
  95      ybr <- rx[1] + h * (0:nbins)
  96      yupper <- x + (a * (1:n))/n
  97                                          # upper and lower corners in the ecdf
  98      ylower <- yupper - a/n
  99                                          #
 100      cmtx <- cbind(cut(yupper, breaks = ybr), cut(yupper, breaks =
 101                                  ybr, left.include = TRUE), cut(ylower, breaks = ybr),
 102                    cut(ylower, breaks = ybr, left.include = TRUE))
 103      cmtx[1, 3] <- cmtx[1, 4] <- 1
 104                                          # to replace NAs when default r is used
 105      cmtx[n, 1] <- cmtx[n, 2] <- nbins
 106                                          #
 107                                          #checksum <- apply(cmtx, 1, sum) %% 4
 108      checksum <- (cmtx[, 1] + cmtx[, 2] + cmtx[, 3] + cmtx[, 4]) %%
 109      4
 110                                          # will be 2 for obs. that straddle two bins
 111      straddlers <- (1:n)[checksum == 2]
 112                                          # to allow for zero counts
 113      if(length(straddlers) > 0) {
 114        counts <- table(c(1:nbins, cmtx[ - straddlers, 1]))
 115      } else {
 116        counts <- table(c(1:nbins, cmtx[, 1]))
 117      }
 118      counts <- counts - 1
 119                                          #
 120      if(length(straddlers) > 0) {
 121        for(i in straddlers) {
 122          binno <- cmtx[i, 1]
 123          theta <- ((yupper[i] - ybr[binno]) * n)/a
 124          counts[binno - 1] <- counts[binno - 1] + (
 125                                                    1 - theta)
 126          counts[binno] <- counts[binno] + theta
 127        }
 128      }
 129      xbr <- ybr
 130      xbr[-1] <- ybr[-1] - (a * cumsum(counts))/n
 131      spike<-eps*diff(rx)/nbins
 132      flag.vec<-c(diff(xbr)<spike,F)
 133      if ( sum(abs(diff(xbr))<=spike) >1) {
 134        xbr.new<-xbr
 135        counts.new<-counts
 136        diff.xbr<-abs(diff(xbr))
 137        amt.spike<-diff.xbr[length(diff.xbr)]
 138        for (i in rev(2:length(diff.xbr))) {
 139          if (diff.xbr[i-1]<=spike&diff.xbr[i]<=spike&
 140              !is.na(diff.xbr[i])) {
 141            amt.spike<-amt.spike+diff.xbr[i-1]
 142            counts.new[i-1]<-counts.new[i-1]+counts.new[i]
 143            xbr.new[i]<-NA
 144            counts.new[i]<-NA
 145            flag.vec[i-1]<-T
 146          }
 147          else amt.spike<-diff.xbr[i-1]
 148        }
 149        flag.vec<-flag.vec[!is.na(xbr.new)]
 150        flag.vec<-flag.vec[-length(flag.vec)]
 151        counts<-counts.new[!is.na(counts.new)]
 152        xbr<-xbr.new[!is.na(xbr.new)]
 153
 154      }
 155      else flag.vec<-flag.vec[-length(flag.vec)]
 156      widths <- abs(diff(xbr))
 157      ## N.B. argument "widths" in barplot must be xbr
 158      heights <- counts/widths
 159    }
 160    bin.size <- length(x)/nbins
 161    cut.pt <- unique(c(min(x) - abs(min(x))/1000,
 162                       approx(seq(length(x)), x, (1:(nbins - 1)) * bin.size, rule = 2)$y, max(x)))
 163    aa <- hist(x, breaks = cut.pt, plot = FALSE, probability = TRUE)
 164    if(a == Inf) {
 165      heights <- aa$counts
 166      xbr <- aa$breaks
 167    }
 168    amt.height<-3
 169    q75<-quantile(heights,.75)
 170    if (sum(flag.vec)!=0) {
 171      amt<-max(heights[!flag.vec])
 172      ylim.height<-amt*amt.height
 173      ind.h<-flag.vec&heights> ylim.height
 174      flag.vec[heights<ylim.height*(amt.height-1)/amt.height]<-F
 175      heights[ind.h] <- ylim.height
 176    }
 177    amt.txt<-0
 178    end.y<-(-10000)
 179    if(plot) {
 180      barplot(heights, abs(diff(xbr)), space = 0, density = -1, xlab =
 181              xlab, plot = TRUE, xaxt = "n",yaxt='n')
 182      at <- pretty(xbr)
 183      axis(1, at = at - xbr[1], labels = as.character(at))
 184      if (lab.spikes) {
 185        if (sum(flag.vec)>=1) {
 186          usr<-par('usr')
 187          for ( i in seq(length(xbr)-1)) {
 188            if (!flag.vec[i]) {
 189              amt.txt<-0
 190              if (xbr[i]-xbr[1]<end.y) amt.txt<-1
 191            }
 192            else {
 193              amt.txt<-amt.txt+1
 194              end.y<-xbr[i]-xbr[1]+3*par('cxy')[1]
 195            }
 196            if (flag.vec[i]) {
 197              txt<-paste(' ',format(round(counts[i]/
 198                                          sum(counts)*100)),'%',sep='')
 199              par(xpd = TRUE)
 200              text(xbr[i+1]-xbr[1],ylim.height-par('cxy')[2]*(amt.txt-1),txt, adj=0)
 201            }}
 202        }
 203        else print('no spikes or more than one spike')
 204      }
 205      invisible(list(heights = heights, xbr = xbr))
 206    }
 207    else {
 208      return(list(heights = heights, xbr = xbr,counts=counts))
 209    }
 210  }
 211
 212 #' Calculate interquartile range
 213 #'
 214 #' Calculates the 25th and 75th quantiles given a vector x; used in function \link{dhist}.
 215 #' @title Interquartile range
 216 #' @param x vector
 217 #' @return numeric vector of length 2, with the 25th and 75th quantiles of input vector x.
 218  iqr<-function(x){
 219    return(diff(quantile(x, c(0.25, 0.75), na.rm = TRUE)))
 220  }
 221
 222  #+end_src
 223
 224 *** Data preparation
 225
 226  Some initial configurations/list of packages.
 227
 228  #+begin_src R :results output :session *R* :exports both
 229  library(jsonlite)
 230  library(ggplot2)
 231  library(plyr)
 232  library(dplyr)
 233  library(gridExtra)
 234
 235  IO_INFO = list()
 236  #+end_src
 237
 238  This was copied from the =sg_storage_ccgrid15.org= available at the
 239  figshare of the paper. Before executing this code, please download and
 240  decompress the appropriate file.
 241
 242  #+begin_src sh :results output :exports both
 243  curl -O -J -L "https://ndownloader.figshare.com/files/1928095"
 244  tar xfz bench.tgz
 245  #+end_src
 246
 247  Preparing data for varialiby analysis.
 248
 249  #+BEGIN_SRC R :session :results output :export none
 250
 251  clean_up <- function (df, infra){
 252  names(df) <- c("Hostname","Date","DirectIO","IOengine","IOscheduler","Error","Operation","Jobs","BufferSize","FileSize","Runtime","Bandwidth","BandwidthMin","BandwidthMax","Latency", "LatencyMin", "LatencyMax","IOPS")
 253  df=subset(df,Error=="0")
 254  df=subset(df,DirectIO=="1")
 255  df <- merge(df,infra,by="Hostname")
 256  df$Hostname = sapply(strsplit(df$Hostname, "[.]"), "[", 1)
 257  df$HostModel = paste(df$Hostname, df$Model, sep=" - ")
 258  df$Duration = df$Runtime/1000 # fio outputs runtime in msec, we want to display seconds
 259  df$Size = df$FileSize/1024/1024
 260  df=subset(df,Duration!=0.000)
 261  df$Bwi=df$Duration/df$Size
 262  df[df$Operation=="read",]$Operation<- "Read"
 263  df[df$Operation=="write",]$Operation<- "Write"
 264  return(df)
 265  }
 266
 267  grenoble <- read.csv('./bench/grenoble.csv', header=FALSE,sep = ";",
 268  stringsAsFactors=FALSE)
 269  luxembourg <- read.csv('./bench/luxembourg.csv', header=FALSE,sep = ";",  stringsAsFactors=FALSE)
 270  nancy <- read.csv('./bench/nancy.csv', header=FALSE,sep = ";",  stringsAsFactors=FALSE)
 271  all <- rbind(grenoble,nancy, luxembourg)
 272  infra <- read.csv('./bench/infra.csv', header=FALSE,sep = ";",  stringsAsFactors=FALSE)
 273  names(infra) <- c("Hostname","Model","DiskSize")
 274
 275  all = clean_up(all, infra)
 276  griffon = subset(all,grepl("^griffon", Hostname))
 277  griffon$Cluster <-"Griffon (SATA II)"
 278  edel = subset(all,grepl("^edel", Hostname))
 279  edel$Cluster<-"Edel (SSD)"
 280
 281  df = rbind(griffon[griffon$Jobs=="1" & griffon$IOscheduler=="cfq",],
 282             edel[edel$Jobs=="1" & edel$IOscheduler=="cfq",])
 283  #Get rid off of 64 Gb disks of Edel as they behave differently (used to be "edel-51")
 284  df = df[!(grepl("^Edel",df$Cluster) & df$DiskSize=="64 GB"),]
 285  #+END_SRC
 286
 287  Preparing data for concurrent analysis.
 288  #+begin_src R :results output :session *R* :exports both
 289    dfc = rbind(griffon[griffon$Jobs>1 & griffon$IOscheduler=="cfq",],
 290               edel[edel$Jobs>1 & edel$IOscheduler=="cfq",])
 291    dfc2 = rbind(griffon[griffon$Jobs==1 & griffon$IOscheduler=="cfq",],
 292               edel[edel$Jobs==1 & edel$IOscheduler=="cfq",])
 293    dfc = rbind(dfc,dfc2[sample(nrow(dfc2),size=200),])
 294
 295    dd <- data.frame(
 296          Hostname="??",
 297          Date = NA, #tmpl$Date,
 298          DirectIO = NA,
 299          IOengine = NA,
 300          IOscheduler = NA,
 301          Error = 0,
 302          Operation = NA, #tmpl$Operation,
 303          Jobs = NA, # #d$nb.of.concurrent.access,
 304          BufferSize = NA, #d$bs,
 305          FileSize = NA, #d$size,
 306          Runtime = NA,
 307          Bandwidth = NA,
 308          BandwidthMin = NA,
 309          BandwidthMax = NA,
 310          Latency = NA,
 311          LatencyMin = NA,
 312          LatencyMax = NA,
 313          IOPS = NA,
 314          Model = NA, #tmpl$Model,
 315          DiskSize = NA, #tmpl$DiskSize,
 316          HostModel = NA,
 317          Duration = NA, #d$time,
 318          Size = NA,
 319          Bwi = NA,
 320          Cluster = NA) #tmpl$Cluster)
 321
 322    dd$Size = dd$FileSize/1024/1024
 323    dd$Bwi = dd$Duration/dd$Size
 324
 325    dfc = rbind(dfc, dd)
 326    # Let's get rid of small files!
 327    dfc = subset(dfc,Size >= 10)
 328    # Let's get rid of 64Gb edel disks
 329    dfc = dfc[!(grepl("^Edel",dfc$Cluster) & dfc$DiskSize=="64 GB"),]
 330
 331    dfc$TotalSize=dfc$Size * dfc$Jobs
 332    dfc$BW = (dfc$TotalSize) / dfc$Duration
 333    dfc = dfc[dfc$BW>=20,] # get rid of one point that is typically an outlier and does not make sense
 334
 335    dfc$method="lm"
 336    dfc[dfc$Cluster=="Edel (SSD)"  & dfc$Operation=="Read",]$method="loess"
 337
 338    dfc[dfc$Cluster=="Edel (SSD)"  & dfc$Operation=="Write" & dfc$Jobs ==1,]$method="lm"
 339    dfc[dfc$Cluster=="Edel (SSD)"  & dfc$Operation=="Write" & dfc$Jobs ==1,]$method=""
 340
 341    dfc[dfc$Cluster=="Griffon (SATA II)" & dfc$Operation=="Write",]$method="lm"
 342    dfc[dfc$Cluster=="Griffon (SATA II)"  & dfc$Operation=="Write" & dfc$Jobs ==1,]$method=""
 343
 344    dfd = dfc[dfc$Operation=="Write" & dfc$Jobs ==1 &
 345              (dfc$Cluster %in% c("Griffon (SATA II)", "Edel (SSD)")),]
 346    dfd = ddply(dfd,c("Cluster","Operation","Jobs","DiskSize"), summarize,
 347                mean = mean(BW), num = length(BW), sd = sd(BW))
 348    dfd$BW=dfd$mean
 349    dfd$ci = 2*dfd$sd/sqrt(dfd$num)
 350
 351    dfrange=ddply(dfc,c("Cluster","Operation","DiskSize"), summarize,
 352                max = max(BW))
 353    dfrange=ddply(dfrange,c("Cluster","DiskSize"), mutate,
 354                BW = max(max))
 355    dfrange$Jobs=16
 356
 357  #+end_src
 358
 359 *** Griffon (SATA)
 360 **** Modeling resource sharing w/ concurrent access
 361
 362  This figure presents the overall performance of IO operation with
 363  concurrent access to the disk. Note that the image is different
 364  from the one in the paper. Probably, we need to further clean the
 365  available data to obtain exaclty the same results.
 366
 367  #+begin_src R :results output graphics :file fig/griffon_deg.png :exports both :width 600 :height 400 :session *R*
 368    ggplot(data=dfc,aes(x=Jobs,y=BW, color=Operation)) + theme_bw() +
 369      geom_point(alpha=.3) +
 370      geom_point(data=dfrange, size=0) +
 371      facet_wrap(Cluster~Operation,ncol=2,scale="free_y")+ # ) + #
 372      geom_smooth(data=dfc[dfc$method=="loess",], color="black", method=loess,se=TRUE,fullrange=T) +
 373      geom_smooth(data=dfc[dfc$method=="lm",], color="black", method=lm,se=TRUE) +
 374      geom_point(data=dfd, aes(x=Jobs,y=BW),color="black",shape=21,fill="white") +
 375      geom_errorbar(data=dfd, aes(x=Jobs, ymin=BW-ci, ymax=BW+ci),color="black",width=.6) +
 376      xlab("Number of concurrent operations") + ylab("Aggregated Bandwidth (MiB/s)")  + guides(color=FALSE)  + xlim(0,NA) + ylim(0,NA)
 377
 378  #+end_src
 379
 380 ***** Read
 381  Getting read data for Griffon from 1 to 15 concurrent reads.
 382
 383  #+begin_src R :results output :session *R* :exports both
 384  deg_griffon = dfc %>% filter(grepl("^Griffon", Cluster)) %>% filter(Operation == "Read")
 385  model = lm(BW~Jobs, data = deg_griffon)
 386  IO_INFO[["griffon"]][["degradation"]][["read"]] = predict(model,data.frame(Jobs=seq(1,15)))
 387
 388  toJSON(IO_INFO, pretty = TRUE)
 389  #+end_src
 390
 391
 392 ***** Write
 393
 394  Same for write operations.
 395
 396  #+begin_src R :results output :session *R* :exports both
 397  deg_griffon = dfc %>% filter(grepl("^Griffon", Cluster)) %>% filter(Operation == "Write") %>% filter(Jobs > 2)
 398  mean_job_1 = dfc %>% filter(grepl("^Griffon", Cluster)) %>% filter(Operation == "Write") %>% filter(Jobs == 1) %>% summarize(mean = mean(BW))
 399  model = lm(BW~Jobs, data = deg_griffon)
 400  IO_INFO[["griffon"]][["degradation"]][["write"]] = c(mean_job_1$mean, predict(model,data.frame(Jobs=seq(2,15))))
 401  toJSON(IO_INFO, pretty = TRUE)
 402  #+end_src
 403
 404
 405 **** Modeling read/write bandwidth variability
 406
 407  Fig.5 in the paper presents the noise in the read/write operations in
 408  the Griffon SATA disk.
 409
 410  The paper uses regular histogram to illustrate the distribution of the
 411  effective bandwidth. However, in this tutorial, we use dhist
 412  (https://rdrr.io/github/dlebauer/pecan-priors/man/dhist.html) to have a
 413  more precise information over the highly dense areas around the mean.
 414
 415 ***** Read
 416  First, we present the histogram for read operations.
 417  #+begin_src R :results output graphics :file fig/griffon_read_dhist.png :exports both :width 600 :height 400 :session *R*
 418  griffon_read = df %>% filter(grepl("^Griffon", Cluster)) %>% filter(Operation == "Read") %>% select(Bwi)
 419  dhist(1/griffon_read$Bwi)
 420  #+end_src
 421
 422  Saving it to be exported in json format.
 423
 424  #+begin_src R :results output :session *R* :exports both
 425  griffon_read_dhist = dhist(1/griffon_read$Bwi, plot=FALSE)
 426  IO_INFO[["griffon"]][["noise"]][["read"]] = c(breaks=list(griffon_read_dhist$xbr), heights=list(unclass(griffon_read_dhist$heights)))
 427  IO_INFO[["griffon"]][["read_bw"]] = mean(1/griffon_read$Bwi)
 428  toJSON(IO_INFO, pretty = TRUE)
 429  #+end_src
 430
 431 ***** Write
 432
 433  Same analysis for write operations.
 434  #+begin_src R :results output graphics :file fig/griffon_write_dhist.png :exports both :width 600 :height 400 :session *R*
 435  griffon_write = df %>% filter(grepl("^Griffon", Cluster)) %>% filter(Operation == "Write") %>% select(Bwi)
 436  dhist(1/griffon_write$Bwi)
 437  #+end_src
 438
 439  #+begin_src R :results output :session *R* :exports both
 440  griffon_write_dhist = dhist(1/griffon_write$Bwi, plot=FALSE)
 441  IO_INFO[["griffon"]][["noise"]][["write"]] = c(breaks=list(griffon_write_dhist$xbr), heights=list(unclass(griffon_write_dhist$heights)))
 442  IO_INFO[["griffon"]][["write_bw"]] = mean(1/griffon_write$Bwi)
 443  toJSON(IO_INFO, pretty = TRUE)
 444  #+end_src
 445
 446 *** Edel (SSD)
 447  This section presents the exactly same analysis for the Edel SSDs.
 448
 449 **** Modeling resource sharing w/ concurrent access
 450
 451 ***** Read
 452  Getting read data for Edel from 1 to 15 concurrent operations.
 453
 454  #+begin_src R :results output :session *R* :exports both
 455  deg_edel = dfc %>% filter(grepl("^Edel", Cluster)) %>% filter(Operation == "Read")
 456  model = loess(BW~Jobs, data = deg_edel)
 457  IO_INFO[["edel"]][["degradation"]][["read"]] = predict(model,data.frame(Jobs=seq(1,15)))
 458  toJSON(IO_INFO, pretty = TRUE)
 459  #+end_src
 460
 461 ***** Write
 462
 463  Same for write operations.
 464
 465  #+begin_src R :results output :session *R* :exports both
 466  deg_edel = dfc %>% filter(grepl("^Edel", Cluster)) %>% filter(Operation == "Write") %>% filter(Jobs > 2)
 467  mean_job_1 = dfc %>% filter(grepl("^Edel", Cluster)) %>% filter(Operation == "Write") %>% filter(Jobs == 1) %>% summarize(mean = mean(BW))
 468  model = lm(BW~Jobs, data = deg_edel)
 469  IO_INFO[["edel"]][["degradation"]][["write"]] = c(mean_job_1$mean, predict(model,data.frame(Jobs=seq(2,15))))
 470  toJSON(IO_INFO, pretty = TRUE)
 471  #+end_src
 472
 473
 474 **** Modeling read/write bandwidth variability
 475
 476 ***** Read
 477
 478  #+begin_src R :results output graphics :file fig/edel_read_dhist.png :exports both :width 600 :height 400 :session *R*
 479  edel_read = df %>% filter(grepl("^Edel", Cluster)) %>% filter(Operation == "Read") %>% select(Bwi)
 480  dhist(1/edel_read$Bwi)
 481  #+end_src
 482
 483  Saving it to be exported in json format.
 484
 485  #+begin_src R :results output :session *R* :exports both
 486  edel_read_dhist = dhist(1/edel_read$Bwi, plot=FALSE)
 487  IO_INFO[["edel"]][["noise"]][["read"]] = c(breaks=list(edel_read_dhist$xbr), heights=list(unclass(edel_read_dhist$heights)))
 488  IO_INFO[["edel"]][["read_bw"]] = mean(1/edel_read$Bwi)
 489  toJSON(IO_INFO, pretty = TRUE)
 490  #+end_src
 491
 492 ***** Write
 493  #+begin_src R :results output graphics :file fig/edel_write_dhist.png :exports both :width 600 :height 400 :session *R*
 494
 495  edel_write = df %>% filter(grepl("^Edel", Cluster)) %>% filter(Operation == "Write") %>% select(Bwi)
 496  dhist(1/edel_write$Bwi)
 497  #+end_src
 498
 499  Saving it to be exported later.
 500  #+begin_src R :results output :session *R* :exports both
 501  edel_write_dhist = dhist(1/edel_write$Bwi, plot=FALSE)
 502  IO_INFO[["edel"]][["noise"]][["write"]] = c(breaks=list(edel_write_dhist$xbr), heights=list(unclass(edel_write_dhist$heights)))
 503  IO_INFO[["edel"]][["write_bw"]] = mean(1/edel_write$Bwi)
 504  toJSON(IO_INFO, pretty = TRUE)
 505  #+end_src
 506
 507 ** Exporting to JSON
 508  Finally, let's save it to a file to be opened by our simulator.
 509
 510  #+begin_src R :results output :session *R* :exports both
 511  json = toJSON(IO_INFO, pretty = TRUE)
 512  cat(json, file="IO_noise.json")
 513  #+end_src
 514
 515
 516 ** Injecting this data in SimGrid
 517
 518  To mimic this behavior in SimGrid, we use two features in the platform
 519  description: non-linear sharing policy and bandwidth factors. For more
 520  details, please see the source code in =tuto_disk.cpp=.
 521
 522 *** Modeling resource sharing w/ concurrent access
 523
 524  The =set_sharing_policy= method allows the user to set a callback to
 525  dynamically change the disk capacity. The callback is called each time
 526  SimGrid will share the disk between a set of I/O operations.
 527
 528  The callback has access to the number of activities sharing the
 529  resource and its current capacity. It must return the new resource's
 530  capacity.
 531
 532  #+begin_src C++ :results output :eval no :exports code
 533  static double disk_dynamic_sharing(double capacity, int n)
 534  {
 535     return capacity; //useless callback
 536  }
 537
 538  auto* disk = host->create_disk("dump", 1e6, 1e6);
 539  disk->set_sharing_policy(sg4::Disk::Operation::READ, sg4::Disk::SharingPolicy::NONLINEAR, &disk_dynamic_sharing);
 540  #+end_src
 541
 542
 543 *** Modeling read/write bandwidth variability
 544
 545  The noise in I/O operations can be obtained by applying a factor to
 546  the I/O bandwidth of the disk. This factor is applied when we update
 547  the remaining amount of bytes to be transferred, increasing or
 548  decreasing the effective disk bandwidth.
 549
 550  The =set_factor= method allows the user to set a callback to
 551  dynamically change the factor to be applied for each I/O operation.
 552  The callback has access to size of the operation and its type (read or
 553  write). It must return a multiply factor (e.g. 1.0 for doing nothing).
 554
 555  #+begin_src C++ :results output :eval no :exports code
 556  static double disk_variability(sg_size_t size, sg4::Io::OpType op)
 557  {
 558     return 1.0; //useless callback
 559  }
 560
 561  auto* disk = host->create_disk("dump", 1e6, 1e6);
 562  disk->set_factor_cb(&disk_variability);
 563  #+end_src
 564
 565
 566 *** Running our simulation
 567  The binary was compiled in the provided docker container.
 568
 569  #+begin_src sh :results output :exports both
 570  ./tuto_disk > ./simgrid_disk.csv
 571  #+end_src
 572
 573
 574 ** Analyzing the SimGrid results
 575
 576 The figure below presents the results obtained by SimGrid.
 577
 578 The experiment performs I/O operations, varying the number of
 579 concurrent operations from 1 to 15. We run only 20 simulations for
 580 each case.
 581
 582 We can see that the graphics are quite similar to the ones obtained in
 583 the real platform.
 584
 585  #+begin_src R :results output graphics :file fig/simgrid_results.png :exports both :width 600 :height 400 :session *R*
 586  sg_df = read.csv("./simgrid_disk.csv")
 587  sg_df = sg_df %>% group_by(disk, op, flows) %>% mutate(bw=((size*flows)/elapsed)/10^6, method=if_else(disk=="edel" & op=="read", "loess", "lm"))
 588  sg_dfd = sg_df %>% filter(flows==1 & op=="write") %>% group_by(disk, op, flows) %>% summarize(mean = mean(bw), sd = sd(bw), se=sd/sqrt(n()))
 589
 590  sg_df[sg_df$op=="write" & sg_df$flows ==1,]$method=""
 591
 592  ggplot(data=sg_df, aes(x=flows, y=bw, color=op)) + theme_bw() +
 593      geom_point(alpha=.3) +
 594      geom_smooth(data=sg_df[sg_df$method=="loess",], color="black", method=loess,se=TRUE,fullrange=T) +
 595      geom_smooth(data=sg_df[sg_df$method=="lm",], color="black", method=lm,se=TRUE) +
 596      geom_errorbar(data=sg_dfd, aes(x=flows, y=mean, ymin=mean-2*se, ymax=mean+2*se),color="black",width=.6) +
 597      facet_wrap(disk~op,ncol=2,scale="free_y")+ # ) + #
 598      xlab("Number of concurrent operations") + ylab("Aggregated Bandwidth (MiB/s)")  + guides(color=FALSE)  + xlim(0,NA) + ylim(0,NA)
 599
 600  #+end_src
 601
 602 Note: The variability in griffon read operation seems to decrease when
 603 we have more concurrent operations. This is a particularity of the
 604 griffon read speed profile and the elapsed time calculation.
 605
 606 Given that:
 607 - Each point represents the time to perform the N I/O operations.
 608 - Griffon read speed decreases with the number of concurrent
 609   operations.
 610
 611 With 15 read operations:
 612 - At the beginning, every read gets the same bandwidth, about
 613   42MiB/s.
 614 - We sample the noise in I/O operations, some will be faster than
 615   others (e.g. factor > 1).
 616
 617 When the first read operation finish:
 618 - We will recalculate the bandwidth sharing, now considering that we
 619   have 14 active read operations. This will increase the bandwidth for
 620   each operation (about 44MiB/s).
 621 - The remaining "slower" activities will be speed up.
 622
 623 This behavior keeps happening until the end of the 15 operations,
 624 at each step, we speed up a little the slowest operations and
 625 consequently, decreasing the variability we see.