Modeling I/O: the realistic way
-------------------------------

This tutorial shows how to perform faithful I/O experiments in
SimGrid. It is based on the paper "Adding Storage Simulation
Capacities to the SimGrid Toolkit: Concepts, Models, and API".

The paper presents a series of experiments to analyze the performance
of I/O operations (read/write) on different kinds of disks (SATA, SAS,
SSD). In this tutorial, we present a detailed example of how to
extract the experimental data needed to simulate: i) the performance
degradation with concurrent operations (Fig. 8 in the paper) and ii)
the variability of I/O operations (Fig. 5 to 7).

- Link for the paper: `https://hal.inria.fr/hal-01197128 <https://hal.inria.fr/hal-01197128>`_

- Link for the data: `https://figshare.com/articles/dataset/Companion_of_the_SimGrid_storage_modeling_article/1175156 <https://figshare.com/articles/dataset/Companion_of_the_SimGrid_storage_modeling_article/1175156>`_

- The purpose of this document is to illustrate how we can
  extract data from experiments and inject it into SimGrid. However, the
  data shown on this page may **not** reflect reality.

- You must run similar experiments on your hardware to get realistic
  data for your context.

- SimGrid has been in active development since the paper's release in
  2015, so the MSG and XML descriptions used in the paper may have
  evolved and may not be available anymore.

A Dockerfile is available in ``docs/source/tuto_disk``. It allows you to
re-run this tutorial. For that, build the image and run the container:

- ``docker build -t tuto_disk .``

- ``docker run -it tuto_disk``

Analyzing the experimental data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We start by analyzing and extracting the real data available.

We use a special method to create non-uniform histograms that represent
the noise of the I/O operations.

As we were unable to install the library properly, we copied the
important methods here.

Copied from: `https://rdrr.io/github/dlebauer/pecan-priors/src/R/plots.R <https://rdrr.io/github/dlebauer/pecan-priors/src/R/plots.R>`_

Some initial configuration and the list of packages used (``plyr``,
``dplyr``, ``ggplot2``, ``gridExtra`` and ``jsonlite``). Loading them
produces the messages below; use ``suppressPackageStartupMessages()`` to
eliminate package startup messages.

::

   Attaching package: 'dplyr'

   The following objects are masked from 'package:plyr':

       arrange, count, desc, failwith, id, mutate, rename, summarise,
       summarize

   The following objects are masked from 'package:stats':

       filter, lag

   The following objects are masked from 'package:base':

       intersect, setdiff, setequal, union

   Attaching package: 'gridExtra'

   The following object is masked from 'package:dplyr':

       combine

This was copied from the ``sg_storage_ccgrid15.org`` file available in
the figshare companion of the paper. Before executing this code, please
download and decompress the appropriate file.

.. code-block:: sh

   curl -O -J -L "https://ndownloader.figshare.com/files/1928095"

Preparing the data for the variability analysis.

.. code-block:: r

   # Packages used throughout this analysis
   library(plyr)
   library(dplyr)
   library(ggplot2)
   library(gridExtra)
   library(jsonlite)

   clean_up <- function (df, infra){
     names(df) <- c("Hostname","Date","DirectIO","IOengine","IOscheduler","Error","Operation","Jobs","BufferSize","FileSize","Runtime","Bandwidth","BandwidthMin","BandwidthMax","Latency","LatencyMin","LatencyMax","IOPS")
     df=subset(df,Error=="0")
     df=subset(df,DirectIO=="1")
     df <- merge(df,infra,by="Hostname")
     df$Hostname = sapply(strsplit(df$Hostname, "[.]"), "[", 1)
     df$HostModel = paste(df$Hostname, df$Model, sep=" - ")
     df$Duration = df$Runtime/1000 # fio outputs runtime in msec, we want to display seconds
     df$Size = df$FileSize/1024/1024
     df=subset(df,Duration!=0.000)
     df$Bwi=df$Duration/df$Size
     df[df$Operation=="read",]$Operation <- "Read"
     df[df$Operation=="write",]$Operation <- "Write"
     df
   }

   grenoble <- read.csv('./bench/grenoble.csv', header=FALSE, sep=";", stringsAsFactors=FALSE)
   luxembourg <- read.csv('./bench/luxembourg.csv', header=FALSE, sep=";", stringsAsFactors=FALSE)
   nancy <- read.csv('./bench/nancy.csv', header=FALSE, sep=";", stringsAsFactors=FALSE)
   all <- rbind(grenoble, nancy, luxembourg)
   infra <- read.csv('./bench/infra.csv', header=FALSE, sep=";", stringsAsFactors=FALSE)
   names(infra) <- c("Hostname","Model","DiskSize")

   all = clean_up(all, infra)
   griffon = subset(all, grepl("^griffon", Hostname))
   griffon$Cluster <- "Griffon (SATA II)"
   edel = subset(all, grepl("^edel", Hostname))
   edel$Cluster <- "Edel (SSD)"

   df = rbind(griffon[griffon$Jobs=="1" & griffon$IOscheduler=="cfq",],
              edel[edel$Jobs=="1" & edel$IOscheduler=="cfq",])
   # Get rid of the 64 GB disks of Edel as they behave differently (used to be "edel-51")
   df = df[!(grepl("^Edel",df$Cluster) & df$DiskSize=="64 GB"),]

Preparing the data for the analysis of concurrent operations.

.. code-block:: r

   dfc = rbind(griffon[griffon$Jobs>1 & griffon$IOscheduler=="cfq",],
               edel[edel$Jobs>1 & edel$IOscheduler=="cfq",])
   dfc2 = rbind(griffon[griffon$Jobs==1 & griffon$IOscheduler=="cfq",],
                edel[edel$Jobs==1 & edel$IOscheduler=="cfq",])
   dfc = rbind(dfc, dfc2[sample(nrow(dfc2), size=200),])

   # Skeleton of an extra row for the concurrency analysis. The commented
   # values recall how each column was filled in the original script.
   dd <- data.frame(Hostname = NA,
                    Date = NA, #tmpl$Date,
                    DirectIO = NA,
                    IOengine = NA,
                    IOscheduler = NA,
                    Error = NA,
                    Operation = NA, #tmpl$Operation,
                    Jobs = NA, #d$nb.of.concurrent.access,
                    BufferSize = NA, #d$bs,
                    FileSize = NA, #d$size,
                    Runtime = NA,
                    Bandwidth = NA,
                    BandwidthMin = NA,
                    BandwidthMax = NA,
                    Latency = NA,
                    LatencyMin = NA,
                    LatencyMax = NA,
                    IOPS = NA,
                    Model = NA, #tmpl$Model,
                    DiskSize = NA, #tmpl$DiskSize,
                    HostModel = NA,
                    Duration = NA, #d$time,
                    Size = NA,
                    Bwi = NA,
                    Cluster = NA) #tmpl$Cluster)

   dd$Size = dd$FileSize/1024/1024
   dd$Bwi = dd$Duration/dd$Size

   # Let's get rid of small files!
   dfc = subset(dfc,Size >= 10)
   # Let's get rid of the 64 GB disks of Edel
   dfc = dfc[!(grepl("^Edel",dfc$Cluster) & dfc$DiskSize=="64 GB"),]

   dfc$TotalSize = dfc$Size * dfc$Jobs
   dfc$BW = (dfc$TotalSize) / dfc$Duration
   dfc = dfc[dfc$BW>=20,] # get rid of one point that is typically an outlier and does not make sense

   # Select the regression method used for each curve in the figure below
   dfc$method = "lm"
   dfc[dfc$Cluster=="Edel (SSD)" & dfc$Operation=="Read",]$method = "loess"
   dfc[dfc$Cluster=="Edel (SSD)" & dfc$Operation=="Write" & dfc$Jobs==1,]$method = ""
   dfc[dfc$Cluster=="Griffon (SATA II)" & dfc$Operation=="Write",]$method = "lm"
   dfc[dfc$Cluster=="Griffon (SATA II)" & dfc$Operation=="Write" & dfc$Jobs==1,]$method = ""

   dfd = dfc[dfc$Operation=="Write" & dfc$Jobs==1 &
             (dfc$Cluster %in% c("Griffon (SATA II)", "Edel (SSD)")),]
   dfd = ddply(dfd, c("Cluster","Operation","Jobs","DiskSize"), summarize,
               mean = mean(BW), num = length(BW), sd = sd(BW))

   dfd$ci = 2*dfd$sd/sqrt(dfd$num)

   # dfrange: per-cluster maximum BW, plotted as an invisible point below to
   # align the y scales of the facets
   dfrange = ddply(dfc, c("Cluster","Operation","DiskSize"), summarize,
                   Jobs = 1, BW = max(BW))
   dfrange = ddply(dfrange, c("Cluster","DiskSize"), mutate,
                   BW = max(BW))

Griffon (SATA II)
^^^^^^^^^^^^^^^^^

Modeling resource sharing w/ concurrent access
::::::::::::::::::::::::::::::::::::::::::::::

This figure presents the overall performance of the I/O operations with
concurrent access to the disk. Note that the figure differs from the one
in the paper: the available data would probably need further cleaning to
reproduce exactly the same results.

.. code-block:: r

   ggplot(data=dfc, aes(x=Jobs, y=BW, color=Operation)) + theme_bw() +
     geom_point(alpha=.3) +
     geom_point(data=dfrange, size=0) +
     facet_wrap(Cluster~Operation, ncol=2, scale="free_y") +
     geom_smooth(data=dfc[dfc$method=="loess",], color="black", method=loess, se=TRUE, fullrange=T) +
     geom_smooth(data=dfc[dfc$method=="lm",], color="black", method=lm, se=TRUE) +
     geom_point(data=dfd, aes(x=Jobs, y=mean), color="black", shape=21, fill="white") +
     geom_errorbar(data=dfd, aes(x=Jobs, ymin=mean-ci, ymax=mean+ci), color="black", width=.6) +
     xlab("Number of concurrent operations") + ylab("Aggregated Bandwidth (MiB/s)") +
     guides(color=FALSE) + xlim(0,NA) + ylim(0,NA)

.. image:: fig/griffon_deg.png

Getting the read data for Griffon, from 1 to 15 concurrent reads.

.. code-block:: r

   IO_INFO = list() # this list gathers everything we export to JSON at the end
   deg_griffon = dfc %>% filter(grepl("^Griffon", Cluster)) %>% filter(Operation == "Read")
   model = lm(BW~Jobs, data = deg_griffon)
   IO_INFO[["griffon"]][["degradation"]][["read"]] = predict(model, data.frame(Jobs=seq(1,15)))
   toJSON(IO_INFO, pretty = TRUE)
273 "read": [66.6308, 64.9327, 63.2346, 61.5365, 59.8384, 58.1403, 56.4423, 54.7442, 53.0461, 51.348, 49.6499, 47.9518, 46.2537, 44.5556, 42.8575]

Same for the write operations.

.. code-block:: r

   deg_griffon = dfc %>% filter(grepl("^Griffon", Cluster)) %>% filter(Operation == "Write") %>% filter(Jobs > 2)
   mean_job_1 = dfc %>% filter(grepl("^Griffon", Cluster)) %>% filter(Operation == "Write") %>% filter(Jobs == 1) %>% summarize(mean = mean(BW))
   model = lm(BW~Jobs, data = deg_griffon)
   IO_INFO[["griffon"]][["degradation"]][["write"]] = c(mean_job_1$mean, predict(model, data.frame(Jobs=seq(2,15))))
   toJSON(IO_INFO, pretty = TRUE)
297 "read": [66.6308, 64.9327, 63.2346, 61.5365, 59.8384, 58.1403, 56.4423, 54.7442, 53.0461, 51.348, 49.6499, 47.9518, 46.2537, 44.5556, 42.8575],
298 "write": [49.4576, 26.5981, 27.7486, 28.8991, 30.0495, 31.2, 32.3505, 33.501, 34.6515, 35.8019, 36.9524, 38.1029, 39.2534, 40.4038, 41.5543]

Modeling read/write bandwidth variability
:::::::::::::::::::::::::::::::::::::::::

Fig. 5 in the paper presents the noise of the read/write operations on
the Griffon SATA disk.

The paper uses a regular histogram to illustrate the distribution of the
effective bandwidth. However, in this tutorial, we use dhist
(`https://rdrr.io/github/dlebauer/pecan-priors/man/dhist.html <https://rdrr.io/github/dlebauer/pecan-priors/man/dhist.html>`_) to get
more precise information over the highly dense areas around the mean.

First, we present the histogram for the read operations.

.. code-block:: r

   griffon_read = df %>% filter(grepl("^Griffon", Cluster)) %>% filter(Operation == "Read") %>% select(Bwi)
   dhist(1/griffon_read$Bwi)

.. image:: fig/griffon_read_dhist.png

Saving it to be exported later in JSON format.

.. code-block:: r

   griffon_read_dhist = dhist(1/griffon_read$Bwi, plot=FALSE)
   IO_INFO[["griffon"]][["noise"]][["read"]] = c(breaks=list(griffon_read_dhist$xbr), heights=list(unclass(griffon_read_dhist$heights)))
   IO_INFO[["griffon"]][["read_bw"]] = mean(1/griffon_read$Bwi)
   toJSON(IO_INFO, pretty = TRUE)

::

   Warning message:
   In hist.default(x, breaks = cut.pt, plot = FALSE, probability = TRUE) :
     argument 'probability' is not made use of

   {
     "griffon": {
       "degradation": {
         "read": [66.6308, 64.9327, 63.2346, 61.5365, 59.8384, 58.1403, 56.4423, 54.7442, 53.0461, 51.348, 49.6499, 47.9518, 46.2537, 44.5556, 42.8575],
         "write": [49.4576, 26.5981, 27.7486, 28.8991, 30.0495, 31.2, 32.3505, 33.501, 34.6515, 35.8019, 36.9524, 38.1029, 39.2534, 40.4038, 41.5543]
       },
       "noise": {
         "read": {
           "breaks": [39.257, 51.3413, 60.2069, 66.8815, 71.315, 74.2973, 80.8883, 95.1944, 109.6767, 125.0231, 140.3519, 155.6807, 171.0094, 186.25],
           "heights": [15.3091, 41.4578, 73.6826, 139.5982, 235.125, 75.3357, 4.1241, 3.3834, 0, 0.0652, 0.0652, 0.0652, 0.3937]
         }
       },
       "read_bw": [68.5425]
     }
   }

The same analysis for the write operations.

.. code-block:: r

   griffon_write = df %>% filter(grepl("^Griffon", Cluster)) %>% filter(Operation == "Write") %>% select(Bwi)
   dhist(1/griffon_write$Bwi)

.. image:: fig/griffon_write_dhist.png

Saving it to be exported later.

.. code-block:: r

   griffon_write_dhist = dhist(1/griffon_write$Bwi, plot=FALSE)
   IO_INFO[["griffon"]][["noise"]][["write"]] = c(breaks=list(griffon_write_dhist$xbr), heights=list(unclass(griffon_write_dhist$heights)))
   IO_INFO[["griffon"]][["write_bw"]] = mean(1/griffon_write$Bwi)
   toJSON(IO_INFO, pretty = TRUE)

::

   Warning message:
   In hist.default(x, breaks = cut.pt, plot = FALSE, probability = TRUE) :
     argument 'probability' is not made use of

   {
     "griffon": {
       "degradation": {
         "read": [66.6308, 64.9327, 63.2346, 61.5365, 59.8384, 58.1403, 56.4423, 54.7442, 53.0461, 51.348, 49.6499, 47.9518, 46.2537, 44.5556, 42.8575],
         "write": [49.4576, 26.5981, 27.7486, 28.8991, 30.0495, 31.2, 32.3505, 33.501, 34.6515, 35.8019, 36.9524, 38.1029, 39.2534, 40.4038, 41.5543]
       },
       "noise": {
         "read": {
           "breaks": [39.257, 51.3413, 60.2069, 66.8815, 71.315, 74.2973, 80.8883, 95.1944, 109.6767, 125.0231, 140.3519, 155.6807, 171.0094, 186.25],
           "heights": [15.3091, 41.4578, 73.6826, 139.5982, 235.125, 75.3357, 4.1241, 3.3834, 0, 0.0652, 0.0652, 0.0652, 0.3937]
         },
         "write": {
           "breaks": [5.2604, 21.0831, 31.4773, 39.7107, 45.5157, 50.6755, 54.4726, 59.7212, 67.8983, 81.2193, 95.6333, 111.5864, 127.8409, 144.3015],
           "heights": [1.7064, 22.6168, 38.613, 70.8008, 84.4486, 128.5118, 82.3692, 39.1431, 9.2256, 5.6195, 1.379, 0.6429, 0.1549]
         }
       },
       "read_bw": [68.5425],
       "write_bw": [50.6045]
     }
   }

Edel (SSD)
^^^^^^^^^^

This section presents exactly the same analysis for the Edel SSDs.

Modeling resource sharing w/ concurrent access
::::::::::::::::::::::::::::::::::::::::::::::

Getting the read data for Edel, from 1 to 15 concurrent operations.

.. code-block:: r

   deg_edel = dfc %>% filter(grepl("^Edel", Cluster)) %>% filter(Operation == "Read")
   model = loess(BW~Jobs, data = deg_edel)
   IO_INFO[["edel"]][["degradation"]][["read"]] = predict(model, data.frame(Jobs=seq(1,15)))
   toJSON(IO_INFO, pretty = TRUE)
429 "read": [66.6308, 64.9327, 63.2346, 61.5365, 59.8384, 58.1403, 56.4423, 54.7442, 53.0461, 51.348, 49.6499, 47.9518, 46.2537, 44.5556, 42.8575],
430 "write": [49.4576, 26.5981, 27.7486, 28.8991, 30.0495, 31.2, 32.3505, 33.501, 34.6515, 35.8019, 36.9524, 38.1029, 39.2534, 40.4038, 41.5543]
434 "breaks": [39.257, 51.3413, 60.2069, 66.8815, 71.315, 74.2973, 80.8883, 95.1944, 109.6767, 125.0231, 140.3519, 155.6807, 171.0094, 186.25],
435 "heights": [15.3091, 41.4578, 73.6826, 139.5982, 235.125, 75.3357, 4.1241, 3.3834, 0, 0.0652, 0.0652, 0.0652, 0.3937]
438 "breaks": [5.2604, 21.0831, 31.4773, 39.7107, 45.5157, 50.6755, 54.4726, 59.7212, 67.8983, 81.2193, 95.6333, 111.5864, 127.8409, 144.3015],
439 "heights": [1.7064, 22.6168, 38.613, 70.8008, 84.4486, 128.5118, 82.3692, 39.1431, 9.2256, 5.6195, 1.379, 0.6429, 0.1549]
442 "read_bw": [68.5425],
443 "write_bw": [50.6045]
447 "read": [150.5119, 167.4377, 182.2945, 195.1004, 205.8671, 214.1301, 220.411, 224.6343, 227.7141, 230.6843, 233.0923, 235.2027, 236.8369, 238.0249, 238.7515]

Same for the write operations.

.. code-block:: r

   deg_edel = dfc %>% filter(grepl("^Edel", Cluster)) %>% filter(Operation == "Write") %>% filter(Jobs > 2)
   mean_job_1 = dfc %>% filter(grepl("^Edel", Cluster)) %>% filter(Operation == "Write") %>% filter(Jobs == 1) %>% summarize(mean = mean(BW))
   model = lm(BW~Jobs, data = deg_edel)
   IO_INFO[["edel"]][["degradation"]][["write"]] = c(mean_job_1$mean, predict(model, data.frame(Jobs=seq(2,15))))
   toJSON(IO_INFO, pretty = TRUE)
471 "read": [66.6308, 64.9327, 63.2346, 61.5365, 59.8384, 58.1403, 56.4423, 54.7442, 53.0461, 51.348, 49.6499, 47.9518, 46.2537, 44.5556, 42.8575],
472 "write": [49.4576, 26.5981, 27.7486, 28.8991, 30.0495, 31.2, 32.3505, 33.501, 34.6515, 35.8019, 36.9524, 38.1029, 39.2534, 40.4038, 41.5543]
476 "breaks": [39.257, 51.3413, 60.2069, 66.8815, 71.315, 74.2973, 80.8883, 95.1944, 109.6767, 125.0231, 140.3519, 155.6807, 171.0094, 186.25],
477 "heights": [15.3091, 41.4578, 73.6826, 139.5982, 235.125, 75.3357, 4.1241, 3.3834, 0, 0.0652, 0.0652, 0.0652, 0.3937]
480 "breaks": [5.2604, 21.0831, 31.4773, 39.7107, 45.5157, 50.6755, 54.4726, 59.7212, 67.8983, 81.2193, 95.6333, 111.5864, 127.8409, 144.3015],
481 "heights": [1.7064, 22.6168, 38.613, 70.8008, 84.4486, 128.5118, 82.3692, 39.1431, 9.2256, 5.6195, 1.379, 0.6429, 0.1549]
484 "read_bw": [68.5425],
485 "write_bw": [50.6045]
489 "read": [150.5119, 167.4377, 182.2945, 195.1004, 205.8671, 214.1301, 220.411, 224.6343, 227.7141, 230.6843, 233.0923, 235.2027, 236.8369, 238.0249, 238.7515],
490 "write": [132.2771, 170.174, 170.137, 170.1, 170.063, 170.026, 169.9889, 169.9519, 169.9149, 169.8779, 169.8408, 169.8038, 169.7668, 169.7298, 169.6927]

Modeling read/write bandwidth variability
:::::::::::::::::::::::::::::::::::::::::

First, we present the histogram for the read operations.

.. code-block:: r

   edel_read = df %>% filter(grepl("^Edel", Cluster)) %>% filter(Operation == "Read") %>% select(Bwi)
   dhist(1/edel_read$Bwi)

.. image:: fig/edel_read_dhist.png

Saving it to be exported later in JSON format.

.. code-block:: r

   edel_read_dhist = dhist(1/edel_read$Bwi, plot=FALSE)
   IO_INFO[["edel"]][["noise"]][["read"]] = c(breaks=list(edel_read_dhist$xbr), heights=list(unclass(edel_read_dhist$heights)))
   IO_INFO[["edel"]][["read_bw"]] = mean(1/edel_read$Bwi)
   toJSON(IO_INFO, pretty = TRUE)

::

   Warning message:
   In hist.default(x, breaks = cut.pt, plot = FALSE, probability = TRUE) :
     argument 'probability' is not made use of

   {
     "griffon": {
       "degradation": {
         "read": [66.6308, 64.9327, 63.2346, 61.5365, 59.8384, 58.1403, 56.4423, 54.7442, 53.0461, 51.348, 49.6499, 47.9518, 46.2537, 44.5556, 42.8575],
         "write": [49.4576, 26.5981, 27.7486, 28.8991, 30.0495, 31.2, 32.3505, 33.501, 34.6515, 35.8019, 36.9524, 38.1029, 39.2534, 40.4038, 41.5543]
       },
       "noise": {
         "read": {
           "breaks": [39.257, 51.3413, 60.2069, 66.8815, 71.315, 74.2973, 80.8883, 95.1944, 109.6767, 125.0231, 140.3519, 155.6807, 171.0094, 186.25],
           "heights": [15.3091, 41.4578, 73.6826, 139.5982, 235.125, 75.3357, 4.1241, 3.3834, 0, 0.0652, 0.0652, 0.0652, 0.3937]
         },
         "write": {
           "breaks": [5.2604, 21.0831, 31.4773, 39.7107, 45.5157, 50.6755, 54.4726, 59.7212, 67.8983, 81.2193, 95.6333, 111.5864, 127.8409, 144.3015],
           "heights": [1.7064, 22.6168, 38.613, 70.8008, 84.4486, 128.5118, 82.3692, 39.1431, 9.2256, 5.6195, 1.379, 0.6429, 0.1549]
         }
       },
       "read_bw": [68.5425],
       "write_bw": [50.6045]
     },
     "edel": {
       "degradation": {
         "read": [150.5119, 167.4377, 182.2945, 195.1004, 205.8671, 214.1301, 220.411, 224.6343, 227.7141, 230.6843, 233.0923, 235.2027, 236.8369, 238.0249, 238.7515],
         "write": [132.2771, 170.174, 170.137, 170.1, 170.063, 170.026, 169.9889, 169.9519, 169.9149, 169.8779, 169.8408, 169.8038, 169.7668, 169.7298, 169.6927]
       },
       "noise": {
         "read": {
           "breaks": [104.1667, 112.3335, 120.5003, 128.6671, 136.8222, 144.8831, 149.6239, 151.2937, 154.0445, 156.3837, 162.3555, 170.3105, 178.3243],
           "heights": [0.1224, 0.1224, 0.1224, 0.2452, 1.2406, 61.6128, 331.2201, 167.6488, 212.1086, 31.3996, 2.3884, 1.747]
         }
       },
       "read_bw": [152.7139]
     }
   }

The same analysis for the write operations.

.. code-block:: r

   edel_write = df %>% filter(grepl("^Edel", Cluster)) %>% filter(Operation == "Write") %>% select(Bwi)
   dhist(1/edel_write$Bwi)

.. image:: fig/edel_write_dhist.png

Saving it to be exported later.

.. code-block:: r

   edel_write_dhist = dhist(1/edel_write$Bwi, plot=FALSE)
   IO_INFO[["edel"]][["noise"]][["write"]] = c(breaks=list(edel_write_dhist$xbr), heights=list(unclass(edel_write_dhist$heights)))
   IO_INFO[["edel"]][["write_bw"]] = mean(1/edel_write$Bwi)
   toJSON(IO_INFO, pretty = TRUE)

::

   Warning message:
   In hist.default(x, breaks = cut.pt, plot = FALSE, probability = TRUE) :
     argument 'probability' is not made use of

   {
     "griffon": {
       "degradation": {
         "read": [66.6308, 64.9327, 63.2346, 61.5365, 59.8384, 58.1403, 56.4423, 54.7442, 53.0461, 51.348, 49.6499, 47.9518, 46.2537, 44.5556, 42.8575],
         "write": [49.4576, 26.5981, 27.7486, 28.8991, 30.0495, 31.2, 32.3505, 33.501, 34.6515, 35.8019, 36.9524, 38.1029, 39.2534, 40.4038, 41.5543]
       },
       "noise": {
         "read": {
           "breaks": [39.257, 51.3413, 60.2069, 66.8815, 71.315, 74.2973, 80.8883, 95.1944, 109.6767, 125.0231, 140.3519, 155.6807, 171.0094, 186.25],
           "heights": [15.3091, 41.4578, 73.6826, 139.5982, 235.125, 75.3357, 4.1241, 3.3834, 0, 0.0652, 0.0652, 0.0652, 0.3937]
         },
         "write": {
           "breaks": [5.2604, 21.0831, 31.4773, 39.7107, 45.5157, 50.6755, 54.4726, 59.7212, 67.8983, 81.2193, 95.6333, 111.5864, 127.8409, 144.3015],
           "heights": [1.7064, 22.6168, 38.613, 70.8008, 84.4486, 128.5118, 82.3692, 39.1431, 9.2256, 5.6195, 1.379, 0.6429, 0.1549]
         }
       },
       "read_bw": [68.5425],
       "write_bw": [50.6045]
     },
     "edel": {
       "degradation": {
         "read": [150.5119, 167.4377, 182.2945, 195.1004, 205.8671, 214.1301, 220.411, 224.6343, 227.7141, 230.6843, 233.0923, 235.2027, 236.8369, 238.0249, 238.7515],
         "write": [132.2771, 170.174, 170.137, 170.1, 170.063, 170.026, 169.9889, 169.9519, 169.9149, 169.8779, 169.8408, 169.8038, 169.7668, 169.7298, 169.6927]
       },
       "noise": {
         "read": {
           "breaks": [104.1667, 112.3335, 120.5003, 128.6671, 136.8222, 144.8831, 149.6239, 151.2937, 154.0445, 156.3837, 162.3555, 170.3105, 178.3243],
           "heights": [0.1224, 0.1224, 0.1224, 0.2452, 1.2406, 61.6128, 331.2201, 167.6488, 212.1086, 31.3996, 2.3884, 1.747]
         },
         "write": {
           "breaks": [70.9593, 79.9956, 89.0654, 98.085, 107.088, 115.9405, 123.5061, 127.893, 131.083, 133.6696, 135.7352, 139.5932, 147.4736],
           "heights": [0.2213, 0, 0.3326, 0.4443, 1.4685, 11.8959, 63.869, 110.286, 149.9741, 202.887, 80.8298, 9.0298]
         }
       },
       "read_bw": [152.7139],
       "write_bw": [131.7152]
     }
   }

Finally, let's save it to a file to be opened by our simulator.

.. code-block:: r

   json = toJSON(IO_INFO, pretty = TRUE)
   cat(json, file="IO_noise.json")

Injecting this data in SimGrid
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To mimic this behavior in SimGrid, we use two features of the platform
description: the non-linear sharing policy and the bandwidth factors. For
more details, please see the source code in ``tuto_disk.cpp``.
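
As an illustration, the ``IO_noise.json`` file produced above could be
loaded as follows. This is a minimal sketch assuming the
`nlohmann/json <https://github.com/nlohmann/json>`_ library; the
``DiskModel`` structure and the ``load_disk_model()`` helper are
hypothetical names introduced for this example, not necessarily how
``tuto_disk.cpp`` proceeds.

.. code-block:: cpp

   // Hypothetical loader for the IO_noise.json file generated by the R analysis
   #include <nlohmann/json.hpp>
   #include <fstream>
   #include <string>
   #include <vector>

   struct DiskModel {
     std::vector<double> degradation_read;  // aggregated bandwidth for 1..15 concurrent reads (MiB/s)
     std::vector<double> degradation_write; // same for writes
     double read_bw;                        // mean read bandwidth (MiB/s)
     double write_bw;                       // mean write bandwidth (MiB/s)
   };

   static DiskModel load_disk_model(const std::string& path, const std::string& disk)
   {
     std::ifstream input(path);
     const nlohmann::json j = nlohmann::json::parse(input);
     DiskModel model;
     model.degradation_read  = j.at(disk).at("degradation").at("read").get<std::vector<double>>();
     model.degradation_write = j.at(disk).at("degradation").at("write").get<std::vector<double>>();
     // jsonlite::toJSON() wraps scalars in one-element arrays, hence the [0]
     model.read_bw  = j.at(disk).at("read_bw")[0].get<double>();
     model.write_bw = j.at(disk).at("write_bw")[0].get<double>();
     return model;
   }

   // Usage: const DiskModel griffon = load_disk_model("IO_noise.json", "griffon");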

Modeling resource sharing w/ concurrent access
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``set_sharing_policy`` method allows the user to set a callback to
dynamically change the disk capacity. The callback is called each time
SimGrid shares the disk between a set of I/O operations.

The callback has access to the number of activities sharing the
resource and to its current capacity. It must return the new capacity
of the resource.

.. code-block:: cpp

   static double disk_dynamic_sharing(double capacity, int n)
   {
     return capacity; // useless callback: always keep the full capacity
   }

   auto* disk = host->create_disk("dump", 1e6, 1e6);
   disk->set_sharing_policy(sg4::Disk::Operation::READ, sg4::Disk::SharingPolicy::NONLINEAR,
                            &disk_dynamic_sharing);
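
For illustration, a non-trivial callback could inject the Griffon read
degradation extracted earlier, returning the measured aggregated
bandwidth for ``n`` concurrent reads as the new capacity. This is only a
sketch under our own assumptions (values hard-coded from
``IO_noise.json``, clamping beyond 15 operations, MiB-to-bytes
conversion); it is not the exact code of ``tuto_disk.cpp``.

.. code-block:: cpp

   #include <simgrid/s4u.hpp>
   #include <algorithm>
   #include <vector>
   namespace sg4 = simgrid::s4u;

   // Aggregated read bandwidth (MiB/s) measured on Griffon for 1 to 15
   // concurrent reads ("griffon/degradation/read" in IO_noise.json)
   static const std::vector<double> griffon_read_deg = {
       66.6308, 64.9327, 63.2346, 61.5365, 59.8384, 58.1403, 56.4423, 54.7442,
       53.0461, 51.348,  49.6499, 47.9518, 46.2537, 44.5556, 42.8575};

   // Return the total read capacity shared by the n concurrent operations,
   // clamping at the last measured point when n > 15
   static double griffon_read_sharing(double /*capacity*/, int n)
   {
     const int idx = std::min(n, static_cast<int>(griffon_read_deg.size())) - 1;
     return griffon_read_deg[idx] * 1024 * 1024; // MiB/s expressed in bytes/s
   }

   // Given a disk created as above:
   disk->set_sharing_policy(sg4::Disk::Operation::READ, sg4::Disk::SharingPolicy::NONLINEAR,
                            &griffon_read_sharing);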

Modeling read/write bandwidth variability
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The noise in the I/O operations can be obtained by applying a factor to
the I/O bandwidth of the disk. This factor is applied when we update
the remaining amount of bytes to be transferred, increasing or
decreasing the effective disk bandwidth.

The ``set_factor_cb`` method allows the user to set a callback to
dynamically change the factor to be applied to each I/O operation.
The callback has access to the size of the operation and to its type
(read or write). It must return a multiplicative factor (e.g. 1.0 for
doing nothing).

.. code-block:: cpp

   static double disk_variability(sg_size_t size, sg4::Io::OpType op)
   {
     return 1.0; // useless callback: no variability at all
   }

   auto* disk = host->create_disk("dump", 1e6, 1e6);
   disk->set_factor_cb(&disk_variability);
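
For illustration, the sketch below draws an effective bandwidth from the
non-uniform histogram extracted for the Griffon reads and converts it
into a factor relative to the mean bandwidth. Sampling with
``std::piecewise_constant_distribution`` is a choice made for this
sketch, not necessarily what ``tuto_disk.cpp`` does.

.. code-block:: cpp

   #include <simgrid/s4u.hpp>
   #include <random>
   #include <vector>
   namespace sg4 = simgrid::s4u;

   // Non-uniform histogram of the effective Griffon read bandwidth (MiB/s),
   // taken from "griffon/noise/read" in IO_noise.json
   static const std::vector<double> breaks  = {39.257, 51.3413, 60.2069, 66.8815, 71.315,
                                               74.2973, 80.8883, 95.1944, 109.6767, 125.0231,
                                               140.3519, 155.6807, 171.0094, 186.25};
   static const std::vector<double> heights = {15.3091, 41.4578, 73.6826, 139.5982, 235.125,
                                               75.3357, 4.1241, 3.3834, 0, 0.0652,
                                               0.0652, 0.0652, 0.3937};
   static const double mean_read_bw = 68.5425; // "griffon/read_bw"

   static double griffon_read_variability(sg_size_t /*size*/, sg4::Io::OpType op)
   {
     if (op != sg4::Io::OpType::READ)
       return 1.0; // leave writes unchanged in this sketch

     // piecewise_constant_distribution expects one weight per interval,
     // proportional to its probability mass: height * width
     static const std::vector<double> weights = [] {
       std::vector<double> w;
       for (size_t i = 0; i + 1 < breaks.size(); ++i)
         w.push_back(heights[i] * (breaks[i + 1] - breaks[i]));
       return w;
     }();
     static std::mt19937 gen(42);
     static std::piecewise_constant_distribution<double> dist(breaks.begin(), breaks.end(),
                                                              weights.begin());
     return dist(gen) / mean_read_bw; // sampled bandwidth relative to the mean
   }

   // Given a disk created as above:
   disk->set_factor_cb(&griffon_read_variability);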

Running our simulation
^^^^^^^^^^^^^^^^^^^^^^

The binary was compiled in the provided docker container.

.. code-block:: sh

   ./tuto_disk > ./simgrid_disk.csv

Analyzing the SimGrid results
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The figure below presents the results obtained with SimGrid.

The experiment performs I/O operations, varying the number of
concurrent operations from 1 to 15. We run only 20 simulations for
each configuration, but we can see that the graphs are quite similar
to the ones obtained on the real platform.

.. code-block:: r

   sg_df = read.csv("./simgrid_disk.csv")
   sg_df = sg_df %>% group_by(disk, op, flows) %>%
     mutate(bw=((size*flows)/elapsed)/10^6, method=if_else(disk=="edel" & op=="read", "loess", "lm"))
   sg_dfd = sg_df %>% filter(flows==1 & op=="write") %>% group_by(disk, op, flows) %>%
     summarize(mean = mean(bw), sd = sd(bw), se=sd/sqrt(n()))

   sg_df[sg_df$op=="write" & sg_df$flows==1,]$method=""

   ggplot(data=sg_df, aes(x=flows, y=bw, color=op)) + theme_bw() +
     geom_point(alpha=.3) +
     geom_smooth(data=sg_df[sg_df$method=="loess",], color="black", method=loess, se=TRUE, fullrange=T) +
     geom_smooth(data=sg_df[sg_df$method=="lm",], color="black", method=lm, se=TRUE) +
     geom_errorbar(data=sg_dfd, aes(x=flows, y=mean, ymin=mean-2*se, ymax=mean+2*se), color="black", width=.6) +
     facet_wrap(disk~op, ncol=2, scale="free_y") +
     xlab("Number of concurrent operations") + ylab("Aggregated Bandwidth (MiB/s)") +
     guides(color=FALSE) + xlim(0,NA) + ylim(0,NA)

.. image:: fig/simgrid_results.png

Note: the variability of the griffon read operations seems to decrease
when we have more concurrent operations. This is a particularity of the
griffon read speed profile and of the way the elapsed time is computed.

- Each point represents the time to perform the N I/O operations.

- Griffon's read speed decreases with the number of concurrent
  operations.

With 15 read operations:

- At the beginning, every read gets the same share of the bandwidth:
  about 42.9 MiB/s of aggregated bandwidth, i.e. roughly 2.9 MiB/s per
  operation.

- We sample the noise of the I/O operations, so some reads will be
  faster than others (e.g. factor > 1).

When the first read operation finishes:

- We recalculate the bandwidth sharing, now considering that 14 read
  operations are still active. This increases the bandwidth of each
  operation (the aggregated bandwidth becomes about 44.6 MiB/s, the
  value of the degradation model for 14 jobs).

- The remaining "slower" activities are consequently sped up.

This behavior keeps happening until the end of the 15 operations: at
each step, we speed up the slowest operations a little and,
consequently, decrease the variability we observe.