1 #+TITLE: Modeling I/O: the realistic way
2 #+AUTHOR: The SimGrid Team
3 #+OPTIONS: toc:nil title:nil author:nil num:nil
5 * Modeling I/O: the realistic way
12 This tutorial presents how to perform faithful IO experiments in
13 SimGrid. It is based on the paper "Adding Storage Simulation
14 Capacities to the SimGridToolkit: Concepts, Models, and API".
16 The paper presents a series of experiments to analyze the performance
17 of IO operations (read/write) on different kinds of disks (SATA, SAS,
18 SSD). In this tutorial, we present a detailed example of how to
19 extract experimental data to simulate: i) performance degradation
20 with concurrent operations (Fig. 8 in the paper) and ii) variability
21 in IO operations (Fig. 5 to 7).
23 - Link for paper: https://hal.inria.fr/hal-01197128
24 - Link for data: https://figshare.com/articles/dataset/Companion_of_the_SimGrid_storage_modeling_article/1175156
27 - The purpose of this document is to illustrate how we can
28 extract data from experiments and inject on SimGrid. However, the
29 data shown on this page may *not* reflect the reality.
30 - You must run similar experiments on your hardware to get realistic
31 data for your context.
32 - SimGrid has been in active development since the paper release in
33 2015, thus the XML description used in the paper may have evolved
34 while MSG was superseeded by S4U since then.
36 *** Running this tutorial
38 A Dockerfile is available in =docs/source/tuto_disk=. It allows you to
39 re-run this tutorial. For that, build the image and run the container:
40 - =docker build -t tuto_disk .=
41 - =docker run -it tuto_disk=
43 ** Analyzing the experimental data
44 We start by analyzing and extracting the real data available.
47 We use a special method to create non-uniform histograms to represent
48 the noise in IO operations.
50 Unable to install the library properly, I copied the important methods
53 Copied from: https://rdrr.io/github/dlebauer/pecan-priors/src/R/plots.R
55 #+begin_src R :results output :session *R* :exports none
56 #' Variable-width (dagonally cut) histogram
59 #' When constructing a histogram, it is common to make all bars the same width.
60 #' One could also choose to make them all have the same area.
61 #' These two options have complementary strengths and weaknesses; the equal-width histogram oversmooths in regions of high density, and is poor at identifying sharp peaks; the equal-area histogram oversmooths in regions of low density, and so does not identify outliers.
62 #' We describe a compromise approach which avoids both of these defects. We regard the histogram as an exploratory device, rather than as an estimate of a density.
63 #' @title Diagonally Cut Histogram
64 #' @param x is a numeric vector (the data)
65 #' @param a is the scaling factor, default is 5 * IQR
66 #' @param nbins is the number of bins, default is assigned by the Stuges method
67 #' @param rx is the range used for the left of the left-most bin to the right of the right-most bin
68 #' @param eps used to set artificial bound on min width / max height of bins as described in Denby and Mallows (2009) on page 24.
69 #' @param xlab is label for the x axis
70 #' @param plot = TRUE produces the plot, FALSE returns the heights, breaks and counts
71 #' @param lab.spikes = TRUE labels the \% of data in the spikes
72 #' @return list with two elements, heights of length n and breaks of length n+1 indicating the heights and break points of the histogram bars.
73 #' @author Lorraine Denby, Colin Mallows
74 #' @references Lorraine Denby, Colin Mallows. Journal of Computational and Graphical Statistics. March 1, 2009, 18(1): 21-31. doi:10.1198/jcgs.2009.0002.
75 dhist<-function(x, a=5*iqr(x),
76 nbins=nclass.Sturges(x), rx = range(x,na.rm = TRUE),
77 eps=.15, xlab = "x", plot = TRUE,lab.spikes = TRUE)
80 if(is.character(nbins))
81 nbins <- switch(casefold(nbins),
82 sturges = nclass.Sturges(x),
84 scott = nclass.scott(x),
85 stop("Nclass method not recognized"))
86 else if(is.function(nbins))
89 x <- sort(x[!is.na(x)])
91 a <- diff(range(x))/100000000
92 if(a != 0 & a != Inf) {
94 h <- (rx[2] + a - rx[1])/nbins
95 ybr <- rx[1] + h * (0:nbins)
96 yupper <- x + (a * (1:n))/n
97 # upper and lower corners in the ecdf
98 ylower <- yupper - a/n
100 cmtx <- cbind(cut(yupper, breaks = ybr), cut(yupper, breaks =
101 ybr, left.include = TRUE), cut(ylower, breaks = ybr),
102 cut(ylower, breaks = ybr, left.include = TRUE))
103 cmtx[1, 3] <- cmtx[1, 4] <- 1
104 # to replace NAs when default r is used
105 cmtx[n, 1] <- cmtx[n, 2] <- nbins
107 #checksum <- apply(cmtx, 1, sum) %% 4
108 checksum <- (cmtx[, 1] + cmtx[, 2] + cmtx[, 3] + cmtx[, 4]) %%
110 # will be 2 for obs. that straddle two bins
111 straddlers <- (1:n)[checksum == 2]
112 # to allow for zero counts
113 if(length(straddlers) > 0) {
114 counts <- table(c(1:nbins, cmtx[ - straddlers, 1]))
116 counts <- table(c(1:nbins, cmtx[, 1]))
120 if(length(straddlers) > 0) {
121 for(i in straddlers) {
123 theta <- ((yupper[i] - ybr[binno]) * n)/a
124 counts[binno - 1] <- counts[binno - 1] + (
126 counts[binno] <- counts[binno] + theta
130 xbr[-1] <- ybr[-1] - (a * cumsum(counts))/n
131 spike<-eps*diff(rx)/nbins
132 flag.vec<-c(diff(xbr)<spike,F)
133 if ( sum(abs(diff(xbr))<=spike) >1) {
136 diff.xbr<-abs(diff(xbr))
137 amt.spike<-diff.xbr[length(diff.xbr)]
138 for (i in rev(2:length(diff.xbr))) {
139 if (diff.xbr[i-1]<=spike&diff.xbr[i]<=spike&
140 !is.na(diff.xbr[i])) {
141 amt.spike<-amt.spike+diff.xbr[i-1]
142 counts.new[i-1]<-counts.new[i-1]+counts.new[i]
147 else amt.spike<-diff.xbr[i-1]
149 flag.vec<-flag.vec[!is.na(xbr.new)]
150 flag.vec<-flag.vec[-length(flag.vec)]
151 counts<-counts.new[!is.na(counts.new)]
152 xbr<-xbr.new[!is.na(xbr.new)]
155 else flag.vec<-flag.vec[-length(flag.vec)]
156 widths <- abs(diff(xbr))
157 ## N.B. argument "widths" in barplot must be xbr
158 heights <- counts/widths
160 bin.size <- length(x)/nbins
161 cut.pt <- unique(c(min(x) - abs(min(x))/1000,
162 approx(seq(length(x)), x, (1:(nbins - 1)) * bin.size, rule = 2)$y, max(x)))
163 aa <- hist(x, breaks = cut.pt, plot = FALSE, probability = TRUE)
169 q75<-quantile(heights,.75)
170 if (sum(flag.vec)!=0) {
171 amt<-max(heights[!flag.vec])
172 ylim.height<-amt*amt.height
173 ind.h<-flag.vec&heights> ylim.height
174 flag.vec[heights<ylim.height*(amt.height-1)/amt.height]<-F
175 heights[ind.h] <- ylim.height
180 barplot(heights, abs(diff(xbr)), space = 0, density = -1, xlab =
181 xlab, plot = TRUE, xaxt = "n",yaxt='n')
183 axis(1, at = at - xbr[1], labels = as.character(at))
185 if (sum(flag.vec)>=1) {
187 for ( i in seq(length(xbr)-1)) {
190 if (xbr[i]-xbr[1]<end.y) amt.txt<-1
194 end.y<-xbr[i]-xbr[1]+3*par('cxy')[1]
197 txt<-paste(' ',format(round(counts[i]/
198 sum(counts)*100)),'%',sep='')
200 text(xbr[i+1]-xbr[1],ylim.height-par('cxy')[2]*(amt.txt-1),txt, adj=0)
203 else print('no spikes or more than one spike')
205 invisible(list(heights = heights, xbr = xbr))
208 return(list(heights = heights, xbr = xbr,counts=counts))
212 #' Calculate interquartile range
214 #' Calculates the 25th and 75th quantiles given a vector x; used in function \link{dhist}.
215 #' @title Interquartile range
217 #' @return numeric vector of length 2, with the 25th and 75th quantiles of input vector x.
219 return(diff(quantile(x, c(0.25, 0.75), na.rm = TRUE)))
226 Some initial configurations/list of packages.
228 #+begin_src R :results output :session *R* :exports both
238 This was copied from the =sg_storage_ccgrid15.org= available at the
239 figshare of the paper. Before executing this code, please download and
240 decompress the appropriate file.
242 #+begin_src sh :results output :exports both
243 curl -O -J -L "https://ndownloader.figshare.com/files/1928095"
247 Preparing data for varialiby analysis.
249 #+BEGIN_SRC R :session :results output :export none
251 clean_up <- function (df, infra){
252 names(df) <- c("Hostname","Date","DirectIO","IOengine","IOscheduler","Error","Operation","Jobs","BufferSize","FileSize","Runtime","Bandwidth","BandwidthMin","BandwidthMax","Latency", "LatencyMin", "LatencyMax","IOPS")
253 df=subset(df,Error=="0")
254 df=subset(df,DirectIO=="1")
255 df <- merge(df,infra,by="Hostname")
256 df$Hostname = sapply(strsplit(df$Hostname, "[.]"), "[", 1)
257 df$HostModel = paste(df$Hostname, df$Model, sep=" - ")
258 df$Duration = df$Runtime/1000 # fio outputs runtime in msec, we want to display seconds
259 df$Size = df$FileSize/1024/1024
260 df=subset(df,Duration!=0.000)
261 df$Bwi=df$Duration/df$Size
262 df[df$Operation=="read",]$Operation<- "Read"
263 df[df$Operation=="write",]$Operation<- "Write"
267 grenoble <- read.csv('./bench/grenoble.csv', header=FALSE,sep = ";",
268 stringsAsFactors=FALSE)
269 luxembourg <- read.csv('./bench/luxembourg.csv', header=FALSE,sep = ";", stringsAsFactors=FALSE)
270 nancy <- read.csv('./bench/nancy.csv', header=FALSE,sep = ";", stringsAsFactors=FALSE)
271 all <- rbind(grenoble,nancy, luxembourg)
272 infra <- read.csv('./bench/infra.csv', header=FALSE,sep = ";", stringsAsFactors=FALSE)
273 names(infra) <- c("Hostname","Model","DiskSize")
275 all = clean_up(all, infra)
276 griffon = subset(all,grepl("^griffon", Hostname))
277 griffon$Cluster <-"Griffon (SATA II)"
278 edel = subset(all,grepl("^edel", Hostname))
279 edel$Cluster<-"Edel (SSD)"
281 df = rbind(griffon[griffon$Jobs=="1" & griffon$IOscheduler=="cfq",],
282 edel[edel$Jobs=="1" & edel$IOscheduler=="cfq",])
283 #Get rid off of 64 Gb disks of Edel as they behave differently (used to be "edel-51")
284 df = df[!(grepl("^Edel",df$Cluster) & df$DiskSize=="64 GB"),]
287 Preparing data for concurrent analysis.
288 #+begin_src R :results output :session *R* :exports both
289 dfc = rbind(griffon[griffon$Jobs>1 & griffon$IOscheduler=="cfq",],
290 edel[edel$Jobs>1 & edel$IOscheduler=="cfq",])
291 dfc2 = rbind(griffon[griffon$Jobs==1 & griffon$IOscheduler=="cfq",],
292 edel[edel$Jobs==1 & edel$IOscheduler=="cfq",])
293 dfc = rbind(dfc,dfc2[sample(nrow(dfc2),size=200),])
297 Date = NA, #tmpl$Date,
302 Operation = NA, #tmpl$Operation,
303 Jobs = NA, # #d$nb.of.concurrent.access,
304 BufferSize = NA, #d$bs,
305 FileSize = NA, #d$size,
314 Model = NA, #tmpl$Model,
315 DiskSize = NA, #tmpl$DiskSize,
317 Duration = NA, #d$time,
320 Cluster = NA) #tmpl$Cluster)
322 dd$Size = dd$FileSize/1024/1024
323 dd$Bwi = dd$Duration/dd$Size
326 # Let's get rid of small files!
327 dfc = subset(dfc,Size >= 10)
328 # Let's get rid of 64Gb edel disks
329 dfc = dfc[!(grepl("^Edel",dfc$Cluster) & dfc$DiskSize=="64 GB"),]
331 dfc$TotalSize=dfc$Size * dfc$Jobs
332 dfc$BW = (dfc$TotalSize) / dfc$Duration
333 dfc = dfc[dfc$BW>=20,] # get rid of one point that is typically an outlier and does not make sense
336 dfc[dfc$Cluster=="Edel (SSD)" & dfc$Operation=="Read",]$method="loess"
338 dfc[dfc$Cluster=="Edel (SSD)" & dfc$Operation=="Write" & dfc$Jobs ==1,]$method="lm"
339 dfc[dfc$Cluster=="Edel (SSD)" & dfc$Operation=="Write" & dfc$Jobs ==1,]$method=""
341 dfc[dfc$Cluster=="Griffon (SATA II)" & dfc$Operation=="Write",]$method="lm"
342 dfc[dfc$Cluster=="Griffon (SATA II)" & dfc$Operation=="Write" & dfc$Jobs ==1,]$method=""
344 dfd = dfc[dfc$Operation=="Write" & dfc$Jobs ==1 &
345 (dfc$Cluster %in% c("Griffon (SATA II)", "Edel (SSD)")),]
346 dfd = ddply(dfd,c("Cluster","Operation","Jobs","DiskSize"), summarize,
347 mean = mean(BW), num = length(BW), sd = sd(BW))
349 dfd$ci = 2*dfd$sd/sqrt(dfd$num)
351 dfrange=ddply(dfc,c("Cluster","Operation","DiskSize"), summarize,
353 dfrange=ddply(dfrange,c("Cluster","DiskSize"), mutate,
360 **** Modeling resource sharing w/ concurrent access
362 This figure presents the overall performance of IO operation with
363 concurrent access to the disk. Note that the image is different
364 from the one in the paper. Probably, we need to further clean the
365 available data to obtain exaclty the same results.
367 #+begin_src R :results output graphics :file fig/griffon_deg.png :exports both :width 600 :height 400 :session *R*
368 ggplot(data=dfc,aes(x=Jobs,y=BW, color=Operation)) + theme_bw() +
369 geom_point(alpha=.3) +
370 geom_point(data=dfrange, size=0) +
371 facet_wrap(Cluster~Operation,ncol=2,scale="free_y")+ # ) + #
372 geom_smooth(data=dfc[dfc$method=="loess",], color="black", method=loess,se=TRUE,fullrange=T) +
373 geom_smooth(data=dfc[dfc$method=="lm",], color="black", method=lm,se=TRUE) +
374 geom_point(data=dfd, aes(x=Jobs,y=BW),color="black",shape=21,fill="white") +
375 geom_errorbar(data=dfd, aes(x=Jobs, ymin=BW-ci, ymax=BW+ci),color="black",width=.6) +
376 xlab("Number of concurrent operations") + ylab("Aggregated Bandwidth (MiB/s)") + guides(color=FALSE) + xlim(0,NA) + ylim(0,NA)
381 Getting read data for Griffon from 1 to 15 concurrent reads.
383 #+begin_src R :results output :session *R* :exports both
384 deg_griffon = dfc %>% filter(grepl("^Griffon", Cluster)) %>% filter(Operation == "Read")
385 model = lm(BW~Jobs, data = deg_griffon)
386 IO_INFO[["griffon"]][["degradation"]][["read"]] = predict(model,data.frame(Jobs=seq(1,15)))
388 toJSON(IO_INFO, pretty = TRUE)
394 Same for write operations.
396 #+begin_src R :results output :session *R* :exports both
397 deg_griffon = dfc %>% filter(grepl("^Griffon", Cluster)) %>% filter(Operation == "Write") %>% filter(Jobs > 2)
398 mean_job_1 = dfc %>% filter(grepl("^Griffon", Cluster)) %>% filter(Operation == "Write") %>% filter(Jobs == 1) %>% summarize(mean = mean(BW))
399 model = lm(BW~Jobs, data = deg_griffon)
400 IO_INFO[["griffon"]][["degradation"]][["write"]] = c(mean_job_1$mean, predict(model,data.frame(Jobs=seq(2,15))))
401 toJSON(IO_INFO, pretty = TRUE)
405 **** Modeling read/write bandwidth variability
407 Fig.5 in the paper presents the noise in the read/write operations in
408 the Griffon SATA disk.
410 The paper uses regular histogram to illustrate the distribution of the
411 effective bandwidth. However, in this tutorial, we use dhist
412 (https://rdrr.io/github/dlebauer/pecan-priors/man/dhist.html) to have a
413 more precise information over the highly dense areas around the mean.
416 First, we present the histogram for read operations.
417 #+begin_src R :results output graphics :file fig/griffon_read_dhist.png :exports both :width 600 :height 400 :session *R*
418 griffon_read = df %>% filter(grepl("^Griffon", Cluster)) %>% filter(Operation == "Read") %>% select(Bwi)
419 dhist(1/griffon_read$Bwi)
422 Saving it to be exported in json format.
424 #+begin_src R :results output :session *R* :exports both
425 griffon_read_dhist = dhist(1/griffon_read$Bwi, plot=FALSE)
426 IO_INFO[["griffon"]][["noise"]][["read"]] = c(breaks=list(griffon_read_dhist$xbr), heights=list(unclass(griffon_read_dhist$heights)))
427 IO_INFO[["griffon"]][["read_bw"]] = mean(1/griffon_read$Bwi)
428 toJSON(IO_INFO, pretty = TRUE)
433 Same analysis for write operations.
434 #+begin_src R :results output graphics :file fig/griffon_write_dhist.png :exports both :width 600 :height 400 :session *R*
435 griffon_write = df %>% filter(grepl("^Griffon", Cluster)) %>% filter(Operation == "Write") %>% select(Bwi)
436 dhist(1/griffon_write$Bwi)
439 #+begin_src R :results output :session *R* :exports both
440 griffon_write_dhist = dhist(1/griffon_write$Bwi, plot=FALSE)
441 IO_INFO[["griffon"]][["noise"]][["write"]] = c(breaks=list(griffon_write_dhist$xbr), heights=list(unclass(griffon_write_dhist$heights)))
442 IO_INFO[["griffon"]][["write_bw"]] = mean(1/griffon_write$Bwi)
443 toJSON(IO_INFO, pretty = TRUE)
447 This section presents the exactly same analysis for the Edel SSDs.
449 **** Modeling resource sharing w/ concurrent access
452 Getting read data for Edel from 1 to 15 concurrent operations.
454 #+begin_src R :results output :session *R* :exports both
455 deg_edel = dfc %>% filter(grepl("^Edel", Cluster)) %>% filter(Operation == "Read")
456 model = loess(BW~Jobs, data = deg_edel)
457 IO_INFO[["edel"]][["degradation"]][["read"]] = predict(model,data.frame(Jobs=seq(1,15)))
458 toJSON(IO_INFO, pretty = TRUE)
463 Same for write operations.
465 #+begin_src R :results output :session *R* :exports both
466 deg_edel = dfc %>% filter(grepl("^Edel", Cluster)) %>% filter(Operation == "Write") %>% filter(Jobs > 2)
467 mean_job_1 = dfc %>% filter(grepl("^Edel", Cluster)) %>% filter(Operation == "Write") %>% filter(Jobs == 1) %>% summarize(mean = mean(BW))
468 model = lm(BW~Jobs, data = deg_edel)
469 IO_INFO[["edel"]][["degradation"]][["write"]] = c(mean_job_1$mean, predict(model,data.frame(Jobs=seq(2,15))))
470 toJSON(IO_INFO, pretty = TRUE)
474 **** Modeling read/write bandwidth variability
478 #+begin_src R :results output graphics :file fig/edel_read_dhist.png :exports both :width 600 :height 400 :session *R*
479 edel_read = df %>% filter(grepl("^Edel", Cluster)) %>% filter(Operation == "Read") %>% select(Bwi)
480 dhist(1/edel_read$Bwi)
483 Saving it to be exported in json format.
485 #+begin_src R :results output :session *R* :exports both
486 edel_read_dhist = dhist(1/edel_read$Bwi, plot=FALSE)
487 IO_INFO[["edel"]][["noise"]][["read"]] = c(breaks=list(edel_read_dhist$xbr), heights=list(unclass(edel_read_dhist$heights)))
488 IO_INFO[["edel"]][["read_bw"]] = mean(1/edel_read$Bwi)
489 toJSON(IO_INFO, pretty = TRUE)
493 #+begin_src R :results output graphics :file fig/edel_write_dhist.png :exports both :width 600 :height 400 :session *R*
495 edel_write = df %>% filter(grepl("^Edel", Cluster)) %>% filter(Operation == "Write") %>% select(Bwi)
496 dhist(1/edel_write$Bwi)
499 Saving it to be exported later.
500 #+begin_src R :results output :session *R* :exports both
501 edel_write_dhist = dhist(1/edel_write$Bwi, plot=FALSE)
502 IO_INFO[["edel"]][["noise"]][["write"]] = c(breaks=list(edel_write_dhist$xbr), heights=list(unclass(edel_write_dhist$heights)))
503 IO_INFO[["edel"]][["write_bw"]] = mean(1/edel_write$Bwi)
504 toJSON(IO_INFO, pretty = TRUE)
508 Finally, let's save it to a file to be opened by our simulator.
510 #+begin_src R :results output :session *R* :exports both
511 json = toJSON(IO_INFO, pretty = TRUE)
512 cat(json, file="IO_noise.json")
516 ** Injecting this data in SimGrid
518 To mimic this behavior in SimGrid, we use two features in the platform
519 description: non-linear sharing policy and bandwidth factors. For more
520 details, please see the source code in =tuto_disk.cpp=.
522 *** Modeling resource sharing w/ concurrent access
524 The =set_sharing_policy= method allows the user to set a callback to
525 dynamically change the disk capacity. The callback is called each time
526 SimGrid will share the disk between a set of I/O operations.
528 The callback has access to the number of activities sharing the
529 resource and its current capacity. It must return the new resource's
532 #+begin_src C++ :results output :eval no :exports code
533 static double disk_dynamic_sharing(double capacity, int n)
535 return capacity; //useless callback
538 auto* disk = host->create_disk("dump", 1e6, 1e6);
539 disk->set_sharing_policy(sg4::Disk::Operation::READ, sg4::Disk::SharingPolicy::NONLINEAR, &disk_dynamic_sharing);
543 *** Modeling read/write bandwidth variability
545 The noise in I/O operations can be obtained by applying a factor to
546 the I/O bandwidth of the disk. This factor is applied when we update
547 the remaining amount of bytes to be transferred, increasing or
548 decreasing the effective disk bandwidth.
550 The =set_factor= method allows the user to set a callback to
551 dynamically change the factor to be applied for each I/O operation.
552 The callback has access to size of the operation and its type (read or
553 write). It must return a multiply factor (e.g. 1.0 for doing nothing).
555 #+begin_src C++ :results output :eval no :exports code
556 static double disk_variability(sg_size_t size, sg4::Io::OpType op)
558 return 1.0; //useless callback
561 auto* disk = host->create_disk("dump", 1e6, 1e6);
562 disk->set_factor_cb(&disk_variability);
566 *** Running our simulation
567 The binary was compiled in the provided docker container.
569 #+begin_src sh :results output :exports both
570 ./tuto_disk > ./simgrid_disk.csv
574 ** Analyzing the SimGrid results
576 The figure below presents the results obtained by SimGrid.
578 The experiment performs I/O operations, varying the number of
579 concurrent operations from 1 to 15. We run only 20 simulations for
582 We can see that the graphics are quite similar to the ones obtained in
585 #+begin_src R :results output graphics :file fig/simgrid_results.png :exports both :width 600 :height 400 :session *R*
586 sg_df = read.csv("./simgrid_disk.csv")
587 sg_df = sg_df %>% group_by(disk, op, flows) %>% mutate(bw=((size*flows)/elapsed)/10^6, method=if_else(disk=="edel" & op=="read", "loess", "lm"))
588 sg_dfd = sg_df %>% filter(flows==1 & op=="write") %>% group_by(disk, op, flows) %>% summarize(mean = mean(bw), sd = sd(bw), se=sd/sqrt(n()))
590 sg_df[sg_df$op=="write" & sg_df$flows ==1,]$method=""
592 ggplot(data=sg_df, aes(x=flows, y=bw, color=op)) + theme_bw() +
593 geom_point(alpha=.3) +
594 geom_smooth(data=sg_df[sg_df$method=="loess",], color="black", method=loess,se=TRUE,fullrange=T) +
595 geom_smooth(data=sg_df[sg_df$method=="lm",], color="black", method=lm,se=TRUE) +
596 geom_errorbar(data=sg_dfd, aes(x=flows, y=mean, ymin=mean-2*se, ymax=mean+2*se),color="black",width=.6) +
597 facet_wrap(disk~op,ncol=2,scale="free_y")+ # ) + #
598 xlab("Number of concurrent operations") + ylab("Aggregated Bandwidth (MiB/s)") + guides(color=FALSE) + xlim(0,NA) + ylim(0,NA)
602 Note: The variability in griffon read operation seems to decrease when
603 we have more concurrent operations. This is a particularity of the
604 griffon read speed profile and the elapsed time calculation.
607 - Each point represents the time to perform the N I/O operations.
608 - Griffon read speed decreases with the number of concurrent
611 With 15 read operations:
612 - At the beginning, every read gets the same bandwidth, about
614 - We sample the noise in I/O operations, some will be faster than
615 others (e.g. factor > 1).
617 When the first read operation finish:
618 - We will recalculate the bandwidth sharing, now considering that we
619 have 14 active read operations. This will increase the bandwidth for
620 each operation (about 44MiB/s).
621 - The remaining "slower" activities will be speed up.
623 This behavior keeps happening until the end of the 15 operations,
624 at each step, we speed up a little the slowest operations and
625 consequently, decreasing the variability we see.