
YARN app max attempts in depth: smooth ApplicationMaster restarts for long-running services

Published: 2023/12/13, posted by 生活家

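When developing a long-running service on YARN, fault tolerance needs attention. This article analyzes one parameter that affects smooth ApplicationMaster (AM) restarts and explains how to set it so that the AM can restart smoothly.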

yarn-site.xml has the following parameter:

/**
   * The maximum number of application attempts.
   * It's a global setting for all application masters.
   */
yarn.resourcemanager.am.max-attempts

This is the global limit on the number of AM attempts. When submitting an application to YARN, you can also set a maximum number of attempts for that individual application:

/**
   * Set the number of max attempts of the application to be submitted. WARNING:
   * it should be no larger than the global number of max attempts in the Yarn
   * configuration.
   * @param maxAppAttempts the number of max attempts of the application
   * to be submitted.
   */
  @Public
  @Stable
  public abstract void setMaxAppAttempts(int maxAppAttempts);

When an attempt fails and keepContainersAcrossAppAttempts has been requested, the ResourceManager (RM) decides whether the containers of the previous attempt are retained:

boolean keepContainersAcrossAppAttempts = false;
switch (finalAttemptState) {
  case FINISHED:
  {
    appEvent = new RMAppFinishedAttemptEvent(applicationId,
        appAttempt.getDiagnostics());
  }
  break;
  case KILLED:
  {
    // don't leave the tracking URL pointing to a non-existent AM
    appAttempt.setTrackingUrlToRMAppPage();
    appAttempt.invalidateAMHostAndPort();
    appEvent =
        new RMAppFailedAttemptEvent(applicationId,
            RMAppEventType.ATTEMPT_KILLED,
            "Application killed by user.", false);
  }
  break;
  case FAILED:
  {
    // don't leave the tracking URL pointing to a non-existent AM
    appAttempt.setTrackingUrlToRMAppPage();
    appAttempt.invalidateAMHostAndPort();

    if (appAttempt.submissionContext
      .getKeepContainersAcrossApplicationAttempts()
        && !appAttempt.submissionContext.getUnmanagedAM()) {
      // See if we should retain containers for non-unmanaged applications
      if (!appAttempt.shouldCountTowardsMaxAttemptRetry()) {
        // Premption, hardware failures, NM resync doesn't count towards
        // app-failures and so we should retain containers.
        keepContainersAcrossAppAttempts = true;
      } else if (!appAttempt.maybeLastAttempt) {
        // Not preemption, hardware failures or NM resync.
        // Not last-attempt too - keep containers.
        keepContainersAcrossAppAttempts = true;
      }
    }
    appEvent =
        new RMAppFailedAttemptEvent(applicationId,
          RMAppEventType.ATTEMPT_FAILED, appAttempt.getDiagnostics(),
          keepContainersAcrossAppAttempts);

  }
}
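
The FAILED branch above boils down to a small boolean rule. The following is a minimal sketch of that rule in plain Java, with no Hadoop dependency. The class and parameter names are hypothetical and chosen only for illustration; they are not part of the YARN API.

```java
// Simplified, Hadoop-free model of the FAILED branch shown above.
// All names are illustrative assumptions, not YARN identifiers.
final class KeepContainersDecision {

    /**
     * @param keepFlagSet      the application asked to keep containers across attempts
     * @param unmanagedAm      the AM is an unmanaged AM
     * @param countsAsFailure  the failure counts towards max attempts
     *                         (false for preemption, hardware failure, NM resync)
     * @param maybeLastAttempt the failed attempt was flagged as possibly the last one
     * @return true if the containers of the failed attempt should be retained
     */
    static boolean keepContainers(boolean keepFlagSet, boolean unmanagedAm,
                                  boolean countsAsFailure, boolean maybeLastAttempt) {
        if (!keepFlagSet || unmanagedAm) {
            return false; // retention was not requested, or the AM is unmanaged
        }
        // Keep containers if the failure is not counted, or if another attempt is still allowed.
        return !countsAsFailure || !maybeLastAttempt;
    }

    public static void main(String[] args) {
        // A counted failure on an attempt that is not the last one: containers are kept.
        System.out.println(keepContainers(true, false, true, false));
        // A counted failure on the last attempt: containers are released.
        System.out.println(keepContainers(true, false, true, true));
    }
}
```

Under this model, the only case in which a managed AM that requested retention loses its containers is a counted failure on an attempt flagged as possibly the last, which is why the flag matters for long-running services.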

Note the variable appAttempt.maybeLastAttempt. How does the RM decide whether this attempt is the last one?

private void createNewAttempt() {
  ApplicationAttemptId appAttemptId =
      ApplicationAttemptId.newInstance(applicationId, attempts.size() + 1);
  RMAppAttempt attempt =
      new RMAppAttemptImpl(appAttemptId, rmContext, scheduler, masterService,
        submissionContext, conf,
        // The newly created attempt maybe last attempt if (number of
        // previously failed attempts(which should not include Preempted,
        // hardware error and NM resync) + 1) equal to the max-attempt
        // limit.
        maxAppAttempts == (getNumFailedAppAttempts() + 1), amReq);
  attempts.put(appAttemptId, attempt);
  currentAttempt = attempt;
}

Each time a new attempt is constructed, the expression maxAppAttempts == (getNumFailedAppAttempts() + 1) checks whether the number of attempts that have already failed, plus one, has reached the maxAppAttempts limit.
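
To make the arithmetic concrete, here is a minimal sketch in plain Java that replays the check for a sequence of attempts. It is a simplified model, not RM code: the class and method names are hypothetical, and it keeps creating attempts even after the limit is reached, which the real RM would not do.

```java
// Simplified model of how the "maybe last attempt" flag is derived.
// Names are illustrative assumptions, not YARN identifiers.
final class LastAttemptModel {

    // Same comparison as in createNewAttempt(): failed attempts so far, plus one.
    static boolean maybeLastAttempt(int maxAppAttempts, int numFailedAppAttempts) {
        return maxAppAttempts == numFailedAppAttempts + 1;
    }

    /**
     * @param failureCounts for each earlier failed attempt, whether its failure counts
     *                      towards the limit (false for preemption, hardware error, NM resync)
     * @return the flag each attempt would receive, starting with the first attempt
     */
    static boolean[] flagsForAttempts(int maxAppAttempts, boolean[] failureCounts) {
        boolean[] flags = new boolean[failureCounts.length + 1];
        int countedFailures = 0;
        flags[0] = maybeLastAttempt(maxAppAttempts, countedFailures);
        for (int i = 0; i < failureCounts.length; i++) {
            if (failureCounts[i]) {
                countedFailures++;
            }
            flags[i + 1] = maybeLastAttempt(maxAppAttempts, countedFailures);
        }
        return flags;
    }

    public static void main(String[] args) {
        // maxAppAttempts = 3 and two counted failures: only the third attempt is flagged.
        System.out.println(java.util.Arrays.toString(
            flagsForAttempts(3, new boolean[] {true, true}))); // prints [false, false, true]
    }
}
```

With a limit of 3, a failure that does not count (for example a preemption) leaves the counter unchanged, so the flag is reached one attempt later.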

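maxAppAttempts itself is the minimum of the global and the per-application settings. A per-application value that is not positive, or that exceeds the global value, falls back to the global one: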

int globalMaxAppAttempts = conf.getInt(YarnConfiguration.RM_AM_MAX_ATTEMPTS,
    YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS);
int individualMaxAppAttempts = submissionContext.getMaxAppAttempts();
if (individualMaxAppAttempts <= 0 ||
    individualMaxAppAttempts > globalMaxAppAttempts) {
  this.maxAppAttempts = globalMaxAppAttempts;
  LOG.warn("The specific max attempts: " + individualMaxAppAttempts
      + " for application: " + applicationId.getId()
      + " is invalid, because it is out of the range [1, "
      + globalMaxAppAttempts + "]. Use the global max attempts instead.");
} else {
  this.maxAppAttempts = individualMaxAppAttempts;
}
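
The fallback rule can be checked in isolation. Below is a minimal sketch in plain Java that mirrors the condition above without any Hadoop classes. The class and method names are hypothetical and exist only for this illustration.

```java
// Hadoop-free mirror of the range check above.
// Names are illustrative assumptions, not YARN identifiers.
final class MaxAttemptsResolver {

    /** Returns the effective limit for one application. */
    static int resolve(int globalMaxAppAttempts, int individualMaxAppAttempts) {
        boolean outOfRange = individualMaxAppAttempts <= 0
            || individualMaxAppAttempts > globalMaxAppAttempts;
        // An out-of-range request silently falls back to the global limit.
        return outOfRange ? globalMaxAppAttempts : individualMaxAppAttempts;
    }

    public static void main(String[] args) {
        // Asking for 10000 attempts while the global limit is 2 still yields 2.
        System.out.println(resolve(2, 10000)); // prints 2
        // Once the global limit is raised, the per-application value takes effect.
        System.out.println(resolve(10000, 10000)); // prints 10000
    }
}
```

The first call shows why raising only the per-application value is not enough: the global limit in yarn-site.xml caps it.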

Summary:

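If you want the AM to be restarted again and again and to take over the containers of the previous attempt, set yarn.resourcemanager.am.max-attempts as high as practical, for example 10000. When submitting the application, also set the maximum number of attempts in the submission context, along with the refresh window. Since containers are only retained when keepContainersAcrossApplicationAttempts is enabled, as the code above shows, that setting is needed too. This is generally enough to meet the needs of a long-running service on YARN.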
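
The article does not spell out what the refresh window does. The sketch below assumes it means an interval after which older AM failures stop counting towards the limit, which is how the submission context's setAttemptFailuresValidityInterval setting is usually described. The code is a plain-Java model under that assumption, with hypothetical names, and is not taken from the RM source.

```java
// Sliding-window failure count, under the assumption stated in the lead-in.
// Names are illustrative assumptions, not YARN identifiers.
final class FailureWindowModel {

    /**
     * @param failureTimesMs     timestamps (ms) at which earlier attempts failed
     * @param nowMs              current time (ms)
     * @param validityIntervalMs window length in ms; a value <= 0 is modelled as
     *                           "no window", so every failure counts forever
     * @return the number of failures that still count at nowMs
     */
    static int countRecentFailures(long[] failureTimesMs, long nowMs, long validityIntervalMs) {
        int count = 0;
        for (long failureTime : failureTimesMs) {
            boolean insideWindow = failureTime > nowMs - validityIntervalMs;
            if (validityIntervalMs <= 0 || insideWindow) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long[] failures = {1_000L, 5_000L, 9_000L};
        // With a 2-second window at t = 10 s, only the failure at 9 s still counts.
        System.out.println(countRecentFailures(failures, 10_000L, 2_000L)); // prints 1
        // Without a window, all three failures count.
        System.out.println(countRecentFailures(failures, 10_000L, -1L)); // prints 3
    }
}
```

Under this reading, a service that fails only occasionally never accumulates enough counted failures to hit the limit, so a large max-attempts value and a window complement each other.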

總結(jié)

以上是生活随笔為你收集整理的yarn关于app max attempt深度解析,针对长服务appmaster平滑重启的全部?jī)?nèi)容,希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網(wǎng)站內(nèi)容還不錯(cuò),歡迎將生活随笔推薦給好友。