Telemetry in Junos for AI/ML Workloads
Author: Shalini Mukherjee

Introduction

Because AI cluster traffic requires lossless networks with high throughput and low latency, collecting monitoring data is a critical element of an AI network. Junos Telemetry enables granular monitoring of key performance indicators, including thresholds and counters for congestion management and traffic load balancing. gRPC sessions support streaming of telemetry data. gRPC is a modern, open-source, high-performance framework built on HTTP/2 transport. It supports native bidirectional streaming and allows flexible custom metadata in request headers.

The first step in telemetry is to know which data needs to be collected. We can then analyze this data in various formats. Once we collect the data, it is important to present it in a format that makes it easy to monitor the network, make decisions, and improve the service being offered. In this paper, we use a telemetry stack consisting of Telegraf, InfluxDB, and Grafana. This telemetry stack collects data using a push model. Traditional pull models are resource intensive, require manual intervention, and can leave information gaps in the data they collect. Push models overcome these limitations by delivering data asynchronously. They also enrich the data with user-friendly tags and names.

Once the data is in a more readable format, we store it in a database and use it in an interactive visualization web application to analyze the network. Figure 1 shows how this stack is designed for data collection, storage, and visualization, from network devices pushing data to the collector through to the data being displayed on dashboards for analysis.

TIG Stack

We use an Ubuntu server to install all the software, including the TIG stack.

Telegraf
To collect data, we use Telegraf on an Ubuntu server running 22.04.2. The Telegraf version running in this demo is 1.28.5.
Telegraf is a plugin-driven server agent for collecting and reporting metrics. It uses processor plugins to enrich and normalize the data. Output plugins send this data to various data stores. In this document we use two input plugins: one for openconfig sensors and the other for Juniper native sensors.
InfluxDB
To store the data in a time series database, we use InfluxDB. The output plugin in Telegraf sends the data to InfluxDB, which stores it in a highly efficient manner. We are using V1.8 because there is no CLI present for V2 and above.
Grafana
Grafana is used to visualize this data. Grafana pulls the data from InfluxDB and lets users create rich, interactive dashboards. Here, we are running version 10.2.2.

Configuration On The Switch

To implement this stack, we first need to configure the switch as shown in Figure 2. We have used port 50051, but any port can be used here. Log in to the QFX switch and add the following configuration.

Note: This configuration is for labs/POCs because the password is transmitted in clear text. Use SSL to avoid this.

Environment

Nginx
This step is needed if you are unable to expose the port on which Grafana is hosted. The next step is to install nginx on the Ubuntu server to act as a reverse proxy. Once nginx is installed, add the lines shown in Figure 4 to the "default" file and move the file from /etc/nginx to /etc/nginx/sites-enabled.

Make sure that the firewall is adjusted to give full access to the nginx service, as shown in Figure 5.

Once nginx is installed and the required changes are made, we should be able to access Grafana from a web browser by using the IP address of the Ubuntu server where all the software is installed.
There is a small glitch in Grafana that does not let you reset the default password. Use these steps if you run into this issue.
Steps to perform on the Ubuntu server to set the password in Grafana:

  • Go to /var/lib/grafana, the directory that contains grafana.db
  • Install sqlite3
    o sudo apt install sqlite3
  • Run this command in your terminal
    o sqlite3 grafana.db
  • The sqlite command prompt opens; run the following query:
    > delete from user where login='admin';
  • Restart Grafana and enter admin as both the username and the password. Grafana then prompts for a new password.
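
The password-reset steps above can also be scripted. The following is a minimal Python sketch that issues the same DELETE statement through the standard sqlite3 module. To stay safe to run anywhere, it works on an in-memory stand-in with a simplified, hypothetical `user` table; the real grafana.db schema has many more columns, and the helper name `delete_admin_user` is an illustration, not part of Grafana.

```python
import sqlite3


def delete_admin_user(conn: sqlite3.Connection) -> int:
    """Delete the row whose login is 'admin'; return the number of rows removed."""
    cur = conn.execute("DELETE FROM user WHERE login = ?", ("admin",))
    conn.commit()
    return cur.rowcount


# Stand-in database (assumption): a simplified version of Grafana's user table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, login TEXT)")
conn.executemany("INSERT INTO user (login) VALUES (?)", [("admin",), ("viewer",)])

removed = delete_admin_user(conn)
remaining = [row[0] for row in conn.execute("SELECT login FROM user ORDER BY id")]
print(removed, remaining)
```

Against a real installation, the connection would point at the grafana.db file instead of `:memory:`; taking a backup copy of the file first is a sensible precaution.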

Once all the software is installed, create the configuration file in Telegraf that pulls the telemetry data from the switch and pushes it to InfluxDB.

Openconfig Sensor Plugin

On the Ubuntu server, edit the /etc/telegraf/telegraf.conf file to add all the required plugins and sensors. For the openconfig sensors, we use the gNMI plugin shown in Figure 6. For demo purposes, add the hostname as "spine1", the port number "50051" that is used for gRPC, the username and password of the switch, and the number of seconds to wait before redialing in case of failure.
In the subscription stanza, add a unique name ("cpu" for this particular sensor), the sensor path, and the time interval at which this data is streamed from the switch. Add the same inputs.gnmi plugin and inputs.gnmi.subscription stanza for all the openconfig sensors. (Figure 6)
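
As a rough illustration of the fields described above, the following Python sketch renders an inputs.gnmi stanza as text. The key names (addresses, username, password, redial, name, path, subscription_mode, sample_interval) are those of Telegraf's gNMI input plugin. The sensor path, the two intervals, and the credential placeholders are assumptions for illustration and are not taken from Figure 6.

```python
def render_gnmi_input(address, username, password, redial, name, path, interval):
    """Return a telegraf.conf fragment with one gNMI input and one subscription."""
    return "\n".join([
        "[[inputs.gnmi]]",
        f'  addresses = ["{address}"]',
        f'  username = "{username}"',
        f'  password = "{password}"',
        f'  redial = "{redial}"',
        "",
        "  [[inputs.gnmi.subscription]]",
        f'    name = "{name}"',
        f'    path = "{path}"',
        '    subscription_mode = "sample"',
        f'    sample_interval = "{interval}"',
    ])


gnmi_fragment = render_gnmi_input(
    address="spine1:50051",        # hostname and gRPC port from the text
    username="SWITCH_USER",        # placeholder
    password="SWITCH_PASSWORD",    # placeholder
    redial="10s",                  # assumed redial delay
    name="cpu",                    # unique measurement name from the text
    path="/components/component",  # assumed sensor path
    interval="10s",                # assumed streaming interval
)
print(gnmi_fragment)
```

Each additional openconfig sensor would get its own subscription stanza with a different name and path.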

Native Sensor Plugin

This is the Juniper Telemetry Interface plugin, which is used for native sensors. In the same telegraf.conf file, add the native sensor plugin inputs.jti_openconfig_telemetry, whose fields are almost the same as those of the openconfig plugin. Use a unique client ID for every sensor; here, we use "telegraf3". The unique name used here for this sensor is "mem" (Figure 7).
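
A comparable sketch for the native sensor plugin follows. It assumes the plugin's documented convention that an entry in `sensors` may start with a measurement name followed by the sensor path. The sensor path and sample frequency below are hypothetical placeholders, since the actual values appear only in Figure 7.

```python
def render_jti_input(server, username, password, client_id, name, sensor_path, frequency):
    """Return a telegraf.conf fragment for one Juniper native (JTI) sensor."""
    # "<measurement-name> <path>" stores the sensor data under that measurement.
    sensor = f"{name} {sensor_path}"
    return "\n".join([
        "[[inputs.jti_openconfig_telemetry]]",
        f'  servers = ["{server}"]',
        f'  username = "{username}"',
        f'  password = "{password}"',
        f'  client_id = "{client_id}"',
        f'  sample_frequency = "{frequency}"',
        f'  sensors = ["{sensor}"]',
    ])


jti_fragment = render_jti_input(
    server="spine1:50051",
    username="SWITCH_USER",        # placeholder
    password="SWITCH_PASSWORD",    # placeholder
    client_id="telegraf3",         # unique client ID from the text
    name="mem",                    # unique measurement name from the text
    sensor_path="/hypothetical/native/sensor/path/",  # placeholder path
    frequency="2000ms",            # assumed reporting rate
)
print(jti_fragment)
```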

Finally, add the output plugin outputs.influxdb to send this sensor data to InfluxDB. Here, the database is named "telegraf", with the username "influx" and the password "influxdb" (Figure 8).
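
The output stanza can be sketched the same way. The database name and credentials come from the text; the URL is an assumption that InfluxDB runs on the same server as Telegraf and listens on port 8086.

```python
def render_influxdb_output(url, database, username, password):
    """Return a telegraf.conf fragment for the InfluxDB v1 output plugin."""
    return "\n".join([
        "[[outputs.influxdb]]",
        f'  urls = ["{url}"]',
        f'  database = "{database}"',
        f'  username = "{username}"',
        f'  password = "{password}"',
    ])


influx_fragment = render_influxdb_output(
    url="http://127.0.0.1:8086",  # assumed: InfluxDB on the same host
    database="telegraf",
    username="influx",
    password="influxdb",
)
print(influx_fragment)
```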

Once you have edited the telegraf.conf file, restart the telegraf service. Then check in the InfluxDB CLI to make sure that measurements are created for all the unique sensors. Type "influx" to enter the InfluxDB CLI.

As shown in Figure 9, enter the InfluxDB prompt and use the database "telegraf". All the unique names given to the sensors are listed as measurements.
To see the output of any one measurement, and so confirm that the telegraf file is correct and the sensor is working, use the command "select * from cpu limit 1" as shown in Figure 10.
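
The same check can be made without the interactive CLI, because InfluxDB 1.x also answers InfluxQL over HTTP on its /query endpoint. The sketch below only builds the request URL and does not contact a server. The host address is a hypothetical documentation address, and authentication parameters are omitted.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query_url(host, database, query, port=8086):
    """Build an InfluxDB 1.x /query URL carrying an InfluxQL statement."""
    params = urlencode({"db": database, "q": query})
    return f"http://{host}:{port}/query?{params}"


url = build_query_url("192.0.2.10", "telegraf", "SELECT * FROM cpu LIMIT 1")
decoded = parse_qs(urlsplit(url).query)  # round-trip to confirm the encoding
print(url)
print(decoded)
```

Fetching this URL from a machine that can reach the server returns the matching rows as JSON, which is convenient for scripted health checks.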

Every time a change is made to the telegraf.conf file, make sure to stop InfluxDB, restart Telegraf, and then start InfluxDB.
Log in to Grafana from the browser and create dashboards after making sure that the data is being collected correctly.
Go to Connections > InfluxDB > Add new data source.

  1. Give a name to this data source. In this demo it is "test-1".
  2. Under the HTTP stanza, use the Ubuntu server IP and port 8086.
  3. In the InfluxDB details, use the same database name, "telegraf," and provide the username and password of the Ubuntu server.
  4. Click Save & test. Make sure you see the message "successful".
  5. Once the data source has been added successfully, go to Dashboards and click New. Let us create a few dashboards that are essential for AI/ML workloads in editor mode.

Examples Of Sensor Graphs

The following are examples of some major counters that are essential for monitoring an AI/ML network.
Percentage utilization for ingress interface et-0/0/0 on spine-1

  • Select the data source as test-1.
  • In the FROM section, select the measurement as "interface". This is the unique name used for this sensor path.
  • In the WHERE section, select device::tag, and in the tag value, select the hostname of the switch, that is, spine1.
  • In the SELECT section, choose the sensor branch that you want to monitor; in this case choose "field(/interfaces/interface[if_name='et-0/0/0']/state/counters/if_in_1s_octets)". In the same section, click "+" and add the math calculation (/50000000000 * 100). This calculates the percentage utilization of a 400G interface: 400 Gbps equals 50,000,000,000 bytes per second, and the counter reports octets per second.
  • Make sure the FORMAT is "time-series," and name the graph in the ALIAS section.
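
The arithmetic behind the (/50000000000 * 100) expression can be checked with a short Python sketch. The function name and the sample counter value are illustrative assumptions; the divisor is derived from the 400G link speed named in the text.

```python
LINK_SPEED_BITS_PER_SECOND = 400_000_000_000  # 400G interface
LINK_SPEED_BYTES_PER_SECOND = LINK_SPEED_BITS_PER_SECOND // 8  # 50,000,000,000


def utilization_percent(octets_per_second, link_bytes_per_second=LINK_SPEED_BYTES_PER_SECOND):
    """Convert an octets-per-second counter reading to percent of link capacity."""
    return octets_per_second / link_bytes_per_second * 100


# Example reading (assumed): 25 GB/s of ingress traffic on the interface.
sample = utilization_percent(25_000_000_000)
print(LINK_SPEED_BYTES_PER_SECOND, sample)
```

For an interface of a different speed, only the divisor changes; a 100G port, for example, would use 12,500,000,000.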

Peak buffer occupancy for any given queue

  • Select the data source as test-1.
  • In the FROM section, select the measurement as "buffer."
  • In the WHERE section, there are three fields to fill in. Select device::tag, and in the tag value select the hostname of the switch (for example, spine-1); AND select /cos/interfaces/interface/@name::tag and select the interface (for example, et-0/0/0); AND select the queue as well, /cos/interfaces/interface/queues/queue/@queue::tag, and select queue number 4.
  • In the SELECT section, choose the sensor branch that you want to monitor; in this case, choose "field(/cos/interfaces/interface/queues/queue/PeakBufferOccupancy)."
  • Make sure the FORMAT is "time-series" and name the graph in the ALIAS section.

You can collate data for multiple interfaces on the same graph, as shown in Figure 17 for et-0/0/0, et-0/0/1, et-0/0/2, and so on.

PFC and ECN mean derivative

To find the mean derivative (the rate at which a value changes over a time range), use the raw query mode.
This is the influx query that we used to find the mean derivative between two PFC values on et-0/0/0 of Spine-1, per second.
SELECT derivative(mean("/interfaces/interface[if_name='et-0/0/0']/state/pfc-counter/tx_pkts"), 1s) FROM "interface" WHERE ("device"::tag = 'Spine-1') AND $timeFilter GROUP BY time($interval)
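
To make the derivative(mean(...), 1s) construct concrete, the following Python sketch reproduces it on made-up counter samples: it first averages the samples in each GROUP BY window, then divides the change between consecutive window means by the elapsed time in units of one second. The sample values and helper names are assumptions for illustration only.

```python
from statistics import mean


def bucket_means(samples, interval):
    """Average (timestamp_seconds, value) samples per fixed window, like GROUP BY time()."""
    buckets = {}
    for ts, value in samples:
        start = ts - ts % interval  # start time of the window this sample falls in
        buckets.setdefault(start, []).append(value)
    return sorted((start, mean(values)) for start, values in buckets.items())


def derivative(points, unit=1):
    """Rate of change between consecutive points, per `unit` seconds."""
    return [
        (t2, (v2 - v1) / ((t2 - t1) / unit))
        for (t1, v1), (t2, v2) in zip(points, points[1:])
    ]


# Made-up cumulative PFC packet counter, sampled every 5 seconds.
samples = [(0, 100), (5, 200), (10, 400), (15, 600), (20, 900), (25, 1100)]
means = bucket_means(samples, interval=10)
rates = derivative(means, unit=1)
print(means)
print(rates)
```

With these samples the window means are 150, 500, and 1000, so the resulting rates are 35 and 50 packets per second.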

Similarly, for ECN:

SELECT derivative(mean("/interfaces/interface[if_name='et-0/0/8']/state/error-counters/ecn_ce_marked_pkts"), 1s) FROM "interface" WHERE ("device"::tag = 'Spine-1') AND $timeFilter GROUP BY time($interval)

Input resource errors mean derivative

The raw query for the input resource errors mean derivative is:
SELECT derivative(mean("/interfaces/interface[if_name='et-0/0/0']/state/error-counters/if_in_resource_errors"), 1s) FROM "interface" WHERE ("device"::tag = 'Spine-1') AND $timeFilter GROUP BY time($interval)

Tail drops mean derivative

The raw query for the tail drops mean derivative is:
SELECT derivative(mean("/cos/interfaces/interface/queues/queue/tailDropBytes"), 1s) FROM "buffer" WHERE ("device"::tag = 'Leaf-1' AND "/cos/interfaces/interface/@name"::tag = 'et-0/0/0' AND "/cos/interfaces/interface/queues/queue/@queue"::tag = '4') AND $timeFilter GROUP BY time($__interval) fill(null)

CPU utilization

  • Select the data source as test-1.
  • In the FROM section, select the measurement as "newcpu".
  • In the WHERE section, there are three fields to fill in. Select device::tag, and in the tag value select the hostname of the switch (for example, spine-1); AND in /components/component/properties/property/name::tag, select cpuutilization-total; AND in name::tag, select RE0.
  • In the SELECT section, choose the sensor branch that you want to monitor. In this case, choose "field(state/value)".

The raw query for finding the non-negative derivative of tail drops for multiple switches on multiple interfaces, in bits/sec, is:
SELECT non_negative_derivative(mean("/cos/interfaces/interface/queues/queue/tailDropBytes"), 1s)*8 FROM "buffer" WHERE (device::tag =~ /^Spine-[1-2]$/) AND ("/cos/interfaces/interface/@name"::tag =~ /et-0\/0\/[0-9]/ OR "/cos/interfaces/interface/@name"::tag =~ /et-0\/0\/1[0-5]/) AND $timeFilter GROUP BY time($__interval), device::tag fill(null)
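
The tag filters in this query are regular expressions, and the *8 factor converts a byte rate to a bit rate. The Python sketch below mirrors both, under the assumption that the =~ operator performs an unanchored match, which corresponds to re.search in Python. The helper names are illustrative.

```python
import re

SWITCH_RE = re.compile(r"^Spine-[1-2]$")        # anchored: exactly Spine-1 or Spine-2
IFACE_RE_A = re.compile(r"et-0/0/[0-9]")        # unanchored
IFACE_RE_B = re.compile(r"et-0/0/1[0-5]")       # unanchored


def selected(device, interface):
    """Return True when a series would pass the WHERE clause's tag filters."""
    device_ok = bool(SWITCH_RE.search(device))
    iface_ok = bool(IFACE_RE_A.search(interface) or IFACE_RE_B.search(interface))
    return device_ok and iface_ok


def bytes_to_bits_per_second(byte_rate):
    """Mirror the *8 in the query: tailDropBytes per second to bits per second."""
    return byte_rate * 8


print(selected("Spine-1", "et-0/0/3"), selected("Spine-3", "et-0/0/3"))
print(selected("Spine-2", "et-0/0/15"), selected("Spine-1", "et-0/0/16"))
print(bytes_to_bits_per_second(1250))
```

Under that reading, the first interface pattern on its own already matches names such as et-0/0/16, because it is not anchored at the end. If the intent is to cover only et-0/0/0 through et-0/0/15, anchoring both patterns with ^ and $ would be one way to tighten the filter.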

These are some examples of the graphs that can be created for monitoring an AI/ML network.

Summary

This paper illustrates a method for pulling telemetry data and visualizing it by creating graphs. It focuses on AI/ML sensors, both native and openconfig, but the setup can be used for all kinds of sensors. We have also included solutions for several issues that you might face while creating the setup. The steps and outputs described in this paper are specific to the versions of the TIG stack mentioned earlier. They are subject to change depending on the version of the software, the sensors, and the Junos version.

References

Juniper Yang Data Model Explorer for all sensor options
https://apps.juniper.net/ydm-explorer/
Openconfig forum for openconfig sensors
https://www.openconfig.net/projects/models/

Corporate and Sales Headquarters
Juniper Networks, Inc.
1133 Innovation Way
Sunnyvale, CA 94089 USA
Phone: 888.JUNIPER (888.586.4737)
or +1.408.745.2000
Fax: +1.408.745.2100
www.juniper.net
APAC and EMEA Headquarters
Juniper Networks International B.V.
Boeing Avenue 240
1119 PZ Schiphol-Rijk
Amsterdam, The Netherlands
Phone: +31.207.125.700
Fax: +31.207.125.701
Copyright 2023 Juniper Networks, Inc. All rights reserved. Juniper Networks, the Juniper Networks logo, Juniper, Junos, and other trademarks are registered trademarks of Juniper Networks, Inc. and/or its affiliates in the United States and other countries. Other names may be trademarks of their respective owners. Juniper Networks assumes no responsibility for any inaccuracies in this document. Juniper Networks reserves the right to change, modify, transfer, or otherwise revise this publication without notice.
Send feedback to: design-center-comments@juniper.net V1.0/240807/ejm5-telemetry-junos-ai-ml
