Juniper Networks: Telemetry in Junos for AI/ML Workloads
Author: Shalini Mukherjee

Introduction

Because AI cluster traffic requires lossless networks with high throughput and low latency, collecting monitoring data is a critical part of running an AI network. Junos Telemetry enables granular monitoring of key performance indicators, including thresholds and counters for congestion management and traffic load balancing.
gRPC sessions support the streaming of telemetry data. gRPC is a modern, open-source, high-performance framework built on HTTP/2 transport. It provides native bidirectional streaming and allows flexible custom metadata in request headers.
The first step in telemetry is knowing what data is to be collected. We can then analyze this data in various formats. Once the data is collected, it is important to present it in a form that makes it easy to monitor the network, make decisions, and improve the service being offered.
In this paper, we use a telemetry stack consisting of Telegraf, InfluxDB, and Grafana. This stack collects data using a push model. Traditional pull models are resource-intensive, require manual intervention, and can leave gaps in the data they collect. Push models overcome these limitations by delivering data asynchronously. They also enrich the data with user-friendly tags and names.
Once the data is in a more readable format, we store it in a database and use it in an interactive visualization web application to analyze the network. Figure 1 shows how this stack is designed for efficient data collection, storage, and visualization, from the network devices pushing data to the collector through to the data displayed on dashboards for analysis.

Figure 1: Telemetry stack

TIG Stack

We used an Ubuntu server to install all the software, including the TIG stack.

Telegraf
To collect data, we use Telegraf on an Ubuntu server running 22.04.2. The Telegraf version used in this demo is 1.28.5.
Telegraf is a plugin-driven server agent for collecting and reporting metrics. It uses processor plugins to enrich and normalize the data. Output plugins send this data to various data stores. In this document we use two input plugins: one for openconfig sensors and one for Juniper native sensors.
InfluxDB
To store the data in a time-series database, we use InfluxDB. The output plugin in Telegraf sends the data to InfluxDB, which stores it very efficiently. We use V1.8, as there is no CLI available for V2 and above.
Grafana
Grafana is used to visualize this data. Grafana pulls data from InfluxDB and lets users create rich, interactive dashboards. Here, we are running version 10.2.2.

Configuration On The Switch

To implement this stack, we must first configure the switch as shown in Figure 2. We used port 50051, but any port can be used here. Log in to the QFX switch and add the following configuration.

Figure 2: Configuration on the switch

Note: This configuration is for labs/POCs, because the password is transmitted in clear text. Use SSL to avoid this.

Environment

Figure 3: Environment

Nginx
This step is needed only if you cannot expose the port on which Grafana is hosted. Install nginx on the Ubuntu server to act as a reverse proxy. Once nginx is installed, add the lines shown in Figure 4 to the "default" file and move the file from /etc/nginx to /etc/nginx/sites-enabled.

Figure 4: Lines added to the nginx "default" file

Make sure that the firewall is adjusted to give full access to the nginx service, as shown in Figure 5.

Figure 5: Firewall settings for the nginx service

Once nginx is installed and the required changes are made, we should be able to access Grafana from a web browser by using the IP address of the Ubuntu server where all the software is installed.
There is a small glitch in Grafana that can prevent you from resetting the default password. Use these steps if you run into this issue.
Steps to perform on the Ubuntu server to set the password in Grafana:

  • Go to /var/lib/grafana, the directory that contains grafana.db
  • Install sqlite3
    o sudo apt install sqlite3
  • Run this command in your terminal
    o sqlite3 grafana.db
  • The SQLite command prompt opens; run the following query:
    > delete from user where login='admin';
  • Restart Grafana and enter admin as both the username and the password. Grafana then prompts for a new password.
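The reset steps above can also be scripted. The following is a minimal sketch using Python's standard sqlite3 module. It takes the table and column names (user, login) from the query above, and it runs against an in-memory stand-in database rather than the real grafana.db. The helper name delete_admin_user and the two-column demo schema are assumptions made for illustration only.

```python
import sqlite3


def delete_admin_user(conn: sqlite3.Connection, login: str = "admin") -> int:
    """Delete the row for `login` from the `user` table; return rows removed."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("DELETE FROM user WHERE login = ?", (login,))
    return cur.rowcount


# Stand-in database: the real file would be /var/lib/grafana/grafana.db.
# The two-column schema below is an assumption made only for this demo.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, login TEXT)")
demo.execute("INSERT INTO user (login) VALUES ('admin'), ('viewer')")
demo.commit()

removed = delete_admin_user(demo)
remaining = [row[0] for row in demo.execute("SELECT login FROM user")]
print(removed, remaining)
```

Against the real file, the connection would be opened on the grafana.db path instead of ":memory:", typically with elevated privileges. Copying the file first is a sensible precaution.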

Once all the software is installed, create the Telegraf config file, which pulls the telemetry data from the switch and pushes it to InfluxDB.

Openconfig Sensor Plugin

On the Ubuntu server, edit the /etc/telegraf/telegraf.conf file to add all the required plugins and sensors. For the openconfig sensors, we use the gNMI plugin shown in Figure 6. For demo purposes, add the hostname as "spine1", the port number "50051" that is used for gRPC, the username and password of the switch, and the number of seconds to wait before redialing after a failure.
In the subscription stanza, add a unique name for this sensor ("cpu"), the sensor path, and the time interval for collecting this data from the switch. Add the same inputs.gnmi and inputs.gnmi.subscription stanzas for every openconfig sensor. (Figure 6)

Figure 6: gNMI plugin configuration in telegraf.conf
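As an illustration of the fields described above, the sketch below renders a gNMI input stanza as TOML text from Python. The option names (addresses, username, password, redial, name, path, subscription_mode, sample_interval) are options of Telegraf's gnmi input plugin. The credentials, the sensor path, and the 10-second intervals are placeholders assumed for this example, not values taken from the paper, and the helper name render_gnmi_input is hypothetical.

```python
def render_gnmi_input(host, port, username, password, redial, subscriptions):
    """Render a Telegraf [[inputs.gnmi]] stanza as TOML text."""
    lines = [
        "[[inputs.gnmi]]",
        f'  addresses = ["{host}:{port}"]',
        f'  username = "{username}"',
        f'  password = "{password}"',
        f'  redial = "{redial}"',
    ]
    for sub in subscriptions:
        lines += [
            "",
            "  [[inputs.gnmi.subscription]]",
            f'    name = "{sub["name"]}"',
            f'    path = "{sub["path"]}"',
            '    subscription_mode = "sample"',
            f'    sample_interval = "{sub["interval"]}"',
        ]
    return "\n".join(lines) + "\n"


# Placeholder values: look up the real sensor path in the YANG Data Model Explorer.
config = render_gnmi_input(
    host="spine1",
    port=50051,
    username="USERNAME",
    password="PASSWORD",
    redial="10s",
    subscriptions=[
        {"name": "cpu", "path": "/components/component", "interval": "10s"},
    ],
)
print(config)
```

Each additional openconfig sensor would be one more entry in the subscriptions list, each with its own unique name, since that name becomes the measurement in InfluxDB.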

Native Sensor Plugin

This is the Juniper telemetry interface plugin used for native sensors. In the same telegraf.conf file, add the native sensor plugin inputs.jti_openconfig_telemetry, whose fields are almost the same as those of the openconfig plugin. Use a unique client ID for every sensor; here, we use "telegraf3". The unique name used here for this sensor is "mem" (Figure 7).

Figure 7: Native sensor plugin configuration in telegraf.conf

Finally, add the output plugin outputs.influxdb to send this sensor data to InfluxDB. Here, the database is named "telegraf", with the username "influx" and the password "influxdb" (Figure 8).

Figure 8: outputs.influxdb plugin configuration

Once you have edited the telegraf.conf file, restart the telegraf service. Then check in the InfluxDB CLI that measurements have been created for all the unique sensors. Type "influx" to enter the InfluxDB CLI.

Figure 9: InfluxDB CLI

As seen in Figure 9, enter the InfluxDB prompt and use the database "telegraf". All the unique names given to the sensors are listed as measurements.
To see the output of any one measurement, and so confirm that the telegraf file is correct and the sensor is working, use the command "select * from cpu limit 1" as shown in Figure 10.

Figure 10: Output of the select query
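The same check can be made over InfluxDB's HTTP API instead of the CLI. The sketch below only builds the request URL for the InfluxDB 1.x /query endpoint and does not send it. It assumes InfluxDB is listening on its default HTTP port, 8086. The server address is a documentation placeholder and the helper name influx_query_url is hypothetical.

```python
from urllib.parse import urlencode


def influx_query_url(host, database, query, port=8086):
    """Build an InfluxDB 1.x /query URL for an InfluxQL statement."""
    params = urlencode({"db": database, "q": query})
    return f"http://{host}:{port}/query?{params}"


# 192.0.2.10 is a documentation address standing in for the Ubuntu server.
url = influx_query_url("192.0.2.10", "telegraf", "SELECT * FROM cpu LIMIT 1")
print(url)
```

The resulting URL can be fetched with any HTTP client. If authentication is enabled on the database, the credentials have to be supplied as well, for example through the u and p query parameters that InfluxDB 1.x accepts.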

Every time you change the telegraf.conf file, make sure you stop InfluxDB, restart Telegraf, and then start InfluxDB.
After confirming that the data is being collected correctly, log in to Grafana from the browser and create dashboards.
Go to Connections > InfluxDB > Add new data source.

  1. Give this data source a name. In this demo it is "test-1".
  2. Under the HTTP stanza, use the Ubuntu server IP and port 8086.
  3. In the InfluxDB details, use the same database name, "telegraf", and provide the username and password of the Ubuntu server.
  4. Click Save & test. Make sure you see the message "success".
  5. Once the data source has been added successfully, go to Dashboards and click New. Let's create a few dashboards that are essential for AI/ML workloads, working in editor mode.

Examples of Sensor Graphs

The following are some of the major examples that are essential for monitoring an AI/ML network.
Percentage utilization for the ingress interface et-0/0/0 on spine-1

  • Select the data source as test-1.
  • In the FROM section, select the measurement as "interface". This is the unique name used for this sensor path.
  • In the WHERE section, select device::tag, and for the tag value, select the hostname of the switch, that is, spine1.
  • In the SELECT section, choose the sensor branch that you want to monitor; in this case, choose "field(/interfaces/interface[if_name='et-0/0/0']/state/counters/if_in_1s_octets)". In the same section, click "+" and add this math calculation: (/ 50000000000 * 100). This calculates the percentage utilization of a 400G interface: 400 Gbps is 50,000,000,000 bytes per second, so dividing the octets received per second by that figure and multiplying by 100 gives a percentage.
  • Make sure the FORMAT is "time-series", and name the graph in the ALIAS section.
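The math step above can be checked with a few lines of Python. This is a minimal sketch that assumes the if_in_1s_octets field reports octets received per second, as its name suggests. The function name utilization_percent and the sample value are illustrative only.

```python
def utilization_percent(octets_per_second, link_speed_gbps=400):
    """Return ingress utilization as a percentage of line rate."""
    bytes_per_second_at_line_rate = link_speed_gbps * 1e9 / 8
    return octets_per_second / bytes_per_second_at_line_rate * 100


# 400 Gbps / 8 = 50,000,000,000 bytes per second, the divisor used in Grafana.
print(utilization_percent(25_000_000_000))  # half of line rate
```

For an interface of a different speed, only the divisor changes. A 100G interface, for example, would use 12,500,000,000 in the Grafana math field.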

Peak buffer occupancy for any queue

  • Select the data source as test-1.
  • In the FROM section, select the measurement as "buffer".
  • In the WHERE section, there are three fields to fill in. Select device::tag, and for the tag value select the hostname of the switch (that is, spine-1); AND select /cos/interfaces/interface/@name::tag and choose the interface (that is, et-0/0/0); AND select the queue as well, /cos/interfaces/interface/queues/queue/@queue::tag, and choose queue number 4.
  • In the SELECT section, choose the sensor branch that you want to monitor; in this case, choose "field(/cos/interfaces/interface/queues/queue/PeakBufferOccupancy)".
  • Make sure the FORMAT is "time-series", and name the graph in the ALIAS section.

You can collate data for multiple interfaces on the same graph, as seen in Figure 17 for et-0/0/0, et-0/0/1, et-0/0/2, and so on.

Figure 17: Data for multiple interfaces on the same graph

PFC and ECN mean derivative

To find the mean derivative (the difference in value within a time range), use the raw query mode.
This is the influx query we used to find the mean derivative between two PFC values on et-0/0/0 of Spine-1, per second.
SELECT derivative(mean("/interfaces/interface[if_name='et-0/0/0']/state/pfc-counter/tx_pkts"), 1s) FROM "interface" WHERE ("device"::tag = 'Spine-1') AND $timeFilter GROUP BY time($interval)

Similarly, for ECN:

SELECT derivative(mean("/interfaces/interface[if_name='et-0/0/8']/state/error-counters/ecn_ce_marked_pkts"), 1s) FROM "interface" WHERE ("device"::tag = 'Spine-1') AND $timeFilter GROUP BY time($interval)
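To make clear what derivative(mean(...), 1s) computes in the PFC and ECN queries above, the sketch below approximates it in plain Python: samples are grouped into fixed time buckets, each bucket is averaged, and the change between consecutive bucket means is expressed per second. It is an approximation for illustration, not InfluxDB's implementation, and the counter readings are invented.

```python
from collections import defaultdict


def mean_derivative(samples, interval_s, unit_s=1.0):
    """Approximate InfluxQL derivative(mean(field), unit) GROUP BY time(interval).

    samples: iterable of (timestamp_seconds, value) pairs.
    Returns a list of (bucket_start, rate_per_unit) pairs.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % interval_s].append(value)

    means = sorted((start, sum(v) / len(v)) for start, v in buckets.items())
    return [
        (t1, (m1 - m0) / (t1 - t0) * unit_s)
        for (t0, m0), (t1, m1) in zip(means, means[1:])
    ]


# Hypothetical PFC tx_pkts counter readings taken every 5 seconds.
readings = [(0, 100), (5, 120), (10, 200), (15, 240), (20, 400), (25, 400)]
print(mean_derivative(readings, interval_s=10))
```

With 10-second buckets, the bucket means here are 110, 220, and 400, so the reported rates are 11 and 18 packets per second.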

Input resource errors mean derivative

The raw query for resource errors is:
SELECT derivative(mean("/interfaces/interface[if_name='et-0/0/0']/state/error-counters/if_in_resource_errors"), 1s) FROM "interface" WHERE ("device"::tag = 'Spine-1') AND $timeFilter GROUP BY time($interval)

Tail drops mean derivative

The raw query for the tail drops mean derivative is:
SELECT derivative(mean("/cos/interfaces/interface/queues/queue/tailDropBytes"), 1s) FROM "buffer" WHERE ("device"::tag = 'Leaf-1' AND "/cos/interfaces/interface/@name"::tag = 'et-0/0/0' AND "/cos/interfaces/interface/queues/queue/@queue"::tag = '4') AND $timeFilter GROUP BY time($__interval) fill(null)

CPU utilization

  • Select the data source as test-1.
  • In the FROM section, select the measurement as "newcpu".
  • In the WHERE section, there are three fields to fill in. Select device::tag, and for the tag value select the hostname of the switch (that is, spine-1); AND in /components/component/properties/property/name::tag, select cpu-utilization-total; AND in name::tag, select RE0.
  • In the SELECT section, choose the sensor branch that you want to monitor. In this case, choose "field(state/value)".

The raw query for finding the non-negative derivative of tail drops for multiple switches on multiple interfaces, in bits/sec:
SELECT non_negative_derivative(mean("/cos/interfaces/interface/queues/queue/tailDropBytes"), 1s)*8 FROM "bu" WHERE (device::tag =~ /^Spine-[1-2]$/) AND ("/cos/interfaces/interface/@name"::tag =~ /et-0\/0\/[0-9]/ OR "/cos/interfaces/interface/@name"::tag =~ /et-0\/0\/1[0-5]/) AND $timeFilter GROUP BY time($__interval), device::tag fill(null)
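The difference between this query and the earlier derivative queries is that negative rates (for example, after a counter reset) are discarded, and the byte rate is multiplied by 8 to give bits per second. The sketch below illustrates that behavior on invented values. It is not InfluxDB's implementation, and the function name is hypothetical.

```python
def non_negative_bits_per_second(points):
    """Rate of change in bits/s between consecutive (time_s, byte_count) points,
    dropping negative rates, as non_negative_derivative(...) * 8 would."""
    rates = []
    for (t0, b0), (t1, b1) in zip(points, points[1:]):
        rate = (b1 - b0) / (t1 - t0)
        if rate >= 0:
            rates.append((t1, rate * 8))
    return rates


# Hypothetical tailDropBytes means per interval; the drop to 50 mimics a counter reset.
points = [(0, 1000), (10, 3000), (20, 50), (30, 1050)]
print(non_negative_bits_per_second(points))
```

Dropping the negative sample keeps a counter reset from showing up as a large downward spike on the graph.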

These were some examples of the graphs that can be created for monitoring an AI/ML network.

Summary

This paper demonstrates how to pull telemetry data and visualize it by creating graphs. It focuses on AI/ML sensors, both native and openconfig, but the setup can be used for all kinds of sensors. We have also included solutions for several issues that you might face while creating the setup. The steps and outputs shown in this paper are specific to the versions of the TIG stack mentioned earlier. They are subject to change depending on the version of the software, the sensors, and the Junos version.

References

Juniper Yang Data Model Explorer for all sensor options
https://apps.juniper.net/ydm-explorer/
Openconfig forum for openconfig sensors
https://www.openconfig.net/projects/models/

Corporate and Sales Headquarters
Juniper Networks, Inc.
1133 Innovation Way
Sunnyvale, CA 94089 USA
Phone: 888.JUNIPER (888.586.4737)
or +1.408.745.2000
Fax: +1.408.745.2100
www.juniper.net
APAC and EMEA Headquarters
Juniper Networks International B.V.
Boeing Avenue 240
1119 PZ Schiphol-Rijk
Amsterdam, The Netherlands
Phone: +31.207.125.700
Fax: +31.207.125.701
Copyright 2023 Juniper Networks, Inc. All rights reserved. Juniper Networks, the Juniper Networks logo, Juniper, Junos, and other trademarks are registered trademarks of Juniper Networks, Inc. and/or its affiliates in the United States and other countries. Other names may be trademarks of their respective owners. Juniper Networks assumes no responsibility for any inaccuracies in this document. Juniper Networks reserves the right to change, modify, transfer, or otherwise revise this publication without notice.
Send feedback to: design-center-comments@juniper.net V1.0/240807/ejm5-telemetry-junos-ai-ml
