AI Interview Answers 10: What Does an Embedding Actually Do? From Technical Essence to Interview Answers

I. Technical essence: the core idea in one sentence

The core job of an embedding is to map unstructured data (words, images, etc.) into a continuous, low-dimensional space so that semantically similar items lie close together in that space.
Simply put, it builds a semantic 'coordinate system' that turns fuzzy human meanings into 'coordinates' a machine can compute with.


II. Intuitive understanding: a map of meaning

Imagine a two-dimensional map (in practice an embedding usually has hundreds of dimensions, but the principle is the same):

  • Cat → [0.92, 0.31, -0.45, …]
  • Dog → [0.88, 0.29, -0.42, …]
  • Car → [0.15, -0.87, 0.53, …]

The vectors for cat and dog are very close together; the vector for car is far away.
With embeddings, the machine no longer treats words as arbitrary symbols; it can compare them by 'semantic similarity'.
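
A minimal sketch of how this 'closeness' is commonly measured, using cosine similarity. It uses only the three components shown above for each word; the truncated remainder of each vector is not given, so the numbers are illustrative only. The name `cosine_similarity` is chosen here for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Assumption: only the first three components from the example are used;
# the elided components ("…") are ignored.
cat = [0.92, 0.31, -0.45]
dog = [0.88, 0.29, -0.42]
car = [0.15, -0.87, 0.53]

cat_dog = cosine_similarity(cat, dog)
cat_car = cosine_similarity(cat, car)
print(f"cat vs dog: {cat_dog:.3f}")
print(f"cat vs car: {cat_car:.3f}")
```

On these three components the cat/dog similarity comes out close to 1, while the cat/car similarity is below zero.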


III. Technical principle (simplified): how is it learned?

It rests on the distributional hypothesis from linguistics: 'The meaning of a word is determined by its context.'

  • By training on large text corpora (e.g. Word2Vec, or the embedding layer of BERT), the model learns a vector for each word.
  • In the end, words that often appear in similar contexts (cat and dog both occur around 'pets', 'raising', 'feeding') are pulled close together.
  • This process needs no manual labeling; the geometric structure emerges automatically from how language is used.
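
The idea that shared contexts produce nearby vectors can be sketched without any neural network. The example below is a count-based stand-in, not Word2Vec or BERT: it represents each word by counts of its neighbouring words in a made-up toy corpus and compares those count vectors. The corpus, the window size, and the helper names are all assumptions for illustration.

```python
import math
from collections import Counter

# Toy corpus, invented for illustration.
corpus = [
    "i feed my cat every morning",
    "i feed my dog every morning",
    "i wash my car in the garage",
]

def context_counts(target, sentences, window=2):
    """Count the words that appear within `window` positions of `target`."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            if word != target:
                continue
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            counts.update(
                w for j, w in enumerate(words[lo:hi], start=lo) if j != i
            )
    return counts

def cosine(c1, c2):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(c1[w] * c2[w] for w in c1)
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (norm1 * norm2)

cat = context_counts("cat", corpus)
dog = context_counts("dog", corpus)
car = context_counts("car", corpus)

cat_dog = cosine(cat, dog)
cat_car = cosine(cat, car)
print(cat_dog, cat_car)
```

Here 'cat' and 'dog' share every context word, so they score higher than 'cat' and 'car', which share only 'my'. No labels were involved; the structure comes from usage alone.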

A key property: the vector space can capture analogy relations, such as king - man + woman ≈ queen.
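
A small sketch of the analogy arithmetic. The vectors are hand-made so that the relation holds by construction (the axes are assumed to be royalty, gender, and fruit); a trained model only approximates this, and `analogy` is a hypothetical helper.

```python
import math

# Hand-made 3-D vectors (assumption): axes are [royalty, gender, fruit].
vectors = {
    "king":  [1.0,  1.0, 0.0],
    "queen": [1.0, -1.0, 0.0],
    "man":   [0.0,  1.0, 0.0],
    "woman": [0.0, -1.0, 0.0],
    "apple": [0.0,  0.0, 1.0],
}

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

result = analogy("king", "man", "woman")
print(result)
```

The input words are excluded from the candidates, which is the usual convention when evaluating analogies; otherwise the nearest neighbour is often one of the inputs.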


IV. What steps does embedding perform in a RAG system?

  1. At indexing time: each text chunk is converted into a vector → stored in a vector database → giving it a 'semantic address'.
  2. At query time: the user's question is converted into a vector in the same space → the nearest document vectors in the index are found → the semantically relevant passages are returned.

Example of the effect:
A user asks 'How can I keep my dog happy?'. Even if the knowledge base only contains 'Dogs need daily walks, which helps their mental health', the embedding can still retrieve that text, because 'happy / health / dog' are semantically close. This is 'semantic matching', not 'literal matching'.
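
The two steps and the dog example can be sketched end to end with an in-memory index. The `embed` function below is a hypothetical stand-in for a real embedding model: it maps a few known words onto hand-picked concept axes, which is just enough to show matching by meaning rather than by shared words. A real system would call an embedding model and a vector database instead.

```python
import math
import re

# Hypothetical stand-in for an embedding model: each known word points to a
# hand-picked concept axis, so related words share a direction.
CONCEPTS = ["dog", "wellbeing", "vehicle"]
WORD_TO_CONCEPT = {
    "dog": "dog", "dogs": "dog",
    "happy": "wellbeing", "health": "wellbeing", "mental": "wellbeing",
    "car": "vehicle", "engine": "vehicle", "oil": "vehicle",
}

def embed(text):
    """Map text to a small concept-count vector (toy embedding)."""
    vec = [0.0] * len(CONCEPTS)
    for word in re.findall(r"[a-z]+", text.lower()):
        concept = WORD_TO_CONCEPT.get(word)
        if concept is not None:
            vec[CONCEPTS.index(concept)] += 1.0
    return vec

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# 1. Indexing: embed each chunk and store (vector, chunk) pairs.
chunks = [
    "Dogs need daily walks, which helps their mental health.",
    "Change the engine oil of your car once a year.",
]
index = [(embed(chunk), chunk) for chunk in chunks]

# 2. Query: embed the question in the same space and rank chunks by similarity.
def search(question, top_k=1):
    query_vec = embed(question)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [chunk for _, chunk in ranked[:top_k]]

hits = search("How can I keep my dog happy?")
print(hits[0])
```

The question and the retrieved chunk share no content word in the same form ('dog' vs 'Dogs', 'happy' vs 'mental health'), yet the dog chunk ranks first because both land on the same concept axes.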


V. Interview answering strategy (a complete 2-3 minute answer)

Below is an answer framework designed to show both depth of knowledge and project experience.

[Opening that sets the tone]

“The core job of an embedding is to map unstructured data into a continuous, low-dimensional space so that semantically similar items lie close together in that space. Simply put, it builds a semantic 'coordinate system'.”

[Explain the principle and mention the classic properties]

“Traditional one-hot encoding has no notion of distance between words, whereas an embedding is learned by a neural network over large text corpora: 'the meaning of a word is determined by its context.' In the end each word or sentence is represented by a dense vector, and the cosine of the angle between two vectors measures their semantic similarity. It can even capture analogy relations, such as king - man + woman ≈ queen.”

[Tie it to project experience: the key part]

“In the RAG question-answering system I built, I used embeddings directly. I chose text-embedding-3-small, split the company's external documents into chunks of 500 words, converted each chunk into a vector, and stored it in Qdrant.
One day a user asked 'How do I apply for annual holidays?'. Keyword search returned nothing, because the document says 'time-off request procedure'. The embedding, however, mapped 'annual holidays' and 'time off' close together and retrieved the right passage.
I also hit a pitfall: at first I used a general-purpose embedding model, which performed poorly on legal text. I later switched to a BGE-large model trained on that domain, and recall rose from 72% to 89%. So the choice of embedding model has a large effect on the downstream task.”

[Show deeper thinking to demonstrate senior-level ability]

“I would also add that an embedding is essentially a lossy semantic compression: it discards surface information such as word order and syntactic structure and keeps the 'gist'. So in scenarios that need exact matching (for example product models such as 'iPhone12' vs 'iPhone13'), vector search may not work as well as keyword search. In practice we often use hybrid search (vector + BM25) to cover that gap.”
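
As background for this point, here is a sketch of one common way to merge a vector ranking with a BM25 ranking: reciprocal rank fusion (RRF). The answer above does not say which fusion method was used, so RRF is an assumption, as are the document ids and the two example rankings. The constant `k=60` is the value conventionally used with RRF.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranked list.

    Each document scores 1 / (k + rank) per list it appears in (rank is
    1-based); the scores are summed across lists.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical rankings for the query "iPhone12": the vector search confuses
# the two models, while BM25 matches the exact token.
vector_ranking = ["iphone13_doc", "iphone12_doc", "case_doc"]
bm25_ranking = ["iphone12_doc", "charger_doc", "iphone13_doc"]

fused = reciprocal_rank_fusion([vector_ranking, bm25_ranking])
print(fused)
```

RRF works on ranks only, so it avoids having to calibrate cosine scores against BM25 scores, which live on different scales.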

[Conclusion]

“In summary, embeddings solve a fundamental problem: 'How can a machine compute semantic similarity?' They are a cornerstone of modern NLP and of RAG.”


VI. Likely follow-up questions and answers

Follow-up question | Key points for the answer
“How are embeddings trained?” | Briefly explain Word2Vec's CBOW/Skip-gram (using the context to predict the center word, or the reverse), or modern contrastive learning (SimCSE, Sentence-BERT). Stress that training exploits co-occurrence statistics.
“How is embedding quality evaluated?” | On a specific task, use recall and MRR; for public benchmarks, MTEB. In practice you can also A/B test the search results.
“Which embedding models have you used? Pros and cons?” | OpenAI is convenient but costly, BGE performs well on Chinese, M3E is lightweight, E5 is multilingual. Choose according to the scenario.
“How do you choose the vector dimension?” | Higher dimensions are more expressive but cost more to compute and store; lower dimensions may underfit. Common values are 384/768/1536; decide by experiment, balancing quality against performance.
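
The two retrieval metrics named above can be computed in a few lines. This is a minimal sketch: the query ids, document ids, and relevance judgments are made up, and the helper names are chosen for illustration.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents that appear in the top k results."""
    return len(set(retrieved[:k]) & relevant) / len(relevant)

def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant result (1-based), or 0.0 if none is found."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

# Made-up evaluation data: ranked results and relevant documents per query.
retrieved = {
    "q1": ["d3", "d1", "d7"],
    "q2": ["d2", "d9", "d4"],
    "q3": ["d5", "d6", "d8"],
}
relevant = {"q1": {"d1"}, "q2": {"d2"}, "q3": {"d0"}}

mean_recall = sum(recall_at_k(retrieved[q], relevant[q], 3) for q in retrieved) / len(retrieved)
mrr = sum(reciprocal_rank(retrieved[q], relevant[q]) for q in retrieved) / len(retrieved)
print(f"recall@3 = {mean_recall:.3f}, MRR = {mrr:.3f}")
```

Recall@k only asks whether the right document made it into the top k, while MRR also rewards placing it near the top, so the two are worth reporting together.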

VII. Pitfalls to avoid (in the interview)

  • ❌ Don't just say 'an embedding turns words into vectors'; that is too superficial, and the interviewer will ask 'and then what?'
  • ❌ Don't go too deep into the math (starting from Hilbert space); it can look as if you are reciting rather than doing.
  • Make sure you say that you yourself used it to solve a specific problem, even if it was only a course project. One concrete number (such as recall rising by 17 percentage points) is more convincing than ten theories.
