
An Ensemble Model for Second Language Acquisition: The Winning Approach in the 2018 SLAM Shared Task

An analysis of a novel ensemble model that combines gradient boosted decision trees and RNNs to predict gaps in student knowledge during language learning, and that achieved the top scores in the 2018 SLAM Shared Task.

1. Introduction

Accurately predicting a student's knowledge state is the foundation for building effective personalized learning systems. This paper presents a novel ensemble model designed to predict the word-level mistakes made by language learners, a task central to identifying knowledge gaps. The model achieved the highest score on both evaluation metrics (AUC and F1 score) across all three language datasets (English, Spanish, French) in the 2018 Shared Task on Second Language Acquisition Modeling (SLAM), which used Duolingo interaction trace data. The work connects advanced machine learning techniques with the practical challenge of modeling the complex, sequential process of language learning.

2. Data and Evaluation Setup

The study relies on data from the 2018 SLAM Shared Task, which provides a standardized benchmark for the field.

2.1. The 2018 SLAM Shared Task Data

The data consists of anonymized student interaction logs from Duolingo users during their first 30 days of learning English, Spanish, or French. A key characteristic is that the raw sentence entered by the user is not provided; instead, the data includes the "best matching" correct sentence from a predefined set, aligned using a finite-state transducer method. The prediction target is a binary label for each token (word) in this correct sentence, indicating whether the user made a mistake on that word.

2.2. Task Definition and Evaluation Metrics

The task is framed as a token-level binary classification problem. The data is split chronologically for each user: the last 10% of events for testing, the last 10% of the remainder for development, and the rest for training. Model performance is evaluated using the area under the ROC curve (AUC) and the F1 score. AUC measures how well the predicted probabilities rank errors above non-errors, while F1 balances precision and recall; both suit the imbalanced classification tasks common in educational data.
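The per-user chronological split described above can be sketched as follows. This is a minimal illustration, not the shared task's own tooling: the `(user_id, timestamp, payload)` tuple layout and the function name `chronological_split` are assumptions made for the example.

```python
from collections import defaultdict


def chronological_split(events, test_frac=0.10, dev_frac=0.10):
    """Split each user's events by time into train/dev/test lists.

    `events` is a list of (user_id, timestamp, payload) tuples; this layout
    is an assumption of the sketch, not the SLAM file format.
    """
    by_user = defaultdict(list)
    for event in events:
        by_user[event[0]].append(event)

    train, dev, test = [], [], []
    for user_events in by_user.values():
        user_events.sort(key=lambda e: e[1])      # oldest event first
        n = len(user_events)
        n_test = int(n * test_frac)               # last 10% of all events
        rest = user_events[:n - n_test]
        n_dev = int(len(rest) * dev_frac)         # last 10% of the remainder
        train.extend(rest[:len(rest) - n_dev])
        dev.extend(rest[len(rest) - n_dev:])
        test.extend(user_events[n - n_test:])
    return train, dev, test


# Toy data: two users with 100 and 50 events.
events = [("u1", t, f"ex{t}") for t in range(100)]
events += [("u2", t, f"ex{t}") for t in range(50)]
train, dev, test = chronological_split(events)
print(len(train), len(dev), len(test))
```

Because every user contributes to all three partitions, no user in the test set is unseen during training, which is why this setup cannot measure cold-start behavior.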

2.3. Limitations for Production Environments

The authors note, importantly, that the shared task setup does not fully mirror a real-time production environment for adaptive learning. Three key differences are identified: (1) The model is given the "best matching" correct answer, which would not be known in advance for open-ended questions. (2) There is potential data leakage from features that include future information. (3) The evaluation does not include "cold start" users, since models are trained and tested on data from the same group of learners.

3. Methodology

The main contribution is an ensemble model that strategically combines the strengths of two different families of machine learning models.

3.1. Rationale for the Ensemble Architecture

The ensemble exploits the complementary strengths of gradient boosted decision trees (GBDT) and recurrent neural networks (RNNs). GBDTs excel at learning complex, nonlinear interactions from structured feature data, while RNNs, particularly long short-term memory (LSTM) networks, are well suited to capturing temporal dependencies and sequential patterns in the data.

3.2. Gradient Boosted Decision Tree (GBDT) Component

This component processes a set of hand-crafted features available for each exercise token. These include lexical features (word difficulty, part of speech), user history features (past accuracy on this word or concept), exercise context features, and temporal features. The GBDT model learns to predict the error probability $P(y=1|\mathbf{x}_{\text{feat}})$, where $\mathbf{x}_{\text{feat}}$ is the feature vector.
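To make the user history features concrete, here is a small sketch of how past accuracy and time-since-last-seen could be computed per token. The feature names, the event tuple layout, and the smoothing prior are all assumptions for illustration; the paper's actual feature set is not reproduced here.

```python
from collections import defaultdict


def history_features(events):
    """Yield one feature dict per token event, using only past information.

    `events` must be sorted by time; each is (user_id, token, day, is_error).
    The field names are hypothetical.
    """
    seen = defaultdict(int)     # (user, token) -> number of past attempts
    errors = defaultdict(int)   # (user, token) -> number of past errors
    last_day = {}               # (user, token) -> day of the last attempt

    for user, token, day, is_error in events:
        key = (user, token)
        n = seen[key]
        yield {
            "token": token,
            "times_seen": n,
            # Smoothed past error rate; equals 0.5 when the token is new.
            "past_error_rate": (errors[key] + 0.5) / (n + 1.0),
            # -1.0 marks a token the user has never seen before.
            "days_since_seen": day - last_day[key] if key in last_day else -1.0,
        }
        # Counters are updated only after the features are emitted, so the
        # current label never leaks into its own feature vector.
        seen[key] += 1
        errors[key] += is_error
        last_day[key] = day


events = [
    ("u1", "le", 0.0, 1),
    ("u1", "chat", 0.0, 0),
    ("u1", "le", 2.5, 0),
]
rows = list(history_features(events))
for row in rows:
    print(row)
```

Rows like these would form the tabular input to a tree-based learner; updating the counters after emitting each row is one simple way to avoid the kind of future-information leakage discussed in the limitations.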

3.3. Recurrent Neural Network (RNN) Component

This component processes the sequence of exercise interactions for a user. It takes a representation of each exercise event (possibly including a token ID embedding and other features) and updates a hidden state vector $\mathbf{h}_t$ that encodes the learner's knowledge state over time. The prediction for the token at step $t$ is derived from this hidden state: $P(y=1|\mathbf{h}_t)$.

3.4. Ensemble Combination Strategy

The final prediction is a weighted combination or a meta-learner (such as logistic regression) that takes the predictions from the GBDT and RNN models as input. This lets the ensemble weigh the relative importance of the feature-based model and the sequence model. The combined prediction can be written as $P_{\text{ensemble}} = \alpha \cdot P_{\text{GBDT}} + (1-\alpha) \cdot P_{\text{RNN}}$ with $\alpha \in [0, 1]$, or through a learned function $g(P_{\text{GBDT}}, P_{\text{RNN}})$.
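One simple way to choose the blending weight $\alpha$ is a grid search that minimizes log loss on the development set. The sketch below assumes this procedure for illustration only; the source does not state how the weight was actually chosen, and the toy probabilities are made up.

```python
import math


def log_loss(labels, probs, eps=1e-12):
    """Mean binary cross-entropy, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def blend(p_gbdt, p_rnn, alpha):
    """P_ensemble = alpha * P_GBDT + (1 - alpha) * P_RNN, element-wise."""
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(p_gbdt, p_rnn)]


def fit_alpha(labels, p_gbdt, p_rnn, steps=100):
    """Pick alpha on a grid over [0, 1] that minimizes dev-set log loss."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda a: log_loss(labels, blend(p_gbdt, p_rnn, a)))


# Toy development-set predictions (illustrative values only).
dev_labels = [1, 0, 1, 0]
dev_gbdt = [0.9, 0.2, 0.4, 0.1]
dev_rnn = [0.6, 0.5, 0.8, 0.3]

alpha = fit_alpha(dev_labels, dev_gbdt, dev_rnn)
blended = blend(dev_gbdt, dev_rnn, alpha)
print(alpha, log_loss(dev_labels, blended))
```

Since the grid includes both endpoints, the blended loss on the development set can never be worse than that of either model alone; whether the gain carries over to the test set is an empirical question.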

4. Results and Discussion

4.1. Performance on the SLAM Shared Task

The proposed ensemble model achieved the highest score on both AUC and F1 for all three language datasets (English, Spanish, French) in the 2018 SLAM Shared Task. This demonstrates higher predictive accuracy than the other submitted systems, which likely included pure RNN models (such as DKT variants) or other traditional approaches.
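For reference, the two metrics can be computed from labels and predicted error probabilities as in the sketch below. The pairwise AUC is quadratic in the number of examples and is meant only to show the definition; the 0.5 decision threshold for F1 is an assumption of this example.

```python
def roc_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n^2) and is intended
    for illustration, not for large datasets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(labels, scores, threshold=0.5):
    """F1 of the positive (error) class after thresholding the scores."""
    preds = [int(s >= threshold) for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Toy example: 1 = the learner made an error on the token.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.6]
auc = roc_auc(labels, scores)
f1 = f1_score(labels, scores)
print(auc, f1)
```

AUC depends only on the ranking of the scores, while F1 depends on the chosen threshold, so a model can lead on one metric without leading on the other.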

Key Result: Top performance across all metrics and datasets validates the effectiveness of the hybrid ensemble approach for this particular knowledge tracing task.

4.2. Analysis of Model Predictions

The authors discuss cases where the model's predictions could be improved, possibly related to unusual linguistic constructions, highly ambiguous exercises, or situations with sparse user history. The analysis emphasizes that although the ensemble is strong, perfect prediction is difficult because of the inherent noise and complexity of human learning.

4.3. Comparison with Traditional Models (IRT, BKT, DKT)

The paper positions itself against established baselines: Item Response Theory (IRT) and Bayesian Knowledge Tracing (BKT), which are more interpretable but often less flexible, and Deep Knowledge Tracing (DKT), an early RNN-based approach. The ensemble's success suggests that combining the representational power of deep learning with the robust feature handling of tree-based models can outperform any single model.

5. Technical Details and Mathematical Formulation

The strength of the ensemble lies in its formulation. The GBDT minimizes a loss function $\mathcal{L}_{\text{GBDT}} = \sum_{i} l(y_i, F(\mathbf{x}_i))$, where $F$ is an additive model of trees. The RNN, likely an LSTM, updates a cell state $\mathbf{c}_t$ and a hidden state $\mathbf{h}_t$ through gating mechanisms:

$\mathbf{f}_t = \sigma(\mathbf{W}_f \cdot [\mathbf{h}_{t-1}, \mathbf{x}_t] + \mathbf{b}_f)$ (forget gate)
$\mathbf{i}_t = \sigma(\mathbf{W}_i \cdot [\mathbf{h}_{t-1}, \mathbf{x}_t] + \mathbf{b}_i)$ (input gate)
$\tilde{\mathbf{c}}_t = \tanh(\mathbf{W}_c \cdot [\mathbf{h}_{t-1}, \mathbf{x}_t] + \mathbf{b}_c)$ (candidate state)
$\mathbf{c}_t = \mathbf{f}_t \circ \mathbf{c}_{t-1} + \mathbf{i}_t \circ \tilde{\mathbf{c}}_t$ (cell state update)
$\mathbf{o}_t = \sigma(\mathbf{W}_o \cdot [\mathbf{h}_{t-1}, \mathbf{x}_t] + \mathbf{b}_o)$ (output gate)
$\mathbf{h}_t = \mathbf{o}_t \circ \tanh(\mathbf{c}_t)$ (hidden state)

The final prediction layer computes $P_{\text{RNN}}(y_t=1) = \sigma(\mathbf{W}_p \mathbf{h}_t + b_p)$.
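The LSTM gate equations in this section translate almost line by line into code. The sketch below is a plain-Python, single-step illustration using lists instead of a tensor library; the parameter dictionary layout and the all-zero weights in the demo are assumptions chosen so the result is easy to check by hand, not trained values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Return W @ v + b, where W is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step: returns the new hidden state h_t and cell state c_t."""
    z = h_prev + x_t                                   # concatenation [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(params["W_f"], z, params["b_f"])]          # forget gate
    i = [sigmoid(a) for a in affine(params["W_i"], z, params["b_i"])]          # input gate
    c_tilde = [math.tanh(a) for a in affine(params["W_c"], z, params["b_c"])]  # candidate
    o = [sigmoid(a) for a in affine(params["W_o"], z, params["b_o"])]          # output gate
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, c_tilde)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def predict_error(h, w_p, b_p):
    """P_RNN(y_t = 1) = sigmoid(w_p . h_t + b_p)."""
    return sigmoid(sum(w * x for w, x in zip(w_p, h)) + b_p)


# Demo with hidden size 2 and input size 3; zero weights make every gate 0.5.
hidden, inputs = 2, 3
zeros = [[0.0] * (hidden + inputs) for _ in range(hidden)]
params = {name: zeros for name in ("W_f", "W_i", "W_c", "W_o")}
params.update({name: [0.0] * hidden for name in ("b_f", "b_i", "b_c", "b_o")})

h, c = lstm_step([1.0, 0.0, 0.0], [0.0, 0.0], [1.0, -1.0], params)
p = predict_error(h, [0.0, 0.0], 0.0)
print(h, c, p)
```

With zero weights the forget gate halves the previous cell state and the candidate contributes nothing, which makes the step easy to verify; a trained model would instead learn when to retain or overwrite what it has stored about the learner.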

6. Analytical Framework: Core Insight & Critique

Core Insight: The paper's winning formula is not a revolutionary new algorithm but a thoroughly pragmatic ensemble. It acknowledges a dirty secret of real-world EdTech data: it is a mix of carefully engineered features (exercise metadata, user statistics) and raw, sequential behavioral data. The ensemble works as a dual-process engine: the GBDT crunches the static, tabular features with proven efficiency, while the RNN provides insight into the learner's evolving trajectory. This is less about AI brilliance and more about engineering: using the right tool for each part of the job.

Logical Flow: The argument is solid. Start with a well-established, meaningful benchmark (SLAM). Recognize the dual nature of the data (feature-rich and sequential). Propose an architecture that addresses this duality directly. Validate with top results. Then, crucially, step back to question the benchmark's real-world validity. This last step is what separates an academic exercise from applied research. It shows the team is thinking about deployment, not just leaderboards.

Strengths & Flaws: Strengths: The model is demonstrably effective on the task. The discussion of mismatches with the production environment is highly valuable and is often overlooked in pure research papers. It offers a clear blueprint for an effective knowledge tracing system. Flaws: This is a short workshop paper, so details are sparse. How exactly were the models combined? A simple average or a learned meta-learner? Which specific features drove the GBDT? The analysis of "cases where predictions could be improved" is vague. Furthermore, the computational cost and latency of running two complex models together for real-time personalization are not addressed, a major concern for production systems where inference speed matters.

Actionable Insights: For practitioners, the takeaway is clear: do not choose between trees and nets; combining them works. When building your own learner model, invest in creating a robust set of interpretable features for a tree-based model to consume in parallel with your sequence model. More importantly, use this paper as a research audit checklist: always ask whether your evaluation setup has "data leakage" from the future or ignores the cold-start problem, as highlighted here. For next steps, research should focus on (a) model distillation to compress the ensemble into a single, faster model without significant performance loss, and (b) creating evaluation frameworks that simulate real-time, sequential decision making, perhaps drawing inspiration from reinforcement learning evaluation in simulated environments.

7. Analysis Framework Example

Scenario: An EdTech company wants to predict whether a learner will struggle with the French subjunctive mood in an upcoming exercise.

Framework Application:

  1. Feature Engineering (GBDT Input): Create features: the learner's historical accuracy on subjunctive exercises, time since the most recent subjunctive practice, the complexity of the specific sentence, and the number of new vocabulary words in the exercise.
  2. Sequence Modeling (RNN Input): Feed the RNN the sequence of the learner's last 20 exercise interactions, each represented as a combination of exercise type and correctness pattern.
  3. Ensemble Prediction: The GBDT outputs a probability based on the static features (e.g., "high risk because of the long time since practice"). The RNN outputs a probability based on the recent sequence (e.g., "low risk because the learner is on a hot streak").
  4. Meta Decision: The ensemble combiner (e.g., a small neural network) weighs these conflicting signals. It may decide that the recent success (RNN signal) outweighs the spacing-effect risk (GBDT signal) and output a moderate predicted error probability.
  5. Action: The system uses this probability. If the risk is deemed high, it can recommend a review first or select a slightly easier exercise to support learning.
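The meta decision and action steps of this scenario can be illustrated numerically. The sketch below swaps the small neural network mentioned in the scenario for a logistic-regression-style combiner on the two models' logits; the weights, the input probabilities, and the 0.6 action threshold are all hypothetical values picked for the example.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def meta_combine(p_gbdt, p_rnn, w_gbdt=0.4, w_rnn=0.6, bias=0.0):
    """Combine the two model outputs in logit space.

    The weights are hypothetical; in practice they would be fit on
    held-out predictions from both models.
    """
    return sigmoid(w_gbdt * logit(p_gbdt) + w_rnn * logit(p_rnn) + bias)


def choose_action(p_error, high=0.6):
    """Map the predicted error probability to a tutoring action."""
    if p_error >= high:
        return "review_or_easier_exercise"
    return "continue_as_planned"


p_gbdt = 0.80   # "high risk": long time since the last subjunctive practice
p_rnn = 0.25    # "low risk": the learner is on a hot streak
p_error = meta_combine(p_gbdt, p_rnn)
action = choose_action(p_error)
print(round(p_error, 3), action)
```

With these illustrative numbers the combined probability lands between the two conflicting signals, so the system would continue with the planned exercise rather than trigger a review.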

8. Future Applications and Research Directions

  • Beyond Binary Error Prediction: Extending the model to predict the type of error (e.g., grammatical, lexical, spelling) or to model proficiency as a continuous latent variable.
  • Cross-Domain Knowledge Tracing: Applying the ensemble approach to other sequential learning domains such as mathematics (predicting errors in step-by-step problem solving) or coding.
  • Integration with Reinforcement Learning (RL): Using the ensemble's accurate predictions of knowledge gaps as the "state" representation for an RL agent that decides which exercise to present next, moving toward fully autonomous learning of instructional policies.
  • Focus on Explainability: Developing methods to explain the ensemble's predictions, perhaps using GBDT feature importance and RNN attention mechanisms, to give actionable feedback to both learners and teachers.
  • Production-Oriented Model Design: Research into knowledge distillation techniques to create a single, lightweight model that preserves the ensemble's accuracy for low-latency deployment in mobile education apps.

9. References

  1. Osika, A., Nilsson, S., Sydorchuk, A., Sahin, F., & Huss, A. (2018). Second Language Acquisition Modeling: An Ensemble Approach. arXiv preprint arXiv:1806.04525.
  2. Settles, B., Brunk, B., Gustafson, L., & Hagiwara, M. (2018). Second Language Acquisition Modeling. Proceedings of the NAACL-HLT 2018 Workshop on Innovative Use of NLP for Building Educational Applications.
  3. Piech, C., Bassen, J., Huang, J., Ganguli, S., Sahami, M., Guibas, L. J., & Sohl-Dickstein, J. (2015). Deep Knowledge Tracing. Advances in Neural Information Processing Systems (NeurIPS).
  4. Corbett, A. T., & Anderson, J. R. (1994). Knowledge tracing: Modeling the acquisition of procedural knowledge. User Modeling and User-Adapted Interaction.
  5. Lord, F. M. (1952). A theory of test scores. Psychometric Monographs.
  6. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... & Bengio, Y. (2014). Generative Adversarial Nets. Advances in Neural Information Processing Systems (NeurIPS). (Cited as an example of a foundational hybrid modeling framework that has influenced other fields.)
  7. Duolingo. (n.d.). Duolingo Research. Retrieved from https://research.duolingo.com/ (As the source of the data and a major player in applied SLA research.)