1. Introduction
Second language acquisition (SLA) is a complex, dynamic process that has traditionally been studied with datasets that are fragmented, unimodal, or short-term. Project MOSLA (Moments of Second Language Acquisition) addresses these limitations by building a first-of-its-kind dataset that is longitudinal, multimodal, multilingual, and controlled. The project follows learners acquiring Arabic, Spanish, or Chinese from scratch over roughly two years through online instruction only, and records every lesson. The resulting data, comprising more than 250 hours of video, audio, and screen recordings together with semi-automated annotations, offer an unprecedented resource for studying the fine-grained process of language acquisition.
2. Data Collection Methodology
The MOSLA dataset was built under a strict, controlled protocol to ensure consistency and research validity.
2.1 Participant & Language Selection
Participants were recruited to learn one of three target languages: Arabic, Spanish, or Mandarin Chinese. The selection includes languages with non-Latin writing systems (Arabic and Chinese), which extends the dataset's linguistic reach beyond the Indo-European languages that are most commonly studied.
2.2 Controlled Learning Environment
A key design feature is the controlled-exposure rule. Participants agreed to learn the target language only through the online lessons provided during the two years of the study. This control reduces confounding from outside language exposure, so that proficiency gains can be attributed more clearly to the instruction.
2.3 Multimodal Recording Setup
All lessons were conducted and recorded over Zoom, capturing three synchronized streams:
- Video: webcam feeds of the learner and the teacher.
- Audio: the full lesson audio.
- Screen share: the teacher's shared screen, containing teaching materials, slides, and applications.
This three-stream setup yields a rich, contextualized record of the learning interaction.
Dataset at a Glance
- Duration: ~2 years per learner
- Total recordings: >250 hours
- Modalities: video, audio, screen
- Target languages: 3 (Arabic, Spanish, Chinese)
- Control: online instruction only
3. Annotation Pipeline
The raw recordings were processed through a semi-automated pipeline to produce structured, queryable annotations.
3.1 Semi-Automated Annotation Process
Annotations were produced through human-machine collaboration:
- Speaker diarization: segmenting the audio into speaker-homogeneous regions ("who spoke when?").
- Speaker identification: labeling segments as 'teacher' or 'learner'.
- Language identification: labeling segments by language (e.g., first language/English versus target language).
- Automatic speech recognition (ASR): generating transcripts for all speech segments.
Human annotators created the initial annotations, producing a gold-standard subset that was then used to fine-tune state-of-the-art models.
3.2 Model Fine-tuning & Performance
Pretrained models (e.g., for ASR and speaker diarization) were fine-tuned on the human-annotated MOSLA data. The paper reports substantial performance gains after fine-tuning, which shows the value of domain-specific data even for large pretrained models. This step is essential for scaling annotation to the full corpus of more than 250 hours.
4. Linguistic & Multimodal Analysis
The annotated data enable new analyses of the SLA process.
4.1 Proficiency Development Metrics
Longitudinal trends were analyzed using metrics such as:
- Target language ratio: the proportion of a learner's utterances produced in the target language rather than in their native language, tracked over time.
- Lexical diversity: a measure of vocabulary size and complexity (e.g., via the type-token ratio).
- Utterance length & complexity: tracking the development of grammatical structure.
Together, these metrics give a quantitative picture of proficiency development across the two-year trajectory.
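The first two metrics can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the record layout (week, speaker, language, transcript) and the helper names `target_language_ratio` and `type_token_ratio` are assumptions made for the example, and the utterances are invented.

```python
from collections import defaultdict

# Hypothetical annotation records: (week, speaker, language, transcript).
# This layout is an assumption for illustration, not the actual MOSLA schema.
segments = [
    (1, "learner", "L1", "how do I say this"),
    (1, "learner", "TL", "hola"),
    (1, "teacher", "TL", "hola buenos dias"),
    (2, "learner", "TL", "hola buenos dias"),
    (2, "learner", "TL", "me llamo Ana"),
    (2, "learner", "L1", "is that right"),
    (2, "learner", "TL", "buenos dias profesor"),
]

def target_language_ratio(segments):
    """Share of learner utterances in the target language, per week."""
    counts = defaultdict(lambda: [0, 0])  # week -> [TL utterances, all utterances]
    for week, speaker, language, _ in segments:
        if speaker != "learner":
            continue
        counts[week][1] += 1
        if language == "TL":
            counts[week][0] += 1
    return {week: tl / total for week, (tl, total) in sorted(counts.items())}

def type_token_ratio(segments, week):
    """Distinct word types divided by total tokens in learner TL speech for one week."""
    tokens = [
        word.lower()
        for w, speaker, language, text in segments
        if w == week and speaker == "learner" and language == "TL"
        for word in text.split()
    ]
    return len(set(tokens)) / len(tokens) if tokens else 0.0

tl_ratio = target_language_ratio(segments)   # {1: 0.5, 2: 0.75}
ttr_week2 = type_token_ratio(segments, 2)    # 7 types / 9 tokens
```

The raw type-token ratio falls as the amount of speech grows, so comparisons across weeks with very different amounts of learner speech would likely need a length-normalized variant.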
4.2 Screen Focus Detection
A particularly novel analysis uses multimodal deep learning models to predict which region of the shared screen the learner is attending to, using only unannotated video and audio signals. By linking audio cues (e.g., discussion of a particular word) with on-screen content, the model can estimate what the learner is looking at, which gives insight into attention and engagement.
5. Core Insight & Analyst Perspective
Core insight: Project MOSLA is not just another dataset; it is foundational infrastructure that exposes the gap between isolated, short-term SLA studies and the complex, continuous reality of learning. Its value lies in controlled longitudinality, a property that is as rare as it is important. While projects such as the Mozilla Common Voice corpus have opened up access to speech data, they lack the structured learning trajectory and multimodal context that MOSLA provides. Similarly, the BEA-2019 Shared Task focused on isolated written proficiency and lacks the rich, contextualized interaction captured here.
Logical flow: The project's logic is cleanly linear: 1) identify a methodological gap (the lack of controlled, multimodal, longitudinal SLA data), 2) engineer a solution (a strict participant protocol plus Zoom recordings), 3) address the scale problem (ML annotation with humans in the loop), and 4) demonstrate utility (linguistic analysis plus new multimodal tasks). This path from data creation to application is a blueprint for empirical learning science.
Strengths & flaws: The strengths are clear: scale, control, and multimodal richness. The dataset is a researcher's dream for studying temporal dynamics. The flaws, however, lie in the trade-offs. The "controlled" setting is also the design's main artificiality, since real-world language learning is not controlled. The sample size, although it supports deep longitudinal data, may limit generalizability across diverse learner populations. In addition, the technical barrier to working with such complex multimodal data remains high, which may limit immediate uptake.
Actionable insights: For researchers, the immediate step is to explore this open dataset. For educational technology companies, the lesson is to move beyond simple completion metrics and model the learning process as MOSLA does. The screen focus detection experiment alone points to a future in which learning platforms could estimate cognitive engagement in real time. The most important shift is for the field to move from cross-sectional "snapshots" to longitudinal "films" of learning. MOSLA has built the camera; it is now up to the community to start making films.
6. Technical Framework Details
The annotation pipeline relies on several machine learning models. A simplified view of speaker diarization and identification can be framed as an optimization problem. Let $X = \{x_1, x_2, \dots, x_T\}$ denote the sequence of acoustic features. The goal is to find the sequence of speaker labels $S = \{s_1, s_2, \dots, s_T\}$ and the speaker identities $Y = \{y_1, y_2, \dots, y_K\}$ that maximize the posterior probability:
$P(S, Y | X) \propto P(X | S, Y) \cdot P(S) \cdot P(Y)$
Where, treating $S$ and $Y$ as independent a priori:
- $P(X | S, Y)$ is the likelihood of the acoustic features given the speaker segments and identities, typically modeled with Gaussian Mixture Models (GMMs) or deep neural network embeddings such as x-vectors.
- $P(S)$ is the prior over speaker turn-taking, which encourages temporal continuity (e.g., via a hidden Markov model).
- $P(Y)$ represents prior knowledge of the speaker identities (teacher and learner).
Fine-tuning on MOSLA data primarily improves the estimate of $P(X | S, Y)$ by adapting the acoustic model (e.g., the x-vector extractor) to the specific acoustic conditions and speaker characteristics of the online classroom.
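To see how the turn-taking prior $P(S)$ interacts with the acoustic term $P(X | S, Y)$, the following toy sketch runs Viterbi decoding over two speakers. It is an illustration under stated assumptions, not the paper's pipeline: the per-frame likelihoods are invented numbers standing in for an acoustic model, the identities are fixed (state 0 and state 1) with a uniform prior, and the `stay` probability is an arbitrary choice.

```python
import math

def viterbi(log_lik, log_prior, log_trans):
    """Most probable state sequence under a hidden Markov model.

    log_lik[t][k]   : log P(x_t | speaker k)         (acoustic term)
    log_prior[k]    : log P(s_1 = k)
    log_trans[j][k] : log P(s_t = k | s_{t-1} = j)   (turn-taking prior)
    """
    n_states = len(log_prior)
    score = [log_prior[k] + log_lik[0][k] for k in range(n_states)]
    back = []  # back[t-1][k] = best predecessor of state k at frame t
    for t in range(1, len(log_lik)):
        new_score, pointers = [], []
        for k in range(n_states):
            best_j = max(range(n_states), key=lambda j: score[j] + log_trans[j][k])
            pointers.append(best_j)
            new_score.append(score[best_j] + log_trans[best_j][k] + log_lik[t][k])
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Made-up per-frame likelihoods for speakers 0 and 1; frame 2 is a noisy outlier.
lik = [(0.9, 0.1), (0.8, 0.2), (0.4, 0.6), (0.9, 0.1),
       (0.2, 0.8), (0.1, 0.9), (0.2, 0.8)]
log_lik = [[math.log(p) for p in frame] for frame in lik]
log_prior = [math.log(0.5), math.log(0.5)]
stay = 0.9  # assumed probability that the same speaker keeps talking
log_trans = [[math.log(stay), math.log(1 - stay)],
             [math.log(1 - stay), math.log(stay)]]

frame_wise = [max(range(2), key=lambda k: frame[k]) for frame in lik]
smoothed = viterbi(log_lik, log_prior, log_trans)
```

Picking the best speaker frame by frame gives `[0, 0, 1, 0, 1, 1, 1]`, with a spurious one-frame turn. With the sticky transition prior the decoded sequence is `[0, 0, 0, 0, 1, 1, 1]`, which is the temporal continuity the $P(S)$ term is meant to encourage.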
7. Experimental Results & Findings
The paper presents key findings from analysis of the MOSLA data:
- Proficiency trajectories: plots show a clear, non-linear increase over time in the proportion of target-language use by learners, with plateaus and jumps that correspond to different instructional units. Lexical diversity metrics show strong growth that accelerates after the first six months.
- Model performance gains: fine-tuning a pretrained Wav2Vec2.0 model for ASR on only 10 hours of human-transcribed MOSLA data reduced the word error rate (WER) by more than 35% on held-out MOSLA data compared with the base model. Similarly large gains are reported for the speaker and language identification tasks.
- Screen focus detection: a multimodal model (e.g., a vision transformer over screen frames combined with an audio encoder) was trained to classify broad regions of screen focus (e.g., "slide text," "video," "whiteboard"). The model reached accuracy well above chance, which suggests that the combination of audio and visual signals carries meaningful information about learner attention even without eye-tracking hardware.
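For readers unfamiliar with the metric, the sketch below shows how WER and a relative WER reduction are computed. It assumes the "more than 35%" figure is a relative reduction, which the text does not state explicitly. The helper names and the example transcripts are invented for illustration and are not drawn from MOSLA.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)

def relative_reduction(baseline_wer, finetuned_wer):
    """Fraction of the baseline error removed by fine-tuning."""
    return (baseline_wer - finetuned_wer) / baseline_wer

# Invented transcripts for illustration only.
reference = "me gusta aprender español con mi profesora"
baseline_hyp = "me gusta prender español con me profesor"
finetuned_hyp = "me gusta aprender español con mi profesor"

baseline = word_error_rate(reference, baseline_hyp)    # 3 errors / 7 words
finetuned = word_error_rate(reference, finetuned_hyp)  # 1 error / 7 words
reduction = relative_reduction(baseline, finetuned)    # 2/3 of the errors removed
```

In this toy case the WER drops from 3/7 to 1/7, a relative reduction of about 67%. An absolute reduction would instead be reported as the difference between the two rates.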
Figure 1 (Conceptual): The paper includes a conceptual figure illustrating the MOSLA pipeline: Data Collection (Zoom recordings) -> Annotation (speaker diarization, identification, ASR) -> Multimodal Analysis (screen focus) & SLA Linguistic Analysis (proficiency metrics). The figure emphasizes the project's systematic, end-to-end workflow.
8. Analysis Framework: Modeling Proficiency Development
Case: Modeling "Target Language Use"
Researchers can use the MOSLA data to build growth-curve models. A simple example analyzes a learner's weekly mean proportion of target-language (TL) utterances. Let $R_t$ be the mean TL ratio in week $t$.
A simple linear mixed-effects model can be specified as:
R_t ~ 1 + Time_t + (1 + Time_t | Learner_ID)
Where:
- 1 + Time_t models the fixed effects: the average intercept and slope (the mean growth trajectory).
- (1 + Time_t | Learner_ID) allows both the starting point (intercept) and the growth rate (slope) to vary randomly across individual learners.
Using the MOSLA data, one could fit this model (e.g., with lme4 in R or statsmodels in Python) to estimate the average weekly increase in TL use and the degree of individual variation. More complex models could include instructional time as a predictor, or model non-linear growth using polynomial or spline terms for Time. This approach goes beyond pre-test/post-test comparisons to model the entire learning curve.
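The structure of this model can be illustrated without any statistics package by a two-stage approximation: fit a separate least-squares line per learner, then summarize the intercepts and slopes across learners. This is a simplified stand-in, not the mixed-effects estimator itself, since a true mixed model estimates everything jointly and shrinks individual estimates toward the group mean. The data below are synthetic and the learner labels are hypothetical.

```python
from statistics import mean, stdev

def fit_line(weeks, ratios):
    """Ordinary least squares for ratio = intercept + slope * week."""
    mean_w, mean_r = mean(weeks), mean(ratios)
    sxx = sum((w - mean_w) ** 2 for w in weeks)
    sxy = sum((w - mean_w) * (r - mean_r) for w, r in zip(weeks, ratios))
    slope = sxy / sxx
    return mean_r - slope * mean_w, slope

# Synthetic weekly TL ratios for three hypothetical learners (not MOSLA data).
weeks = [0, 1, 2, 3, 4]
learners = {
    "A": [0.10, 0.12, 0.14, 0.16, 0.18],
    "B": [0.20, 0.24, 0.28, 0.32, 0.36],
    "C": [0.05, 0.11, 0.17, 0.23, 0.29],
}

# Stage 1: one (intercept, slope) pair per learner.
fits = {name: fit_line(weeks, ratios) for name, ratios in learners.items()}
intercepts = [b0 for b0, _ in fits.values()]
slopes = [b1 for _, b1 in fits.values()]

# Stage 2: group-level summaries.
mean_intercept, mean_slope = mean(intercepts), mean(slopes)  # rough analogue of the fixed effects
sd_intercept, sd_slope = stdev(intercepts), stdev(slopes)    # rough analogue of the random-effect spread
```

Here the average weekly gain is 0.04 with a between-learner standard deviation of 0.02. For a real analysis, a joint fit such as `lmer(R ~ 1 + Time + (1 + Time | Learner_ID), data)` in lme4, or `mixedlm` with a `re_formula` in statsmodels, would be the more appropriate tool.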
9. Future Applications & Research Directions
The MOSLA dataset opens up many avenues for future work:
- Personalized learning paths: algorithms could analyze a learner's early patterns in MOSLA to predict later difficulties and recommend targeted review or materials.
- Automated proficiency assessment: developing fine-grained, continuous assessment models that go beyond standardized tests, using multimodal cues (fluency, lexical choice, pronunciation, engagement), in line with ETS research on automated speech scoring.
- Teacher analytics: analyzing teaching strategies and how they relate to learner progress, providing data-driven feedback for teacher training.
- Cross-linguistic transfer studies: comparing learning patterns across Arabic, Spanish, and Chinese to understand how specific linguistic features (e.g., sound systems, orthography) shape the learning process.
- Multimodal foundation models: MOSLA is a good training ground for multimodal AI models that understand educational dialogue, potentially leading to more capable AI tutors.
- Extensions: future iterations could include more languages, larger and more diverse participant cohorts, physiological data (such as heart rate as a proxy for stress or cognitive load), and integration with learning management system (LMS) data.
10. References
- Geertzen, J., Alexopoulou, T., & Korhonen, A. (2014). Automatic Linguistic Annotation of Large Scale L2 Databases: The EF-Cambridge Open Language Database (EFCAMDAT). In Proceedings of the 9th Workshop on Innovative Use of NLP for Building Educational Applications.
- Settles, B., LaFlair, G. T., & Hagiwara, M. (2020). Machine Learning-Driven Language Assessment. Transactions of the Association for Computational Linguistics.
- Stasaski, K., Devlin, J., & Hearst, M. A. (2020). Measuring and Improving Semantic Diversity of Dialogue Generation. In Findings of the Association for Computational Linguistics: EMNLP 2020.
- Hampel, R., & Stickler, U. (2012). The use of videoconferencing to support multimodal interaction in an online language classroom. ReCALL, 24(2), 116-137.
- Mozilla Common Voice. (n.d.). Retrieved from https://commonvoice.mozilla.org/
- Educational Testing Service (ETS). (2021). Automated Speech Scoring. Research Report.
- Hagiwara, M., & Tanner, J. (2024). Project MOSLA: Recording Every Moment of Second Language Acquisition. arXiv preprint arXiv:2403.17314.