1. Introduction
The rapid integration of Large Language Models (LLMs) such as ChatGPT into foreign language education has created an urgent need for specialized evaluation frameworks. Although these models show promise in supporting autonomous learning and content generation, their actual competence in pedagogical grammar, which is essential for effective language teaching, remains largely unevaluated. This paper addresses that gap by introducing CPG-EVAL, the first dedicated benchmark designed to assess LLMs' knowledge of pedagogical grammar in the context of Teaching Chinese as a Foreign Language (TCFL).
The paper argues that just as human teachers require certification, AI systems deployed as teachers must undergo rigorous, domain-specific evaluation. CPG-EVAL provides a theory-driven, multi-tiered framework for assessing grammar recognition, fine-grained discrimination, categorical classification, and resistance to linguistic interference.
2. Related Work
Existing NLP benchmarks such as GLUE, SuperGLUE, and MMLU mainly assess general language understanding and reasoning. They lack, however, the pedagogical focus needed to judge instructional suitability. Research on LLMs in education has explored applications such as error correction and conversation practice, but a systematic, grammar-centered evaluation framework grounded in language teaching expertise has been missing. CPG-EVAL fills this gap by aligning benchmark design with established pedagogical grammar classification systems from TCFL.
3. The CPG-EVAL Benchmark
CPG-EVAL is built as a comprehensive, multi-task benchmark that probes different facets of pedagogical grammar competence.
3.1. Theoretical Foundation
The benchmark is grounded in a pedagogical grammar classification system validated through extensive TCFL teaching practice. It goes beyond syntactic correctness to assess knowledge as it is applied in real instructional settings, focusing on notions such as grammaticality judgment, error explanation, and rule formulation.
3.2. Task Design & Structure
CPG-EVAL comprises five core tasks designed to form a progressive evaluation sequence:
- Task 1: Grammaticality Judgment – Binary classification of sentence correctness.
- Task 2: Fine-Grained Error Detection – Pinpointing the exact erroneous segment.
- Task 3: Error Classification – Categorizing the error type (e.g., tense, aspect, word order).
- Task 4: Pedagogical Explanation Generation – Providing a learner-friendly explanation of the error.
- Task 5: Robustness to Confusing Instances – Assessing performance when multiple, potentially confusing examples are presented together.
3.3. Evaluation Metrics
Performance on Tasks 1-3 is measured with standard classification metrics (Accuracy, F1 score). For the generation task (Task 4), metrics such as BLEU and ROUGE are used alongside human evaluation of clarity, accuracy, and pedagogical appropriateness. Task 5 measures performance degradation relative to isolated examples.
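To make the classification metrics for Tasks 1-3 concrete, the following is a minimal sketch of accuracy and binary F1 computed from scratch. The function names, the label encoding (1 = ungrammatical, 0 = grammatical), and the toy labels are assumptions for illustration only; the source does not describe the paper's actual scoring code.

```python
from typing import Sequence


def accuracy(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Fraction of items where the prediction matches the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def f1_score(gold: Sequence[int], pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1 for the chosen positive class (harmonic mean of P and R)."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical): 1 = ungrammatical, 0 = grammatical.
gold = [1, 0, 1, 1, 0, 0]
pred = [1, 0, 0, 1, 1, 0]
acc = accuracy(gold, pred)
f1 = f1_score(gold, pred)
```

For a multi-class task such as error classification (Task 3), a per-class F1 averaged across classes would be one reasonable extension, though the source does not say which averaging scheme is used.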
4. Experimental Setup & Results
4.1. Models Evaluated
The study evaluates a range of LLMs, including GPT-3.5, GPT-4, Claude 2, and several open-source models (e.g., LLaMA 2, ChatGLM). Models are prompted in zero-shot or few-shot settings to mimic real-world use, where task-specific fine-tuning may not be feasible.
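The difference between zero-shot and few-shot prompting can be sketched as below. The instruction wording, the `build_prompt` helper, and the demonstration sentences are hypothetical; they are not taken from the benchmark and only illustrate how demonstrations are prepended in the few-shot setting.

```python
from typing import Sequence, Tuple

# Hypothetical instruction for a Task 1 style grammaticality judgment.
INSTRUCTION = (
    "Decide whether the following Chinese sentence is grammatical. "
    "Answer 'correct' or 'incorrect'."
)


def build_prompt(sentence: str, demos: Sequence[Tuple[str, str]] = ()) -> str:
    """Build a zero-shot prompt (no demos) or a few-shot prompt (with demos)."""
    parts = [INSTRUCTION]
    for demo_sentence, demo_label in demos:
        # Each demonstration shows the model a solved example.
        parts.append(f"Sentence: {demo_sentence}\nAnswer: {demo_label}")
    # The query is left open so the model completes the answer.
    parts.append(f"Sentence: {sentence}\nAnswer:")
    return "\n\n".join(parts)


# Illustrative demonstrations (assumed, not benchmark items).
demos = [
    ("我昨天去了学校。", "correct"),
    ("我去学校昨天。", "incorrect"),
]
zero_shot = build_prompt("他在图书馆看书。")
few_shot = build_prompt("他在图书馆看书。", demos)
```

Keeping the instruction and query format identical across both settings means any score difference can be attributed to the demonstrations alone.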
4.2. Key Findings
Performance Gap
Smaller models (e.g., 7B parameters) reach roughly 65% accuracy on simple grammaticality judgment but fall below 40% on complex error explanation tasks.
Scale Advantage
Larger models (e.g., GPT-4) show a 15-25% improvement on multi-instance and confusing-instance tasks, indicating stronger reasoning and greater resistance to interference.
Critical Weakness
All models struggle markedly with Task 5 (confusing instances). Even the top performers show a performance drop of more than 30%, revealing a weakness in fine-grained grammatical discrimination.
4.3. Analysis of Results
The results reveal a clear difficulty hierarchy. While most models can handle surface-level correctness (Task 1), their ability to give pedagogically sound explanations (Task 4) and to stay accurate under linguistic interference (Task 5) is severely limited. This suggests that current LLMs possess declarative grammatical knowledge but lack the procedural and conditional knowledge required for effective teaching.
Chart Description (Conceptual): A multi-line chart would plot model performance (Accuracy/F1) on the y-axis across the five tasks on the x-axis. Lines for different models (GPT-4, GPT-3.5, LLaMA 2) would show a steep decline from Task 1 to Task 5, with a steeper slope for smaller models. A second chart would compare each model's performance drop on Task 5 relative to Task 1, highlighting the "interference vulnerability gap."
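The second chart's quantity, the drop from Task 1 to Task 5, could be computed as in the sketch below. It assumes the drop is measured relative to the Task 1 score; the source does not state whether the reported figures are relative or absolute. The model names and scores are placeholders, not results from the paper.

```python
from typing import Dict


def relative_drop(baseline: float, perturbed: float) -> float:
    """Share of the baseline score lost under perturbation (0.3 means 30%)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - perturbed) / baseline


# Placeholder scores for two hypothetical models (NOT reported results).
scores: Dict[str, Dict[str, float]] = {
    "model_a": {"task1": 0.90, "task5": 0.60},
    "model_b": {"task1": 0.65, "task5": 0.39},
}

# Interference vulnerability gap per model: Task 5 compared against Task 1.
gaps = {
    name: relative_drop(s["task1"], s["task5"]) for name, s in scores.items()
}
```

An absolute gap (`baseline - perturbed`) is the other common convention; reporting both avoids ambiguity when baselines differ widely across models.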
5. Discussion & Implications
The study concludes that deploying LLMs as teaching tools without this kind of targeted evaluation is premature. The large performance gaps, especially on complex, pedagogically relevant tasks, underline the need for better pedagogical alignment. The study calls for: 1) developing more rigorous, pedagogy-first benchmarks; 2) creating specialized training data focused on educational reasoning; 3) applying model fine-tuning or prompting strategies that improve pedagogical output.
6. Technical Analysis & Framework
Core Insight
CPG-EVAL is not just another accuracy leaderboard; it is a reality check on AI-in-education hype. The benchmark exposes a fundamental mismatch: LLMs are optimized for next-token prediction over internet-scale corpora, not for the structured, error-sensitive, explanation-driven reasoning that teaching requires. It is like evaluating a self-driving car only on sunny highway miles, whereas CPG-EVAL introduces the fog, rain, and complex intersections of language teaching.
Logical Flow
The paper's logic is sound and damning. It starts from an undeniable premise ("uncertified" AI teachers), identifies a specific competence gap (pedagogical grammar), and builds a benchmark that progressively attacks model weaknesses. The task progression from simple judgment to nuanced explanation under interference is a masterclass in diagnostic evaluation. It moves the question from "can the model answer?" to "can the model teach?"
Strengths & Flaws
Strengths: The domain-specific focus is its killer feature. Unlike general benchmarks, CPG-EVAL's tasks are drawn from real classroom challenges. Including "robustness to confusing instances" is especially insightful, as it tests a model's metalinguistic awareness, a core teacher skill. The call for alignment with pedagogical theory, not just data scale, is a needed corrective to the current AI development climate.
Flaws: The benchmark is currently monolingual (Chinese), which limits generalizability. The evaluation, though multi-faceted, still relies in part on automatic metrics (BLEU/ROUGE) for the explanation task, and these are poor proxies for pedagogical quality. Heavier reliance on expert human evaluation, as seen in the holistic evaluation work of Hugging Face's BigScience team, would strengthen its claims.
Actionable Insights
For EdTech Companies: Stop marketing LLMs as ready-made tutors. Use frameworks like CPG-EVAL for internal validation. Invest in fine-tuning on high-quality, pedagogically annotated data, not just more general text.
For Researchers: This work should be extended vertically and horizontally. Vertically, by incorporating more interactive, dialogue-based teaching scenarios. Horizontally, by creating counterparts for other languages (e.g., English, Spanish). The field needs a "PedagogyGLUE" suite.
For Educators & Policymakers: Demand transparency. Before adopting an AI tool, ask for its "CPG-EVAL score" or an equivalent. Establish certification standards based on such benchmarks. Precedent exists in other AI domains: the NIST AI Risk Management Framework emphasizes context-specific evaluation, which education badly needs.
Technical Details & Analysis Framework
The benchmark design frames pedagogical competence as a function of several abilities. We can model the expected performance $P$ on a pedagogical task $T$ as follows:
$P(T) = f(K_d, K_p, K_c, R)$
Where:
$K_d$ = Declarative Knowledge (grammar rules),
$K_p$ = Procedural Knowledge (how to apply the rules),
$K_c$ = Conditional Knowledge (when and why to apply the rules),
$R$ = Robustness to interference and edge cases.
The CPG-EVAL tasks map onto these variables: Tasks 1-3 probe $K_d$, Task 4 probes $K_p$ and $K_c$, and Task 5 tests $R$ directly. The results indicate that while scaling improves $K_d$ and, to a lesser extent, $R$, $K_p$ and $K_c$ remain major bottlenecks.
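The task-to-variable mapping just described can be turned into a simple per-component profile, sketched below. The source leaves $f$ unspecified, so averaging task scores per component is purely an assumption made here for illustration, as are the `component_profile` helper and the example scores.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

# Mapping stated in the text: Tasks 1-3 -> K_d, Task 4 -> K_p and K_c, Task 5 -> R.
TASK_TO_COMPONENTS: Dict[str, List[str]] = {
    "task1": ["K_d"],
    "task2": ["K_d"],
    "task3": ["K_d"],
    "task4": ["K_p", "K_c"],
    "task5": ["R"],
}


def component_profile(task_scores: Dict[str, float]) -> Dict[str, float]:
    """Average the scores of all tasks that probe each knowledge component."""
    buckets: Dict[str, List[float]] = defaultdict(list)
    for task, score in task_scores.items():
        for component in TASK_TO_COMPONENTS[task]:
            buckets[component].append(score)
    return {component: mean(values) for component, values in buckets.items()}


# Hypothetical task scores for one model (placeholders, not paper results).
profile = component_profile(
    {"task1": 0.9, "task2": 0.8, "task3": 0.7, "task4": 0.4, "task5": 0.5}
)
```

Because Task 4 is the only probe for both $K_p$ and $K_c$, this profile cannot separate the two; doing so would need tasks that isolate each component.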
Analysis Framework Example
Scenario: Evaluating a model's explanation of the error in "*Yesterday I go to school."
CPG-EVAL Framework Analysis:
1. Task 1 (Judgment): The model correctly labels the sentence as ungrammatical. [Tests $K_d$]
2. Task 2 (Detection): The model identifies "go" as the error. [Tests $K_d$]
3. Task 3 (Classification): The model classifies the error as "Tense Mismatch." [Tests $K_d$]
4. Task 4 (Explanation): The model generates: "For past actions, use the past tense 'went'. The time adverb 'yesterday' signals past time." [Tests $K_p$, $K_c$: linking the rule to the contextual cue]
5. Task 5 (Confusion): Presented with "Yesterday I go..." and "Every day I go...", the model must explain both correctly without overgeneralizing. [Tests $R$]
A model may pass Tasks 1-3 yet fail Task 4 by giving an abstract rule ("use the past tense") with no link to "yesterday", and fail Task 5 by rigidly applying the past-tense rule to the habitual action in the second example.
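The Task 4 failure mode above, an abstract rule with no link to the contextual cue, can be illustrated with a deliberately crude keyword check. This is only a sketch under strong assumptions: the `links_rule_to_cue` helper is hypothetical, and a keyword match is no substitute for the human judgment of pedagogical quality that the evaluation relies on.

```python
from typing import Iterable


def links_rule_to_cue(explanation: str, rule_terms: Iterable[str], cue: str) -> bool:
    """Return True if the explanation mentions both the rule and the cue word.

    Crude heuristic: it checks surface mentions only, not whether the
    explanation actually reasons from the cue to the rule.
    """
    text = explanation.lower()
    mentions_rule = any(term.lower() in text for term in rule_terms)
    mentions_cue = cue.lower() in text
    return mentions_rule and mentions_cue


# Explanations for the example sentence "*Yesterday I go to school."
abstract = "Use the past tense."
grounded = "Use the past tense 'went' because 'yesterday' marks past time."

abstract_ok = links_rule_to_cue(abstract, ["past tense"], "yesterday")
grounded_ok = links_rule_to_cue(grounded, ["past tense"], "yesterday")
```

Here `abstract_ok` is `False` and `grounded_ok` is `True`, mirroring the distinction between stating a rule and tying it to the evidence in the sentence.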
7. Future Applications & Directions
The CPG-EVAL framework opens the way for several important developments:
- Specialized Model Training: The benchmark can serve as a training objective for fine-tuning "Teacher LLMs" with improved pedagogical grammar competence, moving beyond general conversational quality.
- Cross-Lingual Benchmarks: Building similar benchmarks for other widely taught languages (e.g., English, Spanish, Arabic) to create a global map of LLMs' teaching readiness.
- Integration with Educational Theory: Future work could incorporate further aspects of second language acquisition, such as acquisition order, common learner pathways, and the effectiveness of different corrective feedback strategies, as discussed in major works such as Ellis (2008).
- Toward Certified AI Teachers: CPG-EVAL provides a baseline measure for possible future certification programs for AI educational tools, ensuring a minimum level of pedagogical competence before classroom deployment.
8. References
- Wang, D. (2025). CPG-EVAL: A Multi-Tiered Benchmark for Evaluating the Chinese Pedagogical Grammar Competence of Large Language Models. arXiv preprint arXiv:2504.13261.
- Brown, T., et al. (2020). Language Models are Few-Shot Learners. Advances in Neural Information Processing Systems, 33.
- Ellis, R. (2008). The Study of Second Language Acquisition (2nd ed.). Oxford University Press.
- Liang, P., et al. (2023). Holistic Evaluation of Language Models. Transactions on Machine Learning Research.
- OpenAI. (2023). GPT-4 Technical Report. arXiv preprint arXiv:2303.08774.
- NIST. (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0). National Institute of Standards and Technology.
- Hugging Face. (2023). Evaluating Large Language Models. Hugging Face Blog. Retrieved from https://huggingface.co/blog/evaluation-llms
- Bin-Hady, W. R. A., et al. (2023). Exploring the role of ChatGPT in language learning and teaching. Journal of Computer Assisted Learning.