i-manager Publications

Survey on Enhancing Dialogue Agent Alignment through MiniLLM with Targeted Human Assessments

Swapnil B. Mahajan*, Chandu D. Vaidya**, Bhojraj Lalit Narware***, Divya Rameshwar Yemde****, Harshal Sanju Meshram*****, Harsh Anil Sukhdeve******, Harpreet Kaur Anoop Singh*******

*-******* Department of Computer Science and Engineering, S. B. Jain Institute of Technology, Management and Research, Nagpur, Maharashtra, India.

Periodicity: January - June 2025
DOI: https://doi.org/10.26634/jaim.3.1.21243

Abstract

This paper presents the development of a compact and effective language model inspired by the LLaMA architecture, whose core principles guided both the architectural decisions and the training methods. The study explores what can be achieved with limited resources: by relying on open-source datasets and advanced training techniques, significant progress was made without extensive computational power or proprietary data. Because of resource constraints, however, the model remains a work in progress, and researchers with access to greater computational capacity could build on this foundation to improve its performance. The investigation aims to encourage further contributions toward more robust and accessible language models. The key training parameters are context window size, number of layers, batch size, and model dimensions. The model is evaluated by epoch count, execution time, number of model parameters, and validation loss.

Keywords

BERT, Machine Learning, LLM, NLP, Deep Learning.

How to Cite this Article?

Mahajan, S. B., Vaidya, C. D., Narware, B. L., Yemde, D. R., Meshram, H. S., Sukhdeve, H. A., and Singh, H. K. A. (2025). Survey on Enhancing Dialogue Agent Alignment through MiniLLM with Targeted Human Assessments. i-manager’s Journal on Artificial Intelligence & Machine Learning, 3(1), 11-25. https://doi.org/10.26634/jaim.3.1.21243
