References
[1]. Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., & McGrew, B. (2023). GPT-4 technical report. arXiv preprint arXiv:2303.08774.
[4]. Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., & Kaplan, J. (2022). Constitutional AI: Harmlessness from AI feedback. arXiv preprint arXiv:2212.08073.
[8]. Choudhary, R. K. (2015). Implementation of efficient search by integrating proximity ranking & instant fuzzy. International Journal of Advances in Computer Science and Cloud Computing (IJACSCC) (pp. 25-35).
[10]. Ferrag, M. A., Alwahedi, F., Battah, A., Cherif, B., Mechri, A., & Tihanyi, N. (2024). Generative AI and large language models for cyber security: All insights you need. arXiv preprint arXiv:2405.12750.
[13]. Geiping, J., & Goldstein, T. (2023). Cramming: Training a Language Model on a single GPU in one day. In International Conference on Machine Learning (pp. 11117-11143). PMLR.
[14]. Glaese, A., McAleese, N., Trebacz, M., Aslanides, J., Firoiu, V., Ewalds, T., & Irving, G. (2022). Improving alignment of dialogue agents via targeted human judgements. arXiv preprint arXiv:2209.14375.
[15]. Hillier, D., Guertler, L., Tan, C., Agrawal, P., Ruirui, C., & Cheng, B. (2024). Super tiny language models. arXiv preprint arXiv:2405.14159.
[16]. Jiang, Z., Gu, J., Zhu, H., & Pan, D. (2024). Pre-RMSNorm and Pre-CRMSNorm transformers: Equivalent and efficient Pre-LN transformers. Advances in Neural Information Processing Systems, 36, 1-17.
[17]. Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT, 1, 4171-4186.
[19]. Khurana, D., Koli, A., Khatter, K., & Singh, S. (2023). Natural language processing: State of the art, current trends and challenges. Multimedia Tools and Applications, 82(3), 3713-3744.
[21]. Lee, J., Yang, F., Tran, T., Hu, Q., Barut, E., Chang, K. W., & Su, C. (2024). Can small language models help large language models reason better?: LM-guided chain-of-thought. arXiv preprint arXiv:2404.03414.
[22]. Lewis, M. (2019). BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461.
[24]. Li, J., Li, D., Savarese, S., & Hoi, S. (2023). BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. In International Conference on Machine Learning (pp. 19730-19742). PMLR.
[25]. Li, R., Xu, J., Cao, Z., Zheng, H. T., & Kim, H. G. (2024b). Extending context window in large language models with segmented base adjustment for rotary position embeddings. Applied Sciences, 14(7), 3076.
[27]. Movva, R., Balachandar, S., Peng, K., Agostini, G., Garg, N., & Pierson, E. (2024). Topics, authors, and institutions in large language model research: Trends from 17K arXiv papers. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 1, 1223-1243.
[28]. Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training. OpenAI Blog.
[29]. Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., & Lowe, R. (2022a). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730-27744.
[30]. Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., & Lowe, R. (2022b). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730-27744.
[31]. Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., & Chintala, S. (2019). PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 32.
[32]. Qiao, Y., Ao, X., Liu, Y., Xu, J., Sun, X., & He, Q. (2024). LOGIN: A large language model consulted graph neural network training framework. arXiv preprint arXiv:2405.13902.
[35]. Shen, K., Guo, J., Tan, X., Tang, S., Wang, R., & Bian, J. (2023). A study on ReLU and softmax in transformer. arXiv preprint arXiv:2302.06461.
[36]. Shuster, K., Xu, J., Komeili, M., Ju, D., Smith, E. M., Roller, S., & Weston, J. (2022). BlenderBot 3: A deployed conversational agent that continually learns to responsibly engage. arXiv preprint arXiv:2208.03188.
[37]. Tian, Y., Peng, B., Song, L., Jin, L., Yu, D., Mi, H., & Yu, D. (2024). Toward self-improvement of LLMs via imagination, searching, and criticizing. arXiv preprint arXiv:2404.12253.
[38]. Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M. A., Lacroix, T., & Lample, G. (2023). LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971.
[39].
Vaidya, C. D., Botre, M., Rokde, Y., Kumbhalkar, S., Linge, S., Pitale, S., & Bawne, S. (2023b). Unveiling sentiment analysis: A comparative study of LSTM and logistic regression models with XAI insights. i-manager's Journal on Computer Science, 11(3), 36-46.
[40]. Vaidya, C., Takalkar, K., Ghosekar, A., Nimgade, S., & Ghode, V. (2023a). Decentralized file sharing. In 2023 IEEE International Students' Conference on Electrical, Electronics and Computer Science (SCEECS) (pp. 1-6). IEEE.
[41]. Yang, D., Ziems, C., Held, W., Shaikh, O., Bernstein, M. S., & Mitchell, J. (2024). Social skill training with large language models. arXiv preprint arXiv:2404.04204.
[42]. Zhang, Z., Song, Y., Yu, G., Han, X., Lin, Y., Xiao, C., & Sun, M. (2024). ReLU² Wins: Discovering efficient activation functions for sparse LLMs. arXiv preprint arXiv:2402.03804.