Building an On-Premise LLM Server with an Axiomtek Server

During Episode 53 of the “Sembang AIoT” sharing session on 14 June 2024, we shared and discussed the LLM (Large Language Model) side of Generative AI. An Axiomtek industrial server platform was used as the sample hardware for an Nvidia standard LLM server setup, running as a standalone LLM server to reflect real-world use of an on-premise LLM server in industry.

An LLM, or Large Language Model, is a type of artificial intelligence (AI) designed to understand and generate human language. Here are some key points about LLMs:

Architecture: LLMs are typically based on deep learning architectures, such as the Transformer model, which enables them to process and generate text in a sophisticated manner.

Training Data: They are trained on vast amounts of text data from diverse sources, including books, articles, websites, and other forms of written content. This extensive training helps them understand and mimic human language patterns.


Key capabilities:

Text Generation: LLMs can generate coherent and contextually relevant text, making them useful for writing assistance, storytelling, and content creation.

Question Answering: They can answer questions based on their training data, providing information and explanations on a wide range of topics.

Translation: LLMs can translate text between languages.

Summarization: They can summarize long texts into shorter, concise versions.

Conversation: They are used in chatbots and virtual assistants to engage in human-like conversations.


Applications:

Customer Support: LLMs power chatbots that provide automated customer service.

Content Creation: Writers use LLMs for brainstorming, drafting, and editing text.

Education: LLMs assist with tutoring, providing explanations and answering educational queries.

Healthcare: They help with generating medical reports and assisting in clinical decision-making processes.


Examples of LLMs:

GPT (Generative Pre-trained Transformer): Developed by OpenAI, with versions like GPT-3 and GPT-4.

BERT (Bidirectional Encoder Representations from Transformers): Developed by Google, primarily used for understanding the context of words in a sentence.

T5 (Text-to-Text Transfer Transformer): Also developed by Google, designed to convert various text-based tasks into a text-to-text format.

LLMs have transformed the field of natural language processing (NLP) by enabling machines to perform tasks that require understanding and generation of human language with remarkable proficiency. However, they also raise ethical and practical concerns, such as the potential for generating misleading information, privacy issues, and the need for responsible usage and regulation.

Building the on-premise LLM server with the Axiomtek server required the following components:

Axiomtek IMB700 industrial board for 3rd gen Intel® Xeon® Scalable processors

Main features of the industrial board:

3rd gen Intel® Xeon® Scalable processors (Ice Lake-SP)

Six 288-pin DDR4-3200 RDIMM for up to 384GB of memory

3 PCIe x1 and 3 PCIe x8

Supports M.2 Key M

TPM 2.0 supported (optional)

Supports multiple graphic cards

Supports internal USB dongle

GPU: Nvidia RTX 3090 with 12GB RAM
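As a rough sizing check for hardware like this, the memory needed just to hold a model's weights can be estimated from its parameter count and numeric precision. The sketch below is only a back-of-the-envelope illustration: the 7-billion-parameter figure and the precision levels are assumptions chosen for the example, and the estimate ignores activations, the KV cache, and runtime overhead, so real usage will be higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


GPU_MEMORY_GB = 12   # GPU memory as listed in the component list above
NUM_PARAMS = 7e9     # assumption: a 7-billion-parameter model

# Assumed precision levels, for illustration only
for label, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    needed = weight_memory_gb(NUM_PARAMS, bits)
    verdict = "fits" if needed < GPU_MEMORY_GB else "does not fit"
    print(f"{label}: about {needed:.1f} GB of weights, {verdict} in {GPU_MEMORY_GB} GB")
```

Under these assumptions, lower-precision (quantized) weights are what make a model of this size practical on a single card with limited memory.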

LLM Chat RTX Demo


At this event, we gave a live demonstration of the LLM server and ran some tests with the well-known Mistral 7B model from Mistral AI. Within Chat RTX, the server performed very well on the RAG (Retrieval-Augmented Generation) function, which showed how efficiently RAG can work with the LLM.
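To illustrate the general idea behind RAG, here is a minimal sketch of the retrieval step: the user's question is matched against a small set of local documents, and the best match is placed into the prompt that would then be sent to the LLM. This is not how Chat RTX is implemented; the function names, the simple word-overlap scoring, and the sample documents are all assumptions made for illustration, and the actual generation step is left out.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, documents: list[str], k: int = 1) -> str:
    """Combine the retrieved context and the question into one prompt."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical local knowledge base, using facts from the board description
documents = [
    "The IMB700 board supports 3rd gen Intel Xeon Scalable processors.",
    "The board has six DDR4-3200 RDIMM slots for up to 384GB of memory.",
    "TPM 2.0 is supported as an option.",
]

print(build_prompt("How much memory does the board support?", documents))
```

A production setup would typically replace the word-overlap scoring with vector embeddings and pass the resulting prompt to the model, but the flow of retrieve, augment, then generate stays the same.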

The recorded “Sembang AIoT” session on Gen AI and LLMs is available to watch.

