In fact, more new open-source models have been released in the last 1.5 years than closed-source models:
Today we’ll dive into the world of open-source LLMs and:
- review 5 top open-source LLMs;
- demonstrate different ways to easily access them;
- showcase how to get started with open-source LLMs in the brand-new LangChain edition of n8n.
Read on to find out!
Are there any open-source LLMs?
For this article, we picked several popular pre-trained models: Llama 2, Mistral, Falcon, MPT, and BLOOM. These are often considered "base" models that require additional fine-tuning. The original developers usually provide each model in several sizes and also offer one or two fine-tuned variants. You can use these directly or pick fine-tuned versions from third-party developers.
[Comparison table garbled during extraction; surviving fragments mention English-language tasks, separate licences for the 7B/40B and 180B models, a 2k context window, long fiction story writing, multilingual text generation, and a Modified Apache 2.0 licence.]
In any case, you can quickly add most of these open-source LLMs into your n8n workflows. If you are using a LangChain version of n8n, there are even more possibilities for you.
Currently, most example workflows use the OpenAI Model node, but in many cases you can swap it for one of the open-source models.
Don’t miss these details in the second half of the article!
What are the benefits of open-source LLMs?
Even though GPT-4 is considered the gold standard for AI language models, it’s proprietary. Open-source LLMs have their own advantages, such as:
- Accessibility: They are accessible to anyone. This promotes inclusion and equal opportunities for learning and experimenting with such models.
- Collaboration: Open-source projects foster community collaboration. Developers across the globe can help make the models more robust, diverse and efficient over time.
- Innovation: They can be used as a basis for further research and development and thus promote innovation. Researchers can build on existing models, optimize them and push the boundaries of what's possible.
- Transparency: Because the code is open to view, it's easier to understand how the model works and to trust its outputs. This transparency also makes it possible to identify and fix potential bugs or issues more quickly.
- Cost-effective: Open-source LLMs can be a cost-effective solution for businesses and individuals. These models can be used without license fees and even customized according to specific needs. However, there are certain costs associated with running LLMs as they are computationally intensive.
What is the best open-source LLM?
There is no single best open-source LLM.
And here’s why.
There are many benchmarks for rating the models, and various research groups decide for themselves which benchmarks are suitable. This makes objective comparison rather non-trivial.
Thanks to Hugging Face, there is a public leaderboard for open-source LLMs.
The leaderboard runs each model through 4 key benchmarks using the Eleuther AI Language Model Evaluation Harness. The results are aggregated, and each model receives a final score. Learn more about the evaluation process in the accompanying blog article.
The leaderboard can be filtered by various parameters, such as model size, quantization method, whether it’s a basic or fine-tuned model, etc. In general, larger and fine-tuned models have a higher score.
Several fine-tuned Llama2 implementations lead the rating, but this can change at any time as the leaderboard is an open competition and anyone can submit their model for evaluation. For example, a new pretrained model Yi-34B was released in November 2023 and already achieved the highest scores among the base LLMs. This can encourage the development of even more advanced fine-tuned models.
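The aggregation itself is simple: the final leaderboard score is just the arithmetic mean of the per-benchmark scores. A minimal sketch (the benchmark names below match the four used at the time of writing, but the numbers are made up for illustration):

```python
def final_score(benchmark_scores):
    """The leaderboard's final score is simply the mean of the
    per-benchmark scores."""
    return sum(benchmark_scores.values()) / len(benchmark_scores)

# Illustrative (not real) accuracies on the four benchmarks
scores = {"ARC": 60.0, "HellaSwag": 84.0, "MMLU": 64.0, "TruthfulQA": 44.0}
print(final_score(scores))  # 63.0
```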
Let’s take open-source LLMs one by one to have a closer look into them!
Llama 2 is a family of large language models (LLMs) developed by Meta with 7 billion to 70 billion parameters. The models, which are available in both pre-trained and fine-tuned versions, have been optimized for text generation and dialogue use cases.
At the time of release, these models demonstrated superior performance on most benchmarks when tested against open-source chat models, and matched the effectiveness of certain popular closed-source models, such as GPT-3 and PaLM.
Llama 2 is designed for commercial and scientific use in English. However, the models must be used in compliance with the Acceptable Use Policy and Licensing Agreement for Llama 2.
To this day, Llama2 serves as a backbone for many new, fine-tuned models created by various researchers and enthusiasts.
- The models are auto-regressive and use an optimized transformer architecture.
- The models were trained on a new mix of publicly available online data, with the biggest model using Grouped-Query Attention for improved inference scalability. The training data for Llama 2 does not include Meta user data.
- The estimated carbon emissions of the model during the pre-training have been offset by Meta’s sustainability program.
- The fine-tuned models are perfect for an assistant-like chat.
- The pre-trained versions can be customized for a variety of natural language generation tasks.
- The base models are great for further research and fine-tuning. A detailed research paper is available.
Mistral-7B-v0.1 is a powerful LLM developed by Mistral AI, a young French start-up. Its first model, with 7.3 billion parameters, outperforms Llama 2 13B across all benchmarks and is competitive with Llama 1 34B on many fronts.
It uses a transformer architecture, featuring Grouped-Query Attention, Sliding-Window Attention, and a Byte-fallback BPE tokenizer. The Mistral model is designed for efficient operation and offers faster inference times and the ability to process longer sequences with fewer computing resources.
However, it should be noted that as a base model, it has no built-in moderation mechanisms.
- State-of-the-art performance that outdoes much larger models on various benchmarks.
- Efficient design with Grouped-query attention for faster inference and Sliding Window Attention for handling larger sequences.
- Versatility, demonstrated by good performance on both English and code-related tasks.
- Easy fine-tuning, demonstrated by the model's impressive performance when fine-tuned for chat applications.
- Open licensing, allowing unrestricted use under the Apache 2.0 license.
- Suitable for a wide range of applications, including Natural Language Understanding and Generation, Chatbots, and Code Generation.
- Its efficient design makes it a good choice for deployment in resource-constrained environments, and its superior performance makes it a strong contender for demanding NLP tasks.
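To make the Sliding-Window Attention idea concrete: each token attends only to itself and the previous `window - 1` tokens, instead of the full sequence. A minimal sketch of the resulting attention mask (the real model uses a window of a few thousand tokens; the tiny sizes here are for illustration only):

```python
def sliding_window_mask(seq_len, window):
    """Boolean causal attention mask in which position i may attend
    only to positions max(0, i - window + 1) .. i -- the idea behind
    Mistral's Sliding-Window Attention."""
    return [
        [max(0, i - window + 1) <= j <= i for j in range(seq_len)]
        for i in range(seq_len)
    ]

mask = sliding_window_mask(seq_len=5, window=3)
for row in mask:
    # "x" = may attend, "." = masked out
    print("".join("x" if m else "." for m in row))
```

Because each row stays a fixed width regardless of sequence length, attention cost grows linearly rather than quadratically, which is what enables longer sequences with fewer computing resources.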
Falcon LLM is a flagship series developed by the United Arab Emirates' Technology Innovation Institute (TII), a major global research center. It was developed using a custom data pipeline and a distributed training library and offers high performance on multiple Natural Language Processing (NLP) benchmarks.
Falcon-180B is the largest and most powerful open-access model currently available. Its inference-optimized architecture outperforms other models such as LLaMA-2, StableLM, RedPajama, and MPT. Additionally, the Falcon-7B and Falcon-40B models are smaller but state-of-the-art for their size.
- The Falcon-180B model was trained on several languages, including English, German, Spanish and French, with limited capabilities in Italian, Portuguese, Polish, Dutch, Romanian, Czech and Swedish. The smaller versions are also multilingual.
- The Falcon series features the special RefinedWeb dataset, a large-scale web dataset with stringent filtering and deduplication.
- Particularly promising in research on large language models, serving as the foundation for further specialization and fine-tuning;
- Fine-tuned versions are suitable for ready-to-use chat/instruct applications. For production use, users should assess the risks and build guardrails to ensure responsible use.
MosaicML presents several of its Pretrained Transformer (MPT) models designed to handle large-scale language tasks. The largest model, MPT-30B, is a decoder-style transformer embedded with some unique features.
It was trained on a large dataset of 1T tokens, which is a significant increase over the datasets used for similar models. MPT-30B is capable of fast and efficient training and inference thanks to FlashAttention and FasterTransformer. This model is specifically designed for convenient deployment on a single GPU, a feature that sets it apart from other models on the market.
MPT models utilize the MosaicML LLM codebase found in the llm-foundry repository and have been trained by MosaicML’s NLP team on the MosaicML platform.
- MPT-30B can handle an impressive 8k-token context window, which can be extended further by fine-tuning, and supports context-length extrapolation via ALiBi (Attention with Linear Biases).
- The smaller sister model mpt-7b-storywriter supports an astonishing 65k context window.
- MPT models are licensed to allow for possible commercial use.
- The MPT-7B-storywriter-65k+ is a variant specifically designed for reading and writing fictional stories with extremely long context lengths.
- An Instruct variant of MPT is suitable for following instructions and answering questions.
- Finally, the MPT-7B-Chat-8k variant is suitable for chatbot-like conversations with long messages.
The BigScience Language Open-science Open-access Multilingual (BLOOM) Language Model is an advanced autoregressive Large Language Model (LLM). It was developed by BigScience – a global collaboration of more than a thousand AI researchers – and trained on huge amounts of data using industrial computing resources (Jean Zay Public Supercomputer, provided by the French government).
The model is proficient in generating coherent text in 46 natural languages and 13 programming languages, with output that is often nearly indistinguishable from human-written text. BLOOM is also able to perform tasks it hasn't been explicitly trained for by casting them as text-generation tasks.
BLOOM can be used by researchers, educators, students, engineers, developers, and non-commercial organizations.
- BLOOM has a decoder-only architecture derived from Megatron-LM GPT2. The flagship model consists of a total of 176B parameters, 70 layers and 112 attention heads.
- The training data consists of 46 natural languages and 13 programming languages, all in 1.6TB of pre-processed text converted into 350B unique tokens.
- BLOOM is intended for public research on large language models.
- It can be used for various tasks, including text generation, exploring language characteristics, information extraction, question answering and summarization.
- It is one of the few models capable of multilingual text generation.
- Several fine-tuned BLOOMz models are able to follow human instructions with zero-shot prompting.
How to get started with an open-source LLM?
With so many affordable open-source models available, aren't your fingers itching to try them out already?
There are two main approaches to setting up and using open-source LLMs:
- The most traditional is to install everything locally. This requires a certain level of expertise. Also, the larger the model, the more difficult it is to meet the hardware requirements. The largest models require industrial-level equipment.
- Instead of hosting everything locally, it is also possible to rent a virtual server with dozens or even hundreds of gigabytes of video RAM. Some hosting providers have automated the process of model installation and deployment, so the entire setup requires just a few clicks and some waiting time.
How much RAM do I need to run an LLM?
To work, most LLMs need to be loaded into memory (RAM or GPU VRAM). How much memory is needed is a non-trivial question. In most cases, you can check the model card on the Hugging Face website, the GitHub repository, or the developer's website.
There are several factors to consider; you can read more about them on the Hugging Face website, which also offers a useful tool for estimating the hardware requirements of LLMs.
In any case, at least 16 GB of free RAM is required to run even small LLMs. Larger models require more memory. If you want the model to work faster, you’ll need the corresponding amount of GPU memory.
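As a rough back-of-the-envelope rule, the memory needed just to load a model is the parameter count times the bytes per parameter, plus some overhead for activations and the KV cache. A hedged sketch (the 20% overhead factor is an assumption, not a measured value; the real tools linked above are more precise):

```python
def estimate_memory_gb(n_params_billion, bytes_per_param=2, overhead=1.2):
    """Very rough memory estimate for loading an LLM.

    bytes_per_param: 4 (fp32), 2 (fp16/bf16), 1 (int8), 0.5 (4-bit).
    overhead: assumed ~20% extra for activations and KV cache.
    """
    return n_params_billion * bytes_per_param * overhead

# Mistral-7B (7.3B params) in fp16 vs. 4-bit quantization:
print(f"{estimate_memory_gb(7.3):.1f} GB")                      # 17.5 GB
print(f"{estimate_memory_gb(7.3, bytes_per_param=0.5):.1f} GB")  # 4.4 GB
```

This also shows why quantization matters so much in practice: dropping from fp16 to 4-bit cuts the memory footprint roughly fourfold, which is often the difference between needing a data-center GPU and fitting on a consumer card.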
Build your own LLM apps with n8n’s LangChain integration
If you think that running an open-source model is difficult, we've got great news: there are at least 3 easy ways to do this with n8n LangChain nodes:
- Run small Hugging Face models with a User Access Token completely for free.
- If you want to run larger models or need a quick response, try the new Hugging Face feature called Custom Inference Endpoints.
- If you have enough computing resources, run the model via Ollama (locally or self-hosted).
LangChain nodes make it easier to access open-source LLMs and give you handy tools for working with LLMs. Here’s a video from our latest LangChain community workshop with an overview of the most important aspects:
Getting started with LangChain and open-source LLMs in n8n
At the moment, the LangChain version of n8n is in beta mode, which is why you need to:
- Either create a new cloud account;
- Or install a special LangChain docker image for the self-hosted version.
After installation, you’ll see a new section with AI nodes in the n8n interface:
Now, let’s finally create our first workflow powered by the open-source LLM!
Build a simple conversation bot with open-source LLM
Once you have access to the LangChain version of n8n, let’s create a simple AI conversation bot powered by a completely free Mistral-7B-Instruct-v0.1 model.
- Select the new Manual Chat Message trigger as the first node of the workflow. A new Chat button appears next to Execute Workflow.
- Next, connect the Basic LLM Chain node. This LangChain node stores a prompt template, which lets you connect different model nodes interchangeably, as they always receive the same prompt, shown in the following screenshot.
- Finally, let’s connect the model node. Pick a Hugging Face Inference Model node and connect it to the Basic LLM Chain. Add new credentials and specify a User Access Token. This way you get immediate access to hundreds of small-scale LLMs hosted directly on the Hugging Face Hub. Configure the model as shown below:
- Increase the Frequency Penalty so that the model doesn’t get stuck repeating the same phrase;
- Enter a larger value in the Maximum Number of Tokens field. This way, the model will respond with complete sentences or even paragraphs of text;
- Adjust the Sampling Temperature so that it’s not too low and not too high. We have briefly explained the concept of the Sampling Temperature in our previous article with the GPT-3 workflow examples.
Now let's try out our workflow! Open the chat window and ask something.
Pretty impressive for a model with only 7B parameters, don't you think?
On the right side of the screenshot, you can see the log of the LangChain node: the input and output JSONs and their sequence. This is especially handy for debugging complex multi-step Agents that may call a model several times with different prompts.
In this article, we’ve given a brief introduction to open-source LLMs and explained how to get access to them.
We’ve also demonstrated the LangChain framework in n8n and provided a simple workflow that uses an open-source model hosted by Hugging Face.
So far we’ve only covered the tip of the iceberg. There are many other extremely useful LangChain nodes, such as Recursive Character Text Splitter for working with large documents, or Vector Stores, Embeddings and Vector Store Retrievers.