Alpa is a system for training and serving gigantic machine learning models.
Alpa makes training and serving large models like GPT-3 simple, affordable, and accessible to everyone.
Free, Unlimited OPT-175B Text Generation
Warning: This model might generate offensive content. As a free service, no safety measures are in place.
Like the results? ⭐ Support Alpa development by starring Alpa on GitHub.
A language model is a probability distribution over sequences of words. It predicts the next word based on all the previous words. It is useful for a variety of AI applications, such as auto-completion in your email or chatbot services. For more information, check out the language model Wikipedia page.
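To make this concrete, here is a toy sketch (our own illustration, not part of any library) of a bigram language model in Python: it estimates the probability of the next word from counts of adjacent word pairs. Real language models like GPT-3 condition on much longer contexts using neural networks.

```python
from collections import Counter, defaultdict

# A toy bigram language model: estimate P(next word | previous word)
# by counting adjacent word pairs in a tiny corpus.
corpus = "the cat sat on the mat . the cat ate".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_word_distribution(prev):
    """Return P(next word | prev) as a dict of word -> probability."""
    total = sum(counts[prev].values())
    return {w: c / total for w, c in counts[prev].items()}

print(next_word_distribution("the"))  # e.g. {'cat': 0.666..., 'mat': 0.333...}
```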
GPT-3 is a very large language model, with 175 billion parameters, that uses deep learning to produce human-like text. Many researchers and news articles have described GPT-3 as "one of the most interesting and important AI systems ever produced". GPT-3 is gradually being adopted as a backbone in the latest NLP research and applications.
Due to its gigantic size, training and serving GPT-3 are very difficult and expensive, and pose significant challenges to the underlying software systems. The original GPT-3 trained by OpenAI is closed source and offered as a paid service: users have to pay for every token generated.
Right now we use random sampling, so every time you click "generate" the result might be different. The temperature controls how sharp the sampling distribution is: a lower temperature pushes the generator to pick tokens with higher scores from the model. Top-p sampling chooses from the smallest possible set of words whose cumulative probability exceeds the probability p; a small value of p prevents the model from choosing tokens with lower scores. See a more detailed description of sampling methods on this page from Hugging Face.
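To illustrate how these two arguments interact, here is a minimal standalone sketch (our own example, not Alpa's actual generation code) of temperature plus top-p sampling applied to a model's raw scores (logits):

```python
import numpy as np

def sample_token(logits, temperature=0.7, top_p=0.9, seed=None):
    """Sample a token id from logits using temperature and top-p (nucleus) sampling."""
    rng = np.random.default_rng(seed)

    # Temperature: lower values sharpen the distribution toward high-score tokens.
    scaled = logits / temperature
    probs = np.exp(scaled - np.max(scaled))  # subtract max for numerical stability
    probs /= probs.sum()

    # Top-p: keep the smallest set of tokens whose cumulative probability exceeds p.
    order = np.argsort(probs)[::-1]           # token ids, most to least likely
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1
    kept = order[:cutoff]

    kept_probs = probs[kept] / probs[kept].sum()  # renormalize over kept tokens
    return rng.choice(kept, p=kept_probs)

# Example: a made-up 5-token vocabulary with arbitrary scores.
logits = np.array([2.0, 1.5, 0.3, 0.1, -1.0])
print(sample_token(logits))
```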
This web interface exposes only three arguments for simplicity, although our backend supports a diverse set of generation techniques and arguments.
We are developing a RESTful API to expose the full set of arguments. Stay tuned. Meanwhile, if you want to try out different generation techniques and hyperparameters now, you can set up your own OPT-175B service using Alpa, starting from here.
At a high level, Alpa is more automatic, scalable, and cost-effective than existing systems.
In more detail, if you are an ML developer or data scientist looking for a system that can train or serve large models like GPT-3, Alpa provides state-of-the-art performance while requiring the least amount of systems expertise to set up. Alpa also makes it possible to train or serve large models on older generations of (hence cheaper) GPUs, such as the 40GB A100, V100, T4, M60, etc., which are common in many in-house clusters and more accessible to many people.
If you are a system developer aiming to build better training or serving systems, Alpa, as a compiler, offers the most flexibility to try out various ML parallelization methods (inter- and intra-operator parallelism) and the richest coverage of big model architectures (GPT-3, MoE, WideResNet, etc.). Alpa might be a good starting point for your prototyping.
If you are an amateur in ML/NLP/systems, well 😛, you can play with OPT-175B inference for free, while all existing services charge you for each token generated.
It depends on which type of GPUs you use. A hard constraint right now is that the total GPU memory in the cluster needs to be greater than 350GB in order to successfully run the model inference. Many existing training or serving systems rely on the latest generations of GPUs with the largest memory capacity, such as the 80GB A100. In contrast, Alpa, thanks to its more powerful backend, can serve OPT-175B with more flexible parallelisms on older generations of GPUs, such as the 40GB A100, V100, T4, M60, etc.
For example, if you choose to use 16GB V100 GPUs, you would need ceil(350 / 16) = 22 V100 GPUs to run the service.
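That back-of-the-envelope arithmetic, sketched in Python (the per-GPU memory sizes below are illustrative and depend on the exact GPU variant):

```python
import math

MODEL_MEMORY_GB = 350  # total GPU memory needed for OPT-175B inference

# Illustrative per-GPU memory capacities.
gpu_memory_gb = {"A100-40GB": 40, "V100-16GB": 16, "T4-16GB": 16, "M60-8GB": 8}

for gpu, mem in gpu_memory_gb.items():
    print(f"{gpu}: need at least {math.ceil(MODEL_MEMORY_GB / mem)} GPUs")
# V100-16GB prints 22, matching the example above.
```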
We are working on a feature to enable serving models even if you do not have enough GPU memory. Stay tuned.
Alpa does not require the latest generation of GPUs (such as the 80GB A100), which reduces the machine cost. With that, we leverage older generations of hardware provided by our sponsors: MBZUAI and the Sky Lab at UC Berkeley.
If you are interested in any form of donation or sponsorship to help the development of Alpa, please get in touch with the Alpa authors in the Alpa Slack.