


The world's most powerful open-source MoE model is here, with Chinese-language capabilities comparable to GPT-4 and a price of roughly one percent of GPT-4-Turbo's
May 07, 2024, 04:13 PM
Imagine an artificial intelligence model that not only goes beyond traditional computing, but also delivers more efficient performance at a lower cost. This is not science fiction: DeepSeek-V2[1], the world's most powerful open-source MoE model, is here.
DeepSeek-V2 is a powerful mixture-of-experts (MoE) language model characterized by economical training and efficient inference. It has 236B total parameters, of which 21B are activated for each token. Compared with DeepSeek 67B, DeepSeek-V2 delivers stronger performance while saving 42.5% of training costs, reducing the KV cache by 93.3%, and raising the maximum generation throughput to 5.76 times.
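To make the figures above concrete, the short sketch below works out what they imply: the share of parameters active per token, and what remains of the KV cache and training cost after the quoted reductions. The constant and function names are illustrative and do not come from DeepSeek; the numbers are the ones quoted in this article.

```python
# Figures quoted in the article for DeepSeek-V2 (parameters in billions).
TOTAL_PARAMS_B = 236.0
ACTIVE_PARAMS_B = 21.0
KV_CACHE_REDUCTION_PCT = 93.3      # relative to DeepSeek 67B
TRAINING_COST_SAVING_PCT = 42.5    # relative to DeepSeek 67B


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the parameters that take part in computing one token."""
    return active_b / total_b


def remaining_share(reduction_pct: float) -> float:
    """Fraction that is left after a percentage reduction."""
    return 1.0 - reduction_pct / 100.0


active = active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
kv_left = remaining_share(KV_CACHE_REDUCTION_PCT)
cost_left = remaining_share(TRAINING_COST_SAVING_PCT)

print(f"active parameters per token: {active:.1%}")
print(f"KV cache remaining vs. DeepSeek 67B: {kv_left:.1%}")
print(f"training cost remaining vs. DeepSeek 67B: {cost_left:.1%}")
```

In other words, roughly 9% of the parameters are used for any single token, and the KV cache shrinks to about one fifteenth of its DeepSeek 67B size.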
DeepSeek is a company exploring the nature of artificial general intelligence (AGI) and is committed to integrating research, engineering and business.
The comprehensive capabilities of DeepSeek-V2
On the current mainstream large-model leaderboards, DeepSeek-V2 performs well:
- Chinese comprehensive ability (AlignBench) is the strongest among open-source models, and is in the same echelon as closed-source models such as GPT-4-Turbo and Wenxin 4.0
- English comprehensive ability (MT-Bench) is in the first echelon: it is on par with the strongest open-source model, LLaMA3-70B, and surpasses the strongest open-source MoE model, Mixtral 8x22B
- Ranks at the forefront in knowledge, mathematics, reasoning, programming and other benchmarks
- Supports a 128K context window
New model structure
The potential of AI is constantly being explored, and we can't help but ask: what is the key to advancing intelligence? DeepSeek-V2 gives its answer: the combination of an innovative architecture and cost-effectiveness.
"DeepSeek-V2 is an improved version. With 236B total parameters and 21B activated, it ultimately reaches the capability of a 70B~110B dense model. At the same time, its memory consumption is only 1/5 to 1/100 that of models at the same level. On an 8-GPU H800 machine, it can process more than 100,000 input tokens per second and produce more than 50,000 output tokens per second. This is not only a leap in technology, but also a revolution in cost control."
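The gap between total and activated parameters comes from sparse expert routing: for each token, a router picks a few experts and only those are evaluated. The sketch below shows generic top-k routing in plain Python. It is an illustration of the general MoE idea only, not DeepSeek-V2's actual architecture; the toy experts, the scores, and every function name are assumptions made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, router_scores, k):
    """Evaluate only the selected experts and mix their outputs."""
    chosen = route_top_k(router_scores, k)
    output = sum(weight * experts[i](x) for i, weight in chosen)
    return output, [i for i, _ in chosen]


# Hypothetical toy experts: each one just scales its input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
# Hypothetical router scores for a single token.
scores = [0.1, 2.0, -1.0, 1.5]

output, used = moe_forward(10.0, experts, scores, k=2)
print("experts used:", used)
print("mixed output:", round(output, 3))
```

Here only two of the four toy experts run for the token, which is the same mechanism, at a much smaller scale, that lets a model keep many parameters in total while activating a fraction of them per token.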
Today, with the rapid development of AI technology, the emergence of DeepSeek-V2 not only represents a technological breakthrough, but also heralds the popularization of intelligent applications. It lowers the threshold for AI and allows more companies and individuals to enjoy the benefits of efficient intelligent services.
Chinese capability vs. price
In terms of Chinese capability, DeepSeek-V2 is world-leading in the AlignBench ranking while offering a very competitive API price.
Model and paper are both open source
DeepSeek-V2 is not just a model; it is a key to a smarter world. It opens a new chapter in AI applications with lower cost and higher performance, and open-sourcing DeepSeek-V2 is the best proof of this belief. It will inspire more people to innovate and help advance the future of human intelligence together.
- Model weights: https://huggingface.co/deepseek-ai
- Open source address: https://github.com/deepseek-ai/DeepSeek-V2
As AI continues to evolve, how do you think DeepSeek-V2 will change our world? Let’s wait and see. If you are interested, you can visit chat.deepseek.com to personally experience the technological changes brought about by DeepSeek-V2.
References
[1] DeepSeek-V2: https://m.miracleart.cn/link/b2651c9921723afdfd04ed61ec302a6b