TeleChat


The 7-billion-parameter large language model based on the Transformer architecture launched by China Telecom has strong natural language understanding and generation capabilities, and is applicable to multiple AI application scenarios such as intelligent dialogue and text generation.

Location: China
Language: zh
Collection time: 2024-06-03

TeleChat-12B is the open-source Xingchen (Star) semantic large model from the Artificial Intelligence Research Institute of China Telecom, and it has been significantly improved in content, performance, and application compared with the previous TeleChat-7B version.

Open source timeline

  • 2024.5.16 Open-sourced the optimized 12B chat model TeleChat-12B-V2
  • 2024.3.20 Open-sourced the 12B chat model and its quantized versions
  • 2024.1.11 Open-sourced the 1T Chinese dataset
  • 2024.1.10 Open-sourced the 7B chat model and its quantized versions

Model parameters and training data

  • Parameter scale: TeleChat-12B has 12 billion parameters, a significant increase over TeleChat-7B's 7 billion.
  • Training data: TeleChat-12B increases the amount of training data from the 1.5T used for the 7B version to 3T, significantly improving both data quality and model performance.

Model Structure and Optimization

  • Decoupling the word embedding layer from the output layer: TeleChat-12B decouples the word embedding layer from the output layer, keeping the parameters of the word embedding layer and the output lm_head layer separate, which helps improve training stability and convergence (see the sketch after this list).
  • Model structure optimization: TeleChat-12B used small-scale models to experiment with combinations of multiple model structures and select the optimal one, further improving model performance.
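
The decoupled (untied) embedding/output structure mentioned above can be shown with a minimal PyTorch sketch. This is a generic illustration, not TeleChat's actual code; the class and parameter names (TinyDecoupledLM, vocab_size, hidden_size) are made up for the example.

```python
# Minimal illustration of decoupling the input word-embedding layer from the
# output (lm_head) projection, as opposed to weight tying.
# Illustrative only; NOT TeleChat's actual implementation.
import torch
import torch.nn as nn

class TinyDecoupledLM(nn.Module):
    def __init__(self, vocab_size=32000, hidden_size=512):
        super().__init__()
        # Input word-embedding layer: its own parameter matrix.
        self.embed_tokens = nn.Embedding(vocab_size, hidden_size)
        # Stand-in for the Transformer decoder stack.
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(hidden_size, nhead=8, batch_first=True),
            num_layers=2,
        )
        # Output lm_head: a separate parameter matrix, NOT tied to embed_tokens.
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)
        # A weight-tied model would instead do:
        #   self.lm_head.weight = self.embed_tokens.weight

    def forward(self, input_ids):
        hidden = self.backbone(self.embed_tokens(input_ids))
        return self.lm_head(hidden)  # logits over the vocabulary

model = TinyDecoupledLM()
logits = model(torch.randint(0, 32000, (1, 16)))
print(logits.shape)  # torch.Size([1, 16, 32000])
```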

Training Methods and Results Enhancement

  • Scientific data mixing and curriculum learning: During training, TeleChat-12B uses a scientific data-mixture and curriculum-learning approach, fitting a small-parameter model over multiple data ratios and dynamically raising the weights of harder-to-learn datasets so that the model fits all datasets well (a generic sketch of this idea follows this list).
  • Effectiveness improvements: Compared with TeleChat-7B, TeleChat-12B achieves an overall improvement of about 30% in content understanding, performance, and application scenarios, and improves by more than 40% in areas such as multi-round dialogue reasoning and safety.
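
The idea of dynamically boosting harder-to-learn datasets can be sketched as loss-based mixture reweighting. This is a generic illustration under assumed numbers, not China Telecom's published training recipe; the dataset names and losses are hypothetical.

```python
# Illustrative sketch of loss-based data-mixture reweighting: datasets whose
# recent average training loss stays high ("harder to learn") get a larger
# sampling weight in the mix. Generic illustration, not TeleChat's actual recipe.
import math
import random

def reweight_mixture(avg_losses, temperature=1.0):
    """Map per-dataset average losses to sampling weights (softmax of loss / T)."""
    exps = {name: math.exp(loss / temperature) for name, loss in avg_losses.items()}
    total = sum(exps.values())
    return {name: v / total for name, v in exps.items()}

# Hypothetical per-dataset losses, e.g. measured with a small proxy model.
avg_losses = {"web_zh": 2.1, "books": 1.6, "code": 2.8, "dialogue": 1.9}

weights = reweight_mixture(avg_losses, temperature=0.5)
print(weights)  # "code" has the highest loss, so it receives the largest weight

# Pick the source dataset for the next training batch according to the weights.
names, probs = zip(*weights.items())
print("next batch from:", random.choices(names, weights=probs, k=1)[0])
```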

Application Scenarios and Effects

  • Multi-scenario applications: TeleChat-12B has been applied to scenarios such as writing, code programming, network fault analysis, and business analysis. For example, in writing scenarios the generated text averages more than 1,500 words, with an effective adoption rate of 85.7%.
  • External services: In services for enterprise and public-institution customers, TeleChat-12B covers 95% of actual business requirements, and its accuracy in multi-round dialogue comprehension reaches 90%.

Localization Advancement

  • Support for domestic chips: TeleChat-12B supports int8 and int4 quantization as well as training and inference on domestic chips, further advancing full-stack localization of large models (a rough int8 loading sketch follows this list).
  • Cooperation and ecosystem: China Telecom and Huawei Ascend have jointly promoted full-stack localization of large models and have completed commercial deployments of models based on Ascend technology in several projects.
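
As a rough sketch of the int8 path mentioned above, the snippet below loads a causal-LM checkpoint with 8-bit weights via Hugging Face Transformers and bitsandbytes. The repo id "Tele-AI/TeleChat-12B" and the availability of a Transformers-compatible checkpoint are assumptions, and this is not a statement about the official deployment stack or the Ascend toolchain.

```python
# Sketch: loading a chat model with int8-quantized weights using Hugging Face
# Transformers + bitsandbytes (requires a CUDA GPU). The repo id below and the
# use of trust_remote_code are assumptions, not confirmed by this page.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Tele-AI/TeleChat-12B"  # assumed repo id; replace with the real one

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # int8 weights
    torch_dtype=torch.float16,
)

prompt = "用一句话介绍中国电信的TeleChat大模型。"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```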

TeleChat-12B has been comprehensively optimized and upgraded in parameter scale, training data, model structure, and training methodology, which significantly improves model performance and demonstrates strong capabilities across multiple application scenarios. At the same time, it actively promotes full-stack localization of large models, injecting new momentum into the development of the AI industry.
