BLOOM


BLOOM is a large open-source multilingual language model developed by more than 1,000 researchers from over 60 countries and 250 institutions. It has 176B parameters, was trained on the ROOTS corpus, and supports 46 natural languages and 13 programming languages. Its goal is to advance research on and use of large language models by academics and small companies.

Language: en
Collection time: 2024-06-02

I. Basic information

  • Full name: BigScience Large Open-science Open-access Multilingual Language Model
  • Abbreviation: BLOOM
  • Positioning: Aims to give research labs in academia, non-profit organizations, and small companies better access to research on and use of large language models (LLMs).

II. Technical details

  1. Model type:
    • BLOOM is an open-source, decoder-only Transformer model.
    • Its architecture is a modified version of Megatron-LM GPT-2.
  2. Parameter scale:
    • BLOOM has 176B parameters, comparable in size to GPT-3 (175B).
  3. Training data:
    • The training set covers 46 natural languages and 13 programming languages, totaling 1.5TB of preprocessed text converted into 350B unique tokens.
    • The corpus, called ROOTS, draws on hundreds of sources across 59 languages (the 46 natural plus 13 programming languages).
  4. Training process:
    • Training took 117 days (March 11 to July 6, 2022) on the Jean Zay supercomputer near Paris, France.
    • Training cost more than 3 million euros, with compute resources provided by CNRS and GENCI.
  5. Performance:
    • BLOOM performed well on a variety of benchmarks and achieved better results after multitask prompted fine-tuning.
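As a rough sanity check on the 176B figure in item 2, you can approximate the parameter count of a decoder-only Transformer. Each layer contributes about 12·d² weights, and a vocabulary embedding matrix is added on top. This is a minimal sketch, not BLOOM's own code, and it rests on several assumptions:

- The layer count (70), hidden size (14336) and vocabulary size (250,880) come from the publicly released BLOOM configuration, not from this page.
- The input and output embeddings are assumed to be tied.
- Biases and LayerNorm weights are ignored.

```python
# Assumed hyperparameters (not stated on this page), taken from the
# public BLOOM-176B configuration: 70 layers, hidden size 14336,
# vocabulary size 250880.
N_LAYERS = 70
D_MODEL = 14336
VOCAB = 250_880


def approx_decoder_params(n_layers: int, d_model: int, vocab: int) -> int:
    """Rough parameter count for a decoder-only Transformer.

    Per layer: ~4*d^2 attention weights (Q, K, V and output projections)
    plus ~8*d^2 MLP weights (two d x 4d matrices), i.e. ~12*d^2.
    Plus one vocab x d embedding matrix, assumed tied with the output layer.
    Biases and LayerNorm parameters are ignored.
    """
    per_layer = 12 * d_model ** 2
    embeddings = vocab * d_model
    return n_layers * per_layer + embeddings


total = approx_decoder_params(N_LAYERS, D_MODEL, VOCAB)
print(f"~{total / 1e9:.1f}B parameters")  # prints ~176.2B parameters
```

The estimate comes out at about 176.2B, which is consistent with the stated 176B. Most of the total sits in the per-layer weights rather than the embeddings.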
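The "decoder-only" label in item 1 means each token can attend only to itself and to the tokens before it. That restriction is what allows the model to generate text left to right. The minimal sketch below shows the idea as a causal attention mask applied inside a softmax. It is a generic illustration in plain Python, not BLOOM's implementation, and the function names and toy scores are hypothetical.

```python
import math


def causal_mask(n: int) -> list[list[bool]]:
    """Return an n x n mask where mask[i][j] is True if position i may attend to j.

    In a decoder-only (causal) Transformer, token i only sees tokens 0..i.
    """
    return [[j <= i for j in range(n)] for i in range(n)]


def masked_softmax(scores: list[float], mask_row: list[bool]) -> list[float]:
    """Softmax over one row of attention scores, giving masked positions weight 0."""
    visible = [s for s, keep in zip(scores, mask_row) if keep]
    peak = max(visible)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) if keep else 0.0 for s, keep in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]


mask = causal_mask(4)
for row in mask:
    # "x" = may attend, "." = blocked (a future token)
    print(["x" if keep else "." for keep in row])

# Toy scores for the second token (index 1): only tokens 0 and 1 are visible,
# so the large scores at positions 2 and 3 receive zero weight.
weights = masked_softmax([0.5, 1.0, 3.0, 2.0], mask[1])
print([round(w, 3) for w in weights])
```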

III. Participants and organization

  • Participants: More than 1,000 researchers from more than 60 countries and 250 institutions took part in the BigScience project that produced BLOOM.
  • Sponsoring organization: BLOOM was released by BigScience, an open collaboration initiated by Hugging Face, GENCI, and IDRIS.

IV. Significance and impact

  • Democratizing access: BLOOM marks a significant step toward democratizing language model technology, making high-quality large language models accessible to and usable by more institutions and individuals.
  • International cooperation: BLOOM is not only a technical achievement but also a symbol of the power of international cooperation and collective scientific effort.

V. Version updates

  • Initial version: Released on May 19, 2022.
  • Latest version: As of June 2024, the latest version is 1.3, released on July 6, 2022.

VI. Application prospects

  • Multidisciplinary applications: BLOOM can be applied in fields such as natural language processing, social network analysis, and recommender systems.
  • Continuous optimization: As the technology advances and research deepens, BLOOM's performance and range of applications are expected to improve and expand further.
