Addressing the Crisis of Trust in the AI Era, the State Internet Information Office Seeks Public Comment on Labeling Measures for AI-Generated Synthetic Content

  With the explosion of generative AI, content on social media is becoming harder and harder to distinguish from the real thing. It is no longer easy to be sure that a photo or video captures a genuine moment, and the old adage "seeing is believing" has been called into question.

  To respond, on September 14 the State Internet Information Office released the Measures for Labeling Artificial Intelligence-Generated Synthetic Content (Draft for Comment) (hereinafter the "Draft for Comment"), which further refines the approach to labeling AI-generated synthetic content on the basis of the Provisions on the Administration of Deep Synthesis of Internet Information Services, the Interim Measures for the Administration of Generative Artificial Intelligence Services, and other laws and regulations.

  "This is the first national-level rule on labeling AI-generated synthetic content anywhere in the world, and an important exploration in building an AI content governance mechanism. It is significant for guiding the orderly development and business rules of the AI content industry, and will help cultivate a healthy AI ecosystem," Wu Shenguo, associate professor at the Law School of Beijing Normal University and deputy director of a research center of the Internet Society of China, told CBN.

  Refining labels for generated content

  The Draft for Comment proposes that labels for AI-generated synthetic content include explicit labels and implicit labels. An explicit label is one added to the generated synthetic content or to the interface of an interactive scenario, presented as text, sound, graphics, or other forms that users can clearly perceive. An implicit label is one added by technical measures to the file data of the generated synthetic content, and is not easily perceived by users.

  Obligations differ across types of internet platforms. Internet application distribution platforms, for instance, must verify the labeling function, i.e., check whether a service provider offers the required labeling of generated synthetic content when an application is listed or undergoes review.

  Specifically, for network information service providers offering generation and synthesis services, Article 4 of the Draft for Comment proposes that where the deep synthesis services specified in Article 17, paragraph 1 of the Provisions on the Administration of Deep Synthesis of Internet Information Services may cause confusion or misidentification among the public, explicit labels must be added to the generated synthetic content.

  The circumstances listed in Article 17, paragraph 1 of the Provisions on the Administration of Deep Synthesis of Internet Information Services include: intelligent dialogue, intelligent writing, and other services that simulate natural persons to generate or edit text; voice synthesis, voice imitation, and other speech generation or editing services that significantly alter personal identity characteristics; and face generation, face swapping, face manipulation, gesture manipulation, and other image or video generation or editing services that significantly alter personal identity characteristics.

  How an explicit label is applied depends on the form of the content. For an image, a prominent label must be added in an appropriate position; for a video, a prominent label must be added at an appropriate position on the opening frame and around the playback area, and may also be added at the end of the video and at appropriate points in the middle.

  The Draft for Comment also requires network information service providers offering generation and synthesis services to add implicit labels to the metadata of generated synthetic content files. The implicit label must include production-element information such as the attribute of the content as generated or synthetic, the name or code of the service provider, and the content number. At the same time, platforms that disseminate network information content must take measures to regulate the dissemination of generated synthetic content, including providing the necessary labeling functions and reminding users to proactively declare whether published content contains generated synthetic content.
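  As an illustration of what such an implicit label might look like, the record below carries the three kinds of information the draft names (generated-content attribute, provider name or code, content number) as compact JSON suitable for embedding in file metadata. This is a minimal sketch: the draft does not prescribe a schema, so the field names and encoding here are assumptions.

```python
import json

def build_implicit_label(provider: str, content_id: str) -> str:
    """Build a hypothetical implicit-label record as compact JSON.

    The draft requires the label to carry the content's generated/synthetic
    attribute, the service provider's name or code, and a content number;
    the exact field names and encoding here are illustrative assumptions.
    """
    record = {
        "ai_generated": True,      # generated-content attribute
        "provider": provider,      # service provider name or code
        "content_id": content_id,  # content number, for traceability
    }
    return json.dumps(record, separators=(",", ":"), sort_keys=True)

def parse_implicit_label(payload: str) -> dict:
    """Decode a label previously written into file metadata."""
    return json.loads(payload)

label = build_implicit_label("ExampleAI-01", "2024-000123")
print(parse_implicit_label(label)["ai_generated"])  # True
```

  In practice the serialized record would be written into a format-specific metadata field (e.g., a PNG text chunk or EXIF tag) rather than printed.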

  Violations of these provisions that cause serious consequences through failure to label generated synthetic content will be penalized by the cyberspace administration and other relevant competent authorities in accordance with relevant laws, administrative regulations, and departmental rules.

  Chen Peng, vice president of NewOne Technology, told reporters that the Draft for Comment protects users' rights and interests: with explicit and implicit labels, users can more easily tell whether content is AI-generated, which helps safeguard their right to know and right to choose. The rules can also enhance the credibility of content and reduce the potential for misleading uses and abuse.

  Wu Shenguo told reporters that content labeling is a very important mechanism in current AI content governance. It has great value in averting the security risks posed by deepfakes, and unified standards can also guide the business rules, product design, and business models of all parties, which matters for risk prevention across the whole chain and helps the public judge the authenticity and authority of AI content.

  Peng Fei, head of large-model algorithms at Hanwang Technology, believes the introduction of these measures is highly significant and positive: they not only protect the legitimate rights and interests of citizens, legal persons, and other organizations, but also safeguard national security and the public interest.

  "For example, in terms of data security, explicit and implicit labeling improves data traceability and thereby strengthens data security; clearly labeling AI-generated content effectively prevents the spread of misleading information and reduces social problems caused by misuse of the technology. In terms of copyright management, the measures clarify the source and copyright attribution of AI-generated content, protecting creators' legitimate rights and interests." According to Peng Fei, these labels let users identify synthetic content more easily, so that such content is used in a regulated, responsible way and the spread of false information is reduced. The measures should prompt service providers across the industry to operate in a standardized manner and promote the healthy development and broad application of AI technology.

  Preventing AI-enabled crime

  "The gist of the Draft for Comment is that generated content such as images, videos, text, and voice must carry a label indicating it is AI-generated, to prevent malicious rumor-mongering and fraud that would poison public opinion, and to stop related crimes before they occur. In the past, some people have illegally used others' avatars or faces for synthesis, or committed fraud using AI voice conversion; vicious incidents like these should be put to an end," AI video producer Feng Bin told reporters.

  At the end of August, South Korea saw a string of criminal cases in which AI face-swapping was used to forge pornographic and explicit images of women. The perpetrators used deepfake technology to synthesize pornographic photos and videos and spread them in group chats on the messaging app Telegram, with as many as 220,000 participants, triggering public panic in South Korea; some called it another "Nth Room" scandal.

  Deepfake crimes of this kind are not confined to South Korea; in the past two years they have surfaced around the world, including in China. In June this year, the story "man used AI to forge nearly 7,000 nude photos of students and colleagues" shot to the top of Weibo's trending searches. According to CCTV News, the suspect used AI "one-click undressing" technology to deep-forge nearly 7,000 obscene images, then sold them at 1.5 yuan each, earning nearly 10,000 yuan. The images involved a large number of women, including students, teachers, and colleagues.

  Feng Bin said that a number of video platforms, including Douyin, Bilibili, Kuaishou, and WeChat Channels, already give uploaders a labeling option along the lines of "Author's statement: this content is AI-generated," but creators are not required to use it. In his observation, for videos not declared as AI content, the platforms will recognize some of them and display a field such as "suspected AI-generated."

  Feng Bin believes platforms should be stricter in identifying AI-generated content and reminding users. Compared with video and images, text should be harder for platforms to identify as AI-generated; and from the standpoint of dissemination, video is more often the carrier of "fake news," while AI face-swapping and AI voice, which can breed fraudulent content, also deserve particular attention.

  There is now a consensus at home and abroad on labeling and managing AI-generated content. In July last year, the U.S. White House reached an agreement with large technology companies to put more safeguards around AI development, including developing watermarking systems. The European Commission has also asked social media companies to label all AI-generated content.

  Technology companies are already trying to use technology to check AI abuse. In February this year, OpenAI launched an "AI-generated content classifier" designed to determine whether a text was generated by a computer or written by a human; it is essentially a classifier that distinguishes real content from AI-generated content. OpenAI noted in a blog post, however, that the classifier correctly detects only 26% of AI-written text.

  In May, at the Google I/O conference, Google announced an AI-generated image recognition tool that lets users upload an image of unknown origin, run a reverse image search, and learn when the image was first indexed by Google and the earliest website it appeared on. Google also announced that its own generative AI tools will include metadata and embed a watermark in each image to indicate that it is AI-generated rather than a real photo. Digital watermarking is seen as a potentially more effective method than AI-content classifiers.
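  To make the watermarking idea concrete, the toy sketch below embeds a payload into the least significant bits of raw pixel bytes, one classic (and fragile) watermarking technique. It is purely illustrative: production systems such as Google's use far more robust, proprietary schemes, and the function names here are invented for this example.

```python
def embed_watermark(pixels: bytes, mark: bytes) -> bytes:
    """Embed `mark` into the least significant bits of `pixels`.

    Toy least-significant-bit (LSB) watermarking: each payload bit
    replaces the low bit of one carrier byte. Real watermarking systems
    are far more robust; this only illustrates the general idea.
    """
    bits = [(byte >> i) & 1 for byte in mark for i in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("payload too large for carrier")
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # clear low bit, set payload bit
    return bytes(out)

def extract_watermark(pixels: bytes, length: int) -> bytes:
    """Read back `length` bytes from the low bits of `pixels`."""
    out = bytearray()
    for i in range(length):
        byte = 0
        for bit_index in range(8):
            byte = (byte << 1) | (pixels[i * 8 + bit_index] & 1)
        out.append(byte)
    return bytes(out)

carrier = bytes(range(256)) * 4        # stand-in for raw pixel data
marked = embed_watermark(carrier, b"AI")
print(extract_watermark(marked, 2))    # b'AI'
```

  A scheme like this survives lossless copying but is destroyed by compression or resizing, which is why practical watermarks spread the signal across the image instead.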

  Earlier, at the Build 2023 developer conference, Microsoft announced a feature that lets anyone check whether an image or video clip produced with Bing Image Creator or Microsoft Designer was generated by AI. The technology uses cryptographic methods to tag and sign AI-generated content with metadata about its origin.
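  The article does not detail Microsoft's scheme, but the core idea of cryptographically signed provenance metadata can be sketched with a simple signature check. The example below uses an HMAC purely for brevity; real provenance systems use public-key signatures and certificate chains, and the key and field names here are assumptions.

```python
import hashlib
import hmac
import json

# Illustrative only: real provenance systems use public-key signatures
# and certificate chains, not a shared HMAC key. The key and field
# names below are assumptions for this sketch.
SECRET_KEY = b"demo-signing-key"

def sign_provenance(metadata: dict) -> str:
    """Sign canonicalized provenance metadata."""
    payload = json.dumps(metadata, sort_keys=True).encode()
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()

def verify_provenance(metadata: dict, signature: str) -> bool:
    """Check that the metadata has not been altered since signing."""
    return hmac.compare_digest(sign_provenance(metadata), signature)

meta = {"generator": "ExampleImageTool", "ai_generated": True}
sig = sign_provenance(meta)
print(verify_provenance(meta, sig))                             # True
print(verify_provenance({**meta, "ai_generated": False}, sig))  # False
```

  The point of signing is tamper-evidence: anyone who strips or edits the "AI-generated" attribute invalidates the signature, which is exactly the property the draft's anti-tampering clause aims for.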

  Wu Shenguo said that, for both the micro-ecology on the business side and the macro-ecology of the state and the industry, the standards introduced by the labeling measures are "a great boost."

 

Attachment: Measures for Labeling Artificial Intelligence-Generated Synthetic Content (Draft for Comment)

State Internet Information Office (SIIO)

September 14, 2024

Measures for Labeling Artificial Intelligence-Generated Synthetic Content
(Draft for Comment)

Article 1 These Measures are formulated in order to promote the healthy development of artificial intelligence, regulate the labeling of AI-generated synthetic content, protect the legitimate rights and interests of citizens, legal persons, and other organizations, and safeguard the public interest, in accordance with the Cybersecurity Law of the People's Republic of China, the Provisions on the Administration of Algorithmic Recommendation of Internet Information Services, the Provisions on the Administration of Deep Synthesis of Internet Information Services, the Interim Measures for the Administration of Generative Artificial Intelligence Services, and other laws, regulations, and departmental rules.

Article 2 These Measures apply to network information service providers (hereinafter "service providers") that carry out labeling of AI-generated synthetic content in accordance with the Provisions on the Administration of Algorithmic Recommendation of Internet Information Services, the Provisions on the Administration of Deep Synthesis of Internet Information Services, and the Interim Measures for the Administration of Generative Artificial Intelligence Services.

These Measures do not apply to industry organizations, enterprises, educational and scientific research institutions, public cultural institutions, and relevant professional institutions that develop and apply AI generation and synthesis technologies without providing services to the domestic public.

Article 3 Artificial intelligence-generated synthetic content refers to text, images, audio, video, and other information produced, generated, or synthesized using artificial intelligence technology.

Labels for AI-generated synthetic content include explicit labels and implicit labels.

An explicit label is a label added to the generated synthetic content or to the interface of an interactive scenario, presented as text, sound, graphics, or other forms that users can clearly perceive.

An implicit label is a label added by technical measures to the file data of generated synthetic content, which users cannot easily perceive.

Article 4 Where a service provider offers generation and synthesis services falling under the circumstances of Article 17, paragraph 1 of the Provisions on the Administration of Deep Synthesis of Internet Information Services, it shall add explicit labels to the generated synthetic content in accordance with the following requirements:

(1) For text, add a label such as a text notice or a universal symbol at the beginning, end, or an appropriate point in the middle of the text, or add a prominent notice in the interface of the interactive scenario or around the text;

(2) For audio, add a label such as a voice notice or an audio rhythm cue at the beginning, end, or an appropriate point in the middle of the audio, or add a prominent notice in the interface of the interactive scenario;

(3) For images, add a prominent label at an appropriate position;

(4) For video, add a prominent label at an appropriate position on the opening frame and around the playback area; prominent labels may also be added at the end of the video and at appropriate points in the middle;

(5) When presenting a virtual scene, add a prominent label at an appropriate position on the opening frame; prominent labels may also be added at appropriate points during the ongoing service of the virtual scene;

(6) For other generation and synthesis service scenarios, add explicit labels with a prominent notice effect according to the characteristics of the application.

When a service provider offers downloading, copying, or export of generated synthetic content, it shall ensure that the files contain explicit labels that meet the requirements.

Article 5 Service providers shall, in accordance with Article 16 of the Provisions on the Administration of Deep Synthesis of Internet Information Services, add implicit labels to the metadata of generated synthetic content files. The implicit label shall include production-element information such as the attribute of the content as generated or synthetic, the name or code of the service provider, and the content number.

Service providers are encouraged to add implicit labels to generated synthetic content in the form of digital watermarks or other forms.

File metadata refers to descriptive information embedded in the file header according to a specific encoding format, used to record information such as the file's source, attributes, usage, and copyright.

Article 6 Service providers that provide network information content dissemination platform services shall take the following measures to regulate the dissemination of generated synthetic content:

(1) Verify whether the file metadata contains an implicit label; where it does, add a prominent notice in an appropriate way around the published content, clearly reminding users that the content is generated synthetic content;

(2) Where no implicit label is found in the file metadata but the user declares the content to be generated synthetic content, add a prominent notice in an appropriate way around the published content, reminding users that the content may be generated synthetic content;

(3) Where no implicit label is found in the file metadata and the user has not declared the content to be generated synthetic content, but the platform detects an explicit label or other traces of generation or synthesis, the content may be identified as suspected generated synthetic content, and a prominent notice shall be added in an appropriate way around the published content, reminding users that the content is suspected to be generated synthetic content;

(4) For content that is, may be, or is suspected to be generated synthetic content, add dissemination-element information such as the content attribute, the name or code of the dissemination platform, and the content number to the file metadata;

(5) Provide the necessary labeling functions and remind users to proactively declare whether the published content contains generated synthetic content.

Article 7 Internet application distribution platforms shall verify whether service providers offer the required labeling of generated synthetic content when applications are listed or undergo review.

Article 8 Service providers shall clearly specify in the user service agreement the method, style, and other norms for labeling generated synthetic content, and prompt users to read and understand the relevant labeling management requirements carefully.

Article 9 Where a user requires a service provider to supply generated synthetic content without explicit labels, the service provider may do so after clarifying the user's labeling obligations and responsibility for use through the user agreement, and shall retain the relevant logs for no less than six months.

Article 10 When uploading generated synthetic content to a service provider that provides network information content dissemination platform services, users shall proactively declare it and use the labeling function provided by the platform.

No organization or individual may maliciously delete, tamper with, forge, or conceal the generated synthetic content labels provided for in these Measures, provide tools or services for others to commit such malicious acts, or harm the legitimate rights and interests of others through improper labeling.

Article 11 Service providers shall apply labels in accordance with the requirements of the relevant mandatory national standards.

Article 12 When completing procedures such as algorithm filing and security assessment, service providers shall submit materials related to the labeling of generated synthetic content in accordance with these Measures, and shall strengthen the sharing of label information to support and assist in preventing and combating related illegal and criminal activity.

Article 13 Where these Measures are violated and failure to label generated synthetic content causes serious consequences, the cyberspace administration and other relevant competent authorities shall impose penalties in accordance with relevant laws, administrative regulations, and departmental rules.

Article 14 These Measures shall come into force on [month] [day], 2024.