Written by shirin anlen and Raquel Vazquez Llorente
WITNESS is driven by a deep belief in the power of audiovisual technologies to protect and defend human rights. Alongside our ongoing work to ground our understanding of AI and synthetic media harms in the realities faced by communities at the frontlines of mis- and disinformation, we recognize the potential of using generative AI and synthetic media tools (“AI tools”) to support human rights advocacy and social critique, but only with appropriate caution and ethical considerations.
Our Technology Threats and Opportunities Team is exploring some of the publicly available AI tools that can create or modify audiovisual content. For instance, text-to-image models, which pair natural language processing with image generation, can produce fully synthetic visual material from a text prompt. They can also allow users to add objects to, or remove them from, an AI-generated image or a real photo that the user uploads. Deep neural networks can recognize patterns in data to replace the style or likeness of one person with another, as well as to segment an image into multiple parts or regions based on specific criteria, such as color or object. AI voice cloning can mimic any target voice it is trained on.
Drawing from this experimentation and our global consultations, this piece outlines the scenarios where we see audiovisual generative AI having the most potential for human rights advocacy. The piece also reflects on the ethical issues that the use of certain technologies may raise for a human rights organization, and provides a list of questions to consider when applying these tools. We recognize that developments in this area of AI are happening rapidly, and our perspective may evolve with these advancements and be revised in the future. We hope that sharing some of our recommendations at this stage can help similar groups who may be developing their own internal policies. We welcome feedback at firstname.lastname@example.org.
Key considerations: Undermining human rights documentation, transparency in production, consent, and context expectations
The use of AI to create or edit media should not undermine the credibility, safety and content of other human rights organizations, journalists, fact checkers and documentation groups. In the global consultations we have been leading since 2018, we have consistently heard from communities at the frontlines of human rights defense about how their content and credibility are constantly undermined. The advent of synthetic media has made it easier to dismiss real footage. As tools to make AI-generated or edited images, videos and audio become more accessible, it will only become easier for governments and companies to claim that damaging footage is fake. When generating or modifying visual content with AI, it is important to think about the role of global human rights organizations in setting standards and using tools in a way that does not cause collateral harm to smaller, local groups who face much more extreme pressures. These groups are already overburdened trying to defend their footage or challenge false information, and are repeatedly targeted by governments seeking to discredit them.
AI output should be clearly labeled and watermarked, and creators should consider including metadata or invisible fingerprints to track the provenance of media. When publishing content that is generated or manipulated using AI, its use should always be disclosed. For disclaimers or cues to be effective, they need to be legible, meaning they can be seen, read, heard, or understood by those consuming the information. We strongly advocate for a more innovative and principled approach to content labeling that can express complex ideas and provide audiences with meaningful context on how the media has been created or manipulated. In the best cases, these transparency approaches should also communicate the intent behind the use of AI. When using open source tools, invisible tracemarks (e.g., machine-readable fingerprints and detailed metadata) can provide higher levels of transparency and the ability to track the provenance of a piece of media. At the moment, however, implementing these requires advanced technical knowledge.
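As a rough illustration of the machine-readable fingerprints and metadata mentioned above, the following Python sketch computes a cryptographic hash of a media file's bytes and stores it in a small disclosure record. The function names (`build_provenance_record`, `matches_record`) and the field names (`ai_generated`, `tool`, `disclosure`) are hypothetical and chosen only for illustration. This is not an implementation of any published provenance standard, and a hash kept alongside a file is not an invisible watermark: it only lets someone check that the bytes have not changed.

```python
import hashlib
import json


def build_provenance_record(media_bytes, tool, disclosure):
    """Return a small, machine-readable disclosure record for a media file.

    All field names are illustrative assumptions, not a published standard.
    """
    return {
        # SHA-256 digest acts as a fingerprint of the exact bytes published.
        "sha256": hashlib.sha256(media_bytes).hexdigest(),
        "ai_generated": True,
        "tool": tool,
        # Human-readable label to show alongside the media.
        "disclosure": disclosure,
    }


def matches_record(media_bytes, record):
    """Check whether the bytes still match the fingerprint in the record."""
    return hashlib.sha256(media_bytes).hexdigest() == record["sha256"]


# Placeholder bytes standing in for an image or video file.
original = b"example media bytes"
record = build_provenance_record(
    original,
    tool="hypothetical-image-generator",
    disclosure="This image was generated with AI to illustrate a testimony.",
)
print(json.dumps(record, indent=2))
print(matches_record(original, record))
print(matches_record(original + b"edited", record))
```

A fingerprint of this kind stops matching as soon as a file is re-encoded or cropped, which is one reason a visible, legible label remains necessary alongside any machine-readable record.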
A careful approach to consent is critical in AI audiovisual content. There are a few exceptions, which can be assessed by asking who the represented subjects are, what groups they belong to, and what the intent of the content is. For instance, political satire may not require consent from the subject, as it aims to challenge the power dynamics of public figures, or to criticize and highlight the absurdities of systems that reinforce inequality. Human rights organizations can draw from existing guidelines about informed consent in visual content, as well as good practices in dealing with situations or populations that require special attention: for instance, footage involving minors, people with mental disabilities, people under coercive contexts, or perpetrators of abuse.
Expectations about the veracity and the extent of manipulation in the visual content will depend on the context in which it is produced and the genre of the footage. For instance, narrative and fiction films on human rights may lend themselves to more artistic expressions that can help people connect with difficult topics. In contrast, factual human rights reporting that aims to expose abuses carries implicit assumptions of accuracy, veracity and realism. Some of these expectations may also evolve over time, particularly when footage is revisited or reclaimed for purposes other than the one for which it was created. This is why transparency in the creation and modification of visual content, as well as a careful approach to consent, are both critical to avoid contributing to mis- and disinformation and harming communities and individuals.
Potential use cases and their ethical considerations: our current thinking
1) Identity protection
AI can be used to anonymize individuals or protect their identities, as long as certain conditions are met. Filters such as blurring, pixelization, or voice alterations applied to individuals or places can help create digital disguises for images and audio. Similarly, creative applications can both protect individuals and engage an audience, like the deepfake methods employed in the documentary Welcome to Chechnya, and other techniques such as video-to-video models, AI avatars driven by real videos, and voice clones. These tools can be useful, but only when they respect the principles of dignity, transparency, and consent discussed above.
Yet the use of AI techniques can produce dehumanizing results, which should be avoided. Current AI tools often generate results that amplify social, racial, and gender biases, and produce visual errors that depict deformed human bodies. Any process that uses AI for identity protection should always have careful human curation and oversight, along with a deep understanding of the community and audience it serves.
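To make the simpler filters mentioned above more concrete, here is a minimal Python sketch of pixelization: each square tile of an image is replaced by its average value, which discards the detail inside the tile. The function name `pixelate`, the grayscale list-of-rows image format, and the sample values are assumptions made for illustration; real tools operate on color images and video frames, and this sketch does not represent any particular product.

```python
def pixelate(image, block):
    """Pixelate a grayscale image given as a list of rows of ints (0-255).

    Each block x block tile is replaced by its average value, which discards
    the fine detail inside the tile. The input image is left unchanged.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = [row[:] for row in image]
    for top in range(0, height, block):
        for left in range(0, width, block):
            # Tiles at the right and bottom edges may be smaller than block.
            rows = range(top, min(top + block, height))
            cols = range(left, min(left + block, width))
            values = [image[r][c] for r in rows for c in cols]
            average = sum(values) // len(values)
            for r in rows:
                for c in cols:
                    result[r][c] = average
    return result


# A tiny 4x4 "image" used only for illustration.
sample = [
    [0, 10, 200, 210],
    [20, 30, 220, 230],
    [100, 100, 50, 50],
    [100, 100, 50, 50],
]
for row in pixelate(sample, 2):
    print(row)
```

A larger `block` removes more detail, but a filter like this is not guaranteed to prevent re-identification in practice, since context elsewhere in the frame or across video frames can still reveal who is depicted. Human review of the result remains essential.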
Questions to help guide the use of AI for identity protection:
- Who are the represented subjects and what groups do they belong to?
- What is the intent of the content?
- Is there informed consent from the individual/s depicted or involved in the audiovisual content? If not, why?
- If the use of AI for protecting identity is not obvious (e.g. if using voice cloning), is the modification clearly disclosed to the audience?
- Could the masking technique be reversed and reveal the real identity of an individual or the image behind it?
- Does the resulting footage preserve the dignity of the individual/s?
- Does the resulting footage inadvertently or directly reinforce biases existing in generative AI datasets?
2) Visualizing testimonies of survivors, victims, and witnesses, or reconstructing places and events from statements
Expressive and artistic approaches to visualize testimonies can be ethical and appropriate, as long as certain conditions are met. There is a rich history of animations and alternative documentary storytelling forms, and AI can help advance audiovisual forms that effectively convey stories and engage audiences for advocacy purposes. AI tools such as text-to-image, text-to-video, and frame interpolation can be used to generate visuals for audio and text-based testimonies that lack images and video, in order to convey the underlying emotions and subtexts of the experience. However, it should be clearly stated that these visuals are meant to be an artistic expression rather than a strict word-for-word representation.
Under certain circumstances, AI can be used to portray and reconstruct places for advocacy purposes. When physical places are inaccessible for security reasons or have been destroyed, AI techniques such as volumetric scene synthesis or image-to-video generation can enable audiences to visualize a site of historical importance or the circumstances experienced by a certain community in a given place (e.g. detention conditions). Similarly, they can help us imagine alternative realities and futures, like environmental devastation. However, in these instances, it should be noted how these visuals were generated, and the use of AI should always match the advocacy objectives and be grounded in verified information.
The use of AI to “resurrect” deceased people requires a careful approach and should never be macabre. Bringing deceased individuals “back to life” raises many ethical challenges around consent, exploitation of the dead, and re-traumatization of the surviving family and community. Using AI to generate lifelike representations from someone’s likeness (i.e. image and/or audio) may replicate the harm and abuse that the individual or their community suffered in the first place. On the other hand, the careful use of AI can help represent alternative realities or bring back someone’s message, and have a powerful advocacy effect. When using AI tools for these purposes, it is critical to consider the legal implications; incorporate strict consent by the next of kin, community, or others depending on the culture, context, and risks; think about respect for the memory of the individual in the curation and creation process; and clearly disclose the use of AI.
AI should not be used to generate humans and events where real-life footage exists and can be obtained. We distinguish using AI to apply specific effects or alterations to a face, figure, or voice from using AI to generate visuals of an entire critical event for which actual footage exists. Photojournalists, human rights defenders and documentation teams put their safety at risk to cover events and collect valuable information. Their work can expose abuses, gather evidence of crimes, or connect with audiences. In a world where mis- and disinformation are rampant, using AI to generate visual content takes us further away from real evidence and undermines our ability to fight against human rights violations and atrocities. Importantly, in these situations, audiences expect to receive real information about real events.
AI should not be used to autonomously edit the words or modify the tone of survivors’, victims’, and witnesses’ testimonies. When using written or visual testimonies for advocacy purposes, applying AI to edit the material can introduce errors and changes in tone and meaning. Interpreting these testimonies requires a level of sensitivity and comprehension of the subject matter and the purpose of the material that is not within AI’s capabilities.
Questions to help guide the use of AI for visualizing testimonies or reconstructing places and events from statements:
- Is there real-life information that can be used to explain an event? If not, would the use of AI undermine the credibility of those trying to collect and share information about that particular situation and, more generally, of other human rights organizations, activists, and civil society trying to share trustworthy information?
- What is the purpose of reconstructing a particular site, and what information about it is available?
- What level of accuracy would the people watching the visualization or reconstruction expect? Is there a way to communicate existing gaps between factual information and the produced outcome?
- Is the use of AI techniques and tools clearly disclosed, whether it applies to part or all of the reconstruction?
- Have the factual accuracy and tone of the output been checked by a human, or can they be?
3) Satire and artistic expression
The use of AI for satire and parody can be powerful, especially when the intent is to open up conversations about inequality, injustice, and other power dynamics. WITNESS has focused extensively on questions of how to balance countering mis- and disinformation while respecting freedom of opinion and expression. As such, it is important to protect the satirical and artistic use of synthetic media and generative AI in political and social contexts in line with international human rights standards (as discussed in the Just Joking! report, produced in collaboration with the Co-Creation Studio at MIT). In these applications, clear labeling, watermarking, or disclosure of how media is made and altered is key to preventing harmful content from being presented after the fact as satire, and to limiting the damage when satirical content loses its context and is repurposed maliciously. Using AI for these purposes also forces us to think about our conception of digital dignity and likeness rights, as well as our understanding of consent.
Questions to help guide the use of AI for satirical and artistic expression:
- How easy is it to edit out the watermark?
- What is the risk of ‘context collapse’ (i.e. if the photo or video is circulated out of context)? What are the ways in which the audience can trace the provenance of the content back to the creator/s, and/or the AI tool that was used to generate the image or video (e.g. invisible fingerprints)? Can this provenance metadata put someone at risk?
- Is the content limited to depicting a public figure, or does it also portray other people? If the latter, how was the footage obtained, and is there consent for using their likeness or voice in AI-generated or manipulated content?
- How does the content fare against international human rights law and other recognized standards that address freedom of expression, dignity and privacy?
Since WITNESS’ founding in 1992, our story has been one of ceaseless innovation. We care deeply about supporting communities and grassroots movements to share and create trustworthy information that contributes to accountability and justice. Our Technology Threats and Opportunities Team engages early on with emerging technologies that have the potential to enhance or undermine our trust in audiovisual content. As such, we are approaching the use of generative AI with extreme caution. Importantly, there are many questions that these tools have yet to answer, in particular with regard to copyright and artists’ agency over their content (for instance in the use of stock photos), or transparency in disclosing how media is created or modified. Similarly, we have heard in our consultations that the privacy and security of those depicted in the training datasets are increasing concerns for organizations posting real human rights footage online, and that this merits further research into how “do not train” tags can be incorporated into publicly available data.
Given the significant social impact anticipated from generative AI and the lack of regulation addressing its risks and harms on a global scale, we must ensure we understand the ethical challenges this technology poses, and our role in addressing them. Otherwise, we risk devaluing and undermining the credibility of visual content and of those who have the most to lose.
Published on June 28th, 2023