Safety & Alignment – Search.AI.Wiki

DALL·E 3 system card

…

GPT-4V(ision) system card

…

Confidence-Building Measures for Artificial Intelligence: Workshop proceedings

…

Frontier Model Forum

We’re forming a new industry body to promote the safe and responsible development of frontier AI systems: advancing AI safety research, identifying best practices and standards, and facilitating information sharing among policymakers and industry.…

Frontier AI regulation: Managing emerging risks to public safety

…

Introducing Superalignment

We need scientific and technical breakthroughs to steer and control AI systems much smarter than us. To solve this problem within four years, we’re starting a new team, co-led by Ilya Sutskever and Jan Leike, and dedicating 20% of the…

Governance of superintelligence

Now is a good time to start thinking about the governance of superintelligence—future AI systems dramatically more capable than even AGI.…

Our approach to AI safety

Ensuring that AI systems are built, deployed, and used safely is critical to our mission.…

Planning for AGI and beyond

Our mission is to ensure that artificial general intelligence—AI systems that are generally smarter than humans—benefits all of humanity.…

How should AI systems behave, and who should decide?

We’re clarifying how ChatGPT’s behavior is shaped and our plans for improving that behavior, allowing more user customization, and getting more public input into our decision-making in these areas.…

Forecasting potential misuses of language models for disinformation campaigns and how to reduce risk

OpenAI researchers collaborated with Georgetown University’s Center for Security and Emerging Technology and the Stanford Internet Observatory to investigate how large language models might be misused for disinformation purposes. The collaboration included an October 2021 workshop bringing together 30 disinformation…

Our approach to alignment research

We are improving our AI systems’ ability to learn from human feedback and to assist humans in evaluating AI. Our goal is to build a sufficiently aligned AI system that can help us solve all other alignment problems.…

A hazard analysis framework for code synthesis large language models

…

AI-written critiques help humans notice flaws

We trained “critique-writing” models to describe flaws in summaries. Human evaluators find flaws in summaries much more often when shown our model’s critiques. Larger models are better at self-critiquing, with scale improving critique-writing more than summary-writing. This shows promise for…

Best practices for deploying language models

Cohere, OpenAI, and AI21 Labs have developed a preliminary set of best practices applicable to any organization developing or deploying large language models.…

Measuring Goodhart’s law

Goodhart’s law famously says: “When a measure becomes a target, it ceases to be a good measure.” Although originally from economics, it’s something we have to grapple with at OpenAI when figuring out how to optimize objectives that are difficult…

Lessons learned on language model safety and misuse

We describe our latest thinking in the hope of helping other AI developers address safety and misuse of deployed models.…

Aligning language models to follow instructions

We’ve trained language models that are much better at following user intentions than GPT-3 while also making them more truthful and less toxic, using techniques developed through our alignment research. These InstructGPT models, which are trained with humans in the…
