AI alignment

In the field of artificial intelligence (AI), AI alignment research aims to steer AI systems toward the goals, preferences, or ethical principles that humans intend. An AI system is considered aligned if it advances the intended objectives. A misaligned AI system pursues some objectives, but not the intended ones.^[1]

Aligning an AI system can be challenging for AI designers because it is difficult to specify the full range of desired and undesired behaviors. To sidestep this difficulty, they typically use simpler proxy goals, such as gaining human approval. However, this approach can create loopholes, overlook necessary constraints, or reward the AI system for merely appearing aligned.^[2]

Misaligned AI systems can malfunction or cause harm. AI systems may find loopholes that let them achieve their proxy goals efficiently but in unintended, sometimes harmful ways (reward hacking).^[4] They may also develop unwanted instrumental strategies, such as seeking power or survival, because such strategies help them achieve their goals.^[4]^[5] Furthermore, they may develop undesirable emergent goals that can be hard to detect before the system is deployed, at which point it faces new situations and data distributions.^[6]
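The gap between a proxy goal and the intended goal can be illustrated with a toy example. The sketch below is purely hypothetical and not taken from any cited source: an imagined cleaning agent with a limited effort budget is scored either by a proxy (how many messes a sensor can no longer see) or by the intended objective (how many messes were actually removed). The action names, costs, and budget are all assumptions chosen for illustration.

```python
from itertools import product

# Hypothetical actions: (effort cost, hidden from the sensor?, actually removed?)
ACTIONS = {
    "clean": (3, True, True),     # expensive, really removes the mess
    "cover": (1, True, False),    # cheap, only hides the mess from the sensor
    "ignore": (0, False, False),  # free, changes nothing
}


def evaluate(plan, effort_budget):
    """Return (proxy_reward, true_reward), or None if the plan exceeds the budget."""
    cost = sum(ACTIONS[action][0] for action in plan)
    if cost > effort_budget:
        return None
    proxy_reward = sum(ACTIONS[action][1] for action in plan)  # messes no longer visible
    true_reward = sum(ACTIONS[action][2] for action in plan)   # messes actually removed
    return proxy_reward, true_reward


def best_plan(num_messes, effort_budget, objective):
    """Exhaustively search for a feasible plan maximizing one objective.

    objective=0 optimizes the proxy reward, objective=1 the true reward.
    Ties are broken in favor of the first plan found.
    """
    best = None
    for plan in product(ACTIONS, repeat=num_messes):
        scores = evaluate(plan, effort_budget)
        if scores is None:
            continue
        if best is None or scores[objective] > best[1][objective]:
            best = (plan, scores)
    return best


if __name__ == "__main__":
    for name, objective in (("proxy", 0), ("intended", 1)):
        plan, (proxy_reward, true_reward) = best_plan(4, 6, objective)
        print(f"optimizing {name}: plan={plan} proxy={proxy_reward} true={true_reward}")
```

With four messes and an effort budget of 6, the plan that maximizes the proxy hides every mess but removes only one, while the plan that maximizes the intended objective removes two and scores lower on the proxy. The optimizer is not malfunctioning in either case; the divergence comes entirely from what it was asked to maximize.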

Today, these problems affect existing commercial systems such as language models,^[7]^[8] robots,^[9] autonomous vehicles,^[10] and social media recommendation engines.^[7]^[11]^[12] Some AI researchers argue that more capable future systems will be more severely affected, since these problems partly result from the systems being highly capable.^[13]

Many leading AI scientists, such as Geoffrey Hinton and Stuart Russell, argue that AI is approaching superhuman capabilities and could endanger human civilization if it is misaligned.^[14]^[15]

AI alignment is a subfield of AI safety, the study of how to build safe AI systems. Other subfields of AI safety include robustness, monitoring, and capability control.^[16] Research challenges in alignment include instilling complex values in AI, avoiding deceptive AI, scalable oversight, auditing and interpreting AI models, and preventing emergent AI behaviors such as power-seeking.^[16] Alignment research has connections to interpretability research,^[17] (adversarial) robustness, anomaly detection, calibrated uncertainty,^[17] formal verification,^[18] preference learning,^[19]^[20] safety-critical engineering, game theory,^[21] algorithmic fairness,^[22] and the social sciences,^[23] among others.

References

[1] Russell, Stuart J. Artificial intelligence: A modern approach. 4th ed. Pearson, 2020, pp. 31–34. ISBN 978-1-292-40113-3. OCLC 1303900751.
[2] Russell, Stuart J. Artificial intelligence: A modern approach. 4th ed. Pearson, 2020, pp. 31–34. ISBN 978-1-292-40113-3. OCLC 1303900751.
[3] Wiggers, Kyle. "Falsehoods more likely with large language models". VentureBeat, 20 September 2021. Archived from the original on 4 August 2022. [Retrieved 23 July 2022].
[4] Russell, Stuart J. Artificial intelligence: A modern approach. 4th ed. Pearson, 2020, pp. 31–34. ISBN 978-1-292-40113-3. OCLC 1303900751.
[5] Russell, Stuart J. Human compatible: Artificial intelligence and the problem of control. Penguin Random House, 2020. ISBN 9780525558637. OCLC 1113410915.
[6] Christian, Brian. The alignment problem: Machine learning and human values. W. W. Norton & Company, 2020. ISBN 978-0-393-86833-3. OCLC 1233266753. Archived 10 February 2023 at the Wayback Machine.
[7] Bommasani, Rishi; Hudson, Drew A.; Adeli, Ehsan; Altman, Russ; Arora, Simran. Stanford CRFM, 12 July 2022. arXiv: 2108.07258.
[8] Zaremba, Wojciech. "OpenAI Codex". OpenAI, 10 August 2021. Archived from the original on 3 February 2023. [Retrieved 23 July 2022].
[9] Kober, Jens; Bagnell, J. Andrew; Peters, Jan. The International Journal of Robotics Research, 32, 11, 1 September 2013, pp. 1238–1274. DOI: 10.1177/0278364913495721. ISSN: 0278-3649. [Retrieved 12 September 2022].
[10] Knox, W. Bradley; Allievi, Alessandro; Banzhaf, Holger; Schmitt, Felix; Stone, Peter. Artificial Intelligence, 316, 1 March 2023, p. 103829. DOI: 10.1016/j.artint.2022.103829. ISSN: 0004-3702.
[11] Russell, Stuart J. Human compatible: Artificial intelligence and the problem of control. Penguin Random House, 2020. ISBN 9780525558637. OCLC 1113410915.
[12] Stray, Jonathan. International Journal of Community Well-Being, 3, 4, 2020, pp. 443–463. DOI: 10.1007/s42413-020-00086-3. ISSN: 2524-5295. PMC: 7610010. PMID: 34723107.
[13] Russell, Stuart. Artificial Intelligence: A Modern Approach. Prentice Hall, 2009, p. 1010. ISBN 978-0-13-604259-4.
[14] Smith, Craig S. "Geoff Hinton, AI's Most Famous Researcher, Warns Of 'Existential Threat'". Forbes. [Retrieved 4 May 2023].
[15] Russell, Stuart J. Human compatible: Artificial intelligence and the problem of control. Penguin Random House, 2020. ISBN 9780525558637. OCLC 1113410915.
[16] Ortega, Pedro A. "Building safe artificial intelligence: specification, robustness, and assurance". DeepMind Safety Research – Medium, 27 September 2018. Archived from the original on 10 February 2023. [Retrieved 18 July 2022].
[17] Rorvig, Mordechai. "Researchers Gain New Understanding From Simple AI". Quanta Magazine, 14 April 2022. Archived from the original on 10 February 2023. [Retrieved 18 July 2022].
[18] Russell, Stuart; Dewey, Daniel; Tegmark, Max. AI Magazine, 36, 4, 31 December 2015, pp. 105–114. DOI: 10.1609/aimag.v36i4.2577. ISSN: 2371-9621. [Retrieved 12 September 2022].
[19] Wirth, Christian; Akrour, Riad; Neumann, Gerhard; Fürnkranz, Johannes. Journal of Machine Learning Research, 18, 136, 2017, pp. 1–46.
[20] Heaven, Will Douglas. "The new version of GPT-3 is much better behaved (and should be less toxic)". MIT Technology Review, 27 January 2022. Archived from the original on 10 February 2023. [Retrieved 18 July 2022].
[21] Clifton, Jesse. "Cooperation, Conflict, and Transformative Artificial Intelligence: A Research Agenda". Center on Long-Term Risk. Archived from the original on 1 January 2023. [Retrieved 18 July 2022].
[22] Prunkl, Carina. "Beyond Near- and Long-Term". In: Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society. New York NY USA: ACM, 7 February 2020, pp. 138–143. DOI: 10.1145/3375627.3375803. ISBN 978-1-4503-7110-0.
[23] Irving, Geoffrey; Askell, Amanda. Distill, 4, 2, 19 February 2019. DOI: 10.23915/distill.00014. ISSN: 2476-0757. [Retrieved 12 September 2022].
