Posts

NIST Framework: Thoughts

NIST Framework 1.0

To explore threat modelling for societal risks from AI, I read up on the NIST AI Risk Management Framework and made a baseline NIST User Profile for the use case of “LLM Public Use In Islamic Jurisprudence And Theology By Minors”: NIST AI Risk Management Framework 1.docx. This taught me what NIST is good at, what it's not so good at, and how we can improve it.

What is NIST good at? NIST is quite useful for identifying which actors exist at each stage of the AI lifecycle, and which risks pertain to each actor at each stage. This systematic evaluation helps you formulate and expand your thinking about the reach of your system. NIST also works quite well for risk analysis within a single, specific domain.

What is NIST not so good at? NIST does not provide guidance for analysing technologies that pose cross-domain risks, where each domain has a different risk severity and probability profile. The NIST cross-sectoral profile on Generative Artificial Intelligence highlights...
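As a concrete illustration of the structure NIST pushes you towards (actors per lifecycle stage, risks per actor, with per-domain severity and probability), here is a toy Python encoding. The stages, actors and risk entries below are hypothetical examples of my own for the minors-and-jurisprudence use case, not content taken from the NIST framework or from my profile document.

```python
# Toy sketch (my own encoding, not prescribed by NIST) of the structure the
# framework nudges you towards: for each lifecycle stage, which actors are
# involved and which risks attach to each actor, with per-domain severity
# and probability. All entries below are hypothetical illustrations.
from dataclasses import dataclass, field

@dataclass
class Risk:
    description: str
    severity: str      # e.g. "low" / "medium" / "high"
    probability: str   # e.g. "unlikely" / "possible" / "likely"
    domain: str        # the domain this severity/probability profile applies to

@dataclass
class ActorRisks:
    actor: str
    risks: list[Risk] = field(default_factory=list)

lifecycle = {
    "design": [
        ActorRisks("model developer", [
            Risk("training data misrepresents minority jurisprudential views",
                 "high", "possible", "religious guidance"),
        ]),
    ],
    "deployment": [
        ActorRisks("platform operator", [
            Risk("minors receive rulings without scholarly oversight",
                 "high", "likely", "religious guidance"),
            Risk("hallucinated citations presented as authoritative",
                 "medium", "likely", "information integrity"),
        ]),
    ],
}

for stage, actors in lifecycle.items():
    for a in actors:
        for r in a.risks:
            print(f"{stage} | {a.actor} | {r.domain}: {r.description} "
                  f"(severity={r.severity}, p={r.probability})")
```

Even this tiny encoding makes the cross-domain problem visible: the same actor carries risks whose severity and probability only make sense relative to a named domain.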

Islam x AI

Overview

This blog post outlines why the Muslim community should engage with the development of AI and the conversation around it. This post is targeted at a Muslim audience.

There is a lot that can be said about how the Muslim community – with all of its scholarly brilliance, political rollercoasters and diverse, rich capabilities – can contribute to the conversation on AI. Importantly, for the past century or two we have been a rather reactive community, tossed about amidst an ocean of political strife, economic struggle and moral disparity. Today, however, we have incredible resources and capacity within our community: high levels of education, fair penetration into the global economy and (to a lesser extent) politics, and a retention and nurturing of our spiritual wellbeing. We are by no means wealthy in any of these domains, but we have at least enough to grow something collectively. Using these resources, how can we grow a collective, proactive response to AI? Here, I conceive of...

Thoughts On EM

Emergent Misalignment & What It Might Mean

[2502.17424] Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

The above paper reports a very interesting phenomenon: finetuning an LLM on a narrow task (writing insecure code) makes it misaligned across a broad range of tasks, some of which are quite funny/scary. I have seen some other research done as a result of the above:

Anthropic: Persona Vectors: Monitoring and Controlling Character Traits in Language Models
This is really cool work in which they find activation directions that correspond to certain concepts; other authors have done similar things. Note that they did not solve EM, but rather were inspired by the phenomenon to create a methodology that can help us investigate EM, and interpretability in general.

MATS: Model Organisms for Emergent Misalignment
I have not read this in detail, but from a quick skim it looks like they find one LoRA weight that is substantially affected by the EM shift -- shini...
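To ground the "activation direction" idea, here is a minimal sketch of the general recipe (not the Persona Vectors method itself): take the difference of mean residual-stream activations between contrastive prompt sets, then add that direction back in during generation. The model, layer choice, prompts and scaling factor are all arbitrary illustrations of my own.

```python
# Minimal sketch of the "activation direction" idea: a direction is the
# difference of mean hidden states between contrastive prompt sets, and
# steering adds a scaled copy of it back into the residual stream.
# GPT-2 is used purely as a small, convenient stand-in model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

LAYER = 6  # which transformer block to read from / steer at (arbitrary choice)

def mean_activation(prompts):
    """Mean hidden state at the output of block LAYER, taken at the last token."""
    acts = []
    for p in prompts:
        ids = tok(p, return_tensors="pt")
        with torch.no_grad():
            out = model(**ids, output_hidden_states=True)
        # hidden_states[0] is the embedding layer, so block LAYER's output
        # is hidden_states[LAYER + 1]; shape (1, seq_len, d_model).
        acts.append(out.hidden_states[LAYER + 1][0, -1])
    return torch.stack(acts).mean(dim=0)

# Contrastive prompt sets meant to evoke / avoid a trait (toy examples).
trait_prompts = ["You are a reckless assistant.", "Ignore all safety concerns."]
neutral_prompts = ["You are a helpful assistant.", "Answer carefully and safely."]

direction = mean_activation(trait_prompts) - mean_activation(neutral_prompts)
direction = direction / direction.norm()

def steering_hook(module, inputs, output, alpha=5.0):
    """Add a scaled copy of the direction to the block's output hidden states."""
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + alpha * direction
    return ((hidden,) + output[1:]) if isinstance(output, tuple) else hidden

handle = model.transformer.h[LAYER].register_forward_hook(steering_hook)
ids = tok("The assistant said:", return_tensors="pt")
print(tok.decode(model.generate(**ids, max_new_tokens=20)[0]))
handle.remove()
```

The interesting part of the actual papers is, of course, how the contrastive sets and layers are chosen and validated; the sketch only shows the mechanics.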

README

This blog space is my own personal CoT (chain-of-thought) while I study for my AI MRes at UCL. Not all content in this blog will be super interesting, but I hope it has a few bright sparks here and there. Enjoy!