
So you want to work on technical AI safety

Advice for aspiring safety researchers

I’ve been to two EAGx events and one EAG, and the vast majority of my one-on-ones with junior people end up covering some subset of these questions. I’m happy to have such conversations, but hopefully this is more efficient and wide-reaching (and covers more than I could fit into a 30-minute conversation).

I am specifically aiming to cover advice on getting a job in empirically leaning technical research (interp, evals, red-teaming, oversight, etc.) for new or aspiring researchers, without being overly specific about the field of research. I’ll try to be more agnostic than something like Neel Nanda’s mechinterp quickstart guide, but more specific than the wealth of existing career advice that applies to ~any career. This also overlaps somewhat with this excellent list of tips from Ethan Perez, but is aimed a bit earlier in the funnel.

This advice is of course only from my perspective and background, which is that I did a PhD in combinatorics, worked as a software engineer at startups for a couple of years, did the AI Futures Fellowship, and now work at Timaeus as the research lead for our language model track. In particular, my experience is limited to smaller organizations, so “researcher” means some blend of research engineer and research scientist rather than strictly one or the other.

Views are my own and don’t represent Timaeus and so on.

Requisite skills

What kind of general research skills do I need?

There’s a lot of tacit knowledge here, so most of what I can offer is about the research process. You aren’t expected to have all of these already or to pick them up immediately, but they’re much easier to describe than, e.g., research taste. These items are in no particular order:

  • Theory of change at all levels. Yes, yes, theories of change, they’re great. But theories of change are most often explicitly spoken of at the highest levels: how is research agenda X going to fix all our problems? Really, it’s theories of change all the way down. The experiment you’re running today should have some theory of change for how it advances your understanding of the project you’re working on. Maybe it’s really answering some question about a sub-problem that’s blocking you. Your broader project should have some theory of change for your research agenda, even if the project probably won’t solve the agenda outright. If you can’t trace up the stack to explain why the thing you’re doing day to day matters for your ultimate research ambitions, it’s a warning sign that you’re just spinning your wheels.
  • Be ok with being stuck. At a coarse resolution, being stuck is a very common steady state to be in. This can be incredibly frustrating, especially if you feel external pressure because you think you’re not meeting other people’s expectations, or if your time or money is running out (see also below, on managing burnout). For a new researcher, two things might help: having a mentor (if you don’t have access to a human, frontier LLMs are (un)surprisingly good!) who can reassure you that your rate of progress is fine, and being more fine-grained about what counts as progress. If your experiment failed but you learned something new, that’s progress!
  • Quickly prune bad ideas. Always look for cheap, fast ways to de-risk investing time (and compute) into ideas. If the thing you’re doing is really involved, look for intermediate checkpoints along the way that could disqualify it as a direction.
  • Communication. If you’re collaborating with others, they should have some idea of what you’re doing and why you’re doing it, and your results should be clearly and quickly communicated. Good communication habits are kind of talked about to death, so I won’t get into them too much here.
  • Write a lot. Writing is thinking. I can’t count the number of times that I felt confused about something and the answer came while writing it down as a question to my collaborators, or the number of new research threads that have come to mind while writing a note to myself or others.
  • Be organized. Figure out some kind of system that works for you to organize your results. When you’re immersed in a research problem, it can feel deceptively easy to keep in your head all the context of your work and all the scattered places where information is stored. I currently keep a personal research log[1] in a Google doc (also visible to my collaborators) that I write entries into throughout the lifetime of a project. The level of detail I aim for is to be able to revisit an entry or a plot months later and recall the finer points from there. On average this actually happens in practice about once a week, and it has saved me a great deal of headache.
  • Continually aim just beyond your range. Terry Tao has a ton of great career advice, much of which is transferable to other fields of research beyond math. Research is a skill, and like many other skills, you don’t grow by just doing the same things in your comfort zone over and over.
  • Make your mental models legible. It’s really hard to help someone who doesn’t make it easy to help them! There are a ton of things that feel embarrassing to share or ask about, and that feeling is often a signal that you should talk about them! But it’s also important to communicate things that you don’t feel embarrassed about. You might be operating off of subtly bad heuristics, and someone with more experience can only correct you if you either say things in a way that reveals the heuristic or do something that reveals it instead (which is often more costly).
  • Manage burnout. The framing that I find the most helpful with burnout is to think of it as a mental overuse injury, and the steps to recover look a lot like dealing with a physical overuse injury. Do all the usual healthy things (sleep enough, eat well, exercise) and ease into active recovery, which emphatically does not look like taking a few days off and then firing on all cylinders again. Much like physical overuse injuries, it’s possible to notice signs ahead of time and to take preventative steps earlier. This is much easier after going through the process of burning out once or twice. For example, I notice that I start doing things like snacking more, procrastinating sleep, and playing more video games. These things happen well before there’s any noticeable impact on my productivity. Finally, burning out is not a judgment of your research ability – extremely competent researchers still have to manage burnout, just as professional athletes still have to manage physical injuries.

What level of general programming skills do I need?

There is a meaningful difference between the programming skills that you typically need to be effective at your job and the skills that will let you get a job. I’m sympathetic to the view that the job search is inefficient / unfair and that it doesn’t really test you on the skills that you actually use day to day. Still, things like LeetCode are unlikely to go away. A core argument in their favor is that there’s highly asymmetric information between the interviewer and interviewee, and the interviewee has to credibly signal their competence through a relatively low-bandwidth channel. For the employer, false negatives are generally much less costly than false positives, and LeetCode-style interview questions are skewed heavily towards false negatives.

Stepping down from the soapbox, the table stakes for passing technical screens are knowing basic data structures and algorithms and being able to answer interview-style coding questions. I personally used MIT’s free online lectures, but there’s an embarrassment of riches out there. I’ve heard Aditya Bhargava’s Grokking Algorithms independently recommended several times. Once you have the basic concepts, do LeetCode problems until you can reliably solve LeetCode mediums in under 30 minutes or so. It can be worth investing more time than this, but IME there are diminishing returns past this point.

You might also consider creating some small open source project that you can point to, which can be AI safety related or not. A simple example would be a weekend hackathon project that you list on your CV and host on your personal GitHub page, where prospective employers can skim through it. (You should have a GitHub page, and you should put at least minimal effort into making it look nice.) If you don’t have a personal GitHub page with lots of past work on it (I don’t, since all of my previous engineering work has been private IP, but do as I say, not as I do), at least try to have a personal website to help you stand out (mine is here, and I was later told that one of my blog posts was fairly influential in my hiring decision).

Once you’re on the job, there’s an enormous number of skills you need to eventually have. I won’t try to list all of them here, and I think many of the lessons are better internalized by making the mistake that teaches them. One theme that I’ll emphasize, though, is to be fast if nothing else. If you’re stuck, figure out how to get moving again: read the documentation, read the source code, read the error messages. Don’t let your eyes glaze over when you run into a roadblock that doesn’t have a quick solution on Stack Overflow. If you’re already moving, think about ways to move faster (for the same amount of effort). All else being equal, if you’re doing things 10% faster, you’re 10% more productive. It also means you’re making more mistakes, but that’s an opportunity to learn 10% faster too :)

What level of AI/ML experience do I need?

Most empirical work happens with LLMs these days, so this mostly means familiarity with them. AI Safety Fundamentals is a good starting point for getting a high-level sense of what kinds of technical research are done. If you want to get your hands dirty, then the aforementioned mechinterp quickstart guide is probably as good a starting point as any, and for non-interp roles you probably don’t need to go through the whole thing. ARENA is also commonly recommended.

Beyond this, your area of interest probably has its own introductory materials (such as sequences or articles on LessWrong) that you can read, and there might be lists of bite-sized open problems that you can start working on.

Should I upskill?

I feel like people generally overestimate how much they should upskill. Sometimes it’s necessary – if you don’t know how to program and you want to do technical research, you’d better spend some time fixing that. But I think more often than not, spending 3-6 months just “upskilling” isn’t too efficient.

If you want to do research, consider just taking the shortest path of actually working on a research project. There are tons of accessible problems out there that you can just start working on in like, the next 30 minutes. Of course you’ll run into things you don’t know, but then you’ll know what you need to learn instead of spending months over-studying, plus you have a project to point to when you’re asking someone for a job.

Should I do a PhD?

Getting a PhD seems to me like a special case of upskilling. I used to feel more strongly that it was generally a bad idea for impact unless you also want to do a PhD for other reasons, but currently I think it’s unclear and depends on many personal factors. Because the decision is so context-dependent, diving into it is a bit out of scope for this post, but there are some more focused posts with good discussion elsewhere. My own experience was very positive (even if that wasn’t clear at the time), but my PhD also had an unusual amount of slack.

Actually getting a job

What are some concrete steps I can take?

Here’s a list of incremental steps you can take to go from no experience to having a job. Depending on your background and how comfortable you feel, you might skip some of these or re-order them. As a general note, I don't recommend that you try to do everything listed in depth. I'm trying not to leave huge gaps here, but you can and should try to skip forward aggressively, and you'll probably find that you're ready for later steps much sooner than you think you are (see also upskilling above).

  • Learn to code (see above)
  • If your area of interest is low-level (involves digging into model internals), learn the basics of linear algebra and run through some quick tutorials on implementing some model components from scratch (you can just search “implement X from scratch” and a billion relevant Medium articles will pop up)
  • Find some ML paper that has a walkthrough of implementing it. It doesn’t have to be in your area of interest or even AI safety related at all. There are varying degrees of hand-holding (which isn’t a bad thing at this stage). Here is a particularly in-depth guide that eventually implements the Stable Diffusion algorithm in part 2, but it might be a bit overkill. You can probably find plenty of other resources online through a quick web search, e.g. this HN thread.
  • Learn a little bit about your area of interest (or some area of interest; you don’t have to know right this second what field you’re going to contribute to forever!)
  • Find a paper in your area of interest that looks tractable to implement and try to implement it on your own. If you have trouble with this step, try finding one that has a paper walkthrough or an implementation somewhere on GitHub that you can peek at when you get stuck.
  • Find an open problem in your area of interest that looks like it could be done in a weekend. It’ll probably take more than a weekend, but that’s ok. Work on that open problem for a while.
  • If you get some interesting results, that’s great! If you don’t, it’s also ok. You can shop around for a bit and try out other problems. Once you’ve committed to thinking about a problem though, give it a real shot before moving on to something else. In the long run, it can be helpful to have a handful of bigger problems that you rotate through, but for now, just stick to one at a time.
  • Ideally you now have some experience with small open problems. This is where legibility becomes important: try to write up your results and host your code in a public GitHub repo.
  • Now you have something that shows you’re capable of doing the work you’re asking someone to pay you to do, so go show it to people. Start applying for fellowships, grants, and jobs. While you’re applying, continue working on things and building up your independent research experience.
  • It might take a while before you get a bite. A job would be nice at this point, but it might not be the first opportunity you get. Whatever it is, do a good job at it and use it to build legible accomplishments that you can add to your CV.
  • Continue applying and aiming for bigger opportunities as your accomplishments grow. Repeat until you have a job that you like.
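As one illustration of the “implement X from scratch” exercises in the list above, here is a minimal sketch of a single causal self-attention head in plain Python. Picking attention as the component, and all the function names (`matmul`, `softmax`, `causal_self_attention`, `rand_matrix`), are my own illustrative assumptions rather than anything from a specific tutorial. It deliberately avoids NumPy or PyTorch so every operation is explicit; a real implementation would use a tensor library and batch over heads and sequences.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both stored as lists of lists."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    """Swap rows and columns."""
    return [list(row) for row in zip(*m)]


def softmax(row):
    """Softmax over one row; subtracting the max keeps exp() from overflowing."""
    top = max(row)
    exps = [math.exp(x - top) for x in row]  # exp(-inf) is exactly 0.0
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head causal self-attention (hypothetical helper for illustration).

    x: (seq_len x d_model) inputs; w_q, w_k, w_v: (d_model x d_head) projections.
    Returns a (seq_len x d_head) output where position i only sees positions <= i.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_head = len(w_q[0])
    scores = matmul(q, transpose(k))  # (seq_len x seq_len) query-key dot products
    for i, row in enumerate(scores):
        for j in range(len(row)):
            # Scale by sqrt(d_head); mask future positions (j > i) with -inf.
            row[j] = row[j] / math.sqrt(d_head) if j <= i else float("-inf")
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, v)


def rand_matrix(rows, cols):
    """Random Gaussian matrix, standing in for learned weights."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    random.seed(0)
    x = rand_matrix(4, 8)  # 4 tokens, d_model = 8
    w_q, w_k, w_v = rand_matrix(8, 2), rand_matrix(8, 2), rand_matrix(8, 2)
    for row in causal_self_attention(x, w_q, w_k, w_v):
        print([round(val, 3) for val in row])
```

A good sanity check for a from-scratch version like this is causality: changing the last token of `x` should leave every earlier row of the output unchanged, and the first row should equal the first row of the value projection, since position 0 can only attend to itself.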

The general idea here is to do a small thing to show that you’re a good candidate for a medium thing, then do a medium thing to show you can do a bigger thing, and so on. It’s often a good idea to apply for a variety of things, including things that seem out of your reach, but it’s also good to keep expectations in check when you don’t have any legible reasons that you’d be qualified for a role. Note that some technical research roles might involve some pretty unique work, and so there wouldn’t be an expectation that you have legible accomplishments in the same research area. In those cases, “qualified” means that you have transferable skills and general competency.

How can I find (and make!) job opportunities?

I used 80k Hours’ job board and LessWrong (I found Timaeus here). If you find your way into Slacks from conferences or local EA groups, there will often be job postings shared in those as well. My impression is that the majority of public job opportunities in AI safety can be found this way. I started working at Timaeus before I attended my first EAG(x), so I can’t comment on using those for job hunting.

Those are all ways of finding job opportunities that already exist. You can also be extremely proactive and make your own opportunities! The cheapest thing you can do here is just cold email people (but write good cold emails). If you really want to work with a specific person / org, you can pick the small open problems you work on to match their research interests, then reach out to discuss your results and ask for feedback. Doing this at all would put you well above the average job candidate, and if you’re particularly impressive, they might go out of their way to make a role for you (or at least have you in mind the next time a role opens up). At worst, you still have the project to take with you and talk about in the future.

Sensemaking about impact

When I first decided to start working in AI safety, I had very little idea of what was going on – who was working on what and why, which things seemed useful to do, what kinds of opportunities there were, and how to evaluate anything about anything. I didn’t already know anyone that I could ask. I think I filled out a career coaching or consultation form at one point and was rejected. I felt stressed, confused, and lonely. It sucked! For months! I think this is a common experience. It gets better, but it took a while for me to feel anywhere close to oriented. These are some answers to questions that would have helped me at the time.

What is the most impactful work in AI safety?

I spent a lot of time trying to figure this out, and now I kind of think it’s the wrong question to ask. My first attempt at answering it was something like “it must be something in AI governance, because if we really screw that up then it’s already over.” I still think it’s true that if we screw up governance then we’re screwed in general, but I don’t think that it being a bottleneck is sufficient reason to work on it. I have doubts that an indefinite pause is possible. In my world model, we can plausibly buy some years and some funding if policy work “succeeds” (whatever that means), but there still has to be something on the other side to buy time and funding for. Even if you think an indefinite pause is possible, it seems wise to have insurance in case that plan falls through.

In my model, the next things to come after AI governance buys some time are things like evals and control. These further extend the time that we can train advanced AI systems without major catastrophes, but those alone won’t be enough either. So the time we gain with those can be used to make further advances in things like interpretability. Interpretability in turn might work long enough for other solutions to mature. This continues until hopefully, somewhere along the way, we’ve “solved alignment.”

Maybe it doesn’t look exactly like these pieces in exactly that order, but I don’t think there’s any one area of research that can be a complete solution and can also be done fast enough to not need to lean on progress from other agendas in the interim. If that’s the case, how can any single research agenda be the “most impactful?”

Most research areas have people with sensible world models that justify why that research area is good to work in. Most research areas also have people with sensible world models that justify why that research area is bad to work in! You don’t have to be able to divine The Truth within your first few months of thinking about AI safety.

What’s far more important to worry about, especially for a first job, is just personal fit. Personal fit is the comparative advantage that makes you better than the median marginal person doing the thing. It’s probably a bad idea to do work that you’re not a good fit for, even if you think the work is super impactful – this is a waste of your comparative advantage, and we need to invest good people in all kinds of different bets. Pick something that looks remotely sensible that you think you might enjoy and give it a shot. Do some work, get some experience, and keep thinking about it.

On the “keep thinking” part, there’s also a sort of competitive exclusion principle at play here. If you follow your curiosity and keep making adjustments as your appraisal of research improves, you’ll naturally gradient descent into more impactful work. In particular, it’ll become clearer over time if the original reasons you wanted to work on your thing turned out to be robust or not. If they aren’t, you can always move on to something else, which is easier after you’ve already done a first thing.

On how to update off of people you talk to

Okay, this isn’t a question, but it’s really hard, as a non-expert, to tell whether to trust one expert’s hot takes or another’s. AI safety is a field full of hot takes and people who can make their hot takes sound really convincing. There’s also a massive asymmetry between how much they’ve thought about it and how much you’ve thought about it: for any objection you can come up with on the spot, they probably have a cached response from dozens of previous conversations that makes your objection sound naive. As a result, you should start off with (but not necessarily keep forever) a healthy amount of blanket skepticism about everything, no matter how convincing it sounds.

Some particularly common examples of ways this might manifest:

  • “X research is too reckless / too dual-use” or “Y research is too slow / too cautious.” We all have our own beliefs about how to balance the trade-offs of research between capabilities and safety, and we self-select into research areas based on a spectrum of those beliefs. Then from where we stand, we point in one direction and say that everyone over there is too careful and point in the other direction and say that everyone over there is too careless.
  • “Z research doesn’t seem (clearly) net-positive.” People have different thresholds for what makes something obvious and also disagree on how much not-obviously-net-positive work is optimal (I claim that the optimal amount of accidentally net-negative work is not zero, which is probably much less controversial than if I tried to claim exactly how much is optimal).

I emphatically do not mean to say that all positions on these spectrums are equally correct, and it's super important that we have truth-seeking discussions about these questions. But as someone new, you haven't yet learned how to evaluate different positions and it’s easy to prematurely set the Overton window based on the first few takes you hear.

Some encouragement

This isn’t a question either, but dropping whatever you were doing before is hard, and so is finding a foothold in something new. Opportunities are competitive and rejections are common in this space, and interpreting those rejections as “you’re not good enough” stings especially hard when it’s a cause you care deeply about. Keep in mind that applications skew heavily towards false negatives, and that to whatever degree “not meeting the bar” is true, it is a dynamic statement about your current situation, not a static statement about who you fundamentally are. Remember to take care of yourself, and good luck.


Thanks for feedback: Jesse Hoogland, Zach Furman