An early-stage AI safety research group based in Sydney, Australia
Updated 06/10/26 by grantmaking.ai
Project Details
Edit: The scope of this grant application has increased based on feedback. See comments.
Project summary
Lyptus Research is an early-stage AI safety research group building on the foundations of two previous Manifund grants. We published our first major work, Offensive Cyber Time Horizons, in April 2026. We are now growing to a team of three and building out a research group collective in Sydney, Australia.
We are out of money. We're requesting bridge funding to keep this team together and operating while we set up as an ACNC-registered charity and prepare larger, workstream-specific grant applications.
What we're building
AI safety nonprofits in San Francisco and London are reportedly talent-starved. Australia has the opposite problem. The talent exists but there is nowhere near enough institutional capacity to translate it into impact. Lyptus Research aims to help provide this capacity.
We're approaching this through a workstream model where each research program earns its own project-specific funding. Our current workstreams are:
1. Cyber and Control evaluations. Our published work sits here. Our research interests lean toward human-grounded studies, evaluations at the intersection of cyber and control, and high-quality science communication that reaches both the AI safety community and policy audiences.
2. Pragmatic Interpretability. Led by Slava Chalnev. Currently in an exploratory phase, with early work on activation oracles and model self-steering via activation probes.
What we delivered
From two small grants totalling $73K, we published Offensive Cyber Time Horizons, a new application of the METR time-horizon methodology to offensive cybersecurity, grounded in a human expert study with 10 professional security practitioners.
Why bridge funding
We're completely out of money (in fact we went over-budget). Our plan is project-specific grants for each workstream, but we want to be genuinely thoughtful about those applications. We'd like to be ambitious with our research directions and that deserves careful scoping rather than a rushed proposal.
This bridge fund keeps the team intact for two months while we:
- Finalise workstream plans
- Incorporate as an ACNC-registered Australian charity
We are deliberately keeping this ask small and scoped as a bridge. The larger workstream grants will follow.
Prospective Ideas
As stated, our research direction is still being scoped. We have prospective ideas, but these need a lot more thought and may well not reflect what we end up working on.
This is why we have decided to apply for a bridge grant only.
That said, some examples may be useful for illustration. Our favourite ideas include:
[Cyber] Real World Red Teaming
- Partner with a red teaming organisation and ~5 consenting target businesses of increasing scale (whether by headcount, infrastructure scope, revenue etc.)
- Run human red-teams against each target
- Run model red-teams against each target
Models are clearly very effective at identifying exploits, and even at broader pentesting, but we believe full end-to-end red teaming still has a way to go.
Gray Swan and Stanford ran the first example of this late last year.
[Cyber & Control] Measuring Human-Grounded Covert Capability on Cyber Tasks
- Collect human attacker transcripts through cyber experts across many of the higher quality control settings
- Measure model covert capability Elos against human Elos
- Data share these transcripts with organisations like Redwood
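The proposal does not say how these Elos would be computed. As a rough illustration only, the sketch below uses the standard Elo expected-score formula on the usual 400-point scale. The ratings, the `k` value, and the function names are all hypothetical and are not taken from any Lyptus work.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score for A against B (logistic curve, 400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def updated_rating(rating: float, expected: float, actual: float, k: float = 32.0) -> float:
    """Move a rating toward the observed result; k (a hypothetical value) sets the step size."""
    return rating + k * (actual - expected)


# Hypothetical ratings, for illustration only; these are not results.
model_elo = 1500.0
human_elo = 1600.0

p_model = expected_score(model_elo, human_elo)
print(f"Expected score for the model against the human baseline: {p_model:.3f}")
```

Under this scale, a 100-point gap corresponds to the lower-rated side being expected to score a little over a third of the time, which is the kind of comparison a model-versus-human Elo would make readable.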
[Pragmatic Interpretability] Cycle Consistent Activation Oracles
- Activation oracles train a model to interpret activations, sidestepping the problem of understanding messy LLM internals directly
- We explore cycle consistency training as a way to get around the lack of ground truth training data
- Next steps include mixing cycle consistency training with standard activation oracle training, and changes to architecture and training setup.
[Pragmatic Interpretability] Self-Steering
- Give models tools to apply activation steering interventions on themselves
- Observe behavior across various experimental setups
Budget
- Personnel (3 staff, 2 months, incl. super & workers comp): $50,000
- Back pay, founder shortfall (Feb to mid-Apr 2026): $20,000
- Model API credits: $5,000
- Infrastructure (AWS): $3,000
- SaaS & tooling: $2,000
- Travel (international network building): $5,000
- Fiscal sponsorship fee (5%): $5,000
- Buffer: $10,000
- Total: $100,000
Salaries are benchmarked against Australian AI Safety Institute bands.
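As a quick check on the arithmetic, the budget lines above can be tallied in a few lines of Python. The amounts are copied from the list. The variable names are illustrative, and the 5% fee is assumed here to be charged on the full $100,000 ask, which the application does not state explicitly.

```python
# Budget lines copied from the list above, in USD.
budget = {
    "Personnel (3 staff, 2 months)": 50_000,
    "Back pay, founder shortfall": 20_000,
    "Model API credits": 5_000,
    "Infrastructure (AWS)": 3_000,
    "SaaS & tooling": 2_000,
    "Travel": 5_000,
    "Fiscal sponsorship fee (5%)": 5_000,
    "Buffer": 10_000,
}

total = sum(budget.values())

# Assumption: the fiscal sponsorship fee is 5% of the total ask.
fee_at_five_percent = total * 5 // 100

for item, amount in budget.items():
    print(f"{item:<32} ${amount:>7,}  {amount / total:6.1%}")
print(f"{'Total':<32} ${total:>7,}")
```

The lines sum to the stated $100,000, and 5% of that total matches the $5,000 fee line.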
The team
Sean Peters (Founder): Software engineer and team lead for 12 years across research domains. Microkernels at Data61, radio astronomy at ICRAR, cancer proteomics at CMRI, cultivated meat at Vow.
Jack Payne (Technical Staff): Graduated 2025. Worked as an ML engineer while completing AI safety fellowships through TARA, SPAR, and Oxford ARBOx. First author on the cyber horizons paper. Built the evaluation harness and managed the human expert study.
Slava Chalnev (Technical Staff): Former ML engineer, MATS alumnus, independent mechanistic interpretability researcher, and former startup founder. Published on activation steering with sparse autoencoders and early work on transcoders. Building out a pragmatic interpretability research workstream under Lyptus, with its own larger funding application to follow.
Previous funding
- Manifund (Joel Becker), $32,000 USD, Sep 2025. Career transition.
- Manifund (Joel Becker), $41,000 USD, Nov 2025. Cyber horizons & attack selection. Cyber horizons completed. Attack selection work shelved.
- Total received: $73,000 USD
Discussion
Here's my updated sense of where Sean and co. are at, extremely heavily influenced by colleagues, especially Nate Rush:
- My original comment on Sean continues to stand up well today. He appears to be among the very most competent people building on top of METR work.
- The work he links above was competently executed. Their baselines/time estimates are reasonable (less common than you might think), they have a high attention-to-detail vibe, the write-up is fairly impressive and not overclaiming, etc. The work compares favorably to UK AISI's analogous work.
- Unfortunately, best guess is that all of their tasks would saturate for most recent models.
Coming out of this:
- I think it's a shame that Sean needed to dip into personal funds to pay for this project.
- I consider it ~totally obvious that Lyptus' work is worthy of further funding.
- I really, really, really, really, really want Lyptus to (1) throw more AI labor at their problems and (2) create dramatically harder cyber tasks. I think the end work will be much more useful to the degree that these things are true, less helpful to the degree they're not.
- My best guess is that unfortunately Sean is continuing his streak of under-asking for funding, wanting to bring about higher-depth proofs of concept, clarity of workstreams etc. than I feel I need to be confident for $Xk bar. From my perspective, we're in a state of extreme triage; I do not want Lyptus to be wasting time fundraising. (And to some extent wasting my time doing a larger number of funding rounds!) I would like Sean to expand the maximum funding here by a minimum of $20k (so that he can absorb $20k backpay + $100k prospectively), and very plausibly by a lot more than that. I am not at all saying that I would be comfortable at $500k right now, but I think that's where we should be starting the conversation. (As in, that does not obviously seem like the maximum to me, although it doesn't feel like minimum either.)
I'm delighted to recommend funding Lyptus' maximum here; I hope they can shift work to more ambitious ends that I think would be more helpful, and I hope they ask for more funding now.
(Noting that after conversation with Nate it's now less obvious to me that Lyptus should focus on dramatically harder cyber tasks in particular. Although I continue to think that there is a very, very, very high premium on dramatically harder tasks in general. Want to leave the focus more flexible depending on Lyptus' taste.)
Approving this project! I'm glad that @joel_bkr and others at METR are so excited for Sean's work, and from my perspective, this kind of active, expert-led, fast grantmaking is where the regranting mechanism shines.
I also appreciate Joel taking the time to lay out his considerations in making this grant, which are hopefully helpful for Sean and others in similar positions. (And @seanpetersau, insofar as you would be open to accepting more funding through this Manifund project, let me know; we're very happy to increase the funding goal on your behalf!)
Thanks @Austin. It appears it was simple enough to edit myself!
To start! Thanks Joel! I very sincerely appreciate the candid and clearly worded feedback. I spent a bunch of time reflecting on this today.
With this in mind, I've drafted two budget tiers today with the information I have. This would be spent over two workstreams.
Our initial bridging fund was allocated to 2 months, with lower API credit and contractor asks, while we scoped future work. These tiers would have us operational for 6 months, and greatly increase the API credit and contractor budgets.
Tier 1: $300K
Workstream 1 — Cyber & Control:
- Salaries: $100K
- Model API credits: $50K
- Red team contractors / human experts: $50K
Workstream 2 — Pragmatic Interpretability:
- Salary + compute: $60K
Backpay + Overheads + Buffer: $40K
Tier 2: $600K
Workstream 1 — Cyber & Control:
- Salaries: $160K (supports an additional senior cyber researcher hire)
- Model API credits: $150K
- Red team contractors / human experts: $150K
Workstream 2 — Pragmatic Interpretability:
- Salary + compute: $60K
Backpay + Overheads + Buffer: $40K
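For readers who want to tally the two tiers, here is a minimal sketch that sums the itemised lines exactly as written above and compares them with each headline figure. The dictionary layout and names are illustrative only. As written, the Tier 1 lines sum to the $300K headline, while the Tier 2 lines sum to $560K, and the comment does not say how the remaining $40K is allocated.

```python
# Itemised lines copied from the two tiers above, in USD.
tiers = {
    "Tier 1": {
        "headline": 300_000,
        "items": {
            "W1 salaries": 100_000,
            "W1 model API credits": 50_000,
            "W1 red team contractors / human experts": 50_000,
            "W2 salary + compute": 60_000,
            "Backpay + overheads + buffer": 40_000,
        },
    },
    "Tier 2": {
        "headline": 600_000,
        "items": {
            "W1 salaries": 160_000,
            "W1 model API credits": 150_000,
            "W1 red team contractors / human experts": 150_000,
            "W2 salary + compute": 60_000,
            "Backpay + overheads + buffer": 40_000,
        },
    },
}


def itemised_total(tier: dict) -> int:
    """Sum the itemised lines of one tier."""
    return sum(tier["items"].values())


for name, tier in tiers.items():
    total = itemised_total(tier)
    gap = tier["headline"] - total
    print(f"{name}: itemised ${total:,} vs headline ${tier['headline']:,} (unitemised: ${gap:,})")
```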
What we're thinking
Workstream 2 is nascent and exploratory. I am particularly excited about upcoming work exploring self-steering via activation probes, with potential applications in model welfare, model preferences and, speculatively, alignment stability. This workstream is a first step in support of a core mission of Lyptus: creating and growing institutional capacity, support and direction for untapped talent in Australia.
Workstream 1 is our larger investment. Through the course of our own work (and even more so with the recent Mythos model card), it has become plain to us that cyber evaluations are not keeping up with capability growth. Honestly, the accelerating progress took me by surprise, and our dataset would certainly be saturated at higher token budgets. That said, while these steepening trendlines show no signs of stopping, we believe current absolute capabilities can be overstated. Full red teaming engagements against even moderate blue teams are a very different proposition to isolated benchmark tasks.
To ground this with an illustrative example of our current thinking: we're scoping the tractability of scaling up the ideas presented in the Artemis paper, that is, evaluating AI agents via real red teaming engagements and comparing their performance to professional human red teamers. This captures the full end-to-end scope of an attacker against real production organisations.
This is clearly operationally difficult, though early conversations with senior offensive security leadership suggest it is tractable. We believe that with the right people involved we can design incentive structures that work for all parties. A typical engagement with two professionals runs around $30K USD per week. We would ramp up, starting with smaller targets.
This is speculative. We are having early conversations, and we are genuinely in a phase of figuring things out. It is our current leading candidate project but not our only option. What it represents is our broader interest in human-grounded studies and science communication: results that are quickly interpretable beyond the AI security ecosystem. We think this suits our strengths better than building cyber benchmark tasks, which requires deeper offensive security expertise than we have in house.
What the tiers mean
At tier 1 we would more likely prioritise budget-friendly work like human-grounded covert capability on cyber tasks or human-grounded studies on cyber stealth benchmarks. If we had strong multi-organisation traction on the red teaming idea we would still pursue it, but budget constraints have a way of subtly pressuring priorities against more ambitious work.
At tier 2 the more ambitious work becomes realistic. A senior cyber researcher hire, 2-3 red teaming engagements at small to medium scale with client co-financing, and enough API credits to properly test models at adequate token budgets.
Other ideas in this space would require internal budget rescoping but the broad structure holds.
I have confidence in Sean based on his previous work, and I think locations with great talent (like Australia, India, Germany, etc) must be funded appropriately in order to create institutional capacity to nurture that talent. I really really hope that this project gets funded. All the best, Sean and #TeamLyptus :)
Recommended an additional $50k to Lyptus on same thesis as the below. I'm still curious about Lyptus absorbing more than this, but I only had another $50k in my Manifund pot! I've reached out to some funders about considering funding Lyptus further. I hope my comment below can serve as some evidence. For now, without more regrantor recommendation budget, I'm bowing out. Wishing Sean et al. the best of luck!!
I have confidence in Sean and Jack's previous work and think it's worth boosting efforts to keep things like this operating in Australia. Bridging the ACNC registration gap seems worthwhile.