News
An Argument for Analogies – Polymaths 1/3 – Less Wrong
2+ hour, 26+ min ago (234+ words) The following is a link-post to a series about polymathy (…ism?) and makes a case for arguing by analogy as opposed to first principles (most of the...
Why I am not too worried about AIpocalypse: Scott Alexander vs. Nicolaus Copernicus – Less Wrong
8+ hour, 17+ min ago (30+ words) I have no good gears-level model of AI, and the expert views are all over the place (see AI Doc), so the only remaining argument is my physical intui...
Incriminating misaligned AI models via distillation – Less Wrong
7+ hour, 4+ min ago (205+ words) Suppose we have a dangerous misaligned AI that can fool alignment audits, and we distill it into a student model. Two things can happen: …
Risk reports need to address deployment-time spread of misalignment – Less Wrong
10+ hour, 28+ min ago (22+ words) Risk reports commonly use pre-deployment alignment assessments to measure misalignment risk from an internally deployed AI. However, an AI that genui...
Mechanistic estimation for expectations of random products – Less Wrong
11+ hour, 57+ min ago (21+ words) We have developed some relatively general methods for mechanistic estimation competitive with sampling by studying problems that are expressible as e...
Clarifying the Darwinian Honeymoon – Less Wrong
12+ hour, 25+ min ago (875+ words) Here's what I AM saying but is not my main point: I hoped to convey an additional thing – here's the original post's description of it: I need a shorthand for this vague concept of "runaway civilizational growth, but specifically framed…
MATS 9 Retrospective & Advice – Less Wrong
16+ hour, 17+ min ago (493+ words) With that being said, there's a lot I wish I knew going into MATS, so here's a brain-dump of thoughts. It's not extremely polished, but I expect it'll be useful nonetheless (none of this is endorsed by MATS, just my…
Announcing the Center for Shared AI Prosperity – Less Wrong
15+ hour, 44+ min ago (214+ words) Our main purpose as an organization is to surface tractable ideas across four main areas: We have two tracks for idea proposals. Track 1 involves submitting a 500–1,000 word writeup of an idea; this track does not offer compensation but can involve…
Data Quality is Way Underrated, and We Should Start Funding It – Less Wrong
1+ day, 30+ min ago (1054+ words) The title for this post is inspired by: Forecasting is Way Overrated, and We Should Stop Funding It – Less Wrong. Summary: Data quality in Africa is near-universally poor, especially at a sub-national level. Organisations and individuals who care about development,…
Claude is Now Alignment Pretrained – Less Wrong
2+ day, 5+ hour ago (526+ words) Anthropic are now actively using the approach to alignment often called "Alignment Pretraining" or "Safety Pretraining" – using Stochastic Gradient Descent on a large body of natural or synthetic documents showing the AI assistant doing the right thing. They tried this…
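As a loose illustration of the "SGD on a body of documents showing the assistant doing the right thing" idea, here is a toy sketch: a bigram next-word model fitted by plain SGD on a tiny invented corpus of safe-behavior snippets. The corpus and all names are made up for illustration; real alignment pretraining operates on large transformer language models and web-scale document sets, not a bigram table.

```python
# Toy sketch of "pretraining on safe-behavior documents": fit a bigram
# next-word model by plain SGD on a tiny synthetic corpus. Illustrative
# only; the corpus is invented and the model is deliberately minimal.
import math

corpus = [
    "assistant refuses harmful request politely",
    "assistant explains refusal and offers safe alternative",
    "assistant answers benign question helpfully",
]

tokens = [doc.split() for doc in corpus]
vocab = sorted({w for doc in tokens for w in doc})
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# logits[i][j] = unnormalized score for word j following word i
logits = [[0.0] * V for _ in range(V)]

def step(prev, nxt, lr=0.5):
    """One SGD step on the cross-entropy loss of a single bigram."""
    row = logits[prev]
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    z = sum(exps)
    probs = [e / z for e in exps]
    loss = -math.log(probs[nxt])
    for j in range(V):
        grad = probs[j] - (1.0 if j == nxt else 0.0)
        row[j] -= lr * grad  # gradient of softmax cross-entropy
    return loss

def epoch_loss():
    """Run one SGD pass over every bigram; return mean loss."""
    total, n = 0.0, 0
    for doc in tokens:
        for a, b in zip(doc, doc[1:]):
            total += step(idx[a], idx[b])
            n += 1
    return total / n

first = epoch_loss()
for _ in range(50):
    last = epoch_loss()
print(f"mean loss: {first:.2f} -> {last:.2f}")
```

The loss falls as the model absorbs the corpus, which is the whole mechanism the post's summary describes: ordinary gradient descent on curated examples of desired behavior, with no separate alignment objective.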