News

lesswrong.com
lesswrong.com > posts > XcjKmpgtu2ijPToiz > an-argument-for-analogies-polymaths-1-3-1

An Argument for Analogies (Polymaths 1/3) - Less Wrong

2 hours, 26 minutes ago  (234 words) The following is a link-post to a series about polymathy (…ism?) and makes a case for arguing by analogy as opposed to first principles (most of the...

lesswrong.com > posts > oyTQpqBtdawjAnfcm > why-i-am-not-too-worried-about-aipocalypse-scott-alexander

Why I am not too worried about AIpocalypse: Scott Alexander vs Nicolaus Copernicus - Less Wrong

8 hours, 17 minutes ago  (30 words) I have no good gears-level model of AI, and the expert views are all over the place (see AI Doc), so the only remaining argument is my physical intui...

lesswrong.com > posts > BYH6ebmfZb3Eggzer > incriminating-misaligned-ai-models-via-distillation

Incriminating misaligned AI models via distillation - Less Wrong

7 hours, 4 minutes ago  (205 words) Suppose we have a dangerous misaligned AI that can fool alignment audits, and distill it into a student model. Two things can happen:...

lesswrong.com > posts > cNymohcWtGHzW7AjK > risk-reports-need-to-address-deployment-time-spread-of

Risk reports need to address deployment-time spread of misalignment - Less Wrong

10 hours, 28 minutes ago  (22 words) Risk reports commonly use pre-deployment alignment assessments to measure misalignment risk from an internally deployed AI. However, an AI that genui...

lesswrong.com > posts > 7RyAefESvb6BQ3tMz > mechanistic-estimation-for-expectations-of-random-products

Mechanistic estimation for expectations of random products - Less Wrong

11 hours, 57 minutes ago  (21 words) We have developed some relatively general methods for mechanistic estimation competitive with sampling by studying problems that are expressible as e...

lesswrong.com > posts > sFMrzD58Cnjo7tSqb > clarifying-the-darwinian-honeymoon

Clarifying the Darwinian Honeymoon - Less Wrong

12 hours, 25 minutes ago  (875 words) Here's what I AM saying but is not my main point: I hoped to convey an additional thing - here's the original post's description of it: I need a shorthand for this vague concept of "runaway civilizational growth, but specifically framed…

lesswrong.com > posts > eFD3rozNCZKMe4rTs > mats-9-retrospective-and-advice

MATS 9 Retrospective & Advice - Less Wrong

16 hours, 17 minutes ago  (493 words) With that being said, there's a lot I wish I knew going into MATS, so here's a brain-dump of thoughts. It's not extremely polished, but I expect it'll be useful nonetheless (none of this is endorsed by MATS, just my…

lesswrong.com > posts > CtRNmAQXmSotpAi9j > announcing-the-center-for-shared-ai-prosperity

Announcing the Center for Shared AI Prosperity - Less Wrong

15 hours, 44 minutes ago  (214 words) Our main purpose as an organization is to surface tractable ideas across four main areas: We have two tracks for idea proposals. Track 1 involves submitting a 500-1,000 word writeup of an idea; this track does not offer compensation but can involve…

lesswrong.com > posts > n6G553yQvAysStsvw > data-quality-is-way-underrated-and-we-should-start-funding

Data Quality is Way Underrated, and We Should Start Funding It. - Less Wrong

1 day, 30 minutes ago  (1054 words) The title for this post is inspired by: Forecasting is Way Overrated, and We Should Stop Funding It - Less Wrong. Summary: Data quality in Africa is near-universally poor, especially at a sub-national level. Organisations and individuals who care about development,…

lesswrong.com > posts > Xqh9bDw7Ei5bExC6h > claude-is-now-alignment-pretrained-1

Claude is Now Alignment Pretrained - Less Wrong

2 days, 5 hours ago  (526 words) Anthropic are now actively using the approach to alignment often called "Alignment Pretraining" or "Safety Pretraining": using Stochastic Gradient Descent on a large body of natural or synthetic documents showing the AI assistant doing the right thing. They tried this…