
Dev weekly 2026-Feb-09

Published at 02:02 AM

AI

The Hot Mess of AI: How Does Misalignment Scale With Model Intelligence and Task Complexity? from Anthropic

2026 Agentic Coding Trends Report by Anthropic

Unrolling the Codex agent loop

Evaluating Deep Agents: Our Learnings from LangChain

State of Agent Engineering by LangChain

Shunyu Yao (姚顺雨)'s first research project at Tencent: Context Learning Benchmark https://arxiv.org/abs/2602.03587. It gets interesting when read alongside "AGENTS.md outperforms skills in our agent evals" mentioned below.

How AI Impacts Skill Formation: novice workers who over-rely on AI to complete unfamiliar tasks may sacrifice their own skill acquisition in the process.

Radar Trends to Watch: January 2026

Claude Code / Codex / Gemini CLI all-around assistant tools *****

AGENTS.md outperforms skills in our agent evals **** We added one key instruction to the injected content: "IMPORTANT: Prefer retrieval-led reasoning over pre-training-led reasoning for any Next.js tasks."

Inside OpenAI’s in-house data agent

Demystifying evals for AI agents

Programming

Vibe Engineering: What I’ve Learned Working with AI Coding Agents

The Devdocs Methodology

Custom instructions with AGENTS.md Codex reads AGENTS.md files before doing any work. By layering global guidance with project-specific overrides, you can start each task with consistent expectations, no matter which repository you open.
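A rough sketch of what that layering can look like (the paths and contents here are illustrative, not copied from the Codex docs):

~/.codex/AGENTS.md          (global guidance, applies to every repository)
    Explain destructive commands before running them.
    Prefer small, reviewable diffs.

my-repo/AGENTS.md           (project-specific override)
    This repo uses pnpm, not npm.
    Run pnpm test before declaring a task done.

Per the layering described above, the project file adds to and overrides the global defaults, so global guidance stays stable while each repository supplies its own rules.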

Pi is a minimal terminal coding harness that runs in four modes.

The 2026 Guide to AI Agents - our one-stop resource for in-depth knowledge and hands-on applications of AI agents.

Deep dive: Moltbot's underlying architecture. The core runtime environment of the Agent Loop is called Pi (from @mariozechner/pi-agent-core[5]).

qmd - mini cli search engine for your docs, knowledge bases, meeting notes, whatever

The 80% Problem in Agentic Coding

Automate Development Tasks with goose Headless Mode

The Rise of Coding Agent Orchestrators

State of C++ 2026

Nanobot: Ultra-Lightweight Alternative to OpenClaw

5k+ stars in 3 days: HKU open-sources an ultra-lightweight OpenClaw, building a personal Jarvis with 1% of the code

Others

From Pilot to Profit: Survey Reveals the Financial Services Industry Is Doubling Down on AI Investment and Open Source

European Transparent IT JOB Market Report 2025

The latest book from the author of 《康熙的红票》, a new story about the succession struggle at the Kangxi court | Douban's weekly new-book picks

Breaking Down the Shocking Ending of Fallout Season 2 (the Time article is no longer paywalled)

美剧窝 (a site for American TV shows)

A new-generation, mid-performance image viewer

Deep learning is, in essence, just three steps:

Translate things into numbers → let the machine try to compute an answer → adjust the parameters based on where it got things wrong.

Encoding: turn real-world things into numbers the machine can compute with (see the sketch after these definitions).

Decoding: turn the numbers the machine computed back into results a human can understand.

Backpropagation: when the answer turns out wrong, work backwards from the result and correct the internal parameters layer by layer.

What a neural network does is plain. Layer by layer, it performs: weighted sum + nonlinear transformation.
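As a concrete illustration of the encode/decode pair, here is a minimal sketch (the names and the temperature task are hypothetical, purely for illustration):

def encode(temp_celsius):
    # encoding: a real-world temperature becomes a plain number
    return temp_celsius / 40.0

def decode(score):
    # decoding: the model's raw number becomes a human-readable answer
    return "hot" if score > 0.5 else "cold"

Everything between these two functions is just arithmetic on numbers.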

Backpropagation: what happens when it's wrong? First, the "forward" direction. During training the flow is: input → encoding → network computation → output → compare with the correct answer. The key question: exactly which layer, which parameter, caused the wrong answer? The honest answer: we don't know; we can only work it out bit by bit from back to front. That is "backpropagation". Backpropagation = a blame-assignment meeting.

The procedure: first compute "how badly did I miss" (the loss), then starting from the last layer ask each layer "how much responsibility do you bear for this error?", passing the blame backwards bit by bit.

Each parameter is nudged a tiny bit in the direction that makes the error smaller. A minimal analogy: you are practicing free throws. When a shot goes wide, you don't tear everything down and start over; you raise your hand a little and use a little less force. Backpropagation tells you which "movement" to adjust, and by how much.
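To make that analogy concrete, here is a minimal numeric sketch of "nudge a parameter in the direction that shrinks the error". It uses a finite-difference estimate rather than real backpropagation, and the one-parameter model is made up for illustration:

def loss(w):
    x, y_true = 2.0, 10.0        # one fixed training example
    y_pred = w * x               # a one-parameter "model"
    return (y_pred - y_true) ** 2

w = 1.0
eps = 1e-4
# probe the slope: does the loss rise or fall when w moves a little?
grad = (loss(w + eps) - loss(w - eps)) / (2 * eps)
w -= 0.01 * grad                 # step the way that makes the loss smaller

Backpropagation computes the same direction analytically, layer by layer, instead of probing each parameter one at a time.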

No matter how complex, a minimal deep learning program always contains these 5 parts:

1. Data
2. Encoding
3. Model
4. Loss function
5. Backward update (backprop + update)

for each training step:
    x = encode(input)
    y_pred = model(x)
    loss = compare(y_pred, y_true)
    adjust the model's parameters backwards based on loss

The simplest loss function is squared error: loss = (prediction - target)^2

We use a "minimal" version of backpropagation:
  w = w - learning_rate * gradient
  b = b - learning_rate * gradient

You don't need to understand the mathematical details yet; just remember one sentence: backpropagation = computing "which direction to adjust each parameter in, and by how much."
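Where does that gradient come from? For the one-layer model used below, y_pred = w * x + b with squared-error loss, the chain rule gives it in closed form, and this is exactly what the backward() function in the full program implements:

loss     = (y_pred - y_true)^2
dloss/dw = 2 * (y_pred - y_true) * x
dloss/db = 2 * (y_pred - y_true)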

Putting it all together (the training loop)
import random

# generate training data: x in [0, 10], labeled by a hand-written rule
data = []
for _ in range(100):
    x = random.uniform(0, 10)
    y = 1 if x > 5 else 0   # hand-written rule
    data.append((x, y))

def encode(x):
    # x is already a number, so encoding is the identity here
    return x

# the model's two parameters, initialized before training
w = 0.0
b = 0.0

def model(x):
    return w * x + b

def decode(y):
    # targets are 0/1, so threshold the raw score at 0.5
    return 1 if y > 0.5 else 0

def loss_fn(y_pred, y_true):
    return (y_pred - y_true) ** 2

lr = 0.01  # learning rate

def backward(x, y_pred, y_true):
    global w, b
    grad = 2 * (y_pred - y_true)   # dloss/dy_pred
    w -= lr * grad * x             # dloss/dw
    b -= lr * grad                 # dloss/db

for epoch in range(50):
    total_loss = 0

    for x, y in data:
        x_enc = encode(x)
        y_pred = model(x_enc)
        loss = loss_fn(y_pred, y)
        backward(x_enc, y_pred, y)
        total_loss += loss

    print(f"epoch {epoch}, loss {total_loss:.2f}")


So what do PyTorch / TensorFlow actually do? In one sentence: they write backward() for you, automatically.

Real models have tens of thousands of parameters and more; computing those gradients by hand is impossible.

PyTorch's core value comes down to one thing:
  loss.backward()
  optimizer.step()

The logic you just hand-rolled is exactly what it automates for you.
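As a sketch of what that automation looks like, here is the same toy task rewritten with PyTorch autograd (assuming torch is installed; the setup mirrors the hand-rolled loop above):

import torch

# same data-generating rule: label is 1 if x > 5, else 0
xs = torch.rand(100, 1) * 10
ys = (xs > 5).float()

w = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
optimizer = torch.optim.SGD([w, b], lr=0.01)

for epoch in range(50):
    y_pred = xs * w + b                  # forward pass: w * x + b
    loss = ((y_pred - ys) ** 2).mean()   # squared-error loss
    optimizer.zero_grad()
    loss.backward()                      # autograd fills in the gradients
    optimizer.step()                     # w -= lr * grad; b -= lr * grad
    print(f"epoch {epoch}, loss {loss.item():.2f}")

No hand-written backward() anywhere: defining the forward pass is enough, and autograd derives the update.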