Close Menu
Savannah HeraldSavannah Herald
    • Home
    • Features
      • View All On Demos
    • Buy Now
    We're Social
    • Twitter
    • Facebook
    • YouTube

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    Trending
    • Ciara Miller: I Now Realize “How Much” West Wilson Hated Her
    • SWAC Basketball Tournament makes changes for 2027
    • Analiza oferty bukmacherskiej forbet dla graczy w Polsce
    • Microsoft’s open-source SkillOpt automatically upgrades AI agent skills without touching model weights
    • Georgia Trend Daily – June 11, 2026
    • FIFA World Cup 2026 kicks off at Atlanta’s Fan Fest
    • Cartagena, Colombia for Black Travelers: Culture, Beaches, and Afro-Colombian Heritage
    • Voting Rights Act: An Essential Right – NY Carib News
    Facebook X (Twitter) Instagram YouTube
    Login
    Savannah HeraldSavannah Herald
    Savannah HeraldSavannah Herald
    Home » Microsoft’s open-source SkillOpt automatically upgrades AI agent skills without touching model weights
    Tech

    Microsoft’s open-source SkillOpt automatically upgrades AI agent skills without touching model weights

    Savannah HeraldBy Savannah HeraldJune 11, 20269 Mins Read
    Facebook Twitter Pinterest LinkedIn WhatsApp Reddit Tumblr Email
    Microsoft’s open-source SkillOpt automatically upgrades AI agent skills without touching model weights
    Share
    Facebook Twitter LinkedIn Pinterest Email

    Tomorrow’s Tech, Today: Innovation That Moves Us Forward

    Key takeaways
    • SkillOpt imports deep-learning controls, edit budget as learning rate, validation gates, and momentum to stabilize iterative skill document optimization.
    • SkillOpt outperforms baselines across models and harnesses, creating compact, transferable skills that improve reliability in multi-step enterprise workflows.
    • Efficient token footprint, low everyday training cost, integrates with existing orchestration, requires representative held-out examples and a scorable feedback signal.

    Agent skills have become an important part of real-world AI applications, providing a mechanism — a set of instructions saved in a folder of text-based markdown (.md) files, usually — for models to adapt to specific enterprise use cases and complex workflows.

    However, optimizing these skills is a slow process and faulty process, as they cannot be trained in the same way as the parameters of the underlying AI model. Instead, users typically must update them manually by retyping the instructions in each file, playing a “guessing game” as to what changes might improve agentic AI performance and reduce errors.

    SkillOpt, a new, open source (MIT Licensed) framework developed by Microsoft, does one better: it introduces an optimizer designed for agent skills, turning the agent’s skill .md document as a trainable object that evolves based on performance feedback.

    It uses deep-learning-style optimization to make it possible for the AI to systematically explore modifications to the document and find the best combination of instructions. Most importantly, it accomplishes this procedural adaptation without making changes to the underlying model’s weights.

    On various industry benchmarks, SkillOpt outperforms existing baselines, significantly boosting accuracy for models like GPT-5.5 and Qwen. The result is a set of compact, transferable skill artifacts that allow AI agents to adapt to new domains effortlessly.

    The challenge of optimizing agent skills

    Agent skills package procedural knowledge into natural-language specifications, including domain heuristics, tool-use policies, output constraints, and known failure modes. These skills provide an external interface for agents to adapt to complex enterprise workflows. In practice, agent skills are stored as text documents and inserted into the agent’s context before execution.

    One of the key benefits of skills is that they customize the behavior of the underlying model without changing its weights. However, the skill document itself needs to be tweaked and optimized to get the best performance out of the agent.

    While deep learning relies on strict mathematical controls for stability, human prompt engineering often relies on trial and error. When attempting to automatically update a skill document based on feedback, the lack of mathematical discipline makes text highly volatile.

    Yifan Yang, Senior Research SDE at Microsoft Research Asia, told VentureBeat that the problem is not making changes, but ensuring those changes are mathematically sound.

    “The breaking point isn’t whether a team can change a skill, it’s that they can’t guarantee the change is an improvement,” Yang said. “Three failure modes recur: no step-size control, so skills drift; no validation, so a fix that reads as reasonable gets written in and can quietly regress performance; and no negative memory, so the same failed edit keeps coming back.”

    To illustrate how easily performance can drop when edits aren’t mathematically validated, Yang noted that “an ungated rewrite pushed GPT-5.5 on SpreadsheetBench from 41.8 down to 41.1.”

    According to Yang, these failure modes are amplified in multi-step workflows “because that’s where frontier models are weakest zero-shot. Not on reasoning, but on procedural discipline: format, self-verification, tool policy.”

    Before SkillOpt, agent skills were primarily hand-crafted, generated in a single shot, or evolved through loosely controlled self-revision pipelines that could not reliably improve under feedback.

    Prompt optimization methods like TextGrad and GEPA treat language artifacts as optimizable objects and use trajectory feedback to evolve prompts, but they focus on single-prompt configurations rather than generating persistent, reusable skill artifacts.

    Meanwhile, skill evolution and discovery methods like EvoSkill and Trace2Skill convert agent execution experiences into trajectory lessons to refine skill folders, build domain-specific libraries, or perform evolutionary search.

    None of them apply deep-learning-style controls, such as learning rates, validation gates, and momentum, which are necessary to continuously train a single, compact skill document.

    Importing mathematical discipline to text

    SkillOpt optimizes a text document through an iterative propose-and-test loop that separates the model executing the tasks from the model optimizing the skill. The process unfolds in several steps:

    • SkillOpt starts with an initial skill document and a frozen target model (or harness), where the target model runs a batch of tasks to generate execution trajectories that act as the evidence for the current step.

    • An offline optimizer model analyzes these trajectories, separating successes from failures into minibatches. Looking at a minibatch helps the model identify systematic procedural errors rather than one-off anomalies. Based on these patterns, the optimizer proposes structural add, delete, or replace edits to the skill document.

    • The proposed edits are reviewed to filter out duplicates or contradictions, and the optimizer then ranks these candidate edits by their expected utility.

    • Rather than applying all proposed changes, SkillOpt clips the list to a maximum edit budget for that step, generating a candidate skill.

    • The candidate skill is evaluated on a held-out validation set using the target model. If the candidate improves the validation score, it is accepted and becomes the new current skill. If it fails, the edits are rejected and sent to a rejected-edit buffer, providing negative feedback so the optimizer knows not to repeat that mistake.

    SkillOpt directly addresses the problem of treating text as a trainable object by importing mathematical concepts from deep learning. The creators note that “the deep-learning analogy is operational rather than decorative,” helping the framework avoid the instability issues associated with other optimization techniques.

    SkillOpt pipeline

    SkillOpt framework (source: arXiv)

    The edit budget acts as a learning rate. By limiting how many edits can be applied at once, the skill version is prevented from moving too far from its previous state, preserving continuity while allowing new procedures to be acquired. 

    Just like checking validation loss in deep learning, the strict held-out examples ensure that plausible-sounding text edits are only kept if they mathematically improve the agent’s actual performance on the validation split.

    At the end of an epoch, SkillOpt performs a slow update by comparing tasks under the previous and current epoch’s skills. This acts like a momentum term, carrying durable, long-horizon procedural lessons forward while isolating them from the fast, step-level edits.

    SkillOpt in action

    To evaluate the technique in practice, researchers tested SkillOpt across different models, ranging from large-scale frontier models like GPT-5.5 to smaller closed and open models including GPT-5.4-mini and Qwen3.5-4B. They also deployed the skills within different execution harnesses, using plain chat as well as complex coding harnesses like the Codex CLI and Claude Code.

    The evaluation spanned diverse industry benchmarks including single-round question-answering, multi-round code generation involving tool use, and multimodal document reasoning. SkillOpt was measured against multiple baselines ranging from a default no-skill setting to human-written skills and one-shot LLM-generated skills. It was also compared against advanced prompt-optimization and skill-evolution methods, specifically Trace2Skill, TextGrad, GEPA, and EvoSkill.

    SkillOpt dominated across the board, proving highly effective on all 52 evaluated combinations of model, benchmark, and harness. It was particularly effective with frontier models, delivering an average absolute improvement of +23.5 points against the no-skill baseline on GPT-5.5. Furthermore, SkillOpt outperformed a hypothetical oracle baseline that cherry-picks the best competing method for every problem.

    Small target models saw immense relative gains, proving that a compact text file can supply procedural knowledge that small models lack in their weights. For example, GPT-5.4-nano nearly doubled its score on multimodal document QA and tripled its score on embodied interaction and sequential decision-making.

    These academic benchmarks map to critical enterprise pain points. Zero-shot models often hallucinate formatting or fail to use tools properly in multi-step scenarios. Yang explained that the biggest performance leaps occurred in operations that enterprises historically struggle to automate reliably.

    “Document data extraction… exact figures out of contracts, invoices, and forms — AP automation, claims, compliance,” Yang said. “What improves is reliability: precise formatting, self-verification, auditable outputs. And the gains come from learning procedure, not memorizing answers.”

    For enterprise practitioners, the true value of SkillOpt lies in its portability, efficiency, and compatibility with existing infrastructure. Experiments confirm that the framework is harness-agnostic. In addition to basic chat, the same optimization loop was successfully integrated into tool-backed execution environments like the Codex CLI and Claude Code with significant gains on industry benchmarks.

    Developers can train a skill using one execution loop and deploy it in another. For example, a spreadsheet skill trained entirely inside the Codex loop was moved directly into Claude Code and drove a +59.7 point gain over Claude Code’s native baseline without any further changes.

    SkillOpt artifacts also transfer cleanly across model scales. A skill optimized for GPT-5.4 was deployed onto the smaller GPT-5.4-mini and GPT-5.4-nano models with positive gains, proving that the learned procedures encode reusable workflows rather than just exploiting quirks of a specific model’s architecture.

    Finally, the framework is highly efficient regarding token usage and context window real estate. Across all benchmarks, the final deployed skills never exceeded 2,000 tokens, with a median length of roughly 920 tokens. This results in highly readable, auditable artifacts that a human practitioner can review and manage in minutes.

    Implementation strategies and the enterprise ‘catch’

    For enterprise tech leaders, adopting a new framework requires understanding the overhead and limitations. While the research paper notes that training tokens can reach up to 210 million for academic benchmarks, the reality for day-to-day enterprise use cases is much lighter. The high token counts in testing were largely due to re-scoring massive held-out test sets.

    “The real upfront work is the verifier and a representative held-out split. The optimizer is light; the evaluation harness is where the engineering goes,” Yang said. He added that for everyday use, “in community frameworks like GBrain, where SkillOpt updates run on Claude Sonnet, training a skill for a single task averages just $1–5.” This optimization cost is a one-time fee that amortizes completely at deployment.

    However, the framework requires specific conditions to work effectively, namely a few dozen representative examples and a scorable feedback signal. Teams should avoid applying SkillOpt to open-ended or subjective tasks. “With no clean automatic scorer you have to design a human- or model-based evaluator and watch its stability,” Yang said.

    SkillOpt also integrates smoothly with existing orchestration stacks, removing a major adoption hurdle. For instance, developers already using pipeline compilers can run both systems harmoniously. “DSPy is a different, complementary layer,” Yang said. “It compiles declarative LM pipelines and optimizes program structure; SkillOpt optimizes the external skill state a frozen agent loads. You can run them together.”

    Looking ahead, open-source developers are already scheduling SkillOpt to run periodically over their agents’ past trajectories, creating a small ecosystem of self-optimizing code-agent plugins. This continuous feedback loop represents a significant shift in how AI systems adapt.

    “The valuable version of self-improvement is an agent autonomously discovering knowledge to improve its own behavior and the user experience, under verification and audit,” Yang said. “Skills are the fastest, cheapest, most reversible first step, and the same mindset points toward agents eventually optimizing themselves, all the way down to their own weights.”

    Read the full article on the original site


    AI and Machine Learning Black Technologists Cybersecurity News Digital Innovation Emerging Technologies Future of Work Gadget Reviews Innovation in Education Minorities in Tech Silicon Valley Updates Smart Devices Software Development Startup News STEM News Tech Culture Tech Equity Tech for Good Tech Industry Updates Tech Trends Technology News
    Share. Facebook Twitter Pinterest LinkedIn WhatsApp Reddit Tumblr Email
    Savannah Herald
    • Website

    Related Posts

    Tech June 10, 2026

    Bluesky Will Soon Have a Subreddit-Like ‘Communities’ Feature

    Tech June 10, 2026

    Windows Ready Print is Microsoft’s biggest overhaul of Windows printing in years

    Tech June 10, 2026

    4 Black Billionaires Make Forbes’ 2026 America’s Richest Self-Made Women List

    Tech June 9, 2026

    Researchers trained an open source AI search agent, Harness-1, that outperforms GPT-5.4 on recalling relevant information

    Tech June 9, 2026

    Councils exit 10-year Capita deal to boost decision and project velocity

    Tech June 8, 2026

    New AI Approach for Christian Schools: Built on a Biblical Worldview

    Comments are closed.

    Don't Miss
    Politics May 28, 2026By Savannah Herald06 Mins Read

    Wrapping up the 25-26 Legislative Session

    May 28, 2026

    Local Voices. Statewide Impact. Stay Informed with Georgia News The 2025-26 legislative biennial ended on…

    4 Windows Wi-Fi repairs that in fact quit arbitrary stagnations and goes down

    January 3, 2026

    #WowWednesday: Luxurious Lakefront Property in Sandy Springs

    August 29, 2025

    N64 App on Switch Online to Get Switch 2 Exclusive Features

    August 28, 2025

    Clean Sunscreen for Black People That Won’t Ghost You

    November 16, 2025
    Archives
    • June 2026
    • May 2026
    • April 2026
    • March 2026
    • February 2026
    • January 2026
    • December 2025
    • November 2025
    • October 2025
    • September 2025
    • August 2025
    • July 2025
    • June 2025
    • May 2025
    • April 2025
    • March 2025
    • February 2025
    Categories
    • Art & Literature
    • Beauty
    • Black History
    • Business
    • Climate
    • Culture
    • Education
    • Employment
    • Entertainment
    • Faith
    • Fashion
    • Food
    • Gaming
    • Georgia Politics
    • HBCUs
    • Health
    • Health Inspections
    • Investing
    • Lifestyle
    • Local
    • Lowcountry News
    • National
    • National Opinion
    • News
    • Politics
    • Real Estate
    • Senior Living
    • Sports
    • State
    • Tech
    • Traffic
    • Transportation
    • Travel
    • World
    Savannah Herald Newsletter

    Subscribe to Updates

    A round up interesting pic’s, post and articles in the C-Port and around the world.

    About Us
    About Us

    The Savannah Herald is your trusted source for the pulse of Coastal Georgia and the Low County of South Carolina. We're committed to delivering timely news that resonates with the African American community.

    From local politics to business developments, we're here to keep you informed and engaged. Our mission is to amplify the voices and stories that matter, shining a light on our collective experiences and achievements.
    We cover:
    🏛️ Politics
    💼 Business
    🎭 Entertainment
    🏀 Sports
    🩺 Health
    💻 Technology
    Savannah Herald: Savannah's Black Voice 💪🏾

    Our Picks

    The Essential Roy Ayers” Playlist (LISTEN) – Good Black News

    August 28, 2025

    Take-Two’s chief executive officer is possibly the globe’s only individual to decline the possibility to play GTA 6 since “I’m not a player” – his duty is to “leave their means”

    August 28, 2025

    Athlete of the Week for June 1, 2026

    June 1, 2026

    YouTuber and Wife Ended Pregnancy After Down Syndrome Diagnosis. They Got Death Threats.

    June 6, 2026

    Every Must-See Look From Coachella Weekend 1

    April 14, 2026
    Categories
    • Art & Literature
    • Beauty
    • Black History
    • Business
    • Climate
    • Culture
    • Education
    • Employment
    • Entertainment
    • Faith
    • Fashion
    • Food
    • Gaming
    • Georgia Politics
    • HBCUs
    • Health
    • Health Inspections
    • Investing
    • Lifestyle
    • Local
    • Lowcountry News
    • National
    • National Opinion
    • News
    • Politics
    • Real Estate
    • Senior Living
    • Sports
    • State
    • Tech
    • Traffic
    • Transportation
    • Travel
    • World
    Copyright © 2002-2026 Savannahherald.com All Rights Reserved. A Veteran-Owned Business

    Type above and press Enter to search. Press Esc to cancel.

    Manage Consent
    To provide the best experiences, we use technologies like cookies to store and/or access device information. Consenting to these technologies will allow us to process data such as browsing behavior or unique IDs on this site. Not consenting or withdrawing consent, may adversely affect certain features and functions.
    Functional Always active
    The technical storage or access is strictly necessary for the legitimate purpose of enabling the use of a specific service explicitly requested by the subscriber or user, or for the sole purpose of carrying out the transmission of a communication over an electronic communications network.
    Preferences
    The technical storage or access is necessary for the legitimate purpose of storing preferences that are not requested by the subscriber or user.
    Statistics
    The technical storage or access that is used exclusively for statistical purposes. The technical storage or access that is used exclusively for anonymous statistical purposes. Without a subpoena, voluntary compliance on the part of your Internet Service Provider, or additional records from a third party, information stored or retrieved for this purpose alone cannot usually be used to identify you.
    Marketing
    The technical storage or access is required to create user profiles to send advertising, or to track the user on a website or across several websites for similar marketing purposes.
    • Manage options
    • Manage services
    • Manage {vendor_count} vendors
    • Read more about these purposes
    View preferences
    • {title}
    • {title}
    • {title}
    Ad Blocker Enabled!
    Ad Blocker Enabled!
    Our website is made possible by displaying online advertisements to our visitors. Please support us by disabling your Ad Blocker.

    Sign In or Register

    Welcome Back!

    Login below or Register Now.

    Lost password?

    Register Now!

    Already registered? Login.

    A password will be e-mailed to you.