Give your 'human-level agents' a proper head start with these 3 best practices

ZDNET’s key takeaways 

  • Setting up governance and evaluation are keys to designing agents. 
  • Start small with agents rather than try to replace entire workflows.
  • Clean, well-organized data makes all agentic work smoother.

Computing is at the threshold of “nearly human-level agents,” according to Mustafa Suleyman, CEO of Microsoft AI, in a recent opinion column for the MIT Technology Review. 

But there are many stumbling blocks along the way. Businesses are overwhelmed with trying to redesign their workflows and decide what information agentic AI programs should have access to. 

A consequence of the challenges, database technology giant Databricks noted in its recent State of AI Agents report, is that “Only 19% of organizations have deployed AI agents, and mostly to a limited extent.”

“If you talk to a lot of chief financial officers, they will tell you, ‘I have three concerns’,” Craig Wiley, the head of AI for Databricks, told ZDNET.

“Can you control it, can you tell me if it’s any good [meaning, does what comes out of the model actually provide value], and how much does it cost?”

To address those concerns, said Wiley, enterprises should consider up-front, before they implement agents, three best practices: 

  • Control it (governance)
  • Evaluate for correctness
  • Start small to maximize efficiency and payoff

Also: AI agents are fast, loose, and out of control, MIT study finds

Can you control it?

“Can you control it?” boils down to the practice of governance, which starts with controlling what data an agent will access. 

An AI agent is an artificial intelligence program that can go beyond the simple turn-by-turn prompting offered by ChatGPT and similar bots. An agent can plug into corporate resources such as databases. It can execute computer code outside what’s included in a large language model. It can invoke external programs such as email systems. And it can string together multiple actions of varying types to execute entire workflows. 

Also: How to build better AI agents for your business – without creating trust issues

The first rule in data access is to do no harm. A Databricks client, the women’s health application Flow, has 75 million users who rely on the app for personalized assessments and advice. 

“They have this challenge, which is they want to offer stronger and stronger feedback and advice and guidance and insight to their app users,” Wiley explained. “But they need to be unbelievably careful because this is very sensitive data, so the last thing they would want is an app user to get a response that includes some other app user’s information in it.” 

Also: I built an app for work in 5 minutes with Tasklet – and watched my no-code dreams come true

To protect against such data leakage, Wiley said, a governance system “should be able to very selectively say, ‘Hey, these tools or this data, that’s data everyone can use; the data over here only should be used by the user.'”

Asset manager Franklin Templeton took a similar level of care when sending portfolio reports to clients. “The last thing I want [as a fund client] is to get an email from my financial advisor that’s about [someone else’s] information,” he observed. 

“Oftentimes, what we see is customers get really excited about a use case, they start driving it, and then they run into one of these walls where they say, Oh, our questions or our responses need to be different by user,” he said. “And that needs to be enforced, not just suggested in the prompt, but it needs to be deterministically forced.”

Connecting the dots in the data

The next part of governance is defining the question and identifying the resource that should have the answer. 

As Wiley framed the challenge, “How do I align my question with the perfect data to support my question with the right model to get that response?” The goal is to avoid making the agentic AI program “transactional,” like a chatbot, where a person is expected to keep posing yet another question.

Also: I asked 5 data leaders about how they use AI to automate – and end integration nightmares

Instead, design the agent to pull together many connected pieces of data automatically, letting the human user go deeper into the subject.

Wiley cited Edmunds, the online car-buying operation, which created Edmunds Mind, an agentic information tool for internal use that helps manage car sales efficiently. It was designed to merge many more aspects of a potential purchase. 

Wiley explained, “Instead of just asking which car is the best convertible for sale and how much does it cost, they can ask which car dealerships are underserved by looking at traffic data and demographic data on top of listings data and pricing data, in a much more kind of comprehensive way.”

Such an agent “takes a whole series of steps potentially to ensure that the responses are high-quality responses,” he said, so that “I’m not responsible [as the user] for delivering all of the information to the model.”

To implement governance, a tool called a data catalog does two things. First, it is a “single pane of glass” that lets an IT admin see everything the agent has access to, including structured and unstructured data, Model Context Protocol connections for external tool calling, and the tools being invoked.

Second, a catalog enforces identities, including the identity of an agent and the information it has access to, as well as the identity of a user. The catalog tracks those identities throughout the agent’s activity to keep data segmented, so it is accessed only by the agent and the user to the extent their identities grant them permission.  
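The catalog's identity enforcement that Wiley describes can be sketched in a few lines. The example below is a hypothetical toy, not Databricks' implementation: the names `Identity`, `Catalog`, and `read` are illustrative, but it shows the core idea of checking both the agent's and the user's permissions deterministically, in code rather than in a prompt.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Identity:
    """A principal known to the catalog: an end user or an agent."""
    name: str
    permissions: frozenset  # names of resources this identity may read

@dataclass
class Catalog:
    """Toy data catalog: registers resources, enforces access deterministically."""
    resources: dict = field(default_factory=dict)  # resource name -> data

    def register(self, name, data):
        self.resources[name] = data

    def read(self, resource, agent, user):
        # Grant access only if BOTH the agent's identity and the user's
        # identity permit it -- enforced in code, not suggested in a prompt.
        for principal in (agent, user):
            if resource not in principal.permissions:
                raise PermissionError(f"{principal.name} may not read {resource}")
        return self.resources[resource]
```

With this shape, shared data ("data everyone can use") is simply a resource listed in every user's permissions, while per-user data appears only in that one user's set, so a query on behalf of one user can never return another user's records.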

Being careful with governance from the outset “as a first-class principle in your design” makes customers much more likely to get agents into production than those “who are just kind of freewheeling these things,” he said. “It really comes down to that deliberateness of design.”

How do you know if it’s correct?

The second element is thinking really carefully about how to evaluate what comes out of the model.

When Flow’s app developers “were seeking to drive accuracy, the people who were evaluating whether or not these agents were saying what they should say were actually physicians, not programmers. The software programmers write what’s called the orchestration system, which manages the agents, but it is physicians who were saying, ‘this response over here needs additional context or color or what have you,'” Wiley said.

Evaluation is ongoing throughout the life of the program and at multiple levels, said Wiley. “Not just what did the agent get asked and what did it answer, but at every intermediate step of its thinking, what exactly was it doing, and was it aligned with getting to the right answer?”

Also: This AI expert says the job apocalypse isn’t coming, even if you’re a coder – here’s why

If something is off, roll the agent back to the evaluation stage, redeploy, and “keep that loop going so we can build the kinds of automated learning types of agents that folks I think are really hungry for.”
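The step-by-step evaluation Wiley describes can be sketched as follows. This is a minimal illustration with hypothetical names (`evaluate_trace`, `needs_rollback`): each intermediate step of an agent run is scored by a judge (in practice a domain expert such as Flow's physicians, or an automated grader), and a run that falls short at any step is flagged for the rollback-and-redeploy loop.

```python
def evaluate_trace(trace, judges):
    """Score every intermediate step of an agent run, not just the final answer.

    trace  -- list of dicts like {"step": "retrieve", "output": "..."}
    judges -- maps a step name to a scoring function returning 0.0-1.0
    """
    return [
        {"step": s["step"],
         "score": judges.get(s["step"], lambda out: 1.0)(s["output"])}
        for s in trace
    ]

def needs_rollback(scores, threshold=0.8):
    """Flag the agent for the redeploy loop if any step scores below threshold."""
    return any(s["score"] < threshold for s in scores)
```

The point of scoring per step rather than per answer is diagnostic: when a run fails, the scores show whether retrieval, reasoning, or the final response went wrong, which is what makes the evaluate-redeploy loop actionable.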

Accuracy has enabled Flow to deliver an application to market that is differentiated by the quality of the user experience, noted Wiley. More broadly, as with governance, companies that can evaluate the output of agents are six times more likely to get into production, he added. 

Small is beautiful 

The third concern, cost, is easier because it is an outcome of doing the first two things right, governance and evaluation. “Once you can do those two things, to be honest, the rest of it becomes implementation details,” said Wiley.

But cost has to be considered from the outset.

“It is something we spend a lot of time talking to customers about,” said Wiley. “Is this something that we can solve today inside a reasonable cost envelope? And assuming we can solve it inside that reasonable cost envelope, is it actually going to move the needle in your company?”

“There is an important consideration with implementation,” Wiley continued, “and that’s to consider starting small and building at a pace at which agents can be governed and verified. We’re seeing companies of varying levels of ambition, and ambition is great. [However], with all software projects, the smaller and more atomic I can build individual pieces that I can then test and confirm work, then I can build those into a larger kind of confederacy of capabilities that can go do a much larger task.”

Also: True agentic AI is years away – here’s why and how we get there

As an example of focus, Wiley cited convenience store chain 7-Eleven, whose service techs have to go on-site to repair equipment. When they don’t have the right manuals, it’s either a wasted trip or a more complicated job than it should be. 

By having agents access tons of documentation, the company could provide techs with a “super assistant,” said Wiley, “where they can go search every single issue that’s ever been filed against these machines, and every single manual and spec, and they are no longer calling their buddy asking, ‘Have you seen this problem before?'”

Another example is Baylor University, which uses agents to review recordings of every call with a prospective student, analyzing elements such as the student’s decision factors for choosing a school; the humans taking those calls don’t have the time or energy to take comprehensive notes.

“They’re able to learn a lot more about their own organization now by listening to their customers to a depth that they’ve never been able to listen before,” said Wiley.

Probably less successful would be trying to replace complete workflows with agents, he said. 

“If I were trying to replace my ERP or a SaaS system that my organization uses, the last thing I would do is start with a single prompt that says, Hey, I want a new general ledger system,” said Wiley. “I would go after it component by component.”

What’s the payoff?

It is still early to have concrete figures for the industry’s financial return on investment from agents, said Wiley. “We’re probably sitting in the equivalent of 2001 on the web, where companies are investing in their web pages but don’t really understand the purpose of all this yet.”

There are encouraging anecdotal examples. Franklin Templeton’s automation of investment portfolio analysis enabled the firm to identify over $15 million in new product opportunities, such as gaps in a client’s portfolio. 

Also: Scaling agentic AI means trusting your data – here’s what most CDOs are investing in

Companies see their KPIs (key performance indicators) moving in the right direction, such as 7-Eleven seeing a 25% increase in first-time fix rates for equipment and a 40% drop in time to repair, which can lead to cost savings.

The final element is the time required to conceive, build, and deploy. From Wiley’s perspective, it goes back to “making sure your data is clean and in the right place” at the outset of agentic AI.

Organizing data at the outset will increase the “velocity” of a project, he said. “Then your software developers, data scientists, agent developers… they’ll be able to run fast if that’s the case. ‘If your data’s in good shape, we could do it [meaning, build and deploy an agentic system] this afternoon. If your data is in rough shape, then the real problem is going to be how long it takes us to get your data in order.'”
