The pattern is familiar to anyone who has sat on an academic technology committee. A platform arrives with strong vendor references and a compelling demo. A pilot runs across one department or cohort. Feedback is positive. The evaluation committee signs off. Then the institution commits, and within eighteen months, usage has cratered, IT is managing an integration nobody fully owns, and the faculty who championed the tool during the pilot have quietly stopped using it. This is not a story about bad technology. It is a story about what pilots are actually designed to measure, and why that almost never includes the conditions that determine whether a tool survives at institutional scale.
The Pilot Is a Different Environment
When an institution runs an edtech pilot, it typically does so with a self-selected cohort: motivated early adopters, a dedicated project lead, a vendor support team on standby, and a defined timeline that creates a sense of shared investment. These conditions are not representative of full institutional deployment. They are, by design, the best-case scenario. The gap between pilot performance and institutional reality is not anecdotal; it is a documented pattern in how higher education makes technology decisions. A qualitative study by researchers at Columbia University’s Center for Benefit-Cost Studies of Education, published in Educational Technology Research and Development, found that most institutions select edtech tools based on evidence that is “less than rigorous”, drawing on internal pilots and stakeholder sentiment rather than comparative or outcomes-based research. Based on interviews with 45 decision-makers across 42 U.S. institutions, the researchers found that peer-reviewed external research informed the selection process in barely a fifth of cases. As lead researcher Fiona Hollands noted, even when a tool demonstrates success in a pilot, “it is a long and uncertain path to conclude that it will work here in your own particular context with your own constituents, prevailing conditions, and support factors.”
That gap is structural, not incidental. A pilot can hit its uptime targets, generate positive educator sentiment, and run integration flows without errors, and still stall at renewal, because it was designed to validate features under favourable conditions, not to survive a procurement committee, a change in departmental leadership, or the withdrawal of vendor-side implementation support. The pilot proves that a platform can work. It rarely proves that the institution can sustain it.
Champions Don’t Scale
One of the most consistent failure modes in edtech adoption is what might be called the champion dependency problem. A single enthusiastic administrator or lead educator drives the pilot, builds workarounds, absorbs friction, and produces outcomes that look like institutional success. Then they leave, or get promoted, or simply move on to the next initiative. This dynamic has a well-established theoretical foundation. Everett Rogers’ Diffusion of Innovations, one of the most cited frameworks in technology adoption research, identifies the internal champion as essential for moving an innovation from trial to early adoption within an organisation. But Rogers is equally explicit about the limits of that role: champions accelerate early uptake; they do not create the institutional conditions for sustained use.
The reason is structural, not personal. When a champion drives an edtech initiative, the institutional knowledge required to keep it running (which configurations were customised, which workarounds were built, which faculty needed extra convincing, which IT constraints were quietly negotiated around) lives in that person’s head, not in any documentation, policy, or shared process. Once they leave, the institution doesn’t lose an enthusiast. It loses the entire operating manual. This is particularly acute in higher education, where departmental autonomy means that a successful pilot in one faculty rarely transfers organically to another. The pilot proves that this person, in this context, with this level of support, could make the tool work. It proves almost nothing about whether the institution can.
The Metrics Pilots Collect Are the Wrong Metrics
The data most pilots generate (engagement rates, session counts, feature adoption, user satisfaction scores) are designed to satisfy instructional evaluation goals. They tell a committee that people used the platform and didn’t hate it. They say almost nothing about what happens when the system scales across five faculties, or when the IT team that built the integration turns over, or when a grant-funded implementation period ends and the tool has to survive on departmental budgets. A 2023 systematic review, “Implementing Educational Technology in Higher Education Institutions: A Review of Technologies, Stakeholder Perceptions, Frameworks and Metrics”, published in Education and Information Technologies, found that institutional aversion to the perceived risks of new technology is among the most persistent barriers to edtech adoption in higher education, and that aversion compounds with each implementation that fails to survive beyond its initial rollout. When a tool collapses after commitment, the damage is rarely contained to that vendor. It recalibrates the institution’s appetite for the entire category of innovation the tool represented, making the next adoption conversation harder before it has even begun.
Assessment technology is where this gap is most consequential. Unlike a content platform or collaboration tool, an online assessment system sits at the intersection of academic integrity, institutional data governance, accessibility compliance, and high-stakes outcomes. A failure in this context is not an inconvenience; it is a credentialing risk. Yet the evaluation framework most institutions apply to assessment platforms during pilots is the same surface-level checklist they apply to everything else: does it have the features we need, does it integrate with our LMS, does the vendor have good support ratings? These are necessary questions, but they are not sufficient.
The harder questions (about concurrent user load at exam peak times, about what happens to data governance when a cohort rolls over, about who owns configuration after the pilot team disbands) rarely appear in a standard pilot evaluation rubric. Knowing which questions to ask before committing is precisely what separates institutions that adopt successfully from those that cycle through platforms every three years. For teams doing that groundwork seriously, resources that help review online assessment technology at the right level of specificity can be the difference between a decision that holds and one that unravels at scale.
Procurement Culture Selects for Demo Performance
Underneath the pilot problem is a procurement culture problem. The institutions with the most rigorous formal evaluation processes often end up most exposed, because their rigour is applied to compliance criteria rather than operational reality. RFP scoring rubrics reward vendors who can document features comprehensively, not vendors whose platforms hold up under messy, decentralised, under-resourced institutional conditions. A 2024 study published in Management Matters on edtech adoption in educational institutions identified institutional support structures and compatibility with existing academic practices as among the most decisive variables in whether technology actually takes root. These are factors that feature checklists and demo sessions are poorly positioned to reveal. The vendor, meanwhile, optimises for exactly what the evaluation process rewards: a polished demo, strong documentation, and a pilot environment where their customer success team can ensure nothing goes wrong. This is rational vendor behaviour. It is also why the pilot passes and the adoption fails.
What Rigorous Adoption Actually Requires
The institutions that move through pilot to durable adoption share a few structural habits that have little to do with which platform they chose and almost everything to do with how they made the decision. They define success criteria before the pilot begins, not in terms of feature usage, but in terms of the institutional outcomes the tool is supposed to support. They involve the people who will own the tool post-adoption (IT, academic services, examination offices) from the first planning conversation, not the last approval meeting. They simulate scale during the pilot by intentionally including edge cases: high-volume concurrent access, non-standard device environments, and users who are not early adopters. And critically, they treat the pilot not as a verdict but as a stress test, looking not for evidence that the tool works under ideal conditions, but for evidence that it can survive the conditions the institution will actually create for it.
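For teams that want to make those habits checkable, one option is to write them down as a small rubric before the pilot starts. The sketch below is illustrative only: the class, field names, and required lists are hypothetical, chosen to mirror the habits described above rather than any standard framework, and an institution would substitute its own owners and edge cases.

```python
from dataclasses import dataclass, field


@dataclass
class PilotPlan:
    """Hypothetical record of how a pilot was set up (field names are illustrative)."""
    success_criteria_defined_before_start: bool
    criteria_tied_to_institutional_outcomes: bool
    post_adoption_owners_involved: set[str] = field(default_factory=set)
    edge_cases_included: set[str] = field(default_factory=set)


# Assumed checklist values, taken from the habits described in the text.
REQUIRED_OWNERS = {"IT", "academic services", "examination office"}
REQUIRED_EDGE_CASES = {
    "high-volume concurrent access",
    "non-standard devices",
    "non-early-adopter users",
}


def readiness_gaps(plan: PilotPlan) -> list[str]:
    """Return the habits the pilot plan does not yet satisfy."""
    gaps = []
    if not plan.success_criteria_defined_before_start:
        gaps.append("success criteria not defined before the pilot")
    if not plan.criteria_tied_to_institutional_outcomes:
        gaps.append("criteria measure feature usage, not institutional outcomes")
    # Set difference finds owners and edge cases the plan has left out.
    for owner in sorted(REQUIRED_OWNERS - plan.post_adoption_owners_involved):
        gaps.append(f"post-adoption owner not involved: {owner}")
    for case in sorted(REQUIRED_EDGE_CASES - plan.edge_cases_included):
        gaps.append(f"edge case not exercised: {case}")
    return gaps


# Example: a pilot that involved only IT and tested only peak load.
plan = PilotPlan(
    success_criteria_defined_before_start=True,
    criteria_tied_to_institutional_outcomes=False,
    post_adoption_owners_involved={"IT"},
    edge_cases_included={"high-volume concurrent access"},
)
for gap in readiness_gaps(plan):
    print(gap)
```

A list of gaps like this does not say whether the platform is good. It only shows which of the conditions for durable adoption the pilot never exercised, which is the information a standard evaluation tends to omit.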
The Brookings Institution’s research on scaling edtech interventions found that successful edtech implementation requires, above all, a clear theory of change and evidence that the tool addresses a real operational need, not just a perceived one surfaced during a vendor-led discovery process. The pilot, in that framing, is not the evaluation. It is the beginning of it. The platform that passed your pilot might be the right platform. Or the pilot might simply have been too well managed, too small, and too friendly an environment to tell you anything useful either way.
