Research 2026-06-30

A Comparative Study of Student Perspectives on Technical Writing Feedback Quality: Evaluating LLMs, SLMs, and Humans in Computer Science Topics

Originally published by Arxiv CS.AI

arXiv:2601.11541v3. Abstract: To address the scalability of feedback in computer science while mitigating the privacy and cost limitations of commercial Large Language Models (LLMs), this study evaluates a locally hosted Small Language Model (SLM). We deployed a...

The Local Model Advantage: SLMs Outperform LLMs in Student Feedback

A new study posted to arXiv (2601.11541v3) directly compares how students perceive feedback quality from Large Language Models (LLMs), Small Language Models (SLMs), and human instructors on technical writing in computer science. The researchers deployed a locally hosted SLM alongside commercial LLMs and human evaluators, then surveyed students on the usefulness, clarity, and actionability of the feedback they received.

The core finding is striking: students rated the SLM-generated feedback as comparable to or better than both commercial LLMs and human instructors in several key dimensions. This contradicts the common assumption that larger models always produce superior educational outputs. The locally hosted model, which runs on modest hardware without sending data to external servers, achieved high marks for specificity and relevance to technical writing tasks.

Why This Matters

This research addresses three critical pain points in AI-assisted education:

Privacy and cost barriers have been the primary obstacles to widespread adoption of AI feedback tools in academia. Commercial LLMs require sending student work to external servers, raising FERPA and GDPR compliance issues. They also incur per-token costs that become prohibitive at scale. A locally hosted SLM keeps student work on institutional hardware and removes per-token charges.
Feedback quality is not simply a function of model size. The study suggests that smaller, task-specific models can outperform general-purpose LLMs when properly fine-tuned for educational contexts. Students valued concise, actionable feedback over verbose analysis, a strength of SLMs, which are less prone to over-generation.
Scalability without compromise becomes achievable. Human instructors cannot provide detailed feedback to every student in large CS courses. SLMs offer a middle path: feedback that students rate at least as well as commercial LLM output, cheaper and more private than sending work to an external service, available at a scale human grading cannot reach, and deployable on institutional infrastructure.

Implications for AI Practitioners

For educational technology teams, the takeaway is clear: local SLMs are not a compromise but a strategic advantage. Practitioners should:

Evaluate SLMs fine-tuned on domain-specific writing rubrics rather than defaulting to GPT-4 or Claude for educational feedback
Prioritize model deployment that keeps student data on-premises, which also removes network round-trips to external APIs and the dependency on a third-party service
Design feedback systems that emphasize conciseness and actionable suggestions over comprehensive commentary
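
The last point can be sketched in code. The following is a minimal illustration, not the study's method: it assumes a hypothetical rubric dictionary and a model that returns one suggestion per line. The names `build_feedback_prompt`, `trim_suggestions`, and the rubric entries are all invented for this example, and the resulting prompt could be sent to whichever locally hosted model an institution runs.

```python
def build_feedback_prompt(rubric, submission, max_suggestions=3):
    """Assemble a prompt asking for short, rubric-anchored, actionable feedback."""
    if max_suggestions < 1:
        raise ValueError("max_suggestions must be at least 1")
    # One line per rubric criterion, e.g. "- clarity: ...".
    criteria = "\n".join(f"- {name}: {desc}" for name, desc in rubric.items())
    return (
        "You are giving feedback on a computer science technical writing task.\n"
        f"Rubric criteria:\n{criteria}\n\n"
        f"Give at most {max_suggestions} suggestions. "
        "Each suggestion must name one criterion and one concrete revision.\n\n"
        f"Student submission:\n{submission}"
    )


def trim_suggestions(feedback_lines, max_suggestions=3):
    """Keep only the first max_suggestions non-empty lines of model output."""
    kept = [line.strip() for line in feedback_lines if line.strip()]
    return kept[:max_suggestions]


# Hypothetical rubric; a real course would substitute its own criteria.
example_rubric = {
    "clarity": "Sentences are unambiguous and define technical terms.",
    "structure": "The report follows problem, method, result order.",
}
prompt = build_feedback_prompt(example_rubric, "My report text...", max_suggestions=2)
```

Capping the number of suggestions both in the prompt and after generation is one simple way to guard against over-generation; whether it improves student ratings would need to be checked with the same kind of survey the study used.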

The study also highlights an important methodological insight: student perception data matters more than automated metrics. An SLM that scores lower on BLEU or ROUGE can still deliver more pedagogically useful feedback.
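
To make the idea of a student perception metric concrete, here is a small sketch of how survey ratings might be summarized per feedback source and dimension. It assumes 1 to 5 Likert ratings and uses the dimensions named earlier (usefulness, clarity); the rows are placeholder values for illustration only, not data from the study, and `mean_ratings` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean

# Placeholder survey rows: (feedback_source, dimension, rating on a 1-5 scale).
# These values are invented for illustration and are NOT results from the study.
responses = [
    ("slm", "usefulness", 4),
    ("slm", "clarity", 5),
    ("llm", "usefulness", 4),
    ("llm", "clarity", 3),
    ("human", "usefulness", 5),
    ("human", "clarity", 4),
    ("slm", "usefulness", 5),
]


def mean_ratings(rows):
    """Return {source: {dimension: mean rating}} for the given survey rows."""
    grouped = defaultdict(lambda: defaultdict(list))
    for source, dimension, rating in rows:
        if not 1 <= rating <= 5:
            raise ValueError(f"rating out of range: {rating}")
        grouped[source][dimension].append(rating)
    return {
        source: {dim: mean(vals) for dim, vals in dims.items()}
        for source, dims in grouped.items()
    }


summary = mean_ratings(responses)
for source, dims in summary.items():
    print(source, dims)
```

A summary like this can sit alongside BLEU or ROUGE scores so that a model's overlap with reference text and its perceived usefulness are reported separately rather than conflated.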

Key Takeaways

Locally hosted SLMs can match or exceed commercial LLMs in student-rated feedback quality for technical writing
Privacy, cost, and scalability advantages make SLMs the practical choice for educational institutions
Feedback effectiveness depends more on task-specific tuning than raw model size
Educational AI deployments should prioritize student perception metrics over standard NLP evaluation benchmarks

