A challenge to find 10 human-centered solutions in 10 days
I challenged myself a month ago to find 10 beneficial, impactful uses of LLMs in 10 days. As I share more in this post on why this was a success, I also felt the limits of this search acutely.
The center of this quest was to find how generative AI might be starting to benefit humans. My goal was to find pro-social or human-centered solutions that would not be possible without LLMs. I deliberately aimed to exclude “productivity over tedium” products, to surface real impact and not modern search tools. From reviewing ProductHunt, Google.ai and NYT, I knew text would be a real challenge: beyond productivity, virtually every “impact” use case or “AI utopia is near” speech specifically breaks from text applications.
I will explain my findings below, curated “just in time” in 10 days to meet my goal. I posted the list on LinkedIn, although the intention of this limited exploration was focused on the “promise” of each tool, not its “peril”.
All things considered, do Tech solutions to societal problems bring more good than harm? As soon as I finished reading “The Worlds I See” (a memoir by Fei Fei Li, head of Stanford’s Human Centered AI lab, HAI) I remembered learning directly in my work not to apply a cure that is worse than the disease. So I decided to revisit my list.
Before LLMs: Looking back on a decade of deep learning
Computer vision has significantly changed the modern age, recognizing objects and patterns, including medical and safety applications (e.g., radiology and police response to armed violence).
Deep learning has extended to new dimensions since then: for instance, AlphaFold’s protein folding brought a significant impact in health sciences, both diagnostic and treatment. Text-to-speech and speech-to-text have also brought wonders in many domains (including healthcare).
Another area of study and debate is the real-life outcomes of safety features in cars. Thanks to deep learning, lane departure and collision avoidance systems have reduced accidents. That said, it appears other accidents showed concerning rates of occurrence, requiring studies over the last few years to prove an overall improvement. It appears that safety features are beneficial on balance. Conversely, autonomous driving technologies cannot operate without a driver, causing complicated accountability, and the safety track record appears worse than humans per miles driven. The safety benefits are primarily from the non-autonomous ADAS (Advanced Driver Assistance Systems).
Are there significant benefits in image generators? I had the privilege of building Generative Adversarial Networks (GANs) at ODSC East 2018, and also built with Diffusion models in the last 1-2 years (for example, an escape game). It's fun to create for people we care about! Unfortunately, image generation has spurred significant backlash due to a lack of license to intellectual property and generating toxic content (see the ongoing “Take it down” campaign), to say nothing of the associated emissions, all of which weigh heavily in the face of entertainment use cases. This leaves the impact of those applications questionable.
The examples above illustrate the potential of Machine Learning / Deep Learning. But there is a contrast between these applications and Large Language Models (LLMs) trained to produce text. Some parts of AlphaFold’s architecture use Transformers, but it does not function like an LLM, nor is it trained on text data.
Are astronomical LLM investments paired with real social promise and worth the perils?
Human centered LLM-applications
10. QuitBot
When I started my quest for how text AI might be benefiting humans beyond productivity, I was excited about my first finding. Generative Text AI (LLMs) open the possibility of an interactive, always available presence for freeform dialog. A team from Microsoft AI and Fred Hutch was able to show a statistically significant improvement over a simplistic sms alternative.
What seemed even more significant was the 4 year formative process that brought it about. I still think about this example as reaping the fruits of our efforts, and I hope it can serve as an encouragement to persevere for teams that encounter challenges with premature LLM deployments.
9. Perspective API
I was torn and nearly dismissed this platform as a high-impact use of text AI (not a generative model), but I found how a team applied it as an initial trigger to rephrase user-generated posts, allowing platforms to rephrase (rather than censor) toxic comments, etc.
As we all work to bridge polarization divides and work towards an understanding of different viewpoints, this is a significant AI win that does, ultimately, include generative text! By building on the perspective API with LLMs, this team started making online discourse more civil in a paradigm that preserves representation of important topics and voices.
8. ClimateEngine / Robeco partnership
In investment technologies, this team developed generative AI solutions to scale the analysis of company assets and actions for biodiversity impact attribution. The outcome is that this framework enables sustainability-aware decisions at scale. Without analysis tools with this type of “research” capabilities, investments might fail altogether to assess the substance of corporate ESG initiatives (Environmental, Social, and Governance).
7. GroundNews
Helping people navigate truth and representation in a challenging political environment, especially considering bias risks in the media (e.g. selection bias). GroundNews developed AI-driven live summaries of news coverage, helping to assess factuality and showing clear differences in coverage between left, center, and right-leaning sources.
As I dove into that topic, I also discovered great coverage of journalism issues at last year's AI for Good summit.
6 & 5. Tabiya and CareerVillage
By incorporating LLM technology into their platforms, personal career assistance services can now reach people they couldn't otherwise, providing help for interview preparation, mock interviews, and tools to optimize resumes and cover letters. Read more at Tabiya and CareerVillage. With fewer controversial issues than therapy, adding LLM guidance in career feedback and assistance interactions can increase mobility. That in turn can have positive ripples, for instance for families facing the two body problem or return-to-office mandates
4, 3 & 2: Legal Services Organizations in Middle Tennessee, North Carolina, and JusticiaLab
I found that these three organizations developed ways to bring more legal support than previously possible to underserved rural populations as well as immigrants, helping them to navigate processes and uphold their rights.
Navigating legalese would have been prohibitively difficult to scale without LLMs’ interactivity due to requiring significant legal expertise, time, and expenses. It seems likely for these underserved populations to experience fewer cost barriers to have justice with their predicaments.
1. FullFact AI
Historically, NLP struggled with issues to assist fact-checking (covered recently by Warren et al. 2025, Mitral et al. 2024). As our rate of absorption of information transformed with the advent of social media, we needed major advances. FullFact aggregates statements from various media platforms and official sources in explanatory feedback, allowing context and nuance to be instantly curated.
Visit https://FullFact.org to learn more, and use it whenever a story seems triggering or conspicuous.
An inconvenient truth
No generative AI application or model can be risk-free. How much are responsible teams reducing the likely harms? Self-governance varies widely. Whether a deployment is open-ended or “narrowly scoped”, all systems can cause many different harms that are often significant. By the way, limiting an LLM’s scope is hard, and even deliberate efforts to detect an input that is “out-of-distribution” face real challenges
Misuse, errors, and biases are just the tip of the iceberg: there have been well-documented AI risk taxonomies (e.g. DeepMind 2021, CSET 2023, Google 2023, NIST 2024). Next, there have been feedback loops for evaluation (Google DeepMind 2023) as well as adaptation to these risks in government-driven regulations (AIR 2024 shows a comparison).
The reality is that each of the services I presented above faces challenges, such not just in inaccuracies but severe failure modes and abuse:
All of them are exposed to content manipulation, leading to incorrect or forced outputs, i.e. prompt injection, as well as other OWASP LLM Top 10 risks. What that means is that inserting blatant or discreet oddities and instructions in the text seen by these applications can cause the opposite effect (e.g. to steer the system to untrue or unsanctioned outcomes)
ClimateEngine/Robeco potentially faces risks of unfair financial determinations from selection biases, omissions, and adversarial content in the underlying data (Automated Decision-Making, from AIR 2024)
Legal Service Organizations’ virtual assistants could potentially provide incorrect guidance on legal rights with insufficient steering or oversight (also in AIR 2024), for example, failing to manage “out-of-distribution circumstances” with respect to immigration law.
Career Services could expose job-seekers to malicious content (exploitative opportunities, theft of personally identifiable information) without proper protection against spam and social engineering. AIR 2024 categorizes these as fraudulent schemes.
By increasing reliance on these tools, users increase the habit of sharing information in lowerr trust situations (e.g. disclosing payment information or legally sensitive information to a Chatbot on a page purporting to represent a certain organization, which could actually be operated by a bad actor).
In cybersecurity, backdoors and missed security requirements show that there is more to security vulnerabilities than just “a software bug”. When perverse incentives and threat modeling are overlooked or expedited, consequences can be severe. How does this transfer to AI use cases and interactions? AI responses are typically obtained because users want to orient their understanding of the world, and to orient beliefs and actions (theirs or others’). Large language models are trained on untrusted data (curation practices vary widely). With a given input, stimulating biases (predicting the next word that the training data would have been most likely to show) is “a feature, not a bug”. Dr. Iga Kozlowska (Microsoft Ethics and Society/Meta Responsible AI) discusses the impact of bias-inducing technology, drawing from the seminal book “How Artifacts Afford” (Jenny L Davis). For these reasons, each application needs a defensive design and robust iterative development with risk analyses and adversarial testing. OWASP provides an excellent guide on that.
AI for Good, Human Centric AI, and Responsible AI Policies
The UN found a sustainable development goals (SDGs) partner with the International Telecommunications Union (ITU), establishing AI for Good in 2017. As a telecom & network engineer by training, I was thrilled to see this initiative, although its advocacy for pro-social and risk-aware decision making can run into challenging economic and political realities. This year’s AI for Good programme for July 8-11 will feature a call to action by Amazon’s CTO to solve urgent societal problems, a debate between Yann LeCun and Geoff Hinton, and discussions of what it means to benefit humanity, but amid dozens of sessions relating to computer vision, and robotics, only 2-3 will ultimately focus on LLMs (one on Truth/Power, and two relating to Agentic systems). This raises a real question about text applications' risk/benefit trade-offs.
There’s still so much work left to improve the behavior of foundation models (HAI responsible AI index 2025), and the call to action is clear for all AI system developers: Consider how proposals might pose negative ethical and societal risks, and come up with methods to lessen those risks. Stanford Human-Centered AI requires this Ethics and Society Review for all grant submissions, and its structure and strategies are enabling significant improvements to thwart negative impacts from tarnishing even the most positive projects’ societal benefits.
Conclusion
I was heartened to eventually reach my goal of finding 10 services that help humanity beyond tedium/efficiency challenges. The core value of most services out there is still productivity (faster googling or extracting a specific facet of a document for speed). There are more risks to mitigate than resources might allow. But there are also frameworks to lean on, and my perspective on AI regained optimism after discovering great Generative Text AI applications by 10 organizations in the course of only 10 (already busy) days.
We will certainly see more innovative pro-social solutions and insights in 2025. However, the risks are already high and continue to grow, so I'm excited to be a part of the solution and to work with fantastic, inclusive folks in OWASP’s AI security initiatives.