
Project Overview
I work as an undergraduate research assistant on Psyche He’s UI design team in the Class of 2025 cohort of Cornell’s Entrepreneurship Lab, with responsibilities in interface design, study design, user testing, data analysis, and data visualization.
Roles
UI Designer, Developer
Collaborators
Psyche He, RA(s)
Timeline
1 year | ongoing (2025)
Tools
Figma, atlas.ai, R, Qualtrics, Excel
“How might we design AI language assistants that proactively learn from real-time conversational signals to better mediate human–human speech interactions and reduce the gap between human and model speech prediction?”
THE PRODUCT
Helper AI
Recognizing the limitations of current technologies, this project introduces XPLAIN, a proactive AI model designed to scaffold real-time conversation by anticipating, preventing, and addressing communication gaps, particularly to facilitate rapid turn-taking in high-stakes online meetings. It focuses on three critical stages where processing errors may occur and offloads predictive pressure from users. Its three contextual features (lexical clarifications, idea and content suggestions, and topic summaries) each directly target the following goals:
Smoother Conversation
Applying the theory of broad-scale content prediction in speech processing, drawn from face-to-face communication, to evaluate how it may mitigate disfluency patterns in AI-mediated communication (AI-MC).
Better Predictions
Using functional proactivity to support more effective predictions and enable higher-level engagement with AI assistance in online meetings.
After participants review a consent form, the experimenter, acting as the confederate, walks them through a short demo of Helper AI. They then move into the full conversation flow, creating a menu for a Japanese–American fusion restaurant. Helper AI is simulated by using TeamViewer to control the participant’s laptop and Pip to overlay the Zoom multi-speaker view on the Wizard of Oz (WoZ) flow. Participants interact with the Figma flow freely using their mouse while the confederate advances the conversation with the “→” key. Screen and audio recordings are collected from both laptops throughout the session. After the conversation, participants complete a Qualtrics questionnaire and an optional post-study interview.
experiment
Procedure
35 non-native speakers (NNS) of English or Japanese (M=9, F=25, NA=1; mean age=20.79, SD=2.17; mean years in U.S.=7, SD=3.2) completed the baseline study
*Recruited via the university system for 1 extra course credit per 30 minutes of participation
29 participants (M=7, F=19, NA=1; mean age=21.23, SD=2.25; mean years in U.S.=6.32, SD=2.95) completed the post-study interview
Moderator
Participant
“Let’s create a menu for a Japanese-American fusion restaurant!”
5 slang terms
5 acronyms
6 idioms
6 complex words
6 cultural food items with images
Defining terms that have been pre-identified as a gap in understanding.
Clarifications (28)
*with bidirectional options to avoid bias
6 idea suggestions
3 American
3 Japanese items with images
4 sentence suggestions
Response prompting based on pre-activated conversation context identification.
Suggestions (10)
Real-time conversation alignment promoted to reduce information overload under time pressure.
Summaries (4)
The three Helper AI features were simulated with a Wizard of Oz Figma flow sidebar overlaid on top of Zoom using TeamViewer.
experiment
Materials
1. Pre-Study Survey: Qualtrics
2. Study Task: Figma, Zoom, TeamViewer, Pip, QuickTime
3. Post-Study Survey: Qualtrics
4. Post-Study Interview: Otter.ai
Most of the data analysis fell into two parts: speech coding during task completion, and pre- and post-task interview coding.
data analysis
Speech Pattern Codes
Speech during the task was analyzed for disfluencies, each recorded by length and type.
Disfluency Length Reflects Different Cognitive States in Online Conversations
Longer disfluencies signal effortful speech production, while shorter disfluencies often indicate uncertainty during offloaded thinking.
1
Disfluency Duration Patterns
Repairs, pauses, and reformulations significantly increase overall disfluency duration.
Longer Disfluencies
Repairs
Reformulations
Pauses
→ Associated with higher cognitive effort
Shorter Disfluencies
Non–feature-related turns
Suggestion-Idea turns
Suggestion-Sentence turns
→ Associated with uncertainty or partial offloading
2
Turn-Type Comparison
Clarification turns require sustained cognitive processing, resulting in longer disfluencies.
Avg. disfluency duration
Clarification: 2.82s
Suggestion: 2.35s
Non-feature: 1.87s
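A comparison like this one comes down to grouping coded disfluencies by turn type and averaging their durations. The sketch below is a minimal illustration in Python, not the study's actual analysis pipeline. The `Disfluency` record, its field names, and every sample value are hypothetical and chosen only to show the grouping step.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Disfluency:
    """One coded disfluency (hypothetical schema, not the study's codebook)."""
    turn_type: str     # e.g. "clarification", "suggestion", "non-feature"
    kind: str          # e.g. "repair", "pause", "reformulation"
    duration_s: float  # length of the disfluency in seconds


def mean_duration_by_turn_type(records):
    """Return {turn_type: mean disfluency duration in seconds}."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.turn_type].append(record.duration_s)
    return {turn_type: mean(durations) for turn_type, durations in grouped.items()}


# Made-up values for illustration only; these are NOT data from the study.
sample = [
    Disfluency("clarification", "repair", 3.0),
    Disfluency("clarification", "pause", 2.0),
    Disfluency("suggestion", "pause", 2.5),
    Disfluency("suggestion", "reformulation", 1.5),
    Disfluency("non-feature", "pause", 1.0),
]

averages = mean_duration_by_turn_type(sample)
for turn_type, avg in averages.items():
    print(f"{turn_type}: {avg:.2f}s")
```

Keeping the disfluency type (`kind`) on each record means the same data could also be grouped by type, which is how a breakdown of repairs, pauses, and reformulations would be produced.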
3
Individual Differences
Experience with AI-supported communication correlates with more efficient speech production.
Increases Disfluency Duration
Language & background
Decreases Disfluency Duration
Online meeting experience
AI usage & trust
Communication strategies
data analysis
Interview Codes
Interviews were broken down into thousands of codes, then organized into a hierarchy of code groups and themes to identify overarching trends between Helper AI usage and attitudes towards online meetings, artificial intelligence, and the three features.
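One way to picture this hierarchy is as a nested mapping from themes to code groups to codes, with counts rolled up from the applied codes. The Python sketch below is only an illustration under that assumption. The theme, group, and code names are placeholders rather than the study's codebook, and the helper functions are hypothetical.

```python
from collections import Counter

# Hypothetical hierarchy: theme -> code group -> codes (placeholder names).
HIERARCHY = {
    "theme_a": {"group_1": ["code_x", "code_y"], "group_2": ["code_z"]},
    "theme_b": {"group_3": ["code_w"]},
}


def build_lookup(hierarchy):
    """Map each code to its (theme, group) pair."""
    return {
        code: (theme, group)
        for theme, groups in hierarchy.items()
        for group, codes in groups.items()
        for code in codes
    }


def top_codes_per_group(applied_codes, hierarchy, n=1):
    """Return {group: [(code, count), ...]} with the n most frequent codes."""
    lookup = build_lookup(hierarchy)
    per_group = {}
    for code, count in Counter(applied_codes).items():
        _, group = lookup[code]
        per_group.setdefault(group, Counter())[code] = count
    return {group: counts.most_common(n) for group, counts in per_group.items()}


def theme_totals(applied_codes, hierarchy):
    """Return a Counter of how many coded segments fall under each theme."""
    lookup = build_lookup(hierarchy)
    return Counter(lookup[code][0] for code in applied_codes)


# Made-up list of codes applied to interview segments (illustration only).
applied = ["code_x", "code_x", "code_y", "code_z", "code_w", "code_w", "code_w"]

top = top_codes_per_group(applied, HIERARCHY)
totals = theme_totals(applied, HIERARCHY)
print(top)
print(totals)
```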
Top Codes from Each Class Per Group








Qualitative Results
1
Perceived Change in Speech Flow
Participants expressed that a smooth conversation flow is most desirable and that the embedded feature pop-ups created trade-offs.
2
Perceived Effectiveness of Individual Features
Clarifications and suggestions received higher positive code counts but also raised greater concern about over-reliance.
3
Individual Differences
Experiences varied depending on participants’ level of anxiety about speaking English, communication strategies and personalities, and attitudes towards AI.
4
UI Design
Feedback revolved around making the interface more discreet and less distracting.
5
Overall Enhanced Experience
Participants described XPLAIN as a “safety net” that boosted confidence and enabled increased participation.
data analysis
Takeaways
Clarifications (28)
as Grounding Support: significant enhancement for real-time conversation, with minimal disfluency costs, despite individual variability
as Comprehension At Word Onset
Suggestions (10)
as Production Aid: valuable for idea generation and/or retrieval, but at the cost of hesitation and silent pauses
Participants reported smoother perceived flow when they experienced more disfluencies (a paradox of production)
as Production After Question Onsets
Summaries (4)
as Alignment Aid: driven by perceived utility instead of immediate fluency costs
as Alignment After Topic Transition
1
Disfluency as a processing signal
Repairs and Pauses
indicate elevated processing demands during conceptualization and formulation under time pressure
Uncertainties and Delays
suggest quick stalling or micro-repair when integrating new information
Silent Pauses and Hesitations
signal failure points in real-time prediction and formulation under time pressure, and are the primary rating detractors
2
Participant background effects
*similar to individual differences (see qualitative takeaways)
3
Proactivity’s relation to timing
Proactivity Most Valued
when outputs matched the time-sensitive gaps in the conversation without missing the critical window
Hesitation and Repairs
Perceived Conversational Flow vs. Incremental Timing Strain at the response formulation stage
Participation and Confidence
well-timed prompts
Silent Pauses and Repairs
suggestions that arrive before idea formulation
design implications
Model Flexibility
XPLAIN should have the flexibility to adjust its output to the specific user’s situation. The results of these outputs depend on the user’s English proficiency, level of anxiety, and level of engagement.
Proficiency
Anxiety
Less Intrusive - private clarifications and lightweight suggestions to alleviate social anxiety and the stigma around seeking help
Confidence
Participation
Inclusivity
design implications
Model Adaptability
XPLAIN should react differently based on individual behavior patterns.
Training the model on prior conversations
Gathering language, cultural, and professional experience
Constructing personal knowledge base for specific terms
design implications
Feature-Specific
XPLAIN’s three features should also adjust their UI and content to best support users.
Clarification
Idea Suggestion
Sentence Suggestion
Summary
Explanatory Materials - delivered in the user’s native language (L1) to reduce processing time and accelerate comprehension
Context Sensitive - low vs. high stakes, passive vs. active conversation
Sentence Templates (low proficiency) - delayed output and reduced density to encourage idea formulation
Concise Content - minimize visual clutter
Lexical Alignment
Key Components - topic transitions, alignment needs
Producible Content - delivered in English (L2) to support immediate usage and fluent speech
Reduce Over-Reliance - cues that suggestions are optional
PROTOtyping
Clarifications

XPLAIN | “Call the shots” Clarification

XPLAIN | “USDA-graded” Idea Suggestion
PROTOtyping
Idea Suggestions
PROTOtyping
Sentence Suggestions

XPLAIN | “USDA-graded” Sentence Suggestion

XPLAIN | “Main Dishes” Summary
PROTOtyping
Summaries
Takeaways
This project strengthened my understanding of how AI systems can leverage micro-level behavioral signals to inform proactive support in real-time communication. I learned how to design and execute mixed-methods research, operationalizing conversational disfluencies, turn types, and individual differences into measurable variables that translate into system-level insights. This project reinforced my practice of grounding AI interaction design in rigorous empirical research to enable anticipatory, human-centered assistance.