Project Overview

I work as an undergraduate research assistant on Psyche He’s UI design team in Cornell’s Entrepreneurship Lab (Class of 2025 cohort), with responsibilities in interface design, study design, user testing, data analysis, and data visualization.

Roles

UI Designer, developer

Collaborators

psyche he, ra(s)

Timeline

1 year | ongoing (2025)

Tools

figma, atlas.ai, r, qualtrics, excel

“how might we design AI language assistants that proactively learn from real-time conversational signals to better mediate human–human speech interactions and reduce the gap between human and model speech prediction?”

THE PRODUCT

Helper AI

Recognizing the limitations of current technologies, this project introduces XPLAIN, a proactive AI model designed to scaffold real-time conversation by anticipating, preventing, and addressing communication gaps, particularly to facilitate rapid turn-taking in high-stakes online meetings. It focuses on three critical stages where processing errors may occur and offloads the predictive pressure from users. Three contextual features (lexical clarifications, idea and content suggestions, and topic summaries) each directly target the following goals:

Smoother Conversation

Using the theory of broad-scale content prediction in speech processing from face-to-face communication to evaluate how it may mitigate disfluency patterns in AI-mediated communication (AI-MC).

Better Predictions

Using functional proactivity to support more effective predictions and enable higher-level engagement with AI assistance in online meetings.

After participants review a consent form, the experimenter, acting as the confederate, walks them through a short demo of Helper AI. They then jump into the full flow of a conversation on creating a Japanese–American fusion restaurant menu. Helper AI is simulated by using TeamViewer to gain control of the participant’s laptop and Pip to overlay the Zoom multi-speaker view on the Wizard of Oz (WoZ) flow. Participants interact with the Figma flow freely using their mouse while the confederate advances the conversation with the “→” key. Screen and audio recordings are collected from both laptops throughout the session. After the conversation, participants complete a Qualtrics questionnaire and an optional post-study interview.

experiment

Procedure

35 non-native speakers (NNS) of English or Japanese (M=9, F=25, NA=1; mean age=20.79, SD=2.17; mean years in U.S.=7, SD=3.2) completed the baseline study.

*Recruited via the university system for 1 extra course credit per 30 minutes of participation

29 participants (M=7, F=19, NA=1; mean age=21.23, SD=2.25; mean years in U.S.=6.32, SD=2.95) completed the post-study interview.

Moderator

Participant

“Let’s create a menu for a Japanese-American fusion restaurant!”

5 slang terms

5 acronyms

6 idioms

6 complex words

6 cultural food items with images

Defining terms that have been pre-identified as a gap in understanding.

Clarifications (28)

*with bidirectional options to avoid bias

6 idea suggestions

3 American

3 Japanese items with images

4 sentence suggestions

Response prompting based on pre-activated conversation context identification.

Suggestions (10)

Real-time conversation alignment promoted to reduce information overload under time pressure.

Summaries (4)

The three Helper AI features were simulated with a Wizard of Oz Figma flow sidebar overlaid on top of Zoom using TeamViewer.

experiment

Materials

1. Pre-Study Survey

Qualtrics

2. Study Task

Figma

Zoom

TeamViewer

Pip

QuickTime

3. Post-Study Survey

Qualtrics

Otter.ai

4. Post-Study Interview

Most data analysis was broken down into two parts: speech coding during task completion, and pre- and post-task interview coding.

data analysis

Speech Pattern Codes

Speech during the task was analyzed for disfluencies, which were coded by length and type.

Disfluency Length Reflects Different Cognitive States in Online Conversations

Longer disfluencies signal effortful speech production, while shorter disfluencies often indicate uncertainty during offloaded thinking.

1

Disfluency Duration Patterns

Repairs, pauses, and reformulations significantly increase overall disfluency duration

Longer Disfluencies

Repairs

Reformulations

Pauses

→ Associated with higher cognitive effort

Shorter Disfluencies

Non–feature-related turns

Suggestion-Idea turns

Suggestion-Sentence turns

→ Associated with uncertainty or partial offloading

2

Turn-Type Comparison

Clarification turns require sustained cognitive processing, resulting in longer disfluencies.

Avg. disfluency duration

Clarification: 2.82s

Suggestion: 2.35s

Non-feature: 1.87s

3

Individual Differences

Experience with AI-supported communication correlates with more efficient speech production.

Increases Disfluency Duration

Language & background

Decreases Disfluency Duration

Online meeting experience

AI usage & trust

Communication strategies
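As a rough illustration of the turn-type comparison above, the sketch below averages coded disfluency durations per turn type. The `Disfluency` record, its field names, and the sample rows are assumptions made up for this example; they are not the study's coding scheme or data.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Disfluency:
    # Hypothetical record layout for one coded disfluency.
    turn_type: str     # "clarification", "suggestion", or "non-feature"
    kind: str          # e.g. "repair", "pause", "reformulation"
    duration_s: float  # coded length in seconds


def mean_duration_by_turn_type(records):
    """Group coded disfluencies by turn type and average their durations."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.turn_type].append(record.duration_s)
    return {turn: round(mean(values), 2) for turn, values in grouped.items()}


# Made-up rows for illustration only; these are not the study's data.
sample = [
    Disfluency("clarification", "repair", 3.0),
    Disfluency("clarification", "pause", 2.0),
    Disfluency("suggestion", "pause", 2.5),
    Disfluency("non-feature", "reformulation", 1.5),
    Disfluency("non-feature", "pause", 2.5),
]

averages = mean_duration_by_turn_type(sample)
print(averages)
```

Keeping the disfluency type on each record means the same list could also be grouped by `kind` to compare repairs, pauses, and reformulations.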

data analysis

Interview Codes

Interviews were broken down into thousands of codes, then organized into a hierarchy of code groups and themes to surface overarching trends between Helper AI usage and attitudes towards online meetings, artificial intelligence, and the three features.

Top Codes from Each Class Per Group
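One way to picture the code hierarchy is as (theme, code group, code) triples that are tallied per group. The sketch below is only an assumption about how such a tally could be done; the theme names, codes, and the `top_codes_per_group` helper are hypothetical and do not come from the actual codebook.

```python
from collections import Counter, defaultdict

# Hypothetical coded interview segments as (theme, code_group, code).
# These labels are invented for illustration, not taken from the study.
coded_segments = [
    ("Speech flow", "Trade-offs", "pop-up distraction"),
    ("Speech flow", "Trade-offs", "pop-up distraction"),
    ("Speech flow", "Trade-offs", "smoother turn-taking"),
    ("Features", "Clarification", "faster comprehension"),
    ("Features", "Clarification", "faster comprehension"),
    ("Features", "Clarification", "over-reliance concern"),
]


def top_codes_per_group(segments, n=1):
    """Count codes within each (theme, code_group) and keep the n most frequent."""
    counts = defaultdict(Counter)
    for theme, group, code in segments:
        counts[(theme, group)][code] += 1
    return {key: counter.most_common(n) for key, counter in counts.items()}


top = top_codes_per_group(coded_segments)
for (theme, group), codes in top.items():
    print(theme, "/", group, "->", codes)
```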

Qualitative Results

1

Perceived Change in Speech Flow

Participants expressed that smooth conversation flow is most desirable and that the embedded feature pop-ups created trade-offs.

2

Perceived Effectiveness of Individual Features

Clarifications and suggestions received higher positive code counts but also raised greater concern about over-reliance.

3

Individual Differences

Experiences varied depending on level of anxiety in speaking English, communication strategies and personalities, and attitudes towards AI.

4

UI Design

Feedback revolved around making the interface more discreet and less distracting.

5

Overall Enhanced Experience

Participants described XPLAIN as a “safety net” that boosted confidence and enabled increased participation.

data analysis

Takeaways

Clarifications (28)

as Grounding Support: significant enhancement for real-time conversation despite individual variability with minimal disfluency costs

as Comprehension At Word Onset

Suggestions (10)

as Production Aid: valuable for idea generation and/or retrieval, but at the cost of hesitation and silent pauses

participants reported smoother perceived flow when they experienced more disfluencies (a paradox of production)

as Production After Question Onsets

Summaries (4)

as Alignment Aid: driven by perceived utility instead of immediate fluency costs

as Alignment After Topic Transition

1

Disfluency as a processing signal

Repairs and Pauses

indicate elevated processing demands during conceptualization and formulation under time pressure

Uncertainties and Delays

suggest quick stalling or micro-repair when integrating new information

Silent Pauses and Hesitations

signal failure points in real-time prediction and formulation under time pressure, and act as the primary rating detractors

2

Participant background effects

*similar to individual differences (see qualitative takeaways)

3

Proactivity’s relation to timing

Proactivity Most Valued

when outputs matched the time-sensitive gaps in the conversation without missing the critical window

Hesitation and Repairs

Perceived Conversational Flow vs. Incremental Timing Strain at the response formulation stage

Participation and Confidence

well-timed prompts

Silent Pauses and Repairs

suggestions that arrive before idea formulation

design implications

Model Flexibility

XPLAIN should have the flexibility to adjust its output based on the specific user’s situation. The results of these outputs depend on the user’s English proficiency, level of anxiety, and level of engagement.

Proficiency

Anxiety

Less Intrusive - private clarifications and lightweight suggestions to alleviate social anxiety and stigma against seeking help

Confidence

Participation

Inclusivity

design implications

Model Adaptability

XPLAIN should react differently based on individual behavior patterns.

Training model through prior conversations

Gathering language, cultural, and professional experience

Constructing personal knowledge base for specific terms

design implications

Feature-Specific

XPLAIN’s three features should also adjust their UI and content to best support users.

Clarification

Idea Suggestion

Sentence Suggestion

Summary

Explanatory Materials - delivered in native language (L1) to reduce processing time and accelerate comprehension

Context Sensitive - low vs. high stakes, passive vs. active conversation

Sentence Templates (low proficiency) - delayed output and reduced density to encourage idea formulation

Concise Content - minimize visual clutter

Lexical Alignment

Key Components - topic transitions, alignment needs

Producible Content - delivered in English (L2) to support immediate usage and fluent speech

Reduce Over-Reliance - cues that suggestions are optional
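The feature-specific implications above could be expressed as a small rule set. The sketch below is a loose illustration under stated assumptions: the pairing of each implication with a feature, the `UserProfile` fields, the `plan_output` function, and the delay value are all hypothetical and are not part of XPLAIN's actual design.

```python
from dataclasses import dataclass


@dataclass
class UserProfile:
    native_language: str   # the user's L1, e.g. "Japanese"
    low_proficiency: bool  # hypothetical flag for lower English proficiency


# Placeholder delay; the write-up does not specify a value.
TEMPLATE_DELAY_S = 2.0


def plan_output(feature: str, user: UserProfile) -> dict:
    """Return hypothetical display settings for one feature."""
    if feature == "clarification":
        # Explanatory material: shown in the user's L1 to speed comprehension.
        return {"language": user.native_language, "optional_cue": False, "delay_s": 0.0}
    if feature in ("idea_suggestion", "sentence_suggestion"):
        # Producible content: shown in English (L2) and flagged as optional.
        use_template = feature == "sentence_suggestion" and user.low_proficiency
        return {
            "language": "English",
            "optional_cue": True,
            "template": use_template,
            # Templates are delayed so the user can form an idea first.
            "delay_s": TEMPLATE_DELAY_S if use_template else 0.0,
        }
    if feature == "summary":
        # Summaries focus on the key components named in the implications.
        return {"focus": ["topic transitions", "alignment needs"], "delay_s": 0.0}
    raise ValueError(f"unknown feature: {feature}")


user = UserProfile(native_language="Japanese", low_proficiency=True)
for name in ("clarification", "idea_suggestion", "sentence_suggestion", "summary"):
    print(name, plan_output(name, user))
```

Writing the rules as plain data makes it easier to see where a context-sensitivity setting (low vs. high stakes, passive vs. active conversation) could be added later.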

PROTOtyping

Clarifications

XPLAIN | “Call the shots” Clarification

XPLAIN | “USDA-graded” Idea Suggestion

PROTOtyping

Idea Suggestions

PROTOtyping

Sentence Suggestions

XPLAIN | “USDA-graded” Sentence Suggestion

XPLAIN | “Main Dishes” Summary

PROTOtyping

Summaries

Takeaways

This project strengthened my understanding of how AI systems can leverage micro-level behavioral signals to inform proactive support in real-time communication. I learned how to design and execute mixed-methods research, operationalizing conversational disfluencies, turn types, and individual differences into measurable variables that translate into system-level insights. This project reinforced my practice of grounding AI interaction design in rigorous empirical research to enable anticipatory, human-centered assistance.