Testing a natural language interface: A tale of quick wins and valuable insights
At Dattos, we’re on a mission to help accounting and finance professionals clean, prepare, and analyze data with ease. As we’ve continued to grow, we’ve faced a major challenge: how can we empower users to handle complex data tasks without needing to know code or advanced Excel formulas?
Simplifying data preparation for everyone
Our users have consistently faced three major obstacles:
- Complexity: Building processes in our ETL (Extract, Transform, Load) tool can be a daunting task, and only a handful of users can do it without help.
- Time-Consuming: Creating complex workflows in ETL software takes too long.
- Knowledge Gap: Many users struggle with data preparation because they lack expertise in Excel or coding, which is why they came to us in the first place.
How could we enable users to clean and prepare their data without needing to learn code or formulas?
We hypothesized that if we could offer an AI-powered data engineering assistant, users would be able to clean and prepare their data simply by describing the analysis they need.
Validating our hypothesis with user testing
After initial research, plenty of brainstorming, and prototyping, we developed a simple three-step workflow for our users:
- Raw data and analysis objective: Users would upload their raw data and, using natural language, explain the type of analysis they wanted.
- Data cleaning: The AI assistant would ask a few questions to help normalize, prepare, and clean the raw data.
- Refining: The assistant would guide the user in refining and adjusting the final result.
But, before moving forward, we had some critical questions to answer:
- Would our users trust the output of an AI-driven workflow?
- Could users write the correct prompts to achieve the analysis they needed?
- Would users with low technical knowledge be able to write proper prompts?
- Would the assistant be capable of handling real-life use cases?
With limited time, effort, and budget, we needed to test these aspects without building a full product.
Prototyping
At this stage, we were experimenting with the assistant in OpenAI’s playground. However, when we decided to validate our hypotheses with actual users, it became clear that testing with Figma prototypes would not be enough: we needed users to interact directly with the assistant. Yet developing a full interface and backend would be expensive and time-consuming.
Functional prototype
We felt somewhat stuck until our data scientist had a brilliant idea: using Streamlit, an open-source app framework, to quickly build a fully functional prototype.
Within days, we had a functioning prototype based on our wireframes, connected to OpenAI’s API, and ready for user testing!
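For readers curious what that looks like under the hood, here is a minimal sketch of a Streamlit app wired to OpenAI’s API. It is illustrative only: the model name, prompt wording, and layout are assumptions for the sake of the example, not our actual prototype code.

```python
# Minimal sketch of a Streamlit prototype calling OpenAI (illustrative only).
import pandas as pd
import streamlit as st
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

st.title("Data engineering assistant (prototype)")

# CSV-only uploads keep the scope small (see the limitations below).
uploaded = st.file_uploader("Upload your raw data", type="csv")
goal = st.text_area("Describe, in plain language, the analysis you need")

if uploaded and goal and st.button("Prepare my data"):
    df = pd.read_csv(uploaded)
    st.dataframe(df.head())

    # Ask the model to propose cleaning steps or clarifying questions.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name
        messages=[
            {"role": "system", "content": "You are a data preparation assistant."},
            {"role": "user",
             "content": f"Columns: {list(df.columns)}\nGoal: {goal}\n"
                        "Ask clarifying questions or propose cleaning steps."},
        ],
    )
    st.write(response.choices[0].message.content)
```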
However, we knew that this rapid development would have a few limitations:
- To keep things simple, the prototype only accepted CSV file uploads.
- Processing prompts and building analyses were time-consuming, taking about 1–2 minutes for complex interactions.
- While we aimed to follow the initial wireframes, there wasn’t enough time to refine the interface and address every usability issue thoroughly.
User testing strategy
Acknowledging these constraints, I crafted a two-part research plan.
1- Functional testing: We focused on the Streamlit prototype to validate whether users across different personas could effectively interact with the assistant, craft the prompts they needed, and achieve the desired analytical outcomes.
2- Usability focus: For a deeper dive into usability, we used the Figma prototype, verbally filling in interaction gaps during the user testing sessions.
Our team would also evaluate the AI assistant’s capability to interpret natural language inputs, convert them into executable code, and perform data cleaning tasks.
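As a rough illustration of that natural-language-to-code step, the sketch below asks the model for pandas code and runs it against the user’s data. The function name, prompt, and model are assumptions made for this example, not our production pipeline.

```python
# Hedged sketch: natural language in, generated pandas code out, executed on `df`.
import pandas as pd
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def clean_with_assistant(df: pd.DataFrame, instruction: str) -> pd.DataFrame:
    """Ask the model for pandas code that applies `instruction` to `df`."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name
        messages=[
            {"role": "system",
             "content": "Reply with plain Python only. Transform the pandas "
                        "DataFrame named `df` and assign the output to `result`."},
            {"role": "user",
             "content": f"Columns: {list(df.columns)}\nTask: {instruction}"},
        ],
    )
    code = response.choices[0].message.content.strip()
    # Strip a Markdown fence if the model adds one anyway.
    code = code.removeprefix("```python").removeprefix("```").removesuffix("```")

    # Running generated code blindly is only acceptable in a throwaway prototype;
    # a production version would need sandboxing and validation.
    namespace = {"df": df.copy(), "pd": pd}
    exec(code, namespace)
    return namespace["result"]
```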
Learning from users
Over the next month, we ran 20 remote user testing sessions. We pushed the assistant to its limits, assessing its ability to handle prompts of varying quality and produce different types of analysis. The insights we gained from watching users interact with the AI assistant were invaluable.
We learned:
- When to use natural language: Understanding when it’s best to use natural language input versus more traditional interface elements was key in refining the workflow.
- Prompt flexibility: We discovered the importance of guiding users in crafting their prompts, especially those who are less tech-savvy.
- Success rate: By testing real-life scenarios, we were able to improve the assistant and raise the share of analyses it completed successfully.
This process not only validated our initial hypothesis but also provided us with the feedback needed to iterate and improve our design.
A step closer to simplified data processing
This rapid prototyping and testing approach saved us time and money and accelerated our product development. By using tools like Streamlit and gathering early feedback, we were able to quickly validate our ideas, refine our AI assistant, and move forward with confidence.
With a lean and rapid prototyping approach, we’ve uncovered invaluable insights that are already shaping our product roadmap. We’re excited to leverage these learnings to further refine our solution and create an even more powerful and intuitive experience for our users.