Abstract: This article introduces the design and prototype of Ajna, a wearable shared perception system for supporting extreme sensemaking in emergency scenarios. Ajna addresses technical challenges in Augmented Reality (AR) devices, specifically the limitations of depth sensors and cameras. These limitations confine object detection to close proximity and hinder perception beyond immediate surroundings, through obstructions, or across different structural levels, impacting collaborative use. It harnesses the Inertial Measurement Unit (IMU) in AR devices to measure users' relative distances from a set physical point, enabling object detection sharing among multiple users across obstacles like walls and over distances. We tested Ajna's effectiveness in a controlled study with 15 participants simulating emergency situations in a multi-story building. We found that Ajna improved object detection, location awareness, and situational awareness, and reduced search times by 15%. Ajna's performance in simulated environments highlights the potential of artificial intelligence (AI) to enhance sensemaking in critical situations, offering insights for law enforcement, search and rescue, and infrastructure management.
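The core idea of anchoring every user to one shared physical point can be sketched as follows. This is an illustrative dead-reckoning sketch, not Ajna's actual implementation: each user double-integrates IMU acceleration into a displacement from the shared anchor, so detections can be expressed in one common frame and shared across walls or floors without camera line of sight. All function names are hypothetical, and a real system would also fuse gyroscope data and correct for drift.

```python
import math

def integrate_imu(samples, dt):
    """Double-integrate accelerometer samples (ax, ay, az), taken at a
    fixed interval dt, into a displacement from the shared anchor.
    Minimal sketch: no gyroscope fusion, no drift correction."""
    vel = [0.0, 0.0, 0.0]
    pos = [0.0, 0.0, 0.0]
    for acc in samples:
        for i in range(3):
            vel[i] += acc[i] * dt
            pos[i] += vel[i] * dt
    return tuple(pos)

def to_shared_frame(user_offset, detection_local):
    """Translate a detection from a user's local frame into the
    anchor-centered frame, so other users can locate it."""
    return tuple(u + d for u, d in zip(user_offset, detection_local))

def distance(a, b):
    """Straight-line distance between two points in the shared frame."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Because every detection lands in the same anchor-centered frame, a user on another floor can compute their distance to an object they have never seen.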
Abstract: Autonomous negotiating agents, which can interact with other agents, aim to solve decision-making problems involving participants with conflicting interests. Designing agents capable of negotiating with human partners requires considering factors such as emotional states and arguments. For this purpose, we introduce an extended taxonomy of argument types capturing human speech acts during negotiation. We propose an argument-based automated negotiating agent that can extract human arguments from a chat-based environment using a hierarchical classifier. Consequently, the proposed agent can understand the received arguments and adapt its strategy accordingly while negotiating with its human counterparts. We initially conducted human-agent negotiation experiments to construct a negotiation corpus to train our classifier. The experimental results show that the proposed hierarchical classifier successfully extracted the arguments from the given text. Moreover, we conducted a second experiment in which we tested the performance of the designed negotiation strategy, which takes the human opponent's arguments and emotions into account. Our results showed that the proposed agent beats the human negotiator and gains higher utility than the baseline agent.
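The hierarchical structure of such a classifier can be sketched in two stages: a top-level model decides whether an utterance contains an argument at all, and a second-level model assigns an argument type. The keyword rules and type labels below are placeholders for illustration only; the paper trains statistical models on a negotiation corpus and uses its own taxonomy.

```python
# Stage-1 cues: does the utterance argue for something at all?
ARGUMENT_CUES = {"because", "since", "therefore", "so that"}

# Stage-2 cues: which (illustrative) argument type is it?
TYPE_CUES = {
    "threat": {"or else", "otherwise", "walk away"},
    "promise": {"i will", "i promise", "in return"},
    "appeal": {"please", "fair", "need"},
}

def is_argument(utterance: str) -> bool:
    """Top level of the hierarchy: argument vs. non-argument."""
    text = utterance.lower()
    return any(cue in text for cue in ARGUMENT_CUES)

def argument_type(utterance: str) -> str:
    """Second level: assign an argument type to an argumentative turn."""
    text = utterance.lower()
    for label, cues in TYPE_CUES.items():
        if any(cue in text for cue in cues):
            return label
    return "other"

def classify(utterance: str) -> str:
    """Run both stages; non-arguments never reach the type classifier."""
    if not is_argument(utterance):
        return "non-argument"
    return argument_type(utterance)
```

The hierarchy matters for strategy adaptation: the agent only needs to reason about concession tactics when stage 1 confirms the human actually advanced an argument.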
Abstract: Building higher-quality image classification models requires better performance analysis (PA) to help understand their behaviors. We propose ConfusionLens, a dynamic and interactive visualization interface that augments a conventional confusion matrix with focus+context visualization. This interface allows users to seamlessly switch table layouts among three views (overall view, class-level view, and between-class view) while observing all of the instance images on a single screen. We designed and implemented a ConfusionLens prototype that supports hundreds of instances, and then conducted a user study (N = 14) to evaluate it against the conventional confusion matrix with a split view of instances. Results show that ConfusionLens achieved faster task-completion times in observing instance-level performance and higher accuracy in observing between-class confusion. Moreover, we conducted expert interviews (N = 6) to investigate the applicability of our interface to practical PA tasks, and then implemented several extensions of ConfusionLens based on the feedback. Feedback on these extensions from users experienced in image classification (N = 5) demonstrated their general usefulness and highlighted their beneficial use in PA tasks.
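A data model that supports the three views can be sketched simply: if each confusion-matrix cell stores the instances it covers, a UI can switch between the overall view, a class-level view (one true-label row), and a between-class view (one cell) without recomputation. This is a minimal illustrative sketch, not ConfusionLens's actual implementation.

```python
from collections import defaultdict

def build_confusion(instances):
    """instances: iterable of (instance_id, true_label, predicted_label).
    Returns a mapping from (true, predicted) cells to instance ids."""
    cells = defaultdict(list)
    for inst_id, true, pred in instances:
        cells[(true, pred)].append(inst_id)
    return cells

def class_view(cells, label):
    """Class-level view: all instances whose true label is `label`,
    grouped by what the model predicted for them."""
    return {pred: ids for (true, pred), ids in cells.items() if true == label}

def between_class_view(cells, true, pred):
    """Between-class view: the instances confused from `true` into `pred`."""
    return cells.get((true, pred), [])
```

Keeping instance ids in the cells is what lets a focus+context interface show every instance image on one screen while zooming a single cell.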
PHILIPP SCHOENEGGER, PETER S. PARK, EZRA KARGER, SEAN TROTT, ...
pp. 4.1–4.25
Abstract: Large language models (LLMs) match and sometimes exceed human performance in many domains. This study explores the potential of LLMs to augment human judgment in a forecasting task. We evaluate the effect on human forecasters of two LLM assistants: one designed to provide high-quality ("superforecasting") advice, and the other designed to be overconfident and base-rate neglecting, thus providing noisy forecasting advice. We compare participants using these assistants to a control group that received a less advanced model that did not provide numerical predictions or engage in explicit discussion of predictions. Participants (N = 991) answered a set of six forecasting questions and had the option to consult their assigned LLM assistant throughout. Our preregistered analyses show that interacting with each of our frontier LLM assistants significantly enhances prediction accuracy by between 24% and 28% compared to the control group. Exploratory analyses showed a pronounced outlier effect in one forecasting item, without which we find that the superforecasting assistant increased accuracy by 41%, compared with 29% for the noisy assistant. We further examine whether LLM forecasting augmentation disproportionately benefits less skilled forecasters, degrades the wisdom of the crowd by reducing prediction diversity, or varies in effectiveness with question difficulty. Our data do not consistently support these hypotheses. Our results suggest that access to a frontier LLM assistant, even a noisy one, can be a helpful decision aid in cognitively demanding tasks compared to a less powerful model that does not provide specific forecasting advice. However, the effects of outliers suggest that further research into the robustness of this pattern is needed.
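To make the reported percentages concrete: the abstract does not name its accuracy metric, but probabilistic forecasts are commonly scored with the Brier score (squared error, 0 = perfect), and a relative improvement like "28% more accurate" can be read as a reduction in mean error relative to the control group. The sketch below is illustrative under that assumption.

```python
def brier_score(prob, outcome):
    """Squared error of one probabilistic forecast; outcome is 0 or 1."""
    return (prob - outcome) ** 2

def mean_brier(forecasts):
    """Mean Brier score over (probability, outcome) pairs; lower is better."""
    return sum(brier_score(p, o) for p, o in forecasts) / len(forecasts)

def relative_improvement(treatment_error, control_error):
    """Fractional error reduction of a treatment group versus control."""
    return (control_error - treatment_error) / control_error
```

For example, a treatment-group mean error of 0.18 against a control mean of 0.25 corresponds to a 28% relative improvement.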
Abstract: Recent work has proposed AI models that can learn to decide whether to make a prediction for a task instance or to delegate it to a human by considering both parties' capabilities. In simulations with synthetically generated or context-independent human predictions, delegation can help improve the performance of human-AI teams compared to humans or the AI model completing the task alone. However, so far, it remains unclear how humans perform and how they perceive the task when individual instances of a task are delegated to them by an AI model. In an experimental study with 196 participants, we show that task performance and task satisfaction improve for the instances delegated by the AI model, regardless of whether humans are aware of the delegation. Additionally, we identify humans' increased levels of self-efficacy as the underlying mechanism for these improvements in performance and satisfaction, and one dimension of cognitive ability as a moderator of this effect. In particular, AI delegation can buffer potential negative effects on task performance and task satisfaction for humans with low visual processing ability. Our findings provide initial evidence that allowing AI models to take over more management responsibilities can be an effective form of human-AI collaboration in workplaces.
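The delegation decision itself reduces to a per-instance comparison of capabilities. The rule below is an illustrative sketch, not the studied model: the AI keeps an instance only when its own confidence exceeds the estimated human accuracy for that instance, and both estimates are placeholders for learned capability models.

```python
def delegate(model_confidence: float, est_human_accuracy: float) -> str:
    """Route one instance to whichever party is estimated to do better."""
    return "human" if est_human_accuracy > model_confidence else "ai"

def route_batch(instances):
    """instances: list of (instance_id, model_confidence, est_human_accuracy).
    Returns the per-instance routing decisions."""
    return {inst_id: delegate(conf, acc) for inst_id, conf, acc in instances}
```

Under such a rule, the instances a human receives are exactly those on which the model expects the human to outperform it, which is consistent with the observed gains in performance and self-efficacy on delegated instances.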
Abstract: Multimodal scene search of conversations is essential for unlocking valuable insights into social dynamics and enhancing our communication. While experts in conversational analysis have their own knowledge and skills for finding key scenes, the lack of comprehensive, user-friendly tools that streamline the processing of diverse multimodal queries impedes efficiency and objectivity. To address this gap, we developed ConverSearch, a visual-programming-based tool built on insights for effective interface and implementation design derived from a formative study with experts. The tool allows experts to integrate various machine learning algorithms to capture human behavioral cues without the need for coding. Our user study, employing the System Usability Scale (SUS) and satisfaction metrics, demonstrated high user preference, reflecting the tool's ease of use and effectiveness in supporting scene-search tasks. Additionally, through a deployment trial within industrial organizations, we confirmed the tool's objectivity, reusability, and potential to enhance expert workflows. This suggests the advantages of expert-AI collaboration in domains requiring human contextual understanding and demonstrates how customizable, transparent tools yielding reusable artifacts can support expert-driven tasks in complex, multimodal environments.