Abstract: Through extensive experience developing and explaining machine learning (ML) applications for real-world domains, we have learned that ML models are only as interpretable as their features. Even simple, highly interpretable model types such as regression models can be difficult or impossible to understand if they use uninterpretable features. Different users, especially those using ML models for decision-making in their domains, may require different levels and types of feature interpretability. Furthermore, based on our experiences, we claim that the term "interpretable feature" is neither specific nor detailed enough to capture the full extent to which features impact the usefulness of ML explanations. In this paper, we motivate and discuss three key lessons: 1) more attention should be given to what we refer to as the interpretable feature space, or the state of features that are useful to domain experts taking real-world actions, 2) a formal taxonomy is needed of the feature properties that may be required by these domain experts (we propose a partial taxonomy in this paper), and 3) transforms that take data from the model-ready state to an interpretable form are just as essential as traditional ML transforms that prepare features for the model.
Zhiqiang Hu, Roy Ka-Wei Lee, Charu C. Aggarwal, Aston Zhang...
32 pages
Abstract: The stylistic properties of text have intrigued computational linguistics researchers in recent years. Specifically, researchers have investigated the text style transfer (TST) task, which aims to change the stylistic properties of a text while retaining its style-independent content. Over the last few years, many novel TST algorithms have been developed, while the industry has leveraged these algorithms to enable exciting TST applications. The field of TST research has developed because of this symbiosis. This article aims to provide a comprehensive review of recent research efforts on text style transfer. More concretely, we create a taxonomy to organize the TST models, and provide a comprehensive summary of the state of the art. We review existing evaluation methodologies for TST tasks and conduct a large-scale reproducibility study in which we experimentally benchmark 19 state-of-the-art TST algorithms on two publicly available datasets. Finally, we expand on current trends and provide new perspectives on the new and exciting developments in the TST field.
Abstract: This report presents and briefly discusses the first Data Science Summer School in Malawi, Africa, which was named Malawi Data Science Bootcamp 2021 (MWData 2021). This event took place at Mzuzu University, Lilongwe ODeL Center in Lilongwe, Malawi, on October 25-29, 2021.
Juan Jose del Coz, Pablo Gonzalez, Alejandro Moreo, Fabrizio Sebastiani...
3 pages
Abstract: The 1st International Workshop on Learning to Quantify (LQ 2021 - https://cikmlq2021.github.io/), organized as a satellite event of the 30th ACM International Conference on Information and Knowledge Management (CIKM 2021), took place on two separate days, November 1 and 5, 2021. Like the main CIKM 2021 conference, the workshop was held entirely online due to the COVID-19 pandemic. This report presents a summary of each keynote speech and contributed paper presented at this event, and discusses the issues that were raised during the workshop.
Abstract: EGC ("Extraction et Gestion des Connaissances" in French) started in 2001 and is the reference conference for the French community in Knowledge Extraction and Management (equivalent to the French KDD). The topics of EGC include Machine Learning, Knowledge Engineering and Representation, Data and Knowledge Reasoning, Data Mining and Analysis, Information Systems, Databases, Semantic Web and Open Data. The 2022 edition of the EGC conference brought together 219 attendees, 136 of whom attended in person in Blois (France) and 83 of whom attended remotely. Among them, there were 64 women (29.2%) and 93 students (42.5%).