Learning a Bayesian user model from 8M US Census records to design an efficient preference elicitation agent.
Understanding privacy preferences of users with a personalization AI agent
This post is an accessible read of the publication:
Asthana, Sumit, et al. "I know even if you don't tell me": Understanding Users' Privacy Preferences Regarding AI-based Inferences of Sensitive Information for Personalization. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems. 2024. https://doi.org/10.1145/3613904.3642153
This post distills the motivation, methods, and key insights from our research on privacy and AI-based personalization. It's written for a broader audience, so whether you're a designer, a privacy researcher, or just AI-curious, this is for you!
Project Overview
Personalization offers undeniable benefits to user experience: recommendations and services that feel tailor-made to individual preferences. But what happens when those personalizations come from AI-inferred data users never knowingly provided?
In this project, I explored how users react when they become aware of AI-generated inferences based on the data they willingly provide. The work led to a paper that combines large-scale experimentation with sophisticated modeling approaches to investigate privacy and consent in the age of personalization.

Skills Demonstrated
- Probabilistic modeling (Bayesian networks via pgmpy, structure learning via ARACNE)
- Scalable data processing and augmentation using US Census datasets
- Designing and executing large-scale online experiments (n > 800)
- Human-centered design for privacy and AI consent
- Empirical evaluation of personalization and inference transparency
Core Motivation
Today, personalization engines routinely make inferences about users' interests and attributes from their online behavior, purchase history, and more. While helpful, these inferences are often invisible and made without explicit consent. This can lead to discomfort, mistrust, and, in some cases, real harm, especially when the inferred data is sensitive or incorrect.
Our goal was to understand how awareness of AI inferences changes user consent behaviors and to build computational systems that support transparency and agency.
Key Contributions
- Quantified Privacy Behavior: We conducted two large-scale experiments (N=877 total) to measure how people's consent decisions change when shown AI inferences of varying sensitivity (e.g., public interest, income level, or protected attributes like race).
- Measured Impact of Incorrect Inferences: Participants were less likely to consent when they saw incorrect inferences, even if the original data was provided willingly.
- Highlighted Need for Granular Consent Interfaces: Our work provides empirical support for designing new kinds of AI interfaces that foreground inferences and ask for enthusiastic, informed consent.
Technical Approach
Learning a Bayesian User Model from Census Data
To simulate realistic AI inferences, I trained a Bayesian network on public data from the Survey of Public Participation in the Arts (SPPA), a supplement to the US Census Bureau's Current Population Survey (CPS). The data included ~125,000 respondents and covered:
- Demographics (e.g., age, gender)
- Household information (e.g., marital status, children)
- Engagement in public arts (e.g., attending museums, concerts)
After pre-processing and filtering the data (removing minors, incomplete interviews, and variables irrelevant for modeling), I used Census person weights to upsample records and produce a representative dataset of ~8.2 million US adults.
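A minimal sketch of this weight-based upsampling step (the weight column name below is a placeholder, not the actual CPS field): each respondent is replicated in proportion to their rounded person weight so the expanded sample approximates the US adult population.
import pandas as pd

def upsample_by_person_weight(df, weight_col="person_weight"):
    # Replicate each respondent row `weight` times (rounded to an integer),
    # then drop the weight column from the expanded dataset.
    reps = df[weight_col].round().astype(int).clip(lower=0)
    expanded = df.loc[df.index.repeat(reps)]
    return expanded.drop(columns=[weight_col]).reset_index(drop=True)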
Each CPS question was treated as a variable in the user model. To learn the structure of the Bayesian network, I applied the ARACNE algorithm via the bnlearn package. I then estimated conditional probability tables using the pgmpy library.
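For intuition, here is a minimal, self-contained sketch of the ARACNE idea (an illustration only, not the bnlearn implementation used in the project): estimate pairwise mutual information between discrete variables, then apply the data processing inequality to prune the weakest edge in every fully connected triplet. The result is an undirected skeleton; edges still need to be oriented before fitting a Bayesian network. Column names in the toy example are hypothetical.
from itertools import combinations
import pandas as pd
from sklearn.metrics import mutual_info_score

def aracne_skeleton(df, eps=0.0):
    cols = list(df.columns)
    # Pairwise mutual information between every pair of discrete variables.
    mi = {(a, b): mutual_info_score(pd.factorize(df[a])[0], pd.factorize(df[b])[0])
          for a, b in combinations(cols, 2)}
    edges = set(mi)
    # Data processing inequality: in every triangle, treat the weakest edge
    # as an indirect dependency and remove it.
    for a, b, c in combinations(cols, 3):
        trio = [(a, b), (a, c), (b, c)]
        if all(e in edges for e in trio):
            weakest = min(trio, key=lambda e: mi[e])
            if mi[weakest] <= min(mi[e] for e in trio if e != weakest) - eps:
                edges.discard(weakest)
    return edges

# Toy example with hypothetical survey-style columns:
toy = pd.DataFrame({
    "age_group": ["18-34", "35-54", "55+", "18-34"] * 25,
    "income": ["low", "mid", "high", "low"] * 25,
    "attends_museums": ["yes", "no", "no", "yes"] * 25,
})
print(aracne_skeleton(toy))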
This allowed us to simulate realistic, data-driven AI inferences, like predicting a person's income bracket or cultural preferences, based on the same data they explicitly provided.
Code snippet: Parameter learning
import pickle

from pgmpy.models import BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator

# First load the learned graph structure, then estimate the parameters.
# `dataset` names the survey split used when the structure was learned.
with open('../datasets/{}_network_aug8.pk'.format(dataset), 'rb') as fin:
    dag = pickle.load(fin)

# Build a Bayesian network over the edges recovered by structure learning.
model = BayesianNetwork(ebunch=dag.edges())

# Fit conditional probability tables by maximum likelihood; `data` is the
# CPS dataframe, and weighted=True weights rows by its `_weight` column.
mle = MaximumLikelihoodEstimator(model=model, data=data)
cpds = mle.get_parameters(weighted=True)
model.add_cpds(*cpds)

# Persist the fitted model for later inference.
with open("../datasets/cps_model_aug8.pk", "wb") as fout:
    pickle.dump(model, fout)
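With the CPDs in place, the fitted model can answer the kind of inference queries described above. A minimal sketch building on the snippet above (the variable names and states are placeholders, not the actual CPS codes):
from pgmpy.inference import VariableElimination

# Given attributes a user explicitly shared (the evidence), infer the
# posterior distribution over a sensitive attribute they did not share.
infer = VariableElimination(model)
posterior = infer.query(
    variables=["income_bracket"],
    evidence={"age_group": "35-44", "attends_museums": "yes"},
)
print(posterior)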
Insights for Industry
This project demonstrates how to:
- Design realistic, model-driven personalization scenarios without sacrificing user agency.
- Train Bayesian user models at scale using public data while maintaining interpretability.
- Build AI systems that prioritize transparency and give users the opportunity to meaningfully consentâor opt outâof AI-based inferences.
As AI-based personalization becomes ubiquitous, this work emphasizes a vital point: the line between data explicitly shared and data inferred is not only blurry, it is consequential.
Future Directions
I'm interested in expanding this work into:
- Real-time interfaces where users can audit or revoke AI inferences
- Generalizable Bayesian/RNN hybrid models for interactive systems
- Deployable consent frameworks that adapt as AI learns from more user data
This project lays the groundwork for AI systems that are not just powerful, but aligned with user values.