For the second IRIF Distinguished Talk of 2024, we have the pleasure of welcoming Omer Reingold, professor of computer science at Stanford University and director of the Simons Collaboration on the Theory of Algorithmic Fairness, who will give a talk on “The multitude of group affiliations: Algorithmic Fairness, Loss Minimization and Outcome Indistinguishability” on May 14th, 2024.

“To address this challenge, we employ the concept of multi-group fairness, which involves scrutinizing numerous subpopulations simultaneously to ensure accuracy across all groups. Traditionally, fairness considerations focused on only a handful of subpopulations based on factors like gender, race, or age. However, this approach has proven inadequate, particularly due to issues of intersectionality, where individuals may face unique challenges at the intersection of multiple identity categories.”

Risk predictors are widely used across various domains, such as insurance, advertising, and medicine, to estimate probabilities. For instance, they estimate the likelihood of loan repayment, ad clicks, product purchases, or of developing a medical condition such as heart disease. Essentially, these predictors, often embedded in decision-making algorithms, aim to assess probabilities to inform our choices. Even forecasting the weather, like predicting tomorrow's probability of rain, falls under the umbrella of risk prediction, emphasizing how pervasive risk assessment is across diverse contexts.

To train these predictors effectively, algorithms require vast amounts of data, including historical information submitted by individuals through applications or website interactions. This data enables the estimation of probabilities regarding various outcomes, such as loan repayment, ad clicks, or the likelihood of experiencing a heart attack within a specified timeframe.

The training process for predictors involves minimizing a loss function, which evaluates the consequences of different decisions. For instance, in the context of lending, the loss function considers the financial impact of granting or denying a loan. Similarly, in medical scenarios, decisions regarding diagnostic tests are guided by minimizing potential costs associated with false positives or false negatives, optimizing both gains and losses.
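
To make the idea of a task-tailored loss concrete, here is a minimal sketch in Python of the kind of cost-weighted decision rule described above, with made-up costs for a diagnostic test (the numbers and names are illustrative, not taken from the talk):

```python
# Hypothetical costs (illustrative only): a missed diagnosis is assumed to be
# far more expensive than an unnecessary test.
COST_FALSE_NEGATIVE = 100.0   # skipping the test when the condition is present
COST_FALSE_POSITIVE = 5.0     # running the test when the condition is absent

def expected_loss(predicted_risk: float, run_test: bool) -> float:
    """Expected cost of a decision, given the predicted probability of the condition."""
    if run_test:
        return (1 - predicted_risk) * COST_FALSE_POSITIVE
    return predicted_risk * COST_FALSE_NEGATIVE

def decide(predicted_risk: float) -> bool:
    """Choose the action with the smaller expected loss."""
    return expected_loss(predicted_risk, True) < expected_loss(predicted_risk, False)

print(decide(0.02))   # False: at 2% risk, the test is not worth its cost
print(decide(0.20))   # True: at 20% risk, the expected cost of skipping dominates
```

The same template covers the lending example: replace the two costs with the financial impact of granting a bad loan versus denying a good one.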

However, the practical implementation of machine learning algorithms works with established mathematical loss functions rather than functions tailored to each specific task. Despite this abstraction, there's an expectation that minimizing these loss functions will yield effective predictors. For example, the L2 (squared-error) loss quantifies the gap between a predicted probability and the actual outcome; reducing it pushes predictions toward the observed outcomes, thereby enhancing predictive accuracy.
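
As a small illustration (a sketch, not code from the talk), the L2 loss on a single prediction is just the squared gap between the predicted probability and the 0/1 outcome:

```python
def squared_loss(predicted_prob: float, outcome: int) -> float:
    """L2 (squared-error) loss for a single probabilistic prediction."""
    return (predicted_prob - outcome) ** 2

# A predictor that says "70% chance of repayment":
print(squared_loss(0.7, 1))   # ≈ 0.09 if the borrower repaid
print(squared_loss(0.7, 0))   # ≈ 0.49 if the borrower defaulted
```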

The challenge arises when training these predictors to minimize loss, as they typically optimize across the entire global population. However, this approach fails to account for the disparate impact on smaller subpopulations. Consider a scenario where a subgroup constitutes only 10% of the population, characterized by limited financial strength or capabilities. When training a risk predictor, such as estimating loan repayment probabilities, prioritizing overall accuracy may lead to focusing solely on the larger 90% segment, resulting in increased errors for the 10%.
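
A toy simulation makes the effect visible (all numbers below are hypothetical and only serve to illustrate the point): a predictor that minimizes the average loss over the whole population can look accurate overall while doing much worse on the 10% subgroup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: 90% "majority" with repayment probability 0.8,
# 10% "minority" with repayment probability 0.4 (made-up numbers).
minority = rng.random(100_000) < 0.10
p_true = np.where(minority, 0.4, 0.8)
repaid = rng.random(p_true.size) < p_true

# A predictor judged only on its *average* squared loss can simply output the
# global repayment rate and still score well overall...
global_prediction = repaid.mean()   # ≈ 0.76

def mse(prediction, outcomes):
    return np.mean((prediction - outcomes.astype(float)) ** 2)

print("overall loss: ", mse(global_prediction, repaid))
print("majority loss:", mse(global_prediction, repaid[~minority]))
print("minority loss:", mse(global_prediction, repaid[minority]))
# ...but its error on the 10% subgroup is markedly worse, because that group's
# true rate (0.4) is far from the global estimate (≈ 0.76).
```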

While this method may yield mostly accurate results, it fosters unfairness akin to real-world instances of stereotyping. Similar to human decision-makers, algorithms may exhibit biases by favoring the majority group and disregarding the needs or characteristics of smaller, less represented populations. Consequently, this predisposition may lead algorithms to deny loans to financially affluent individuals within the minority group, a behavior akin to human biases driven by convenience or limited exposure.

Thus, the risk associated with algorithmic decision-making lies in the potential for discrimination or exclusion of marginalized groups due to algorithmic biases. Addressing this risk necessitates careful consideration of algorithmic training data and methodologies to ensure fairness and equity across diverse populations.

How do you try to prevent that from happening?

To address this challenge, we employ the concept of multi-group fairness, which involves scrutinizing numerous subpopulations simultaneously to ensure accuracy across all groups. Traditionally, fairness considerations focused on only a handful of subpopulations based on factors like gender, race, or age. However, this approach has proven inadequate, particularly due to issues of intersectionality, where individuals may face unique challenges at the intersection of multiple identity categories.

Multi-group fairness expands beyond traditional considerations by examining a broader range of subpopulations. This approach acknowledges that accuracy assessments must extend beyond individual categories to account for intersections and seemingly unrelated groups requiring protection. For instance, in loan assessments, it's insufficient to merely ensure accuracy for an entire racial group on average; accuracy must also extend to individuals within that group, including those with strong financial profiles.

By adopting multi-group fairness, we aim to combat stereotyping and ensure equitable outcomes across a potentially vast number of subgroups. Despite the complexity involved, this approach offers stronger assurances of fairness by simultaneously guaranteeing accuracy across all subpopulations, even as their number grows exponentially.
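
To give a rough sense of what this asks of a predictor, here is a toy audit, purely as an illustration of the idea (not the actual algorithms from the talk): for a collection of possibly overlapping subgroups, including intersections, it checks whether average predictions match average outcomes within each group.

```python
import numpy as np

def audit_subgroups(predictions, outcomes, subgroups, tol=0.05):
    """Toy multi-group audit: for every subgroup (a boolean mask, possibly
    overlapping with others), compare the average prediction to the average
    observed outcome and report groups where the gap exceeds the tolerance."""
    violations = []
    for name, mask in subgroups.items():
        if mask.sum() == 0:
            continue
        gap = abs(predictions[mask].mean() - outcomes[mask].mean())
        if gap > tol:
            violations.append((name, round(gap, 3)))
    return violations

# Hypothetical data and subgroups (names and numbers are illustrative only).
rng = np.random.default_rng(1)
n = 10_000
strong_finances = rng.random(n) < 0.3
group_a = rng.random(n) < 0.1
outcomes = (rng.random(n) < np.where(strong_finances, 0.9, 0.6)).astype(float)
predictions = np.full(n, outcomes.mean())   # one global estimate for everyone

subgroups = {
    "group A": group_a,
    "strong finances": strong_finances,
    "group A with strong finances": group_a & strong_finances,  # an intersection
}
print(audit_subgroups(predictions, outcomes, subgroups))
# The global estimate passes for "group A" on average but fails for the
# financially strong applicants and for the intersection.
```

A predictor satisfying multi-group accuracy would have to pass such checks simultaneously for a very rich collection of subgroups, not just the few listed here.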

Since you need to gather a lot of data, is it a challenge to keep people's data private? Could zero-knowledge proofs be used in your work?

Fairness and privacy share an intriguing relationship. In safeguarding privacy, we often aim to conceal information, whereas striving for fairness may entail delving deeper into certain data to ensure equitable outcomes. We demonstrate, however, that this does not significantly increase costs: despite the complexity of addressing an exponential number of groups, the additional data and time required to train the algorithms are not substantially greater than when focusing on a few groups or on overall fairness.

Indeed, there exists a compelling interplay between these concepts, highlighting the trade-offs inherent in ethical considerations within computational realms.

So, potentially, zero-knowledge proofs could be used; the possibilities are vast. Cryptography offers robust methods that can secure virtually any computation when all the data is available. Integrating cryptographic techniques with our work presents opportunities for enhanced efficiency, although it poses its own challenges. Nevertheless, there is a wealth of research on privacy in learning, including approaches like differential privacy and cryptographic methods. Once the desired learning outcomes are determined, one can explore applying these privacy-preserving techniques effectively. Overall, combining our work with cryptographic principles offers a promising avenue for secure and efficient computation.

Essentially, the terms “group fairness” and “individual fairness” originate from distinct domains, delineating whether an action or principle applies to individuals or to groups as a whole. Fairness thus poses the question of whether to prioritize fairness across an entire group or towards its individual members. In 2011, the computer science community introduced individual notions of fairness, which are powerful but present challenging implementation hurdles; see Dwork C., Hardt M., Pitassi T., Reingold O., Zemel R. (2011), Fairness Through Awareness, arXiv.

The concept of multi-group fairness seeks to strike a balance between the simplicity of group fairness notions and the potency of individual fairness considerations. By navigating this middle ground, multi-group fairness offers a more nuanced approach than group fairness alone, while often being more practical to implement than individual fairness. This balancing act is particularly relevant in fields such as medicine and insurance, where considerations of fairness intersect with the complexities of individual and group dynamics.

These notions are quite general and not specific to any particular domain. They have already been implemented in the medical field; for instance, they have been utilized in predicting COVID complications, which was an intriguing application. However, the applicability of this approach extends beyond medicine—it can be employed in various contexts. For example, it could be used in loan approvals, university admissions, or hiring processes—anywhere decisions about individuals are made. The aim is to ensure fairness and accuracy across numerous subpopulations.

In this case, with student admissions or job recruitment, what would fairness mean in practice?

Consider the example of university admissions. You aim to predict the likelihood of a student's success in their chosen field of study. This entails forecasting when and with what grades students will graduate, allowing for the selection of the most promising candidates. While it may be easier to predict outcomes for certain populations with well-defined profiles, our goal is to ensure accurate predictions across a wide range of subpopulations. This includes different racial, ethnic, and gender groups, as well as their intersections.

For instance, in the United States, students may opt for advanced placement (AP) courses to explore their academic interests. However, our prediction models must be accurate not only for students who have taken these courses but also for those who haven't. We strive for accuracy across various criteria, such as the educational institutions attended or the type of schools students have graduated from. This encompasses different high school settings and diverse student profiles.

Ultimately, our objective is to achieve refined accuracy, ensuring that no students with significant potential are overlooked simply because they belong to less represented or perceived weaker groups. This requires our models to accurately identify promising individuals across all demographic and educational backgrounds.

Isn’t there a danger, with fairness, of refusing somebody who could have become great even though there is no sign showing it?

In general, this presents an issue, albeit not specifically tied to fairness in learning applications. The crux lies in determining how to utilize the information effectively. One approach is to refrain from using it solely for admission decisions, but rather to allocate resources strategically and devise interventions to support students. By identifying areas of concern, we can pinpoint where additional assistance may be necessary. For instance, within university settings, consideration must be given to the types of interventions required to aid student success post-admission.

However, it's crucial to recognize that there are instances where human oversight is essential, as algorithms may not match the accuracy of human judgment in certain domains. Sometimes, numerous decisions need to be made, necessitating the initial filtering of students to manage the workload. Consequently, there should be a review process for these tools, and reliance on algorithmic solutions should be moderated. It's imperative to determine the level of trust we can place in these tools.

One crucial aspect we've touched upon is risk prediction, but equally significant is the realm of interventions. This leads us to the topic of treatment effects. It's not just about predicting the likelihood of someone developing a heart condition in the next decade, but also understanding the probability that administering a particular medication could prevent such an outcome. This concept applies universally across various domains.

The field is extensive and relatively new, presenting numerous challenges across different fronts. One prominent advancement in technology today is the emergence of large language models like ChatGPT, which at their core make predictions—simply put, predicting the next word from the preceding ones—and exploring what fairness means for such models raises intriguing questions.

Additionally, this field intersects with various other disciplines, including cryptography and economics. Considerations include how to incentivize entities to act fairly, highlighting the interconnectedness of different fields.

Another significant challenge is the context within which decisions are made. Decisions are rarely made in isolation; they often occur concurrently with numerous other decision-making processes. For example, while one university decides on its admissions, others are doing the same. This interplay among multiple decision-makers demands a nuanced approach to decision-making and consideration of downstream effects. A decision made now may have long-term consequences, impacting future opportunities and outcomes. Taking a broader societal perspective, decisions must be made with foresight, considering the well-being of the entire community.

In educational institutions like universities, fostering diversity is crucial. Increasing the number of students from underrepresented groups can create a supportive environment where students can thrive. By intervening to support these students, we not only enhance their chances of success but also cultivate role models for future generations from similar backgrounds. These complex interactions between different stakeholders and the necessity for long-term thinking underscore the challenges inherent in this vibrant field.

In the realm of loans, a critical consideration is the potential impact on economic development and individual success within specific communities. Providing more loans to certain groups can spur economic growth and increase opportunities for success. Conversely, limiting loans may hinder the progress of these communities, creating a dependency between individual actions and broader economic outcomes.

Furthermore, there are long-term implications to consider. Denying loans to particular populations may dissuade future applicants, perpetuating a cycle of exclusion. Conversely, granting loans to individuals who may struggle to repay them can have detrimental effects, such as damaging their credit scores. This highlights the delicate balance between taking risks to foster positive outcomes and mitigating potential negative repercussions.

Ultimately, decisions made for one individual can reverberate throughout the community. It's essential to adopt a holistic perspective, considering the interconnectedness of individuals and communities, and the long-term implications of lending practices.

Is gathering data also a challenge for you?

Data presents a significant challenge in learning, especially concerning privacy issues. However, our primary concern lies in ensuring that the information we derive is not based on flawed data, which can manifest in various forms. Our focus is on verifying the adequacy of the data we possess, as its shortcomings can lead to problematic scenarios, such as excluding certain subpopulations. For instance, historical data may be lacking for individuals from specific demographic groups due to past discriminatory lending practices.

Another concern arises when our decisions are based on biased data inherited from past practices tainted, for example, by racism or sexism. Learning from such data perpetuates existing biases rather than reflecting reality accurately. Consequently, our research first involves identifying inadequate data and then assessing its potential implications. While correcting flawed data presents challenges, it is a necessary step in mitigating biases and improving decision-making processes.

Undoubtedly, the issue of flawed data looms large and requires careful consideration in our efforts.

And how do you correct data while remaining objective?

In our research, we've identified a common challenge faced by learning algorithms: the need for extensive, unbiased data. Often, the primary dataset may be heavily biased or lack sufficient representation. However, we've devised an approach that addresses this issue by incorporating additional, unbiased data to supplement the existing dataset. By leveraging this supplementary data, we can correct biases and improve the quality of our learning process.

While there are various methods for addressing data biases, our approach has demonstrated effectiveness in utilizing algorithms to correct biases in datasets. Nonetheless, there are instances where this approach may not yield satisfactory results, and in some cases no method can correct for flawed data. It is important to identify such scenarios so as to incentivise the collection of new data.
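
One common technique in this family, sketched below purely for illustration (it is not necessarily the method referred to here), reweights a biased dataset so that it resembles a small unbiased reference sample before training:

```python
import numpy as np

def reweight_biased_data(biased_feature, reference_feature, bins=10):
    """Estimate how over- or under-represented each region of a feature is in
    the biased dataset relative to an unbiased reference sample, and return a
    weight per biased example (reference density / biased density)."""
    edges = np.histogram_bin_edges(reference_feature, bins=bins)
    ref_density, _ = np.histogram(reference_feature, bins=edges, density=True)
    biased_density, _ = np.histogram(biased_feature, bins=edges, density=True)
    idx = np.clip(np.digitize(biased_feature, edges) - 1, 0, bins - 1)
    return (ref_density[idx] + 1e-12) / (biased_density[idx] + 1e-12)

# Hypothetical example: low incomes are under-sampled in the historical data.
rng = np.random.default_rng(2)
reference_income = rng.normal(50, 15, size=500)    # small unbiased sample
biased_income = rng.normal(65, 10, size=20_000)    # skewed historical data
weights = reweight_biased_data(biased_income, reference_income)
print(weights[:5])   # under-represented regions of the biased data get larger weights
```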

How do you manage the lack of data you may have for some subgroups?

The effectiveness of data correction strategies varies depending on the availability of data from specific subpopulations. Insufficient data from a particular group limits the corrective measures that can be applied. Therefore, it's crucial to prioritize gathering data from underrepresented subpopulations to enhance the learning process.

Some techniques involve extrapolating insights from one population to another, such as in medical research comparing outcomes between genders. While this approach may be effective in certain scenarios, it's not universally applicable, especially when significant differences exist between populations.

Overall, the success of correction techniques relies on the availability of representative data for all populations. In situations where data is lacking, exploration strategies, such as taking calculated risks in lending practices, can help gather valuable data and improve learning outcomes.

The concept encompasses various components, including instance distributions and capacity constraints. Instance distributions refer to variations in populations across different settings, such as hospitals, which can impact the applicability of research findings. Multi-group fairness addresses this by ensuring robustness across diverse distributions, allowing insights from one context to be applied more broadly.

Capacity constraints, on the other hand, recognize limitations in decision-making processes, such as university admissions or loan approvals, where decisions are constrained by capacity limits. Despite these challenges, multi-group fairness offers a framework to navigate complex decision-making scenarios while maintaining fairness.

Moreover, multi-group fairness extends beyond its initial purpose, proving useful in diverse learning contexts. Its robustness to distribution changes and different loss functions enhances its utility across various decision-making scenarios. Whether deciding to carry an umbrella or launching a space shuttle, multi-group fairness provides a unified approach that transcends individual contexts, highlighting its broader significance in learning and decision-making processes.