Project Details

Linear Discriminant Analysis (LDA) to Classify the Contract Status of Baseball Players
This project applied Linear Discriminant Analysis (LDA) to predict and classify the contract status of baseball players into three categories: Free Agent, Arbitration, and None. The dataset contained 337 records and 28 variables, comprising both categorical and continuous features. Key objectives included developing discriminant functions to separate the categories, identifying the most influential factors affecting contract status, and evaluating the model's performance.
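For reference, the textbook Fisher formulation of multiclass LDA shows how the discriminant functions are defined and why three contract categories allow at most two of them. The notation is generic and not taken from the project: $\mathbf{x}_i \in \mathbb{R}^p$ is a player's predictor vector (here $p = 8$ principal-component scores), and there are $K = 3$ classes with sizes $n_k$, class means $\boldsymbol{\mu}_k$, and overall mean $\boldsymbol{\mu}$.

```latex
\begin{align*}
S_W &= \sum_{k=1}^{K} \sum_{i:\,y_i = k} (\mathbf{x}_i - \boldsymbol{\mu}_k)(\mathbf{x}_i - \boldsymbol{\mu}_k)^{\top},
\qquad
S_B = \sum_{k=1}^{K} n_k (\boldsymbol{\mu}_k - \boldsymbol{\mu})(\boldsymbol{\mu}_k - \boldsymbol{\mu})^{\top} \\[4pt]
J(\mathbf{w}) &= \frac{\mathbf{w}^{\top} S_B \mathbf{w}}{\mathbf{w}^{\top} S_W \mathbf{w}}
\;\Longrightarrow\;
S_B \mathbf{w}_j = \lambda_j S_W \mathbf{w}_j,
\qquad z_j = \mathbf{w}_j^{\top} \mathbf{x} \\[4pt]
\sum_{k=1}^{K} n_k (\boldsymbol{\mu}_k - \boldsymbol{\mu}) &= \mathbf{0}
\;\Rightarrow\;
\operatorname{rank}(S_B) \le K - 1
\;\Rightarrow\;
\text{number of nonzero } \lambda_j \le \min(p,\, K-1) = \min(8,\, 2) = 2
\end{align*}
```

Each $z_j$ is one discriminant function, a weighted sum of the principal-component scores; with three classes only two such functions carry between-class information, so the discriminant space is two-dimensional.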
Steps and Findings:
Preprocessing: Principal Component Analysis (PCA) was used to reduce dimensionality, yielding eight significant components.
Exploratory Data Analysis: Provided summary statistics and visual insights into the distribution of the key components.
Model Development: Built the LDA discriminant functions on the eight components. With three classes, LDA yields at most two discriminant functions, giving a two-dimensional discriminant space.
First Discriminant Function: Derived as a linear combination of the eight principal components (components associated with variables such as runs, on-base percentage, and hits per error).
Model Performance:
Training accuracy: 64.58%
Testing accuracy: 67.01%
The confusion matrix showed varying misclassification rates across categories.
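The steps above (PCA to eight components, LDA on those components, then training/testing accuracy and a confusion matrix) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the project's code: the data are synthetic (12 hypothetical numeric features with an artificial class signal, standing in for the 28 mixed variables; categorical variables would need encoding before PCA), the helper names `fit_pca`, `fit_lda`, `predict_lda`, and `confusion_matrix` are hypothetical, and the 240/97 train/test split is an assumption that the source does not state, although it is consistent with the reported figures (155/240 ≈ 64.58%, 65/97 ≈ 67.01%). Classification uses the shared-covariance Gaussian LDA rule with class-proportion priors, evaluated in the discriminant space; the original analysis may have used different software or settings.

```python
import numpy as np


def fit_pca(X, n_components):
    """Standardize columns with training statistics, then keep the top principal components."""
    mean, std = X.mean(axis=0), X.std(axis=0)
    _, _, vt = np.linalg.svd((X - mean) / std, full_matrices=False)
    components = vt[:n_components].T                      # loadings, shape (p, n_components)
    return lambda A: ((A - mean) / std) @ components      # maps raw rows to PC scores


def fit_lda(Z, y):
    """Fisher LDA with at most K-1 directions, scaled to unit pooled within-class variance."""
    classes = np.unique(y)
    n, K = len(y), len(classes)
    means = np.array([Z[y == k].mean(axis=0) for k in classes])
    priors = np.array([np.mean(y == k) for k in classes])
    grand = Z.mean(axis=0)
    S_W = sum((Z[y == k] - m).T @ (Z[y == k] - m) for k, m in zip(classes, means))
    S_B = sum(np.sum(y == k) * np.outer(m - grand, m - grand) for k, m in zip(classes, means))
    L = np.linalg.cholesky(S_W / (n - K))                 # pooled covariance = L @ L.T
    L_inv = np.linalg.inv(L)
    evals, evecs = np.linalg.eigh(L_inv @ S_B @ L_inv.T)  # symmetric form of S_B w = lam * S_W w
    top = evecs[:, np.argsort(evals)[::-1][: K - 1]]      # keep the K-1 largest eigenvalues
    W = L_inv.T @ top                                     # discriminant coefficients, shape (d, K-1)
    return classes, W, means @ W, np.log(priors)


def predict_lda(model, Z):
    """Assign each row to the class with the highest LDA score in discriminant space."""
    classes, W, centroids, log_priors = model
    D = Z @ W                                             # discriminant scores
    sq_dist = ((D[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmax(-0.5 * sq_dist + log_priors, axis=1)]


def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    idx = {c: i for i, c in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[idx[t], idx[p]] += 1
    return cm


# Synthetic stand-in data (hypothetical): 337 rows, 12 numeric features.
rng = np.random.default_rng(0)
labels = np.array(["Free Agent", "Arbitration", "None"])
y = rng.choice(labels, size=337)
X = rng.normal(size=(337, 12))
X[:, :3] += 3.0 * (y[:, None] == labels[None, :])         # artificial class signal

perm = rng.permutation(337)
train, test = perm[:240], perm[240:]                      # assumed 240/97 split
project = fit_pca(X[train], n_components=8)               # PCA fitted on training rows only
model = fit_lda(project(X[train]), y[train])
train_acc = np.mean(predict_lda(model, project(X[train])) == y[train])
test_pred = predict_lda(model, project(X[test]))
test_acc = np.mean(test_pred == y[test])
cm = confusion_matrix(y[test], test_pred, model[0])
print(f"train accuracy {train_acc:.2%}, test accuracy {test_acc:.2%}")
print(cm)
```

Because there are three classes, the coefficient matrix `W` has shape (8, 2): two discriminant functions over eight components. Fitting the standardization and PCA loadings on the training rows only, then applying them unchanged to the test rows, keeps test information out of preprocessing.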