Abstract
<jats:title>Abstract</jats:title> <jats:p>The Yates reservoir is a major oil field in West Texas. The primary production interval is the San Andres formation, which consists mainly of oil-wet, naturally fractured dolomite. The oil-wet rock leads to low oil-cut production, especially under waterflood, because water flows through the fractures and bypasses most of the oil in the adjacent matrix. Water also does not imbibe into the matrix because the rock is oil-wet. Treatment with wettability-altering fluids can induce water imbibition and enhance oil production.</jats:p> <jats:p>Efforts have been made since 2016 to alter rock wettability by injecting surfactants into producing wells. Each year, approximately 15 wells are treated by injecting a slug of surfactant into the well. The well is then shut in for several days so that the surfactant can soak and alter the rock wettability before the well is reopened for production. Analysis of incremental oil recovery indicates that only about 70% of the treatments successfully overcome the economic hurdle. The analysis also shows that the location of the wells selected for surfactant treatment is the key factor that determines the outcome. Currently, wells are selected based on historical production data. This paper presents a data-driven methodology for optimizing well selection for surfactant Huff-and-Puff (HnP) treatments in the Yates Field.</jats:p> <jats:p>The approach is based on a supervised classification model trained on historical well treatment outcomes. The training dataset consisted of 96 wells treated between 2018 and 2024, each labeled according to post-treatment oil production performance. An initial set of 30 features was systematically reduced to 10 through correlation analysis, missing-data handling, and feature-importance evaluation. The final model was developed using a Random Forest algorithm. Model performance was assessed using K-fold cross-validation and a blind validation set.
Because the dataset is imbalanced, the model was designed to output success probabilities rather than binary classifications. Model performance was evaluated using two key metrics: False Positive Rate (FPR), to avoid uneconomic treatments, and precision, to ensure that a high proportion of predicted successes are truly successful. These metrics reflect the project's priority: minimizing costly misclassifications in a large candidate pool from which only 15 wells are ultimately selected.</jats:p> <jats:p>The model was applied to a prediction dataset of 641 candidate wells, generating ranked probabilities of treatment success. The top 25 wells from each model variant were shortlisted and further refined to 15 wells using a legacy selection framework that incorporated domain-specific criteria such as reactivation status, gas-oil ratio (GOR), and nearby development activities. The 15 selected wells were treated using cationic surfactants, and their post-treatment performance is being evaluated to validate the model's predictive capability. The results will be used to further refine the model for treatment programs in coming years.</jats:p>
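<jats:p>As an illustration only, the two evaluation metrics and the ranking step described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the 0.5 probability threshold, the well identifiers, and all numbers are hypothetical, and the success probabilities are taken as given rather than produced by a trained Random Forest.</jats:p>

```python
from typing import Dict, List, Sequence, Tuple


def fpr_and_precision(
    y_true: Sequence[int], p_success: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Return (false positive rate, precision) at a probability threshold.

    y_true holds 1 for an economically successful treatment and 0 otherwise.
    The threshold value is an assumption made for this sketch.
    """
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, p_success):
        predicted_success = prob >= threshold
        if predicted_success and label == 1:
            tp += 1
        elif predicted_success and label == 0:
            fp += 1  # an uneconomic treatment predicted as a success
        elif not predicted_success and label == 0:
            tn += 1
        else:
            fn += 1
    # FPR = FP / (FP + TN); precision = TP / (TP + FP)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return fpr, precision


def shortlist(p_by_well: Dict[str, float], k: int = 25) -> List[str]:
    """Rank wells by predicted success probability and keep the top k.

    Ties are broken by well name so the ordering is deterministic.
    """
    ranked = sorted(p_by_well.items(), key=lambda item: (-item[1], item[0]))
    return [well for well, _ in ranked[:k]]


# Hypothetical toy data, not taken from the Yates dataset.
y_true = [1, 1, 0, 1, 0, 0, 1, 1]
p_success = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.55, 0.2]
fpr, precision = fpr_and_precision(y_true, p_success, threshold=0.5)

candidates = {"W-01": 0.35, "W-02": 0.82, "W-03": 0.82, "W-04": 0.10}
top2 = shortlist(candidates, k=2)
```

<jats:p>In this toy example, one of the three unsuccessful wells is predicted as a success (FPR of 1/3), and four of the five predicted successes are truly successful (precision of 0.8). In the workflow described above, the shortlist would then be passed to the legacy selection framework for the final screening.</jats:p>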