Courses Taught
- Scientific Computing
- Computer Networking and Communication
- Data Structures and Algorithms
- Machine Intelligence
- Computer Organization
- Bioinformatics
- Calculus for Social Sciences
- First Year Seminar
- Problem Seminar
- Operating Systems
Recently Advised Senior Independent Studies
- 2014
  - On Designing and Implementing Graphical Interface for Mild Dementia and Early-Stage Alzheimer's Patients (Michelle Blackwood)
  - Optimizing Merit Scholarships for Enrollment Management at The College of Wooster (Tin Nguyen, Computer Science and Business Economics)
  - Don't Get Boxed In: Connecting the Dots of Game Theory Through Computer Simulations (Trevor Pozderac, Computer Science and Math)
- 2013
  - An Agent-based Model of Influenza Within a College Population (Matt Lambert, Math and Computer Science)
  - Optimizing Integer Arithmetic for Public Key Cryptography (Spencer Hall, Math and Computer Science)
Research
My main research interest is analyzing the strengths and shortcomings of current machine learning algorithms when they are applied to learning from or classifying real-world data. So far, my research has focused on a particularly challenging type of real-world data: imbalanced datasets, in which one class is abundant in examples while the other is significantly under-represented. Such data pose serious problems for current algorithms. A significant number of real-world datasets are highly imbalanced (e.g., diagnosis of rare diseases, text classification, and many kinds of fraud detection, such as credit card, telephone, and insurance fraud). In these applications, conventional classifiers fail to recognize the minority class, which is the class of interest. Furthermore, errors on the minority class carry a higher penalty (e.g., cancer versus non-cancer). I have created a type of fuzzy classifier that is better suited to such data, with applications in many important fields: diagnosis of rare diseases in medicine, and fraud detection for credit card, telephone, and insurance companies. In my research I use both experimental and theoretical methods. At the experimental level, I use many datasets from well-known benchmarks, such as the University of California at Irvine Machine Learning Repository, as well as data from the real world. At the theoretical level, I use mathematical tools (e.g., proofs and statistical methods) to investigate the behavior of the algorithms.
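To make the imbalance problem concrete, here is a minimal sketch in Python. It is an illustration only, not one of the fuzzy classifiers described above: the dataset (990 "healthy" cases, 10 "disease" cases) and the always-predict-the-majority baseline are hypothetical. It shows how a classifier can score high overall accuracy while never recognizing the minority class.

```python
from collections import Counter


def majority_baseline(train_labels):
    """Return a classifier that always predicts the most common training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda _example: most_common


def accuracy(y_true, y_pred):
    """Fraction of all examples that were labeled correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of true `positive` examples that were predicted as `positive`."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)


# Hypothetical imbalanced dataset: 990 majority cases, 10 minority cases.
labels = ["healthy"] * 990 + ["disease"] * 10

classify = majority_baseline(labels)
predictions = [classify(None) for _ in labels]

overall_accuracy = accuracy(labels, predictions)
minority_recall = recall(labels, predictions, positive="disease")

print(f"accuracy = {overall_accuracy:.2f}, minority recall = {minority_recall:.2f}")
```

Overall accuracy looks excellent here even though every minority example is missed, which is why accuracy alone is a poor measure on imbalanced data.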
Selected publications from this research:
- S. Visa. The effect of class imbalance, complexity, size, and learning distribution on classifier performance, In the International Journal of Advanced Intelligence Paradigms, Vol 3(3/4), 341-366, 2011.
- S. Visa, A. Ralescu. Data-driven Fuzzy Sets for Classification, In the International Journal of Advanced Intelligence Paradigms, Vol 1(1), 11-38, 2008.
More recently, I have expanded my research into bioinformatics. I am particularly proud of my NSF-sponsored collaboration with The Ohio State University – OARDC (Ohio Agricultural Research and Development Center) and the Boyce Thompson Institute. For this $3.7 million bioinformatics project ($156,000 sub-awarded to the College of Wooster), “Discovery of Genes and Networks Regulating Tomato Fruit Morphology,” eight Wooster students and I worked for four summers (2009-2013) analyzing genetic data to discover genes that regulate size and shape in tomato fruits. The accomplishments of our group include:
- Identification of differentially expressed genes in various tomato backgrounds;
- Clustering of genes by their time-course expression, so that each cluster groups genes with similar expression patterns over time;
- Identification of promoter elements within four different gene clusters of interest (these latest results will be incorporated in a journal paper to be submitted in Fall 2013);
- Updating the Tomato Analyzer software, which analyzes the image of a vertically sectioned tomato fruit and computes about 40 morphological attributes, such as height, width, perimeter, tip angle, and how close the shape is to a circle or a rectangle;
- Using morphological data (i.e. tomato fruit contour points) to model the tomato fruits into nine distinct shapes (e.g. Round, Rectangular, Heart, Obovoid, etc.);
- Using machine learning algorithms to implement a classifier which, based on the Tomato Analyzer attributes, automatically classifies the tomato fruits into one of the nine shape classes.
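As a rough illustration of the kind of morphological attributes mentioned in the list above, the following Python sketch computes width, height, perimeter, area, and a circularity score from fruit contour points. The function names and the two test contours are hypothetical, and the circularity formula (4πA/P², which equals 1 for a perfect circle) is a common textbook measure assumed here for illustration. It is not taken from the Tomato Analyzer software, whose actual definitions may differ.

```python
import math


def shape_descriptors(contour):
    """Compute simple descriptors for a closed contour given as (x, y) points."""
    xs = [x for x, _ in contour]
    ys = [y for _, y in contour]
    n = len(contour)

    perimeter = 0.0
    twice_area = 0.0
    for i in range(n):
        x0, y0 = contour[i]
        x1, y1 = contour[(i + 1) % n]  # wrap around to close the contour
        perimeter += math.hypot(x1 - x0, y1 - y0)
        twice_area += x0 * y1 - x1 * y0  # shoelace formula term

    area = abs(twice_area) / 2.0
    return {
        "width": max(xs) - min(xs),
        "height": max(ys) - min(ys),
        "perimeter": perimeter,
        "area": area,
        # Assumed measure: 1.0 for a circle, smaller for elongated shapes.
        "circularity": 4 * math.pi * area / perimeter ** 2,
    }


def circle_contour(radius, n_points=360):
    """Hypothetical helper: sample points evenly around a circle."""
    step = 2 * math.pi / n_points
    return [(radius * math.cos(k * step), radius * math.sin(k * step))
            for k in range(n_points)]


round_fruit = shape_descriptors(circle_contour(1.0))
long_fruit = shape_descriptors([(0, 0), (1, 0), (1, 3), (0, 3)])

print(f"round: circularity = {round_fruit['circularity']:.3f}")
print(f"long:  circularity = {long_fruit['circularity']:.3f}")
```

A classifier such as the one described in the last item could take a vector of descriptors like these as its input features; that pairing is an assumption of this sketch, not a description of the group's implementation.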
Selected publications from this research include:
- S. Wu, J. P. Clevenger, L. Sun, S. Visa, Y. Kamiya, Y. Jikumaru, J. Blakeslee and E. van der Knaap, The control of tomato fruit elongation orchestrated by sun, ovate and fs8.1 in a wild relative of tomato, Plant Science (2015), pp. 95-104, DOI: 10.1016/j.plantsci.2015.05.019
- J. Clevenger, J. Van Houten, M. Blackwood, G. Rodriguez, J. Yusuke, Y. Kamiya, M. Kusano, K. Saito, S. Visa and E. van der Knaap, Network analyses reveal shifts in transcript profiles and metabolites that accompany the expression of SUN and an elongated tomato fruit, Plant Physiology (2015), vol. 168, no. 3, pp. 1164-1178, DOI: http://dx.doi.org/10.1104/pp.15.00379
- S. Visa, C. Cao, B. McSpadden-Gardener, E. van der Knaap. Modeling of Tomato Fruits into Nine Shape Categories using Elliptic Fourier Shape Modeling and Bayesian Classification of Contour Morphometric Data, Euphytica, 200:429-439, 2014.
- L. Sersain, C. G. Mendoza, S. Visa, E. van der Knaap. A Frequency- and Clustering-based Methodology for Finding Transcription Factor Binding Sites. In Proceedings of the Midstates Conference for Undergraduate Research in Computer Science and Mathematics, College of Wooster, 2014.