Data Project Portfolio

About me

Hi, my name is Sébastien Lozano-Forero, and welcome to my webpage. I hold a Bachelor's degree in science (Mathematics) and a Master's degree in data (Statistics), and I'm passionate about understanding and solving complex problems.

I have enjoyed learning since I was young, and that is how I look at data science: an ongoing (and probably never-ending) learning project with amazing opportunities to understand business questions, apply the scientific method, deliver data-related products and thus add value to organizations that trust data.

Over 8 years of experience in consulting on data-related problems, teaching, research and management. Basic French (working on it); fluent in English, Spanish and Portuguese. Analytical and strategic thinker, highly adaptable, quick and passionate learner, results- and detail-oriented, and experienced in both collaborative and individual settings.

This page presents some of my skills in solving problems using data. In addition to the Data Science Portfolio, it covers my areas of interest, professional experience, education and selected publications.

Contact information is available at the end of the webpage.

Areas of interest

As my fondness for learning only grows with time, I'm looking forward to expanding my areas of expertise, including, but not limited to, the following:

Data Science: As a mathematician, I'm passionate about the uses of scientific and quantitative thinking to understand and solve complex problems. As a statistician, I'm passionate about data and the ways to uncover the various stories it tells. As a self-taught programmer, I acknowledge the world now belongs to those who can code it. Data Science is the perfect environment to fully explore these three elements.
Machine Learning and Deep Learning: Today's world is undoubtedly shaped by data and the value we can extract from it. Never before in human history has such a vast amount of information been available to explore. The incredible ideas behind the algorithms and models widely used nowadays never cease to amaze me, and they make me want to be part of it while delivering added value to organizations and people through their use.
Data Engineering: From a technical perspective, data is everywhere, and thus there are many opportunities to integrate technologies for gathering it. I find the ideas behind modern data engineering particularly interesting.
Teaching: As Richard Feynman once said, 'If you really want to master something, teach it'. Teaching delivers two huge outcomes: i) long-lasting well-being to people who learn and ii) systematic thinking and deep understanding to people who teach. Teaching is basically learning twice.
Music: Music has been, and probably forever will be, my main hobby. Over 20 years of guitar playing.

Skills

Programming Languages

Python for data analysis (pandas, NumPy, TensorFlow, scikit-learn, Keras, PyTorch, etc.)
SQL for data wrangling (SQLite, PostgreSQL, MySQL, Oracle, MongoDB and Cassandra)
R for statistical modelling (tidyverse, caret, ggplot2, ggvis, etc.)
Other items in my toolbox: C++, Ox, SAS, SPSS, Stata, LaTeX

Statistics and Machine Learning

Exploratory and descriptive analysis
Regression, classification and clustering models
Data balancing techniques, feature selection and dimensionality reduction
Algorithm performance metrics (RMSE, MAE, MAPE, confusion matrix, precision, recall, ROC curve, AUC, silhouette, DB index, etc.)

Data Visualization

Matplotlib, Seaborn, Plotly, Bokeh, Dash and ggplot2
Power BI, Tableau, Metabase and Looker

Software Engineering

Git, GitHub, Docker, Anaconda
Streamlit, Flask, FastAPI, Django
Cloud: Heroku, AWS and GCP

Experience

Data Analyst - Lambton College of Applied Arts and Sciences (Sarnia, ON - Canada)

3+ years of experience as a Data Analyst at Lambton College. KPI reporting, tracking and automation; member of the Institutional Intelligence and Quality Assurance committees; Ministry of Education reporting. Responsible for proposing, deploying and maintaining college-wide data-related products.

Teaching Assistant - Correlation One (New York, NY - USA)

1+ year of experience as a teaching assistant for the Data Science for All (DS4A) Empowerment and DS4A / Data Engineering programs. These programs are in high demand, and admission is granted based on tests and interviews, with a typical admission rate of 3%.

Teaching and Management - Various higher education institutions (Bogotá, Bogotá D.C - Colombia)

5+ years of experience as a mid-level manager and professor at Los Libertadores University Foundation (Colombia): Head of the Department of Statistics, Chair of graduate and undergraduate programs in Applied Statistics, and Statistical Consulting Coordinator.

5+ years of experience as a professor and researcher at various universities in Bogotá (Colombia): Universidad del Rosario (graduate and undergraduate programs), Universidad Santo Tomás (graduate programs) and Universidad Sergio Arboleda (undergraduate programs).

Statistical Consultant (Bogotá, Bogotá D.C - Colombia)

6+ years as a statistical consultant for various individuals and for public and private institutions in Colombia.

Data Science Projects Portfolio

Data-driven solution proposals for business problems. The main intention is to showcase problems similar to those that firms might face, from problem understanding to solution deployment.

Education and Certifications

Mathematician
National University of Colombia (Bogotá, Colombia). 2008-2013
M.Sc. in Statistics
Federal University of Pernambuco (Recife, Brazil). 2013-2015
One-year Graduate Diploma in Applied Statistics
Los Libertadores University Foundation (Bogotá, Colombia). 2017-2018
Data Science for All (DS4A)
Correlation One and Colombian Ministry of Technologies. 2020
IBM Data Science
IBM and Coursera. 2020

Data Science Projects

SIPSApp (Tracking Colombian food prices)

A Streamlit app that lets users dive into Colombian food prices. A data pipeline takes the messy weekly Excel files from the Colombian Division of Statistics webpage and processes them into a unified, clean and usable database. Price forecasts per product and city are available. The project is deployed on AWS.

Learn more

FoodVision (Food Image Classifier)

EfficientNetB2 and Vision Transformer models were trained on the Food-101 dataset to build a food image classifier. The former was selected for deployment as it was considerably lighter than its counterpart. The accuracy obtained outperforms the one reported in the paper where the dataset was first presented. All programming was carried out in Python using PyTorch, and the model was deployed on Hugging Face.

Learn more

Insurance Cross-Sell Client Ranking

An automatic client ranking is proposed so that later iterations of a marketing campaign within an insurance company (aiming to increase its cross-sell rate) can be more efficient. An API serving a pre-trained ML model is available through Google Sheets. All programming was carried out in Python.

Learn more

Public Transportation Dynamics (Valle de Aburrá, Colombia)

This project aimed to help the authorities of the Aburrá Valley (a Colombian region of about 4M inhabitants) understand public transportation dynamics by means of a real-time dashboard, in order to enhance quality and efficiency and to proactively propose future infrastructure projects. Neural networks were also used to forecast heavy transit points. All programming was carried out in Python.

Learn more

Dissecting and forecasting food inflation (Colombian case)

This project aimed to propose a suitable approach, based on classical statistics, to better forecast food inflation in Colombia using prices collected directly from food marketplaces across the country. Being able to anticipate food price variations leads to better decision making by authorities and food merchants. It involves heavy data wrangling and processing, as well as classical univariate and multivariate time series models (SARIMA, SVAR and SVEC). All programming was carried out in R.

Learn more

Predicting Sales

This project presents a solution for a Kaggle competition in which a large drugstore required sales forecasts. Among various machine learning techniques, XGBoost was selected, implemented and deployed. All programming was carried out in Python. Deployment was carried out using Flask and Heroku.

Learn more

Some publications

Proposal for a SIPSA index and its relation to food inflation for the Colombian case: empirical evidence: A new index based on food prices, intended to help better predict food inflation in Colombia, is proposed. Full article available here (in Spanish)
Improved Likelihood Ratio Tests in Power Series Generalized Nonlinear Models: Improved versions of the widely known likelihood ratio test, based on bootstrap estimation of the Bartlett correction factor, are proposed in the context of PSGNLM. Full article available here
Notion of approximation of the area under the curve using the GeoGebra Graphing Calculator app: The mobile learning (m-learning) paradigm was implemented and assessed in teaching a regular calculus course for engineering students. Full article available here (in Spanish)
The Gamma Modified Weibull Distribution: A new probability distribution is proposed based on the gamma-G family of distributions. Full article available here