Areas of interest

As my fondness for learning only grows with time, I'm looking forward to expanding my areas of expertise, including, but not limited to, the following:

Data Science

As a mathematician, I'm passionate about using scientific and quantitative thinking to understand and solve complex problems. As a statistician, I'm passionate about data and the many stories it can reveal. As a self-taught programmer, I recognize that the world now belongs to those who can code it. Data Science is the perfect environment to fully explore these three elements.

Machine Learning and Deep Learning

Today's world is undoubtedly shaped by data and the value we can extract from it. Never before in human history has such a vast amount of information been available to explore. The incredible ideas behind the algorithms and models widely used nowadays never cease to amaze me, and they make me want to be part of this field, delivering added value to organizations and people through their use.

Data Engineering

From a technical perspective, data is everywhere, and thus there are many opportunities to integrate technologies that gather it. I find the ideas behind modern data engineering particularly interesting.

Teaching

As Richard Feynman once said, 'If you really want to master something, teach it.' Teaching delivers two huge outcomes: i) long-lasting well-being for the people who learn and ii) systematic thinking and deep understanding for the people who teach. Teaching is basically learning twice.

Music

Music has been, and probably always will be, my main hobby. I have been playing guitar for over 20 years.

Skills

Programming Languages

  • Python for Data Analysis (pandas, NumPy, TensorFlow, scikit-learn, Keras, PyTorch, etc.)
  • SQL and NoSQL databases for data wrangling (SQLite, PostgreSQL, MySQL, Oracle, MongoDB and Cassandra)
  • R for statistical modelling (tidyverse, caret, ggplot2, ggvis, etc.)
  • Other items in my toolbox: C++, Ox, SAS, SPSS, STATA, LaTeX

Statistics and Machine Learning

  • Exploratory and descriptive analysis
  • Regression, classification and clustering models
  • Data balancing techniques, feature selection and dimensionality reduction
  • Algorithm performance metrics (RMSE, MAE, MAPE, confusion matrix, precision, recall, ROC curve, AUC, silhouette, DB-Index, etc.)

Data Visualization

  • Matplotlib, Seaborn, Plotly, Bokeh, Dash and ggplot2
  • Power BI, Tableau, Metabase and Looker

Software Engineering

  • Git, GitHub, Docker, Anaconda
  • Streamlit, Flask, FastAPI, Django
  • Cloud: Heroku, AWS and GCP

Experience

Data Analyst - Lambton College of Applied Arts and Sciences (Sarnia, ON - Canada)

3+ years of experience as a Data Analyst at Lambton College: KPI reporting, tracking and automation; member of the Institutional Intelligence and Quality Assurance committees; Ministry of Education reporting. Responsible for proposing, deploying and maintaining college-wide data-related products.

Correlation One - Teaching assistant (New York, NY - USA)

1+ year of teaching assistant experience for the Data Science for All (DS4A) Empowerment and DS4A / Data Engineering programs. These programs are in high demand, and admission is granted based on tests and interviews, with a typical admission rate of 3%.

Teaching and Management - Various higher education institutions (Bogotá, Bogotá D.C. - Colombia)

5+ years of experience as a mid-level manager and professor at Los Libertadores University Foundation (Colombia): Head of the Department of Statistics, Chair of the graduate and undergraduate programs in Applied Statistics, and Statistical Consulting coordinator.

5+ years of experience as a professor and researcher at various universities in Bogotá (Colombia): Universidad del Rosario (graduate and undergraduate programs), Universidad Santo Tomás (graduate programs) and Universidad Sergio Arboleda (undergraduate programs).

Statistical Consultant - (Bogotá, Bogotá D.C. - Colombia)

6+ years as a statistical consultant for individuals and for public and private institutions in Colombia.

Data Science Projects Portfolio

Data-driven solution proposals for business problems. The main intention is to showcase problems similar to those that firms might face, from problem understanding to solution deployment.

Education and Certifications

Mathematician

National University of Colombia (Bogotá, Colombia). 2008-2013

M.Sc. in Statistics

Federal University of Pernambuco (Recife, Brazil). 2013-2015

One-year Graduate Diploma in Applied Statistics

Los Libertadores University Foundation (Bogotá, Colombia). 2017-2018

Data Science for All (DS4A)

Correlation One and the Colombian Ministry of Information and Communications Technologies (MinTIC). 2020

IBM Data Science

IBM and Coursera. 2020

Data Science Projects

SIPSApp (Tracking Colombian food prices)

A Streamlit app that lets users explore Colombian food prices. A data pipeline takes the messy weekly Excel files published on the website of the Colombian national statistics department (DANE) and processes them into a unified, clean and usable database. Price forecasts per product and city are available. The project is deployed on AWS.

FoodVision (Food Image Classifier)

EfficientNetB2 and Vision Transformer (ViT) models were trained on the Food-101 dataset to build a food image classifier. The former was selected for deployment as it is considerably lighter than its counterpart. The accuracy obtained outperforms the one reported in the paper that first introduced the dataset. All programming was carried out in Python using PyTorch, and the model was deployed on Hugging Face.

Insurance Cross-Sell Client Ranking

An automatic client ranking is proposed so that later iterations of an insurance company's marketing campaign, which aims to increase its cross-sell rate, can be more efficient. A pre-trained ML model is served through an API that can be queried from Google Sheets spreadsheets. All programming was carried out in Python.

Public Transportation Dynamics (Valle de Aburrá, Colombia)

This project aimed to help the authorities of the Aburrá Valley (a Colombian region of about 4M inhabitants) understand public transportation dynamics through a real-time dashboard, in order to enhance service quality and efficiency and to proactively propose future infrastructure projects. Neural networks were also used to forecast heavy-traffic points. All programming was carried out in Python.

Dissecting and forecasting food inflation (Colombian case)

This project aimed to propose a suitable approach based on classical statistics to better forecast food inflation in Colombia using prices collected directly from food marketplaces across the country. Being able to anticipate food price variations proactively leads to better decision-making by authorities and food merchants. It involves heavy data wrangling and processing, as well as classical univariate and multivariate time series models (SARIMA, SVAR and SVEC). All programming was carried out in R.

Predicting Sales

This project presents a solution to a Kaggle competition that required sales forecasts for a large drugstore. Among various Machine Learning techniques, XGBoost was selected, implemented and deployed. All programming was carried out in Python, and deployment was done using Flask and Heroku.

Some publications

Proposal for a SIPSA index and its relation to food inflation for the Colombian case: empirical evidence

A new index based on food prices is proposed to help better predict food inflation in Colombia. Full article available here (in Spanish).

Improved Likelihood Ratio Tests in Power Series Generalized Nonlinear Models

Improved versions of the widely known likelihood ratio test, based on bootstrap estimation of the Bartlett correction factor, are proposed in the context of power series generalized nonlinear models (PSGNLM). Full article available here.

Notion of approximation of the area under the curve using the GeoGebra Graphing Calculator app

The mobile learning (m-learning) paradigm was implemented and assessed while teaching a regular calculus course for engineering students. Full article available here (in Spanish).

The Gamma Modified Weibull Distribution

A new probability distribution is proposed based on the gamma-G family of distributions. Full article available here

Contact