Chloë's Blog

In this blog post, I will discuss the main steps of a Machine Learning project and how I am hoping to apply them in my professional and personal life.


  1. Problem Definition.

This first step is a concise description of the problem to be solved. It is important to define the problem you are trying to solve, because there are different kinds of Machine Learning (such as supervised, unsupervised, and reinforcement learning), and which one fits depends on the type of problem.

The nature of the problem has to be defined in order to know which algorithm or model to use and how to evaluate results on the given data.
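To make this concrete, here is a small sketch (the function name, phrases, and categories are illustrative, not from any library) of mapping a plain-text problem description to a broad category of Machine Learning problem:

```python
# Illustrative only: guess a broad ML problem category from a description.
# The keyword lists are invented examples, not a fixed taxonomy.
def kind_of_problem(description: str) -> str:
    rules = {
        "classification": ["which category", "spam or not"],
        "regression": ["how much", "how many"],
        "clustering": ["group similar", "find segments"],
    }
    text = description.lower()
    for category, phrases in rules.items():
        if any(phrase in text for phrase in phrases):
            return category
    return "unknown"

print(kind_of_problem("How much will this house sell for?"))   # regression
print(kind_of_problem("Which category does this article belong to?"))  # classification
```

In practice this decision is made by a person, not a keyword lookup, but the point stands: the problem type comes first, and everything else follows from it.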

  2. Data.

Since Machine Learning requires algorithms to find patterns in data, data is the basis of any Machine Learning project. Data comes in many shapes and sizes, but there are two main kinds: Structured and Unstructured Data. Structured Data is organized into rows and columns, typically in formats like .csv or Excel (.xlsx) files, while Unstructured Data is unorganized data such as images or audio. Data can be viewed with notebooks such as the Jupyter Notebook.
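For example, a small piece of Structured Data can be inspected with nothing more than Python's standard library (the table below is made up for illustration):

```python
import csv
import io

# A tiny invented car-sales table; in practice this would come from a .csv file.
raw = """make,odometer,price
Toyota,120000,4000
Honda,45000,7500
"""

# DictReader turns each row into a dict keyed by the column names.
rows = list(csv.DictReader(io.StringIO(raw)))
print(rows[0]["make"])  # Toyota
print(len(rows))        # 2
```

The same rows-and-columns structure is what tools like pandas build on; the point is simply that Structured Data already arrives in a tabular shape.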

  3. Evaluation.

Before modeling, you need to set a target accuracy to define the expectation of your model. A feasible target should be set for the problem at hand, because a model will rarely be 100% accurate, but it can be trained to perform as well as possible. The process of determining how well a model performs is called evaluation. For example, a 95% accurate model may be good enough in some areas, but when predicting heart disease you might want a more accurate model. Evaluation metrics are put in place to measure how well a Machine Learning algorithm predicts on unseen data. As progress is made on the project, the evaluation metrics might change with the circumstances.
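As a minimal sketch of one such metric, accuracy can be computed by hand (libraries such as scikit-learn provide ready-made versions; the labels below are invented):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Invented labels for illustration: 1 = heart disease, 0 = healthy.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]
print(accuracy(y_true, y_pred))  # 0.8
```

Whether 0.8 is acceptable depends entirely on the problem, which is exactly why the target is set before modeling begins.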

  4. Features.

Features are the different attributes within your data. What is known about the data given? Insights are drawn from features during Data Analysis. For example, in a car-sales .csv file, the column names (e.g. type, odometer, color) are all features of the car-sales data. They are also referred to as feature variables, and feature variables are used to predict the target variable. A feature variable can be Numerical or Categorical. The process of deriving new features from the given data is called Feature Engineering.
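To make the Numerical/Categorical distinction concrete, here is a small sketch (the car-sales values are invented) that splits feature variables by type:

```python
# Invented car-sales records; each key is a feature variable.
rows = [
    {"type": "sedan", "odometer": 120000, "color": "red", "doors": 4},
    {"type": "suv", "odometer": 45000, "color": "blue", "doors": 5},
]

# Numbers are Numerical features; strings are Categorical features.
numerical = [k for k, v in rows[0].items() if isinstance(v, (int, float))]
categorical = [k for k, v in rows[0].items() if isinstance(v, str)]
print(numerical)    # ['odometer', 'doors']
print(categorical)  # ['type', 'color']
```

This split matters later, because many algorithms handle numbers directly but need categorical features encoded first.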

  5. Modeling.

Based on the problem statement and data, a model is picked, and modeling itself can be divided into three parts: choosing and training a model, tuning a model, and comparing models.

When it is time to model, the data is often split into three sets: Training, Validation, and Testing. The ability of a Machine Learning model to perform well on data it hasn’t seen before is called Generalization. There are several kinds of algorithms to choose from when modeling, and some work better than others depending on the type of data. When choosing a model, factors such as the size and type of data come into play. For Structured Data, algorithms like XGBoost and Random Forests are used, while for Unstructured Data, Deep Learning and Transfer Learning can be used. Training may take a while depending on how complex the model is and the algorithm used.
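One common convention is a roughly 70/15/15 split; the sketch below does it by hand (libraries such as scikit-learn offer helpers for this, and the fractions here are just an assumption):

```python
import random

def train_val_test_split(data, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle a list and split it into training, validation, and test sets."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    n = len(shuffled)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(list(range(100)))
print(len(train), len(val), len(test))  # 70 15 15
```

Shuffling before splitting matters: it keeps any ordering in the original data from leaking into one split.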

Tuning takes place on the validation split, and a model can be tuned for different kinds of data. Hyperparameters are the settings adjusted to tune an algorithm to the data.
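As a toy illustration of tuning, the loop below tries several values of a hyperparameter (here a decision threshold, with invented validation data) and keeps the one that scores best on the validation set:

```python
# Invented validation data: model scores and the true labels.
val_scores = [0.2, 0.4, 0.6, 0.8, 0.9]
val_labels = [0, 0, 1, 1, 1]

def accuracy_at(threshold):
    """Validation accuracy when predicting 1 for scores >= threshold."""
    preds = [1 if s >= threshold else 0 for s in val_scores]
    return sum(p == t for p, t in zip(preds, val_labels)) / len(val_labels)

# Try candidate hyperparameter values and keep the best on validation data.
best = max([0.3, 0.5, 0.7], key=accuracy_at)
print(best)  # 0.5
```

The same pattern (try values, score on validation, keep the best) is what grid search tools automate at scale.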

A good model yields similar results on the validation and test sets during comparison. If the model does not generalize well at this stage, the cause could be data leakage or data mismatch, and corrections can be made to fix such problems.

  6. Experimentation.

This is an iterative pass over steps 2 - 5. Here, you look for anything else that could improve the model, and for other models worth trying to increase its accuracy and make it better.

After the model has been built, the next step is deployment. Deploying a Machine Learning model simply means integrating it into a production environment so that it can take input and return output that is used to make decisions. A model can be deployed as an API with Python frameworks such as Flask or Django, or in the frontend with TensorFlow.js. There are many other ways to do this, depending on where the model is needed.
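The core of such an API can be sketched with the standard library alone: a function that takes JSON in, runs the model, and returns JSON out, which is exactly what a Flask or Django route would call. The "model" below is a hand-written stand-in rule, invented purely for illustration:

```python
import json

# Stand-in for a trained model: an invented rule, not a real model.
def model_predict(features):
    return "high risk" if features.get("odometer", 0) > 100000 else "low risk"

def handle_request(body: str) -> str:
    """What an API endpoint would do: parse JSON in, predict, JSON out."""
    features = json.loads(body)
    return json.dumps({"prediction": model_predict(features)})

print(handle_request('{"odometer": 150000}'))  # {"prediction": "high risk"}
```

In a real deployment, `handle_request` would be wrapped in a Flask route or a Django view, and `model_predict` would load and call the trained model.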

Machine Learning isn’t the solution to every problem. For simple problems that can be fixed with a few lines of code, Machine Learning isn’t needed, as it would only burden the system with unnecessary complexity.

As time goes on, I'll write more articles on the steps above. I hope you enjoyed this!

Thank you!