Zero to Hero... NLP project edition


So you just went through another tutorial, another MOOC. Your guilty gut instinct knows another one just won't help, but you have no idea what else to do.

You've been told a project can go a long way to show initiative, motivation and even skill, but... you've got no idea what to do, where to go or even how to start!

Apparently, it should "genuinely motivate you to work", but... how? What should your project even be about?

Your first instinct for starting a project may be to go with the flow and see where it leads. You can try, be my guest, but that's like learning to navigate a jungle by yourself (you might find your target location... eventually)! Instead, you could try researching the surrounding landscape and wildlife in depth. However, the moment you're face to face with real, aggressive animals, you'll notice the major difference between theoretical knowledge and practical skills.

What you really need is a guide! Someone who knows the place well enough to give you a brief tour around. Someone able to point out general points of interest and significant events to watch out for. This way when you're left alone, you'll roughly know what to do and how to live/navigate around the jungle yourself.


It's easy to get stuck without any sense of direction during a project (like in a jungle)

Please don't get caught alone in the jungle! Instead, allow me to be your guide. In this mini-series, we'll build a unique (and therefore impressive) Natural Language Processing project together, from the ground up! I hope this mini-series inspires you to start your own project, whilst also giving you a solid foundation to replicate the process yourself!

A light bulb

It's great to start off intrinsically motivated to work, but it's just... unrealistic. How many times have you been so blown away by a random, perfect idea, so aligned with what you were about to do, that you could take immediate action and bring it to life? If your answer is daily, you're lucky, kudos to you.

However, if you've got no idea what to do, I've got your back:

With time and effort, learning and absorbing information, you'll eventually encounter an impressive and worthy idea.

This means if you've been ruminating for a while, take a break and learn instead. You can learn through articles, books, videos, anything you like... just bask in information! The trick here is to continuously question how these ideas could be used in the real world. It doesn't matter if you don't completely understand everything yet (with time you'll learn...), just make sure to repeat this process until you come across a gem!

If you're unsure about your ability to finish a project, that's fine! What's the worst that could happen? The worst case is that you'll have learnt more about what you can and can't do next time! Just remember that dedication pays off in the long run. If you research each idea, then after five or so you'll eventually find something golden!


Finding an idea was damn hard, but following through... now that's something entirely different! Lucky for confused basic simpletons like us, there's an easy way to break down the entire project into a few key stages:

  1. Data collection

    Machine learning is cool, but we can't really do much without data. So let's kick off our journey the right way by finding quality data! There are two options:

    • Popular and easy to manage data
    • Unique and niche data

    What could possibly warrant going through the trouble of creating a special dataset just for a single project? Simple: you want to be a problem solver.

    You want to show your ability to solve new, unique and challenging problems, not simple tutorials!

    I know finding data from unique sources will bring about numerous seemingly unnecessary hurdles, but they're part of the fun.

  2. Process data (make sure it's formatted correctly and cleaned)

    Processing data could be the most important part of your project.

    High-quality data yields high-quality results.

    I know you'll be tempted to fast-track your progress by simplifying your preprocessing pipeline. But just remember the saying "garbage in == garbage out". It means the flaws in lazily processed data show up inside your model, and a model trained on messy data will generate sub-optimal output (despite any attempts to improve the results algorithmically).

  3. Modelling

    The most highly anticipated part of any data science project is creating a model. There are loads of complex models (and modifications to them) you could build. However, start simple and improve incrementally afterwards.

  4. Application

    You thought you'd finished? Hahaha... the model itself isn't nearly as impressive as a tangible application!

    You have a variety of options: a website, a mobile app, a browser extension... Choose whatever application makes sense!

    Creating a final application may take a little time and require you to broaden your skillset further, but it pays for itself extremely fast. Remember that one well-thought-out project is far better than a dozen small, careless, mediocre ones!
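
To make stages 2 and 3 a little more concrete, here's a rough sketch in plain Python. It is only an illustration under my own assumptions, not the pipeline this series will end up with: the function names (clean_text, train_baseline, predict) and the tiny toy dataset are made up for this example, and the "model" is a deliberately naive word-overlap baseline, exactly the kind of simple starting point you'd improve on later.

```python
import re
from collections import Counter


def clean_text(text):
    """Stage 2 (processing): lowercase, drop URLs and punctuation, tidy whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove links
    text = re.sub(r"[^a-z0-9\s]", " ", text)    # keep only letters, digits, spaces
    return re.sub(r"\s+", " ", text).strip()    # collapse repeated whitespace


def tokenize(text):
    """Split cleaned text into word tokens."""
    return clean_text(text).split()


def train_baseline(examples):
    """Stage 3 (modelling): count how often each word appears under each label."""
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(tokenize(text))
    return counts


def predict(counts, text):
    """Pick the label whose training words overlap most with the given text."""
    tokens = tokenize(text)
    scores = {
        label: sum(word_counts[token] for token in tokens)
        for label, word_counts in counts.items()
    }
    return max(scores, key=scores.get)


# Hypothetical toy data, invented purely for this illustration.
examples = [
    ("I loved this film, brilliant!", "pos"),
    ("Great acting and a great plot", "pos"),
    ("Terrible, boring film", "neg"),
    ("I hated the boring plot", "neg"),
]

model = train_baseline(examples)
print(clean_text("Check THIS out!!! https://example.com"))
print(predict(model, "What a GREAT film!!!"))
print(predict(model, "so boring..."))
```

Notice that the cleaning step does the heavy lifting here: without it, "GREAT" and "great" would be counted as different words and the baseline would miss the match. Once something this simple works end to end, swapping in a stronger model becomes an incremental improvement rather than a leap of faith.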

Cover image (modified) sourced from here


I know that creating our first NLP project won't be quick or easy. But I think it's important to figure out why you're doing a project. Is it to demonstrate how fast you can work, or how capable you are of doing meaningful, realistic work? I hope this mini-series helps you!

If you've liked this, make sure to stay tuned for the next post, where I'll walk through the first step of our journey (data collection). Make sure to follow me on Twitter for updates!