Last Updated: 25 Feb 2024
A journal of everything I've accomplished as a Software Engineer interested in learning more about AI.
February was a rather productive month. I spent some time working with basic hyper-parameter optimization using Optuna for the first time and learnt how to run a simple experiment. I also managed to run 3 more sessions of the paper club.
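For reference, here's a minimal sketch of the kind of Optuna experiment I started with - the objective function and search ranges are purely illustrative, not from a real project:

```python
import optuna


def objective(trial: optuna.Trial) -> float:
    # Illustrative search space: two made-up hyper-parameters.
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    n_layers = trial.suggest_int("n_layers", 1, 4)
    # Stand-in for a real training run that returns a validation loss.
    return (lr - 0.01) ** 2 + n_layers * 0.1


study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print(study.best_params, study.best_value)
```

Optuna records every trial in the study, so `study.best_params` hands back the best combination it found.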
I also published a new article on writing better Python code, covering some of the key changes in the way I write Python after working with it for the last 5 months.
January was a slow start! I had the opportunity to spend the first part of it in Germany and really enjoyed spending some time with family.
I've been delving deeper into open-source models and experimenting with Cohere's re-ranker. That's how I learnt about batch processing data - Nils Reimers started correcting my usage on Twitter, and I discovered that you should not be using a re-ranker model to evaluate textual similarity. I also ended up learning about metrics such as AUC and precision to evaluate my results, which has been interesting.
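As a rough illustration of the evaluation side, AUC and precision can be computed with scikit-learn like this; the labels and scores below are made up:

```python
from sklearn.metrics import precision_score, roc_auc_score

# Hypothetical binary relevance labels and re-ranker scores.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.91, 0.35, 0.78, 0.62, 0.41, 0.12, 0.85, 0.55]

# AUC works directly on the raw scores.
auc = roc_auc_score(y_true, y_score)

# Precision needs hard predictions, so threshold the scores first.
y_pred = [1 if s >= 0.5 else 0 for s in y_score]
precision = precision_score(y_true, y_pred)

print(f"AUC: {auc:.3f}, Precision: {precision:.3f}")
```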
I also started a paper club under Latent Space to help increase interest in LLMs, starting with the Attention Is All You Need paper.
In December I slowed down and took a short holiday. I ended up experimenting more with text-to-UI platforms and documented how I put together a small front-end demo using MagicPatterns in a short tweet, breaking down the specific steps that I took and what the intermediate products were.
I also started on some small projects which I expect to be finished and shareable in March 2024, so stay tuned!
I started working on the Instructor library and published two articles with them, along with getting some MRs merged into the codebase! The published articles were
- Smarter Summaries w/Finetuning GPT-3.5 and Chain Of Density
- Good LLM Validation is Just Good Validation (see the sketch after this list)
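For context, the validation article's core idea is that structured LLM output can be validated like any other data, e.g. with Pydantic validators. A minimal sketch of that pattern, assuming the `instructor.patch` API (which has since evolved) and an illustrative model name:

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel, field_validator

# Patch the OpenAI client so responses are parsed into Pydantic models.
client = instructor.patch(OpenAI())


class UserDetail(BaseModel):
    name: str
    age: int

    @field_validator("age")
    @classmethod
    def age_must_be_non_negative(cls, v: int) -> int:
        # Plain Pydantic validation - no LLM magic needed here.
        if v < 0:
            raise ValueError("age must be non-negative")
        return v


user = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=UserDetail,
    messages=[{"role": "user", "content": "Extract: Jason is 25 years old"}],
)
print(user)  # UserDetail(name='Jason', age=25)
```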
On the side I continued working on more agent code and deployed a small Telegram bot called Conseil, which tracks todos and can understand basic natural-language queries to interact with the database.
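Conseil's internals aren't shared here, so this is only a minimal sketch of the pattern, assuming python-telegram-bot v20 and a hypothetical `parse_intent` helper standing in for the LLM call:

```python
import sqlite3

from telegram import Update
from telegram.ext import ApplicationBuilder, ContextTypes, MessageHandler, filters

db = sqlite3.connect("todos.db", check_same_thread=False)
db.execute("CREATE TABLE IF NOT EXISTS todos (task TEXT)")


def parse_intent(text: str) -> tuple[str, str]:
    # Hypothetical stand-in for an LLM call that maps free text
    # to an (action, task) pair, e.g. ("add", "buy milk").
    if text.lower().startswith("add "):
        return "add", text[4:]
    return "list", ""


async def handle(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    action, task = parse_intent(update.message.text)
    if action == "add":
        db.execute("INSERT INTO todos VALUES (?)", (task,))
        db.commit()
        await update.message.reply_text(f"Added: {task}")
    else:
        rows = db.execute("SELECT task FROM todos").fetchall()
        await update.message.reply_text("\n".join(r[0] for r in rows) or "No todos yet")


app = ApplicationBuilder().token("YOUR_BOT_TOKEN").build()
app.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, handle))
app.run_polling()
```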
So far I've read the papers
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
- Training language models to follow instructions with human feedback
and I think there'll be a good deal more to come for the rest of the month.
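For context on the DPO paper, its core move is replacing the reward-model-plus-PPO pipeline with a single classification-style loss over preference pairs. The objective from the paper, with policy $\pi_\theta$, frozen reference $\pi_{\text{ref}}$, and preferred/dispreferred completions $y_w, y_l$:

```latex
\mathcal{L}_{\text{DPO}}(\pi_\theta; \pi_{\text{ref}}) =
-\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}
\left[ \log \sigma \left(
\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)}
- \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}
\right) \right]
```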
I've tried to split my time in two in October - doing more LLM work while simultaneously trying to delve deeper into the theory behind more classical machine learning.
I also continued reading papers and got through another two this month.
This month I worked on two main projects
- Using GPT-4 to generate React components with the @shadcn/ui library
- Creating an arXiv crawler (a rough sketch follows this list)
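The crawler itself isn't shown here, but a minimal version against the public arXiv API could look like this, using feedparser; the search query is illustrative:

```python
import feedparser

# Query the public arXiv API for recent papers matching a search term.
BASE_URL = "http://export.arxiv.org/api/query"
query = "search_query=all:large+language+models&start=0&max_results=5"

feed = feedparser.parse(f"{BASE_URL}?{query}&sortBy=submittedDate&sortOrder=descending")

for entry in feed.entries:
    print(entry.title)
    print(entry.link)
    print(entry.summary[:200], "...")
    print()
```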
I also finally got around to finishing up a quick article on the derivation of the softmax and cross-entropy loss, which I worked through in CS231n. I struggled with it for quite some time and thought it might help others.
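For anyone skimming, the punchline of that derivation is how clean the gradient becomes once softmax and cross-entropy are combined. With logits $z$, probabilities $p_i = e^{z_i} / \sum_j e^{z_j}$ and a one-hot target $y$:

```latex
L = -\sum_i y_i \log p_i, \qquad
\frac{\partial L}{\partial z_i} = p_i - y_i
```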
September proved to be a pretty productive month for me. I managed to achieve the following things
I started reading more papers - I read a total of 4 new papers in September, including the Llama 2 paper, LIMA (Less Is More for Alignment), Anthropic's Constitutional AI paper and the new paper on the RWKV architecture.
I launched a new site to collate all my notes - you can check it out here
I started participating in Kaggle competitions
I finally finished up an LLM red-team challenge I called The Chinese Wall. I did a small write-up here, which also covers the Discord bot that I built to accompany it.
- Finished tidying up a repo with notes that I'd taken down on Karpathy's course. It currently contains parts 1, 2 and 3.
- Read up a bit on the paper that he mentioned - A Neural Probabilistic Language Model - which proposes using a real-valued vector to represent words. This is now used extensively in NLP and is known as word embeddings, but back then it must have been a novel idea (a tiny illustration follows this list).
- Played around with the new Next Auth Kysely integration and Resend and wrote a quick article here
- Started working on a small tool as part of Buildspace s4 to help people prep for interviews using GPT-4 and some other models, called Prep With AI, which uses a bunch of the different things that I wrote about
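To illustrate the word-embedding idea from that paper, here's a tiny sketch using PyTorch's `nn.Embedding`; the vocabulary and dimensions are arbitrary:

```python
import torch
import torch.nn as nn

# A toy vocabulary mapped to integer ids.
vocab = {"the": 0, "cat": 1, "sat": 2}

# Each word gets a learnable real-valued vector of dimension 4.
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=4)

ids = torch.tensor([vocab["the"], vocab["cat"], vocab["sat"]])
vectors = embedding(ids)  # shape: (3, 4), one vector per word
print(vectors.shape)
```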
I wasn't able to do as much as I wanted due to reservice commitments, but I did manage to get a few things done.
Discovered Andrej Karpathy's Zero to Hero course and plan to work through it over August. So far I've finished his intro to neural networks and built a basic binary classifier with ~42% accuracy using a custom neural network I coded in vanilla Python. I've finished the first 2 chapters of his course and I'm really enjoying it so far.
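In the spirit of that exercise, here's a minimal single-neuron binary classifier in plain Python with hand-derived gradients - the data is made up and this is only a sketch of the approach, not my actual code:

```python
import math
import random

# Toy dataset: (x1, x2) -> label, roughly separable on x1 + x2 > 1.
data = [((0.2, 0.1), 0), ((0.9, 0.8), 1), ((0.4, 0.3), 0), ((0.7, 0.9), 1)]

w1, w2, b = random.random(), random.random(), 0.0
lr = 0.5


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


for epoch in range(1000):
    for (x1, x2), y in data:
        p = sigmoid(w1 * x1 + w2 * x2 + b)
        # Gradient of binary cross-entropy w.r.t. the pre-activation is (p - y).
        g = p - y
        w1 -= lr * g * x1
        w2 -= lr * g * x2
        b -= lr * g

for (x1, x2), y in data:
    print(y, round(sigmoid(w1 * x1 + w2 * x2 + b), 3))
```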
Finally figured out how to deploy LangChain on AWS Lambda, and spent my entire weekend trying to automate a 20-minute task with the AWS SDK.
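For reference, the Lambda end of that can be a thin handler around a chain. This is a rough sketch assuming the older `langchain.llms.OpenAI` interface (it has since been reorganised) and an API Gateway-style JSON event:

```python
import json

from langchain.llms import OpenAI

# Initialised once per container so warm invocations reuse the client.
llm = OpenAI(temperature=0)


def handler(event, context):
    # API Gateway-style event with a JSON body containing a "prompt" field.
    body = json.loads(event.get("body", "{}"))
    answer = llm(body.get("prompt", "Say hello"))
    return {"statusCode": 200, "body": json.dumps({"answer": answer})}
```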
June has just started and my plan now is to work on more applications of LLMs. I believe that using LLMs to augment my learning will help tremendously when it comes to generating new insights and finding interesting angles to explore.
The plan is to build a local LLM tool using GPT to query and discover new insights from my previous notes and chats. I tried implementing a basic clone with memory and embeddings here but ended up getting side-tracked with other ideas.
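A minimal sketch of the memory-and-embeddings idea - the notes are made up, and it assumes the current OpenAI v1 client and the text-embedding-3-small model rather than whatever I used at the time:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

notes = [
    "Finished part 1 of the FastAI course.",
    "Deployed LangChain on AWS Lambda.",
    "Read the LIMA paper on instruction tuning.",
]


def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])


note_vecs = embed(notes)
query_vec = embed(["what did I do with serverless?"])[0]

# Cosine similarity between the query and every note.
sims = note_vecs @ query_vec / (
    np.linalg.norm(note_vecs, axis=1) * np.linalg.norm(query_vec)
)
print(notes[int(np.argmax(sims))])
```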
I also started experimenting with OpenAI Functions and built out a simple classifier using Yake and GPT that was able to classify places that I had been to before, using my reviews and other metadata ( Link )
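As a rough sketch of how that combination fits together - Yake pulls keywords from a review, and a function-calling request asks GPT to pick a category. The categories and model are illustrative, and it uses the newer `tools` parameter rather than the original `functions` one:

```python
import json

import yake
from openai import OpenAI

client = OpenAI()
review = "Great flat white and quiet corners to work in, but the wifi was patchy."

# Extract the top keywords from the review with Yake.
kw_extractor = yake.KeywordExtractor(top=5)
keywords = [kw for kw, score in kw_extractor.extract_keywords(review)]

# Ask the model to classify, forcing a structured reply via a function schema.
resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": f"Review: {review}\nKeywords: {keywords}"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "classify_place",
            "parameters": {
                "type": "object",
                "properties": {
                    "category": {"type": "string", "enum": ["cafe", "restaurant", "bar"]}
                },
                "required": ["category"],
            },
        },
    }],
    tool_choice={"type": "function", "function": {"name": "classify_place"}},
)
args = json.loads(resp.choices[0].message.tool_calls[0].function.arguments)
print(args["category"], keywords)
```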
I've managed to finish up Part 1 of FastAI's course and boy, have I learnt a lot about machine learning in general. The course covers a lot of traditional machine learning techniques and there's plenty I'll definitely need to revisit. You can read my notes here: FastAI Part 1