In this demo, you’ll see how LLMs and prompt engineering work. Starting with LLMs, remember that they’re essentially programs that understand natural language and respond in kind. Each LLM undergoes different kinds of training on different kinds of data, so no two LLMs perform exactly alike.
LLM
Because of this, LLMs are ranked on criteria such as reasoning, coding, conversation, and more. Search online for “LLM leaderboard” to find the most recent public statistics on how the various LLMs fare in different scenarios. A few notable leaderboards are hosted by Vellum, LMArena, Artificial Analysis, LLM Stats, Scale, Hugging Face, LiveBench, Convex, and Aider.
Here’s the leaderboard for LiveBench. As you can see, it ranks LLMs based on criteria such as global average, reasoning, coding, agentic coding, mathematics, data analysis, language, and instruction-following (IF).
Aider’s leaderboard focuses on metrics like code editing and refactoring, and also lists each model’s release date.
So depending on your use case, one LLM may perform certain tasks better than another. Another thing to look for is transparency: Convex, for instance, publishes its methodology and benchmark tool on GitHub. Not all leaderboards are created equal, so stick with credible ones.
Prompt Engineering
Open your starter project. It’s a simple Python script that allows you to interact with an LLM. Visit python.org/downloads to install Python if you don’t have it already. To verify that Python is installed, type python --version in your Terminal.
Then, install the uv package with pip:
pip install uv
uv is a package installer like pip, but faster, and it includes more convenience utilities. Switch to the Starter project in the Terminal with cd Starter, and create a virtual environment by typing uv venv. This creates a virtual environment called .venv, into which you’ll install this project’s dependencies.
Activate the environment with:
source .venv/bin/activate
Now, install the following packages with the command:
uv pip install langchain langchain-openai
Then, export your OpenAI API key as an environment variable for your current session. The script will use it to authenticate with OpenAI’s models:
export OPENAI_API_KEY=<your-api-key>
You’re all set to run the script. In the code, under TODO: 1 - gpt-4o, openai:gpt-4o is set as your model. Then, under TODO: 3 - Simple prompt, you use a simple prompt that says "Tell me about python". Run the script with:
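Your starter file may be organized differently, but based on the steps above, prompt_engineering.py looks roughly like this. The model string and the simple prompt come from the tutorial; the overall structure and the ask() helper are assumptions for illustration:

```python
# A rough sketch of prompt_engineering.py based on the steps above.
# The model string and simple prompt match the tutorial; the structure
# and the ask() helper are assumptions.
import os

MODEL = "openai:gpt-4o"  # TODO: 1 - gpt-4o

# TODO: 3 - Simple prompt
prompt = "Tell me about python"

def ask(model_name: str, prompt: str) -> str:
    # Imported here so the sketch reads even without langchain installed;
    # requires the langchain and langchain-openai packages.
    from langchain.chat_models import init_chat_model

    # init_chat_model reads OPENAI_API_KEY from the environment.
    model = init_chat_model(model_name)
    return model.invoke(prompt).content

if __name__ == "__main__" and os.environ.get("OPENAI_API_KEY"):
    print(ask(MODEL, prompt))
```

The provider:model string format (openai:gpt-4o) lets init_chat_model pick the right integration package, so swapping models is a one-line change.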
python3 prompt_engineering.py
Observe the response. With many models, a vague prompt like this returns limited, generic output.
Finally, comment out the line under TODO: 3 - Simple prompt, and uncomment the line under TODO: 4 - Enhanced prompt to use the enhanced prompt.
Rerun the script and observe the output. You’ll see that this produces a much better response: cleaner, more direct, yet richer in detail.
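The exact enhanced prompt in your starter code will differ, but an “enhanced” prompt typically adds a role, an audience, a scope, and an output format. Here’s a hypothetical illustration of that structure (this wording is not taken from the starter project):

```python
# Hypothetical example of the structure an enhanced prompt adds;
# the exact wording in the starter project will differ.
simple_prompt = "Tell me about python"

enhanced_prompt = (
    "You are a senior Python instructor. "           # role
    "Explain what Python is to a beginner, "         # task and audience
    "covering three common use cases. "              # scope
    "Answer in under 150 words as a bulleted list."  # format and length
)
```

Each added element narrows the space of acceptable answers, which is why the enhanced prompt yields a more focused response than the bare one.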
That’s all for this section.