Gemini 1.5 Pro is a Mixture-of-Experts multimodal model in the Gemini family of models from Google. It is competitive with Gemini 1.0 Ultra on benchmarks. The key highlight: in production it can handle up to 1M tokens of context, and Google also claims to have experimented with up to 10M tokens. The default window is 128k, with 1M available only to a limited group of developers that is slowly expanding. The demo videos are super impressive.
I know, the names are getting weirder every day. Chat with RTX is basically a Windows app where you can talk to LLMs locally, assuming you have an RTX GPU with at least 8 GB of VRAM. It lets you chat with your documents and can answer queries using a YouTube video as context.
Under the hood, it runs Llama 2 13B with 4-bit AWQ quantization.
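To give a feel for what 4-bit quantization buys you, here is a minimal sketch of plain group-wise 4-bit weight quantization, the basic mechanism AWQ builds on. This is not the AWQ implementation itself (AWQ additionally rescales salient weight channels using activation statistics before quantizing); the function names and group size are illustrative.

```python
# Illustrative sketch of group-wise symmetric 4-bit quantization.
# Each group of weights shares one fp scale; values are stored as small
# integers, cutting memory roughly 4x versus fp16 storage.
import numpy as np

def quantize_4bit(weights: np.ndarray, group_size: int = 8):
    """Map each group of weights to integer codes in [-7, 7] plus a scale."""
    w = weights.reshape(-1, group_size)
    scales = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0  # avoid division by zero for all-zero groups
    codes = np.clip(np.round(w / scales), -7, 7).astype(np.int8)
    return codes, scales

def dequantize(codes: np.ndarray, scales: np.ndarray, shape) -> np.ndarray:
    """Reconstruct approximate fp weights from codes and per-group scales."""
    return (codes * scales).reshape(shape)

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
codes, scales = quantize_4bit(w)
w_hat = dequantize(codes, scales, w.shape)
print("max abs error:", np.abs(w - w_hat).max())
```

The rounding error per weight is bounded by half the group's scale, which is why small group sizes (and AWQ's activation-aware scaling) keep quality loss low.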
Reka Flash is a new family of models from reka.ai. The 21B variant outperforms much larger models like GPT-3.5, Gemini Pro, Mixtral, and Llama 2 70B. It is pretrained on 32 languages, including the likes of Greek, Telugu, and Tamil apart from the usual ones. Another trick up its sleeve is that it is multimodal.
It is one of the few models that answers the “kilo of feathers vs. pound of bricks” question perfectly. Even when explicitly told that it might be wrong, it concedes the nonexistent mistake but still gives the same (correct) answer.
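For anyone who wants to double-check the riddle itself, the arithmetic is a one-liner: a kilogram is about 2.2 pounds, so the kilo of feathers really is heavier.

```python
# Sanity check of the "kilo of feathers vs. pound of bricks" riddle.
KG_PER_LB = 0.45359237  # exact definition of the international avoirdupois pound

feathers_kg = 1.0            # one kilogram of feathers
bricks_kg = 1.0 * KG_PER_LB  # one pound of bricks, converted to kilograms

print(feathers_kg > bricks_kg)  # True: the feathers are heavier
```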
There’s also a 7B model which outperforms the likes of Mistral 7B on popular benchmarks.
Abacus.ai released a new model this week called Smaug, which tops the Open LLM Leaderboard.
There’s also a 34B model which ranks among the top. Smaug is based on Qwen-72B and improves the average score from 73.6 to 80.48.