A Hacker News user calculated that the total number of parameters in Gemini 3.5 Flash is between 250 billion and 300 billion based on TPU performance.



On Hacker News, a news-sharing site for engineers, there's a lively discussion about Google's newly announced high-speed model, 'Gemini 3.5 Flash,' with questions such as 'What kind of model is it really?', 'Isn't it too expensive considering the name 'Flash'?', and 'What are the total number of parameters and the number of active parameters?'

Gemini 3.5 Flash | Hacker News

https://news.ycombinator.com/item?id=48196570

Gemini 3.5 Flash is the first lightweight, high-speed model offered in Google's Gemini 3.5 series. According to Google, Gemini 3.5 Flash can be used with Gemini apps, Google Search's AI Mode, Google Antigravity, Gemini API, Google AI Studio, Android Studio, Gemini Enterprise, and more, and delivers high performance for coding and agent tasks. You can find more details in the article below.

Google announces the 'Gemini 3.5' series, starting with the lightweight 'Gemini 3.5 Flash' - GIGAZINE



Hacker News has made an interesting calculation regarding the contents of Gemini 3.5 Flash. User easygenes estimated that, based on the fact that Google is providing Gemini 3.5 Flash with a TPU 8i, the memory capacity, memory bandwidth, and computing performance of the TPU 8i, and Google's assumption of an output speed of approximately 280 tokens per second, the total number of parameters in Gemini 3.5 Flash is approximately 250 billion to 300 billion, and the number of active parameters is approximately 10 billion to 16 billion.

The total number of parameters refers to the total number of weights in the entire AI model. On the other hand, the number of active parameters refers to the number of weights actually used in a single inference. In methods like Mixture of Experts (MoE), instead of running the entire model every time, only certain areas of expertise are used depending on the input. Therefore, even if the total number of parameters is large, the number of active parameters that are actually used may be considerably smaller than the total number of parameters.

easygenes also provides an estimate that, assuming the TPU 8i has 288GB of VRAM, the static weights of the model would be around 110GB to 150GB, and the dynamic allocation and compressed KV cache would be around 138GB to 178GB. The KV cache is intermediate data that the AI stores to efficiently refer to past context. When dealing with long conversations or long documents, the memory usage of the KV cache becomes a major constraint.



However, easygenes also states that 'this is an estimate.' Furthermore, if compression and quantization techniques like TurboQuant are used, the model can be stored in a smaller size without significantly compromising quality, so the total number of parameters could potentially reach around 400 billion. Since Google has not officially released the total number of parameters or active parameters for Gemini 3.5 Flash, the estimate on Hacker News is not an 'official value,' but rather a 'figure calculated fairly realistically from publicly available information.'

Hacker News is also

focusing on the pricing of Gemini 3.5 Flash. The price per million input tokens and per million output tokens for each model are as follows. A simple calculation shows that Gemini 3.5 Flash is three times the price of Gemini 3 Flash Preview.
input output
Gemini 2.5 Flash $0.30
(Approximately 48 yen)
$2.50
(Approximately 397 yen)
Gemini 3 Flash Preview $0.50
(Approximately 79 yen)
$3.00
(Approximately 477 yen)
Gemini 3.5 Flash $1.50
(Approximately 238 yen)
$9.00
(Approximately 1430 yen)


Input tokens are units that divide the text, code, etc., that the user provides to the AI into smaller parts, while output tokens are units that divide the responses generated by the AI into smaller parts. In other words, if you feed the AI long documents, generate long responses, or generate a large amount of code, both the input tokens and output tokens will increase, and so will the cost.

Furthermore, it has been pointed out that Gemini 3.5 Flash is not only more expensive per token, but also has a higher total cost when running the entire benchmark. Hacker News reports that the cost of running the entire Artificial Analysis evaluation was $172 (approximately 27,000 yen) for Gemini 2.5 Flash, $278 (approximately 44,000 yen) for Gemini 3 Flash Preview, and $1,552 (approximately 250,000 yen) for Gemini 3.5 Flash. This means that Gemini 3.5 Flash costs about 5.6 times more than Gemini 3 Flash Preview and about 9 times more than Gemini 2.5 Flash.

With the price increase, some users have commented that 'the performance is mediocre for the price, and it's inferior to the Gemma4 26B-A4B in every respect.'

Regarding its actual usability, some users have commented that it's 'quite clever, but prone to inflated output.' Simon Willison reported that when he had Gemini 3.5 Flash generate an SVG of a pelican riding a bicycle, it produced a very detailed image, but it was missing essential parts of the bicycle, such as the pedals and the rear wheel. Furthermore, that single generation cost him about 13 cents.



Hacker News has made jokes about the generated pelican image, saying things like 'It looks like something you'd see at a cryptocurrency conference' and 'It looks like it's from 1992.' On the other hand, a more practical issue has been raised: when relying on AI to improve SVGs or web pages, it tends to add backgrounds, buttons, and decorations rather than correcting fundamental errors. Gemini 3.5 Flash seems to be good at creating flashy, information-rich output, but it struggles to accurately correct subtle structural errors.

Another Hacker News user praised Gemini 3.5 Flash, calling it 'very smart and near the cutting edge for one-shot coding reasoning,' but noted that it doesn't perform as well on long-term agent tasks using arbitrary tools. While it excels at solving short problems quickly, some users noted that for applications involving stepping through longer tasks, detailed planning and iterative execution are necessary.

Judging from the comments on Hacker News, it seems more accurate to view the Gemini 3.5 Flash not as a cheap, lightweight model as the name 'Flash' might suggest, but rather as a 'very fast and high-performance model, but with a higher price and output token count.' In particular, the price is significantly higher than previous Flash models, so it will be necessary to carefully consider when and where it will be used.

in AI, Posted by log1d_ts