FMA4 on Zen: Forgotten Instruction set, but not yet gone


  • Published 2 months ago

    Level1Techs

    Duration: 14:53

    ALSO: forgot to mention this guy's blog. It's awesome.
    www.agner.org/optimize/blog/read.php?i=838

    Intro and Outro Music By: Kevin MacLeod (incompetech.com)
    Licensed under Creative Commons: By Attribution 3.0 License
    creativecommons.org/licenses/by/3.0/


RetroGamerBB2019

Will it boot Windows 95/98 or XP? Asking for the old-school PC gamers out there. Are there any modern CPUs that can boot 98?

1 month ago
phillip martin

1:32 - No 3DNow support. For shame, AMD.

1 month ago
Carewolf

Too bad Zen doesn't have XOP. That one was really good; it just needed a 256-bit version.

1 month ago
RockitMan2001

Esoteric subject, but still very interesting. Bravo.

1 month ago
Quast

Well, I ran Monster Hunter World on a Sandy Bridge E3 and it worked flawlessly.

1 month ago
Greg Turner

News to me! Good work.

1 month ago
charlie brownau

Oh no, you're bumping the music up again on a tutorial.

1 month ago
L0rDLuCk

Could you please add some numbers for AVX-512 in comparison with your benchmark? Thanks a lot! Great video!

2 months ago
Simos Katsiaris

Why is Ubuntu always having crashes...

2 months ago
DoctorWho8675309

Your mouse battery is low engagement.

2 months ago
Daniel Turner

You are the only TechTuber that makes me feel stupid.

2 months ago
Johan Wildeboer

My brain was fucked by... words!!

2 months ago
farmerwoody123

Was that an Edgar Allan Poe reference?!

2 months ago
P.J

There's one big instruction set that is missing in Ryzen (and most AMD CPUs), and it's BMI2 (introduced with the Haswell architecture). Because of that, many applications (chess engines are one example) run faster on Intel if they use BMI2, while AMD is stuck running the same application by emulating BMI2 instructions. On AMD it is better to have the same application compiled with POPCNT than to emulate BMI2. This aspect is never mentioned in benchmark reviews: many games use BMI2 instructions, and when you run them on AMD, performance is lackluster because of the emulation needed. If you want to dabble with this, just download the Stockfish 9 chess engine and check out the difference between the supported instruction sets (BMI2 vs. POPCNT) on Ryzen. It is staggering!

2 months ago
Snow Star

+P.J Hope Ryzen 3000 series implements it in hardware :)

1 month ago
P.J

+Snow Star From your link: "Intel introduced BMI2 together with BMI1 in its line of Haswell processors. Only AMD has produced processors supporting only BMI1 _without BMI2_; BMI2 is _supported_ by AMD's Excavator architecture and newer.[9]" Supported doesn't mean implemented in hardware; the instructions are emulated instead. That is why the PEXT instruction is so slow on Ryzen: it is emulated.

1 month ago
Snow Star

http://en.m.wikipedia.org/wiki/Bit_Manipulation_Instruction_Sets

1 month ago
Dakrith

Awesome stuff. I love learning about computer architecture and ISAs.

2 months ago
zen fu

Wendell needs to be complimented on his expertise. Ah... the stale air in the old bunker, the flickering light (:

2 months ago
TheEVEInspiration

CPUs always had undocumented / semi-documented instructions. At least since the C64 days.

2 months ago
calaphos

x86 (and ARM to some extent) is in an interesting situation right now. There are tons of instructions, with new ones added every generation or so, but no one really knows how to use them. Compilers often don't target them, and there is the whole issue of auto-vectorisation to even use those AVX units. Software distributed as binaries also contributes to the problem: very few programs even use the faster, more specialised instructions.

2 months ago
Level1Techs

So true. We've noted Intel microcode, for example, apparently rewriting "dumb" memory copy routines that just use i386 eax/edx registers into something more modern...

2 months ago
DING DONG

What's the music title? Thanks

2 months ago
Sędziwój

I know many people like collecting old stuff, but at some point it has to go to the trash or a museum. Collecting instruction sets is bad; it only creates more problems in the long run. I know, compatibility... but at some point (like now) we have too many, and it does more harm than good.

2 months ago
jonteno

Haha! That was quite interesting, right? Thanks, Wendell!

2 months ago
CreeperOnYourHouse

The Cask of Amontillado, right next to the FMA4. I swear it's there, ignore the trowel and bricks.

2 months ago
Andrew Canrinus

Also wish AMD would hire you for a cool million a year to be head of CPU development. Good things would happen!

2 months ago
Andrew Canrinus

All I know is that Railroad Tycoon 3 works on my Ryzen 1600X no probs, but without a lot of fancy hack files from the interwebs I can't get it to run on my old i7-2600 (non-K), and when I do it runs at 15 fps. So team red wins, I think...

2 months ago
GBATD

I never used to be interested in the intricacies of microarchitecture but you really got me hooked :D

2 months ago
Super Smash Dolls

Another reason why FMA4 is still in Ryzen is likely so it can be used to facilitate backwards compatibility on next-generation Xbox and PlayStation hardware. It's extremely likely that most PS4 and Xbox One games have FMA4 instructions baked in with no CPU check. Microsoft and Sony are some of AMD's biggest customers, so leaving them without an upgrade path (like last gen's PowerPC/Cell debacle) would really suck.

2 months ago
Joe Tooly

Is Wendell losing weight?

2 months ago
Murshed Choudhury

Doesn't having such unofficially supported instruction sets create opportunities for compromising security? Sounds like you get to execute code that nobody using the chip would likely anticipate and you can prompt reactions/errors from the chip that are also unpredictable.

2 months ago
Slymayer

However, sometimes FMA instructions do not pipeline very well, as they're like a two-instruction block and therefore might not be scheduled as flexibly across execution ports as separate multiply and add instructions, but it depends on the CPU's implementation to be honest. AVX512, however, is more of a big deal: the registers are insanely large, and scaling from AVX2 to AVX512 is as easy as iterating on twice as much data per loop iteration, or even just doing nothing if you're using C++ abstractions like Boost.SIMD.

2 months ago
Slymayer

Btw, there's a lot to say about obscure instruction sets. If you disassemble the MKL you'll find that some instructions used in it do not exist in any known documentation... And they allegedly allow them to perform 2 loads per cycle.

2 months ago
Rolando Olivas

Any chance we can get your take on the whole "Linux kill switch" controversy that's currently unfolding? I know it doesn't, at face value, have much to do with this video or much of the content you guys normally cover, but the "breaking" of Linux has dire consequences where it would undoubtedly affect all users regardless of their platform of choice. While I understand that it's not my place to dictate what you guys should or shouldn't cover, I do feel as if there is an inherent obligation and/or responsibility to those with a voice to voice the concerns of their communities and constituencies. Be that as it may, love the content, keep up the great work!

2 months ago
oiSnowy

Nothing new. Look up 6502 undocumented instructions. ;)

2 months ago
Dik dic

This video is full of moronic hyperbole, hilarious simplifications and blatant misinformation.

2 months ago
MrTuK Pitsu

Talking about instruction sets, I did love the Motorola MC68XXX instruction set. It seemed so very logical compared to x86!

2 months ago
Matthew Warren

I had no idea what FMA was, so naturally the first thing that came to mind when I saw the title was, "Full Metal Alchemist 4"? I'm not alone here, right?

2 months ago
CubicleNate

Great video, I didn't realize all the different instruction sets in the CPU. It is amazing how well software works that is pre-compiled by someone else. I do have to say, though, the best part of this video is the picture of the Enterprise-D on the back wall.

2 months ago
Skoopsro

YOU ARE SPEAKING GIBBERISH

2 months ago
Mefist0

This video is so relaxing to watch...

2 months ago
Simon T

Cool episode, subscribed ;)

2 months ago
Neumah

Love it.

2 months ago
Mora Fermi

It would be glorious if somebody made an x86 that's just long mode with pure virtual addressing, exposed pipeline and static scheduling with zero 8/16/32 bit legacy garbage. The original Atom was close to that, but it was crippled by tiny cache, low clock speed and anaemic memory bus.

2 months ago
larry fleming

I love these AMD VIDEOS. They’re completely different from everyone else’s. PLEASE GIVE US MORE!!

2 months ago
Jeffrey Hramika

Holy smokes, Hillbilly Jim dolls.

2 months ago
Stu Bur

Is there a way to automate a thorough testing of an unsupported instruction set? If we as the public can do the quality assurance testing perhaps we can find whatever bugs exist and which parts may be missing. Then we could more reasonably use the parts that work and avoid the parts that don't. Partial functionality might be very useful for very specific use cases while using supported instruction sets for everything else.

2 months ago
Phrashee Kwerk

I would love it if you would include the links you discussed in the video. Telling us information but not telling us where to find it is a bit of a wank. And I'm not here for sexual gratification. I also write assembly code, so if you could kindly include the link where they are discussing wrong results from executing FMA4 instructions on Ryzen, it would be useful to the rest of us trying to keep our hands on our keyboards.

2 months ago
Joonas Loppi

Awesome, interesting, well researched content!

2 months ago
Nico Sun

I wonder if it's possible to clean up the x86 architecture and its instruction set to only the most used functions to significantly improve performance and power efficiency. Apple can do it with their ARM CPUs and they are always way better than the competition.

2 months ago
BattleToads

Wait, so I can't play Monster Hunter on my Xeon E3-1220?

2 months ago
LORD ODIN

Could add FMA4 to Cycles since it's an open-source render engine ;)

2 months ago
sharkbyte FPV

I would like to see a piece on OpenBLAS.

2 months ago
Arnout1990

Liquido - Narcotic 1998 ... subtle though.

2 months ago
Fratal

I really liked this one, even though I didn't learn anything useful. Always great to hear Wendell enthusiastic about things.

2 months ago
James Fox

@Level1Techs By George I Think He`s Got It !!! - If You Keep Traveling Down that Rabbit hole - you May Never Find Your Way Out - Suggestion = Compile a Directory Interface for Some Fresh Air - lul`zzz

2 months ago
Muzzled

I want to see this format a lot more often. Like, weekly, if at all possible.

2 months ago
Kristjan Kütaru

First decided to upvote for OpenSUSE's Geeko. Then decided to upvote again for that smooth forum promotion. Too bad that I can only upvote once.

2 months ago
Slashfic

I always enjoy a history lesson ;)

2 months ago
PyroRomancer

I'm constructing a new data center to stockpile memes for the impending meme apocalypse. Wendell, would you recommend Epyc or Xeon?

2 months ago
Serial Thrilla

Wow, did this really need a YouTube video? WTF?

2 months ago
silverphinex

Yes, because it's interesting.

2 months ago
CapnTates

And that's why any programmer worth their salt discourages inline assembly.

2 months ago
Jesper Andersen

Would be nice to "skip" some of the backwards compatibility and just "emulate" it IF you wanted to run old programs. It would make CPUs somewhat simpler, possibly faster and definitely easier to make.

2 months ago
Josh Russell

The *blurst* of times?!...

2 months ago
erroneum

The x86 ISA is a mess... depending on how you count, the number of instructions is somewhere between 1500 and 6000; there are (I think) 3 different execution modes (16/32/64 bit, possibly a 4th if you count "unreal mode"); memory can be addressed directly or by segment; a single instruction can be as little as one byte or as many as 15; the way different bits of hardware inside the CPU were dealt with actually changed over time (the APIC becoming movable and then not being allowed in certain locations, but only on Intel, comes to mind); and from everything I've heard there was never actually enough effort put into forwards compatibility to make a difference. I personally think we should ditch x86 and put serious effort into making a fast, simple, consistent, and easily extended fixed-length ISA which could be efficiently implemented in silicon. If done right, then I wouldn't rule out 6 GHz or more for a regular processor while also being able to maintain better than 1 instruction/cycle execution rates.

2 months ago
gormomma

Can I get an Amazon link for the glass teapot on that stand? Also, I see an all-in-one gong fu brewing device next to it as well. Did not know you were quite the tea head.

2 months ago
TheGuruStud

FMA4 is great. Encoding speeds were greatly accelerated on Bulldozer. RIP.

2 months ago
Stefan Payne

So does FMA4 actually work? Is it slower than FMA3??

2 months ago
ElZamo92

Can you do transmutations with these instruction sets?

2 months ago
Danny from the beer store

👌✌🖖

2 months ago
ImTheSlyDevil

Very cool.

2 months ago
movax20h

I think you should be testing ATLAS, not OpenBLAS. Also, I am guessing that they wanted to have FMA4 in Zen and ported it from Bulldozer, but found some bug during testing and disabled it using microcode before the launch. It is still there, just not advertised. They probably do not throw an illegal-instruction exception because that would add unnecessary checks. I suspect they are going to fix it and officially re-enable it in Zen 2. It gives them substantial gains in scientific codes. Maybe they will even add native F16 (half-precision) support in the cores? :D (Intel and AMD do have vectorized F16-to-F32 (and back) instructions that can load F16 data into F32 registers, so one saves on memory storage, cache, and bus bandwidth but still does the calculations using F32 units.) But it would be great to have native F16 units for add and multiply, as this would help a bit with image, video, and audio processing, machine learning, some scientific applications, etc., and could potentially double the throughput. 130 GFLOPS per core would be amazing. :D

2 months ago
Level1Techs

ATLAS was included in our testing -- see the forum thread -- I was trying to keep the video short. Some of the lessons learned from ATLAS made their way into OpenBLAS -- there is a lot that has been hand tuned in OpenBLAS as part of the lessons learned from experimental analysis of what instruction set extensions on what processor was fastest.

2 months ago
J_Morris143

This time last year, I would have had to ask for an explanation. My college has taught me something. Thank you.

2 months ago
TechFan

Why don't we have more information about which instruction sets are baked into each CPU? And what about all of those security enhancements from Intel? Where are those on Zen? Please, no comments about Spectre and Meltdown. We're all aware.

2 months ago
silverphinex

Why would AMD need to do more security enhancements?

2 months ago
Delta Johnson

Isn't the new RISC-V supposed to free us of some of this chaos?

2 months ago
Delta Johnson

Oh and I too like the new set.

2 months ago
Conenion

A compiler cannot make the assumption that the host processor is also the target processor. If you compile a program for yourself, yes, host=target. But if you compile for a larger group, you cannot assume everyone has the latest processor supporting all extensions of the ISA. That is why gcc has the -march=... flag. If host=target you use -march=native. If host is e.g. Haswell (or later) you would use -march=core-avx2. If you want to do everything with a single binary you need different code paths for the different extensions being used. Preferably with a benchmark at the beginning to find out the extension(s) that work best, which is what OpenBLAS does, as you described.

2 months ago
John Christman

Fiero? Was that a Magic 2.0 reference?

2 months ago
Conenion

Wrong wording here. SSE, AVX, FMA and so on are /extensions/ to the instruction set. The instruction set of a current x86 CPU consists of the x86 / x86_64 base ISA and all its extensions.

2 months ago
donny woody

You lost me on that intro, but I liked it.

2 months ago
Xuramaz

It's been so long since I've seen this sexy room. I see you might be turning it into a tad bit more of a proper set.

2 months ago
deviroz

As far as I know, compiling OpenBLAS from source won't do anything special (at least, following the official documentation). If you want or really need to optimize an OpenBLAS build you need to use something like the Automatically Tuned Linear Algebra Software (ATLAS), which will, as you say, run a bunch of tests to determine the best flags for compiling OpenBLAS. Also, if you are really hardcore about that, you will need several linear algebra libraries on your system; some linear algebra operations are faster on MKL, OpenBLAS or ATLAS-OpenBLAS. It's all a big clusterfuck.

2 months ago
wwwelkam

There is also Mir GLAS.

2 months ago
Rockmandash12

Fascinating video. More of this please!

2 months ago
Jadesprite

This is the video I was hoping you would make!!

2 months ago
TheShorterboy

Last I heard there were something like 3800 instructions.

2 months ago
Ottavianus Augustus

Hey man, I'm drunk so I cannot follow what you are saying, brb tomorrow

2 months ago
Nevexo

Anyone else noticed how Wendell always has a low mouse battery?

2 months ago
Tuchulu

This video was a BLAS. Also, that Deepcool case behind you looks amazing.

2 months ago
Jawsh

The difference in Wendell's level of comfort in front of the camera now, compared to when he used to hide his face, makes me want to get more comfortable in front of a camera. I've made a couple of videos over the past few years, but I usually take them down out of embarrassment/fear that someone will one day see them. Also, I'm a big fan of these types of videos. *thumbs up*

2 months ago
utp216

How can anyone downvote Wendell videos? I just don’t know how that is possible! Thanks for another good one!

2 months ago
Jesper Andersen

Easy... it is those people that look like their brain is rebooting while watching the video, and then click over to play Candy Crush and the like.

2 months ago
Jagger Ryder

I need a video on how to set up a thermal camera or night vision with AI that identifies people within a perimeter. If there's anyone I know who can do it, it's you, Wendell. Thanks

2 months ago
schtive81

Charge your mouse battery, you pleb!

2 months ago
Djhg2000

Battery hypermiling at its best.

2 months ago
Zod

Oh the complexity..

2 months ago
halistine jenkins

HILLBILLY JIM!!!!!!! \m/

2 months ago
GurtTarctor

Another wonderful sermon from Tech Socrates :)

2 months ago
Marcus Klaas

This video could be half as long.

2 months ago
Nevexo

What's the bloody fun in that? Less Wendell? You must be insane!

2 months ago
Dh66

It should have been 4x longer.

2 months ago
Noom

So the dudes making these chips don't even know the full capabilities of said chips?? :)

2 months ago
Dh66

+Noom bahahahaha

2 months ago
Noom

Finally I now ascend into the realm of ultimate engagement: a reply from lvl1. BLESS. I'm gonna return to playing chopcoin and bustabit while I wait for bitcoin to moon :)

2 months ago
Level1Techs

Not if you ask them things like... what's the fastest way to multiply+add? Lol, otherwise compiler designer guys' jobs would be easier...

2 months ago
rztrzt

Nope, best instruction set was Motorola 680x0...

2 months ago
Justin Phillips

An incredibly informative introduction into the subject - love this kind of thing!

2 months ago
souta95

Reminds me of how Pentium M chips did not advertise PAE, when they did, in fact, support it.

2 months ago
Djay Hiryuu

Now I will waste time playing with this rabbit hole like I played with memcpy. I hate you, Wendell (not really)

2 months ago
shu172

Really interesting, keep up the good work.

2 months ago
Dimitris Anagnostou

"System problem detected." Why does this pop-up keep popping up in Ubuntu? Interesting video btw!

2 months ago
misium

I don't know if it was made clear, but FMA3 is the instruction set that survived instead of FMA4. Apparently the two are pretty much the same functionally but differ in operands: FMA3 reuses one of the input operands for its output, while FMA4 uses a separate operand for each (four in total). According to Wikipedia, this happened because of a failure to communicate between AMD and Intel, as both companies changed their minds at the same time.

2 months ago
Carewolf

Yeah, Intel proposed FMA4, and AMD proposed FMA3. Then AMD gave up and switched to FMA4 to be compatible with Intel, and Intel decided AMD's FMA3 was actually easier to implement and went with that :D

1 month ago
RyTrapp0

+Jesper Andersen I'm sure a lot of things would be very different if things like Intel matching Dell's yearly profits, on the condition that Dell didn't offer AMD CPUs in its products, hadn't happened...

2 months ago
Jesper Andersen

Life would be so much easier if they actually worked TOGETHER on a lot of this stuff.

2 months ago
tanmay panadi

What OS was he using?

2 months ago
Conenion

Linux in an Ubuntu flavor. The CPU flags can be seen using cat /proc/cpuinfo. /proc is a pseudo file system populated by the kernel.

2 months ago
