Soumil Shah
  • Videos: 1,730
  • Views: 6,678,780

Videos

Use S3 as an External Volume in Snowflake along with X table to interoperate as Hudi|Iceberg|Delta
71 views · 12 hours ago
How to Use AWS S3 as an External Volume in Snowflake along with X table to interoperate as Hudi | Iceberg | Delta and query the data in Athena. In this video, I’m diving deep into how you can interoperate with Snowflake tables and extend their capabilities by leveraging AWS services. Watch part 1: ruclips.net/video/FZsO5qeXPfM/видео.html Lab: github.com/soumilshah1995/snowflake-xtable 🔍 Key Hig...
How to Use AWS S3 as an External Volume in Snowflake | Hands-on Guide
63 views · 14 hours ago
How to Use AWS S3 as an External Volume in Snowflake. Are you looking to integrate AWS S3 with Snowflake and streamline your data management process? Check out my latest video, where I walk you through the entire setup: 🔹 Creating IAM Roles and Policies: Learn how to configure AWS IAM roles and policies to grant Snowflake access to your S3 buckets. 🔹 Setting Up Snowflake: Discover how to create a...
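The IAM step described above usually comes down to a trust policy that lets Snowflake's IAM user assume your role. Below is a minimal sketch, assuming the policy shape from Snowflake's external-volume documentation. The ARN and external ID are hypothetical placeholders. The real values come from the `STORAGE_AWS_IAM_USER_ARN` and `STORAGE_AWS_EXTERNAL_ID` fields that Snowflake reports for your external volume.

```python
import json


def build_snowflake_trust_policy(snowflake_iam_user_arn: str, external_id: str) -> dict:
    """Return an IAM trust policy that lets Snowflake's IAM user assume the role.

    Both arguments are placeholders here: copy the real values from the
    STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID fields Snowflake
    reports for your external volume.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": snowflake_iam_user_arn},
                "Action": "sts:AssumeRole",
                # Requiring the external ID protects against the "confused deputy" problem.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    policy = build_snowflake_trust_policy(
        "arn:aws:iam::123456789012:user/example-snowflake-user",  # hypothetical ARN
        "EXAMPLE_EXTERNAL_ID",  # hypothetical external ID
    )
    print(json.dumps(policy, indent=2))
```

The resulting JSON is what you would paste into the role's trust relationship. The S3 permissions policy (read/write on the bucket prefix) is a separate document that is attached to the same role.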
Personal Opinion: Which Table Format Do I Prefer? (Hudi, Iceberg, or Delta)
77 views · 16 hours ago
Disclaimer: All table formats are excellent, and this video reflects my personal choice. I am not paid by any company or vendor to favor any format. These are my personal views, and it's perfectly okay if your preference differs. I've used all three formats, and I genuinely appreciate each one.
Insert | Update | Delete | Time Travel | Schema Evolution with Iceberg and MinIO (Requested by Saurabh)
74 views · 16 hours ago
Insert | Update | Delete | Time Travel with Iceberg Tables and MinIO, requested by Saurabh. Exercise notebook: github.com/soumilshah1995/code-snippets/blob/main/iceberg-soumil.ipynb GitHub project to clone and try: github.com/tabular-io/docker-spark-iceberg.git To seamlessly interoperate with other table formats, I recommend using X Table. Learn more about it here: xtable.apache.org Feel free ...
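Iceberg's time travel reads an older snapshot from the table's snapshot history. Reading "as of" a timestamp amounts to picking the most recent snapshot committed at or before that time. This is a pure-Python model of that selection rule, not Iceberg's API. It assumes a simple linear history with no rollbacks or branches, and the `Snapshot` type is hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    timestamp_ms: int  # commit time in epoch milliseconds


def snapshot_as_of(history: list, as_of_ms: int) -> Snapshot:
    """Return the latest snapshot committed at or before as_of_ms.

    Assumes `history` is sorted by timestamp_ms, oldest first, as in a
    linear snapshot log with no rollbacks or branches.
    """
    timestamps = [s.timestamp_ms for s in history]
    idx = bisect_right(timestamps, as_of_ms)  # first snapshot strictly after as_of_ms
    if idx == 0:
        raise ValueError("no snapshot exists at or before the requested time")
    return history[idx - 1]


history = [Snapshot(1, 1_000), Snapshot(2, 2_000), Snapshot(3, 3_000)]
print(snapshot_as_of(history, 2_500).snapshot_id)  # 2
```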
Using Bucket Index & Right Partitioning with Hudi for 660GB Tables & 4.4B Records on AWS Glue 4.0
80 views · 19 hours ago
Using Bucket Index & Right Partitioning with Hudi for 660GB Tables & 4.4B Records on AWS Glue 4.0. Resources to read: www.linkedin.com/pulse/how-we-used-bucket-index-apache-hudi-right-660gb-tables-soumil-shah-gkgke/?trackingId=I1uqQzduSae9LBiGl3ll1g www.linkedin.com/pulse/apache-hudi-accelerating-upsert-simple-index-choosing-soumil-shah/?lipi=urn:li:page:d_flagship3_pulse_read;upOgTzuUTzS2VGegusm...
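A bucket index avoids a separate index lookup during upserts. It hashes each record key to one of a fixed number of buckets (file groups) in a partition, so the writer can route a record directly. The sketch below shows only that routing idea. It assumes `zlib.crc32` as a stable hash, whereas Hudi uses its own hashing scheme. `num_buckets` stands in for Hudi's `hoodie.bucket.index.num.buckets` setting, and the record layout is hypothetical.

```python
import zlib
from collections import defaultdict


def bucket_id(record_key: str, num_buckets: int) -> int:
    """Map a record key to a bucket deterministically.

    crc32 is used because it is stable across processes (Python's built-in
    hash() for str is salted per process); Hudi's actual hash differs.
    """
    return zlib.crc32(record_key.encode("utf-8")) % num_buckets


def route(records: list, key_field: str, num_buckets: int) -> dict:
    """Group records by the bucket (file group) they would be written to."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[bucket_id(str(rec[key_field]), num_buckets)].append(rec)
    return dict(buckets)


records = [{"id": f"order-{i}", "amount": i * 10} for i in range(8)]  # hypothetical data
for b, recs in sorted(route(records, "id", num_buckets=4).items()):
    print(b, [r["id"] for r in recs])
```

Because the same key always lands in the same bucket, an update for `order-3` goes to the file group that already holds it. The trade-off is that the bucket count is fixed up front, which is why sizing it against table volume matters.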
What are some of the common Interview Questions for Apache Hudi?
65 views · 1 day ago
Ever wondered what kind of questions you might face in an interview where Apache Hudi is listed in the job description? Look no further! 🎯 In my latest YouTube video, "What are some of the Common Interview Questions for Apache Hudi," I dive deep into popular questions you can expect during the interview process. From basic concepts to advanced scenarios, I cover it all to help you ace your inte...
Learn How to Run the Apache X Table in Docker Environments with Rocky Linux (Hudi | Iceberg | Delta)
51 views · 1 day ago
Blog: www.linkedin.com/pulse/learn-how-run-apache-x-table-sync-command-docker-rocky-soumil-shah-ypkye/?trackingId=y2kj7CxyTYGgU+MyFANcsA Exercise files: github.com/soumilshah1995/apache-x-table-docker-tutorial/blob/main/README.md
Understanding Apache Hudi's MERGE INTO Command with MinIO and Hive Metastore
48 views · 14 days ago
Understanding Apache Hudi's MERGE INTO Command with MinIO and Hive Metastore. Blog: www.linkedin.com/pulse/understanding-apache-hudis-merge-command-minio-soumil-shah-ejapf/?trackingId=W2uo8swbTDazhxx3hYqBHg Exercise files: github.com/soumilshah1995/hudi-mergeinto-labs/blob/main/README.md
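`MERGE INTO` matches incoming rows to existing rows on a key. Matched rows are updated, or deleted when a condition holds, and unmatched rows are inserted. This is a pure-Python model of those semantics on an in-memory dict, not Hudi's implementation. The key column `id` and the delete-flag column `is_deleted` are hypothetical.

```python
from typing import Optional


def merge_into(target: dict, source: list, key: str = "id", delete_col: Optional[str] = None) -> dict:
    """Model MERGE INTO: update matched rows (or delete them when the source
    row's delete_col is truthy) and insert unmatched rows."""
    result = dict(target)  # leave the caller's target untouched
    for row in source:
        k = row[key]
        is_delete = bool(delete_col and row.get(delete_col))
        if k in result:
            if is_delete:
                del result[k]                      # WHEN MATCHED AND <delete> THEN DELETE
            else:
                result[k] = {**result[k], **row}   # WHEN MATCHED THEN UPDATE SET ...
        elif not is_delete:
            result[k] = dict(row)                  # WHEN NOT MATCHED THEN INSERT
    return result


target = {1: {"id": 1, "name": "a"}, 2: {"id": 2, "name": "b"}}
source = [{"id": 2, "name": "B"}, {"id": 3, "name": "c"}, {"id": 1, "is_deleted": True}]
merged = merge_into(target, source, delete_col="is_deleted")
print(merged)  # {2: {'id': 2, 'name': 'B'}, 3: {'id': 3, 'name': 'c'}}
```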
Apache Hudi 1.0.0 leverages LSM trees to achieve faster writes and save storage.
108 views · 14 days ago
Apache Hudi 1.0.0 leverages LSM trees to achieve faster writes and save storage. In this video, I delve into how LSM trees work within Hudi, showcasing their benefits for efficient data management. For beginners curious about LSM trees, I've included an excellent explanatory video from the YouTube channel "ByteByteGo," where the concept is very well explained. As data writes exceed a certain th...
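The general LSM-tree idea works like this. Writes go to an in-memory buffer (a memtable). Once the buffer reaches a size threshold, it is flushed as an immutable sorted run. Reads check the memtable first and then the runs from newest to oldest, and compaction later merges runs. The toy Python sketch below illustrates that general technique only. It is not Hudi 1.0's timeline implementation, and `flush_threshold` is a hypothetical parameter.

```python
import bisect


class ToyLSM:
    """Toy log-structured merge tree: an in-memory memtable plus sorted immutable runs."""

    def __init__(self, flush_threshold: int = 3):
        self.flush_threshold = flush_threshold  # hypothetical: entries held before a flush
        self.memtable = {}
        self.runs = []  # each run is a sorted list of (key, value); newest run is last

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self._flush()

    def _flush(self):
        # Writes stay cheap: buffered entries are sorted once and appended as a new run.
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):  # newest run wins
            i = bisect.bisect_left(run, (key,))  # binary search within the sorted run
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        """Merge all runs into one, keeping only the newest value for each key."""
        merged = {}
        for run in self.runs:  # oldest to newest, so newer values overwrite older ones
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []


lsm = ToyLSM(flush_threshold=2)
for k, v in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
    lsm.put(k, v)
print(len(lsm.runs), lsm.get("a"))  # 2 3
lsm.compact()
print(len(lsm.runs), lsm.get("a"), lsm.get("b"))  # 1 3 2
```

Writes never rewrite existing runs, which is where the write speedup comes from. Compaction reclaims the space taken by superseded values.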
My Journey into Apache Hudi: How and When I Started Learning and Tips for Beginners
107 views · 14 days ago
Hudi JAR Compilation: Build & Compile Hudi JARs for Specific Spark Versions Using Docker Containers
22 views · 21 days ago
Master Hudi JAR Compilation: Building and Compiling Hudi JAR Files for Specific Spark Versions Using Docker Containers. Steps: github.com/soumilshah1995/Hudi-compile-jar-docker/tree/main References: medium.com/@life-is-short-so-enjoy-it/hudi-build-your-jars-with-your-patches-rocky-linux-in-docker-9010b275321a Apache Hudi
Unlock Insights: A Step-by-Step Guide to LakeView Free Community Edition with AWS Glue
63 views · 21 days ago
Unlock Deep Insights from Data Lakes: A Step-by-Step Guide to Using LakeView Free Community Edition with AWS Glue. Step-by-step guide: github.com/soumilshah1995/Hudi-lake-view/blob/main/README.md References: www.onehouse.ai/blog/announcing-lakeview github.com/onehouseinc/LakeView/tree/main
Milestone Achieved: 42,000 Subscribers! Thank You, Everyone!
70 views · 21 days ago
Fast GeoSearch on Data Lakes: Efficiently Build Geo Search Using Hudi for Lightning-Fast Retrieval
71 views · 21 days ago
Building Keyword Search in Hudi: Inverted Indexes, Record Level | Keyword Search in Datalakes
76 views · 28 days ago
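An inverted index maps each term to the set of record keys that contain it. A keyword query then becomes a few lookups plus a set intersection instead of a full scan. This is a minimal in-memory sketch. In the video's setting the term-to-record-key mapping would live in a table, which the sketch does not model, and the tokenizer (lowercase, split on non-alphanumerics) is an assumption.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    """Lowercase and split on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_inverted_index(docs: dict) -> dict:
    """Map each term to the set of record keys whose text contains it."""
    index = defaultdict(set)
    for record_key, text in docs.items():
        for term in tokenize(text):
            index[term].add(record_key)
    return dict(index)


def search(index: dict, query: str) -> set:
    """Return record keys containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {  # hypothetical records keyed by record key
    "r1": "Apache Hudi record level index",
    "r2": "Keyword search in data lakes",
    "r3": "Hudi keyword search with inverted indexes",
}
index = build_inverted_index(docs)
print(sorted(search(index, "hudi search")))  # ['r3']
```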
Storing Athena Query Metrics in Hudi for Advanced Analysis and Audit using AWS Glue
65 views · 1 month ago
Using OpenAI Vector Embedding to Store Large Vectors in Hudi with MinIO for Cost-Effective AI Apps
108 views · 1 month ago
Learn How to Use Apache Hudi Streamer with DataHub, an Open Source Metadata Platform
67 views · 1 month ago
Getting Started with X-Table and Unity Catalog | Universal Datalakes | Hands-on Labs
143 views · 1 month ago
Hudi Using Spark SQL on AWS S3: Insert, Update, Deletes, Stored Procedures on AWS Glue Notebooks
85 views · 1 month ago
How to Use Hudi Streamer on the New EMR 7.1.0 (Spark 3.5.1 and Hudi 0.14.1) | Hands-on Labs
89 views · 1 month ago
How to Use Hudi Streamer with Hudi Version 0.15.0 | Hands-on Guide
64 views · 1 month ago
How to Execute Postgres Stored Procedures in Spark | Hands-on Guide
104 views · 1 month ago
Learn How to Incrementally Ingest Hudi Table Changes into Postgres Using Spark
101 views · 1 month ago
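Incremental pulls from Hudi rely on the `_hoodie_commit_time` meta column. Each run reads only records committed after the last checkpoint and then advances the checkpoint. Below is a pure-Python sketch of that bookkeeping, assuming rows arrive as plain dicts. In Spark this would typically be an incremental query (`hoodie.datasource.query.type=incremental` with `hoodie.datasource.read.begin.instanttime`). The Postgres write is left out, and the sample rows are hypothetical.

```python
def incremental_batch(rows: list, last_checkpoint: str) -> tuple:
    """Return (rows committed after last_checkpoint, new checkpoint).

    Hudi instant times are fixed-width digit strings, so comparing them as
    strings orders them chronologically as long as all share the same width.
    """
    new_rows = [r for r in rows if r["_hoodie_commit_time"] > last_checkpoint]
    if not new_rows:
        return [], last_checkpoint  # nothing new: keep the old checkpoint
    return new_rows, max(r["_hoodie_commit_time"] for r in new_rows)


rows = [
    {"_hoodie_commit_time": "20240101100000000", "id": 1},
    {"_hoodie_commit_time": "20240102100000000", "id": 2},
    {"_hoodie_commit_time": "20240103100000000", "id": 3},
]
batch, checkpoint = incremental_batch(rows, "20240101100000000")
print([r["id"] for r in batch], checkpoint)  # [2, 3] 20240103100000000
```

The checkpoint should be persisted only after the downstream write (for example, the Postgres upsert) succeeds. Otherwise a failed run would skip records.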
Universal Datalakes: Interoperability with Hudi, Iceberg, and Delta Tables with AWS Glue Notebooks
125 views · 1 month ago
4 Different Ways to Fetch Apache Hudi Commit Time in Python and PySpark
73 views · 1 month ago
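One Spark-free way to get commit times is to list completed instants in the table's `.hoodie` timeline folder. The sketch below assumes the pre-1.0 timeline layout, where completed commits appear as `<instant_time>.commit` (copy-on-write) or `<instant_time>.deltacommit` (merge-on-read). Hudi 1.0 changed the timeline layout, and this is not necessarily one of the four methods shown in the video. The filename list is hypothetical sample data.

```python
import re
from datetime import datetime

# Completed instants in the pre-1.0 layout, e.g. "20240101120000123.commit".
# Requested/inflight markers (".commit.requested", ".inflight") do not match.
_COMPLETED = re.compile(r"^(\d{14}|\d{17})\.(commit|deltacommit)$")


def completed_commit_times(filenames: list) -> list:
    """Return completed commit instant times from a timeline listing, oldest first."""
    return sorted(m.group(1) for f in filenames if (m := _COMPLETED.match(f)))


def instant_to_datetime(instant: str) -> datetime:
    """Parse a 14-digit (seconds) or 17-digit (milliseconds) instant time."""
    dt = datetime.strptime(instant[:14], "%Y%m%d%H%M%S")
    if len(instant) == 17:
        dt = dt.replace(microsecond=int(instant[14:]) * 1000)  # milliseconds -> microseconds
    return dt


files = [  # hypothetical listing of a .hoodie folder
    "hoodie.properties",
    "20240101120000123.commit.requested",
    "20240101120000123.inflight",
    "20240101120000123.commit",
    "20240102083015000.commit",
]
times = completed_commit_times(files)
print(times)  # ['20240101120000123', '20240102083015000']
print(instant_to_datetime(times[-1]))  # 2024-01-02 08:30:15
```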
Using OneTable to Translate a Hudi Table to Iceberg Format and Sync with the Glue Catalog
75 views · 1 month ago
Learn How to Run the Apache X Table Sync Command on AWS CloudShell | Interoperate Hudi, Iceberg, Delta
39 views · 1 month ago
Learn How to Ingest XML Files with AWS Glue into Hudi Datalakes | Step-by-Step Guide
130 views · 1 month ago

Comments

  • @SachinShukla230187 · 2 days ago

    Thanks Soumil, it's only because of you that I have gained some confidence with REST APIs.

  • @aosenbalongchar8076 · 2 days ago

    Hi bro, can you provide the same configuration for Kubernetes? That'll help a ton.

    • @SoumilShah · 1 day ago

      That's a headache. I tried it with Kubernetes, man, and it was a pure headache.

  • @soumabhabasu5429 · 2 days ago

    Can the Snowflake-managed Iceberg table be queried through Athena? I do the opposite.

    • @SoumilShah · 2 days ago

      Already made a video on that :)

    • @SoumilShah · 2 days ago

      Use S3 as an External Volume in Snowflake along with X table to interoperate as Hudi|Iceberg|Delta ruclips.net/video/Hi5hQ6BVrWg/видео.html

  • @PrakashReddyK · 3 days ago

    Dude, thank you for this. I've been trying this for a long time with the S3 sink, and I have to try this. I was able to connect with the Debezium connector from Postgres to Kafka to get CDC data, but I could not get the S3 sink connector working.

    • @SoumilShah · 3 days ago

      Those videos are in the pipeline too.

    • @SergeyTarabara · 3 days ago

      Thx! Waiting for the same sink with Hudi/Iceberg.

  • @sujoykarmaker8467 · 3 days ago

    You're a lifesaver, man

  • @atabajunior5295 · 3 days ago

    Very concise and clear. Thanks for this, but I tried inserting an array of JSON objects in the Create Item JSON UI and keep getting strange syntax errors. Any help? 🙏

  • @hugolemos5918 · 3 days ago

    Well, that was not a surprise…

  • @GourBera-vx3ph · 4 days ago

    What happens when you change column type?

  • @xspydazx · 4 days ago

    This may be a good way to interact with LM Studio.

  • @TonyMoore2301 · 4 days ago

    Nice tutorial, man. I would like to know how the "service name" generates random names.

  • @saiabhishek53 · 6 days ago

    Hey Soumil, would it be possible for you to make a series on AWS with PySpark? An end-to-end project with a detailed explanation of the AWS services used would be much appreciated. I'm looking for a real-time project on this. THANK YOU!!

  • @nandkarthik · 6 days ago

    Man, you are awesome

  • @prasanthvegesna2306 · 6 days ago

    @SoumilShah, can't we directly use the Glue job to read the data from the configured S3 bucket path? What are the SQS queues and the Glue job polling SQS for?

  • @nandkarthik · 6 days ago

    Man, you are a genius. Essentially

  • @SoumilShah · 6 days ago

    docs.snowflake.com/en/user-guide/tables-iceberg-configure-external-volume-s3

  • @daviditieimoh1921 · 6 days ago

    Can you help scrape a website?

  • @cloudncoffeewithnav · 7 days ago

    Thank you for making a video on our blog, @soumil - Navnit Shukla

    • @SoumilShah · 6 days ago

      You guys did a great job

  • @SoumilShah · 7 days ago

    Great, let me know if you need any other videos; happy to do so.

  • @sauravpareek6298 · 7 days ago

    Hey Soumil, definitely a good head start for someone starting out with Iceberg. Will check out X Table too. Thanks for the video.

  • @vaibhavikavathekar4828 · 7 days ago

    Hello, I'm trying to do something similar using NVIDIA Jetson devices, hosting an OpenVPN Access Server on AWS so that the client can remotely connect to the server regardless of location. The client and server are set up, but when I try to send a simple Python script using sockets that just sends a hello world message, the server becomes unresponsive and I'm unable to receive the message. Any help is appreciated.

  • @nandkarthik · 8 days ago

    Awesome job. You use "Essentially" very frequently.

    • @SoumilShah · 8 days ago

      Essentially, you are right

  • @andriifadieiev9757 · 8 days ago

    Great stuff, hopefully I could recollect it during the interviews ^_^ Thanks for sharing!

  • @AmiSmita · 8 days ago

    Hi, can you share ideas on how we can optimize costs for AWS Transfer Family?

  • @Northstar2000 · 9 days ago

    Vel come to duh wideo

  • @CheckLeaning · 9 days ago

    Please tell me the prerequisites for an MS in CS.

  • @balajis4788 · 9 days ago

    Many thanks, Soumil. This is exactly what I was looking for. We have a Kinesis data stream and we are exporting to S3 through Firehose. Firehose exports data in yyyy/MM/dd/HH format. We needed to write a batch ETL job in Glue to transform the data and store it in a different S3 location. The new location should have the same yyyy/MM/dd/HH format. This code really helped in solving this issue.

  • @nupurladha4005 · 10 days ago

    Hi Soumil, we have complex hierarchy and design requirements. Would it be possible for you to connect with us to discuss a many-to-many DynamoDB design?

  • @ROHAN-xs7om · 10 days ago

    Where's the dataset? Please provide it.

  • @ravikiran1407 · 10 days ago

    Hi Soumil, please make more of these kinds of videos; they are very helpful.

  • @anuragsharma5678 · 11 days ago

    Amazing questions with real-time scenarios. Please make such videos on other AWS services.

  • @user-ho5xt8ep5l · 13 days ago

    Thank you for sharing knowledge

  • @Harshita-ch7fh · 14 days ago

    Explained well!!!!!!!!!

  • @SergeyTarabara · 14 days ago

    Thx for the video, Soumil! Waiting for a video about integrating StarRocks + Hudi/Iceberg

    • @SoumilShah · 14 days ago

      Here is the video: ruclips.net/video/_DcOp2YP774/видео.htmlsi=xLrqjd_YmguS5k2Y

  • @mauliksangani705 · 15 days ago

    Congratulations