Content provided by Christoph Neumann and Nate Jones. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Christoph Neumann and Nate Jones or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://fa.player.fm/legal
Ep 022: Evidence of Attempted Posting
Christoph questions his attempts to post to Twitter.
- This week, continuing to dig into the "Twitter problem". We want to post to Twitter on a schedule.
- "Writing code to help out with laziness."
- Start with data to keep track of: inside (our data) and outside (Twitter data)
- "Data from a foreign land."
- We need to determine our "working view" of Twitter's data.
- What is in our data? For each "scheduled tweet":
- Text to post: the "status"
- Timestamp of when to post
- Timestamps are nice
- Milliseconds since the epoch
- Universal instant
- Allows the client to localize
- How do we know a scheduled tweet has been posted? A "posted?" boolean?
- Boolean says, "Yes! It has been posted somewhere on the Internet."
- Correlating identifiers are more useful than a Boolean.
- The tweet ID is a correlating identifier. We can use it to look up all of Twitter's data about it.
- "We don't need to store all of Twitter in our database."
- What is the story you need to tell about what happened?
- A record of all the attempts allows us to tell a story about what happened.
- Useful to have the timestamp of when our application posted it.
- Make a separate log for attempts.
- Attempting to post is a separate concern from what to post.
- Don't complicate the scheduled tweet information by embedding the log.
- "Once you have all the data, it allows you to ask new questions you didn't originally think of."
- Clojure makes it easy to work with a large tree of data that came from an external source. We don't have to care about the structure of that data. We can just write it down.
- Simply attempt to post the next scheduled tweet that does not have a Twitter ID recorded.
- If it fails, just record the attempt, and go back to sleep.
- "Handle the brick in front of you, and if you keep doing that, you'll eventually build the wall."
- What if we don't hear the success response from Twitter, but it did get posted?
- Idea: Try to detect if a tweet has already been posted.
- If we can uniquely identify something by its content, we can know two things are the same without having a common ID.
- Problem: Twitter can alter the contents.
- Idea: fuzzy "measure of similarity" between our recent tweets and the next scheduled tweet.
- We can record the fuzzy match in our attempt log too!
- If we can correlate by contents, we could even identify when we manually post in advance.
- As soon as you can determine equality by the substance of the thing itself, you can have more than one writer.
- How "recent" is "recent"? Is it 100? Is it 200? Is it 500?
- Even better, fetch all the tweets since the last ID we recorded.
- we know we're seeing all of the tweets
- can scan each of those for a match (in the case of a manual post)
- know when the tweet stream ends, so we can know a posting is still needed
- The worker will get there eventually. Can just give up on an error. No complex retry and recovery logic.
- With more than one writer, we still can have a race condition. Ultimately Twitter has to deal with deduplication to avoid a double post in a short interval.
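The worker loop described above can be sketched roughly as follows. This is a minimal illustration, not the hosts' implementation: the key names (`:local/post-at`, `:attempted-at`, `:outcome`), the functions `next-unposted`, `find-posted`, and `step`, and the injected `fetch-since` and `post!` functions are all hypothetical stand-ins, and the exact-match `normalize` is only a placeholder for a real fuzzy similarity measure.

```clojure
(ns scheduler.sketch
  (:require [clojure.string :as str]))

;; Hypothetical data: epoch-millisecond timestamps, and a tweet ID that stays
;; nil until we can correlate the scheduled tweet with a real one.
(def scheduled
  [{:local/id 1 :local/text "First post"  :local/post-at 1000 :twitter/id "901"}
   {:local/id 2 :local/text "Second post" :local/post-at 2000 :twitter/id nil}
   {:local/id 3 :local/text "Third post"  :local/post-at 3000 :twitter/id nil}])

(defn next-unposted
  "Earliest scheduled tweet that is due at `now` and has no tweet ID recorded."
  [scheduled now]
  (->> scheduled
       (filter #(and (nil? (:twitter/id %)) (<= (:local/post-at %) now)))
       (sort-by :local/post-at)
       first))

(defn normalize
  "Crude content key: trim, lower-case, collapse whitespace. A stand-in for a
  real fuzzy similarity measure."
  [text]
  (-> text str/trim str/lower-case (str/replace #"\s+" " ")))

(defn find-posted
  "First timeline tweet whose normalized text equals the scheduled tweet's."
  [tweet timeline]
  (let [wanted (normalize (:local/text tweet))]
    (first (filter #(= wanted (normalize (:twitter/status %))) timeline))))

(defn step
  "One pass of the worker. `fetch-since` and `post!` are injected stand-ins
  for Twitter calls. Returns the updated state; never retries."
  [{:keys [scheduled] :as state} now fetch-since post!]
  (if-let [tweet (next-unposted scheduled now)]
    (let [;; Simplification: treat the last ID in the vector as the last recorded.
          last-id (last (keep :twitter/id scheduled))
          result  (try
                    (if-let [found (find-posted tweet (fetch-since last-id))]
                      {:outcome :already-posted :twitter/id (:twitter/id found)}
                      {:outcome :posted
                       :twitter/id (:twitter/id (post! (:local/text tweet)))})
                    (catch Exception e
                      ;; Just record the failure; the next pass tries again.
                      {:outcome :failed :error (ex-message e)}))
          entry   (assoc result :local/id (:local/id tweet) :attempted-at now)]
      (-> state
          (update :log conj entry)
          (update :scheduled
                  (fn [ts]
                    (mapv #(if (= (:local/id %) (:local/id tweet))
                             (assoc % :twitter/id (:twitter/id result))
                             %)
                          ts)))))
    state))
```

Calling `(step {:scheduled scheduled :log []} 2500 (constantly []) (fn [_] {:twitter/id "902"}))` picks the second tweet, records a `:posted` attempt in the separate log, and stores the returned ID on the scheduled tweet. Keeping the log outside the scheduled tweet mirrors the point that attempting to post is its own concern.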
Message Queue discussion:
- Namespacing in a map is really useful
- A flat, namespaced map is easier to traverse than a nested map.
- One use for namespaces: indicate the origin of the data
- E.g. :twitter/id, :twitter/status vs. :local/id, :local/text
- You see the namespace in your code, so it makes the data origin very visible.
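To make the namespacing point concrete, here is a small sketch comparing a nested map with a flat, namespaced one. The sample values and the helper `from-origin` are made up for illustration; only the key names come from the notes above.

```clojure
;; The same data in two shapes (values are invented).
(def nested
  {:twitter {:id "901" :status "Hello"}
   :local   {:id 1 :text "Hello"}})

(def flat
  {:twitter/id "901" :twitter/status "Hello"
   :local/id 1       :local/text "Hello"})

;; Nested access needs a path; flat access is a single keyword lookup.
(get-in nested [:twitter :id])
(:twitter/id flat)

;; Namespaced destructuring keeps the origin visible at the binding site.
(defn summary [{:twitter/keys [status] :local/keys [text]}]
  {:theirs status :ours text})

;; Hypothetical helper: select every key that came from one origin.
(defn from-origin [m origin]
  (into {} (filter (fn [[k _]] (= origin (namespace k)))) m))

(from-origin flat "twitter")
```

Because the origin is part of each key, one flat map can hold both views of a tweet without the keys colliding.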
Related episodes:
- Schemas for data. "Internal" vs "external" data.
- Creating a "big bag of data" and asking it questions.
Clojure in this episode:
pr-str
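As a rough illustration of "just writing it down", the sketch below uses `pr-str` to serialize an attempt entry that embeds an external response, then reads it back with `clojure.edn/read-string`. The response shape and the `:attempted-at` and `:twitter/response` keys are invented for the example and do not reflect Twitter's actual payload.

```clojure
(require '[clojure.edn :as edn])

;; Invented stand-in for a tree of data returned by an external service.
(def response
  {:id_str "901"
   :text   "Hello"
   :user   {:screen_name "example" :followers_count 42}})

;; An attempt entry that stores the whole response without modelling it.
(def entry
  {:attempted-at 1000
   :twitter/response response})

;; pr-str renders the data as a readable string, suitable for one log line.
(def line (pr-str entry))

;; Reading the line back yields an equal value, so new questions can be
;; asked of the log later.
(def restored (edn/read-string line))

(get-in restored [:twitter/response :user :screen_name])
```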